Paper details
Number 4 - December 2017
Volume 27 - 2017
Efficient storage, retrieval and analysis of poker hands: An adaptive data framework
Marcin Gorawski, Michal Lorek
Abstract
In online gambling, poker hands are one of the most popular and fundamental units of the game state and can be considered
objects comprising all the events that pertain to the single hand played. In a situation where tens of millions of poker hands
are produced daily and need to be stored and analysed quickly, the use of relational databases no longer provides high
scalability and performance stability. The purpose of this paper is to present an efficient way of storing and retrieving
poker hands in a big data environment. We propose a new, read-optimised storage model that offers significant data
access improvements over traditional database systems as well as the existing Hadoop file formats such as ORC, RCFile
or SequenceFile. Through index-oriented partition elimination, our file format reduces the number of file splits
that need to be accessed and improves query response time by up to three orders of magnitude in comparison with other
approaches. In addition, our file format supports a range of new indexing structures to facilitate fast row retrieval at a
split level. Both index types operate independently of the Hive execution context and allow other big data computational
frameworks such as MapReduce or Spark to benefit from the optimised data access path to the hand information. Moreover,
we present a detailed analysis of our storage model and its supporting index structures, and how they are organised in the
overall data framework. We also describe in detail how predicate-based expression trees are used to build effective file-level
execution plans. Our experiments, conducted on a production cluster holding nearly 40 billion hands that span
over 4000 partitions, show that multi-way partition pruning outperforms other existing file formats, resulting in faster query
execution times and better cluster utilisation.
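
As an illustration of how a predicate expression tree can drive split elimination, the following minimal Python sketch evaluates a tree against per-split index entries and keeps only the splits that may contain matching rows. It is not the paper's file format or index structure: the min/max statistics, the class names (Cmp, And, Or), the function names (may_match, prune_splits) and the column names (player_id, hand_ts) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple, Union

# Hypothetical per-split index entry: column name -> (min value, max value).
SplitStats = Dict[str, Tuple[int, int]]


@dataclass(frozen=True)
class Cmp:
    """Leaf node: a comparison between a column and a constant."""
    column: str
    op: str  # one of "=", "<", "<=", ">", ">="
    value: int


@dataclass(frozen=True)
class And:
    left: "Expr"
    right: "Expr"


@dataclass(frozen=True)
class Or:
    left: "Expr"
    right: "Expr"


Expr = Union[Cmp, And, Or]


def may_match(expr: Expr, stats: SplitStats) -> bool:
    """Return False only when the split provably holds no matching row."""
    if isinstance(expr, And):
        return may_match(expr.left, stats) and may_match(expr.right, stats)
    if isinstance(expr, Or):
        return may_match(expr.left, stats) or may_match(expr.right, stats)
    if expr.column not in stats:
        return True  # no index entry for this column: the split cannot be pruned
    lo, hi = stats[expr.column]
    if expr.op == "=":
        return lo <= expr.value <= hi
    if expr.op == "<":
        return lo < expr.value
    if expr.op == "<=":
        return lo <= expr.value
    if expr.op == ">":
        return hi > expr.value
    if expr.op == ">=":
        return hi >= expr.value
    raise ValueError(f"unsupported operator: {expr.op}")


def prune_splits(expr: Expr, splits: Dict[str, SplitStats]) -> List[str]:
    """Keep only the splits that still have to be read for this predicate."""
    return [name for name, stats in splits.items() if may_match(expr, stats)]


# Illustrative data: three splits with made-up value ranges.
splits = {
    "split-1": {"player_id": (1, 100), "hand_ts": (1000, 1999)},
    "split-2": {"player_id": (50, 300), "hand_ts": (2000, 2999)},
    "split-3": {"player_id": (400, 900), "hand_ts": (2000, 2999)},
}
predicate = And(Cmp("player_id", "=", 75), Cmp("hand_ts", ">=", 2000))
print(prune_splits(predicate, splits))  # ['split-2']
```

The evaluation is deliberately conservative: a split is dropped only when the index proves that no row can satisfy the predicate, so pruning never changes the query result, only the amount of data read.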
Keywords
big data, storage model design, data architecture, data access path optimization