[PDR15] A Theoretical and Experimental Comparison of Filter-based Equijoins in MapReduce
International peer-reviewed journal:
Transactions on Large-Scale Data- and Knowledge-Centered Systems (TLDKS),
pp. 41-80,
2015
Keywords: MapReduce, big data, join
Abstract:
MapReduce has become an increasingly popular framework
for large-scale data processing. However, complex operations such as
joins are quite expensive and require sophisticated techniques.
In this paper, we review state-of-the-art strategies for joining several
relations in a MapReduce environment and study their extension with filter-based
approaches. The general objective of filters is to eliminate non-matching data as early as possible
in order to reduce I/O, communication and CPU costs. We examine the impact of systematically
adding such filters early in MapReduce join algorithms,
both analytically, with cost models, and empirically, with experimental evaluations.
The study covers binary joins,
multi-way joins and recursive joins, and addresses the case of large
inputs, which gives rise to the most intricate challenges.
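
To make the idea of filtering before the join concrete, here is a minimal single-process sketch in Python. It is an illustration only, not the algorithms evaluated in the paper: the abstract does not say which filter structure is used, so the Bloom filter, the names `BloomFilter` and `filtered_equijoin`, and the parameters `num_bits` and `num_hashes` are all assumptions made for this example. The "map" and "reduce" steps are simulated with plain Python lists rather than a MapReduce runtime.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter (hypothetical helper): false positives are
    possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def filtered_equijoin(r, s):
    """Equijoin two lists of (key, value) pairs, pre-filtering s.

    Returns the joined triples and the number of s tuples that
    survived the filter (i.e. that would have been shuffled).
    """
    # Build the filter on the join keys of r.
    bloom = BloomFilter()
    for key, _ in r:
        bloom.add(key)

    # Simulated map side: drop s tuples whose key cannot match r.
    s_kept = [(k, v) for k, v in s if bloom.might_contain(k)]

    # Simulated reduce side: ordinary hash join on the survivors.
    r_by_key = {}
    for k, v in r:
        r_by_key.setdefault(k, []).append(v)
    joined = [(k, rv, sv) for k, sv in s_kept for rv in r_by_key.get(k, [])]
    return joined, len(s_kept)


r = [(1, "a"), (2, "b")]
s = [(2, "x"), (3, "y"), (2, "z")]
joined, shuffled = filtered_equijoin(r, s)
```

Because a Bloom filter never produces false negatives, the join result is the same as without the filter; only the number of tuples carried to the join step changes. A false positive merely lets a non-matching tuple through, where the hash join then discards it.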