Scientific journal

ISSN 1814-2400

INFORMATION SCIENCE AND CONTROL SYSTEMS

Grigor’ev Yu.A., Plutenko A.D.

ANALYSIS OF TABLE JOIN EXECUTION TIME IN A PARALLEL ROW-STORAGE DATABASE SYSTEM USING MAPREDUCE MODEL

An analysis of existing research indicates that parallel DBMSs are preferred for executing queries on structured data, while MapReduce (MR) is perceived as a supplement to DBMS technology. We examine how a parallel row-storage DBMS and the MR system Hadoop behave on a Join task when varying parameters that were held fixed in other authors’ experiments or set to values different from ours. We previously developed process models for table joins in a parallel row-storage DBMS and an MR system. This article presents the results of experiments performed on these models. The models were configured for different scalability schemes, for MR (number of nodes) and for the DBMS (data volume per node), with the joined tables fragmented by primary key. The following parameters were varied: selectivity of the queried data, number of sorted result records, and cardinality of the grouping attribute. The modeling results show that, as the stored data volume increases, the parallel DBMS starts to lose to the MR system beyond certain thresholds.

Keywords: DBMS, SQL, MapReduce technology, table join request, query execution time estimate, execution time comparison