MODEL OF QUERY EXECUTION PROCESSES IN A DATA WAREHOUSE ON THE PARALLEL COMPUTING PLATFORM SPARK
The execution of SQL queries Q3 and Q17 from the TPC-H benchmark in the Spark environment has been analyzed. Based on this analysis, a mathematical model has been developed to estimate the execution time of data warehouse queries processed with the Bloom Filter Cascade Application (BFCA) method. The model parameters have been calibrated against the results of full-scale experiments, and the adequacy of the model has been assessed.
Keywords: SQL, Apache Spark, Bloom filter, TPC-H, Big Data, modeling