Journal «Sovremennaya Nauka» (Modern Science)


The methodology of modular pipeline data processing based on Spark SQL and Spark MLlib with the integration of programming languages

Monastyrev Vitaly Victorovich (Graduate student, Peter the Great St. Petersburg Polytechnic University)

Molodyakov Sergey Aleksandrovich (Doctor of Technical Sciences, Professor, Peter the Great St. Petersburg Polytechnic University)

A methodology is proposed for building a data processing architecture on Spark SQL and Spark MLlib that allows different programming languages to be integrated. With this architecture, data processing is assembled modularly: each step is a separate, independent component that can be added to or removed from the processing flow. An example of modular pipeline processing is presented. The processing pipeline is organized with Spark MLlib, while Spark SQL is used to express queries and process the data. The structure of the processing classes is examined in Scala, based on the Transformer and Estimator base classes of the Spark MLlib library. An example pipeline is given that starts with data preparation and ends with training a machine learning model. A Python example shows how the model that receives data directly from the pipeline is implemented. It is shown that data processing can be implemented in one language and model training in another.
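The abstract relies on the Spark MLlib contract in which a Transformer maps one dataset to another, an Estimator is fitted on data and returns a Transformer (a fitted model), and a Pipeline chains such stages so that each stage's output feeds the next. The sketch below is a dependency-free Python analogue of that contract, assuming plain dicts as a stand-in for DataFrame rows; it is not the authors' code and not the Spark API. `SqlFilter`, `ColumnMap`, `StandardScaler` and `LinearRegression` here are hypothetical illustrations (MLlib's own `StandardScaler` and `LinearRegression` operate on vector columns and have different defaults). In Spark itself, a SQL step of this kind could be expressed with `SQLTransformer`, and the stages combined with `org.apache.spark.ml.Pipeline` in Scala or `pyspark.ml.Pipeline` in Python.

```python
from abc import ABC, abstractmethod
import math
from typing import Any, Callable, Dict, List, Union

# Hypothetical stand-in for a Spark DataFrame: a list of rows, each a dict.
Row = Dict[str, Any]
Rows = List[Row]


class Transformer(ABC):
    """Turns one dataset into another (the role of MLlib's Transformer)."""
    @abstractmethod
    def transform(self, rows: Rows) -> Rows: ...


class Estimator(ABC):
    """Learns from data and returns a fitted Transformer (the role of MLlib's Estimator)."""
    @abstractmethod
    def fit(self, rows: Rows) -> Transformer: ...


class SqlFilter(Transformer):
    """Hypothetical stand-in for a Spark SQL step, e.g. a WHERE clause."""
    def __init__(self, predicate: Callable[[Row], bool]):
        self.predicate = predicate

    def transform(self, rows: Rows) -> Rows:
        return [r for r in rows if self.predicate(r)]


class ColumnMap(Transformer):
    """Fitted model: appends column `out` computed by `fn`; input rows are not mutated."""
    def __init__(self, out: str, fn: Callable[[Row], float]):
        self.out, self.fn = out, fn

    def transform(self, rows: Rows) -> Rows:
        return [{**r, self.out: self.fn(r)} for r in rows]


class StandardScaler(Estimator):
    """Hypothetical scaler for one numeric column (population standard deviation)."""
    def __init__(self, col: str, out: str):
        self.col, self.out = col, out

    def fit(self, rows: Rows) -> Transformer:
        xs = [r[self.col] for r in rows]
        mean = sum(xs) / len(xs)
        std = math.sqrt(sum((x - mean) ** 2 for x in xs) / len(xs)) or 1.0  # avoid division by zero
        col = self.col
        return ColumnMap(self.out, lambda r: (r[col] - mean) / std)


class LinearRegression(Estimator):
    """Ordinary least squares on a single feature column (hypothetical training stage)."""
    def __init__(self, feature: str, label: str):
        self.feature, self.label = feature, label

    def fit(self, rows: Rows) -> Transformer:
        xs = [r[self.feature] for r in rows]
        ys = [r[self.label] for r in rows]
        mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
        slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
        intercept, feature = my - slope * mx, self.feature
        return ColumnMap("prediction", lambda r: intercept + slope * r[feature])


class PipelineModel(Transformer):
    def __init__(self, stages: List[Transformer]):
        self.stages = stages

    def transform(self, rows: Rows) -> Rows:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline(Estimator):
    """Fits stages in order; each stage's output becomes the next stage's input."""
    def __init__(self, stages: List[Union[Transformer, Estimator]]):
        self.stages = stages

    def fit(self, rows: Rows) -> PipelineModel:
        fitted: List[Transformer] = []
        for stage in self.stages:
            model = stage.fit(rows) if isinstance(stage, Estimator) else stage
            rows = model.transform(rows)
            fitted.append(model)
        return PipelineModel(fitted)


data = [{"x": 1.0, "y": 2.0}, {"x": 2.0, "y": 4.0}, {"x": None, "y": 5.0}, {"x": 3.0, "y": 6.0}]
pipeline = Pipeline([
    SqlFilter(lambda r: r["x"] is not None),  # data preparation
    StandardScaler("x", "x_scaled"),          # optional module: can be removed
    LinearRegression("x_scaled", "y"),        # model training
])
model = pipeline.fit(data)
print([round(r["prediction"], 6) for r in model.transform(data)])  # [2.0, 4.0, 6.0]
```

Because each stage only consumes and produces rows, a module such as the scaler can be dropped from the stage list (training `LinearRegression` directly on `x`) without changing the other stages, which is the kind of modularity the abstract describes.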

Keywords: big data, machine learning, Spark, pipeline, Spark SQL, Spark MLlib.

 




Citation:
Monastyrev V. V., Molodyakov S. A. The methodology of modular pipeline data processing based on Spark SQL and Spark MLlib with the integration of programming languages // Modern Science: Actual Problems of Theory and Practice. Series: Natural and Technical Sciences. 2022. No. 06/2. pp. 119-124. DOI: 10.37882/2223-2966.2022.06-2.26
LEGAL INFORMATION:
Reproduction of materials is permitted only for non-commercial purposes with reference to the original publication. Protected by the laws of the Russian Federation. Any violations of the law are prosecuted.
© Nauchnye Tekhnologii LLC ("Scientific Technologies")