Category: Data Engineering
-
Avro vs Parquet: Comparing Two Popular Data Storage Formats
Introduction: When it comes to storing and processing large volumes of data, choosing the right data storage format is crucial. Two popular formats that often come up in discussions are Avro and Parquet. In this article, we’ll compare Avro and Parquet, examining their features, advantages, limitations, and use cases. So, let’s dive in and explore…
-

ORC vs RC vs Parquet vs Avro: A Comprehensive Comparison of Popular Data Storage Formats
When it comes to big data processing, selecting the right file format is crucial. The file format chosen affects the performance, storage, and processing of the data. Four popular file formats for big data storage and processing are ORC, RC, Parquet, and Avro. In this blog, we will compare these file formats, their advantages and…
-
How to Install Spark on Mac
Apache Spark is a distributed computing framework used for processing large-scale data. It can be used to perform analytics, machine learning, and data processing tasks. In this blog, we will walk through the steps to install Spark on a Mac. Step 1: Install Java. Spark requires Java 8 or later to be installed on your…
-
Introduction to Apache Spark
Apache Spark is a distributed computing system that can process large amounts of data efficiently and quickly. The project began in 2009 at UC Berkeley’s AMPLab, with the aim of improving on the performance of Hadoop MapReduce, the then-popular big data processing framework, and was later donated to the Apache Software Foundation. However, as the project progressed, Spark emerged…
-
How to Read CSV with Spark
To read a CSV file in Spark, you can use the read method of the SparkSession object, which is the entry point to Spark’s SQL functionality. Here is an example code snippet: In this example, we are using the format method to specify that the file is in CSV format, and the option method to…
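The snippet mentioned in the excerpt above is not reproduced in this listing. A minimal PySpark sketch of the approach it describes (the read method, then format, then option) might look like the following. The helper name read_csv, the header and inferSchema options, the application name, and the file path are all assumptions made for illustration, not details taken from the original post.

```python
def read_csv(spark, path, header=True, infer_schema=True):
    """Load a CSV file into a DataFrame via the session's reader.

    `spark` is expected to be a SparkSession. The two options used here
    are assumptions: the excerpt does not say which options it sets.
    """
    return (
        spark.read.format("csv")               # declare the input format
        .option("header", header)              # treat the first row as column names
        .option("inferSchema", infer_schema)   # let Spark guess column types
        .load(path)
    )


def main():
    # Imported here so the helper above can be reused without starting Spark.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("read-csv-example").getOrCreate()
    df = read_csv(spark, "data/people.csv")  # hypothetical path
    df.printSchema()
    df.show(5)
    spark.stop()
```

Calling main() on a machine with PySpark installed and a CSV file at the assumed path prints the inferred schema and the first five rows. Schema inference needs an extra pass over the data, so for large files an explicit schema is often the better choice.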
-

Data Engineering for Digital Transformation: Strategies and Best Practices
As a seasoned data engineering professional, I have seen the power of data and its ability to drive digital transformation. In today’s world, data is king, and businesses that effectively manage and leverage data are the ones that succeed. However, data management is no easy feat, and it requires a strategic approach to ensure…
