Databricks Certified Associate Developer for Apache Spark — Question 200

A data analyst at an e-commerce company needs to process daily sales data. The data consists of approximately 50,000 records stored in a single CSV file, totaling about 20 MB. The analyst needs to perform aggregations and generate a summary report.

Which approach could the data analyst use in this situation?

Answer options

Correct answer: B

Explanation

The correct answer is B. A local Python script using the pandas library is an efficient way to analyze a 20 MB CSV file of about 50,000 records, because the whole dataset fits comfortably in memory on a single machine. Options A, C, and D propose more complex solutions that are unnecessary at this volume; they are better suited to much larger datasets or to real-time processing needs.
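
To illustrate the kind of script option B describes, here is a minimal sketch of a pandas aggregation over a small CSV. The column names (order_id, category, quantity, unit_price), the sample rows, and the summarize_sales helper are hypothetical stand-ins, since the question does not specify the file's schema; in practice the function would be given the path to the daily sales file.

```python
import io

import pandas as pd

# Hypothetical sample standing in for the daily sales CSV.
# The column names and values are assumptions, not taken from the question.
SAMPLE_CSV = """order_id,category,quantity,unit_price
1,books,2,10.00
2,toys,1,25.50
3,books,1,12.00
4,toys,3,5.00
"""


def summarize_sales(csv_source):
    """Read sales records and return per-category order, unit, and revenue totals."""
    # read_csv accepts a file path or any file-like object.
    df = pd.read_csv(csv_source)
    df["revenue"] = df["quantity"] * df["unit_price"]
    summary = (
        df.groupby("category")
        .agg(
            orders=("order_id", "count"),
            units=("quantity", "sum"),
            revenue=("revenue", "sum"),
        )
        .sort_values("revenue", ascending=False)
    )
    return summary


summary = summarize_sales(io.StringIO(SAMPLE_CSV))
print(summary)
```

At roughly 50,000 rows, a script like this loads the file in a single read_csv call and aggregates it in memory, with no cluster to provision or configure.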