Databricks Certified Associate Developer for Apache Spark — Question 183

A data engineer has been asked to produce a Parquet table which is overwritten every day with the latest data. The downstream consumer of this Parquet table has a hard requirement that the data in this table is produced with all records sorted by the market_time field.

Which line of Spark code will produce a Parquet table that meets these requirements?

Answer options

Correct answer: C

Explanation

The correct answer is C because it sorts the records by market_time and uses coalesce(1) to ensure all data is written to a single file. Options A and B do not use coalesce(1), which could lead to multiple output files, and option D sorts only within partitions, which does not guarantee the overall sort order required for the final Parquet table.