Designing and Implementing Enterprise-Scale Analytics Using Microsoft Azure and Power BI — Question 91
You are creating an external table by using an Apache Spark pool in Azure Synapse Analytics. The table will contain more than 20 million rows partitioned by date. The table will be shared with the SQL engines.
You need to minimize how long it takes for a serverless SQL pool to execute a query against the table.
In which file format should you recommend storing the table data?
Answer options
- A. CSV
- B. Delta
- C. JSON
- D. Apache Parquet
Correct answer: D
Explanation
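Apache Parquet is a columnar format with built-in compression and encoding. A serverless SQL pool can read only the columns a query references and skip the rest, which reduces the amount of data scanned and therefore the query time on a table of more than 20 million rows. CSV and JSON are text formats that must be parsed row by row, so queries against them scan more data and run slower. Delta stores its data as Parquet files plus a transaction log; it is useful when you need transactional updates or versioning, but that extra layer generally gives no read-speed advantage over plain Parquet in this scenario, so Parquet is the recommended format.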
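To make the recommendation concrete, here is a minimal sketch of how such a table could be defined from a Spark pool notebook. It only builds the Spark SQL statement; the helper name `build_parquet_table_ddl`, the database, table, and column names, and the storage path are all hypothetical and not part of the question.

```python
def build_parquet_table_ddl(table, columns, partition_col, location):
    """Build Spark SQL for an external Parquet table partitioned by one column.

    `columns` maps column name -> Spark SQL type. Supplying LOCATION makes
    the table external (unmanaged) in Spark.
    """
    if partition_col not in columns:
        raise ValueError(f"partition column {partition_col!r} is not in the schema")
    col_defs = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return (
        f"CREATE TABLE IF NOT EXISTS {table} ({col_defs}) "
        f"USING PARQUET "
        f"PARTITIONED BY ({partition_col}) "
        f"LOCATION '{location}'"
    )


# Hypothetical names and path, for illustration only.
ddl = build_parquet_table_ddl(
    table="sales_db.sales",
    columns={"order_id": "BIGINT", "amount": "DECIMAL(18,2)", "sale_date": "DATE"},
    partition_col="sale_date",
    location="abfss://data@examplelake.dfs.core.windows.net/sales/",
)
print(ddl)

# In a Synapse Spark notebook, where a `spark` session is predefined,
# the statement would be executed with: spark.sql(ddl)
```

Partitioning by the date column matches the question's setup and lets queries that filter on date skip whole folders, on top of the column pruning that Parquet already provides.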