Data Engineering on Microsoft Azure — Question 28

You are implementing a batch dataset in the Parquet format.
Data files will be produced by using Azure Data Factory and stored in Azure Data Lake Storage Gen2. The files will be consumed by an Azure Synapse Analytics serverless SQL pool.
You need to minimize storage costs for the solution.
What should you do?

Answer options

Correct answer: A

Explanation

Using Snappy compression shrinks the Parquet files, which directly reduces storage costs, and Snappy favors fast compression and decompression over the highest possible compression ratio. The other options do not address storage size: OPENROWSET and an external table only determine how the data is accessed, not how it is stored, and storing all data as strings can increase file size, leading to higher costs.
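
The size argument can be illustrated with a rough sketch using only the Python standard library. It assumes a simplified model of Parquet's PLAIN encoding (8-byte little-endian integers, and strings stored with a 4-byte length prefix) and uses zlib as a stand-in for Snappy, which is not in the standard library. The helper names `plain_int64` and `plain_strings` are hypothetical, and real Parquet writers apply dictionary and run-length encodings that this sketch ignores, so the numbers are indicative only.

```python
import struct
import zlib


def plain_int64(values):
    """Pack integers as 8-byte little-endian values (simplified INT64 column)."""
    return struct.pack(f"<{len(values)}q", *values)


def plain_strings(values):
    """Store each value as text with a 4-byte length prefix (simplified string column)."""
    parts = []
    for v in values:
        encoded = str(v).encode("ascii")
        parts.append(struct.pack("<I", len(encoded)) + encoded)
    return b"".join(parts)


if __name__ == "__main__":
    # 10,000 ten-digit integers as sample data (an assumption for illustration).
    values = [1_000_000_000 + i * 7919 for i in range(10_000)]

    typed = plain_int64(values)
    as_text = plain_strings(values)

    print("typed column bytes:     ", len(typed))
    print("string column bytes:    ", len(as_text))
    # zlib stands in for Snappy here; the actual ratio with Snappy will differ.
    print("typed column compressed:", len(zlib.compress(typed)))
```

In this model the string column is larger than the typed column before any compression is applied, and compressing the typed column reduces it further, which matches the reasoning in the explanation.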