Data Engineering on Microsoft Azure — Question 139
You are designing a folder structure for the files in an Azure Data Lake Storage Gen2 account. The account has one container that contains three years of data.
You need to recommend a folder structure that meets the following requirements:
- Supports partition elimination for queries by Azure Synapse Analytics serverless SQL pools
- Supports fast data retrieval for data from the current month
- Simplifies data security management by department
Which folder structure should you recommend?
Answer options
- A. \Department\DataSource\YYYY\MM\DataFile_YYYYMMDD.parquet
- B. \DataSource\Department\YYYYMM\DataFile_YYYYMMDD.parquet
- C. \DD\MM\YYYY\Department\DataSource\DataFile_DDMMYY.parquet
- D. \YYYY\MM\DD\Department\DataSource\DataFile_YYYYMMDD.parquet
Correct answer: A
Explanation
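Option A is correct. Placing Department at the top of the hierarchy puts each department's data under a single root folder, so access can be managed per department at one folder instead of across many scattered folders. Below that, the separate YYYY and MM folder levels give serverless SQL pool queries a folder path to filter on, so only the matching year and month folders need to be read (partition elimination), and the current month's files all sit together in one folder for fast retrieval.
Option B puts DataSource above Department, so each department's data is spread across every data source folder and security has to be managed in several places. Options C and D put the date levels above Department, so department folders are repeated under every date folder, which makes per-department security much harder to manage. Option C also orders the date as day, month, year, which scatters a single month's data across up to 31 top-level day folders.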
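
The layout in option A can be illustrated with a minimal Python sketch. It is not Azure code: the helper names `build_path` and `month_prefix`, the department and data source names, and the dates are all hypothetical, and forward slashes are used as the separator. It only shows that, with this folder order, one department's data for one month is selected by a single path prefix.

```python
from datetime import date


def build_path(department: str, source: str, day: date) -> str:
    """Build a path shaped like Department/DataSource/YYYY/MM/DataFile_YYYYMMDD.parquet."""
    return (
        f"{department}/{source}/{day:%Y}/{day:%m}/"
        f"DataFile_{day:%Y%m%d}.parquet"
    )


def month_prefix(department: str, source: str, year: int, month: int) -> str:
    """Return the folder prefix that holds one month of one department's data."""
    return f"{department}/{source}/{year:04d}/{month:02d}/"


# Hypothetical files; department and source names are made up for illustration.
paths = [
    build_path("Sales", "CRM", date(2024, 5, 30)),
    build_path("Sales", "CRM", date(2024, 6, 1)),
    build_path("Sales", "CRM", date(2024, 6, 2)),
    build_path("HR", "Payroll", date(2024, 6, 1)),
]

# Selecting one month for one department is a single prefix match,
# so every other folder can be skipped without opening any file.
prefix = month_prefix("Sales", "CRM", 2024, 6)
current_month = [p for p in paths if p.startswith(prefix)]

# Everything a department owns sits under one root folder.
sales_root = [p for p in paths if p.startswith("Sales/")]
```

With a date-first layout such as option D, the same department filter would have to look inside every year, month, and day folder, because the department folder is repeated under each of them.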