Data Engineering on Microsoft Azure — Question 38
You have an Azure Synapse Analytics Apache Spark pool named Pool1.
You plan to load JSON files from an Azure Data Lake Storage Gen2 container into the tables in Pool1. The structure and data types vary by file.
You need to load the files into the tables. The solution must maintain the source data types.
What should you do?
Answer options
- A. Use a Conditional Split transformation in an Azure Synapse data flow.
- B. Use a Get Metadata activity in Azure Data Factory.
- C. Load the data by using the OPENROWSET Transact-SQL command in an Azure Synapse Analytics serverless SQL pool.
- D. Load the data by using PySpark.
Correct answer: D
Explanation
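The correct answer is D. The target tables are in the Spark pool, and PySpark running in Pool1 can read each JSON file with spark.read.json, which infers a schema per file, so strings, numbers, Booleans, nested objects, and arrays arrive as typed columns rather than plain text. The resulting DataFrame can then be written to a Spark table, which handles files whose structure and data types differ. Option A is incorrect because a Conditional Split transformation only routes rows to different streams based on conditions; it does not infer or preserve types. Option B is incorrect because a Get Metadata activity returns metadata about a file or folder and does not load any data. Option C is incorrect because OPENROWSET in a serverless SQL pool typically reads JSON documents as text that must be parsed and cast explicitly, and a serverless SQL pool queries the files in place rather than loading them into the tables in Pool1.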
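The PySpark approach in option D might look like the sketch below. It is an illustration, not a definitive implementation: it assumes a `spark` session such as the one a Synapse notebook provides, and the helper names `table_name_for` and `load_json_to_table`, the table naming convention, and the overwrite mode are all hypothetical choices made for the example.

```python
from pathlib import PurePosixPath


def table_name_for(path: str) -> str:
    """Derive a table name from the file name (hypothetical convention)."""
    stem = PurePosixPath(path).stem
    # Replace anything that is not a letter or digit with an underscore.
    return "".join(c if c.isalnum() else "_" for c in stem).lower()


def load_json_to_table(spark, path: str, multi_line: bool = False) -> str:
    """Read one JSON file with an inferred schema and save it as a Spark table.

    `spark` is assumed to be an existing SparkSession (a Synapse notebook
    supplies one). No explicit schema is passed, so Spark infers column
    types from the file contents.
    """
    df = spark.read.option("multiLine", multi_line).json(path)
    table = table_name_for(path)
    # Overwrite is an assumption for this sketch; append may suit other loads.
    df.write.mode("overwrite").saveAsTable(table)
    return table
```

In a notebook attached to Pool1, this could be called once per file, for example `load_json_to_table(spark, "abfss://<container>@<account>.dfs.core.windows.net/<folder>/orders.json")`, where the angle-bracket parts are placeholders. Set `multi_line=True` when a file holds a single JSON document spread over several lines rather than one object per line.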