Question

A company stores datasets in JSON format and .csv format in an Amazon S3 bucket. The company has Amazon RDS for Microsoft SQL Server databases, Amazon DynamoDB tables that are in provisioned capacity mode, and an Amazon Redshift cluster. A data engineering team must develop a solution that will give data scientists the ability to query all data sources by using syntax similar to SQL.
Which solution will meet these requirements with the LEAST operational overhead?

Accepted Answer

Correct answer: A. Use AWS Glue to crawl the data sources. Store metadata in the AWS Glue Data Catalog. Use Amazon Athena to query the data. Use SQL for structured data sources. Use PartiQL for data that is stored in JSON format.

Option A is correct because AWS Glue crawlers catalog the data sources and Amazon Athena is a serverless query service, so data scientists can run SQL-like queries with minimal setup and no infrastructure to maintain. Option B introduces Redshift Spectrum, which adds complexity and operational overhead. Option C involves transforming data into different formats, which adds operational tasks. Option D requires additional setup for Lake Formation, which makes it more work to operate than option A.
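
As a rough illustration of what querying through Athena could look like once a Glue crawler has registered a table, here is a minimal Python sketch. The database name, table name, and results bucket are hypothetical and do not come from the question. The `build_athena_request` helper is likewise an illustrative assumption. Only `boto3.client("athena").start_query_execution` is a real API call, and it is kept in a separate function so the request can be inspected without AWS credentials.

```python
from typing import Any, Dict


def build_athena_request(database: str, table: str, output_s3: str, limit: int = 10) -> Dict[str, Any]:
    """Build keyword arguments for Athena's StartQueryExecution API call."""
    # Athena accepts standard SQL. The table is assumed to have been
    # registered in the Glue Data Catalog by a crawler beforehand.
    query = f'SELECT * FROM "{database}"."{table}" LIMIT {int(limit)}'
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
    }


def run_query(request: Dict[str, Any]) -> str:
    """Submit the query with boto3 and return the query execution ID."""
    import boto3  # imported lazily so the builder above works without AWS access

    athena = boto3.client("athena")
    response = athena.start_query_execution(**request)
    return response["QueryExecutionId"]


# Hypothetical names, for illustration only.
request = build_athena_request("datasets_db", "sales_csv", "s3://example-athena-results/")
print(request["QueryString"])
```

Calling `run_query(request)` would submit the query. It needs valid AWS credentials, an existing catalog table, and a writable results location, none of which are specified in the question.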

AWS Certified Data Engineer – Associate (DEA-C01) — Question 58
