AWS Certified Data Analytics – Specialty — Question 9

A media company wants to perform machine learning and analytics on the data residing in its Amazon S3 data lake. There are two data transformation requirements that will enable the consumers within the company to create reports:
- Daily transformations of 300 GB of data with different file formats landing in Amazon S3 at a scheduled time.
- One-time transformations of terabytes of archived data residing in the S3 data lake.
Which combination of solutions cost-effectively meets the company's requirements for transforming the data? (Choose three.)

Answer options

Correct answer: A, D, E

Explanation

Option A is correct because AWS Glue crawlers scan data in Amazon S3 and infer its schema automatically, which matters for the daily loads because the incoming files arrive in different formats. Option D is correct because AWS Glue workflows and jobs run scheduled ETL on serverless infrastructure, so the company pays only while the daily job runs. Option E is correct because Amazon EMR is well suited to processing terabytes of archived data, and a cluster launched for the one-time transformation can be terminated when it finishes. Options B and C are less cost-effective for the daily transformation requirement, and option F does not fit the stated requirements.
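
The daily part of this answer (options A and D) could be wired up roughly as follows. This is a minimal sketch, assuming the boto3 Glue client and its create_crawler, create_workflow, and create_trigger calls. The crawler, workflow, trigger, and job names, the role ARN, the S3 path, and both cron schedules are hypothetical and do not come from the question.

```python
"""Sketch: request parameters for a scheduled Glue crawler and a workflow trigger."""

# Assumed schedules (UTC): crawl at 02:00, run the ETL job at 03:00.
CRAWL_CRON = "cron(0 2 * * ? *)"
JOB_CRON = "cron(0 3 * * ? *)"


def crawler_params(name, role_arn, database, s3_path, schedule=CRAWL_CRON):
    """Keyword arguments for glue.create_crawler (option A: infer the schema)."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        "Schedule": schedule,
    }


def trigger_params(name, workflow, job_name, schedule=JOB_CRON):
    """Keyword arguments for glue.create_trigger (option D: scheduled ETL job)."""
    return {
        "Name": name,
        "WorkflowName": workflow,
        "Type": "SCHEDULED",
        "Schedule": schedule,
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def create_resources(crawler, workflow_name, trigger):
    """Create the resources. Needs boto3, AWS credentials, and an existing Glue job."""
    import boto3  # imported lazily so the builders above run without the SDK

    glue = boto3.client("glue")
    glue.create_crawler(**crawler)
    glue.create_workflow(Name=workflow_name)
    glue.create_trigger(**trigger)
```

The two builder functions only assemble dictionaries, so they can be inspected without an AWS account; create_resources is the only part that talks to AWS. A fixed time gap between crawler and job is the simplest arrangement. Chaining the job to the crawler's completion with a conditional trigger inside the workflow is another option.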