AWS Certified Data Analytics – Specialty — Question 9
A media company wants to perform machine learning and analytics on the data residing in its Amazon S3 data lake. There are two data transformation requirements that will enable the consumers within the company to create reports:
✑ Daily transformations of 300 GB of data with different file formats landing in Amazon S3 at a scheduled time.
✑ One-time transformations of terabytes of archived data residing in the S3 data lake.
Which combination of solutions cost-effectively meets the company's requirements for transforming the data? (Choose three.)
Answer options
- A. For daily incoming data, use AWS Glue crawlers to scan and identify the schema.
- B. For daily incoming data, use Amazon Athena to scan and identify the schema.
- C. For daily incoming data, use Amazon Redshift to perform transformations.
- D. For daily incoming data, use AWS Glue workflows with AWS Glue jobs to perform transformations.
- E. For archived data, use Amazon EMR to perform data transformations.
- F. For archived data, use Amazon SageMaker to perform data transformations.
Correct answer: A, D, E
Explanation
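Option A is correct because AWS Glue crawlers scan the files landing in S3, infer their schema with built-in classifiers for common formats (such as CSV, JSON, Parquet, and Avro), and record it as tables in the AWS Glue Data Catalog. The daily transformations therefore see an up-to-date schema even though the files arrive in different formats. Option D is correct because an AWS Glue workflow can be started on a schedule and can chain the crawler and the AWS Glue ETL jobs. Because AWS Glue is serverless, the company pays only while the daily 300 GB job runs. Option E is correct because a transient Amazon EMR cluster can process terabytes of archived data in parallel for a one-time job and then be terminated, so there is no ongoing cost. Option B is wrong because Athena queries data through a table definition that must already exist (created with DDL or by a crawler); it does not discover a schema on its own. Option C is less cost-effective because Amazon Redshift is a data warehouse, and using it only to run a daily batch transformation on data that already lives in S3 means paying for warehouse capacity the workload does not otherwise need. Option F is not appropriate because Amazon SageMaker is built for training and deploying machine learning models, not for cost-effective bulk ETL over terabytes of archived data.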
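To make the A + D + E design concrete, the following is a minimal Python sketch that builds the boto3 request parameters for each piece: a Glue crawler (A), a Glue job plus a workflow with a scheduled trigger and a conditional trigger (D), and a transient EMR cluster for the one-time backfill (E). It assumes boto3 and an AWS account. Every bucket, role ARN, script path, schedule, instance size, release label, and resource name is a hypothetical placeholder, not something given in the question. The dictionary keys follow the request shapes of `glue.create_crawler`, `glue.create_job`, `glue.create_workflow`, `glue.create_trigger`, and `emr.run_job_flow`.

```python
# Hypothetical A + D + E pipeline as boto3 request parameters (placeholders).
DATA_LAKE = "s3://example-data-lake"  # placeholder bucket
GLUE_ROLE = "arn:aws:iam::123456789012:role/ExampleGlueRole"  # placeholder
DAILY_CRON = "cron(0 2 * * ? *)"  # assumed landing time: 02:00 UTC daily
CRAWLER, JOB, WORKFLOW = "daily-crawler", "daily-transform", "daily-workflow"


def crawler_params() -> dict:
    # Option A: infer the schema of the daily files into the Data Catalog.
    # No Schedule here: the workflow's scheduled trigger starts the crawl.
    return {
        "Name": CRAWLER,
        "Role": GLUE_ROLE,
        "DatabaseName": "daily_landing",
        "Targets": {"S3Targets": [{"Path": f"{DATA_LAKE}/landing/"}]},
    }


def job_params() -> dict:
    # Option D: serverless Spark ETL job, billed only while it runs.
    return {
        "Name": JOB,
        "Role": GLUE_ROLE,
        "Command": {"Name": "glueetl", "PythonVersion": "3",
                    "ScriptLocation": f"{DATA_LAKE}/scripts/daily_transform.py"},
        "GlueVersion": "4.0",
        "WorkerType": "G.1X",
        "NumberOfWorkers": 10,
    }


def trigger_params() -> list:
    # Option D: crawl on a schedule, then transform only after the crawl
    # succeeds, so the job reads the refreshed table definitions.
    scheduled = {
        "Name": "start-daily-crawl", "WorkflowName": WORKFLOW,
        "Type": "SCHEDULED", "Schedule": DAILY_CRON,
        "Actions": [{"CrawlerName": CRAWLER}], "StartOnCreation": True,
    }
    after_crawl = {
        "Name": "transform-after-crawl", "WorkflowName": WORKFLOW,
        "Type": "CONDITIONAL",
        "Predicate": {"Logical": "AND", "Conditions": [{
            "LogicalOperator": "EQUALS",
            "CrawlerName": CRAWLER,
            "CrawlState": "SUCCEEDED",
        }]},
        "Actions": [{"JobName": JOB}], "StartOnCreation": True,
    }
    return [scheduled, after_crawl]


def emr_backfill_params() -> dict:
    # Option E: transient cluster; it terminates once its single step ends.
    def group(name, role, market, count):
        return {"Name": name, "InstanceRole": role, "Market": market,
                "InstanceType": "m5.2xlarge", "InstanceCount": count}

    return {
        "Name": "archive-backfill",
        "ReleaseLabel": "emr-6.15.0",  # example label; use a current release
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                group("primary", "MASTER", "ON_DEMAND", 1),
                group("core", "CORE", "ON_DEMAND", 4),
                group("task", "TASK", "SPOT", 8),  # cheaper, interruptible
            ],
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "transform-archive",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", f"{DATA_LAKE}/scripts/archive_transform.py"],
            },
        }],
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


def deploy() -> None:
    # Needs boto3 and AWS credentials; imported lazily so the builders
    # above can be inspected without either.
    import boto3

    glue = boto3.client("glue")
    glue.create_workflow(Name=WORKFLOW)
    glue.create_crawler(**crawler_params())
    glue.create_job(**job_params())
    for trigger in trigger_params():
        glue.create_trigger(**trigger)
    boto3.client("emr").run_job_flow(**emr_backfill_params())
```

Calling `deploy()` with valid credentials would create the resources, while the parameter builders can be inspected without boto3. The conditional trigger is what ties A to D: the job runs only after the crawler has refreshed the catalog. For E, `KeepJobFlowAliveWhenNoSteps=False` lets the cluster shut itself down after the one-time step, so compute is billed only while the backfill runs.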