AWS Certified Data Analytics – Specialty — Question 31

A company is planning a proof of concept for a machine learning (ML) project using Amazon SageMaker with a subset of existing on-premises data hosted in the company's 3 TB data warehouse. As part of the project, AWS Direct Connect is established and tested. To prepare the data for ML, data analysts are performing data curation. The data analysts want to perform multiple steps, including mapping, dropping null fields, resolving choice, and splitting fields. The company needs the fastest solution to curate the data for this project.
Which solution meets these requirements?

Answer options

Correct answer: C

Explanation

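The correct answer is C. AWS DMS can ingest the data into Amazon S3 over the AWS Direct Connect link that is already established and tested, and AWS Glue is a fully managed ETL service whose built-in transforms (ApplyMapping, DropNullFields, ResolveChoice, and SplitFields) correspond directly to the curation steps the analysts need, so there is no cluster to manage and little custom code to write. Option A adds the overhead of writing Apache Spark scripts and running an Amazon EMR cluster, which is likely to slow down a proof of concept. Option B requires building custom ETL jobs on premises, which takes longer to develop and scales less well. Option D is the slowest, because taking a backup and shipping it adds delay that is unnecessary when Direct Connect is already available.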
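
The four curation steps named in the question (mapping, dropping null fields, resolving choice, and splitting fields) can be illustrated with a minimal plain-Python sketch. This is not the AWS Glue API and does not run on Glue; it only mimics, on a list of dictionaries, roughly what each step does. All function names, field names, and sample values below are hypothetical and chosen for illustration.

```python
# Plain-Python illustration of the four curation steps.
# NOT the AWS Glue API: every name and value here is a hypothetical example.

def identity(value):
    """Leave a value unchanged (used when a field keeps its type for now)."""
    return value

def apply_mapping(rows, mappings):
    """Mapping: rename and cast fields. Unmapped fields are dropped.
    mappings is a list of (source_name, target_name, cast_function)."""
    return [
        {tgt: (cast(r[src]) if r.get(src) is not None else None)
         for src, tgt, cast in mappings}
        for r in rows
    ]

def drop_null_fields(rows):
    """Dropping null fields: remove fields that are null in every row."""
    keep = {k for r in rows for k, v in r.items() if v is not None}
    return [{k: v for k, v in r.items() if k in keep} for r in rows]

def resolve_choice(rows, field, cast):
    """Resolving choice: a field holding mixed types is cast to one type."""
    return [
        {**r, field: (cast(r[field]) if r.get(field) is not None else None)}
        for r in rows
    ]

def split_fields(rows, paths):
    """Splitting fields: return two collections, the selected fields and the rest."""
    first = [{k: v for k, v in r.items() if k in paths} for r in rows]
    rest = [{k: v for k, v in r.items() if k not in paths} for r in rows]
    return first, rest

# Hypothetical sample: "amt" has mixed types and "note" is always null.
rows = [
    {"cust_id": "1", "amt": 10, "note": None, "region": "eu"},
    {"cust_id": "2", "amt": "12.5", "note": None, "region": "us"},
]

mapped = apply_mapping(rows, [
    ("cust_id", "customer_id", int),
    ("amt", "amount", identity),
    ("note", "note", str),
    ("region", "region", str),
])
cleaned = drop_null_fields(mapped)                   # removes "note"
resolved = resolve_choice(cleaned, "amount", float)  # int/str -> float
ids, features = split_fields(resolved, {"customer_id"})
```

In a managed ETL service these steps are provided as ready-made transforms, which is the reason the explanation treats that option as faster than writing and operating custom jobs.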