AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 144
An ML engineer wants to use, prepare, and load data from Amazon S3 for analytics. The ML engineer must run an extract, transform, and load (ETL) job to discover the schema of the data and to store the metadata.
Which solution will meet these requirements with the LEAST manual effort?
Answer options
- A. Use AWS Glue to run the ETL job. Use the job to discover the schema and to store the associated metadata in the AWS Glue Data Catalog.
- B. Create an Amazon SageMaker Data Wrangler flow to run the ETL job. Use the job to discover the schema and to store the associated metadata in an S3 bucket.
- C. Create an ETL pipeline by using Amazon Athena integrated with AWS Step Functions. Use the pipeline to run the ETL job to discover the schema and to store the associated metadata in an S3 bucket.
- D. Launch an Amazon EC2 instance that includes the scikit-learn library to run the ETL job. Use the job to discover the schema and to store the associated metadata in Amazon Redshift.
Correct answer: A
Explanation
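The correct answer is A. AWS Glue is a serverless, managed ETL service: it can crawl data in Amazon S3, infer the schema, and store the resulting table metadata in the AWS Glue Data Catalog, with no servers to manage and little custom code. The other options all require more manual work. SageMaker Data Wrangler (B) is aimed at preparing data for ML, and writing metadata to an S3 bucket does not give you a managed, queryable catalog. Athena with Step Functions (C) means building and orchestrating the pipeline yourself, and Athena typically queries tables that are already defined in a catalog rather than discovering schemas. An EC2 instance with scikit-learn (D) means provisioning and maintaining infrastructure and writing the schema discovery and metadata storage logic by hand.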
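
To make option A concrete, here is a minimal sketch in Python with boto3 of how schema discovery might be set up using a Glue crawler. The crawler name, IAM role ARN, database name, and S3 path are hypothetical placeholders, and the helper function names are illustrative only. It assumes the role already has permission to read the bucket and write to the Data Catalog.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for the Glue CreateCrawler API call.

    All values passed in are placeholders chosen by the caller;
    nothing here contacts AWS.
    """
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_start_crawler(request):
    """Create the crawler and start it (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    # The crawler scans the S3 path, infers the schema, and writes
    # table definitions to the AWS Glue Data Catalog.
    glue.start_crawler(Name=request["Name"])


# Hypothetical usage (not executed here):
# request = build_crawler_request(
#     "example-crawler",
#     "arn:aws:iam::123456789012:role/ExampleGlueRole",
#     "example_db",
#     "s3://example-bucket/raw/",
# )
# create_and_start_crawler(request)
```

Once the crawler has run, a Glue ETL job can read the cataloged tables by name, so the schema does not have to be described by hand.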