AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 157
A company needs to analyze a large dataset that is stored in Amazon S3 in Apache Parquet format. The company wants to use one-hot encoding for some of the columns.
The company needs a no-code solution to transform the data. The solution must store the transformed data back to the same S3 bucket for model training.
Which solution will meet these requirements?
Answer options
- A. Configure an AWS Glue DataBrew project that connects to the data. Use the DataBrew interactive interface to create a recipe that performs the one-hot encoding transformation. Create a job to apply the transformation and to write the output back to an S3 bucket.
- B. Configure an AWS Glue Data Catalog table that points to the data. Use Amazon Athena to write SQL commands to perform the one-hot encoding transformation. Configure Athena to write the query results back to an S3 bucket.
- C. Configure an AWS Glue Data Catalog table that points to the data. Create an AWS Glue ETL interactive notebook. Use the notebook to perform the one-hot encoding transformation. Run the configured cells and write the results back to an S3 bucket.
- D. Configure an Amazon Redshift cluster to access the data by using Redshift Spectrum. Use SQL commands to perform the one-hot encoding transformation within Amazon Redshift. Configure Amazon Redshift to write the results back to an S3 bucket.
Correct answer: A
Explanation
The correct answer is A. AWS Glue DataBrew is a visual data preparation tool that meets every requirement without code: it reads Parquet data from Amazon S3, offers one-hot encoding as a built-in recipe transformation in its interactive interface, and runs the recipe as a job that writes the output back to S3. Option B requires writing SQL in Athena. Option C requires writing code in an AWS Glue notebook. Option D requires writing SQL and also adds the overhead of provisioning an Amazon Redshift cluster.
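
For readers unfamiliar with the transformation itself, the following is a minimal Python sketch of what one-hot encoding does to a categorical column. It is purely illustrative and is not how DataBrew implements the step. The function name `one_hot_encode`, the sample rows, and the `column_value` naming of the output columns are all assumptions made for this example.

```python
def one_hot_encode(rows, column):
    """Replace one categorical column with a 0/1 indicator column per category.

    Hypothetical helper for illustration only; output column names use an
    assumed "<column>_<value>" pattern.
    """
    # Collect the distinct category values, sorted for a stable column order.
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every field except the column being encoded.
        new_row = {key: value for key, value in row.items() if key != column}
        # Add one indicator per category: 1 if the row has that value, else 0.
        for category in categories:
            new_row[f"{column}_{category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Made-up sample data standing in for rows of the Parquet dataset.
rows = [
    {"id": 1, "color": "red"},
    {"id": 2, "color": "green"},
    {"id": 3, "color": "red"},
]

encoded = one_hot_encode(rows, "color")
for row in encoded:
    print(row)
```

Each distinct value of `color` becomes its own indicator column, and exactly one indicator is set to 1 in every row. In DataBrew the same result comes from selecting the column and applying the one-hot encoding step in the recipe, with no code involved.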