AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 50
A company uses Amazon SageMaker for its ML workloads. The company's ML engineer receives a 50 MB Apache Parquet data file to build a fraud detection model. The file includes several correlated columns that are not required.
What should the ML engineer do to drop the unnecessary columns in the file with the LEAST effort?
Answer options
- A. Download the file to a local workstation. Perform one-hot encoding by using a custom Python script.
- B. Create an Apache Spark job that uses a custom processing script on Amazon EMR.
- C. Create a SageMaker processing job by calling the SageMaker Python SDK.
- D. Create a data flow in SageMaker Data Wrangler. Configure a transform step.
Correct answer: D
Explanation
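The correct answer is D. SageMaker Data Wrangler provides a visual interface with built-in transforms. Its Manage columns transform can drop columns without any custom code, which makes it the lowest-effort choice for a single 50 MB Parquet file. The other options take more work. Option A requires downloading the file and writing a custom script, and one-hot encoding adds columns instead of removing them. Option B requires provisioning an Amazon EMR cluster and writing a Spark job, which is unnecessary overhead for 50 MB of data. Option C works but requires writing a processing script and configuring the job through the SageMaker Python SDK.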
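For contrast with option C, the sketch below shows roughly what the custom script for a SageMaker processing job might contain. It is a minimal illustration, not a reference implementation. The column names are hypothetical, and the `/opt/ml/processing/input` and `/opt/ml/processing/output` paths are assumed to follow the common processing-job convention. The job would also need its container, instance type, and input/output locations configured through the SageMaker Python SDK, none of which is shown here.

```python
"""Hypothetical preprocessing script for a SageMaker processing job (option C).

Assumptions: the input/output directories follow the common
/opt/ml/processing/... convention, and the column names are placeholders.
"""
from __future__ import annotations

from pathlib import Path

import pandas as pd

# Hypothetical names of the correlated columns that the model does not need.
COLUMNS_TO_DROP = ["merchant_zip", "merchant_city"]


def drop_unneeded_columns(df: pd.DataFrame, columns: list[str]) -> pd.DataFrame:
    """Return a new DataFrame without the listed columns; unknown names are ignored."""
    return df.drop(columns=columns, errors="ignore")


def main(input_dir: str = "/opt/ml/processing/input",
         output_dir: str = "/opt/ml/processing/output") -> None:
    out = Path(output_dir)
    # Handle every Parquet file the job copied into the input directory.
    for path in sorted(Path(input_dir).glob("*.parquet")):
        out.mkdir(parents=True, exist_ok=True)
        df = pd.read_parquet(path)  # needs a Parquet engine such as pyarrow
        cleaned = drop_unneeded_columns(df, COLUMNS_TO_DROP)
        cleaned.to_parquet(out / path.name, index=False)


if __name__ == "__main__":
    main()
```

Even this short script has to be written, tested, packaged, and wired into a job definition. In Data Wrangler, the equivalent step is choosing the columns in a transform form.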