AWS Certified Machine Learning – Specialty — Question 328
A bank has collected customer data for 10 years in CSV format. The bank stores the data on an on-premises server. A data science team wants to use Amazon SageMaker to build and train a machine learning (ML) model to predict churn probability. The team will use the historical data. The data scientists want to perform data transformations quickly and to generate data insights before the team builds a model for production.
Which solution will meet these requirements with the LEAST development effort?
Answer options
- A. Upload the data into the SageMaker Data Wrangler console directly. Perform data transformations and generate insights within Data Wrangler.
- B. Upload the data into an Amazon S3 bucket. Allow SageMaker to access the data that is in the bucket. Import the data from the S3 bucket into SageMaker Data Wrangler. Perform data transformations and generate insights within Data Wrangler.
- C. Upload the data into the SageMaker Data Wrangler console directly. Allow SageMaker and Amazon QuickSight to access the data that is in an Amazon S3 bucket. Perform data transformations in Data Wrangler and save the transformed data into a second S3 bucket. Use QuickSight to generate data insights.
- D. Upload the data into an Amazon S3 bucket. Allow SageMaker to access the data that is in the bucket. Import the data from the bucket into SageMaker Data Wrangler. Perform data transformations in Data Wrangler. Save the data into a second S3 bucket. Use a SageMaker Studio notebook to generate data insights.
Correct answer: B
Explanation
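Option B is correct. SageMaker Data Wrangler does not accept a direct console upload of a dataset this large. Instead, it imports data from supported data sources such as Amazon S3, Amazon Athena, and Amazon Redshift. The 10 years of CSV files must therefore first be moved from the on-premises server into an S3 bucket that SageMaker is allowed to read, which rules out options A and C. After the import, Data Wrangler covers both requirements in a single tool: its built-in transforms handle the data preparation, and its visualizations and Data Quality and Insights report generate the insights. Options C and D reach a similar result but add development effort. C requires building Amazon QuickSight dashboards. D saves the transformed data to a second bucket and then requires custom visualization code in a SageMaker Studio notebook.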
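The first step of option B, staging the on-premises CSV files in Amazon S3, can be sketched with boto3 as follows. This is a minimal illustration, not part of the exam answer. The folder layout, bucket name, key prefix, and the helper names `plan_uploads` and `upload_to_s3` are all hypothetical. The sketch also assumes that boto3 is installed and that the AWS credentials in use allow `s3:PutObject` on the bucket. It covers only the upload. Allowing SageMaker to read the bucket is handled separately through the IAM permissions of the SageMaker execution role.

```python
"""Hypothetical sketch: stage on-premises CSV history in Amazon S3 so that
SageMaker Data Wrangler can import it (the upload step of option B)."""
from pathlib import Path


def plan_uploads(local_dir: str, prefix: str) -> list[tuple[str, str]]:
    """Map every CSV file under local_dir to an S3 object key under prefix.

    Keys keep the relative folder layout, so a hypothetical file at
    2014/customers.csv becomes <prefix>/2014/customers.csv.
    """
    root = Path(local_dir)
    plan = []
    for path in sorted(root.rglob("*.csv")):  # non-CSV files are skipped
        relative = path.relative_to(root).as_posix()
        key = f"{prefix.rstrip('/')}/{relative}"
        plan.append((str(path), key))
    return plan


def upload_to_s3(local_dir: str, bucket: str, prefix: str) -> int:
    """Upload the planned files with boto3 and return how many were sent.

    Assumes boto3 is installed and credentials permit s3:PutObject.
    """
    import boto3  # imported lazily so plan_uploads works without boto3

    s3 = boto3.client("s3")
    plan = plan_uploads(local_dir, prefix)
    for local_path, key in plan:
        # upload_file switches to multipart uploads for large files
        s3.upload_file(local_path, bucket, key)
    return len(plan)
```

For example, `upload_to_s3("/data/churn", "example-churn-bucket", "raw/churn")` would upload every CSV file under the hypothetical `/data/churn` folder. The resulting `s3://example-churn-bucket/raw/churn/` location is where you would browse when you choose Amazon S3 as the import source in Data Wrangler. Separating the key planning from the upload keeps the mapping logic testable without AWS access.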