AWS Certified Machine Learning – Specialty — Question 301
A company wants to forecast the daily price of newly launched products based on 3 years of data for older product prices, sales, and rebates. The time-series data has irregular timestamps and is missing some values.
A data scientist must build a dataset to replace the missing values. The data scientist needs a solution that resamples the data daily and exports the data for further modeling.
Which solution will meet these requirements with the LEAST implementation effort?
Answer options
- A. Use Amazon EMR Serverless with PySpark.
- B. Use AWS Glue DataBrew.
- C. Use Amazon SageMaker Studio Data Wrangler.
- D. Use Amazon SageMaker Studio Notebook with Pandas.
Correct answer: C
Explanation
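Amazon SageMaker Studio Data Wrangler provides built-in, low-code transformations designed for time-series data, including resampling to a fixed frequency and imputing missing values, and it can export the prepared data for further modeling. PySpark on EMR Serverless and Pandas in a SageMaker Studio notebook can accomplish the same tasks, but both require writing and maintaining custom code. AWS Glue DataBrew is also a visual, no-code tool, but it does not provide a dedicated time-series resampling transform comparable to Data Wrangler's, so the daily resampling step would need a workaround. Data Wrangler therefore meets the requirements with the least implementation effort.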
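For comparison, here is a minimal sketch of the custom code that option D (Pandas in a notebook) would roughly involve. The column names (`timestamp`, `price`), the sample values, and the helper `resample_daily` are hypothetical and chosen only for illustration. Averaging within each day and filling gaps by time-based interpolation is one reasonable approach, not a method prescribed by the question.

```python
import pandas as pd


def resample_daily(df: pd.DataFrame, time_col: str = "timestamp") -> pd.DataFrame:
    """Resample irregular observations to one row per day and fill gaps.

    Assumption: averaging within a day and interpolating across missing
    days is acceptable for the data; other imputation strategies exist.
    """
    out = df.copy()
    out[time_col] = pd.to_datetime(out[time_col])
    out = out.set_index(time_col).sort_index()

    # One row per calendar day; days with no observations become NaN.
    daily = out.resample("D").mean()

    # Fill the NaN days by linear interpolation weighted by elapsed time.
    return daily.interpolate(method="time")


# Hypothetical sample with irregular timestamps and a missing value.
raw = pd.DataFrame(
    {
        "timestamp": [
            "2024-01-01 09:15",
            "2024-01-01 17:40",
            "2024-01-03 11:05",
            "2024-01-06 08:30",
        ],
        "price": [10.0, 12.0, None, 20.0],
    }
)

daily = resample_daily(raw)

# Export for further modeling; pass a file path to to_csv to write a file.
csv_text = daily.to_csv()
```

Even this small case requires decisions about aggregation, imputation, and export that Data Wrangler exposes as configurable built-in steps, which is why the question treats the notebook route as higher effort.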