Question

A data scientist needs to create a model for predictive maintenance. The model will be based on historical data to identify rare anomalies in the data. The historical data is stored in an Amazon S3 bucket. The data scientist needs to use Amazon SageMaker Data Wrangler to ingest the data. The data scientist also needs to perform exploratory data analysis (EDA) to understand the statistical properties of the data. Which solution will meet these requirements with the LEAST amount of compute resources?

Accepted Answer

Correct answer: C. Import the data by using the First K option. Infer the value of K from domain knowledge. The First K sampling option in Amazon SageMaker Data Wrangler is the most resource-efficient choice because it loads only the first K rows of the dataset, avoiding the full dataset scan that the Randomized and Stratified options require. The None option imports the entire dataset, so it consumes the most compute resources. Choosing K from domain knowledge lets the data scientist pick a sample that is large enough for exploratory data analysis without consuming more resources than necessary.
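The cost difference described above can be illustrated with a minimal sketch in plain Python. This is not Data Wrangler's implementation; the helper names (`first_k_rows`, `random_k_rows`, `CountingLines`) and the sensor data are hypothetical, and reservoir sampling is used here only as a stand-in for a generic random sample. The point is that taking the first K rows stops reading after K rows, while an unbiased random sample has to read every row once.

```python
import csv
import random
from itertools import islice


class CountingLines:
    """Iterator wrapper that counts how many lines have been consumed."""

    def __init__(self, lines):
        self._it = iter(lines)
        self.consumed = 0

    def __iter__(self):
        return self

    def __next__(self):
        line = next(self._it)
        self.consumed += 1
        return line


def first_k_rows(lines, k):
    """Return the header and the first k data rows; stops reading after k rows."""
    reader = csv.reader(lines)
    header = next(reader)
    return header, list(islice(reader, k))


def random_k_rows(lines, k, seed=0):
    """Reservoir-sample k data rows; every row has to be read once."""
    rng = random.Random(seed)
    reader = csv.reader(lines)
    header = next(reader)
    reservoir = []
    for i, row in enumerate(reader):
        if i < k:
            reservoir.append(row)
        else:
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = row
    return header, reservoir


# Hypothetical sensor log: one header line plus 10,000 readings.
data = ["timestamp,vibration\n"] + [f"{t},{t % 7}\n" for t in range(10_000)]

first_src = CountingLines(data)
_, first_sample = first_k_rows(first_src, k=100)

random_src = CountingLines(data)
_, random_sample = random_k_rows(random_src, k=100)

print(first_src.consumed)   # lines read for First K
print(random_src.consumed)  # lines read for the random sample
```

With K = 100, the first helper reads 101 lines (header plus 100 rows), while the random sampler reads all 10,001. The trade-off is that the first K rows are not a random draw, which is why the answer ties the choice of K to domain knowledge about the data.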

AWS Certified Machine Learning – Specialty — Question 332

Answer options

Correct answer: C

Explanation