An ecommerce company has developed an XGBoost model in Amazon SageMaker to predict whether…

Question

An ecommerce company has developed an XGBoost model in Amazon SageMaker to predict whether a customer will return a purchased item. The dataset is imbalanced. Only 5% of customers return items. A data scientist must find the hyperparameters to capture as many instances of returned items as possible. The company has a small budget for compute. How should the data scientist meet these requirements MOST cost-effectively?

Accepted Answer

Correct answer: B. Tune the csv_weight hyperparameter and the scale_pos_weight hyperparameter by using automatic model tuning (AMT). Optimize on {"HyperParameterTuningJobObjective": {"MetricName": "validation:f1", "Type": "Maximize"}}.

Tuning only csv_weight and scale_pos_weight is more cost-effective than tuning all possible hyperparameters, because the smaller search space needs fewer training jobs. Both hyperparameters also address the imbalance directly by controlling how heavily the rare positive class (returned items) counts during training. Maximizing the validation F1 score suits an imbalanced dataset because F1 combines precision and recall on the positive class. Accuracy is a poor objective here: a model that never predicts a return would still be 95% accurate while capturing no returned items. Minimizing F1 would steer the tuner toward the worst-performing models.
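The reasoning above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive setup: the dataset size, the parameter ranges, and the job limits are hypothetical, and the objective is taken to be `validation:f1` with `Type` `Maximize`. The dictionary follows the shape of the `HyperParameterTuningJobConfig` request field, but nothing is sent to AWS. The name `csv_weight` is copied from the answer text, so the exact spelling and allowed values should be checked against the SageMaker XGBoost hyperparameter reference.

```python
# Class balance from the question: 5% of customers return items.
n_total = 10_000                  # hypothetical dataset size, illustration only
n_pos = n_total * 5 // 100        # returned items (positive class)
n_neg = n_total - n_pos           # kept items (negative class)


def scores(tp, fp, fn, tn):
    """Return (accuracy, recall, f1) from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return accuracy, recall, f1


# A model that never predicts "return": accuracy looks good,
# yet no returned item is captured.
baseline_accuracy, baseline_recall, baseline_f1 = scores(
    tp=0, fp=0, fn=n_pos, tn=n_neg
)

# Common XGBoost starting point for scale_pos_weight: negatives / positives.
suggested_scale_pos_weight = n_neg / n_pos

# Narrow search space with a small job budget (limits and ranges are assumed).
tuning_job_config = {
    "Strategy": "Bayesian",
    "HyperParameterTuningJobObjective": {
        "MetricName": "validation:f1",
        "Type": "Maximize",
    },
    "ResourceLimits": {
        "MaxNumberOfTrainingJobs": 10,   # assumed budget cap
        "MaxParallelTrainingJobs": 2,    # assumed
    },
    "ParameterRanges": {
        "ContinuousParameterRanges": [
            {
                "Name": "scale_pos_weight",
                "MinValue": "1",
                "MaxValue": "40",        # assumed range around the heuristic
                "ScalingType": "Auto",
            }
        ],
        "CategoricalParameterRanges": [
            # Name as written in the answer; values assumed to be an on/off flag.
            {"Name": "csv_weight", "Values": ["0", "1"]}
        ],
    },
}

print(f"baseline accuracy={baseline_accuracy:.2f}, "
      f"recall={baseline_recall:.2f}, f1={baseline_f1:.2f}")
print(f"suggested scale_pos_weight start: {suggested_scale_pos_weight}")
```

With a 95/5 split the always-negative baseline reaches 0.95 accuracy with zero recall and zero F1, which is why accuracy is rejected as the objective, and the negatives-to-positives ratio of 19 gives a reasonable centre for the `scale_pos_weight` range.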

AWS Certified Machine Learning – Specialty — Question 308
