AWS Certified Machine Learning – Specialty — Question 159

A global financial company is using machine learning to automate its loan approval process. The company has a dataset of customer information. The dataset contains some categorical fields, such as customer location by city and housing status. The dataset also includes financial fields in different units, such as account balances in US dollars and monthly interest in US cents.
The company's data scientists are using a gradient boosting regression model to infer the credit score for each customer. The model has a training accuracy of 99% and a testing accuracy of 75%. The data scientists want to improve the model's testing accuracy.
Which process will improve the testing accuracy the MOST?

Answer options

Correct answer: A

Explanation

Option A is correct because it applies the preprocessing that matches each field type. One-hot encoding represents nominal categorical fields, such as city and housing status, without imposing an artificial order on them. Standardization puts financial fields recorded in different units, such as account balances in US dollars and monthly interest in US cents, on a common scale. The other options either apply methods that do not suit these data types or omit the transformations that can improve testing accuracy.
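
The two transformations named in the explanation can be sketched in plain Python. This is a minimal illustration, not the exam's reference solution: the records, the field names (city, housing_status, balance_usd, monthly_interest_cents), and the helper functions fit and transform are all hypothetical. In practice a library such as scikit-learn (OneHotEncoder, StandardScaler) would normally handle both steps.

```python
from statistics import mean, pstdev

# Hypothetical toy records; field names and values are illustrative only.
customers = [
    {"city": "Austin", "housing_status": "rent",
     "balance_usd": 1200.0, "monthly_interest_cents": 350.0},
    {"city": "Boston", "housing_status": "own",
     "balance_usd": 5400.0, "monthly_interest_cents": 1250.0},
    {"city": "Austin", "housing_status": "own",
     "balance_usd": 300.0, "monthly_interest_cents": 90.0},
]

CATEGORICAL = ["city", "housing_status"]
NUMERIC = ["balance_usd", "monthly_interest_cents"]


def fit(rows):
    """Learn category vocabularies and per-column mean/std from training rows."""
    categories = {c: sorted({r[c] for r in rows}) for c in CATEGORICAL}
    stats = {}
    for n in NUMERIC:
        values = [r[n] for r in rows]
        std = pstdev(values)
        # Guard against a constant column, which has zero standard deviation.
        stats[n] = (mean(values), std if std > 0 else 1.0)
    return categories, stats


def transform(rows, categories, stats):
    """One-hot encode categorical fields and standardize numeric fields."""
    out = []
    for r in rows:
        vec = []
        for c in CATEGORICAL:
            # One indicator per known category; unseen values become all zeros.
            vec.extend(1.0 if r[c] == v else 0.0 for v in categories[c])
        for n in NUMERIC:
            mu, sigma = stats[n]
            vec.append((r[n] - mu) / sigma)
        out.append(vec)
    return out


categories, stats = fit(customers)
features = transform(customers, categories, stats)
```

Each row of features holds the city indicators, then the housing status indicators, then the two standardized financial values. The vocabularies and statistics are learned in fit so that they can be computed on the training split alone and then reused unchanged on the test split.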