AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 178

An ML engineer is building a logistic regression model to predict customer churn for subscription services. The ML engineer is using a dataset that contains two string variables: location and job_seniority_level. The location variable has 3 distinct values, and the job_seniority_level variable has over 10 distinct values.

The ML engineer must perform preprocessing on the variables.

Which solution will meet this requirement?

Answer options

Correct answer: B

Explanation

Option B is correct because one-hot encoding is suitable for the location variable with a small number of distinct values, allowing it to be represented as binary columns. Ordinal encoding is appropriate for job_seniority_level since it has more than 10 distinct values and can maintain a meaningful order. The other options either use inappropriate encoding methods or scaling methods that do not fit the nature of the categorical data.