An ML engineer is building a logistic regression model to predict customer churn for subs…

Question

An ML engineer is building a logistic regression model to predict customer churn for subscription services. The ML engineer is using a dataset that contains two string variables: location and job_seniority_level. The location variable has 3 distinct values, and the job_seniority_level variable has over 10 distinct values. The ML engineer must perform preprocessing on the variables. Which solution will meet this requirement?

Accepted Answer

Correct answer: B. B. Apply one-hot encoding to location. Apply ordinal encoding to job_seniority_level — Option B is correct because one-hot encoding is suitable for the location variable with a small number of distinct values, allowing it to be represented as binary columns. Ordinal encoding is appropriate for job_seniority_level since it has more than 10 distinct values and can maintain a meaningful order. The other options either use inappropriate encoding methods or scaling methods that do not fit the nature of the categorical data.

AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 178

Answer options

Correct answer: B

Explanation