AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 135
An ML engineer wants to use a set of survey responses as training data for an ML classifier. All the survey responses are either “yes” or “no.”
The ML engineer needs to convert the responses into a feature that will produce better model training results. The ML engineer must not increase the dimensionality of the dataset.
Which methods will meet these requirements? (Choose two.)
Answer options
- A. Binary encoding
- B. Label encoding
- C. One-hot encoding
- D. Statistical imputation
- E. Tokenization
Correct answer: A, B
Explanation
Binary encoding and label encoding are suitable methods for converting categorical data into numeric format without increasing dimensionality, which is essential for this scenario. One-hot encoding, on the other hand, would increase dimensionality as it creates additional binary columns for each category, while statistical imputation and tokenization do not specifically address the need to encode the binary responses.