AWS Certified Machine Learning – Specialty — Question 366
A company distributes an online multiple-choice survey to several thousand people. Respondents to the survey can select multiple options for each question.
A machine learning (ML) engineer needs to comprehensively represent every response from all respondents in a dataset. The ML engineer will use the dataset to train a logistic regression model.
Which solution will meet these requirements?
Answer options
- A. Perform one-hot encoding on every possible option for each question of the survey.
- B. Perform binning on all the answers each respondent selected for each question.
- C. Use Amazon Mechanical Turk to create categorical labels for each set of possible responses.
- D. Use Amazon Textract to create numeric features for each set of possible responses.
Correct answer: A
Explanation
One-hot encoding is the correct technique here because it converts categorical multi-select survey responses into binary features (0 or 1) for each possible option, which is ideal for logistic regression. Binning is used to group continuous numerical values into intervals, making it unsuitable for this categorical data. Amazon Mechanical Turk and Amazon Textract are incorrect because they are used for human-in-the-loop labeling and OCR document processing, respectively, rather than standard tabular feature engineering.