AWS Certified Machine Learning – Specialty — Question 366

A company distributes an online multiple-choice survey to several thousand people. Respondents to the survey can select multiple options for each question.

A machine learning (ML) engineer needs to comprehensively represent every response from all respondents in a dataset. The ML engineer will use the dataset to train a logistic regression model.

Which solution will meet these requirements?

Answer options

Correct answer: A

Explanation

One-hot encoding is the correct technique because it turns each possible answer option into its own binary feature (1 if the respondent selected it, 0 otherwise). Because respondents can select several options per question, one row may contain more than one 1 for the same question, so every selection is preserved. Logistic regression can then consume the resulting numeric columns directly. Binning groups continuous numerical values into intervals, so it does not apply to categorical responses. Amazon Mechanical Turk is a crowdsourcing marketplace for human tasks such as data labeling, and Amazon Textract extracts text and data from scanned documents; neither performs feature engineering on tabular data.
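As a minimal sketch of the encoding described above, the following Python snippet builds one 0/1 indicator column per answer option for a single multi-select question. The sample responses, the option names, and the helper encode_multi_select are hypothetical and are not part of the original question; in practice a library routine such as scikit-learn's MultiLabelBinarizer would likely be used instead.

```python
# Hypothetical survey data: each inner list holds the options one respondent
# selected for a single multi-select question (an empty list means none).
responses = [
    ["email", "phone"],
    ["chat"],
    [],
    ["email", "chat", "phone"],
]


def encode_multi_select(rows):
    """Return (columns, matrix) with one 0/1 indicator column per distinct option."""
    # Collect every option seen in the data; sort for a stable column order.
    columns = sorted({option for row in rows for option in row})
    # A cell is 1 when the respondent selected that option, otherwise 0.
    matrix = [[1 if col in row else 0 for col in columns] for row in rows]
    return columns, matrix


columns, matrix = encode_multi_select(responses)
print(columns)
for row in matrix:
    print(row)
```

Each row of matrix can contain several 1s, which is how multiple selections by the same respondent are kept. Repeating the encoding for every question and concatenating the columns gives a numeric feature table suitable for logistic regression.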