An organization is developing a feature repository and is electing to one-hot encode all…

Question

An organization is developing a feature repository and is electing to one-hot encode all categorical feature variables. A data scientist suggests that the categorical feature variables should not be one-hot encoded within the feature repository.
Which of the following explanations justifies this suggestion?

Accepted Answer

Correct answer: E. One-hot encoding is a potentially problematic categorical variable strategy for some machine learning algorithms.

One-hot encoding replaces each categorical column with one indicator column per distinct category, so it can create a high-dimensional, sparse feature space that some algorithms handle poorly, potentially leading to overfitting or poor performance. A feature repository serves many downstream models, so encoding there would impose that representation on every consumer, including algorithms for which it is a poor fit. Storing the raw categorical values leaves the encoding choice to each model. Options A, B, C, and D misstate how one-hot encoding works or how it is used. For instance, one-hot encoding is widely supported by machine learning libraries (A), does not depend on the target variable's values (B), and is a common way to represent categorical variables numerically (D).
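To make the dimensionality point concrete, here is a minimal sketch in plain Python. The `one_hot_encode` helper, the `customer_id` and `zip_code` column names, and the synthetic rows are all hypothetical and invented for illustration. They are not part of the exam question or of any Databricks API.

```python
def one_hot_encode(rows, column):
    """Replace `column` with one 0/1 indicator column per distinct category."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Keep every other column unchanged.
        new_row = {key: value for key, value in row.items() if key != column}
        # Add one indicator per category; exactly one of them is 1 for each row.
        for category in categories:
            new_row[f"{column}={category}"] = 1 if row[column] == category else 0
        encoded.append(new_row)
    return encoded


# Hypothetical feature table: 1,000 rows, 500 distinct zip codes.
raw_rows = [
    {"customer_id": i, "zip_code": str(10000 + i % 500)}
    for i in range(1000)
]
encoded_rows = one_hot_encode(raw_rows, "zip_code")

raw_width = len(raw_rows[0])          # customer_id + zip_code
encoded_width = len(encoded_rows[0])  # customer_id + one indicator per zip code
print(f"columns before: {raw_width}, after: {encoded_width}")
```

In this sketch a single categorical column becomes 500 mostly-zero columns. If the repository keeps the raw `zip_code` values instead, each downstream model can apply whichever encoding suits it, or none at all.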

Databricks Certified Machine Learning Associate — Question 8
