Databricks Certified Machine Learning Associate — Question 8

An organization is developing a feature repository and is electing to one-hot encode all categorical feature variables. A data scientist suggests that the categorical feature variables should not be one-hot encoded within the feature repository.
Which of the following explanations justifies this suggestion?

Answer options

Correct answer: E

Explanation

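The correct answer is E. One-hot encoding creates one column per category, so high-cardinality variables produce a wide, sparse feature space that some algorithms handle poorly (tree-based models are a common example), which can lead to overfitting or degraded performance. Because a feature repository serves many downstream models, encoding there imposes that choice on every consumer; storing the raw categorical values lets each model's pipeline encode them as it needs. Options A, B, C, and D misstate how one-hot encoding works or how it is used. For instance, one-hot encoding is widely supported by machine learning libraries (A), does not depend on the target variable's values (B), and is a common way to represent categorical variables numerically (D).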
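
To make the dimensionality point concrete, here is a minimal sketch in plain Python. The `one_hot_encode` helper and the `zip_codes` column are hypothetical and used only for illustration; they are not part of any Databricks or feature store API. The sketch one-hot encodes a single column with 500 distinct values and measures how wide and how sparse the result is.

```python
def one_hot_encode(values):
    """Return (categories, rows): one 0/1 column per distinct category."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1  # exactly one "hot" column per row
        rows.append(row)
    return categories, rows


# Hypothetical feature column: 1,000 rows, 500 distinct category labels.
zip_codes = [f"zip_{i % 500}" for i in range(1000)]

categories, encoded = one_hot_encode(zip_codes)

n_columns = len(categories)                      # width after encoding
n_cells = len(encoded) * n_columns               # total cells in the encoded matrix
n_nonzero = sum(sum(row) for row in encoded)     # one nonzero cell per row
sparsity = 1 - n_nonzero / n_cells               # fraction of cells that are zero

print(f"columns: 1 -> {n_columns}, sparsity: {sparsity:.3f}")
```

One original column becomes 500 columns, almost all of them zero. A model that copes well with sparse, wide input may be fine with this, while another may not, which is why the encoding decision is usually better left to each model's own pipeline than fixed in the shared repository.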