Google Cloud Professional Machine Learning Engineer — Question 331

You are creating a deep neural network classification model using a dataset with categorical input values. Certain columns have a cardinality greater than 10,000 unique values. How should you encode these categorical values as input into the model?

Answer options

Correct answer: B

Explanation

One-hot hash bucketing is effective for handling high cardinality categorical features as it reduces dimensionality while preserving information. Converting to integer values (A) could lead to ordinal relationships that don't exist, and mapping to boolean vectors (C) may not capture the distinct categories effectively. Run-length encoding (D) is not suitable for categorical data as it is primarily used for compressing sequences.