Google Cloud Professional Machine Learning Engineer — Question 75
You need to train a natural language model to perform text classification on product descriptions that contain millions of examples and 100,000 unique words. You want to preprocess the words individually so that they can be fed into a recurrent neural network. What should you do?
Answer options
- A. Create a hot-encoding of words, and feed the encodings into your model.
- B. Identify word embeddings from a pre-trained model, and use the embeddings in your model.
- C. Sort the words by frequency of occurrence, and use the frequencies as the encodings in your model.
- D. Assign a numerical value to each word from 1 to 100,000 and feed the values as inputs in your model.
Correct answer: B
Explanation
The correct answer is B because using pre-trained word embeddings allows the model to leverage semantic relationships between words, improving classification performance. Options A and D create sparse representations that may not capture meaning effectively, while C focuses only on frequency, ignoring context and relationships.