You work on a regression problem in a natural language processing domain, and you have 10…

Question

You work on a regression problem in a natural language processing domain, and you have 100M labeled examples in your dataset. You have randomly shuffled your data and split your dataset into train and test samples (in a 90/10 ratio). After training the neural network and evaluating your model on the test set, you discover that the root-mean-squared error (RMSE) of your model is twice as high on the train set as on the test set. How should you improve the performance of your model?

Accepted Answer

Correct answer: D. Increase the complexity of your model by, e.g., introducing an additional layer or increasing the size of vocabularies or n-grams used.

The correct answer is D because a training RMSE that is higher than the test RMSE points to underfitting rather than overfitting: the model cannot fit even the data it was trained on. Increasing the complexity of the model can help it learn more intricate patterns in the data, which is necessary when the training performance is poor. Option A is not effective as it doesn't address the model's learning capability, B may help but is not an immediate solution, and C is focused on reducing overfitting rather than addressing the underperformance on the training set.
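The reasoning in the explanation can be sketched as a small check that compares train and test RMSE. This is only an illustration: the function names `rmse` and `diagnose`, the returned labels, and the `overfit_ratio` threshold of 1.5 are assumptions made for this example, not part of the exam question or of any library.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def diagnose(train_rmse, test_rmse, overfit_ratio=1.5):
    """Rough rule of thumb (hypothetical threshold, not a standard value).

    - test error much larger than train error -> likely overfitting
    - train error at least as large as test error -> the model is not
      fitting the training data well, so more capacity is worth trying
    - anything in between -> no clear signal from this comparison alone
    """
    if test_rmse > overfit_ratio * train_rmse:
        return "regularize"
    if train_rmse >= test_rmse:
        return "increase_capacity"
    return "inconclusive"


# The scenario from the question: train RMSE is twice the test RMSE.
print(diagnose(train_rmse=2.0, test_rmse=1.0))  # prints "increase_capacity"
```

In the question's scenario the train error exceeds the test error, so the check falls into the "increase_capacity" branch, which matches option D. A real diagnosis would also look at the absolute error level and at learning curves, which this sketch leaves out.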

Google Cloud Professional Data Engineer — Question 113

Answer options

Correct answer: D

Explanation