AWS Certified Machine Learning – Specialty — Question 205
A company is building an application that can predict spam email messages based on email text. The company can generate a few thousand human-labeled datasets that contain a list of email messages and a label of "spam" or "not spam" for each email message. A machine learning (ML) specialist wants to use transfer learning with a Bidirectional Encoder Representations from Transformers (BERT) model that is trained on English Wikipedia text data.
What should the ML specialist do to initialize the model to fine-tune the model with the custom data?
Answer options
- A. Initialize the model with pretrained weights in all layers except the last fully connected layer.
- B. Initialize the model with pretrained weights in all layers. Stack a classifier on top of the first output position. Train the classifier with the labeled data.
- C. Initialize the model with random weights in all layers. Replace the last fully connected layer with a classifier. Train the classifier with the labeled data.
- D. Initialize the model with pretrained weights in all layers. Replace the last fully connected layer with a classifier. Train the classifier with the labeled data.
Correct answer: D
Explanation
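The correct answer is D. Initializing every layer with the pretrained weights lets the model reuse the language representations that BERT learned from English Wikipedia, which matters most when the labeled data is small. The last fully connected layer was trained for the pretraining objective rather than for this task, so it is replaced with a classifier that outputs "spam" or "not spam" and is trained on the labeled data. Option A loads pretrained weights but does not add or train a classifier for the new labels. Option B also starts from pretrained weights, but it stacks a classifier on the first output position rather than replacing the last fully connected layer, so it does not match the approach this answer describes. Option C starts from random weights, which discards the pretrained knowledge and is therefore not transfer learning.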
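The initialization described in option D can be illustrated with a minimal, framework-free sketch. This is not BERT itself: the layer names (`encoder.layer0.weight`, `mlm_head.weight`), the tiny shapes, and the helper `init_for_fine_tuning` are all hypothetical stand-ins chosen for illustration. It only shows the bookkeeping: keep the pretrained weights for the body of the model, drop the old output layer, and attach a freshly initialized two-label classifier.

```python
import random


def init_for_fine_tuning(pretrained, hidden_size, num_labels,
                         head_prefix="mlm_head.", seed=0):
    """Reuse pretrained body weights and attach a new classifier head.

    `pretrained` is a dict of layer name -> weights (hypothetical format).
    Layers whose name starts with `head_prefix` are treated as the old
    output layer and are dropped.
    """
    rng = random.Random(seed)

    # Keep every pretrained layer except the old task-specific head.
    weights = {name: value for name, value in pretrained.items()
               if not name.startswith(head_prefix)}

    # New classifier: small random weights and zero bias, one row per label.
    weights["classifier.weight"] = [
        [rng.gauss(0.0, 0.02) for _ in range(hidden_size)]
        for _ in range(num_labels)
    ]
    weights["classifier.bias"] = [0.0] * num_labels
    return weights


# Toy stand-in for a pretrained checkpoint (made-up names and values).
pretrained = {
    "encoder.layer0.weight": [[0.1, 0.2], [0.3, 0.4]],
    "encoder.layer1.weight": [[0.5, 0.6], [0.7, 0.8]],
    "mlm_head.weight": [[0.9, 1.0]],
}

# Two labels: "spam" and "not spam".
model_weights = init_for_fine_tuning(pretrained, hidden_size=2, num_labels=2)
```

After this step, only `classifier.weight` and `classifier.bias` start from scratch; the labeled emails are then used to train the classifier, as option D states.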