AWS Certified Machine Learning – Specialty — Question 178
A machine learning (ML) specialist needs to extract embedding vectors from a text series. The goal is to provide a ready-to-ingest feature space for a data scientist to develop downstream ML predictive models. The text consists of curated sentences in English. Many sentences use similar words but in different contexts. There are questions and answers among the sentences, and the embedding space must differentiate between them.
Which options can produce the required embedding vectors that capture word context and sequential QA information? (Choose two.)
Answer options
- A. Amazon SageMaker seq2seq algorithm
- B. Amazon SageMaker BlazingText algorithm in Skip-gram mode
- C. Amazon SageMaker Object2Vec algorithm
- D. Amazon SageMaker BlazingText algorithm in continuous bag-of-words (CBOW) mode
- E. Combination of the Amazon SageMaker BlazingText algorithm in Batch Skip-gram mode with a custom recurrent neural network (RNN)
Correct answer: A, C
Explanation
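The Amazon SageMaker seq2seq algorithm is an encoder-decoder model (RNN or CNN based, with attention) that reads a sentence token by token, so its internal representation depends on word order and on the surrounding words. The Amazon SageMaker Object2Vec algorithm learns embeddings for pairs of objects, including sentence-sentence pairs, so it can be trained on question and answer pairs and encode each side of the pair separately. The BlazingText Word2Vec modes in options B and D (Skip-gram and CBOW) learn one fixed vector per word, so a word used in different contexts always gets the same embedding, and the vectors carry no information about whether a sentence is a question or an answer. Option E builds on the same static word vectors and leaves the sequence modeling to a custom RNN.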
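To make the Object2Vec point concrete, here is a minimal sketch of how question and answer pairs might be prepared as training input. It assumes the JSON Lines layout with `in0`, `in1`, and `label` fields that Object2Vec accepts for pairs of token-ID sequences. The whitespace tokenizer, the example sentences, and the helper names (`build_vocab`, `encode`, `to_object2vec_jsonl`) are hypothetical and only for illustration.

```python
import json


def build_vocab(sentences):
    """Assign an integer ID to each distinct lowercase token (naive whitespace split)."""
    vocab = {}
    for sentence in sentences:
        for token in sentence.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab


def encode(sentence, vocab):
    """Turn a sentence into a list of integer token IDs."""
    return [vocab[token] for token in sentence.lower().split()]


def to_object2vec_jsonl(pairs, vocab):
    """Serialize (question, answer, label) triples as JSON Lines.

    in0 holds the question tokens, in1 the answer tokens, and label marks
    whether the answer matches the question (1) or not (0).
    """
    lines = []
    for question, answer, label in pairs:
        record = {
            "label": label,
            "in0": encode(question, vocab),
            "in1": encode(answer, vocab),
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)


# Hypothetical data: the word "bank" appears in two different contexts.
qa_pairs = [
    ("what is a river bank", "the land beside a river", 1),
    ("what is a river bank", "a place that holds money", 0),
]

vocab = build_vocab([s for q, a, _ in qa_pairs for s in (q, a)])
jsonl = to_object2vec_jsonl(qa_pairs, vocab)
print(jsonl)
```

Because the question and the answer go into separate inputs (`in0` and `in1`), each side is handled by its own encoder, which is what lets the learned embedding space keep questions and answers apart.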