AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 204
A music streaming company constantly streams song ratings from an application to an Amazon S3 bucket. The company wants to use the ratings as an input for training and inference of an Amazon SageMaker AI model.
The company has an AWS Glue Data Catalog that is configured with the S3 bucket as the source. An ML engineer needs to implement a solution to create a repository for this data. The solution must ensure that the data stays synchronized during batch training and real-time inference.
Which solution will meet these requirements?
Answer options
- A. Ingest data into SageMaker Feature Store from the S3 bucket. Apply tags and indexes.
- B. Use Amazon Athena. Create tables by using CREATE TABLE AS SELECT (CTAS) queries to group data.
- C. Use AWS Lake Formation. Apply tag-based control on the data.
- D. Use the Generate Data Insights function in SageMaker Data Wrangler.
Correct answer: A
Explanation
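The correct answer is A. SageMaker Feature Store is a purpose-built feature repository in which each feature group can have two stores: an offline store in Amazon S3 for batch training and an online store for low-latency lookups during real-time inference. When both stores are enabled, records written to the feature group are available online and are also replicated to the offline store, so training and inference read the same feature values. Option B is not suitable because Athena CTAS queries produce static tables for SQL analysis; they offer no low-latency serving path and do not stay current as new ratings arrive. Option C addresses governance only: Lake Formation tag-based access control manages permissions and does not provide a feature repository or keep training and inference data synchronized. Option D produces a data quality and insights report in Data Wrangler for exploration; it does not store or serve data.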
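The Feature Store approach in option A can be sketched as follows. This is a minimal illustration, not the exam's reference solution: the feature group name, feature names, S3 URI, and role ARN are all hypothetical, and the schema is an assumption about what a song rating might contain. The helper functions only build request payloads; the `ingest` function makes the real `put_record` call and needs `boto3` plus AWS credentials.

```python
from typing import Any, Dict, List

# Hypothetical name, chosen for illustration only.
FEATURE_GROUP_NAME = "song-ratings"


def feature_group_config(s3_uri: str, role_arn: str) -> Dict[str, Any]:
    """Build keyword arguments for the SageMaker CreateFeatureGroup API.

    Enabling both stores lets one feature group serve batch training
    (offline store in S3) and real-time inference (online store).
    """
    return {
        "FeatureGroupName": FEATURE_GROUP_NAME,
        "RecordIdentifierFeatureName": "rating_id",
        "EventTimeFeatureName": "event_time",
        # Assumed schema; the question does not specify the rating fields.
        "FeatureDefinitions": [
            {"FeatureName": "rating_id", "FeatureType": "String"},
            {"FeatureName": "user_id", "FeatureType": "String"},
            {"FeatureName": "song_id", "FeatureType": "String"},
            {"FeatureName": "rating", "FeatureType": "Integral"},
            {"FeatureName": "event_time", "FeatureType": "String"},
        ],
        "OnlineStoreConfig": {"EnableOnlineStore": True},
        "OfflineStoreConfig": {"S3StorageConfig": {"S3Uri": s3_uri}},
        "RoleArn": role_arn,
    }


def to_record(rating: Dict[str, Any]) -> List[Dict[str, str]]:
    """Convert one rating into the Record shape that PutRecord expects."""
    return [
        {"FeatureName": name, "ValueAsString": str(value)}
        for name, value in rating.items()
    ]


def ingest(rating: Dict[str, Any], region: str) -> None:
    """Write one rating to the feature group (requires AWS credentials)."""
    import boto3  # imported here so the helpers above work without boto3

    runtime = boto3.client("sagemaker-featurestore-runtime", region_name=region)
    runtime.put_record(
        FeatureGroupName=FEATURE_GROUP_NAME,
        Record=to_record(rating),
    )
```

The feature group itself would be created once by passing `feature_group_config(...)` as keyword arguments to `create_feature_group` on a `boto3` `sagemaker` client. At inference time, the same runtime client's `get_record` call reads the latest value for a record identifier from the online store, while training jobs read the offline store in S3.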