AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 1
Case study
An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.
The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.
Which AWS service or feature can aggregate the data from the various data sources?
Answer options
- A. Amazon EMR Spark jobs
- B. Amazon Kinesis Data Streams
- C. Amazon DynamoDB
- D. AWS Lake Formation
Correct answer: D
Explanation
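AWS Lake Formation builds and manages a centralized data lake on Amazon S3. It can register existing S3 locations, such as the buckets that hold the transaction logs and customer profiles, and it can ingest tables from relational databases such as the on-premises MySQL database over JDBC, relying on AWS Glue connections, crawlers, and jobs underneath. That makes it the best fit for aggregating the datasets in this scenario. The other options serve different purposes. Amazon EMR Spark jobs process and transform data, and using them here would mean provisioning clusters and writing custom ingestion code. Amazon Kinesis Data Streams ingests real-time streaming data rather than consolidating data at rest. Amazon DynamoDB is a NoSQL key-value database, not a service for combining data from multiple sources.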
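To make the explanation concrete, here is a minimal Python sketch of how the two kinds of sources might be attached to a data lake. It only builds the request arguments for two boto3 calls, `register_resource` on the Lake Formation client and `create_connection` on the AWS Glue client, and passes them to whichever clients are supplied. The bucket ARN, connection name, host, database, and credentials are hypothetical placeholders, not values from the case study, and a real setup would also need IAM roles, network access to the on-premises database, and a crawler or blueprint workflow to import the tables.

```python
from typing import Any, Dict


def s3_registration(bucket_arn: str) -> Dict[str, Any]:
    """Arguments for lakeformation.register_resource: put an S3 location under Lake Formation."""
    return {"ResourceArn": bucket_arn, "UseServiceLinkedRole": True}


def mysql_connection(name: str, host: str, database: str,
                     user: str, password: str, port: int = 3306) -> Dict[str, Any]:
    """Arguments for glue.create_connection: a JDBC connection to a MySQL database."""
    return {
        "ConnectionInput": {
            "Name": name,
            "ConnectionType": "JDBC",
            "ConnectionProperties": {
                "JDBC_CONNECTION_URL": f"jdbc:mysql://{host}:{port}/{database}",
                "USERNAME": user,
                # Assumption for brevity; a secrets store is preferable in practice.
                "PASSWORD": password,
            },
        }
    }


def register_sources(lakeformation_client: Any, glue_client: Any,
                     bucket_arn: str, connection_kwargs: Dict[str, Any]) -> None:
    """Register the S3 location, then create the JDBC connection for later ingestion."""
    lakeformation_client.register_resource(**s3_registration(bucket_arn))
    glue_client.create_connection(**connection_kwargs)


# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   register_sources(
#       boto3.client("lakeformation"),
#       boto3.client("glue"),
#       "arn:aws:s3:::example-fraud-data",
#       mysql_connection("onprem-mysql", "mysql.example.internal",
#                        "transactions", "etl_user", "example-password"),
#   )
```

Keeping the argument builders separate from the API calls lets the request shapes be inspected or unit-tested without touching an AWS account.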