AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 1

Case study
An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.
The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.
Which AWS service or feature can aggregate the data from the various data sources?

Answer options

Correct answer: D

Explanation

AWS Lake Formation is designed to collect, catalog, and manage data from multiple sources, including Amazon S3 and relational databases such as MySQL, in a single data lake. That makes it the most suitable option for aggregating the datasets in this scenario. Other options such as Amazon EMR Spark jobs and Amazon Kinesis Data Streams serve different purposes: EMR Spark jobs process data at scale, and Kinesis Data Streams ingests real-time streaming data. Neither is aimed specifically at bringing data from S3 and an on-premises database together in one place.
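
As an illustration of the first step in that aggregation, the sketch below registers an S3 location with Lake Formation so that the data stored there becomes part of the data lake. It is a minimal sketch, not the exam's reference solution. The helper name `register_s3_location` and the bucket name `example-fraud-data` are hypothetical. The only AWS call used is the Lake Formation `RegisterResource` operation, exposed in boto3 as `register_resource`. Ingesting the on-premises MySQL tables is a separate step (typically a Lake Formation blueprint over a JDBC connection) and is not shown here.

```python
def register_s3_location(lf_client, bucket_name, prefix=""):
    """Register an S3 bucket (or a prefix inside it) with Lake Formation.

    lf_client is expected to be a boto3 Lake Formation client, e.g.
    boto3.client("lakeformation"). Returns the ARN that was registered.
    """
    # Build the S3 ARN for the whole bucket, or for one prefix within it.
    arn = f"arn:aws:s3:::{bucket_name}"
    if prefix:
        arn = f"{arn}/{prefix.strip('/')}"

    # RegisterResource places the S3 location under Lake Formation management.
    # UseServiceLinkedRole=True lets Lake Formation use its service-linked
    # role to access the location (an assumption made for this sketch).
    lf_client.register_resource(ResourceArn=arn, UseServiceLinkedRole=True)
    return arn


# Example usage (requires AWS credentials and an existing bucket):
#   import boto3
#   lf = boto3.client("lakeformation")
#   register_s3_location(lf, "example-fraud-data", "transaction-logs")
```

The client is passed in as a parameter so that the same function can register both the transaction-log and customer-profile locations, and so it can be exercised with a stub client without touching a real AWS account.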