AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 19

An ML engineer needs to process thousands of existing CSV objects and new CSV objects that are uploaded. The CSV objects are stored in a central Amazon S3 bucket and have the same number of columns. One of the columns is a transaction date. The ML engineer must query the data based on the transaction date.
Which solution will meet these requirements with the LEAST operational overhead?

Answer options

Correct answer: A

Explanation

Option A is the best solution because Amazon Athena is serverless and queries the CSV objects in place in the central S3 bucket. A CREATE TABLE AS SELECT (CTAS) statement creates a table from the results of a SELECT query, and the ML engineer can then filter that table by transaction date using standard SQL. There is no infrastructure to provision or manage. The other options involve additional steps such as replication, setting up jobs, or using Lambda functions, which add complexity and ongoing maintenance effort.
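
As an illustration of the Athena approach, here is a minimal Python sketch of what the CTAS statement and a follow-up date query might look like. It is not part of the original question. It assumes that an external table over the CSV objects already exists in Athena and that its transaction_date column is declared as DATE. The table names, database name, and S3 locations are all hypothetical. The submit_query helper uses the boto3 Athena client and needs AWS credentials, so it is defined but not called here.

```python
from datetime import date


def build_ctas_query(source_table: str, target_table: str, output_location: str) -> str:
    """Build an Athena CTAS statement that copies a CSV-backed table into Parquet.

    All arguments are hypothetical names chosen for illustration.
    """
    return (
        f"CREATE TABLE {target_table}\n"
        f"WITH (format = 'PARQUET', external_location = '{output_location}') AS\n"
        f"SELECT * FROM {source_table}"
    )


def build_date_query(table: str, start: str, end: str) -> str:
    """Build a query that filters rows by transaction date (inclusive range).

    Assumes the table has a DATE column named transaction_date.
    """
    # Validate the inputs as ISO dates (YYYY-MM-DD); raises ValueError otherwise.
    start_date = date.fromisoformat(start)
    end_date = date.fromisoformat(end)
    return (
        f"SELECT * FROM {table}\n"
        f"WHERE transaction_date BETWEEN DATE '{start_date.isoformat()}' "
        f"AND DATE '{end_date.isoformat()}'"
    )


def submit_query(query: str, database: str, results_location: str) -> str:
    """Submit a query to Athena and return its execution ID.

    Requires boto3 and AWS credentials; not executed in this sketch.
    """
    import boto3  # third-party dependency, imported lazily

    client = boto3.client("athena")
    response = client.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": results_location},
    )
    return response["QueryExecutionId"]


if __name__ == "__main__":
    # Hypothetical names; replace with real tables and buckets.
    print(build_ctas_query(
        "transactions_csv", "transactions_parquet", "s3://example-bucket/parquet/"
    ))
    print(build_date_query("transactions_parquet", "2024-01-01", "2024-01-31"))
```

Keeping the SQL builders separate from the submission helper means the statements can be inspected or tested without touching AWS.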