
Question

A company is planning to create several ML prediction models. The training data is stored in Amazon S3. The entire dataset is more than 5 TB in size and consists of CSV, JSON, Apache Parquet, and plain text files.
The data must be processed in several consecutive steps. The steps include complex manipulations that can take hours to finish running. Some of the processing involves natural language processing (NLP) transformations. The entire process must be automated.
Which solution will meet these requirements?

Accepted Answer

Correct answer: D. Use Amazon SageMaker Pipelines to create a pipeline of data processing steps. Automate the pipeline by using Amazon EventBridge.
D is correct because SageMaker Pipelines orchestrates a sequence of dependent steps. Each step can run as a SageMaker Processing job on managed compute sized for the workload, so multi-hour transformations over a 5 TB dataset in mixed formats, including the NLP steps, run in order without manual intervention. An Amazon EventBridge rule can then start the pipeline on a schedule or in response to an event, which satisfies the requirement that the entire process be automated. The other options either do not automate the end-to-end process or are not designed for long-running, multi-step data processing.
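To make the EventBridge half of the answer concrete, here is a minimal sketch that builds the request bodies for the EventBridge `PutRule` and `PutTargets` APIs so that a scheduled rule starts an existing SageMaker pipeline. It assumes the pipeline has already been created. The account ID, Region, pipeline name, IAM role, rule name, and the `InputDataUri` pipeline parameter are all hypothetical placeholders and are not part of the question. The sketch only builds dictionaries, so it runs without AWS credentials.

```python
import json

# Hypothetical placeholders: substitute your own account, Region, pipeline, and role.
ACCOUNT_ID = "123456789012"
REGION = "us-east-1"
PIPELINE_NAME = "data-processing-pipeline"
ROLE_ARN = f"arn:aws:iam::{ACCOUNT_ID}:role/EventBridgeStartPipelineRole"


def build_schedule_rule(rule_name: str, schedule: str) -> dict:
    """Request body for EventBridge PutRule that fires on a schedule."""
    return {
        "Name": rule_name,
        # EventBridge schedule expression, e.g. "cron(0 2 * * ? *)" = 02:00 UTC daily.
        "ScheduleExpression": schedule,
        "State": "ENABLED",
    }


def build_pipeline_target(rule_name: str, parameters: dict) -> dict:
    """Request body for EventBridge PutTargets that starts a SageMaker pipeline."""
    pipeline_arn = f"arn:aws:sagemaker:{REGION}:{ACCOUNT_ID}:pipeline/{PIPELINE_NAME}"
    return {
        "Rule": rule_name,
        "Targets": [
            {
                "Id": "start-sagemaker-pipeline",
                "Arn": pipeline_arn,
                # EventBridge assumes this role to call sagemaker:StartPipelineExecution.
                "RoleArn": ROLE_ARN,
                # Values for parameters that must already be defined in the pipeline.
                "SageMakerPipelineParameters": {
                    "PipelineParameterList": [
                        {"Name": name, "Value": value}
                        for name, value in parameters.items()
                    ]
                },
            }
        ],
    }


if __name__ == "__main__":
    rule = build_schedule_rule("nightly-ml-data-processing", "cron(0 2 * * ? *)")
    target = build_pipeline_target(
        rule["Name"], {"InputDataUri": "s3://example-bucket/raw/"}
    )
    print(json.dumps(rule, indent=2))
    print(json.dumps(target, indent=2))
```

With boto3, these dictionaries would be passed as `boto3.client("events").put_rule(**rule)` followed by `put_targets(**target)`. A schedule is used here only for simplicity. An `EventPattern` that matches S3 "Object Created" events, which requires EventBridge notifications to be enabled on the bucket, would instead start the pipeline whenever new training data arrives.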

AWS Certified Machine Learning Engineer – Associate (MLA-C01) — Question 38


Explanation