AWS Certified Big Data – Specialty — Question 53
A customer has a machine learning workflow that consists of multiple quick cycles of reads-writes-reads on Amazon S3. The customer needs to run the workflow on EMR but is concerned that the reads in subsequent cycles will miss new data critical to the machine learning from the prior cycles.
How should the customer accomplish this?
Answer options
- A. Turn on EMRFS consistent view when configuring the EMR cluster.
- B. Use AWS Data Pipeline to orchestrate the data processing cycles.
- C. Set hadoop.data.consistency = true in the core-site.xml file.
- D. Set hadoop.s3.consistency = true in the core-site.xml file.
Correct answer: A
Explanation
The correct answer is A. EMRFS consistent view records the objects that EMRFS writes to S3 in a DynamoDB metadata table and retries when an S3 listing or read does not yet reflect them, so reads in a later cycle see the data written in the prior cycle. Option B only orchestrates the processing cycles and does not change how reads against S3 behave. Options C and D name properties (hadoop.data.consistency and hadoop.s3.consistency) that are not recognized Hadoop or EMRFS settings, so adding them to core-site.xml has no effect.
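
As an illustration of option A, the sketch below builds the configuration entry that turns on consistent view through the emrfs-site classification. The helper name emrfs_consistent_view_config and the retry values are assumptions chosen for the example, not settings taken from the question. The resulting list has the shape expected by the Configurations parameter when a cluster is created through the EMR API.

```python
import json


def emrfs_consistent_view_config(retry_count=5, retry_period_seconds=10):
    """Build an EMR configuration list that enables EMRFS consistent view.

    The helper name and the retry values are illustrative assumptions.
    """
    return [
        {
            "Classification": "emrfs-site",
            "Properties": {
                # Enables EMRFS consistent view.
                "fs.s3.consistent": "true",
                # How many times EMRFS retries when S3 does not yet
                # show an object recorded in the metadata table.
                "fs.s3.consistent.retryCount": str(retry_count),
                # Seconds to wait before the first retry.
                "fs.s3.consistent.retryPeriodSeconds": str(retry_period_seconds),
            },
        }
    ]


if __name__ == "__main__":
    # Print the JSON so it can be saved and supplied at cluster creation.
    print(json.dumps(emrfs_consistent_view_config(), indent=2))
```

The settings belong to the emrfs-site classification rather than core-site.xml, which is one more reason options C and D do not work.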