A company has developed an Apache Hive script to batch process data stored in Amazon S3.…

Question

A company has developed an Apache Hive script to batch process data stored in Amazon S3. The script needs to run once every day and store the output in Amazon S3. The company tested the script, and it completes within 30 minutes on a small local three-node cluster.
Which solution is the MOST cost-effective for scheduling and executing the script?

Accepted Answer

Correct answer: A. Create an AWS Lambda function to spin up an Amazon EMR cluster with a Hive execution step. Set KeepJobFlowAliveWhenNoSteps to false and disable the termination protection flag. Use Amazon CloudWatch Events to schedule the Lambda function to run daily.

A is correct because the EMR cluster is transient. The scheduled Lambda function launches it once a day, and with KeepJobFlowAliveWhenNoSteps set to false and termination protection disabled, the cluster shuts itself down as soon as the Hive step finishes. The company therefore pays for the cluster only while the job is running. Options B and C cost more, either because they keep an EMR cluster running persistently or because they use more resources than the job needs. Option D is unsuitable because running Hive inside AWS Lambda is not practical: the job took up to 30 minutes in testing, which exceeds Lambda's 15-minute maximum execution time, and Lambda's resource limits are also a constraint.
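To make option A concrete, here is a minimal sketch of what such a Lambda function might look like in Python with boto3. It is an illustration under assumptions, not the exam's reference implementation: the bucket paths, cluster name, release label, instance types, and the helper name build_job_flow_request are all hypothetical, and the IAM roles are assumed to be the EMR defaults.

```python
# Sketch of a Lambda handler that launches a transient EMR cluster for one Hive step.
# All S3 paths, names, and sizes below are hypothetical placeholders.

HIVE_SCRIPT = "s3://example-bucket/scripts/daily_job.q"   # assumed script location
LOG_URI = "s3://example-bucket/emr-logs/"                 # assumed log location


def build_job_flow_request(script_uri, log_uri):
    """Build the keyword arguments for the EMR run_job_flow call."""
    return {
        "Name": "daily-hive-batch",            # hypothetical cluster name
        "ReleaseLabel": "emr-6.15.0",          # assumed EMR release
        "LogUri": log_uri,
        "Applications": [{"Name": "Hive"}],
        "Instances": {
            "MasterInstanceType": "m5.xlarge",  # assumed instance type
            "SlaveInstanceType": "m5.xlarge",
            "InstanceCount": 3,                 # mirrors the three-node test cluster
            # Terminate the cluster automatically once all steps are done.
            "KeepJobFlowAliveWhenNoSteps": False,
            # Termination protection must be off for auto-termination to work.
            "TerminationProtected": False,
        },
        "Steps": [
            {
                "Name": "run-hive-script",
                "ActionOnFailure": "TERMINATE_CLUSTER",
                "HadoopJarStep": {
                    "Jar": "command-runner.jar",
                    "Args": [
                        "hive-script", "--run-hive-script",
                        "--args", "-f", script_uri,
                    ],
                },
            }
        ],
        "JobFlowRole": "EMR_EC2_DefaultRole",   # assumes the default roles exist
        "ServiceRole": "EMR_DefaultRole",
    }


def lambda_handler(event, context):
    """Entry point invoked by the daily schedule."""
    import boto3  # imported here so the request builder can be tested without AWS

    emr = boto3.client("emr")
    response = emr.run_job_flow(**build_job_flow_request(HIVE_SCRIPT, LOG_URI))
    return {"JobFlowId": response["JobFlowId"]}
```

A CloudWatch Events rule with a schedule expression such as rate(1 day) would then invoke lambda_handler once per day. The function itself returns within seconds, since it only submits the cluster request; the Hive job runs on EMR afterward.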

AWS Certified Data Analytics – Specialty — Question 28

Explanation