AWS Certified Data Engineer – Associate (DEA-C01) — Question 13
A data engineer must orchestrate a series of Amazon Athena queries that will run every day. Each query can run for more than 15 minutes.
Which combination of steps will meet these requirements MOST cost-effectively? (Choose two.)
Answer options
- A. Use an AWS Lambda function and the Athena Boto3 client start_query_execution API call to invoke the Athena queries programmatically.
- B. Create an AWS Step Functions workflow and add two states. Add the first state before the Lambda function. Configure the second state as a Wait state to periodically check whether the Athena query has finished using the Athena Boto3 get_query_execution API call. Configure the workflow to invoke the next query when the current query has finished running.
- C. Use an AWS Glue Python shell job and the Athena Boto3 client start_query_execution API call to invoke the Athena queries programmatically.
- D. Use an AWS Glue Python shell script to run a sleep timer that checks every 5 minutes to determine whether the current Athena query has finished running successfully. Configure the Python shell script to invoke the next query when the current query has finished running.
- E. Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate the Athena queries in AWS Batch.
Correct answer: A, B
Explanation
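Option A is correct because the Athena start_query_execution API call is asynchronous: it returns a query execution ID immediately, so the Lambda function finishes in moments and never approaches Lambda's 15-minute maximum timeout, even though the query itself runs longer. Option B is correct because a Step Functions Wait state pauses the workflow without consuming compute (Standard workflows are billed per state transition, not for time spent waiting). After each wait, the workflow calls get_query_execution to check the query status and starts the next query only once the current one has finished. Options C and D are less cost-effective because an AWS Glue Python shell job is billed for its entire run time, including the time it spends sleeping between status checks. Option E is less cost-effective because Amazon MWAA requires a continuously running environment, and AWS Batch adds compute that Athena queries do not need.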
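
The division of labor in options A and B can be sketched as follows. This is a minimal illustration under stated assumptions, not the exam's reference solution: the handler names (start_handler, check_handler), the event keys (sql, database, output_location, query_id), the helper build_state_machine, and the 60-second wait are all hypothetical. Only start_query_execution and get_query_execution come from the question itself. The sketch also uses a few more states than the two that option B mentions, so that the polling loop and the failure path are explicit.

```python
import json


def _athena_client():
    # Imported lazily so the handlers can be exercised with a stub client.
    import boto3
    return boto3.client("athena")


def start_handler(event, context, athena=None):
    """Lambda handler (option A): start a query and return its ID at once."""
    athena = athena or _athena_client()
    response = athena.start_query_execution(
        QueryString=event["sql"],
        QueryExecutionContext={"Database": event["database"]},
        ResultConfiguration={"OutputLocation": event["output_location"]},
    )
    # The call is asynchronous, so the function never waits for the query.
    return response["QueryExecutionId"]


def check_handler(event, context, athena=None):
    """Lambda handler (option B): report the current state of a query."""
    athena = athena or _athena_client()
    response = athena.get_query_execution(QueryExecutionId=event["query_id"])
    # One of QUEUED, RUNNING, SUCCEEDED, FAILED, CANCELLED.
    return response["QueryExecution"]["Status"]["State"]


def build_state_machine(start_fn_arn, check_fn_arn, wait_seconds=60):
    """Amazon States Language definition for one query (hypothetical layout)."""
    return {
        "StartAt": "StartQuery",
        "States": {
            "StartQuery": {
                "Type": "Task",
                "Resource": start_fn_arn,
                "ResultPath": "$.query_id",
                "Next": "Wait",
            },
            # The Wait state pauses the workflow without running any compute.
            "Wait": {"Type": "Wait", "Seconds": wait_seconds, "Next": "CheckQuery"},
            "CheckQuery": {
                "Type": "Task",
                "Resource": check_fn_arn,
                "ResultPath": "$.state",
                "Next": "IsDone",
            },
            "IsDone": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.state", "StringEquals": "SUCCEEDED", "Next": "Done"},
                    {"Variable": "$.state", "StringEquals": "FAILED", "Next": "Failed"},
                    {"Variable": "$.state", "StringEquals": "CANCELLED", "Next": "Failed"},
                ],
                # QUEUED or RUNNING: loop back and wait again.
                "Default": "Wait",
            },
            "Done": {"Type": "Succeed"},
            "Failed": {"Type": "Fail", "Error": "AthenaQueryFailed"},
        },
    }


if __name__ == "__main__":
    # Placeholder ARNs, for illustration only.
    definition = build_state_machine("arn:start-fn", "arn:check-fn")
    print(json.dumps(definition, indent=2))
```

To chain a series of queries, the success branch would point at the next query's start state instead of a Succeed state. Each Lambda invocation lasts only as long as one API call, which is why the 15-minute limit does not matter here.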