Question

A data engineer is using AWS Glue ETL jobs to process data at frequent intervals. The processed data is then copied into Amazon S3. The ETL jobs run every 15 minutes. The AWS Glue Data Catalog partitions need to be updated automatically after the completion of each job.
Which solution will meet these requirements MOST cost-effectively?

Accepted Answer

Correct answer: D. Use the AWS Glue Data Catalog to manage the data catalog. Update the AWS Glue ETL code to include the enableUpdateCatalog and partitionKeys arguments.

Option D is correct because the ETL job itself registers new partitions in the Data Catalog as it writes its output, so nothing else has to run after each 15-minute job. Options A and B, while valid, add complexity and potential cost through workflows and extra features. Option C proposes an Apache Hive metastore, which is unnecessary when the AWS Glue Data Catalog is available and suited to the task.
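As a minimal sketch of what option D might look like in a PySpark Glue script, the helper below builds the two sink options and passes them to `write_dynamic_frame_from_catalog`. The function names, the `write_sink` context label, and the database, table, and partition column names in the usage note are hypothetical; the Glue context is taken as a parameter so the helper can be imported outside a Glue job.

```python
from typing import Any, Dict, List


def catalog_update_options(partition_keys: List[str]) -> Dict[str, Any]:
    """Build sink options asking Glue to update the Data Catalog on write."""
    if not partition_keys:
        raise ValueError("partition_keys must name at least one column")
    return {
        "enableUpdateCatalog": True,            # register changes in the catalog during the write
        "partitionKeys": list(partition_keys),  # columns that define the partitions
    }


def write_and_update_catalog(glue_context, frame, database, table_name, partition_keys):
    """Write `frame` to an existing catalog table with catalog updates enabled.

    `glue_context` is expected to be an awsglue GlueContext and `frame` a
    DynamicFrame; both are supplied by the caller (assumption of this sketch).
    """
    return glue_context.write_dynamic_frame_from_catalog(
        frame=frame,
        database=database,
        table_name=table_name,
        transformation_ctx="write_sink",  # hypothetical label
        additional_options=catalog_update_options(partition_keys),
    )
```

Inside a job this could be called as `write_and_update_catalog(glueContext, last_transform, "sales_db", "orders", ["year", "month", "day"])`, where every argument value is illustrative. With these options set, the job needs no separate crawler run or workflow step to pick up new partitions, which is the basis of the cost argument above.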

AWS Certified Data Analytics – Specialty — Question 154
