Question

A sales company uses AWS Glue ETL to collect, process, and ingest data into an Amazon S3 bucket. The AWS Glue pipeline creates a new file in the S3 bucket every hour. File sizes vary from 200 KB to 300 KB. The company wants to build a sales prediction model by using data from the previous 5 years. The historic data includes 44,000 files. The company builds a second AWS Glue ETL pipeline by using the smallest worker type. The second pipeline retrieves the historic files from the S3 bucket and processes the files for downstream analysis. The company notices significant performance issues with the second ETL pipeline. The company needs to improve the performance of the second pipeline. Which solution will meet this requirement MOST cost-effectively?

Accepted Answer

Correct answer: C. Use the AWS Glue DynamicFrame grouping option. The historic data consists of 44,000 small files (200 KB to 300 KB each, roughly 11 GB in total), so the bottleneck is the per-file overhead of listing and scheduling that many files, not the data volume. The grouping option (groupFiles) lets AWS Glue read many small files together as one group in a single in-memory partition, which reduces the number of Spark tasks and the memory pressure on the driver without adding any resources. Increasing the worker count or switching to a larger worker type raises cost without addressing the small-file overhead. Enabling auto scaling can also increase cost and likewise leaves the underlying cause unaddressed.
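As a rough illustration of option C, the sketch below builds the S3 connection options that turn on file grouping and estimates how many read groups the historic data would collapse into. The helper names (grouping_options, estimate_groups, read_grouped), the 128 MB group size, the JSON format, and the bucket path in the usage note are assumptions for illustration and are not given in the question. Only groupFiles, groupSize, paths, and recurse are actual AWS Glue connection options.

```python
import math


def grouping_options(s3_path, group_size_bytes=128 * 1024 * 1024):
    """Build S3 connection options that enable AWS Glue file grouping.

    The 128 MB default is an assumed target, not a value from the question.
    """
    return {
        "paths": [s3_path],
        "recurse": True,
        # Group small files within each S3 partition into larger read groups.
        "groupFiles": "inPartition",
        # Glue expects the target group size in bytes, passed as a string.
        "groupSize": str(group_size_bytes),
    }


def estimate_groups(file_count, avg_file_bytes, group_size_bytes):
    """Rough estimate of how many read groups replace the per-file reads."""
    return math.ceil(file_count * avg_file_bytes / group_size_bytes)


def read_grouped(glue_context, s3_path, data_format="json"):
    """Read the historic files as a DynamicFrame with grouping enabled.

    glue_context is an awsglue.context.GlueContext, which is only available
    inside an AWS Glue job. The file format is an assumption.
    """
    return glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options=grouping_options(s3_path),
        format=data_format,
    )


if __name__ == "__main__":
    # 44,000 files averaging about 250 KB, grouped into 128 MB reads.
    groups = estimate_groups(44_000, 250 * 1024, 128 * 1024 * 1024)
    print(f"~{groups} read groups instead of 44,000 per-file reads")
```

Inside a Glue job, a call such as read_grouped(glueContext, "s3://example-bucket/historic/") (hypothetical bucket) would replace the ungrouped read. According to the AWS Glue documentation, grouping is enabled automatically only when there are more than 50,000 input files, so a 44,000-file dataset needs it set explicitly. The documentation also lists grouping as supported for formats such as CSV and JSON but not for Parquet, ORC, or Avro.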

AWS Certified Data Engineer – Associate (DEA-C01) — Question 216
