AWS Certified Data Engineer – Associate (DEA-C01) — Question 216
A sales company uses AWS Glue ETL to collect, process, and ingest data into an Amazon S3 bucket. The AWS Glue pipeline creates a new file in the S3 bucket every hour. File sizes vary from 200 KB to 300 KB. The company wants to build a sales prediction model by using data from the previous 5 years. The historic data includes 44,000 files.
The company builds a second AWS Glue ETL pipeline by using the smallest worker type. The second pipeline retrieves the historic files from the S3 bucket and processes the files for downstream analysis. The company notices significant performance issues with the second ETL pipeline.
The company needs to improve the performance of the second pipeline.
Which solution will meet this requirement MOST cost-effectively?
Answer options
- A. Use a larger worker type.
- B. Increase the number of workers in the AWS Glue ETL jobs.
- C. Use the AWS Glue DynamicFrame grouping option.
- D. Enable AWS Glue auto scaling.
Correct answer: C
Explanation
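The bottleneck is the number of files, not the volume of data: 44,000 files of 200 KB to 300 KB each add up to only about 11 GB. Without grouping, the job creates roughly one task per small file, so most of the run time goes to per-file scheduling overhead and driver memory pressure rather than to processing data. The AWS Glue DynamicFrame grouping option (the groupFiles and groupSize connection options) lets a single task read many small files as one group, which removes that overhead at no additional cost. AWS Glue turns grouping on automatically only when there are more than 50,000 input files, so with 44,000 files it has to be set explicitly. A larger worker type or a higher worker count raises the cost of every run without reducing the number of tasks. Auto scaling only adjusts the worker count to the workload, so it does not address the small-file overhead either.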
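The following is a minimal sketch of what option C could look like, plus the arithmetic behind it. The bucket path, the 250 KB average file size, the 128 MB group size, and the helper names `grouping_options` and `estimate_groups` are illustrative assumptions and are not part of the question. The JSON format in the commented-out Glue call is also an assumption, since the question does not state the file format.

```python
import math


def grouping_options(s3_path: str, group_size_bytes: int) -> dict:
    """Build S3 connection options that turn on DynamicFrame file grouping."""
    return {
        "paths": [s3_path],
        "recurse": True,
        # Group files that sit in the same S3 partition into one read.
        "groupFiles": "inPartition",
        # Target group size in bytes; Glue expects it as a string.
        "groupSize": str(group_size_bytes),
    }


def estimate_groups(file_count: int, avg_file_bytes: int, group_size_bytes: int) -> int:
    """Rough number of groups the files would be packed into."""
    total_bytes = file_count * avg_file_bytes
    return math.ceil(total_bytes / group_size_bytes)


FILE_COUNT = 44_000                    # from the question
AVG_FILE_BYTES = 250 * 1024            # assumed midpoint of 200 KB to 300 KB
GROUP_SIZE_BYTES = 128 * 1024 * 1024   # assumed target of 128 MB per group

# Hypothetical bucket path, for illustration only.
options = grouping_options("s3://example-sales-bucket/historic/", GROUP_SIZE_BYTES)
groups = estimate_groups(FILE_COUNT, AVG_FILE_BYTES, GROUP_SIZE_BYTES)

print(f"{FILE_COUNT} files -> about {groups} groups")

# Inside a Glue job, the options would be passed like this
# (requires the awsglue library, which is only available in the Glue runtime):
#
#   dyf = glueContext.create_dynamic_frame.from_options(
#       connection_type="s3",
#       connection_options=options,
#       format="json",  # assumed format
#   )
```

Under these assumptions, the job reads on the order of tens of groups instead of 44,000 individual files, which is why the smallest worker type can remain in place.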