AWS Certified Data Analytics – Specialty — Question 151

A gaming company is building a serverless data lake. The company is ingesting streaming data into Amazon Kinesis Data Streams and is writing the data to Amazon S3 through Amazon Kinesis Data Firehose. The company is using 10 MB as the S3 buffer size and is using 90 seconds as the buffer interval. The company runs an AWS Glue ETL job to merge and transform the data to a different format before writing the data back to Amazon S3.
Recently, the company has experienced substantial growth in its data volume. The AWS Glue ETL jobs are frequently failing with an OutOfMemoryError.
Which solutions will resolve this issue without incurring additional costs? (Choose two.)

Answer options

Correct answer: D, E

Explanation

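The correct answers, D and E, both address the root cause of the OutOfMemoryError. With a 10 MB buffer size and a 90-second buffer interval, Kinesis Data Firehose writes a large number of small objects to Amazon S3 as data volume grows, and a large number of small input files is a common cause of memory pressure in AWS Glue jobs. Option D merges the small files within the Glue job itself, so each task reads a group of files rather than a single small file and the job has far fewer tasks to manage. Option E increases the Firehose buffer size and buffer interval, so Firehose delivers fewer, larger objects to Amazon S3 in the first place. Neither change adds a new service or processing step, so neither incurs additional cost. Options A, B, and C either involve additional processing steps, which may not resolve the memory issue effectively, or could incur additional costs.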
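The two fixes can be sketched in Python as follows. This is a minimal illustration, not the exam's reference solution. The helper names, the 128 MB group size, and the 128 MB / 900-second buffer values are assumptions chosen for the example. The object-count estimate is a deliberately rough model that assumes Firehose flushes whenever either the size or the interval limit is reached. The `groupFiles` and `groupSize` keys are AWS Glue S3 connection options, and `SizeInMBs` and `IntervalInSeconds` are the fields of the Firehose `BufferingHints` structure.

```python
import math


def grouped_s3_options(paths, group_size_bytes=128 * 1024 * 1024):
    """Build Glue S3 connection options that group small files per task.

    The 128 MB default is an illustrative assumption, not a recommendation.
    """
    return {
        "paths": list(paths),
        "recurse": True,
        "groupFiles": "inPartition",          # group files within each S3 partition
        "groupSize": str(group_size_bytes),   # target group size in bytes, passed as a string
    }


def firehose_buffering_hints(size_mb=128, interval_seconds=900):
    """Build a Firehose BufferingHints structure (values here are assumptions)."""
    return {"SizeInMBs": size_mb, "IntervalInSeconds": interval_seconds}


def estimated_objects(volume_mb, window_seconds, size_mb, interval_seconds):
    """Rough estimate of S3 objects written in a time window.

    Simplified model: Firehose flushes when either limit is hit, so the
    count is driven by whichever limit triggers more often.
    """
    by_size = math.ceil(volume_mb / size_mb)
    by_time = math.ceil(window_seconds / interval_seconds)
    return max(by_size, by_time)


if __name__ == "__main__":
    # Hypothetical workload: 36,000 MB arriving in one hour.
    print(estimated_objects(36_000, 3_600, size_mb=10, interval_seconds=90))
    print(estimated_objects(36_000, 3_600, size_mb=128, interval_seconds=900))

    # Inside a Glue job (bucket path and format are hypothetical):
    # dyf = glueContext.create_dynamic_frame.from_options(
    #     connection_type="s3",
    #     connection_options=grouped_s3_options(["s3://example-bucket/raw/"]),
    #     format="json",
    # )
    #
    # With boto3, the hints would go under
    # ExtendedS3DestinationUpdate={"BufferingHints": firehose_buffering_hints()}
    # in a firehose.update_destination(...) call.
```

Under this simplified model, the hypothetical one-hour workload drops from thousands of objects to a few hundred when the buffer limits are raised, which is the effect option E relies on. Option D reaches a similar result on the read side without changing how the data is written.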