AWS Certified Data Analytics – Specialty — Question 24
A media company has been performing analytics on log data generated by its applications. There has been a recent increase in the number of concurrent analytics jobs running, and the overall performance of existing jobs is decreasing as the number of new jobs is increasing. The partitioned data is stored in Amazon S3 One Zone-Infrequent Access (S3 One Zone-IA) and the analytic processing is performed on Amazon EMR clusters using the EMR File System (EMRFS) with consistent view enabled. A data analyst has determined that it is taking longer for the EMR task nodes to list objects in Amazon S3.
Which action would MOST likely increase the performance of accessing log data in Amazon S3?
Answer options
- A. Use a hash function to create a random string and add that to the beginning of the object prefixes when storing the log data in Amazon S3.
- B. Use a lifecycle policy to change the S3 storage class to S3 Standard for the log data.
- C. Increase the read capacity units (RCUs) for the shared Amazon DynamoDB table.
- D. Redeploy the EMR clusters that are running slowly to a different Availability Zone.
Correct answer: C
Explanation
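EMRFS consistent view tracks S3 object metadata in an Amazon DynamoDB table that is shared by the clusters using it. Each time a node lists objects, EMRFS reads from that table, so as more concurrent jobs run, the read load on the table grows. Once the load exceeds the provisioned read capacity, DynamoDB throttles requests and list operations slow down, which matches the symptom the analyst observed. Increasing the read capacity units (RCUs) for the shared table removes that bottleneck.
The other options do not address object listing. Option A changes the key layout, but S3 request rates are not the limiting factor here, and randomized prefixes would disrupt the existing partition structure. Option B changes the storage class, yet S3 One Zone-IA offers the same latency and throughput as S3 Standard, so listing would not get faster. Option D moves the clusters to a different Availability Zone, which does nothing for the throughput of the metadata table.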
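To make the capacity reasoning concrete, the following is a minimal sketch of how a read-capacity estimate could be worked out. It relies on the DynamoDB rule that one RCU covers one strongly consistent read per second (or two eventually consistent reads) for an item of up to 4 KB. The read rates and item sizes are hypothetical, the function names are illustrative only, and the table name `EmrFSMetadata` is assumed to be the default consistent-view table; the actual read pattern of EMRFS is not described in the question.

```python
import math

RCU_ITEM_BYTES = 4096  # one RCU covers reads of items up to 4 KB


def required_rcus(reads_per_second, item_size_bytes, strongly_consistent=True):
    """Estimate the RCUs needed to sustain a steady read rate without throttling."""
    # Items larger than 4 KB consume one extra RCU per additional 4 KB block.
    units_per_read = max(1, math.ceil(item_size_bytes / RCU_ITEM_BYTES))
    rcus = reads_per_second * units_per_read
    if not strongly_consistent:
        # Eventually consistent reads cost half as much.
        rcus = rcus / 2
    return math.ceil(rcus)


def build_update_request(table_name, read_units, write_units):
    """Build the parameters for a DynamoDB UpdateTable call (hypothetical helper)."""
    return {
        "TableName": table_name,
        "ProvisionedThroughput": {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        },
    }


# Hypothetical workload: 600 metadata reads per second of 1 KB items.
needed = required_rcus(600, 1024)
request = build_update_request("EmrFSMetadata", needed, 100)
```

If boto3 is available, a request built this way could be passed to `boto3.client("dynamodb").update_table(**request)`. The write capacity value of 100 is only a placeholder, since `ProvisionedThroughput` requires both fields.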