A customer is collecting clickstream data using Amazon Kinesis and is grouping the events…

Question

A customer is collecting clickstream data using Amazon Kinesis and is grouping the events by IP address into 5-minute chunks stored in Amazon S3.
Many analysts in the company use Hive on Amazon EMR to analyze this data. Their queries always reference a single IP address. Data must be optimized for querying based on IP address using Hive running on Amazon EMR.
What is the most efficient method to query the data with Hive?

Accepted Answer

Correct answer: A. Store an index of the files by IP address in the Amazon DynamoDB metadata store for EMRFS.

According to the accepted answer, an index keyed by IP address lets a query find the files for one IP address directly instead of scanning every 5-minute chunk, which matches how the analysts query the data. The other options may improve storage or organization, but the accepted answer holds that they do not give Hive the same lookup efficiency when a query references a single IP address.
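The general idea of looking files up by IP address can be sketched in a few lines of Python. This is only a conceptual, in-memory stand-in: it does not use the EMRFS, DynamoDB, or Hive APIs, and the object keys, IP addresses, and function names (`build_index`, `files_for_ip`) are hypothetical values invented for illustration.

```python
from collections import defaultdict

# Hypothetical (ip_address, object_key) pairs, one per 5-minute chunk.
# These keys are made up for illustration; they are not a real bucket layout.
OBJECTS = [
    ("203.0.113.7", "clicks/chunk-0001/203.0.113.7.log"),
    ("198.51.100.4", "clicks/chunk-0001/198.51.100.4.log"),
    ("203.0.113.7", "clicks/chunk-0002/203.0.113.7.log"),
]


def build_index(objects):
    """Group object keys by IP address so one lookup finds every file for that IP."""
    index = defaultdict(list)
    for ip, key in objects:
        index[ip].append(key)
    return dict(index)


def files_for_ip(index, ip):
    """Return the object keys recorded for a single IP address (empty list if none)."""
    return index.get(ip, [])


index = build_index(OBJECTS)
matching = files_for_ip(index, "203.0.113.7")
print(matching)
```

With such an index, a query for one IP address reads only the listed files; without it, every chunk would have to be inspected to find the matching events.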

AWS Certified Big Data – Specialty — Question 13

Explanation