A city has been collecting data on its public bicycle share program for the past three ye…

Question

A city has been collecting data on its public bicycle share program for the past three years. The 5 PB dataset currently resides on Amazon S3. The data contains the following data points:
✑ Bicycle origination points
✑ Bicycle destination points
✑ Mileage between the points
✑ Number of bicycle slots available at the station (which is variable based on the station location)
✑ Number of slots available and taken at a given time
The program has received additional funds to increase the number of bicycle stations available. All data is regularly archived to Amazon Glacier.
The new bicycle stations must be located to provide the most riders access to bicycles.
How should this task be performed?

Accepted Answer

Correct answer: B. Use the Amazon Redshift COPY command to move the data from Amazon S3 into Redshift and perform a SQL query that outputs the most popular bicycle stations.

Redshift is built for SQL queries over large datasets, so once the data is loaded with COPY, an aggregate query can rank stations by usage. Option A is rejected because moving the data to EBS and processing it with Hadoop is not the most efficient approach for this task. Option C is rejected because transferring the data into Kinesis does not analyze station popularity. Option D uses EMR, but it focuses on an optimization job rather than identifying popular stations directly with SQL.
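The load-then-query approach in option B can be sketched roughly as follows. This is an illustrative sketch only: the table name `trips`, its columns, the bucket path, and the IAM role ARN are all hypothetical placeholders, since the question does not give a schema. The COPY statement is shown as a string and is not executed. The ranking query is run against an in-memory SQLite table so the example is runnable without a cluster; on Redshift the same query would run against the loaded table.

```python
import sqlite3

# Hypothetical Redshift load statement (not executed here).
# The bucket, table name, and IAM role are placeholders, not values from the question.
COPY_SQL = """
COPY trips
FROM 's3://example-bucket/bike-share/trips/'
IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleRedshiftRole'
FORMAT AS CSV;
"""

# Rank stations by how many trips start or end there.
# Column names are assumed; the real dataset may label them differently.
POPULAR_STATIONS_SQL = """
SELECT station, COUNT(*) AS trip_count
FROM (
    SELECT origin_station AS station FROM trips
    UNION ALL
    SELECT destination_station AS station FROM trips
) AS endpoints
GROUP BY station
ORDER BY trip_count DESC, station ASC
LIMIT 3;
"""


def most_popular_stations(conn):
    """Run the ranking query and return (station, trip_count) rows."""
    return conn.execute(POPULAR_STATIONS_SQL).fetchall()


def demo():
    """Build a tiny stand-in table in SQLite and rank its stations."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trips ("
        "origin_station TEXT, destination_station TEXT, miles REAL)"
    )
    conn.executemany(
        "INSERT INTO trips VALUES (?, ?, ?)",
        [
            ("A", "B", 1.2),
            ("A", "C", 2.0),
            ("B", "A", 1.2),
            ("C", "A", 2.0),
            ("B", "C", 0.9),
        ],
    )
    return most_popular_stations(conn)
```

Calling `demo()` returns the three busiest stations in the toy data, with ties broken alphabetically. Counting both origins and destinations is one reasonable definition of "popular"; counting only origins, or weighting by slot availability, would be equally defensible variations.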

AWS Certified Big Data – Specialty — Question 29
