AWS Certified Big Data – Specialty — Question 7

An Amazon EMR cluster using EMRFS has access to petabytes of data on Amazon S3, originating from multiple unique data sources. The customer needs to query common fields across some of the data sets to be able to perform interactive joins and then display results quickly.
Which technology is most appropriate to enable this capability?

Answer options

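Correct answer: Presto

Explanation

The correct answer is Presto. Presto is a distributed SQL query engine built for interactive, low-latency analytics. It runs on Amazon EMR, can query data in Amazon S3 directly, and can join data from multiple sources in a single query, which matches the requirement to join on common fields and display results quickly. Pig is a data-flow scripting language whose jobs run as batch work on engines such as MapReduce or Tez, so it suits ETL and large transformations rather than quick interactive queries. MicroStrategy is a business intelligence and visualization tool, and R Studio is an environment for statistical analysis in R; neither is a distributed engine for joining petabyte-scale data sets on the cluster.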
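
As a rough illustration of the kind of interactive join the question describes, the sketch below builds a SQL statement of the sort a SQL query engine such as Presto could run against tables defined over S3 data. The table names (`hive.sales.orders`, `hive.crm.customers`), the `customer_id` field, and the `build_join_query` helper are hypothetical and do not come from the question; the code only composes the query text and does not connect to a cluster.

```python
def build_join_query(left_table, right_table, join_key, columns, limit=100):
    """Return a SQL string that joins two tables on a shared field.

    Hypothetical helper for illustration only: it formats text and
    performs no validation or escaping of its arguments.
    """
    select_list = ", ".join(columns)
    return (
        f"SELECT {select_list}\n"
        f"FROM {left_table} AS l\n"
        f"JOIN {right_table} AS r ON l.{join_key} = r.{join_key}\n"
        f"LIMIT {limit}"
    )


# Assumed catalog.schema.table names for two data sets stored on S3.
query = build_join_query(
    left_table="hive.sales.orders",
    right_table="hive.crm.customers",
    join_key="customer_id",
    columns=["l.order_id", "r.customer_name", "l.order_total"],
)
print(query)
```

The `LIMIT` clause reflects the interactive use case: an analyst typically inspects a small result quickly before refining the query.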