Google Cloud Professional Data Engineer — Question 31

Your company has recently grown rapidly and now ingesting data at a significantly higher rate than it was previously. You manage the daily batch MapReduce analytics jobs in Apache Hadoop. However, the recent increase in data has meant the batch jobs are falling behind. You were asked to recommend ways the development team could increase the responsiveness of the analytics without increasing costs. What should you recommend they do?

Answer options

Correct answer: B

Explanation

The best solution is to rewrite the job in Apache Spark because it is designed for faster data processing compared to MapReduce, especially for iterative algorithms and real-time analytics. While rewriting in Pig or Hive may not yield the performance improvements needed, and increasing the cluster size would result in higher costs, Spark provides a way to maintain or even reduce costs while improving responsiveness.