Google Cloud Professional Data Engineer — Question 72

Your company is running its first dynamic campaign, serving different offers by analyzing real-time data during the holiday season. The data scientists are collecting terabytes of data that grows rapidly every hour during their 30-day campaign. They are using Google Cloud Dataflow to preprocess the data and to collect the feature (signal) data needed for the machine learning model in Google Cloud Bigtable. The team is observing suboptimal read and write performance on their initial load of 10 TB of data. They want to improve this performance while minimizing cost. What should they do?

Answer options

Correct answer: A

Explanation

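Option A is correct because Bigtable stores rows sorted by row key and serves each contiguous key range (tablet) from a single node. A schema whose row keys spread reads and writes evenly across the key space therefore prevents hotspots and lets every node share the load. Option B is incorrect because increasing the cluster size adds cost, which the team wants to minimize, and does not necessarily resolve the problem: if traffic is concentrated on a narrow key range, that range is still served by one node. Option C is wrong because putting frequently updated values under a single row key sends all of that traffic to one row, and therefore one node, creating a bottleneck. Option D is wrong because it could distribute data unevenly across the key space, again concentrating load and degrading performance.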
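The hotspot argument can be illustrated with a small, simplified simulation. This is a sketch under stated assumptions, not Bigtable's actual behavior: the table is modeled as four tablets with fixed, hypothetical split points, whereas real Bigtable splits and rebalances tablets dynamically. It compares a timestamp-first key (every new write sorts to the end of the key space), a single shared row key (the anti-pattern described for option C), and a salted key, where a small hash-derived bucket prefix spreads writes across tablets. Salting is one common remedy. Field promotion (leading with a well-distributed identifier) is another. The function names, bucket count, and split points are illustrative only.

```python
import bisect
import hashlib
from collections import Counter

NUM_BUCKETS = 4  # hypothetical number of tablets serving the table


def timestamp_first_key(user_id: str, ts: int) -> str:
    """Anti-pattern: a monotonically increasing prefix sorts every new write to the end."""
    return f"{ts:012d}#{user_id}"


def single_row_key(user_id: str, ts: int) -> str:
    """Anti-pattern: every frequently updated value shares one row."""
    return "frequent_updates"


def salted_key(user_id: str, ts: int) -> str:
    """Salted key: a stable hash-derived bucket prefix, then user ID and timestamp."""
    bucket = hashlib.sha256(user_id.encode()).digest()[0] % NUM_BUCKETS
    return f"{bucket:02d}#{user_id}#{ts:012d}"


def tablet_for(key: str, split_points: list[str]) -> int:
    """Each tablet serves one contiguous range of lexicographically sorted row keys."""
    return bisect.bisect_right(split_points, key)


def writes_per_tablet(make_key, split_points, users, start_ts):
    """Count how many new writes land on each tablet under a given key scheme."""
    keys = [make_key(u, start_ts + i) for i, u in enumerate(users)]
    return Counter(tablet_for(k, split_points) for k in keys)


users = [f"user{n:04d}" for n in range(1000)]
new_ts = 1_000_000  # new writes arrive after all existing (older) data

# Hypothetical split points an existing table might have.
ts_splits = [f"{t:012d}" for t in (250_000, 500_000, 750_000)]
salt_splits = [f"{b:02d}" for b in range(1, NUM_BUCKETS)]

print("timestamp-first:", writes_per_tablet(timestamp_first_key, ts_splits, users, new_ts))
# -> timestamp-first: Counter({3: 1000})
print("single row key: ", writes_per_tablet(single_row_key, salt_splits, users, new_ts))
# -> single row key:  Counter({3: 1000})
print("salted:         ", writes_per_tablet(salted_key, salt_splits, users, new_ts))
# salted writes are spread roughly evenly across tablets 0 to 3
```

The trade-off with salting is that a range scan over one user's or one time window's data must be issued once per bucket, so the bucket count should stay small.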