Google Cloud Associate Data Practitioner — Question 26

You are working on a data pipeline that will validate and clean incoming data before loading it into BigQuery for real-time analysis. You want to ensure that validation and cleaning are performed efficiently and that the pipeline can handle high volumes of data. What should you do?

Answer options

Correct answer: C

Explanation

The correct answer is C. Dataflow is a managed service built for processing large volumes of data, and it can validate and transform records in a streaming pipeline before they are written to BigQuery. Option A relies on external processing, which is likely to be less efficient at high volume. Option B is a workable choice but does not use the full capabilities of a dedicated data processing service such as Dataflow. Option D loads the raw data first and processes it inside BigQuery, which can lead to higher costs and slower performance.
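
As a rough illustration of the approach in the explanation, the sketch below shows what per-record validation and cleaning might look like in an Apache Beam pipeline of the kind Dataflow runs. It is a minimal example under stated assumptions, not the exam's reference solution: the field names (event_id, user_id, amount), the Pub/Sub topic, and the BigQuery table are all hypothetical, and the source does not say where the incoming data comes from.

```python
import json
from typing import Optional

# Hypothetical schema: these field names are assumptions for illustration only.
REQUIRED_FIELDS = ("event_id", "user_id", "amount")


def clean_record(raw: bytes) -> Optional[dict]:
    """Validate and clean one incoming message.

    Returns a cleaned row ready for BigQuery, or None if the record is invalid.
    """
    try:
        record = json.loads(raw.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return None  # not valid UTF-8 JSON
    if not isinstance(record, dict):
        return None  # valid JSON, but not an object
    if any(record.get(field) in (None, "") for field in REQUIRED_FIELDS):
        return None  # a required field is missing or empty
    try:
        amount = float(record["amount"])
    except (TypeError, ValueError):
        return None  # amount is not numeric
    if amount < 0:
        return None  # negative amounts are treated as invalid here
    return {
        "event_id": str(record["event_id"]).strip(),
        "user_id": str(record["user_id"]).strip().lower(),
        "amount": round(amount, 2),
    }


def run(argv=None):
    """Wire clean_record into a streaming pipeline (resource names are placeholders)."""
    # Imported here so clean_record can be tested without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions

    options = PipelineOptions(argv, streaming=True)
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "Read" >> beam.io.ReadFromPubSub(
                topic="projects/my-project/topics/events")  # hypothetical topic
            | "Clean" >> beam.Map(clean_record)
            | "DropInvalid" >> beam.Filter(lambda row: row is not None)
            | "Write" >> beam.io.WriteToBigQuery(
                "my-project:analytics.events",  # hypothetical table
                schema="event_id:STRING,user_id:STRING,amount:FLOAT",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            )
        )
```

Keeping the validation logic in a plain function such as clean_record means it can be unit-tested on its own, while the Dataflow runner handles scaling the same function across workers. A production pipeline would more likely route invalid records to a dead-letter output than silently drop them.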