An external customer provides you with a daily dump of data from their database. The data…

Question

An external customer provides you with a daily dump of data from their database. The data flows into Google Cloud Storage (GCS) as comma-separated values (CSV) files. You want to analyze this data in Google BigQuery, but the data could have rows that are formatted incorrectly or corrupted. How should you build this pipeline?

Accepted Answer

Correct answer: D. Run a Google Cloud Dataflow batch pipeline to import the data into BigQuery, and push errors to another dead-letter table for analysis.

A Dataflow batch pipeline can validate each row as it imports the data into BigQuery and route any row that fails to a separate dead-letter table. Valid rows still load, and the malformed or corrupted rows are kept for later review and correction instead of being dropped or failing the whole import. Options A and C do not provide comprehensive error handling, while option B focuses on monitoring rather than data ingestion and error management.
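The dead-letter idea can be illustrated with a minimal sketch of the per-row routing logic. It uses only the Python standard library and is not a Dataflow pipeline. The schema (`id`, `name`, `amount`), the function names `route_line` and `split_rows`, and the sample rows are all hypothetical and do not come from the question. In an Apache Beam pipeline the same decision would typically sit inside a `DoFn` that emits failed rows to a tagged side output, with each output written to its own BigQuery table.

```python
import csv
import io

# Hypothetical schema (an assumption for illustration): column name -> converter.
SCHEMA = [("id", int), ("name", str), ("amount", float)]


def route_line(line):
    """Return ("valid", row) or ("dead_letter", error_record) for one CSV line."""
    try:
        fields = next(csv.reader(io.StringIO(line)))
        if len(fields) != len(SCHEMA):
            raise ValueError(f"expected {len(SCHEMA)} fields, got {len(fields)}")
        # Converting each field raises ValueError when a value has the wrong type.
        row = {name: convert(value) for (name, convert), value in zip(SCHEMA, fields)}
        return "valid", row
    except (ValueError, StopIteration, csv.Error) as exc:
        # Keep the raw line and the reason so the bad row can be analyzed later.
        return "dead_letter", {"raw_line": line, "error": str(exc)}


def split_rows(lines):
    """Split lines into rows for the main table and records for the dead-letter table."""
    valid, dead_letter = [], []
    for line in lines:
        tag, payload = route_line(line)
        (valid if tag == "valid" else dead_letter).append(payload)
    return valid, dead_letter


# Made-up sample input: one good row, one bad value, one missing column.
sample = ["1,alice,10.5", "2,bob,not-a-number", "3,carol"]
valid, dead_letter = split_rows(sample)
```

Here `valid` holds the one well-formed row and `dead_letter` holds the two rejected lines together with the reason each was rejected, which is the information a dead-letter table is meant to preserve.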

Google Cloud Professional Data Engineer — Question 20

Answer options


Explanation