Question

You work for an online retail company. Your company collects customer purchase data in CSV files and pushes them to Cloud Storage every 10 minutes. The data needs to be transformed and loaded into BigQuery for analysis. The transformation involves cleaning the data, removing duplicates, and enriching it with product information from a separate table in BigQuery. You need to implement a low-overhead solution that initiates data processing as soon as the files are loaded into Cloud Storage. What should you do?

Accepted Answer

Correct answer: C. Use Dataflow to implement a streaming pipeline using an OBJECT_FINALIZE notification from Pub/Sub to read the data from Cloud Storage, perform the transformations, and write the data to BigQuery.

C is correct because Cloud Storage can publish an OBJECT_FINALIZE notification to Pub/Sub each time a file upload completes, and a streaming Dataflow pipeline subscribed to those notifications starts processing each file as soon as it lands. Dataflow is fully managed, so the cleaning, deduplication, and enrichment steps run in a single pipeline with no clusters to operate, which meets the low-overhead requirement. Option A adds the overhead of running Dataproc and Cloud Composer and is less suited to immediate processing. Option B schedules a batch load, so new data is not available until the next scheduled run. Option D adds unnecessary complexity by combining Cloud Data Fusion and Cloud Run.
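As a rough illustration of option C, the sketch below shows one way such a pipeline might look with the Apache Beam Python SDK. It is a minimal sketch under stated assumptions, not a reference implementation: the column names (`order_id`, `product_id`, `product_name`), the helper functions, and the `run` parameters are all hypothetical, and the notification is assumed to use the JSON payload format with an `eventType` attribute. Duplicates are only removed within a single file here; deduplicating across files would need windowing or state that this sketch leaves out.

```python
import csv
import io
import json


def is_object_finalize(attributes: dict) -> bool:
    """Keep only notifications sent when an object upload has completed."""
    return attributes.get("eventType") == "OBJECT_FINALIZE"


def gcs_uri_from_notification(data: bytes) -> str:
    """Build a gs:// URI from a JSON notification payload (bucket + name)."""
    meta = json.loads(data.decode("utf-8"))
    return f"gs://{meta['bucket']}/{meta['name']}"


def clean_rows(csv_text: str) -> list:
    """Trim whitespace, drop rows with no order_id, drop in-file duplicates."""
    seen = set()
    rows = []
    for raw in csv.DictReader(io.StringIO(csv_text)):
        row = {k.strip(): (v or "").strip() for k, v in raw.items() if k is not None}
        key = row.get("order_id")  # hypothetical unique key column
        if not key or key in seen:
            continue
        seen.add(key)
        rows.append(row)
    return rows


def enrich(row: dict, product_names: dict) -> dict:
    """Attach the product name looked up by product_id (None if unknown)."""
    return {**row, "product_name": product_names.get(row.get("product_id"))}


def run(subscription, product_table, output_table, pipeline_args=None):
    """Assemble the streaming pipeline; all three resource names are assumptions."""
    # Imported lazily so the helpers above can be tried without Beam installed.
    import apache_beam as beam
    from apache_beam.io.filesystems import FileSystems
    from apache_beam.options.pipeline_options import PipelineOptions

    def read_text(uri):
        with FileSystems.open(uri) as f:
            return f.read().decode("utf-8")

    options = PipelineOptions(pipeline_args, streaming=True)
    with beam.Pipeline(options=options) as p:
        # Product lookup table, passed to the enrichment step as a side input.
        products = (
            p
            | "ReadProducts" >> beam.io.ReadFromBigQuery(table=product_table)
            | "ToKV" >> beam.Map(lambda r: (r["product_id"], r["product_name"]))
        )
        (
            p
            | "Notifications" >> beam.io.ReadFromPubSub(
                subscription=subscription, with_attributes=True)
            | "OnlyFinalize" >> beam.Filter(lambda m: is_object_finalize(m.attributes))
            | "ToUri" >> beam.Map(lambda m: gcs_uri_from_notification(m.data))
            | "ReadFile" >> beam.Map(read_text)
            | "Clean" >> beam.FlatMap(clean_rows)
            | "Enrich" >> beam.Map(enrich, product_names=beam.pvalue.AsDict(products))
            | "Write" >> beam.io.WriteToBigQuery(
                output_table,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND)
        )
```

The helper functions are plain Python, so they can be checked locally before the pipeline is deployed. The `run` function would additionally need the usual Dataflow options (project, region, temp location) supplied through `pipeline_args`.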

Google Cloud Associate Data Practitioner — Question 21
