Google Cloud Professional Data Engineer — Question 168

You want to schedule a number of sequential load and transformation jobs. Data files will be added to a Cloud Storage bucket by an upstream process. There is no fixed schedule for when the new data arrives. Next, a Dataproc job is triggered to perform some transformations and write the data to BigQuery. You then need to run additional transformation jobs in BigQuery. The transformation jobs are different for every table. These jobs might take hours to complete. You need to determine the most efficient and maintainable workflow to process hundreds of tables and provide the freshest data to your end users. What should you do?

Answer options

Correct answer: D

Explanation

Option D is correct because it creates a separate DAG for each table. Each table's distinct transformation jobs can then be defined and maintained independently, and a long-running job for one table does not hold up processing of the others. It also uses a Cloud Storage trigger to start processing as soon as new data arrives, rather than waiting for a fixed schedule, which keeps the data delivered to end users as fresh as possible. Options A and B are less efficient because they rely on a single DAG, which does not handle different jobs per table well. Option C lacks the ability to run a separate DAG for each table.
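The per-table, event-driven design described in the explanation can be sketched in plain Python. This is a minimal, framework-free illustration under stated assumptions. The table names, transformation step names, the `pipeline_<table>` DAG ID format, and the convention that upstream files land under a `<table>/` prefix in the bucket are all hypothetical. The sketch does not call the Airflow or Cloud Functions APIs. It only shows how one sequential DAG definition per table (Dataproc step first, then that table's own BigQuery steps) can be generated from configuration, and how an incoming Cloud Storage object event, which carries the object's `bucket` and `name`, could be routed to the matching DAG.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical per-table configuration: every table has its own list of
# BigQuery transformation steps, so each table gets its own DAG.
TABLE_TRANSFORMS: dict[str, list[str]] = {
    "orders": ["dedupe_orders", "build_daily_revenue"],
    "customers": ["normalize_addresses"],
}


@dataclass
class PipelineDag:
    """Simplified stand-in for an Airflow DAG: an ID plus ordered task names."""
    dag_id: str
    tasks: list[str] = field(default_factory=list)


def build_dag(table: str, transforms: list[str]) -> PipelineDag:
    """Build the sequential tasks for one table: Dataproc load, then BigQuery steps."""
    tasks = [f"dataproc_load_{table}"]
    tasks += [f"bq_{step}" for step in transforms]
    return PipelineDag(dag_id=f"pipeline_{table}", tasks=tasks)


# One DAG per table, generated from configuration rather than written by hand.
DAGS: dict[str, PipelineDag] = {
    table: build_dag(table, steps) for table, steps in TABLE_TRANSFORMS.items()
}


def dag_for_event(event: dict) -> str | None:
    """Map a Cloud Storage object event to the DAG ID that should be triggered.

    Assumes (hypothetically) that object names look like '<table>/<file>'.
    Returns None when the object does not belong to a configured table.
    """
    table = event["name"].split("/", 1)[0]
    dag = DAGS.get(table)
    return dag.dag_id if dag else None


if __name__ == "__main__":
    event = {"bucket": "landing-bucket", "name": "orders/2024-01-01.csv"}
    print(dag_for_event(event))   # pipeline_orders
    print(DAGS["orders"].tasks)   # ['dataproc_load_orders', 'bq_dedupe_orders', 'bq_build_daily_revenue']
```

In a real deployment, the function receiving the Cloud Storage event would go on to trigger the selected DAG in Cloud Composer (for example, through Airflow's REST API); that call is deliberately left out. Generating the DAGs from a configuration map is one way to keep hundreds of per-table DAGs maintainable without duplicating pipeline code.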