Google Cloud Professional Data Engineer — Question 155
You currently use a SQL-based tool to visualize your data stored in BigQuery. The data visualizations require the use of outer joins and analytic functions. Visualizations must be based on data that is no less than 4 hours old. Business users are complaining that the visualizations are too slow to generate. You want to improve the performance of the visualization queries while minimizing the maintenance overhead of the data preparation pipeline. What should you do?
Answer options
- A. Create materialized views with the allow_non_incremental_definition option set to true for the visualization queries. Specify the max_staleness parameter to 4 hours and the enable_refresh parameter to true. Reference the materialized views in the data visualization tool.
- B. Create views for the visualization queries. Reference the views in the data visualization tool.
- C. Create a Cloud Function instance to export the visualization query results as Parquet files to a Cloud Storage bucket. Use Cloud Scheduler to trigger the Cloud Function every 4 hours. Reference the Parquet files in the data visualization tool.
- D. Create materialized views for the visualization queries. Use the incremental updates capability of BigQuery materialized views to handle changed data automatically. Reference the materialized views in the data visualization tool.
Correct answer: A
Explanation
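The correct answer is A. The visualization queries use outer joins and analytic functions, which fall outside the restricted SQL subset that incremental materialized views are built around. Setting allow_non_incremental_definition to true lets BigQuery materialize these queries anyway, and that option must be accompanied by max_staleness. Reading the requirement as a freshness bound, a max_staleness of 4 hours lets queries be served from precomputed results as long as the last refresh is within that window, and enable_refresh set to true lets BigQuery refresh the view automatically, so there is no separate pipeline to maintain. Option B does not help because a logical view stores no results; the full query, joins and analytic functions included, runs again on every visualization request. Option C could reduce query time, but it adds a Cloud Function, a Cloud Scheduler job, and exported files to maintain, which conflicts with minimizing maintenance overhead. Option D relies on incremental updates, which these query shapes are not expected to qualify for, and it specifies no staleness bound.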
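As a rough illustration of option A, the sketch below assembles the kind of DDL statement the answer describes. The helper name build_materialized_view_ddl and the project, dataset, table, and column names are hypothetical and are not part of the question. The code only builds the SQL text; actually running it against BigQuery would need a client or the console, which is not shown here.

```python
def build_materialized_view_ddl(view_name: str, query: str,
                                max_staleness_hours: int = 4) -> str:
    """Return CREATE MATERIALIZED VIEW DDL for a non-incremental view.

    Hypothetical helper for illustration only; it formats a string and
    does not contact BigQuery.
    """
    if max_staleness_hours <= 0:
        raise ValueError("max_staleness_hours must be positive")
    return (
        f"CREATE MATERIALIZED VIEW `{view_name}`\n"
        "OPTIONS (\n"
        "  enable_refresh = true,\n"
        f'  max_staleness = INTERVAL "{max_staleness_hours}:0:0" HOUR TO SECOND,\n'
        "  allow_non_incremental_definition = true\n"
        ")\n"
        "AS\n"
        f"{query.strip()}"
    )


# Assumed example query: an outer join plus an analytic function, the two
# features the question says the visualizations need.
VISUALIZATION_QUERY = """
SELECT
  c.customer_id,
  c.region,
  o.order_total,
  RANK() OVER (PARTITION BY c.region ORDER BY o.order_total DESC) AS region_rank
FROM `my_project.sales.customers` AS c
LEFT OUTER JOIN `my_project.sales.orders` AS o
  ON o.customer_id = c.customer_id
"""

ddl = build_materialized_view_ddl(
    "my_project.reporting.customer_order_rank_mv",  # hypothetical view name
    VISUALIZATION_QUERY,
)
print(ddl)
```

The visualization tool would then query the materialized view by name instead of repeating the join and window logic on every request.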