Google Cloud Professional Cloud Architect — Question 162

Your company is designing its data lake on Google Cloud and wants to develop different ingestion pipelines to collect unstructured data from different sources.
After the data is stored in Google Cloud, it will be processed in several data pipelines to build a recommendation engine for end users on the website. The structure of the data retrieved from the source systems can change at any time. The data must be stored exactly as it was retrieved for reprocessing purposes in case the data structure is incompatible with the current processing pipelines. You need to design an architecture to support the use case after you retrieve the data. What should you do?

Answer options

Correct answer: D

Explanation

The correct answer is D: storing the unstructured data in a Cloud Storage bucket preserves the raw data exactly as it was received, regardless of how the source structure changes. Cloud Storage treats each object as opaque bytes and enforces no schema, so the original data remains available for reprocessing if the current pipelines cannot handle a new format. Options A and C process the data before storing it, which risks losing the original format, while option B stores the data in BigQuery, which is not suited to storing raw unstructured data.
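
The landing step behind answer D might be sketched as follows. This is a minimal illustration, not part of the original question: the helper names raw_object_name and land_raw, the raw/<source>/<date>/ object layout, and the .bin suffix are all assumptions, and the upload assumes the google-cloud-storage client library is installed with default credentials available.

```python
import hashlib
from datetime import datetime, timezone


def raw_object_name(source: str, payload: bytes, received_at: datetime) -> str:
    """Build an object name from metadata only, never from the payload's structure.

    The layout raw/<source>/<yyyy>/<mm>/<dd>/<time>-<hash>.bin is a hypothetical
    convention chosen for this sketch.
    """
    digest = hashlib.sha256(payload).hexdigest()[:16]
    return f"raw/{source}/{received_at:%Y/%m/%d}/{received_at:%H%M%S}-{digest}.bin"


def land_raw(bucket_name: str, source: str, payload: bytes) -> str:
    """Upload the payload byte-for-byte; nothing is parsed or transformed here."""
    # Imported lazily so the naming helper works without the client library.
    from google.cloud import storage  # assumes: pip install google-cloud-storage

    name = raw_object_name(source, payload, datetime.now(timezone.utc))
    blob = storage.Client().bucket(bucket_name).blob(name)
    # A generic content type keeps the stored object independent of any schema.
    blob.upload_from_string(payload, content_type="application/octet-stream")
    return name
```

Because the payload is never parsed, a change in the source structure cannot break this step. Downstream pipelines read from the bucket and can be rerun against the untouched originals whenever a new format appears.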