You are planning to load some of your existing on-premises data into BigQuery on Google C…

Question

You are planning to load some of your existing on-premises data into BigQuery on Google Cloud. You want to either stream or batch-load data, depending on your use case. Additionally, you want to mask some sensitive data before loading into BigQuery. You need to do this in a programmatic way while keeping costs to a minimum. What should you do?

Accepted Answer

Correct answer: C. Create your pipeline with Dataflow through the Apache Beam SDK for Python, customizing separate options within your code for streaming, batch processing, and Cloud DLP. Select BigQuery as your data sink.

C is correct because a single Dataflow pipeline written with Apache Beam can be configured in code to run in either streaming or batch mode, and it can call Cloud DLP to de-identify sensitive fields before the data is written to BigQuery. Option A, while effective, does not provide the same level of customization for processing modes. Option B delays de-identification until after the data is already in BigQuery, which fails the requirement to mask sensitive data before loading. Option D does not mask any data before loading, which is a key requirement.
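A minimal sketch of the kind of pipeline option C describes may make the idea more concrete. Everything here is illustrative and rests on assumptions not stated in the question: the on-premises data is assumed to arrive as JSON lines, either in files (batch) or through a Pub/Sub subscription (streaming); the `--mode`, `--input`, and `--output_table` flags are hypothetical names; and `mask_record` is a local regex stand-in for a real Cloud DLP de-identification call, which it does not replicate.

```python
import argparse
import json
import re

# ASSUMPTION: local stand-in for Cloud DLP. A real pipeline would call the
# DLP API at this step instead of using a regular expression.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def mask_record(record):
    """Return a copy of `record` with email addresses in string values masked."""
    return {
        key: EMAIL_RE.sub("[EMAIL_ADDRESS]", value) if isinstance(value, str) else value
        for key, value in record.items()
    }


def parse_args(argv=None):
    """Split the hypothetical job flags from flags meant for Beam itself."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--mode", choices=["batch", "streaming"], default="batch")
    parser.add_argument("--input", required=True)         # file pattern or subscription
    parser.add_argument("--output_table", required=True)  # e.g. project:dataset.table
    return parser.parse_known_args(argv)


def run(argv=None):
    # Imported here so the helpers above stay usable without Beam installed.
    import apache_beam as beam
    from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

    known, beam_args = parse_args(argv)
    options = PipelineOptions(beam_args)
    options.view_as(StandardOptions).streaming = known.mode == "streaming"

    with beam.Pipeline(options=options) as pipeline:
        if known.mode == "streaming":
            lines = pipeline | "ReadPubSub" >> beam.io.ReadFromPubSub(
                subscription=known.input
            )
        else:
            lines = pipeline | "ReadFiles" >> beam.io.ReadFromText(known.input)

        (
            lines
            | "ParseJson" >> beam.Map(json.loads)
            | "MaskSensitive" >> beam.Map(mask_record)  # masking happens before the sink
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                known.output_table,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
            )
        )
```

The point of the sketch is that the same code path serves both modes: only the source and the `streaming` option change, while masking always runs before the BigQuery write. Calling `run([...])` with the flags above plus the usual Beam and Dataflow options would launch the job.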

Google Cloud Professional Data Engineer — Question 170

Explanation