Google Cloud Professional Machine Learning Engineer — Question 98

You work for a retailer that sells clothes to customers around the world. You have been tasked with ensuring that ML models are built in a secure manner. Specifically, you need to protect sensitive customer data that might be used in the models. You have identified four fields containing sensitive data that are being used by your data science team: AGE, IS_EXISTING_CUSTOMER, LATITUDE_LONGITUDE, and SHIRT_SIZE. What should you do with the data before it is made available to the data science team for training purposes?

Answer options

Correct answer: A

Explanation

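The correct answer is A. Tokenizing the fields replaces each real value with a hashed dummy value (a consistent surrogate token). The data science team never sees the raw sensitive data, but identical inputs still map to identical tokens, so the fields remain usable as model features. Option B is incorrect because PCA is a dimensionality-reduction technique, not a security control: it compresses the four fields into a combination of the original values but does not de-identify them. Option C is incorrect because coarsening only reduces precision. The generalized values (for example, an age bucket or a rounded location) are still real customer attributes, so they do not adequately protect the sensitive information. Option D is incorrect because removing the fields entirely discards potentially predictive features, which could hurt model performance.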
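A minimal sketch of the tokenization in option A follows, assuming records arrive as Python dicts and that a keyed hash (HMAC-SHA-256) is an acceptable choice of "hashed dummy value." The key, the `tok_` prefix, and the function names are hypothetical illustrations. On Google Cloud, a managed service such as Sensitive Data Protection (Cloud DLP) would normally perform this kind of de-identification, so treat this as an illustration of the idea rather than a production pipeline.

```python
import hashlib
import hmac

# Hypothetical secret key for illustration only. In practice it would be held
# in a secrets manager or KMS and never stored alongside the tokenized data.
SECRET_KEY = b"replace-with-a-managed-secret"

# The four sensitive fields named in the question.
SENSITIVE_FIELDS = ("AGE", "IS_EXISTING_CUSTOMER", "LATITUDE_LONGITUDE", "SHIRT_SIZE")


def tokenize_value(field, value, key=SECRET_KEY):
    """Return a deterministic surrogate token for one field value.

    The field name is mixed into the HMAC message so that equal raw values in
    different fields (e.g. the same number in two columns) get different tokens.
    """
    message = f"{field}:{value}".encode("utf-8")
    digest = hmac.new(key, message, hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def tokenize_record(record, fields=SENSITIVE_FIELDS, key=SECRET_KEY):
    """Return a copy of record with each sensitive field replaced by its token.

    Non-sensitive fields are copied unchanged; the input dict is not modified.
    """
    out = dict(record)
    for field in fields:
        if field in out:
            out[field] = tokenize_value(field, out[field], key)
    return out


if __name__ == "__main__":
    rows = [
        {"AGE": 34, "IS_EXISTING_CUSTOMER": True,
         "LATITUDE_LONGITUDE": "48.8566,2.3522", "SHIRT_SIZE": "M",
         "PURCHASE_AMOUNT": 59.9},
        {"AGE": 34, "IS_EXISTING_CUSTOMER": False,
         "LATITUDE_LONGITUDE": "40.7128,-74.0060", "SHIRT_SIZE": "L",
         "PURCHASE_AMOUNT": 120.0},
    ]
    for row in rows:
        # Both rows get the same AGE token because the raw ages are equal.
        print(tokenize_record(row))
```

The key matters because fields like SHIRT_SIZE and IS_EXISTING_CUSTOMER have only a handful of possible values. With an unkeyed hash, anyone could hash every candidate value and reverse the tokens with a lookup table. With HMAC, that attack requires the secret key.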