Google Cloud Professional Machine Learning Engineer — Question 88
One of your models is trained using data provided by a third-party data broker. The data broker does not reliably notify you of formatting changes in the data. You want to make your model training pipeline more robust to issues like this. What should you do?
Answer options
- A. Use TensorFlow Data Validation to detect and flag schema anomalies.
- B. Use TensorFlow Transform to create a preprocessing component that will normalize data to the expected distribution, and replace values that don’t match the schema with 0.
- C. Use tf.math to analyze the data, compute summary statistics, and flag statistical anomalies.
- D. Use custom TensorFlow functions at the start of your model training to detect and flag known formatting errors.
Correct answer: A
Explanation
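The correct answer is A because TensorFlow Data Validation (TFDV) is built for this problem: it infers a schema from known-good data and validates each new batch against it, flagging anomalies such as missing or unexpected columns, changed types, or out-of-domain values before training starts. Option B does not detect anything: TensorFlow Transform is a preprocessing library, and replacing non-conforming values with 0 silently hides the format change and corrupts the training data. Option C targets statistical anomalies rather than schema or format changes, and tf.math would require you to build the validation logic by hand. Option D catches only the formatting errors you already know about, so an unannounced change from the data broker would go undetected.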
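To make the idea of a schema check concrete, here is a minimal sketch in plain Python. It is not TFDV and does not use its API: the helpers `infer_simple_schema` and `find_schema_anomalies` and the sample rows are hypothetical, invented only for illustration. In TFDV itself the comparable steps are, roughly, computing statistics, calling `tfdv.infer_schema` on the baseline statistics, and calling `tfdv.validate_statistics` on each new batch.

```python
# Stdlib-only illustration of schema validation. This is NOT the TFDV API;
# both helper functions and all sample data below are hypothetical.

def infer_simple_schema(rows):
    """Record the expected Python type of each column from trusted baseline rows."""
    schema = {}
    for row in rows:
        for name, value in row.items():
            schema.setdefault(name, type(value))
    return schema


def find_schema_anomalies(rows, schema):
    """Return a sorted list of anomaly messages for a batch checked against the schema."""
    anomalies = []
    for i, row in enumerate(rows):
        for name in schema.keys() - row.keys():
            anomalies.append(f"row {i}: missing column '{name}'")
        for name in row.keys() - schema.keys():
            anomalies.append(f"row {i}: unexpected column '{name}'")
        for name, value in row.items():
            expected = schema.get(name)
            if expected is not None and not isinstance(value, expected):
                anomalies.append(
                    f"row {i}: column '{name}' expected {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
    return sorted(anomalies)


# Baseline: data known to be in the agreed format.
baseline = [
    {"user_id": 1, "price": 9.99, "country": "US"},
    {"user_id": 2, "price": 4.50, "country": "DE"},
]
schema = infer_simple_schema(baseline)

# New delivery: the broker changed 'price' to a string and dropped 'country'.
incoming = [
    {"user_id": 3, "price": "7.25", "country": "FR"},
    {"user_id": 4, "price": 3.0},
]
anomalies = find_schema_anomalies(incoming, schema)

# Flag the problems before training instead of silently coercing values.
for message in anomalies:
    print(message)
```

The design point matches the reasoning for option A: the check is derived from data that is known to be good, so it can flag format changes nobody anticipated, and it reports them rather than overwriting the offending values.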