Google Cloud Professional Machine Learning Engineer — Question 224
You work for a telecommunications company. You’re building a model to predict which customers may fail to pay their next phone bill. The purpose of this model is to proactively offer at-risk customers assistance such as service discounts and bill deadline extensions. The data is stored in BigQuery and the predictive features that are available for model training include:
- Customer_id
- Age
- Salary (measured in local currency)
- Sex
- Average bill value (measured in local currency)
- Number of phone calls in the last month (integer)
- Average duration of phone calls (measured in minutes)
You need to investigate and mitigate potential bias against disadvantaged groups, while preserving model accuracy.
What should you do?
Answer options
- A. Determine whether there is a meaningful correlation between the sensitive features and the other features. Train a BigQuery ML boosted trees classification model and exclude the sensitive features and any meaningfully correlated features.
- B. Train a BigQuery ML boosted trees classification model with all features. Use the ML.GLOBAL_EXPLAIN method to calculate the global attribution values for each feature of the model. If the feature importance value for any of the sensitive features exceeds a threshold, discard the model and train without this feature.
- C. Train a BigQuery ML boosted trees classification model with all features. Use the ML.EXPLAIN_PREDICT method to calculate the attribution values for each feature for each customer in a test set. If for any individual customer, the importance value for any feature exceeds a predefined threshold, discard the model and train the model again without this feature.
- D. Define a fairness metric that is represented by accuracy across the sensitive features. Train a BigQuery ML boosted trees classification model with all features. Use the trained model to make predictions on a test set. Join the data back with the sensitive features, and calculate a fairness metric to investigate whether it meets your requirements.
Correct answer: D
Explanation
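Option D is correct because it measures bias directly. It defines a fairness metric (accuracy compared across the groups defined by the sensitive features), trains on all features so that accuracy is preserved, and then evaluates the predictions on a test set group by group. Option A drops the sensitive features and everything meaningfully correlated with them, which costs accuracy and still does not show whether the model treats groups differently. Options B and C rely on feature attributions, which indicate how much the model depends on a feature but not whether its predictions are less accurate for one group than for another. Option C also triggers on any feature for any single customer, not only on the sensitive features.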
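The evaluation step in option D can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows stand in for a test set that has already been scored and joined back to the sensitive feature, the column names `label` and `predicted_label`, the helper functions, and the `MAX_GAP` threshold are all hypothetical, and the accuracy gap between groups is just one possible fairness metric.

```python
from collections import defaultdict


def accuracy_by_group(rows, group_key, label_key="label", pred_key="predicted_label"):
    """Return the accuracy of the predictions within each group of `group_key`."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for row in rows:
        group = row[group_key]
        total[group] += 1
        if row[label_key] == row[pred_key]:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}


def accuracy_gap(per_group):
    """Fairness metric (assumed here): best group accuracy minus worst group accuracy."""
    return max(per_group.values()) - min(per_group.values())


# Hypothetical test-set predictions joined back with the sensitive feature "sex".
rows = [
    {"sex": "F", "label": 1, "predicted_label": 1},
    {"sex": "F", "label": 0, "predicted_label": 1},
    {"sex": "F", "label": 0, "predicted_label": 0},
    {"sex": "F", "label": 1, "predicted_label": 1},
    {"sex": "M", "label": 1, "predicted_label": 1},
    {"sex": "M", "label": 0, "predicted_label": 0},
    {"sex": "M", "label": 0, "predicted_label": 0},
    {"sex": "M", "label": 1, "predicted_label": 1},
]

MAX_GAP = 0.05  # hypothetical requirement, chosen only for illustration

per_group = accuracy_by_group(rows, "sex")
gap = accuracy_gap(per_group)
meets_requirement = gap <= MAX_GAP

print(per_group, gap, meets_requirement)
```

In this toy data the model is less accurate for one group than the other, so the gap exceeds the assumed threshold and the model would need further investigation. The same check could be repeated for other sensitive features, such as age bands.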