Google Cloud Professional Data Engineer — Question 6
You want to use a database of information about tissue samples to classify future tissue samples as either normal or mutated. You are evaluating an unsupervised anomaly detection method for classifying the tissue samples. Which two characteristics support this method? (Choose two.)
Answer options
- A. There are very few occurrences of mutations relative to normal samples.
- B. There are roughly equal occurrences of both normal and mutated samples in the database.
- C. You expect future mutations to have different features from the mutated samples in the database.
- D. You expect future mutations to have similar features to the mutated samples in the database.
- E. You already have labels for which samples are mutated and which are normal in the database.
Correct answer: A, C
Explanation
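A and C are correct. Unsupervised anomaly detection works well when the data is heavily imbalanced, as when mutations are rare compared with normal samples: the method learns what "normal" looks like from the bulk of the data and flags samples that deviate from it. It also suits the case where future mutations are expected to differ from the mutated samples already in the database, because it does not rely on the features of known mutations. B is incorrect because if normal and mutated samples occur in roughly equal numbers, mutations are not rare and so are not anomalies. D is incorrect because if future mutations resemble the known ones, a supervised classifier trained on those examples is the better fit. E is incorrect because existing labels point to a supervised approach rather than an unsupervised one.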
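To make the reasoning concrete, here is a minimal sketch that is not part of the original question. It uses a simple per-feature z-score detector as a stand-in for an unsupervised anomaly detection method. The two-feature data, the function names (`fit_normal_profile`, `is_anomaly`, `learned_rule`), and the 3-standard-deviation threshold are all hypothetical. The detector is fitted on unlabeled data in which mutations are rare (option A), and it still flags a new kind of mutation whose features differ from the known mutations (option C). A rule learned only from the known mutations misses that sample.

```python
import statistics


def fit_normal_profile(samples):
    """Estimate per-feature mean and population standard deviation.

    No labels are used. Because mutations are rare (option A), these
    statistics are dominated by normal samples.
    """
    columns = list(zip(*samples))
    means = [statistics.fmean(col) for col in columns]
    stdevs = [statistics.pstdev(col) for col in columns]
    return means, stdevs


def is_anomaly(sample, means, stdevs, threshold=3.0):
    """Flag a sample if any feature lies more than `threshold`
    standard deviations from that feature's mean."""
    return any(
        abs(x - m) / s > threshold
        for x, m, s in zip(sample, means, stdevs)
        if s > 0  # skip constant features to avoid division by zero
    )


def learned_rule(sample):
    """Hypothetical rule a supervised model might learn from the known
    mutations, which all have a high value on feature 0."""
    return sample[0] > 3.0


# Hypothetical measurements: 98 normal samples ...
normals = [(1.0 + 0.1 * (i % 5), 2.0 + 0.1 * (i % 3)) for i in range(98)]
# ... and 2 rare mutations that differ from normal on feature 0.
known_mutations = [(5.0, 2.0), (5.2, 2.1)]
database = normals + known_mutations  # unlabeled mix, 2% mutated

means, stdevs = fit_normal_profile(database)

future_normal = (1.1, 2.1)
# A new kind of mutation that differs on feature 1 instead (option C).
novel_mutation = (1.2, 8.0)

print(is_anomaly(future_normal, means, stdevs))   # False
print(is_anomaly(novel_mutation, means, stdevs))  # True
print(learned_rule(novel_mutation))               # False
```

A z-score detector is about the simplest unsupervised choice. In practice, methods such as isolation forests or autoencoders play the same role, and the threshold would be tuned on held-out data.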