IAPP Artificial Intelligence Governance Professional (AIGP) — Question 22
You are the chief privacy officer of a medical research company that would like to collect and use sensitive data about cancer patients, such as their names, addresses, race and ethnic origin, medical histories, insurance claims, pharmaceutical prescriptions, eating and drinking habits and physical activity.
The company will use this sensitive data to build an AI algorithm that will spot common attributes that help predict whether seemingly healthy people are more likely to get cancer. However, the company is unable to obtain consent from enough patients to collect the minimum amount of data needed to train its model.
Which of the following solutions would most efficiently balance privacy concerns with the lack of available data during the testing phase?
Answer options
- A. Deploy the current model and recalibrate it over time with more data.
- B. Extend the model to multimodal ingestion with text and images.
- C. Utilize synthetic data to offset the lack of patient data.
- D. Refocus the algorithm to patients without cancer.
Correct answer: C
Explanation
Synthetic data lets the company build a dataset that mimics the statistical properties of real patient information without exposing actual patient records, which matters here because too few patients have consented. Options A and B do not address the underlying problems: deploying the current model and recalibrating it later (A) relies on a model trained on insufficient data, and extending it to multimodal inputs (B) changes the input types without resolving either the data shortage or the privacy concerns. Refocusing on patients without cancer (D) would abandon the original objective of predicting cancer risk.
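
As a rough illustration of what option C can look like in practice, the Python sketch below fits simple per-field statistics to a handful of seed records and then samples new, artificial records from them. Everything here is an assumption made for illustration: the field names, the made-up seed values, and the helper functions `fit_marginals` and `sample_synthetic` are hypothetical and do not come from the question. It also samples each field independently, which is a deliberate simplification rather than a recommended method.

```python
import random
import statistics

# Hypothetical, made-up seed records standing in for the few patients who consented.
# Direct identifiers (names, addresses) are deliberately left out.
SEED_RECORDS = [
    {"age": 54, "weekly_exercise_hours": 2.0, "smoker": True},
    {"age": 61, "weekly_exercise_hours": 0.5, "smoker": True},
    {"age": 47, "weekly_exercise_hours": 4.0, "smoker": False},
    {"age": 68, "weekly_exercise_hours": 1.0, "smoker": False},
    {"age": 59, "weekly_exercise_hours": 3.0, "smoker": True},
]


def fit_marginals(records):
    """Summarize each field on its own (mean/stdev or a simple rate)."""
    ages = [r["age"] for r in records]
    hours = [r["weekly_exercise_hours"] for r in records]
    return {
        "age": (statistics.mean(ages), statistics.stdev(ages)),
        "hours": (statistics.mean(hours), statistics.stdev(hours)),
        "smoker_rate": sum(r["smoker"] for r in records) / len(records),
    }


def sample_synthetic(marginals, n, seed=0):
    """Draw n artificial records from the fitted per-field summaries."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    age_mu, age_sigma = marginals["age"]
    hours_mu, hours_sigma = marginals["hours"]
    rows = []
    for _ in range(n):
        rows.append({
            # Clamp to plausible ranges, since a Gaussian can produce outliers.
            "age": max(18, round(rng.gauss(age_mu, age_sigma))),
            "weekly_exercise_hours": max(0.0, round(rng.gauss(hours_mu, hours_sigma), 1)),
            "smoker": rng.random() < marginals["smoker_rate"],
        })
    return rows


synthetic = sample_synthetic(fit_marginals(SEED_RECORDS), n=1000)
print(len(synthetic), "synthetic records generated")
```

Because each field is sampled independently, this sketch discards the correlations between attributes that a cancer-risk model would need to learn, and it offers no formal privacy guarantee. A real project would likely use a purpose-built generative approach and assess re-identification risk before relying on the output.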