ISACA Certified Artificial Intelligence Auditor (CAIA) — Question 29
Which of the following strategies used by modelers to enhance data accuracy has the GREATEST risk of bias and information loss?
Answer options
- A. Filling blank attributes in records with the mean, median, or mode within a grouping
- B. Placing numerical data into bins or buckets for a manageable quantity of correlations and result analyses
- C. Separating multiple data attributes within one field into individual attribute columns
- D. Identifying and deleting duplicate entries in the data set
Correct answer: A
Explanation
Option A is correct because filling missing values with a group mean, median, or mode substitutes estimated values for unknown ones. This pulls records toward the center of the group, understates the variance, and can distort relationships between attributes, all of which introduce bias. The other options carry their own risks, but they reorganize or consolidate values that were actually observed rather than inventing new ones, which generally makes them less risky in terms of bias and information loss.
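The distortion described in the explanation can be illustrated with a minimal sketch. The sample values, the number of missing records, and the helper name `impute_mean` are all hypothetical and chosen only for illustration; the sketch uses Python's standard `statistics` module and assumes simple mean imputation within a single group.

```python
import statistics

# Hypothetical observed values for one group (illustrative only).
observed = [10.0, 12.0, 14.0, 30.0, 34.0]
# Hypothetical count of records whose attribute is blank.
n_missing = 3


def impute_mean(values, n_missing):
    """Return the values plus n_missing blanks filled with the group mean."""
    fill = statistics.mean(values)
    return values + [fill] * n_missing


imputed = impute_mean(observed, n_missing)

mean_before = statistics.mean(observed)
mean_after = statistics.mean(imputed)
var_before = statistics.pvariance(observed)
var_after = statistics.pvariance(imputed)

# The mean is unchanged, but the spread shrinks because every filled
# record sits exactly at the center of the group.
print(f"mean before: {mean_before:.2f}, after: {mean_after:.2f}")
print(f"variance before: {var_before:.2f}, after: {var_after:.2f}")
```

In this sketch the mean is preserved while the population variance drops, which is one way imputed data can look more certain than the underlying observations justify. Median or mode imputation would shift the numbers differently, but the same concern about artificially concentrated values applies.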