Databricks Certified Generative AI Engineer Associate — Question 62
When developing an LLM application, it’s crucial to ensure that the data used for training the model complies with licensing requirements to avoid legal risks.
Which action is most appropriate to avoid legal risks?
Answer options
- A. Only use data explicitly labeled with an open license and ensure the license terms are followed.
- B. Any LLM outputs are reasonable to use because they do not reveal the original sources of data directly.
- C. Reach out to the data curators directly to gain written consent for using their data.
- D. Use any publicly available data as public data does not have legal restrictions.
Correct answer: A
Explanation
Option A is correct: data that carries an explicit open license comes with stated terms of use, and following those terms (for example, attribution or share-alike requirements) keeps the training data compliant. Option B is incorrect because LLM outputs may still derive from copyrighted material, so the fact that they do not reveal their sources does not remove the legal risk. Option C is good practice, but it is not always feasible, and it is unnecessary when open-licensed data is available. Option D is incorrect because publicly available data can still be protected by copyright or subject to terms of use that must be followed.
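As an illustration of option A, a data pipeline might drop any record that lacks an explicitly allowed license before the data reaches training. The sketch below is a minimal example under stated assumptions: the `license` field name, the `ALLOWED_LICENSES` set, and the `filter_by_license` helper are all hypothetical and do not come from any Databricks API. The allowlist itself should reflect your own legal guidance.

```python
# Hypothetical allowlist of license identifiers (lowercased SPDX-style names).
# Which licenses belong here is an assumption; confirm with your legal team.
ALLOWED_LICENSES = {"cc0-1.0", "cc-by-4.0", "mit", "apache-2.0"}


def filter_by_license(records, allowed=ALLOWED_LICENSES):
    """Keep only records whose 'license' field is explicitly on the allowlist.

    Records with a missing or unknown license are dropped, because
    "publicly available" does not imply "free to use".
    """
    kept = []
    for record in records:
        # Normalize the label; treat a missing or None license as unlicensed.
        license_id = (record.get("license") or "").strip().lower()
        if license_id in allowed:
            kept.append(record)
    return kept


# Hypothetical sample records for illustration only.
sample = [
    {"text": "doc one", "license": "CC-BY-4.0"},
    {"text": "doc two", "license": None},
    {"text": "doc three", "license": "proprietary"},
    {"text": "doc four", "license": "MIT"},
]

training_records = filter_by_license(sample)
```

Filtering covers only the first half of option A. The license terms still have to be followed afterwards; CC-BY-4.0 data, for instance, requires attribution, so it helps to keep the license label attached to each retained record.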