Databricks Certified Generative AI Engineer Associate — Question 9
A Generative AI Engineer is creating an LLM-based application. The documents for its retriever have been chunked to a maximum of 512 tokens each. The Generative AI Engineer knows that cost and latency are more important than quality for this application. They have several context length levels to choose from.
Which will fulfill their need?
Answer options
- A. context length 514; smallest model is 0.44GB and embedding dimension 768
- B. context length 2048; smallest model is 11GB and embedding dimension 2560
- C. context length 32768; smallest model is 14GB and embedding dimension 4096
- D. context length 512; smallest model is 0.13GB and embedding dimension 384
Correct answer: D
Explanation
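The correct answer is D. Its context length of 512 tokens matches the maximum chunk size, so every chunk can be embedded without truncation. It also has the smallest model (0.13GB) and the smallest embedding dimension (384), which keeps inference cost, latency, and vector storage low. Option A would also fit the chunks (514 tokens), but its model is more than three times larger (0.44GB) and its embeddings are twice as wide (768). Options B and C offer far more context than 512-token chunks need, at the price of much larger models (11GB and 14GB) and embedding dimensions (2560 and 4096). That runs counter to the engineer's decision to prioritize cost and latency over quality.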
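The reasoning above amounts to two steps. First, drop every option whose context length cannot hold a full chunk. Then, among the remaining options, pick the cheapest one. The sketch below is a minimal illustration of that idea, not an official Databricks method. The `Option` type and the `pick_cheapest` and `bytes_per_vector` helpers are hypothetical. Model size and embedding dimension serve only as rough proxies for cost and latency, and the storage figure assumes float32 embeddings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    label: str
    context_length: int   # max tokens the embedding model accepts
    model_size_gb: float  # size of the smallest available model
    embedding_dim: int    # length of each embedding vector


# The four answer options from the question.
OPTIONS = [
    Option("A", 514, 0.44, 768),
    Option("B", 2048, 11.0, 2560),
    Option("C", 32768, 14.0, 4096),
    Option("D", 512, 0.13, 384),
]

CHUNK_TOKENS = 512  # maximum chunk size stated in the question


def bytes_per_vector(dim: int, bytes_per_value: int = 4) -> int:
    """Storage for one embedding; assumes float32 (4 bytes per value)."""
    return dim * bytes_per_value


def pick_cheapest(options: list[Option], chunk_tokens: int) -> Option:
    # Keep only options whose context window fits a whole chunk,
    # so no chunk is truncated before embedding.
    fits = [o for o in options if o.context_length >= chunk_tokens]
    # Prefer the smallest model, then the smallest embedding dimension,
    # as rough proxies for inference cost/latency and index size.
    return min(fits, key=lambda o: (o.model_size_gb, o.embedding_dim))


best = pick_cheapest(OPTIONS, CHUNK_TOKENS)
print(best.label)  # D
for o in OPTIONS:
    print(o.label, bytes_per_vector(o.embedding_dim), "bytes per vector")
```

Under the float32 assumption, each vector from option D takes 1,536 bytes. The same vector takes 3,072 bytes under A, 10,240 under B, and 16,384 under C, so the index size grows along with the embedding dimension.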