AWS Certified AI Practitioner (AIF-C01) — Question 158
What is tokenization used for in natural language processing (NLP)?
Answer options
- A. To encrypt text data
- B. To compress text files
- C. To break text into smaller units for processing
- D. To translate text between languages
Correct answer: C
Explanation
Tokenization is the process of dividing text into smaller units, called tokens. Depending on the tokenizer, tokens can be words, subwords, or individual characters. Splitting text this way lets algorithms analyze and process language data, because models work on these tokens (typically mapped to numeric IDs) rather than on raw text. The other options describe unrelated tasks: encryption protects text data, compression reduces file size, and translation converts text between languages. None of them involves breaking text into units.
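The following is a minimal sketch of the idea, assuming simple regex-based splitting. The function names `word_tokenize`, `char_tokenize`, and `build_vocab` are hypothetical and chosen for illustration. Production NLP systems and foundation models usually rely on learned subword tokenizers (for example, byte-pair encoding) instead, but the principle is the same: split the text into units, then map each unit to an integer ID.

```python
import re


def word_tokenize(text: str) -> list[str]:
    # Word-level: runs of word characters, plus each punctuation mark as its own token
    return re.findall(r"\w+|[^\w\s]", text)


def char_tokenize(text: str) -> list[str]:
    # Character-level: every non-whitespace character becomes a token
    return [ch for ch in text if not ch.isspace()]


def build_vocab(tokens: list[str]) -> dict[str, int]:
    # Assign each distinct token an integer ID in order of first appearance
    vocab: dict[str, int] = {}
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


sentence = "Tokenization breaks text into units."
tokens = word_tokenize(sentence)
print(tokens)  # ['Tokenization', 'breaks', 'text', 'into', 'units', '.']

vocab = build_vocab(tokens)
print([vocab[t] for t in tokens])  # [0, 1, 2, 3, 4, 5]
```

Note that nothing here is encrypted, compressed, or translated. The text is only split into units that a model can count, index, and process.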