AWS Certified AI Practitioner (AIF-C01) — Question 158

What is tokenization used for in natural language processing (NLP)?

Answer options

Correct answer: C

Explanation

Tokenization is the process of splitting text into smaller units called tokens, typically words, subwords, or individual characters, so that algorithms can analyze and process language data more easily. The other options do not describe tokenization: encrypting text data, compressing files, and translating languages are separate tasks that have nothing to do with breaking text into units.
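
To make the idea concrete, here is a minimal sketch of word-level tokenization in Python. It is an illustration only: the function name tokenize and the regular expression are assumptions chosen for this example, and production NLP systems usually rely on trained subword tokenizers rather than a simple pattern like this one.

```python
import re

def tokenize(text):
    """Split text into lowercase word and punctuation tokens.

    Hypothetical helper for illustration; not part of any AWS service or library.
    """
    # \w+ matches a run of letters, digits, or underscores (a word-like token);
    # [^\w\s] matches a single punctuation character as its own token.
    return re.findall(r"\w+|[^\w\s]", text.lower())

tokens = tokenize("Tokenization splits text into units!")
print(tokens)
# → ['tokenization', 'splits', 'text', 'into', 'units', '!']
```

Each element of the resulting list is one token that a downstream step, such as counting word frequencies or mapping tokens to numeric IDs, can work on individually.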