Vocabulary
The set of unique tokens known to a language model or NLP system.
Description
In natural language processing, a vocabulary is the set of unique tokens that a model or system recognizes. Depending on the tokenization method, these tokens may be words, subwords, or characters. The vocabulary is typically built from the training data, and its size and composition affect model performance, memory usage, and how the model handles out-of-vocabulary words, that is, input the vocabulary does not cover.
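As a rough illustration of building a vocabulary from training data, the sketch below counts whitespace-separated words in a toy corpus and assigns each an integer id. The names `build_vocab`, `encode`, `max_size`, and the `<unk>` token are assumptions made for this example, not part of any particular library; real systems usually use a trained tokenizer rather than `str.split`.

```python
from collections import Counter


def build_vocab(corpus, max_size=None, unk_token="<unk>"):
    """Map each token to an integer id, reserving id 0 for unknown tokens.

    Illustrative helper: tokens are whitespace-separated words.
    """
    counts = Counter(token for sentence in corpus for token in sentence.split())
    # Keep the most frequent tokens; one slot is reserved for unk_token.
    limit = None if max_size is None else max_size - 1
    vocab = {unk_token: 0}
    for token, _count in counts.most_common(limit):
        vocab[token] = len(vocab)
    return vocab


def encode(text, vocab, unk_token="<unk>"):
    """Convert text to ids, mapping out-of-vocabulary words to unk_token."""
    return [vocab.get(token, vocab[unk_token]) for token in text.split()]


corpus = ["the cat sat", "the dog sat"]
vocab = build_vocab(corpus)

# "bird" never appeared in the corpus, so it maps to the <unk> id (0).
ids = encode("the bird sat", vocab)
```

Capping the vocabulary with `max_size` reduces memory use but pushes more rare words into the unknown token, which is the trade-off between size and coverage described above.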
Examples
- 📚 Word-level vocabulary
- 🧩 Subword vocabulary (e.g., in BERT or GPT models)
- 🔤 Character-level vocabulary
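The three granularities listed above can be compared on the same input. This is a minimal sketch: the `SUBWORDS` set is hand-picked for illustration, and the greedy longest-match split is a simplification, whereas tokenizers such as WordPiece (BERT) or byte-pair encoding (GPT) learn their subword inventory from data.

```python
def word_tokens(text):
    """Word-level: split on whitespace."""
    return text.split()


def char_tokens(text):
    """Character-level: every character is a token."""
    return list(text)


def subword_tokens(word, subwords):
    """Subword-level: greedy longest-match split of one word.

    Falls back to a single character when no known piece matches.
    """
    pieces, start = [], 0
    while start < len(word):
        for end in range(len(word), start, -1):
            if word[start:end] in subwords:
                pieces.append(word[start:end])
                start = end
                break
        else:
            pieces.append(word[start])
            start += 1
    return pieces


# Hypothetical subword inventory, chosen by hand for this example.
SUBWORDS = {"token", "ization", "s"}

words = word_tokens("tokenization works")
chars = char_tokens("cat")
pieces = subword_tokens("tokenization", SUBWORDS)
```

Here `pieces` is `["token", "ization"]`: a word absent from a word-level vocabulary can still be represented by subword pieces the model already knows.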