Vocabulary
The set of unique tokens known to a language model or NLP system.
Description
In natural language processing, vocabulary refers to the set of unique tokens that a model or system recognizes. Depending on the tokenization method, these tokens can be words, subwords, or characters. The vocabulary is typically built from the training data and determines which inputs the model can represent directly. Its size and composition affect model performance, memory usage (a larger vocabulary needs a larger embedding table), and how out-of-vocabulary words are handled.
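As a rough illustration of how a vocabulary might be built from training data, the sketch below counts whitespace-separated tokens, keeps the most frequent ones up to a size limit, and maps everything else to a placeholder. The names `build_vocab`, `encode`, and the `<unk>` token are assumptions made for this example, not part of any particular library.

```python
from collections import Counter

UNK = "<unk>"  # hypothetical placeholder for out-of-vocabulary tokens


def build_vocab(corpus, max_size):
    """Map the most frequent whitespace-separated tokens to integer ids."""
    counts = Counter(tok for line in corpus for tok in line.lower().split())
    # Reserve id 0 for UNK, then keep the (max_size - 1) most frequent tokens.
    tokens = [UNK] + [tok for tok, _ in counts.most_common(max_size - 1)]
    return {tok: i for i, tok in enumerate(tokens)}


def encode(text, vocab):
    """Replace each token with its id; unseen tokens get the UNK id."""
    return [vocab.get(tok, vocab[UNK]) for tok in text.lower().split()]


corpus = ["the cat sat on the mat", "the dog sat"]
vocab = build_vocab(corpus, max_size=4)
ids = encode("the bird sat", vocab)  # "bird" is not in the vocabulary
```

Here the size limit trades coverage for memory: a smaller `max_size` shrinks the id table but sends more words to the placeholder.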
Examples
- Word-level vocabulary
- Subword vocabulary (e.g., in BERT or GPT models)
- Character-level vocabulary
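The three granularities listed above can be compared on the same input with a small sketch. The subword splitter here is a simplified greedy longest-match over a hypothetical, hand-picked piece inventory. It is not the trained WordPiece or byte-pair-encoding procedure that models such as BERT or GPT use, and it omits details such as continuation markers.

```python
def word_tokens(text):
    """Word-level: split on whitespace."""
    return text.split()


def char_tokens(text):
    """Character-level: every character is its own token."""
    return list(text)


def subword_tokens(word, pieces):
    """Greedy longest-match split of one word; falls back to single characters."""
    out, i = [], 0
    while i < len(word):
        # Try the longest remaining substring first, shrinking until a piece matches.
        for j in range(len(word), i, -1):
            if word[i:j] in pieces or j == i + 1:
                out.append(word[i:j])
                i = j
                break
    return out


pieces = {"token", "ization", "un", "known"}  # hypothetical toy piece inventory
words = word_tokens("unknown tokenization")
chars = char_tokens("token")
subs = [subword_tokens(w, pieces) for w in words]
```

With this toy inventory, a word missing from a word-level vocabulary can still be represented as a sequence of known pieces, which is the main reason subword vocabularies reduce out-of-vocabulary problems.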