Vocabulary
The set of unique tokens known to a language model or NLP system.
Description
In natural language processing, a vocabulary is the set of unique tokens that a model or system recognizes. Depending on the tokenization method, these tokens may be words, subwords, or characters. The vocabulary is typically built from the training data and fixed before training, so it determines which inputs the model can represent directly. Its size and composition involve a trade-off: a larger vocabulary covers more words as single tokens but increases the size of the embedding and output layers, and with it memory usage, while a smaller vocabulary saves memory but splits text into more tokens or leaves more words out-of-vocabulary.
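As an illustration of how a vocabulary might be built from training data, here is a minimal sketch of a word-level vocabulary with an unknown-token fallback. The function names `build_vocab` and `encode`, the `<unk>` token, and the whitespace tokenization are assumptions made for this example, not part of any particular library.

```python
from collections import Counter


def build_vocab(corpus, max_size=None, unk_token="<unk>"):
    """Map each token to an integer id, most frequent tokens first."""
    # Lowercase and split on whitespace (a deliberately simple tokenizer).
    counts = Counter(tok for line in corpus for tok in line.lower().split())
    # Reserve id 0 for the unknown token.
    vocab = {unk_token: 0}
    # Keep one slot for the unknown token when a size cap is given.
    limit = None if max_size is None else max_size - 1
    for tok, _ in counts.most_common(limit):
        vocab[tok] = len(vocab)
    return vocab


def encode(text, vocab, unk_token="<unk>"):
    """Convert text to ids; out-of-vocabulary words get the unknown id."""
    unk_id = vocab[unk_token]
    return [vocab.get(tok, unk_id) for tok in text.lower().split()]


corpus = ["the cat sat on the mat", "the dog sat on the log"]
vocab = build_vocab(corpus)
ids = encode("the bird sat", vocab)  # "bird" never appeared in the corpus
print(len(vocab), ids)
```

Capping the size with `max_size` keeps only the most frequent tokens, which is one simple way to trade coverage for memory; every word that falls outside the cap is then treated as out-of-vocabulary.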
Examples
- Word-level vocabulary
- Subword vocabulary (e.g., in BERT or GPT models)
- Character-level vocabulary
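The three granularities listed above can be compared on a single string. The sketch below is a toy, assuming a hand-picked subword set (`toy_subwords`) and a greedy longest-match split; it is not the tokenizer used by BERT (WordPiece) or GPT (byte-level BPE), which learn their subwords from data and mark word boundaries differently.

```python
def word_tokens(text):
    """Word-level: split on whitespace."""
    return text.split()


def char_tokens(text):
    """Character-level: every character, including spaces, is a token."""
    return list(text)


def subword_tokens(word, subwords, unk="<unk>"):
    """Subword-level: greedy longest-match-first split of one word."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate piece until it is in the subword set.
        while end > start and word[start:end] not in subwords:
            end -= 1
        if end == start:
            # No known piece starts here, so the whole word is unknown.
            return [unk]
        pieces.append(word[start:end])
        start = end
    return pieces


text = "unhappy players"
toy_subwords = {"un", "happy", "play", "er", "s"}  # hypothetical, hand-picked

word_vocab = set(word_tokens(text))
char_vocab = set(char_tokens(text))
pieces = [p for w in word_tokens(text) for p in subword_tokens(w, toy_subwords)]

print(sorted(word_vocab))
print(sorted(char_vocab))
print(pieces)
```

Here a five-entry subword set is enough to represent both words as known pieces, whereas a word-level vocabulary would need each full word as its own entry and a character-level vocabulary would produce a much longer token sequence.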