Guide: LLM token calculator & cost estimate
Also called an LLM token counter, GPT token calculator, or tiktoken calculator online, this page helps you estimate prompt tokens, compare illustrative API costs per million tokens, and learn words-to-tokens rules of thumb. Count tokens for pasted text using the same modes as Spoold's context budget tool: approximate (bytes ÷ 4), cl100k, or o200k via tiktoken-compatible BPE in the browser. Dollar amounts in tables are illustrative planning estimates, not live API quotes. For billing, use your provider's dashboard and tokenizer. Layout inspiration: token-calculator.net.
Quick read: start with Understanding tokenization if you are new to BPE counts, then use the cost table for rough budgets.
Understanding tokenization
Large language models do not read raw characters directly the way humans skim text. They consume tokens: integer IDs produced by a tokenizer that maps byte or text fragments to a fixed vocabulary. API pricing, context limits, and latency discussions are almost always phrased in tokens, not words or Unicode code points.
From text to tokens (BPE-style)
Modern chat models typically use subword tokenization (often byte-pair encoding, BPE, or close variants). The tokenizer learns frequent chunks of characters—common words may become one token, rare words split into several, and punctuation or spaces often merge with neighbors. That is why "hello" might be one token while "tokenization" might be two or three, depending on the vocabulary.
- Frequent tokens — Short, common strings in the training data tend to get dedicated IDs (fewer tokens per word on average).
- Rare tokens — Long words, rare names, or typos may split into many small pieces.
- Structure — Braces, quotes, and indentation in code or JSON usually add extra tokens beyond "plain English" prose.
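The split-versus-merge behavior above can be sketched with a toy greedy longest-match segmenter. Real BPE learns merge rules from corpus statistics; the tiny vocabulary here is purely hypothetical and just shows why a common word stays whole while a longer word shatters into subword pieces.

```python
# Toy longest-match subword segmentation. Real BPE builds merge rules
# from corpus statistics; this hypothetical vocabulary only illustrates
# why common words stay whole while longer words split into pieces.
VOCAB = {"hello", "token", "ization", "iz", "ation",
         "a", "e", "h", "i", "l", "n", "o", "t", "z"}

def segment(word, vocab=VOCAB):
    """Greedily take the longest vocabulary piece at each position."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):   # try longest candidate first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])          # unknown character: fall back
            i += 1
    return pieces

print(segment("hello"))         # ['hello'] -> one piece
print(segment("tokenization"))  # ['token', 'ization'] -> splits
```

With a real tokenizer the pieces differ, but the shape of the result is the same: fewer pieces for frequent strings, more for rare ones.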
Tokens vs words vs characters
- Words — Human-oriented; token counts rarely match word counts 1:1.
- Characters — Character counts usually exceed token counts for Latin text, but characters are not a substitute for BPE: combining marks, emoji, and CJK map to very different token densities.
- Tokens — What the model and billing pipeline use. Always prefer tokenizer output for budget math.
Encoding modes in this calculator
This page uses js-tiktoken in the browser for exact encodings when you choose cl100k_base or o200k_base. Approximate mode divides UTF-8 byte length by 4 (a fast planning heuristic; it is not BPE-accurate).
| Mode | What it is | Best for |
|---|---|---|
| Approximate (bytes ÷ 4) | Rough token estimate from UTF-8 length | Quick sizing when exact BPE is not required |
| cl100k_base | Tiktoken encoding used for many GPT-3.5 / GPT-4–class chat models | Matching "classic" OpenAI-style token counts |
| o200k_base | Tiktoken encoding aligned with GPT-4o / newer o-series–style vocabularies | Closer counts for 4o-era prompts when you select this mode |
Visualization: in approximate mode, token "chips" are evenly sliced segments for display only. In cl100k/o200k, chips reflect real BPE pieces (IDs shown when available).
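The approximate mode is simple enough to sketch in a few lines. Note the rounding behavior is an assumption here (the tool may truncate instead of rounding up); the heuristic itself is just UTF-8 byte length divided by 4.

```python
import math

def approx_tokens(text: str) -> int:
    """Planning heuristic: UTF-8 byte length / 4, rounded up.
    (Rounding up is an assumption; the tool may truncate instead.)
    Not BPE-accurate: CJK and emoji use more bytes per character,
    so this over- or under-shoots real tokenizer counts."""
    return math.ceil(len(text.encode("utf-8")) / 4)

print(approx_tokens("Hello, world!"))  # 13 bytes -> 4
print(approx_tokens("你好世界"))        # 12 bytes (3 per CJK char) -> 3
```

For exact counts, switch to cl100k or o200k mode, which run real BPE in the browser.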
Input tokens, output tokens, and billing
Providers typically bill input (prompt) tokens and output (completion) tokens separately. Output is often priced higher per token. The calculator lets you set an expected output length so totals reflect prompt + reply planning—not prompt alone. Your real bill also depends on whether the provider charges for cached prompt prefixes, tool calls, or system wrappers, so treat numbers as estimates.
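The prompt-plus-reply arithmetic the calculator performs looks roughly like this. The rates below are hypothetical placeholders, not any provider's real pricing, and real bills can add caching discounts, tool calls, or system wrappers.

```python
def estimate_cost(prompt_tokens: int, output_tokens: int,
                  in_per_m: float, out_per_m: float) -> float:
    """Illustrative prompt + reply estimate: separate per-million rates
    for input and output tokens. Rates are hypothetical placeholders."""
    return prompt_tokens / 1e6 * in_per_m + output_tokens / 1e6 * out_per_m

# Hypothetical rates: $2.50/M input, $10.00/M output.
cost = estimate_cost(prompt_tokens=12_000, output_tokens=1_500,
                     in_per_m=2.50, out_per_m=10.00)
print(f"${cost:.4f}")  # $0.0450
```

Output being priced higher than input is why setting an expected reply length matters: a short prompt with a long reply can cost more than the reverse.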
Context windows and limits
A context window is the maximum number of tokens the model can process in one forward pass—often counting input + output together (exact rules vary by API). Exceeding the limit causes truncation, errors, or forced summarization. When you pack RAG chunks, system prompts, and user messages, sum their tokenizer counts with the same encoding you deploy against.
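Summing components against a window can be done as a simple pre-flight check. The token counts and window size below are hypothetical; in practice each count should come from the same tokenizer you deploy with.

```python
def fits_context(parts: dict, max_output_tokens: int,
                 context_window: int) -> tuple:
    """Sum tokenizer counts for each prompt part and check that
    input + planned output fits the window (input + output combined,
    which is one common rule; exact accounting varies by API)."""
    total = sum(parts.values()) + max_output_tokens
    return total <= context_window, total

# Hypothetical counts from the deployment tokenizer.
parts = {"system": 350, "rag_chunks": 6_400, "history": 2_100, "user": 180}
ok, total = fits_context(parts, max_output_tokens=1_024, context_window=8_192)
print(ok, total)  # False 10054 -> trim RAG chunks or history before sending
```

Running this check before the API call lets you trim RAG chunks or truncate history deliberately instead of letting the provider truncate for you.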
Why another model can disagree
Different families use different vocabularies (Gemini, Claude, Llama, etc.). A token count from cl100k is informative for OpenAI-compatible pipelines but may not match Anthropic or Google tokenizers byte-for-byte. For production budgets, run the provider's official tokenizer or API "count tokens" endpoint when available.
How to use the token calculator
- Paste your text into the editor above.
- View token count using approximate, cl100k, or o200k encodings (cl100k/o200k use tiktoken-compatible BPE in the browser).
- Compare costs using the dropdown estimate and the multi-provider table above—always verify on the provider site.
- Optimize prompts using the visualization and the word/token guide to spot where length piles up.
Word-to-token conversion guide
Ratios are rules of thumb. Always run representative text through the tokenizer above—especially for code, JSON, or non-English text.
| Content type | Example | Typical ratio | ~1,000 words | Notes |
|---|---|---|---|---|
| English text | Hello world | ~1.3 tokens/word | ~1,300–1,500 | Standard prose averages about 1.3 tokens per word for Latin script. |
| Code (Python/JS) | def func(): | ~2–3 tokens/word | ~2,000–3,000 | Symbols, operators, and indentation usually add tokens compared to prose. |
| Chinese / Japanese | 你好世界 | ~2+ tokens/char | ~2,000+ (per ~1,000 chars) | CJK characters often map to multiple tokens each; counts vary widely by tokenizer. |
| Technical writing | API endpoint | ~1.5 tokens/word | ~1,500–1,800 | Abbreviations and domain terms can merge or split unpredictably. |
| JSON / XML | {"key":"value"} | ~3–4 tokens/word | ~3,000–4,000 | Braces, quotes, and structure characters often consume extra tokens. |
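The ratios in the table can be folded into a quick estimator. The midpoint values chosen below (2.5 for code, 3.5 for JSON) are assumptions within the table's stated ranges, useful only for rough sizing.

```python
# Rules of thumb from the table above; midpoints for ranged
# ratios (code ~2-3, JSON ~3-4) are assumptions, not measurements.
TOKENS_PER_WORD = {
    "english": 1.3,
    "code": 2.5,
    "technical": 1.5,
    "json": 3.5,
}

def estimate_tokens(word_count: int, content_type: str = "english") -> int:
    """Rough words-to-tokens conversion for budget planning only."""
    return round(word_count * TOKENS_PER_WORD[content_type])

print(estimate_tokens(1_000))          # 1300
print(estimate_tokens(1_000, "json"))  # 3500
```

For anything billing-sensitive, replace the ratio with a real tokenizer run on a representative sample.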
FAQ
What is a token in LLMs?
A token is a chunk of text from the model’s tokenizer (often BPE). It can be a word, part of a word, or punctuation. Billing is usually per token, not per character. For a deeper walkthrough, see the “Understanding tokenization” section on this page.
What is the difference between cl100k and o200k here?
Both are tiktoken encodings in the browser: cl100k_base matches many GPT-3.5 / GPT-4–class chat models; o200k_base aligns with GPT-4o / newer o-series–style vocabularies. Token counts differ between them—choose the mode that matches the stack you are estimating. Other providers (Anthropic, Google, etc.) use different tokenizers entirely.
Why does token count matter?
APIs charge by tokens, context windows are measured in tokens, and latency often scales with prompt length. Fewer tokens usually means lower cost and more room for the reply.
What metrics does this calculator show?
Tokens (via tiktoken-style encodings or a rough byte-based estimate), words, characters, and illustrative cost rows. Your exact bill depends on the provider’s tokenizer and pricing.
Is there a fee to use this page?
No. Spoold runs the calculator in your browser. You are not calling paid APIs from this tool by default.
How is my text handled?
Processing is client-side in this tool. Don’t paste secrets or PII you wouldn’t put in a local script.
Roughly how many tokens is 1,000 words?
For English prose, often about 1,300–1,500 tokens, but code, JSON, or CJK text can be very different—use the calculator on a sample.
What is a context window?
The maximum number of tokens (input + output combined, depending on the API) the model can attend to in one request. Exceeding it causes truncation or errors.
What is cached input pricing?
Some providers discount repeated prompt prefixes when you enable prompt caching. The same logical text may bill at cached rates on subsequent calls—check your provider’s docs.
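The caching arithmetic can be sketched as follows. Both the 50% cached-rate multiplier and the input rate are hypothetical; real discounts, minimum prefix lengths, and eligibility rules vary by provider.

```python
def cost_with_caching(prefix_tokens: int, fresh_tokens: int,
                      in_per_m: float, cached_multiplier: float = 0.5) -> float:
    """Illustrative cached-prefix input cost: the reused prefix bills at
    a discounted rate, the fresh suffix at the full rate. The 50%
    multiplier and rates are hypothetical, not real provider pricing."""
    cached = prefix_tokens / 1e6 * in_per_m * cached_multiplier
    fresh = fresh_tokens / 1e6 * in_per_m
    return cached + fresh

# Hypothetical: 8k-token reused prefix at half rate, $2.50/M input.
print(f"${cost_with_caching(8_000, 500, 2.50):.5f}")  # $0.01125
```

The larger and more stable your prefix (system prompt, tool schemas, few-shot examples), the more of each call bills at the discounted rate.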
How can I reduce API cost?
Shorten prompts, compress structured data, set max output tokens, pick a smaller model when quality allows, and reuse stable prefixes to benefit from caching where supported.
Which providers are in the comparison table?
The table includes illustrative rows for OpenAI, Anthropic, Google, xAI, Mistral, Cohere, DeepSeek, Meta-hosted Llama, Perplexity, and Amazon Bedrock—plus more models over time. Rates and model names change; always confirm on the provider’s pricing page.
More questions? See Contact.