Guide: LLM token calculator & cost estimate
Also called an LLM token counter, GPT token calculator, or tiktoken calculator online — this page helps you estimate prompt tokens, compare illustrative API cost per million tokens, and learn words-to-tokens rules of thumb. Count tokens for pasted text using the same modes as Spoold's context budget tool: approximate (bytes ÷ 4), cl100k, or o200k via tiktoken-compatible BPE in the browser. Dollar amounts in tables are illustrative planning estimates — not live API quotes. For billing, use your provider's dashboard and tokenizer. Layout inspiration: token-calculator.net.
Quick read: start with Understanding tokenization if you are new to BPE counts; see what 10K tokens looks like for pages/chat/vision scale; then use the cost table for rough budgets.
What does 10,000 tokens look like?
A visual scale for thinking about token budgets — handy when someone says "stay under 10k" or you are comparing a prompt to a page of prose. Figures below are rules of thumb for English-like text; code, JSON, and other languages differ. Framing aligns with public token "cheatsheet" guides such as this token cheatsheet.
| Scale | ≈ 10,000 tokens |
|---|---|
| Words & characters | ≈ 7,500 words · ≈ 40,000 characters (heuristic: 1 token ≈ ¾ word ≈ 4 chars for English prose) |
| Printed pages | ~15 pages single-spaced · ~30 pages double-spaced — think one dense book chapter. |
| Conversation | ≈ 45–50 minutes of two-way chat (rough), depending on turns and verbosity. |
| Code footprint | On the order of a few thousand lines of commented application code (language and style change the ratio a lot). |
| JSON / data | ~350 KB of raw JSON is in the same ballpark — useful when planning vector chunks or ETL. |
| Images (vision) | A 1024 × 1024 photo under OpenAI-style rules is often about 85 tokens at detail:"low" vs ~765 tokens at detail:"high" (tiling) — crop, resize, or pair with a caption/URL to stay lean. Use Vision token estimator for your exact pixels. |
| Docs / slides | A 15-slide deck at ~75 words/slide is roughly 1,500 tokens of slide text alone — OCR'd scans chunk differently; embed for RAG in slices. |
| Support / cases | Ballpark: many short notes (e.g. dozens to hundreds of brief case summaries) can land near 10k tokens total — good for clustering or agent-style workflows over a corpus. |
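The rules of thumb in the table above can be turned into a quick planning helper. This sketch hard-codes the same heuristics (1 token ≈ 0.75 words ≈ 4 characters, ~500 words per single-spaced page); the function name and page ratio are illustrative choices, not part of the calculator itself.

```python
def scale_from_tokens(tokens: int) -> dict:
    """Rough English-prose equivalents for a token budget.
    Heuristics: 1 token ~ 0.75 words ~ 4 characters, ~500 words
    per single-spaced page. Planning estimate only, not BPE-accurate."""
    words = tokens * 0.75
    return {
        "words": round(words),
        "characters": tokens * 4,
        "pages_single_spaced": round(words / 500, 1),
    }

print(scale_from_tokens(10_000))
# 10,000 tokens -> ~7,500 words, ~40,000 chars, ~15 pages
```

Running it on 10,000 tokens reproduces the figures in the first two table rows; for code, JSON, or CJK text the ratios drift, so treat the output as a first pass.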
Understanding token usage (short version)
- What is a token? — The model's tokenizer splits text into pieces (not always whole words). English often averages near ~0.75 words per token, but it varies widely.
- Why tokens matter — Context limits, latency, and cost are usually expressed in tokens. Efficient use affects budget and responsiveness at scale.
- Optimizing — Be concise, prefer structured formats when they help, lower image detail when quality allows, chunk large documents for RAG, and monitor usage in your provider dashboard.
For BPE mechanics and encoding modes, see Understanding tokenization below.
Understanding tokenization
Large language models do not read raw characters directly the way humans skim text. They consume tokens: integer IDs produced by a tokenizer that maps byte or text fragments to a fixed vocabulary. API pricing, context limits, and latency discussions are almost always phrased in tokens, not words or Unicode code points.
From text to tokens (BPE-style)
Modern chat models typically use subword tokenization (often byte-pair encoding, BPE, or close variants). The tokenizer learns frequent chunks of characters—common words may become one token, rare words split into several, and punctuation or spaces often merge with neighbors. That is why "hello" might be one token while "tokenization" might be two or three, depending on the vocabulary.
- Frequent tokens — Short, common strings in the training data tend to get dedicated IDs (fewer tokens per word on average).
- Rare tokens — Long words, rare names, or typos may split into many small pieces.
- Structure — Braces, quotes, and indentation in code or JSON usually add extra tokens beyond "plain English" prose.
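To make the merging idea concrete, here is a toy byte-pair-style sketch: start from characters and repeatedly merge the most frequent adjacent pair into one token. This is illustrative only — real tokenizers like tiktoken apply a learned vocabulary of merges rather than computing them greedily per input.

```python
from collections import Counter

def toy_bpe(text: str, num_merges: int) -> list[str]:
    """Greedy pair merging over characters, BPE-style.
    Illustrative only; production tokenizers use fixed, learned merges."""
    tokens = list(text)
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), _count = pairs.most_common(1)[0]  # most frequent pair
        merged, i = [], 0
        while i < len(tokens):
            if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
                merged.append(a + b)  # fuse the pair into one token
                i += 2
            else:
                merged.append(tokens[i])
                i += 1
        tokens = merged
    return tokens

print(toy_bpe("hello hello", 3))
```

After three merges, repeated substrings collapse into multi-character tokens while the single space stays separate — the same reason frequent words cost fewer tokens than rare ones.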
Tokens vs words vs characters
- Words — Human-oriented; token counts rarely match word counts 1:1.
- Characters — Character counts run higher than token counts for Latin text, but they are not a substitute for BPE: combining marks, emoji, and CJK can map to very different token densities.
- Tokens — What the model and billing pipeline use. Always prefer tokenizer output for budget math.
Encoding modes in this calculator
This page uses js-tiktoken in the browser for exact encodings when you choose cl100k_base or o200k_base. Approximate mode divides UTF-8 byte length by 4 (a fast planning heuristic; it is not BPE-accurate).
| Mode | What it is | Best for |
|---|---|---|
| Approximate (bytes ÷ 4) | Rough token estimate from UTF-8 length | Quick sizing when exact BPE is not required |
| cl100k_base | Tiktoken encoding used for many GPT-3.5 / GPT-4–class chat models | Matching "classic" OpenAI-style token counts |
| o200k_base | Tiktoken encoding aligned with GPT-4o / newer o-series–style vocabularies | Closer counts for 4o-era prompts when you select this mode |
Visualization: in approximate mode, token "chips" are evenly sliced segments for display only. In cl100k/o200k, chips reflect real BPE pieces (IDs shown when available).
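Approximate mode is simple enough to reproduce anywhere. The sketch below mirrors the bytes ÷ 4 heuristic described above; the function name and rounding choice are mine, not the calculator's exact code.

```python
def approx_tokens(text: str) -> int:
    """Planning estimate: UTF-8 byte length divided by 4.
    Not BPE-accurate; multi-byte scripts and code skew the ratio."""
    return max(1, round(len(text.encode("utf-8")) / 4)) if text else 0

print(approx_tokens("Hello, world!"))  # 13 bytes -> 3
```

Note the divergence for non-Latin text: "你好世界" is 12 UTF-8 bytes (3 per character), so the heuristic reports ~3 tokens, while real tokenizers often spend roughly one or more tokens per CJK character.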
Input tokens, output tokens, and billing
Providers typically bill input (prompt) tokens and output (completion) tokens separately. Output is often priced higher per token. The calculator lets you set an expected output length so totals reflect prompt + reply planning—not prompt alone. Your real bill also depends on whether the provider charges for cached prompt prefixes, tool calls, or system wrappers, so treat numbers as estimates.
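The split billing described above is straightforward arithmetic once you have token counts. This sketch uses hypothetical rates (the $3/$15 figures are placeholders, not any provider's pricing) and ignores caching, tool calls, and system wrappers.

```python
def estimate_cost(input_tokens: int, output_tokens: int,
                  in_rate_per_m: float, out_rate_per_m: float) -> float:
    """Illustrative prompt + completion cost in USD.
    Rates are per million tokens; real bills add caching/tool overhead."""
    return (input_tokens * in_rate_per_m
            + output_tokens * out_rate_per_m) / 1_000_000

# Hypothetical rates: $3/M input, $15/M output
print(f"${estimate_cost(10_000, 2_000, 3.0, 15.0):.4f}")  # $0.0600
```

Because output tokens are often priced several times higher than input, capping expected reply length matters as much as trimming the prompt.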
Context windows and limits
A context window is the maximum number of tokens the model can process in one forward pass—often counting input + output together (exact rules vary by API). Exceeding the limit causes truncation, errors, or forced summarization. When you pack RAG chunks, system prompts, and user messages, sum their tokenizer counts with the same encoding you deploy against.
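The "sum everything with the same encoding" advice above can be sketched as a simple budget check; the part names and window sizes here are hypothetical examples.

```python
def fits_context(parts: dict[str, int], context_window: int,
                 reserved_output: int) -> bool:
    """True if all prompt parts plus reserved reply room fit the window.
    All counts must come from the same tokenizer encoding."""
    used = sum(parts.values())
    return used + reserved_output <= context_window

parts = {"system": 400, "rag_chunks": 6_000, "user": 350}
print(fits_context(parts, context_window=8_192, reserved_output=1_024))
# True: 6,750 used + 1,024 reserved = 7,774 <= 8,192
```

Reserving output room up front avoids the common failure mode where the prompt fits but the reply gets truncated mid-sentence.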
Why another model can disagree
Different families use different vocabularies (Gemini, Claude, Llama, etc.). A token count from cl100k is informative for OpenAI-compatible pipelines but may not match Anthropic or Google tokenizers byte-for-byte. For production budgets, run the provider's official tokenizer or API "count tokens" endpoint when available.
How to use the token calculator
- Paste your text into the editor above.
- View token count using approximate, cl100k, or o200k encodings (cl100k/o200k use tiktoken-compatible BPE in the browser).
- Compare costs using the dropdown estimate and the multi-provider table above—always verify on the provider site.
- Optimize prompts using the visualization and the word/token guide to spot where length piles up.
Word-to-token conversion guide
Ratios are rules of thumb. Always run representative text through the tokenizer above—especially for code, JSON, or non-English text. Inspiration: token-calculator.net.
| Content type | Example | Typical ratio | ~1,000 words | Notes |
|---|---|---|---|---|
| English text | Hello world | ~1.3 tokens/word | ~1,300–1,500 | Standard prose averages about 1.3 tokens per word for Latin script. |
| Code (Python/JS) | def func(): | ~2–3 tokens/word | ~2,000–3,000 | Symbols, operators, and indentation usually add tokens compared to prose. |
| Chinese / Japanese | 你好世界 | ~2+ tokens/char | ~2,000+ | CJK characters often map to multiple tokens; counts vary by tokenizer. |
| Technical writing | API endpoint | ~1.5 tokens/word | ~1,500–1,800 | Abbreviations and domain terms can merge or split unpredictably. |
| JSON / XML | {"key":"value"} | ~3–4 tokens/word | ~3,000–4,000 | Braces, quotes, and structure characters often consume extra tokens. |
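The ratio table above lends itself to a small conversion helper. The ratios below are the table's midpoint rules of thumb (my choice of midpoints within each range); always verify real text with the tokenizer.

```python
RATIOS = {  # tokens per word, rules of thumb from the table above
    "english": 1.3,
    "technical": 1.5,
    "code": 2.5,
    "json_xml": 3.5,
}

def tokens_from_words(words: int, content_type: str = "english") -> int:
    """Heuristic words-to-tokens conversion; not tokenizer output."""
    return round(words * RATIOS[content_type])

print(tokens_from_words(1_000))              # ~1,300 for English prose
print(tokens_from_words(1_000, "json_xml"))  # ~3,500 for JSON/XML
```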
FAQ
What is a token in LLMs?
A token is a chunk of text from the model’s tokenizer (often BPE). It can be a word, part of a word, or punctuation. Billing is usually per token, not per character. For a deeper walkthrough, see the “Understanding tokenization” section on this page.
What is the difference between cl100k and o200k here?
Both are tiktoken encodings in the browser: cl100k_base matches many GPT-3.5 / GPT-4–class chat models; o200k_base aligns with GPT-4o / newer o-series–style vocabularies. Token counts differ between them—choose the mode that matches the stack you are estimating. Other providers (Anthropic, Google, etc.) use different tokenizers entirely.
Why does token count matter?
APIs charge by tokens, context windows are measured in tokens, and latency often scales with prompt length. Fewer tokens usually means lower cost and more room for the reply.
What metrics does this calculator show?
Tokens (via tiktoken-style encodings or a rough byte-based estimate), words, characters, and illustrative cost rows. Your exact bill depends on the provider’s tokenizer and pricing.
Is there a fee to use this page?
No. Spoold runs the calculator in your browser. You are not calling paid APIs from this tool by default.
How is my text handled?
Processing is client-side in this tool. Don’t paste secrets or PII you wouldn’t put in a local script.
Roughly how many tokens is 1,000 words?
For English prose, often about 1,300–1,500 tokens, but code, JSON, or CJK text can be very different—use the calculator on a sample.
What is a context window?
The maximum number of tokens (input + output combined, depending on the API) the model can attend to in one request. Exceeding it causes truncation or errors.
What is cached input pricing?
Some providers discount repeated prompt prefixes when you enable prompt caching. The same logical text may bill at cached rates on subsequent calls—check your provider’s docs.
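The caching discount works out to simple arithmetic. This sketch assumes a hypothetical 90% discount on cache hits (the rates and discount here are placeholders — check your provider's actual cached-input pricing).

```python
def cost_with_cache(prefix_tokens: int, fresh_tokens: int,
                    rate_per_m: float, cached_rate_per_m: float,
                    cache_hit: bool) -> float:
    """Hypothetical cached-input billing: a repeated prompt prefix
    bills at a discounted rate when the provider's cache hits."""
    prefix_rate = cached_rate_per_m if cache_hit else rate_per_m
    return (prefix_tokens * prefix_rate
            + fresh_tokens * rate_per_m) / 1_000_000

# Hypothetical: $3/M normal, $0.30/M cached; 8k-token shared prefix
print(cost_with_cache(8_000, 500, 3.0, 0.30, cache_hit=True))
print(cost_with_cache(8_000, 500, 3.0, 0.30, cache_hit=False))
```

With these placeholder rates, the cached call costs roughly a sixth of the uncached one, which is why reusing a stable system prompt and document prefix across calls pays off.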
How can I reduce API cost?
Shorten prompts, compress structured data, set max output tokens, pick a smaller model when quality allows, and reuse stable prefixes to benefit from caching where supported.
Which providers are in the comparison table?
The table includes illustrative rows for OpenAI, Anthropic, Google, xAI, Mistral, Cohere, DeepSeek, Meta-hosted Llama, Perplexity, and Amazon Bedrock—plus more models over time. Rates and model names change; always confirm on the provider’s pricing page.
More questions? See Contact.