
Token calculator & cost


Uses the same token encodings as Token & context budget above. Cost rows are illustrative — always confirm pricing on your provider's site.


Compare token costs

Illustrative USD per 1M tokens and estimated charges for your current input tokens plus 256 expected output tokens. Prices change frequently; confirm on each provider’s pricing page.

OpenAI (10 models)

| Model | Context | Input /1M | Cached /1M | Output /1M | Your input | Your cached* | Your output | Total† |
|---|---|---|---|---|---|---|---|---|
| GPT-5.4 | 1M | $2.5 | $0.25 | $15 | $0.00 | $0.00 | $0.00384 | $0.00384 |
| GPT-5 mini | 400K | $0.25 | $0.025 | $2 | $0.00 | $0.00 | $0.00051 | $0.00051 |
| GPT-5 nano | 400K | $0.1 | $0.01 | $0.4 | $0.00 | $0.00 | $0.00010 | $0.00010 |
| GPT-4.1 | 1M | $2 | $0.5 | $8 | $0.00 | $0.00 | $0.00205 | $0.00205 |
| GPT-4.1 mini | 1M | $0.4 | $0.1 | $1.6 | $0.00 | $0.00 | $0.00041 | $0.00041 |
| GPT-4o | 128K | $2.5 | $1.25 | $10 | $0.00 | $0.00 | $0.00256 | $0.00256 |
| GPT-4o mini | 128K | $0.15 | $0.075 | $0.6 | $0.00 | $0.00 | $0.00015 | $0.00015 |
| o3 | 200K | $10 | $2.5 | $40 | $0.00 | $0.00 | $0.0102 | $0.0102 |
| o3-mini | 200K | $1.1 | $0.55 | $4.4 | $0.00 | $0.00 | $0.00113 | $0.00113 |
| o4-mini | 200K | $1.1 | $0.275 | $4.4 | $0.00 | $0.00 | $0.00113 | $0.00113 |

Anthropic (5 models)

| Model | Context | Input /1M | Cached /1M | Output /1M | Your input | Your cached* | Your output | Total† |
|---|---|---|---|---|---|---|---|---|
| Claude Opus 4.6 | 200K | $5 | $0.5 | $25 | $0.00 | $0.00 | $0.00640 | $0.00640 |
| Claude Sonnet 4.6 | 200K | $3 | $0.3 | $15 | $0.00 | $0.00 | $0.00384 | $0.00384 |
| Claude Sonnet 4 | 200K | $3 | $0.3 | $15 | $0.00 | $0.00 | $0.00384 | $0.00384 |
| Claude Haiku 4.5 | 200K | $1 | $0.1 | $5 | $0.00 | $0.00 | $0.00128 | $0.00128 |
| Claude 3.7 Sonnet | 200K | $3 | $0.3 | $15 | $0.00 | $0.00 | $0.00384 | $0.00384 |

Google (5 models)

| Model | Context | Input /1M | Cached /1M | Output /1M | Your input | Your cached* | Your output | Total† |
|---|---|---|---|---|---|---|---|---|
| Gemini 3 Pro (Preview) | 1M | $2 | $0.2 | $12 | $0.00 | $0.00 | $0.00307 | $0.00307 |
| Gemini 3 Flash (Preview) | 1M | $0.5 | $0.05 | $3 | $0.00 | $0.00 | $0.00077 | $0.00077 |
| Gemini 2.5 Pro | 2M | $1.25 | $0.125 | $10 | $0.00 | $0.00 | $0.00256 | $0.00256 |
| Gemini 2.5 Flash | 1M | $0.3 | $0.03 | $2.5 | $0.00 | $0.00 | $0.00064 | $0.00064 |
| Gemini 2.0 Flash | 1M | $0.1 | $0.025 | $0.4 | $0.00 | $0.00 | $0.00010 | $0.00010 |

xAI (2 models)

| Model | Context | Input /1M | Cached /1M | Output /1M | Your input | Your cached* | Your output | Total† |
|---|---|---|---|---|---|---|---|---|
| Grok 3 | 131K | $3 | $0.75 | $15 | $0.00 | $0.00 | $0.00384 | $0.00384 |
| Grok 3 mini | 131K | $0.3 | $0.075 | $0.5 | $0.00 | $0.00 | $0.00013 | $0.00013 |

Mistral (3 models)

| Model | Context | Input /1M | Cached /1M | Output /1M | Your input | Your cached* | Your output | Total† |
|---|---|---|---|---|---|---|---|---|
| Mistral Large 2 | 128K | $2 | $0.5 | $6 | $0.00 | $0.00 | $0.00154 | $0.00154 |
| Mistral Small | 32K | $0.2 | $0.05 | $0.6 | $0.00 | $0.00 | $0.00015 | $0.00015 |
| Codestral | 256K | $0.3 | $0.075 | $0.9 | $0.00 | $0.00 | $0.00023 | $0.00023 |

Cohere (2 models)

| Model | Context | Input /1M | Cached /1M | Output /1M | Your input | Your cached* | Your output | Total† |
|---|---|---|---|---|---|---|---|---|
| Command R+ | 128K | $2.5 | $2.5 | $10 | $0.00 | $0.00 | $0.00256 | $0.00256 |
| Command R | 128K | $0.5 | $0.5 | $1.5 | $0.00 | $0.00 | $0.00038 | $0.00038 |

DeepSeek (2 models)

| Model | Context | Input /1M | Cached /1M | Output /1M | Your input | Your cached* | Your output | Total† |
|---|---|---|---|---|---|---|---|---|
| DeepSeek-V3 | 128K | $0.27 | $0.07 | $1.1 | $0.00 | $0.00 | $0.00028 | $0.00028 |
| DeepSeek-R1 | 128K | $0.55 | $0.14 | $2.19 | $0.00 | $0.00 | $0.00056 | $0.00056 |

Meta (2 models)

| Model | Context | Input /1M | Cached /1M | Output /1M | Your input | Your cached* | Your output | Total† |
|---|---|---|---|---|---|---|---|---|
| Llama 3.3 70B (hosted API) | 128K | $0.72 | $0.36 | $0.72 | $0.00 | $0.00 | $0.00018 | $0.00018 |
| Llama 3.1 405B (hosted API) | 128K | $3.5 | $1.75 | $3.5 | $0.00 | $0.00 | $0.00090 | $0.00090 |

Perplexity (2 models)

| Model | Context | Input /1M | Cached /1M | Output /1M | Your input | Your cached* | Your output | Total† |
|---|---|---|---|---|---|---|---|---|
| Sonar Pro | 200K | $3 | $3 | $15 | $0.00 | $0.00 | $0.00384 | $0.00384 |
| Sonar | 127K | $1 | $1 | $1 | $0.00 | $0.00 | $0.00026 | $0.00026 |

Amazon Bedrock (2 models)

| Model | Context | Input /1M | Cached /1M | Output /1M | Your input | Your cached* | Your output | Total† |
|---|---|---|---|---|---|---|---|---|
| Bedrock: Claude Sonnet (example) | 200K | $3 | $0.3 | $15 | $0.00 | $0.00 | $0.00384 | $0.00384 |
| Amazon Nova Pro | 300K | $0.8 | $0.2 | $3.2 | $0.00 | $0.00 | $0.00082 | $0.00082 |

*What your prompt would cost if your full input token count were billed at the cached input rate.
†Total = input billed at the standard rate + output. Cached and standard input are not mixed within a single row; compare caching scenarios against your provider’s rules.

Guide: LLM token calculator & cost estimate


Also called an LLM token counter, GPT token calculator, or tiktoken calculator online — this page helps you estimate prompt tokens, compare illustrative API cost per million tokens, and learn words-to-tokens rules of thumb. Count tokens for pasted text using the same modes as Spoold's context budget tool: approximate (bytes ÷ 4), cl100k, or o200k via tiktoken-compatible BPE in the browser. Dollar amounts in tables are illustrative planning estimates — not live API quotes. For billing, use your provider's dashboard and tokenizer. Layout inspiration: token-calculator.net.

Quick read: start with Understanding tokenization if you are new to BPE counts, then use the cost table for rough budgets.


Understanding tokenization

Large language models do not read raw characters directly the way humans skim text. They consume tokens: integer IDs produced by a tokenizer that maps byte or text fragments to a fixed vocabulary. API pricing, context limits, and latency discussions are almost always phrased in tokens, not words or Unicode code points.

From text to tokens (BPE-style)

Modern chat models typically use subword tokenization (often byte-pair encoding, BPE, or close variants). The tokenizer learns frequent chunks of characters—common words may become one token, rare words split into several, and punctuation or spaces often merge with neighbors. That is why "hello" might be one token while "tokenization" might be two or three, depending on the vocabulary.

  • Frequent tokens — Short, common strings in the training data tend to get dedicated IDs (fewer tokens per word on average).
  • Rare tokens — Long words, rare names, or typos may split into many small pieces.
  • Structure — Braces, quotes, and indentation in code or JSON usually add extra tokens beyond "plain English" prose.
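As a toy illustration of the merge idea above, the sketch below repeatedly fuses the most frequent adjacent pair of symbols. It is not a real tokenizer (production BPE runs thousands of learned merges over byte sequences, not characters), and the `merge_once` helper is hypothetical:

```python
from collections import Counter

def merge_once(tokens):
    """One toy BPE-style training step: find the most frequent adjacent
    pair and fuse it into a single token wherever it occurs."""
    pairs = Counter(zip(tokens, tokens[1:]))
    if not pairs:
        return tokens
    (a, b), _ = pairs.most_common(1)[0]
    merged, i = [], 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == (a, b):
            merged.append(a + b)  # fuse the pair into one symbol
            i += 2
        else:
            merged.append(tokens[i])
            i += 1
    return merged

tokens = list("low lower lowest")
for _ in range(4):
    tokens = merge_once(tokens)
print(tokens)  # frequent chunks like "low" and " lowe" emerge as single tokens
```

After a few merges the shared stem becomes one token while the rare suffixes stay split, which mirrors why common words cost fewer tokens than rare ones.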

Tokens vs words vs characters

  • Words — Human-oriented; token counts rarely match word counts 1:1.
  • Characters — Character counts run higher than token counts for Latin text, but they are not a substitute for BPE: combining marks, emoji, and CJK can map to very different token densities.
  • Tokens — What the model and billing pipeline use. Always prefer tokenizer output for budget math.

Encoding modes in this calculator

This page uses js-tiktoken in the browser for exact encodings when you choose cl100k_base or o200k_base. Approximate mode divides UTF-8 byte length by 4 (a fast planning heuristic; it is not BPE-accurate).

| Mode | What it is | Best for |
|---|---|---|
| Approximate (bytes ÷ 4) | Rough token estimate from UTF-8 length | Quick sizing when exact BPE is not required |
| cl100k_base | Tiktoken encoding used for many GPT-3.5 / GPT-4–class chat models | Matching "classic" OpenAI-style token counts |
| o200k_base | Tiktoken encoding aligned with GPT-4o / newer o-series–style vocabularies | Closer counts for 4o-era prompts when you select this mode |

Visualization: in approximate mode, token "chips" are evenly sliced segments for display only. In cl100k/o200k, chips reflect real BPE pieces (IDs shown when available).
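The approximate mode described above fits in a few lines. This is a sketch, not the tool's actual code; rounding up and the `approx_tokens` name are assumptions:

```python
def approx_tokens(text: str) -> int:
    """Approximate mode: UTF-8 byte length divided by 4, rounded up.
    A fast planning heuristic only -- not BPE-accurate."""
    n = len(text.encode("utf-8"))
    return -(-n // 4)  # ceiling division; assumed rounding, not specified by the tool

print(approx_tokens("Hello, world!"))  # 13 bytes -> 4 tokens
```

Note how multi-byte scripts inflate the estimate: a CJK character is 3 UTF-8 bytes, so byte-based counting already hints at the higher token density the guide mentions.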

Input tokens, output tokens, and billing

Providers typically bill input (prompt) tokens and output (completion) tokens separately. Output is often priced higher per token. The calculator lets you set an expected output length so totals reflect prompt + reply planning—not prompt alone. Your real bill also depends on whether the provider charges for cached prompt prefixes, tool calls, or system wrappers, so treat numbers as estimates.
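The arithmetic behind the table rows is simple: each side is billed at its own per-million rate. A minimal sketch (the function name is illustrative; the $3/$15 rates match the Claude-Sonnet-style rows above):

```python
def estimate_cost(input_tokens, output_tokens, input_per_1m, output_per_1m):
    """Estimated charge: input and output tokens each billed at their
    own USD-per-1M-token rate, then summed."""
    return (input_tokens / 1e6) * input_per_1m + (output_tokens / 1e6) * output_per_1m

# 0 input tokens + 256 expected output tokens at $3/$15 per 1M
cost = estimate_cost(0, 256, 3.0, 15.0)
print(f"${cost:.5f}")  # $0.00384, matching the table's output column
```

Caching discounts, tool calls, and system wrappers sit on top of this, so treat it as the floor of the estimate, not the bill.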

Context windows and limits

A context window is the maximum number of tokens the model can process in one forward pass—often counting input + output together (exact rules vary by API). Exceeding the limit causes truncation, errors, or forced summarization. When you pack RAG chunks, system prompts, and user messages, sum their tokenizer counts with the same encoding you deploy against.
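A budget check along those lines can be sketched as follows; the part counts, the 512-token reply reservation, and the 8,192-token window are hypothetical numbers for illustration:

```python
def fits_context(token_counts, expected_output, context_window):
    """Sum prompt pieces (system, RAG chunks, user turn) counted with the
    SAME encoding, reserve room for the expected reply, and report headroom."""
    used = sum(token_counts) + expected_output
    return used <= context_window, context_window - used

parts = {"system": 120, "rag_chunks": 3_500, "user": 80}  # hypothetical counts
ok, headroom = fits_context(parts.values(), expected_output=512, context_window=8_192)
print(ok, headroom)  # True 3980
```

If `ok` is False, you truncate, summarize, or drop chunks before the API does it for you.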

Why another model can disagree

Different families use different vocabularies (Gemini, Claude, Llama, etc.). A token count from cl100k is informative for OpenAI-compatible pipelines but may not match Anthropic or Google tokenizers byte-for-byte. For production budgets, run the provider's official tokenizer or API "count tokens" endpoint when available.

How to use the token calculator

  1. Paste your text into the editor above.
  2. View token count using approximate, cl100k, or o200k encodings (cl100k/o200k use tiktoken-compatible BPE in the browser).
  3. Compare costs using the dropdown estimate and the multi-provider table above—always verify on the provider site.
  4. Optimize prompts using the visualization and the word/token guide to spot where length piles up.

Word-to-token conversion guide

Ratios are rules of thumb. Always run representative text through the tokenizer above—especially for code, JSON, or non-English text. Inspiration: token-calculator.net.

| Content type | Example | Typical ratio | ~1,000 words | Notes |
|---|---|---|---|---|
| English text | Hello world | ~1.3 tokens/word | ~1,300–1,500 | Standard prose averages about 1.3 tokens per word for Latin script. |
| Code (Python/JS) | def func(): | ~2–3 tokens/word | ~2,000–3,000 | Symbols, operators, and indentation usually add tokens compared to prose. |
| Chinese / Japanese | 你好世界 | ~2+ tokens/char | ~2,000+ | CJK characters often map to multiple tokens; counts vary by tokenizer. |
| Technical writing | API endpoint | ~1.5 tokens/word | ~1,500–1,800 | Abbreviations and domain terms can merge or split unpredictably. |
| JSON / XML | {"key":"value"} | ~3–4 tokens/word | ~3,000–4,000 | Braces, quotes, and structure characters often consume extra tokens. |
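The rules of thumb above can be turned into a quick planning helper. The `RATIOS` values mirror the table (midpoints where a range is given); both the numbers and the names are illustrative, not tokenizer output:

```python
RATIOS = {  # rules of thumb from the conversion table, not measured counts
    "english": 1.3,
    "code": 2.5,       # midpoint of ~2-3 tokens/word
    "technical": 1.5,
    "json_xml": 3.5,   # midpoint of ~3-4 tokens/word
}

def tokens_from_words(words: int, content_type: str = "english") -> int:
    """Rough words-to-tokens estimate; run real text through a tokenizer
    before committing to a budget."""
    return round(words * RATIOS[content_type])

print(tokens_from_words(1000))              # ~1300 for English prose
print(tokens_from_words(1000, "json_xml"))  # ~3500 for structured data
```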

FAQ

What is a token in LLMs?

A token is a chunk of text from the model’s tokenizer (often BPE). It can be a word, part of a word, or punctuation. Billing is usually per token, not per character. For a deeper walkthrough, see the “Understanding tokenization” section on this page.

What is the difference between cl100k and o200k here?

Both are tiktoken encodings in the browser: cl100k_base matches many GPT-3.5 / GPT-4–class chat models; o200k_base aligns with GPT-4o / newer o-series–style vocabularies. Token counts differ between them—choose the mode that matches the stack you are estimating. Other providers (Anthropic, Google, etc.) use different tokenizers entirely.

Why does token count matter?

APIs charge by tokens, context windows are measured in tokens, and latency often scales with prompt length. Fewer tokens usually means lower cost and more room for the reply.

What metrics does this calculator show?

Tokens (via tiktoken-style encodings or a rough byte-based estimate), words, characters, and illustrative cost rows. Your exact bill depends on the provider’s tokenizer and pricing.

Is there a fee to use this page?

No. Spoold runs the calculator in your browser. You are not calling paid APIs from this tool by default.

How is my text handled?

Processing is client-side in this tool. Don’t paste secrets or PII you wouldn’t put in a local script.

Roughly how many tokens is 1,000 words?

For English prose, often about 1,300–1,500 tokens, but code, JSON, or CJK text can be very different—use the calculator on a sample.

What is a context window?

The maximum number of tokens (input + output combined, depending on the API) the model can attend to in one request. Exceeding it causes truncation or errors.

What is cached input pricing?

Some providers discount repeated prompt prefixes when you enable prompt caching. The same logical text may bill at cached rates on subsequent calls—check your provider’s docs.
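Under a simplified billing model (cached share at the discounted rate, the rest at standard; real rules vary by provider), the saving can be sketched as:

```python
def prompt_cost(tokens, standard_per_1m, cached_per_1m, cached_fraction=0.0):
    """Bill the cached share of the prompt at the discounted rate and the
    remainder at the standard rate. Assumed split-billing model."""
    cached = tokens * cached_fraction
    fresh = tokens - cached
    return (fresh * standard_per_1m + cached * cached_per_1m) / 1e6

# hypothetical 10,000-token prompt, $3/1M standard vs $0.30/1M cached,
# with 80% of the prefix reused on a follow-up call
print(prompt_cost(10_000, 3.0, 0.30, cached_fraction=0.8))  # 0.0084 vs 0.03 uncached
```

The larger and more stable your shared prefix (system prompt, few-shot examples), the closer you get to the cached rate.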

How can I reduce API cost?

Shorten prompts, compress structured data, set max output tokens, pick a smaller model when quality allows, and reuse stable prefixes to benefit from caching where supported.

Which providers are in the comparison table?

The table includes illustrative rows for OpenAI, Anthropic, Google, xAI, Mistral, Cohere, DeepSeek, Meta-hosted Llama, Perplexity, and Amazon Bedrock—plus more models over time. Rates and model names change; always confirm on the provider’s pricing page.

More questions? See Contact.