@Higangssh
Contributor

Summary

Adds a --stats flag that displays a token count comparison when encoding JSON to TOON.

Usage

toon data.json --stats

Output:

users[3]{id,name,role}:
1,Alice,admin
2,Bob,user
3,Charlie,user

ℹ Token estimate: 43 (JSON) → 18 (TOON)
✔ Saved ~25 tokens (58.1%)

Implementation

Uses a simple approximation: Math.ceil(text.length / 4)
(GPT-style tokenizers average roughly 4 characters per token)
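
For reference, the heuristic and the savings line can be sketched roughly as below. This is a minimal illustration, not the code from the commits: the names estimateTokens, compareTokens and TokenStats are hypothetical, and only the Math.ceil(text.length / 4) formula comes from the description above.

```typescript
// Hypothetical helper: approximate token count from character length.
// Assumes ~4 characters per token, as stated in the PR description.
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Hypothetical result shape for the comparison shown by --stats.
interface TokenStats {
  jsonTokens: number;
  toonTokens: number;
  saved: number;
  percent: string; // savings relative to the JSON estimate, one decimal place
}

// Compare the estimated token counts of the JSON input and the TOON output.
function compareTokens(jsonText: string, toonText: string): TokenStats {
  const jsonTokens = estimateTokens(jsonText);
  const toonTokens = estimateTokens(toonText);
  const saved = jsonTokens - toonTokens;
  // Guard against division by zero for empty input.
  const percent = jsonTokens > 0 ? ((saved / jsonTokens) * 100).toFixed(1) : "0.0";
  return { jsonTokens, toonTokens, saved, percent };
}

// Placeholder strings sized so the estimates are 43 and 18 tokens,
// the same counts as in the sample output above.
const stats = compareTokens("x".repeat(172), "y".repeat(72));
console.log(`Token estimate: ${stats.jsonTokens} (JSON) → ${stats.toonTokens} (TOON)`);
console.log(`Saved ~${stats.saved} tokens (${stats.percent}%)`);
```

With estimates of 43 and 18 tokens, the savings work out to 25 tokens, and 25 / 43 rounds to 58.1%, which matches the sample output. Since both sides use the same heuristic, its absolute error matters less than the ratio between the two counts.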

Why not use gpt-tokenizer:

  • Keeps CLI lightweight (no 2MB dependency)
  • Fast for quick feedback
  • Relative comparison is what matters here

If you prefer accuracy over simplicity, I can switch to gpt-tokenizer.

Added a small feature that seemed useful. Hope you can accept this PR if it looks good to you!


Higangssh and others added 2 commits October 31, 2025 23:09
- Add --stats boolean flag to display token count comparison
- Calculate approximate tokens using char length / 4 heuristic
- Show JSON vs TOON token counts with savings percentage
- Opt-in feature, default behavior unchanged
@johannschopplich
Collaborator

Thanks! I've migrated the estimate to my tokenx lib, which gives fast token estimation at 94% of a full tokenizer's accuracy.

@johannschopplich johannschopplich changed the title feat(cli): add --stats flag to show token savings feat(cli): add --stats flag to show token savings Oct 31, 2025
@johannschopplich johannschopplich merged commit 2b88287 into toon-format:main Oct 31, 2025