Inspiration
Ashfeather is a short story written in a poetic style. You can read it at the end of Part 2. I want to start by discussing the impulsive inspiration that led to its conception, especially in relation to the cover art. You’ll notice there’s a bird. And then behind it, you’ll see a collection of symbols—glyphs, sigils, call them what you like. These weren’t randomly chosen. There’s meaning embedded in them.
Over the past week or so, I’ve been developing a side project. It might not become anything major, but it’s been a fascinating experiment. Essentially, I’ve been building an encoding and decoding system—something that takes regular human language, filters out words or constructs that carry minimal contextual weight, and compresses what remains into a denser, more efficient format.
Language Compression for LLMs
The core idea? Take a full paragraph of human-readable text, compress it without losing its underlying intent, and use that compressed version to prompt a large language model (LLM).
If you’re not already familiar with how LLMs work, here’s a brief overview:
- LLMs have a fixed memory window, called “context length.”
- Once the number of tokens (text units the model reads) exceeds this limit, older data is evicted.
- This is a big issue for long interactions or coding sessions with lots of source material.
For example, if you’re working on a codebase with 10,000+ lines and trying to get help with it, the model might initially do well—but over time, it’ll start forgetting prior inputs. This leads to hallucinations—fabricated or incorrect output that sounds plausible. Because LLMs are trained to always respond, they won’t admit confusion—they’ll just fill in the blanks, even if wrong.
Why Token Counts Matter
Every message you send and every reply the model gives is counted in tokens. If your model supports 10,000 tokens of context and you write 2,000 and it responds with 3,000, you’re already halfway through the window.
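To make that arithmetic concrete, here’s a minimal sketch of counting tokens before you send anything. It assumes the tiktoken library and the cl100k_base encoding, which are my choices for illustration; the post doesn’t name a tokenizer, and counts vary between models.

```python
# Minimal token-budget check, assuming `pip install tiktoken`.
# The 10,000-token limit and the sample strings are illustrative only.
import tiktoken

CONTEXT_LIMIT = 10_000
enc = tiktoken.get_encoding("cl100k_base")

def count_tokens(text: str) -> int:
    """Number of tokens this text occupies under the chosen encoding."""
    return len(enc.encode(text))

user_prompt = "Summarize the attached design document and list the open questions."
model_reply = "Here is a summary of the document, followed by the open questions..."

used = count_tokens(user_prompt) + count_tokens(model_reply)
print(f"{used} of {CONTEXT_LIMIT} tokens used; {CONTEXT_LIMIT - used} remain for the rest of the session")
```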
English is verbose. Many words—articles, connectives, fluff—don’t carry much meaning. A model can still interpret the message correctly even if a lot of these are removed. That’s one vector for compression.
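As a rough illustration of that vector, here’s a sketch that drops a small hand-picked set of low-weight words and compares token counts before and after. The stopword list is a placeholder, not the filter the project actually uses, and tiktoken is again an assumption.

```python
# Strip low-weight words, then compare token counts.
# The LOW_WEIGHT set is a tiny illustrative sample, not a real stopword list.
import tiktoken

LOW_WEIGHT = {"the", "a", "an", "of", "to", "is", "are", "that", "and", "in"}
enc = tiktoken.get_encoding("cl100k_base")

def strip_low_weight(text: str) -> str:
    """Remove words that carry little contextual weight."""
    kept = [w for w in text.split() if w.lower().strip(".,?!") not in LOW_WEIGHT]
    return " ".join(kept)

original = "The results of the experiment are summarized in the table that follows."
stripped = strip_low_weight(original)

print(stripped)  # "results experiment summarized table follows."
print(len(enc.encode(original)), "->", len(enc.encode(stripped)), "tokens")
```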
But here’s where it gets deeper…
Unicode, Symbols, and Cross-Lingual Efficiency
Enter Unicode. It includes character sets from languages like Chinese, Japanese, Russian, Greek, etc. Some of these characters represent complex concepts in just a single symbol.
- A single Chinese character (CJK set) can stand in for a multi-word English phrase, often at the cost of only a token or two.
- Greek symbols (Δ, β, θ) often map to common logic or scientific terms.
- Mathematical operators like ≥ or ∧ are typically a single token (a quick way to check this against your tokenizer follows below).
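Whether a given symbol actually lands on a single token depends on the tokenizer, so it’s worth probing rather than assuming. A quick check, again using tiktoken and cl100k_base as stand-ins:

```python
# Probe how many tokens each candidate symbol or phrase costs.
# cl100k_base is an assumption; swap in the encoding your model uses.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

candidates = [
    "greater than or equal to",  # verbose English phrase
    "≥",                         # mathematical operator
    "∧",                         # logical AND
    "Δ",                         # Greek delta, often read as "change in"
    "图书馆",                     # a CJK word ("library")
]

for text in candidates:
    print(f"{text!r}: {len(enc.encode(text))} token(s)")
```

If a symbol turns out to cost more tokens than the phrase it replaces, it doesn’t earn its place in the scheme.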
So if you take a sentence like:
“The boy wants to know if he has more pennies in his jar than his friend.”
You can:
- Strip unnecessary words.
- Replace “more than” with the > symbol.
- Replace repeated nouns with symbolic references.
This transforms verbose natural language into a kind of symbolic shorthand. To humans, it’s borderline unreadable. But to the LLM, it’s still interpretable—especially if we define a key or context schema.
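Here’s a minimal, rule-based sketch of those three steps applied to the penny-jar sentence. The word list, the regex, and the symbol choices (♂ for the boy, ◆ for his friend) are all illustrative placeholders; the actual system’s rules and glyphs differ.

```python
import re

LOW_WEIGHT = {"the", "to", "if", "he", "has", "in", "his"}  # illustrative only
SYMBOL_MAP = {"boy": "♂", "friend": "◆"}                    # symbolic references

def compress(sentence: str) -> str:
    words = [w.strip(".,?!").lower() for w in sentence.split()]

    # 1. Strip words that carry little contextual weight.
    words = [w for w in words if w not in LOW_WEIGHT]
    text = " ".join(words)

    # 2. Collapse the "more ... than" comparison into the > operator.
    text = re.sub(r"\bmore (.+?) than\b", r"\1 >", text)

    # 3. Swap known nouns for single-character symbolic references.
    return " ".join(SYMBOL_MAP.get(w, w) for w in text.split())

print(compress("The boy wants to know if he has more pennies in his jar than his friend."))
# -> "♂ wants know pennies jar > ◆"
```

As noted above, the output only stays interpretable if the model also gets the key, so in practice the symbol map (or a schema describing it) travels with the prompt.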
Toward a Unified Symbolic Compression Layer
I’ve been experimenting with building a unified compression method—a “Promptese,” if you will. The idea is that a paragraph goes in, and a compressed symbolic stream comes out. You might not be able to read it, but the LLM can.
If you compress a 1,000-token input down to 400–500 tokens, you’re drastically increasing how much information can fit into context. That’s huge for agents, pipelines, or advanced workflows where every token matters.
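If you want to check that kind of ratio on your own prompts, a quick measurement is enough. The encoding is the same tiktoken stand-in as before, and the strings are placeholders for your own text:

```python
# Measure how much a compressed prompt actually saves, in tokens.
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

original = "..."    # the full ~1,000-token prompt
compressed = "..."  # the same prompt after running it through the compressor

before = len(enc.encode(original))
after = len(enc.encode(compressed))
print(f"{before} -> {after} tokens ({after / before:.0%} of the original)")
```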
But Why Not Train the LLM?
Sure—you can fine-tune a small LLM on this system, train it on your own compressed language, and have it decode that format. But unless you’re deploying the model at scale, it’s not that useful. Most people are using hosted models (OpenAI, Anthropic, etc.), so they can’t retrain the base model.
That’s why this approach focuses on external compression. It works with existing models. It’s a symbolic wrapper. Think of it as zipping your thoughts before you send them.
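As a sketch of what that wrapper might look like against a hosted model, here’s one way to compress a prompt locally and ship it along with a symbol key. It assumes the OpenAI Python SDK (v1.x); the model name, the key format, and the helper function are my own illustrations, not the project’s actual interface.

```python
# "Zip your thoughts before you send them": compress locally, prompt a hosted model.
# Assumes the OpenAI Python SDK (pip install openai) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# The key/context schema that tells the model how to read the glyphs.
SYMBOL_KEY = "♂ = the boy, ◆ = his friend, > = 'has more than'"

def ask_compressed(prompt: str, compress) -> str:
    """Compress a prompt with the supplied function, then send it with its key."""
    compressed = compress(prompt)
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # any hosted chat model; illustrative choice
        messages=[
            {"role": "system", "content": f"Interpret the user message using this symbol key: {SYMBOL_KEY}"},
            {"role": "user", "content": compressed},
        ],
    )
    return response.choices[0].message.content

# Usage, with the compress() sketch from earlier:
# print(ask_compressed("The boy wants to know if he has more pennies in his jar than his friend.", compress))
```

Keeping the compression outside the API call is the whole point: nothing about the hosted model changes, only what you feed it.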
Wrapping Up
This is what I’ve been building. A language not for humans, but for AI. A hidden script of meaning woven in glyphs.
And in part two, I’ll walk through how the system works, as well as drop Ashfeather.
READ PART 2