Once python app.py is running, head to http://localhost:7860 in your browser. You’ll see two tabs.
This is where the action is.
As you type or adjust the slider, a status banner updates live:
On the right you’ll see:
Once the result appears, 👍 Helpful and 👎 Not helpful buttons show up below the metrics. Click either one to rate the result — the feedback is saved instantly. A note field then slides in where you can optionally type what worked well or didn’t (e.g. “lost key dates”, “too short”, “great summary”) and hit Save note. Both the rating and the note are stored with the run and visible in the History tab.
Every run saves automatically in the background. You don’t need to do anything.
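If you're curious what "saved with the run" could look like under the hood, here is a minimal sketch using Python's built-in `sqlite3`. The storage backend isn't described in this guide, so SQLite is an assumption here, as are the function names and the `note` column. The other column names match the ones shown in the History table.

```python
import sqlite3
from datetime import datetime, timezone


def open_store(path=":memory:"):
    """Open (or create) a run store. SQLite is assumed, not confirmed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS runs (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            timestamp TEXT NOT NULL,
            model TEXT NOT NULL,
            compression_ratio REAL,
            quality_score REAL,
            feedback TEXT,   -- e.g. 'helpful' / 'not_helpful'
            note TEXT        -- hypothetical column for the optional note
        )"""
    )
    return conn


def save_run(conn, model, compression_ratio, quality_score):
    """Insert a run as soon as a result is produced; returns the new row id."""
    cur = conn.execute(
        "INSERT INTO runs (timestamp, model, compression_ratio, quality_score) "
        "VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), model,
         compression_ratio, quality_score),
    )
    conn.commit()
    return cur.lastrowid


def set_feedback(conn, run_id, feedback=None, note=None):
    """Attach a rating and/or a note to an existing run, independently."""
    if feedback is not None:
        conn.execute("UPDATE runs SET feedback = ? WHERE id = ?", (feedback, run_id))
    if note is not None:
        conn.execute("UPDATE runs SET note = ? WHERE id = ?", (note, run_id))
    conn.commit()
```

Keeping the rating and the note as separate updates mirrors the UI flow: the rating lands the moment you click, and the note can follow later or never.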
Below the input box there’s a Show Token Highlights button. Click it and each token in your input gets rendered as a colour-coded chip — useful for seeing exactly where your budget is going. The panel updates live as you type. Click again to hide it.
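To give a feel for how a chip view like this can be built, here is a rough sketch that turns a list of tokens into colour-coded HTML spans. It is not the app's actual code: the regex tokenizer is a stand-in for the loaded model's tokenizer, and the palette and function names are made up for illustration.

```python
import html
import re

# Hypothetical palette; the real app's colours may differ.
PALETTE = ["#fde68a", "#bfdbfe", "#bbf7d0", "#fecaca"]


def simple_tokenize(text):
    """Stand-in tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", text)


def render_chips(tokens):
    """Wrap each token in a coloured <span>, cycling through the palette."""
    chips = []
    for i, tok in enumerate(tokens):
        colour = PALETTE[i % len(PALETTE)]
        chips.append(
            f'<span style="background:{colour};border-radius:4px;'
            f'padding:1px 4px;margin:1px">{html.escape(tok)}</span>'
        )
    return "".join(chips)
```

The `html.escape` call matters: user input goes straight into markup, so angle brackets and ampersands in the prompt need to be escaped before rendering.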
Click Model Settings at the top of the tab to expand the accordion. Pick a model from the dropdown (or type a custom Hugging Face model ID) and hit Load Model. The current model is unloaded from memory first, then the new one loads, so no restart is needed. The status box confirms when it’s ready.
Available presets: Qwen2.5-1.5B-Instruct (default), Qwen2.5-0.5B-Instruct, SmolLM2-1.7B-Instruct, Phi-3.5-mini-instruct, Llama-3.2-1B-Instruct.
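The unload-then-load order is what keeps memory use in check when switching models. A minimal sketch of that pattern is below. The `ModelSlot` class and its `loader` argument are hypothetical; in a real setup the loader might wrap something like `AutoModelForCausalLM.from_pretrained` from `transformers`, and you might also free GPU memory after dropping the old model.

```python
import gc


class ModelSlot:
    """Holds at most one model at a time (illustrative, not the app's class)."""

    def __init__(self, loader):
        self._loader = loader  # callable: model_id -> model object
        self.model_id = None
        self.model = None

    def swap(self, model_id):
        # Drop the old model first so two models never sit in memory together.
        self.model = None
        self.model_id = None
        gc.collect()

        # Only now load the replacement.
        self.model = self._loader(model_id)
        self.model_id = model_id
        return f"Loaded {model_id}"
```

One consequence of this ordering: if the new model fails to load, the slot is left empty rather than falling back to the previous model. That is a trade-off in this sketch, not a statement about how the app behaves.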
Below the compression model section in the same accordion, there’s a separate Embedder Model dropdown. The embedder is what computes the quality score — changing it affects how accurately that score reflects meaning retention.
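A common way to score meaning retention with an embedder is cosine similarity between the embedding of the original text and the embedding of the compressed text. This guide doesn't spell out the exact formula the app uses, so treat the sketch below as one plausible approach. The `embed` argument is a hypothetical callable that maps text to a vector.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # no direction to compare against
    return dot / (norm_a * norm_b)


def quality_score(embed, original, compressed):
    """Assumed definition: similarity of the two texts in embedding space."""
    return cosine_similarity(embed(original), embed(compressed))
```

Under this definition, a score near 1 means the compressed text points in almost the same direction as the original in embedding space, which is why a stronger embedder tends to give a more trustworthy score.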
When you select a model from the dropdown, an info panel updates immediately to explain the trade-off:
Hit Load Embedder to apply the selection. The previous embedder is unloaded from memory before the new one loads.
Open the History tab to see everything that’s been compressed so far.
The table loads automatically when you open the tab. Hit Refresh to pull in the latest runs. At the top you’ll find the average quality score and compression ratio across all sessions — a quick way to see how the tool is performing over time.
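The two headline numbers are plain averages over the stored runs. Here is a small sketch of how they could be computed, assuming each run is a dict with `quality_score` and `compression_ratio` keys (the column names from the table). The function name and the choice to skip missing values are assumptions.

```python
from statistics import fmean


def aggregate(runs):
    """Average quality score and compression ratio, ignoring missing values."""
    quality = [r["quality_score"] for r in runs
               if r.get("quality_score") is not None]
    ratio = [r["compression_ratio"] for r in runs
             if r.get("compression_ratio") is not None]
    return {
        "avg_quality_score": fmean(quality) if quality else None,
        "avg_compression_ratio": fmean(ratio) if ratio else None,
    }
```

Returning `None` for an empty history avoids a division-by-zero error and lets the UI show a placeholder instead of a misleading 0.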
By default the table shows: id, timestamp, model, compression_ratio, quality_score, feedback. Open the Column visibility accordion above the table to toggle any additional columns on or off — changes apply instantly without a refresh.
Click any row in the table and a word-level diff panel opens below it. Words are colour-coded:
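A word-level diff of this kind can be put together with `difflib` from Python's standard library. The sketch below is an illustration rather than the app's actual implementation: it labels each word as kept, removed, or added, and a UI could then map those labels to colours. The label names are made up for this example.

```python
from difflib import SequenceMatcher


def word_diff(original, compressed):
    """Return (label, word) pairs comparing two texts word by word."""
    a, b = original.split(), compressed.split()
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out += [("kept", w) for w in a[i1:i2]]
        else:
            # 'delete' has an empty b-slice, 'insert' an empty a-slice,
            # and 'replace' contributes to both lists.
            out += [("removed", w) for w in a[i1:i2]]
            out += [("added", w) for w in b[j1:j2]]
    return out
```

For example, comparing "the quick brown fox" with "the brown fox" marks "quick" as removed and the other three words as kept.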
Click a row to select it, then hit Delete Selected Row. The table refreshes and the aggregate stats update automatically.