flow-st8: local, private voice-to-text with no subscription and no screenshots of your screen

It all started at a workshop with Leandro Rezende from UX Unicórnio. He was using a voice-to-text tool that completely changed how he interacted with AI. You speak instead of typing, and that changes everything when you are building a long, detailed spec for an agent. The appeal was immediate: I signed up for a trial of a well-known tool, and the shift was real. Without the friction of the keyboard in the way, my prompts became more complete and more faithful to what I was actually thinking. The work became much faster. When the trial ended, the question was simple: why pay?
The problem with the alternatives
Before building anything, I looked into what already existed. Several voice-to-text tools on the market take constant screenshots of your screen while you speak. The stated rationale is to give the transcription context, a feature usually called Context Awareness. What is not made clear at installation is that those screenshots are sent to external servers.
In practice: you are building a confidential spec, working on a sensitive project, or drafting a proposal, and the tool is photographing your screen in a loop and sending it to the cloud. A rather peculiar design choice, let's say. Fortunately, after strong community criticism, using your data for training is now opt-in on these tools rather than enabled by default.
Beyond privacy, there was memory consumption. In constant use, these tools reached 700–800 MB of memory. For something that stays open all day in the background, that is quite a lifestyle choice.
What flow-st8 is
flow-st8 is my local version of what I needed: a voice dictation tool for Windows, offline, no subscription. Press Ctrl+Win, speak, press again — the text is transcribed by Whisper (OpenAI's open source model) and pasted wherever the cursor is. Works in any app: Claude, GPT, Slack, email, documents.
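The press, speak, press-again cycle described above can be sketched as a two-state toggle. This is a minimal illustration under stated assumptions, not flow-st8's actual code: the class name `DictationToggle` and its four injected callables are hypothetical, and the fakes at the bottom stand in for the real microphone, the Whisper call, and the paste step.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class DictationToggle:
    """Hypothetical push-to-toggle controller: the first hotkey press starts
    recording, the second stops, transcribes, and pastes the result."""

    start_recording: Callable[[], None]
    stop_recording: Callable[[], bytes]   # returns the captured audio
    transcribe: Callable[[bytes], str]    # e.g. a local Whisper call
    paste: Callable[[str], None]          # e.g. write at the cursor position
    recording: bool = False

    def on_hotkey(self) -> None:
        if not self.recording:
            # First press: begin capturing audio.
            self.start_recording()
            self.recording = True
            return
        # Second press: stop, transcribe locally, paste non-empty text.
        audio = self.stop_recording()
        self.recording = False
        text = self.transcribe(audio).strip()
        if text:
            self.paste(text)


# Demo with fakes (assumptions): no microphone, model, or clipboard involved.
pasted: List[str] = []
toggle = DictationToggle(
    start_recording=lambda: None,
    stop_recording=lambda: b"fake-audio",
    transcribe=lambda audio: "  hello from the microphone ",
    paste=pasted.append,
)
toggle.on_hotkey()  # first press: start recording
toggle.on_hotkey()  # second press: stop, transcribe, paste
print(pasted)
```

Injecting the four steps as callables keeps the toggle logic testable without audio hardware. In a real tool they would be bound to a global hotkey listener, an audio recorder, and a local Whisper model.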
What it does NOT do, and this matters: it does not take screenshots of your screen, does not send audio to external servers, does not require a subscription or an internet connection, and does not store anything outside your machine.
What it does: it transcribes locally with OpenAI Whisper; uses 200–300 MB of memory in constant use; transcribes in about 1 s on an NVIDIA GPU, with lightweight models available for machines without one; and filters silence automatically via voice activity detection (VAD), without cutting you off mid-speech. For those without a dedicated GPU, flow-st8 offers several Whisper models, from tiny (about 39M parameters, fast) to large-v3-turbo (excellent quality). You choose based on your hardware.
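The text above does not say which VAD implementation flow-st8 uses. As a rough illustration of the idea only, the sketch below swaps in a simple energy threshold (per-frame RMS) and keeps a few frames of padding around speech so that short pauses are not cut. The function names, frame size, threshold, and padding are all assumptions chosen for the demo. Production VADs are typically model-based and more robust to noise.

```python
import math
from typing import List, Sequence


def frame_rms(samples: Sequence[float]) -> float:
    """Root-mean-square energy of one frame of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def filter_silence(
    samples: Sequence[float],
    frame_size: int = 160,      # assumption: 10 ms frames at 16 kHz
    threshold: float = 0.02,    # assumption: RMS level treated as speech
    padding_frames: int = 2,    # quiet frames kept on each side of speech
) -> List[float]:
    """Drop long silences, keeping short pauses that sit next to speech."""
    frames = [samples[i:i + frame_size] for i in range(0, len(samples), frame_size)]
    is_speech = [frame_rms(f) >= threshold for f in frames]
    kept: List[float] = []
    for idx, frame in enumerate(frames):
        lo = max(0, idx - padding_frames)
        hi = min(len(frames), idx + padding_frames + 1)
        # Keep a frame if it, or any neighbour within the padding, is speech.
        if any(is_speech[lo:hi]):
            kept.extend(frame)
    return kept


def tone(n: int) -> List[float]:
    """Synthetic 440 Hz tone at 16 kHz, standing in for speech."""
    return [0.5 * math.sin(2 * math.pi * 440 * i / 16000) for i in range(n)]


def silence(n: int) -> List[float]:
    return [0.0] * n


# 100 ms silence, 50 ms "speech", 20 ms pause, 50 ms "speech", 100 ms silence.
signal = silence(1600) + tone(800) + silence(320) + tone(800) + silence(1600)
trimmed = filter_silence(signal)
print(len(signal), "->", len(trimmed))  # 5120 -> 2560
```

The 20 ms pause between the two tone bursts survives because it falls within the padding of speech on both sides, while the long leading and trailing silences are reduced to two frames each.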
The numbers that matter
After three months of intensive use with AI agents: speech is about 3x faster than typing; specs that took 20–30 minutes to type are ready in 8–10 minutes of speech plus revision; memory usage is 200–300 MB versus 700–800 MB for the commercial alternatives; monthly cost is R$0.
The main gain is not speed — it is quality of thinking. When you are not worried about typing, you say what you are really thinking: all the details, connections, nuances. The spec gets better. The agent output gets better.
Spec-Driven Development with voice
The use case that most transformed my day-to-day is building specs for AI agents. In Spec-Driven Development, the quality of the spec defines the quality of the result. The more precise, contextualized, and detailed, the better the agent performs. The problem is that good specs are long — and typing long specs is where most people cut corners.
With local voice-to-text, that trade-off disappears. You speak the complete spec, iterate by voice, refine in real time. The cycle became much faster than any typing method.
And knowing that nothing is leaving the machine changes the willingness to use it without filter. You speak more freely when you know no one is listening — or photographing.
On lasting — or not
Worth being honest: AI interfaces are gradually adding native voice input. Claude has it, GPT has it, others will too. When that becomes universal, flow-st8 loses most of its argument.
For now, it still makes sense — terminals, IDEs, code chats like Cursor, email, anywhere outside AI interfaces. The utility exists today, is real, and did not take long to build.
When it stops making sense, it will change or be retired. That is fine. This is what distinguishes a tool built with awareness from one built with ego: you do not need to defend it forever. It only needs to be worth the time you invested, and it already was.
What I learned building this
I did not create this from scratch as a lone genius. I saw the problem, used someone else's solution, understood what was underneath, and built the version that solved what bothered me: privacy, memory consumption, and zero cost.
The most honest way to innovate: start from something that works, understand why it works, and improve what matters to you. The repository is public and documented. If you use Windows, work with AI agents, and were mildly disturbed by the screenshot part — you will probably recognize yourself in this story.