
A "privacy mode" toggle on a cloud dictation tool is theater. You flip it, a reassuring label lights up, and your audio still leaves the machine the instant you press record. The switch didn't change the one fact that defines your privacy. It changed a setting on a server you'll never see.

This isn't an accusation against any one product. It's a description of how the category is built. And once you see it, you can't unsee it — because the marketing word "private" has been stretched over architectures that have almost nothing in common.

What "privacy mode" actually does

On most cloud dictation tools, privacy mode governs retention: how long the vendor keeps your audio after it has been transcribed. Turn it on, and the company promises to delete the recording quickly, or to not store it at all. That's a real, meaningful promise. It is also not the thing most people think they're buying.

It's worth being precise here, because the leading cloud tool is explicit about this. Its privacy mode is opt-in (and switched on automatically for Enterprise), and even with it on, transcription still happens in the cloud: the toggle changes retention, not location. That's not a gotcha; it's stated design. The point is just that "privacy mode" governs the destination's memory. It doesn't remove the destination.

What privacy mode almost never does is change where the transcription happens. The audio is still captured on your device, encrypted, and streamed to the vendor's cloud, where a speech-to-text model turns it into words and sends them back. The toggle adjusts what happens to the recording at the destination. It does nothing about the trip.

So the honest way to read most "privacy mode" switches is: your voice still goes to the cloud — we just promise to forget it faster.

Follow the audio

Here is the actual path a spoken sentence takes through a typical cloud dictation tool:

Figure: the journey of one spoken sentence through a typical cloud dictation tool. Steps 1 and 2 happen on your device; steps 3 to 5 happen on companies' servers, out of your hands; step 6, the returned text, is the only part you ever see. Privacy mode only shortens how long step 3 keeps a copy.
  1. Your microphone captures audio on your device.
  2. The app opens an encrypted connection and streams that audio out — through your local network, your ISP, and the public internet.
  3. It arrives at the vendor's cloud, where it's held in memory (and sometimes on disk) long enough to run the speech-to-text model.
  4. That model is frequently a third-party API — a separate speech company, or a large AI provider. So the audio is now on a second vendor's infrastructure.
  5. Many tools then send the transcribed text to another model — an LLM — to clean up filler words, fix punctuation, or reformat. Another hop, another vendor.
  6. The finished text comes back and appears in your document.

Count the places your voice or its transcript now exists on hardware you don't own: at minimum the primary vendor, usually a speech sub-processor, often an LLM provider, plus the cloud platforms and analytics services disclosed in the fine print. Each of those is a place that can log, cache, be misconfigured, or be compromised. Even with perfect encryption in transit, transmission itself is an attack surface: the audio passes through infrastructure no single privacy policy fully controls. (Voibe: voice data privacy; Typeless sub-processor analysis.)

Privacy mode shortens how long step 3 keeps a copy. The rest of the journey happens exactly the same way whether the toggle is on or off.
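
If it helps to see that as data, here is a minimal sketch of the six steps above. It is an illustration, not any vendor's code: the `Hop` type, the party labels, and the retention strings are all made up for this example, and steps 4 and 5 are modeled as always present even though real tools only "frequently" include them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    step: int
    party: str      # who operates the hardware at this step (illustrative label)
    payload: str    # "audio" or "text"
    on_device: bool


def cloud_pipeline(privacy_mode: bool) -> tuple[list[Hop], str]:
    """Return the hops plus the primary vendor's retention behaviour.

    privacy_mode only changes the retention string; the hops are identical.
    """
    hops = [
        Hop(1, "you (microphone)", "audio", True),
        Hop(2, "you (app streams audio out)", "audio", True),
        Hop(3, "primary vendor", "audio", False),
        Hop(4, "speech sub-processor", "audio", False),  # assumed present
        Hop(5, "LLM provider", "text", False),           # assumed present
        Hop(6, "you (text in document)", "text", True),
    ]
    retention = (
        "deleted quickly or never stored" if privacy_mode
        else "kept per vendor default"
    )
    return hops, retention


def off_device_parties(hops: list[Hop]) -> list[str]:
    """List every party that handles the data on hardware you don't own."""
    return [hop.party for hop in hops if not hop.on_device]


hops_on, retention_on = cloud_pipeline(privacy_mode=True)
hops_off, retention_off = cloud_pipeline(privacy_mode=False)

print(off_device_parties(hops_on))
print(off_device_parties(hops_on) == off_device_parties(hops_off))  # True
print(retention_on != retention_off)                                # True
```

The toggle changes one string about step 3. The list of parties that touch your words is the same either way.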

A policy is not an architecture

This is the distinction the whole category blurs.

"Zero retention." "We never train on your data." "Deleted within 30 days." These are policies — promises about behavior. They're enforced by trust, contracts, and audits. And they can change: a terms-of-service update, an acquisition, a new "improve our models" default that ships opted-in. They can be broken: a breach, a misconfigured bucket, a sub-processor that logged more than it should. And they can be overridden: a subpoena doesn't care what your settings page says.

To be fair, the serious providers take these policies seriously. Some major cloud speech APIs default to not storing audio at all for real-time transcription, or make data logging opt-in rather than opt-out. (Microsoft Azure Speech-to-text data handling; Google Cloud Speech-to-Text data usage.) Others, by default, may log audio or route it past human reviewers unless you actively opt out. (Comparison of cloud speech-to-text privacy policies.) The defaults vary, and some are genuinely good.

But notice what even the best of those defaults is: a careful promise about audio that has already left your machine. That's the ceiling of the policy approach. It can be excellent and still be a promise.

Architecture is a different kind of guarantee. If the audio never leaves the device, there is no recording in someone else's memory to retain, no copy to leak, no file to subpoena, no sub-processor to trust, no future ToS change that can reach it. The privacy isn't asserted in a document. It's enforced by where the bytes physically are. You don't have to believe anyone.

That's the whole point: privacy as a policy is something you're promised. Privacy as architecture is something you're given.

davr's stance: keep the audio on the machine

davr offers a local transcription option built on a local Whisper engine. When it's on, speech-to-text runs on-device — the audio doesn't leave the machine. The data flow collapses from the six-hop journey above to something close to: microphone → on-device model → text in your document. There's no destination to retain anything, because there's no destination.

We'll be straight about how the pieces fit, because showing the data flow is the only honest way to claim privacy. Your words could leave the machine at two points, and davr gives you a switch for each. Local keeps the Whisper speech model on your device, so the audio never goes to OpenAI. Privacy Mode turns off the Claude cleanup pass, so the text never goes to Anthropic. Turn on both and the entire path — audio and text — stays on your machine. Leave one off and that part can still reach a provider. That's the honest version, and it's the whole point: complete on-device privacy here isn't a slogan, it's two switches you control.
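
The two switches can be written down as a small truth table. This is a sketch of the description above, not davr's actual settings code: the function name `data_leaving_device` and its parameter names are hypothetical, chosen only for this example.

```python
def data_leaving_device(local_transcription: bool, privacy_mode: bool) -> dict[str, str]:
    """Map each kind of data to the provider it can reach, given the two switches.

    An empty dict means nothing leaves the machine.
    """
    leaving: dict[str, str] = {}
    if not local_transcription:
        # Without Local, the audio can go to the cloud speech model.
        leaving["audio"] = "OpenAI (Whisper speech model)"
    if not privacy_mode:
        # Without Privacy Mode, the transcribed text can go to the cleanup pass.
        leaving["text"] = "Anthropic (Claude cleanup pass)"
    return leaving


for local in (True, False):
    for private in (True, False):
        result = data_leaving_device(local, private)
        print(f"Local={local!s:5} PrivacyMode={private!s:5} -> {result or 'nothing leaves the device'}")
```

Only the row with both switches on comes back empty. Every other row names exactly which part of your words can still reach a provider.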

What we won't do is call a retention setting "privacy" and hope you don't follow the audio. The choice davr gives you is the architectural one: the audio can stay on the device. That option existing is the difference between a promise and a guarantee.

Why the action layer raises the stakes, not lowers them

Here's the part that makes this more than a philosophical point.

davr isn't only dictation. It's an action layer. You can dictate a private message through Veil — steganographic, encrypted messaging that buries what you said inside innocent-looking cover text only your contact can decode (and Veil Local runs fully offline, no server). You can translate as you speak — say one language, output another. You can highlight text and Transform it by voice, or hand a spoken thought to Scribe to turn it into a finished post. You can ask Claude from a hotkey and get the answer dropped into whatever app you're in. Dictation is table stakes. The point is that your words drive actions, everywhere.

Sit with what that means for privacy. The more your voice does, the more of your life flows through this one pipe. Not just notes to yourself — a message you encoded because it was sensitive, replies to clients, the same thought rewritten in another language, the half-formed idea you wanted Claude to sharpen. An action layer is, by design, a tool you talk to about more things, more often, more candidly. A feature like Veil exists precisely for words you don't want exposed — which makes it the worst possible thing to route through hardware you don't control.

A category that's still arguing about whether to keep your dictation private is not ready for that. The richer the thing you build on top of voice, the less acceptable it is for the voice to be the part you can't pin down. On-device processing isn't a nice-to-have you bolt onto an action layer — it's the precondition that makes an action layer trustworthy enough to live in. You don't hand your whole working day to a pipe you have to take on faith.

That's why we lead with architecture. Not because privacy is a feature we're proud of, but because it's the floor everything else stands on.

Try it the way we described it

If "private voice dictation" should mean the audio can stay on your computer — not that a server promises to forget it — that's the version davr is built to give you. The local transcription option is there so the privacy is a property of the system, not a setting you have to trust.

You can test the claim yourself, and there's a path that costs nothing. davr is free when you bring your own API key: dictation runs on your own OpenAI or Anthropic account, so any cloud step goes through your provider, not ours. And if you'd rather try the managed AI features first, there's a 14-day trial with no credit card. Either way: turn on local transcription, dictate something you'd never paste into a stranger's server, and watch where it goes. The whole argument of this post is that you should be able to check.
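
One rough way to do the watching, sketched under assumptions: collect the remote addresses the app connects to while you dictate, using whatever connection monitor your operating system provides (for example `lsof -i -n -P` or `netstat`), then filter out anything that stayed on the machine. The helper name `off_machine` and the sample addresses below are illustrative only.

```python
import ipaddress


def off_machine(remote_addrs: list[str]) -> list[str]:
    """Return the addresses that are not loopback.

    Loopback (127.0.0.0/8, ::1) never leaves the machine. Anything else,
    including private LAN addresses, means data went out over a network.
    """
    return [
        addr for addr in remote_addrs
        if not ipaddress.ip_address(addr).is_loopback
    ]


# Sample input: addresses you copied from your connection monitor.
observed = ["127.0.0.1", "::1", "203.0.113.7"]
print(off_machine(observed))  # ['203.0.113.7']
```

An empty result is consistent with the audio staying local, but it is only as good as the sample you captured, so take the reading while you are actually dictating.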

Start free with your own key, or take the 14-day trial — no card required.
