Tap the floating button, speak, tap again, and your text is inserted into the currently focused field.
No keyboard switching. Local or cloud transcription.
In Dev mode, say “command mode” and then either describe the command you want or literally spell it. The spoken text and the inserted text do not have to be the same.
command mode show files in current dir
→ ls -l .
command mode git commit minus m description
→ git commit -m "Clean up overlay timing"
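The source doesn't spell out how command mode maps speech to a command, and a free-form description clearly needs more than string rules. Purely as a toy illustration of the spoken-text-versus-inserted-text gap (hypothetical function, not the app's code), spoken flag words like "minus m" can be normalized before insertion:

```python
def normalize_spoken_flags(spoken: str) -> str:
    """Toy sketch: turn a spoken flag word into shell flag syntax.

    Hypothetical helper for illustration only -- the real app also
    accepts a free-form description, which no string rule covers.
    """
    # "minus m" spoken aloud becomes the flag "-m".
    return spoken.replace(" minus ", " -")

print(normalize_spoken_flags("git commit minus m description"))
# → git commit -m description
```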
A small floating button lives on top of your apps. Tap it once to start recording.
Dictate naturally. The button pulses red while it's listening.
The button turns gray while the transcription runs.
Transcription happens on-device using local models, or your audio is sent to OpenAI Whisper with your own API key.
The transcribed text goes into the currently focused field when the app exposes a standard Android input field. If insertion fails, it falls back to the clipboard.
Both modes use your own hardware or your own API key. I run no backend.
Phone Whisper uses the Android Accessibility Service for one narrow reason: inserting dictated text into the currently focused text field across apps.
It does not replace your keyboard. It does not run background automation. It only acts after you explicitly tap the overlay button.
The app is open source. You can read exactly what it does before granting the permission.
Read the source code on GitHub
No audio leaves your phone. Transcription runs entirely on your hardware.
Audio is sent from your device straight to the OpenAI API using your own key. I don't operate a relay server.
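For reference, OpenAI's transcription endpoint is a plain multipart POST, which is why no relay server is needed. A minimal sketch of the request shape using only the Python standard library (endpoint and field names per OpenAI's public Audio API; this is not the app's actual client code):

```python
import io
import urllib.request
import uuid

OPENAI_URL = "https://api.openai.com/v1/audio/transcriptions"

def build_transcription_request(audio_wav: bytes, api_key: str,
                                model: str = "whisper-1") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data request for Whisper."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    # Plain form field carrying the model name.
    body.write((f"--{boundary}\r\n"
                'Content-Disposition: form-data; name="model"\r\n\r\n'
                f"{model}\r\n").encode())
    # File field carrying the recorded audio clip.
    body.write((f"--{boundary}\r\n"
                'Content-Disposition: form-data; name="file"; filename="clip.wav"\r\n'
                "Content-Type: audio/wav\r\n\r\n").encode())
    body.write(audio_wav)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return urllib.request.Request(
        OPENAI_URL,
        data=body.getvalue(),
        headers={
            # Your own key goes straight to OpenAI; nothing in between.
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
        method="POST",
    )
```

Sending the built request with `urllib.request.urlopen` returns JSON containing the transcribed text on success.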
The full source code is on GitHub. No hidden behavior.
A custom keyboard means replacing your existing keyboard entirely. That's a lot of friction. Phone Whisper leaves your keyboard alone and works on top of any app, any keyboard.
Android doesn't have a standard API for inserting text into another app's focused field. The Accessibility Service is the sanctioned way to do that cross-app. Phone Whisper uses it for exactly one thing: inserting the transcribed text.
In local mode, yes. In cloud mode, audio is sent directly to OpenAI's API from your device using your own API key. I don't have a backend and never see your audio.
It works in most apps with standard text fields. Some apps restrict text injection for security reasons, and others use custom text surfaces instead of normal input fields. In those cases, the transcribed text is copied to your clipboard as a fallback.
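The insert-or-clipboard decision above is simple to model. A toy sketch of that flow (the argument names are hypothetical stand-ins for the platform calls, not the app's Kotlin code):

```python
def deliver(text, insert_into_field, copy_to_clipboard):
    """Toy model of the delivery flow: try direct insertion into the
    focused field; if the app restricts injection or exposes no standard
    input field, fall back to copying the text to the clipboard."""
    try:
        insert_into_field(text)
        return "inserted"
    except RuntimeError:
        copy_to_clipboard(text)
        return "clipboard"
```

Either way the dictated text ends up one paste away at worst, which is what keeps the tool usable even in restrictive apps.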
Termux's main terminal area is not a standard Android text field, so direct insertion may not work there. Swipe the extra keys row left or right to switch to Termux's native text input box, then dictate there.
Not yet. I'm shipping the APK directly first, tightening the experience, and deciding later whether a Play Store release makes sense.
Not really. It's an early but working MVP. The core loop already works well enough to use every day, and I'm improving it in public.