Privacy-first OCR on Android: how Subly reads bills without the cloud
A practical 2026 guide to on-device OCR on Android with ML Kit Text Recognition — the setup, the code, and the field extraction tricks that keep user data off your servers.
The moment a user points their phone at a bill, your app has a choice. You can ship that image to a cloud OCR service and have polished text back in 400 ms — at the cost of routing somebody’s private financial document through your servers and a third party. Or you can do the work on the device, write a more honest privacy policy, and never have to think about another data breach being your fault.
I picked the second path for Subly, an Android subscription tracker that lets you scan a Netflix or Spotify invoice and have the service name, amount, frequency, and next charge date filled in for you. Nothing leaves the phone. This post walks through how that works in 2026 — the setup, the code, and the field-extraction tricks I learned the hard way.
Why on-device, when the cloud is so easy
Cloud OCR APIs are a one-liner. Google Cloud Vision, AWS Textract, Azure Read — they all give you crisp results and they all want your users’ images. For a subscription tracker, those images can include:
- The account holder’s name, full email address, or the last 4 digits of their card
- Address blocks at the bottom of a printed bill
- Payment method details
- The full price, line items, and dates
Nothing about that list is something I want sitting in a third party’s logs. And from a product standpoint, “we never send your bills anywhere” is a much stronger marketing line than “we send your bills to Google Cloud and delete them within 30 days.” So the architecture follows the value: keep the image on the device, persist only the extracted fields, and don’t open an internet connection for the scan at all.
The most private byte is the one that never leaves the phone.
ML Kit Text Recognition v2: the workhorse
Google’s ML Kit Text Recognition v2 is, in 2026, the boring-but-correct answer for on-device OCR on Android. It ships with the Latin model bundled into your app (about 4 MB), runs on a Pixel-class phone in tens of milliseconds, and handles printed text well enough for receipts, invoices, and email screenshots.
What I like about it:
- No network permission required. You can ship an app with the INTERNET permission removed and the OCR still works. That alone is worth the integration time.
- Runs offline. Airplane-mode scans are a feature, not a bug.
- Per-app model. Bundling the model means no surprise download on first launch and no “Try Again” dialog when the user’s Wi-Fi is flaky.
- Free. No API costs as you scale. For an indie app, that’s the difference between launching and overthinking.
The trade-off is accuracy on edge cases: handwriting is hit-or-miss, dense thermal-printed receipts can confuse the layout, and non-Latin scripts need a different model. For the receipt/invoice/email use case, it’s more than good enough.
Setup in 2026
The Gradle dependency:
// app/build.gradle.kts
dependencies {
    implementation("com.google.mlkit:text-recognition:16.0.1")
    implementation("androidx.camera:camera-camera2:1.4.0")
    implementation("androidx.camera:camera-lifecycle:1.4.0")
    implementation("androidx.camera:camera-view:1.4.0")
}
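With the bundled dependency above, the model is already inside your app and needs no manifest entry. The one line below only matters if you later switch to the unbundled Google Play services variant (com.google.android.gms:play-services-mlkit-text-recognition) to save install size; it asks Play to download the model when the app is installed instead of on the first scan: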
<application>
    <meta-data
        android:name="com.google.mlkit.vision.DEPENDENCIES"
        android:value="ocr" />
</application>
That’s the whole setup. No API keys, no service account JSON, no signed requests.
Recognizing text from a CameraX frame
Here’s the core loop. CameraX feeds you ImageProxy frames, you wrap one in an InputImage, and the recognizer hands back a Text object made of blocks, lines, and elements, each carrying a bounding box.
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions
import androidx.camera.core.ImageAnalysis
import androidx.camera.core.ImageProxy

// One recognizer instance, reused for every frame.
private val recognizer = TextRecognition.getClient(
    TextRecognizerOptions.DEFAULT_OPTIONS,
)

@androidx.camera.core.ExperimentalGetImage
fun analyzeFrame(image: ImageProxy, onScanned: (String) -> Unit) {
    // No backing media image: release the frame and bail out.
    val media = image.image ?: return image.close()
    val input = InputImage.fromMediaImage(media, image.imageInfo.rotationDegrees)
    recognizer.process(input)
        .addOnSuccessListener { result ->
            // Whole page as a single string, line breaks preserved.
            onScanned(result.text)
        }
        // Runs on success and on failure, so the frame is closed exactly once.
        .addOnCompleteListener { image.close() }
}
A couple of things worth knowing:
- Always call image.close() exactly once, even on error. CameraX holds back the next frame until the current ImageProxy is closed, so a missed close silently stalls the analyzer, and ML Kit’s listener pattern makes that easy to forget.
- result.text is the easy mode. For better extraction you’ll want result.textBlocks so you can keep track of which lines are next to each other on the receipt.
- ML Kit returns bounding rectangles too. They’re useful for drawing overlays (“we detected this”) and for sanity-checking that the user actually framed the bill.
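To make the point about neighbouring lines concrete, here is a minimal sketch of grouping recognized lines into visual rows, so that a label and the price printed beside it end up together. It is plain Kotlin: OcrLine is a hypothetical stand-in for an ML Kit line (its text plus bounding-box edges), groupIntoRows is an illustrative name, and the half-line-height tolerance is an assumption you would tune on real receipts.

```kotlin
import kotlin.math.abs

// Hypothetical stand-in for one recognized line: text plus bounding-box edges in pixels.
data class OcrLine(val text: String, val left: Int, val top: Int, val bottom: Int) {
    val centerY: Int get() = (top + bottom) / 2
    val height: Int get() = bottom - top
}

// Groups lines whose vertical centers are close into one visual row,
// then orders each row left to right.
fun groupIntoRows(lines: List<OcrLine>): List<List<OcrLine>> {
    val rows = mutableListOf<MutableList<OcrLine>>()
    for (line in lines.sortedBy { it.centerY }) {
        val current = rows.lastOrNull()
        // Assumption: "same row" means the centers differ by less than
        // half the height of the row's first line.
        if (current != null &&
            abs(line.centerY - current.first().centerY) < current.first().height / 2
        ) {
            current.add(line)
        } else {
            rows.add(mutableListOf(line))
        }
    }
    return rows.map { row -> row.sortedBy { it.left } }
}
```

In an app you would fill OcrLine from each recognized line’s text and bounding box (skipping lines where the box is missing) and then run the field extractors over the joined row strings instead of the raw result.text.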
Field extraction: from raw text to structured data
OCR gives you text. The interesting work is turning that text into a subscription. For Subly, that means four fields:
- Service name — “Netflix”, “Spotify”, “ChatGPT Plus”
- Amount — “₺199.00” or “$9.99”
- Billing frequency — monthly, yearly
- Next charge date — the date in the future
A pragmatic on-device pipeline looks like this:
data class ExtractedSubscription(
    val service: String?,
    val amount: Money?,
    val cycle: BillingCycle?,
    val nextDate: LocalDate?,
)

fun extract(text: String): ExtractedSubscription {
    val lines = text.lines().map { it.trim() }.filter { it.isNotEmpty() }
    return ExtractedSubscription(
        service = findService(lines),
        amount = findAmount(lines),
        cycle = findCycle(lines),
        nextDate = findNextDate(lines),
    )
}
Each find* function is a small, testable thing. A few lessons from the actual implementation:
- Service name is the hardest one. A dictionary of common services (Netflix, Spotify, ChatGPT, YouTube Premium, Apple Music…) catches 80% of cases. For the rest, the first prominent line of the receipt or a heuristic around the From: / Sent by: header in invoice emails works well.
- Amount is the easiest. A separator-tolerant regex like [₺$€£]\s?(?:\d{1,3}(?:[.,]\d{3})+|\d+)(?:[.,]\d{2})? covers most symbol-prefixed amounts, with or without thousands grouping. Always strip thousand separators before parsing.
- Cycle comes from the language around the amount: “per month”, “/mo”, “monthly”, “aylık”, “yearly”, “/yr”. A small Turkish + English vocabulary covers what you need.
- Next date is two strategies: pick up explicit phrases (“next billing on”, “renews on”) if present, otherwise infer from the latest date on the receipt plus one cycle.
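The service and amount lessons can be sketched in plain Kotlin roughly like this. Treat it as an illustration under stated assumptions, not Subly’s actual code: Money here is a hypothetical minimal type, KNOWN_SERVICES is a starter dictionary, normalizeNumber is a helper I made up for the separator stripping, and the “first line” fallback is a crude proxy for “first prominent line”. The pattern accepts both grouped (1.299,00) and ungrouped (1999.00) digits after a currency symbol.

```kotlin
import java.math.BigDecimal

// Hypothetical minimal Money type; the real one is not shown in this post.
data class Money(val currency: String, val value: BigDecimal)

// Hypothetical starter dictionary: lowercase needle to display name.
val KNOWN_SERVICES = mapOf(
    "netflix" to "Netflix",
    "spotify" to "Spotify",
    "chatgpt" to "ChatGPT",
    "youtube premium" to "YouTube Premium",
    "apple music" to "Apple Music",
)

// Dictionary hit first; otherwise fall back to the first line as a rough guess.
fun findService(lines: List<String>): String? {
    val text = lines.joinToString(" ").lowercase()
    return KNOWN_SERVICES.entries.firstOrNull { text.contains(it.key) }?.value
        ?: lines.firstOrNull()
}

// Currency symbol, then grouped digits (1,299 or 1.299) or plain digits,
// then an optional two-digit decimal part.
val AMOUNT = Regex("""([₺€£$])\s?((?:\d{1,3}(?:[.,]\d{3})+|\d+)(?:[.,]\d{2})?)""")

// A separator sitting three characters from the end is read as the decimal point;
// every other separator is treated as thousands grouping and dropped.
fun normalizeNumber(raw: String): String {
    val hasDecimals = raw.length >= 3 &&
        (raw[raw.length - 3] == '.' || raw[raw.length - 3] == ',')
    val digits = raw.filter { it.isDigit() }
    return if (hasDecimals) digits.dropLast(2) + "." + digits.takeLast(2) else digits
}

// Returns the first symbol-prefixed amount found, scanning top to bottom.
fun findAmount(lines: List<String>): Money? =
    lines.firstNotNullOfOrNull { line ->
        AMOUNT.find(line)?.let { m ->
            Money(m.groupValues[1], BigDecimal(normalizeNumber(m.groupValues[2])))
        }
    }
```

Taking the first match is the simplest choice and can pick a line item instead of the total; preferring lines that mention “total” would be a reasonable next step. Amounts with a trailing symbol (199,00 ₺) are not handled by this sketch.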
None of this is glamorous. All of it runs on the device, so a failed extraction never means a leaked invoice.
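For the cycle and next-date lessons, a minimal sketch might look like the following. The assumptions are mine: BillingCycle is a hypothetical enum, “per year” and “yıllık” are added by analogy with the vocabulary above, only two unambiguous date formats are parsed (2026-03-14 and 14.03.2026), and findNextDate takes the detected cycle as a second parameter so it can do the “latest date plus one cycle” inference.

```kotlin
import java.time.LocalDate
import java.time.format.DateTimeFormatter
import java.time.format.DateTimeParseException

// Hypothetical enum; each cycle carries the keywords that signal it.
enum class BillingCycle(val keywords: List<String>) {
    MONTHLY(listOf("per month", "/mo", "monthly", "aylık")),
    YEARLY(listOf("per year", "/yr", "yearly", "yıllık"));

    fun advance(from: LocalDate): LocalDate = when (this) {
        MONTHLY -> from.plusMonths(1)
        YEARLY -> from.plusYears(1)
    }
}

// First cycle whose keyword appears anywhere in the lowercased text.
fun findCycle(lines: List<String>): BillingCycle? {
    val text = lines.joinToString(" ").lowercase()
    return BillingCycle.values().firstOrNull { cycle -> cycle.keywords.any { text.contains(it) } }
}

// Assumption: only ISO (yyyy-MM-dd) and dotted (dd.MM.yyyy) dates are recognized.
val DATE = Regex("""\b(?:\d{4}-\d{2}-\d{2}|\d{2}\.\d{2}\.\d{4})\b""")
val DOTTED: DateTimeFormatter = DateTimeFormatter.ofPattern("dd.MM.uuuu")
val RENEWAL_PHRASES = listOf("next billing on", "renews on")

fun parseDate(raw: String): LocalDate? = try {
    if (raw.contains('-')) LocalDate.parse(raw) else LocalDate.parse(raw, DOTTED)
} catch (e: DateTimeParseException) {
    null // digits that look like a date but are not one, e.g. 99.99.2026
}

fun findNextDate(lines: List<String>, cycle: BillingCycle?): LocalDate? {
    // Strategy 1: a date on a line that carries an explicit renewal phrase.
    val explicit = lines
        .filter { line -> RENEWAL_PHRASES.any { line.lowercase().contains(it) } }
        .firstNotNullOfOrNull { line -> DATE.find(line)?.let { parseDate(it.value) } }
    if (explicit != null) return explicit

    // Strategy 2: the latest date on the receipt plus one cycle (null if the cycle is unknown).
    val latest = lines
        .flatMap { line -> DATE.findAll(line).mapNotNull { parseDate(it.value) }.toList() }
        .maxByOrNull { it.toEpochDay() } ?: return null
    return cycle?.advance(latest)
}
```

Because the arithmetic goes through java.time, a bill dated 31 January with a monthly cycle lands on the last day of February rather than overflowing into March. Note that lowercase() is not Turkish-aware, so all-caps Turkish text would need locale-specific handling.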
Show the user the work
A scanner is a black box if you don’t show what it did. Subly’s pattern: as soon as a frame matches, draw a card over the camera preview with the extracted fields and an Edit button. Confidence in OCR is fragile; an Edit button is honest and free.
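@Composable
fun ExtractedCard(s: ExtractedSubscription, onEdit: () -> Unit, onAccept: () -> Unit) {
    // Card styling omitted; labelMono, title, body, format() and label are app-defined.
    Card {
        Text("AI detected", style = labelMono)
        Text("${s.service ?: "—"} ${s.amount?.format() ?: ""}", style = title)
        // Fall back to a dash so a missing field never renders as "null".
        Text("${s.cycle?.label ?: "—"} · next ${s.nextDate?.format() ?: "—"}", style = body)
        Row {
            TextButton(onClick = onEdit) { Text("Edit") }
            // Material 3's filled button is plain Button.
            Button(onClick = onAccept) { Text("Add") }
        }
    }
}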
If two or more of the four fields are missing, don’t auto-add; drop the user into the manual form pre-filled with what you did get.
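That rule is small enough to write down as a pure function, which also makes it easy to unit test. A minimal sketch, assuming a simplified Extracted type with plain string fields and hypothetical NextStep and nextStep names:

```kotlin
// Hypothetical simplified result type so the sketch stands alone.
data class Extracted(
    val service: String?,
    val amount: String?,
    val cycle: String?,
    val nextDate: String?,
)

enum class NextStep { SHOW_CONFIRM_CARD, OPEN_PREFILLED_FORM }

// Counts the null fields and routes to the manual form once more than
// maxMissing of them are absent (default: two or more missing).
fun nextStep(e: Extracted, maxMissing: Int = 1): NextStep {
    val missing = listOf(e.service, e.amount, e.cycle, e.nextDate).count { it == null }
    return if (missing > maxMissing) NextStep.OPEN_PREFILLED_FORM else NextStep.SHOW_CONFIRM_CARD
}
```

Keeping the threshold as a parameter lets you tighten it later, for example requiring every field before showing the confirm card.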
What you don’t get from on-device OCR
A few honest trade-offs:
- Languages. Each script needs its own model. The text-recognition artifact covers Latin only; Chinese, Devanagari, Japanese, and Korean each ship as a separate dependency.
- Tabular layouts. If you need exact column-row reconstruction from a multi-column invoice, you’ll write more layout code than you expect. For most subscription receipts this isn’t an issue.
- Handwriting. Don’t promise it. Even cloud OCRs are unreliable here.
- Model size. Adding multiple language models can grow your app noticeably. Use Android App Bundles and Play Feature Delivery if you care about install size.
The marketing line writes itself
A line I want to be able to say in our privacy policy: “the bill image you capture is not uploaded to our servers and is not retained after processing.” With on-device ML Kit, that line is literally how the code works. Nothing about it is marketing — it’s an architecture statement, and the privacy policy follows from it instead of the other way around.
That’s the real argument for on-device OCR in 2026. It’s not just slightly faster or slightly cheaper. It’s that “we never see your data” is a thing you can say because it’s true, not because legal said it was probably fine.
If you’re building anything in the subscription / receipts / personal-finance space and you’re not sure which way to go, default to on-device. The library is free, the privacy story is bulletproof, and the worst thing that happens is you ship a feature your competitors literally cannot match without rebuilding their stack.
Subly is now live on Google Play — privacy-first by architecture, not by promise. More about it on the app page.