Privacy-first OCR on Android: how Subly reads bills without the cloud

A practical 2026 guide to on-device OCR on Android with ML Kit Text Recognition — the setup, the code, and the field extraction tricks that keep user data off your servers.

May 25, 2026 MFKAPPS 7 min read

The moment a user points their phone at a bill, your app has a choice. You can ship that image to a cloud OCR service and have polished text back in 400 ms — at the cost of routing somebody’s private financial document through your servers and a third party. Or you can do the work on the device, write a more honest privacy policy, and never have to think about another data breach being your fault.

I picked the second path for Subly, an Android subscription tracker that lets you scan a Netflix or Spotify invoice and have the service name, amount, frequency, and next charge date filled in for you. Nothing leaves the phone. This post walks through how that works in 2026 — the setup, the code, and the field-extraction tricks I learned the hard way.

Why on-device, when the cloud is so easy

Cloud OCR APIs are a one-liner. Google Cloud Vision, AWS Textract, Azure Read — they all give you crisp results and they all want your users’ images. For a subscription tracker, those images can include:

The name, last 4 digits, or full email of the account holder
Address blocks at the bottom of a printed bill
Payment method details
The full price, line items, and dates

Nothing about that list is something I want sitting in a third party’s logs. And from a product standpoint, “we never send your bills anywhere” is a much stronger marketing line than “we send your bills to Google Cloud and delete them within 30 days.” So the architecture follows the value: keep the image on the device, only persist the extracted fields, and don’t open an internet connection for the scan at all.

The most private byte is the one that never leaves the phone.

ML Kit Text Recognition v2: the workhorse

Google’s ML Kit Text Recognition v2 is, in 2026, the boring-but-correct answer for on-device OCR on Android. It ships with the Latin model bundled into your app (about 4 MB), runs on a Pixel-class phone in tens of milliseconds, and handles printed text well enough for receipts, invoices, and email screenshots.

What I like about it:

No network permission required. You can ship an app with INTERNET removed and the OCR still works. That alone is worth the integration time.
Runs offline. Airplane-mode scans are a feature, not a bug.
Per-app model. Bundling the model means no surprise download on first launch and no Try Again dialog when the user’s Wi-Fi is flaky.
Free. No API costs as you scale. For an indie app, that’s the difference between launching and overthinking.

The trade-off is accuracy on edge cases: handwriting is hit-or-miss, dense thermal-printed receipts can confuse the layout, and non-Latin scripts need a different model. For the receipt/invoice/email use case, it’s more than good enough.

Setup in 2026

The Gradle dependency:

// app/build.gradle.kts
dependencies {
    implementation("com.google.mlkit:text-recognition:16.0.1")
    implementation("androidx.camera:camera-camera2:1.4.0")
    implementation("androidx.camera:camera-lifecycle:1.4.0")
    implementation("androidx.camera:camera-view:1.4.0")
}

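The bundled artifact above needs no manifest entry. The declaration below only matters if you switch to the unbundled Google Play services variant (com.google.android.gms:play-services-mlkit-text-recognition), where it asks Play to download the model at install time instead of on first use: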

<application>
  <meta-data
      android:name="com.google.mlkit.vision.DEPENDENCIES"
      android:value="ocr" />
</application>

That’s the whole setup. No API keys, no service account JSON, no signed requests.

Recognizing text from a CameraX frame

Here’s the core loop. CameraX feeds you ImageProxy frames, you wrap one in an InputImage, and the recognizer hands back a Text object made of blocks, lines, and elements, each with its own bounding rectangle.

import android.util.Log
import androidx.camera.core.ExperimentalGetImage
import androidx.camera.core.ImageProxy
import com.google.mlkit.vision.common.InputImage
import com.google.mlkit.vision.text.TextRecognition
import com.google.mlkit.vision.text.latin.TextRecognizerOptions

// One recognizer shared across frames instead of one per frame.
private val recognizer = TextRecognition.getClient(
    TextRecognizerOptions.DEFAULT_OPTIONS,
)

@ExperimentalGetImage
fun analyzeFrame(image: ImageProxy, onScanned: (String) -> Unit) {
    // image.image can be null; release the frame and skip it.
    val media = image.image ?: return image.close()
    val input = InputImage.fromMediaImage(media, image.imageInfo.rotationDegrees)

    recognizer.process(input)
        .addOnSuccessListener { result ->
            // Whole page as a single string, line breaks preserved.
            onScanned(result.text)
        }
        .addOnFailureListener { e -> Log.w("Scan", "Text recognition failed", e) }
        // Runs after success or failure, so the frame is closed exactly once.
        .addOnCompleteListener { image.close() }
}

A couple of things worth knowing:

Always call image.close() exactly once, even on error. CameraX holds back the next frame until the current ImageProxy is closed, so a missed close() on a failure path stalls the analyzer.
result.text is the easy mode. For better extraction you’ll want result.textBlocks so you can keep track of which lines are next to each other on the receipt.
ML Kit returns rectangles too. They’re useful for drawing overlays (“we detected this”) and for sanity-checking that the user actually framed the bill.
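
To make the textBlocks point concrete, here is a minimal sketch of grouping recognized lines into visual rows, so that a label like “Total” and the amount printed to its right end up in the same string. OcrLine is a hypothetical stand-in for the values you would copy out of each ML Kit line (its text and bounding box edges), and groupIntoRows with its overlap rule is an assumption for illustration, not Subly’s actual code.

```kotlin
// Hypothetical stand-in for one recognized line: its text plus the
// left, top, and bottom edges of its bounding box, in pixels.
data class OcrLine(val text: String, val left: Int, val top: Int, val bottom: Int)

// Groups lines into visual rows. A line joins the current row when its
// vertical center falls inside the span of that row's first line
// (an assumed heuristic). Each row is then read left to right.
fun groupIntoRows(lines: List<OcrLine>): List<String> {
    val rows = mutableListOf<MutableList<OcrLine>>()
    for (line in lines.sortedBy { it.top }) {
        val center = (line.top + line.bottom) / 2
        val current = rows.lastOrNull()
        if (current != null && center in current.first().top..current.first().bottom) {
            current.add(line)
        } else {
            rows.add(mutableListOf(line))
        }
    }
    return rows.map { row -> row.sortedBy { it.left }.joinToString(" ") { it.text } }
}
```

Feeding the resulting rows into the field extraction step instead of result.text keeps a price next to its label even when the recognizer reports them as separate blocks.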

Field extraction: from raw text to structured data

OCR gives you text. The interesting work is turning that text into a subscription. For Subly, that means four fields:

Service name — “Netflix”, “Spotify”, “ChatGPT Plus”
Amount — “₺199.00” or “$9.99”
Billing frequency — monthly, yearly
Next charge date — the date in the future

A pragmatic on-device pipeline looks like this:

data class ExtractedSubscription(
    val service: String?,
    val amount: Money?,
    val cycle: BillingCycle?,
    val nextDate: LocalDate?,
)

fun extract(text: String): ExtractedSubscription {
    val lines = text.lines().map { it.trim() }.filter { it.isNotEmpty() }

    return ExtractedSubscription(
        service = findService(lines),
        amount = findAmount(lines),
        cycle = findCycle(lines),
        nextDate = findNextDate(lines),
    )
}

Each find* function is a small, testable thing. A few lessons from the actual implementation:

Service name is the hardest one. A dictionary of common services (Netflix, Spotify, ChatGPT, YouTube Premium, Apple Music…) catches 80% of cases. For the rest, the first prominent line of the receipt or a heuristic around the From:/Sent by: header in invoice emails works well.
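Amount is the easiest. A regex like [₺$€£]\s?(?:\d{1,3}(?:[.,]\d{3})+|\d+)(?:[.,]\d{2})? handles a leading currency symbol with dot or comma grouping, including ungrouped amounts such as $1299.00. It only matches a leading symbol, so trailing forms such as “199,00 TL” need a second pattern. Always strip thousand separators and normalize the decimal separator before parsing.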
Cycle comes from the language around the amount: “per month”, “/mo”, “monthly”, “aylık”, “yearly”, “/yr”. A small Turkish + English vocabulary covers what you need.
Next date takes two strategies: pick up explicit phrases (“next billing on”, “renews on”) if present, otherwise infer from the latest date on the receipt plus one cycle.
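
The first three finders can be sketched roughly as follows. This is an illustration under stated assumptions, not Subly’s implementation: the Money and BillingCycle shapes, the KNOWN_SERVICES list, and the hint lists are hypothetical, and the amount pattern is a variant that accepts a leading symbol followed by either a grouped or an ungrouped number.

```kotlin
import java.math.BigDecimal

// Hypothetical shapes for the Money and BillingCycle types used in the post.
data class Money(val currency: Char, val value: BigDecimal)
enum class BillingCycle { MONTHLY, YEARLY }

// Assumed dictionary and vocabulary; a real app would carry longer lists.
private val KNOWN_SERVICES = listOf("Netflix", "Spotify", "ChatGPT", "YouTube Premium", "Apple Music")
private val SENDER = Regex("""^(?:From|Sent by):\s*(.+)""", RegexOption.IGNORE_CASE)
private val MONTHLY_HINTS = listOf("per month", "/mo", "monthly", "aylık")
private val YEARLY_HINTS = listOf("yearly", "/yr")

// Leading symbol, then a grouped number (1.299 or 1,299) or a plain run
// of digits, then an optional two-digit decimal part.
private val AMOUNT = Regex("""([₺€£$])\s?((?:\d{1,3}(?:[.,]\d{3})+|\d+)(?:[.,]\d{2})?)""")

fun findService(lines: List<String>): String? {
    // 1. Dictionary hit anywhere in the text.
    for (line in lines) {
        KNOWN_SERVICES.firstOrNull { line.contains(it, ignoreCase = true) }?.let { return it }
    }
    // 2. Sender header from an invoice email, else 3. the first line.
    return lines.firstNotNullOfOrNull { SENDER.find(it)?.groupValues?.get(1)?.trim() }
        ?: lines.firstOrNull()
}

fun findAmount(lines: List<String>): Money? {
    for (line in lines) {
        val match = AMOUNT.find(line) ?: continue
        val symbol = match.groupValues[1].single()
        val digits = match.groupValues[2]
        // A separator followed by exactly two trailing digits is the decimal
        // point; every other separator is a thousands separator.
        val hasDecimals = digits.length >= 3 && digits[digits.length - 3] in ".,"
        val whole = (if (hasDecimals) digits.dropLast(3) else digits).filter { it.isDigit() }
        val fraction = if (hasDecimals) digits.takeLast(2) else "00"
        return Money(symbol, BigDecimal("${whole}.${fraction}"))
    }
    return null
}

fun findCycle(lines: List<String>): BillingCycle? {
    fun mentions(hints: List<String>) =
        lines.any { line -> hints.any { line.contains(it, ignoreCase = true) } }
    return when {
        mentions(MONTHLY_HINTS) -> BillingCycle.MONTHLY
        mentions(YEARLY_HINTS) -> BillingCycle.YEARLY
        else -> null
    }
}
```

Two simplifications are worth flagging. findAmount returns the first match, which on a multi-line invoice may be a line item rather than the total, and findCycle searches the whole text rather than only the words around the amount.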

None of this is glamorous. All of it runs on the device, so a failure never means a leaked invoice.
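
The two next-date strategies need date parsing rather than keyword matching, so they are sketched separately here. inferNextDate is a hypothetical name, the cycle is passed as a java.time.Period to keep the block self-contained, and only two date formats (ISO yyyy-MM-dd and dotted dd.MM.yyyy) are assumed; real receipts will need more.

```kotlin
import java.time.LocalDate
import java.time.Period
import java.time.format.DateTimeFormatter
import java.time.format.DateTimeParseException

// Assumed formats: 2026-05-25 and 25.05.2026.
private val DATE = Regex("""\b(\d{4}-\d{2}-\d{2}|\d{2}\.\d{2}\.\d{4})\b""")
private val DOTTED = DateTimeFormatter.ofPattern("dd.MM.yyyy")
private val EXPLICIT_HINTS = listOf("next billing on", "renews on")

// Returns null instead of throwing when the digits are not a real date.
private fun parseDate(raw: String): LocalDate? = try {
    if ('-' in raw) LocalDate.parse(raw) else LocalDate.parse(raw, DOTTED)
} catch (e: DateTimeParseException) {
    null
}

private fun datesIn(line: String): List<LocalDate> =
    DATE.findAll(line).mapNotNull { parseDate(it.value) }.toList()

fun inferNextDate(lines: List<String>, cycle: Period): LocalDate? {
    // Strategy 1: a date on the same line as an explicit renewal phrase.
    for (line in lines) {
        if (EXPLICIT_HINTS.any { line.contains(it, ignoreCase = true) }) {
            datesIn(line).firstOrNull()?.let { return it }
        }
    }
    // Strategy 2: the latest date found anywhere, plus one billing cycle.
    return lines.flatMap { datesIn(it) }
        .maxByOrNull { it.toEpochDay() }
        ?.plus(cycle)
}
```

For a monthly plan the call would be inferNextDate(lines, Period.ofMonths(1)). The sketch does not check that the result is in the future; for an old receipt you would keep adding cycles until it is.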

Show the user the work

A scanner is a black box if you don’t show what it did. Subly’s pattern: as soon as a frame matches, draw a card over the camera preview with the extracted fields and an Edit button. Confidence in OCR is fragile; an Edit button is honest and free.

@Composable
fun ExtractedCard(s: ExtractedSubscription, onEdit: () -> Unit, onAccept: () -> Unit) {
    Card(...) {
        Text("AI detected", style = labelMono)
        Text("${s.service ?: "—"}  ${s.amount?.format() ?: ""}", style = title)
        // Fall back to a placeholder so a missing field never renders as "null".
        Text("${s.cycle?.label ?: "—"} · next ${s.nextDate?.format() ?: "—"}", style = body)
        Row {
            TextButton(onClick = onEdit) { Text("Edit") }
            // Material 3's filled button is simply Button.
            Button(onClick = onAccept) { Text("Add") }
        }
    }
}

If two or more of the four fields are missing, don’t auto-add. Drop the user into the manual form, pre-filled with what you did get.
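
That rule is small enough to keep as a pure function that can be unit tested without a camera. A minimal sketch, where Extracted is a hypothetical string-only mirror of ExtractedSubscription and ScanOutcome and decide are illustrative names:

```kotlin
// Hypothetical simplified mirror of ExtractedSubscription, with plain
// strings so the sketch is self-contained.
data class Extracted(
    val service: String?,
    val amount: String?,
    val cycle: String?,
    val nextDate: String?,
)

enum class ScanOutcome { SHOW_CARD, MANUAL_FORM }

// Two or more missing fields sends the user to the pre-filled manual
// form; otherwise the confirmation card is shown.
fun decide(e: Extracted): ScanOutcome {
    val missing = listOf(e.service, e.amount, e.cycle, e.nextDate).count { it == null }
    return if (missing >= 2) ScanOutcome.MANUAL_FORM else ScanOutcome.SHOW_CARD
}
```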

What you don’t get from on-device OCR

A few honest trade-offs:

Languages. Each script needs its own model. Latin comes with the text-recognition artifact above; Chinese, Japanese, Korean, and Devanagari each ship as a separate artifact (for example text-recognition-chinese).
Tabular layouts. If you need exact column-row reconstruction from a multi-column invoice, you’ll write more layout code than you expect. For most subscription receipts this isn’t an issue.
Handwriting. Don’t promise it. Even cloud OCRs are unreliable here.
Model size. Adding multiple language models can grow your APK noticeably. Use Android App Bundles with Play Feature Delivery if you care about install size.

The marketing line writes itself

A line I want to be able to say in our privacy policy: “the bill image you capture is not uploaded to our servers and is not retained after processing.” With on-device ML Kit, that line is literally how the code works. Nothing about it is marketing — it’s an architecture statement, and the privacy policy follows from it instead of the other way around.

That’s the real argument for on-device OCR in 2026. It’s not just slightly faster or slightly cheaper. It’s that “we never see your data” is a thing you can say because it’s true, not because legal said it was probably fine.

If you’re building anything in the subscription / receipts / personal-finance space and you’re not sure which way to go, default to on-device. The library is free, the privacy story is bulletproof, and the worst thing that happens is you ship a feature your competitors literally cannot match without rebuilding their stack.

Subly is now live on Google Play — privacy-first by architecture, not by promise. More about it on the app page.

