On-device AI is free per call — that's the trap

What I learned. On-device LLM calls cost $0 per call — so nothing in my stack ever flagged that I was making way too many of them.

Why it matters — cloud bills are accidental call-graph auditors, and Apple’s Foundation Models took that auditor away.

The app is on-device by design: scan → match → recommend runs on the phone; the backend just syncs user state and ships a static catalog. So there was no per-call cost line to scream at me.

Then I audited it. One cold home-screen open fired six LLM calls — one quote, five narrations. 15–25 calls per typical session, several generating overlapping prose about the same wine. API pricing would have exposed that smell in a day.

The fix: consolidate to one or two structured calls per surface. The bill was never money — it was latency and battery.