What I learned. When an AI-extracted field can’t be grounded in the source text, NULL beats a plausible guess.
Where it bit me: the batch enrichment pipeline behind my on-device sommelier app. Let the model “complete” sparse shop data and it drifts into hallucinated tasting notes: they read great, but they never appeared in the shop’s own description.
The fix is dumb and brutal: every extracted field runs a token-overlap check against the source text. Fails the check → it dies and becomes NULL. No fallback, no guess.
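Roughly what that check looks like, as a minimal sketch rather than my exact pipeline code. The function name `ground_or_null`, the `min_overlap` threshold of 0.8, and the word-level regex tokenizer are all assumptions picked for illustration.

```python
import re
from typing import Optional

# Assumption: "tokens" are lowercase word characters; a real pipeline
# might stem, strip stopwords, or tokenize differently.
_TOKEN_RE = re.compile(r"\w+")


def tokens(text: str) -> set[str]:
    """Lowercase the text and return its set of word tokens."""
    return set(_TOKEN_RE.findall(text.lower()))


def ground_or_null(
    value: Optional[str], source: str, min_overlap: float = 0.8
) -> Optional[str]:
    """Return value only if enough of its tokens appear in source, else None.

    min_overlap is a hypothetical threshold: the fraction of the field's
    tokens that must also occur in the source text.
    """
    if not value:
        return None
    value_tokens = tokens(value)
    if not value_tokens:
        return None
    overlap = len(value_tokens & tokens(source)) / len(value_tokens)
    # No fallback and no repair attempt: an ungrounded field becomes None (NULL).
    return value if overlap >= min_overlap else None


source = "Ripe cherry and cedar on the nose, with firm tannins."
grounded = ground_or_null("cherry and cedar", source)
invented = ground_or_null("violet, graphite and leather", source)
```

Here `grounded` keeps its value because every token appears in the shop text, while `invented` comes back as `None` because only "and" overlaps. The threshold is the one knob: set it lower and paraphrases survive, set it higher and only near-verbatim extractions do.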
A catalog with empty fields is honest. One with invented tasting notes is broken in a way you can’t detect later — and it quietly poisons a hundred thousand rows.
Absence beats noise.