The Fallback IS the Feature
What the user sees when the AI is wrong is more important than what they see when it's right. Spec the fallback first.
Open any AI feature spec. Find the section about what happens when the model is wrong. Almost certainly missing. The spec details every successful output state. It diagrams the happy path. It probably has a screenshot of the AI giving a great answer. Nowhere does it describe what the user sees, does, or feels when the model fails.
The failure case isn't a footnote. In production, it's a large share of traffic: for most generative features, somewhere between 10% and 40% of outputs are wrong in a way that matters to the user. If you didn't design that experience, you shipped a feature that breaks for something like every fifth user, and then you'll wonder why retention is bad.
A useful drill: write the failure section before the success section.
Start with: 'When the model returns an answer the user knows is wrong, here's what they see and what they can do.' Then write it. In specifics. Not 'they can edit.' Exactly which UI affordance, which interaction, which keypress, which API call.
If you can describe the failure flow in three concrete paragraphs, the feature is ready to design. If all you can write is 'the user can retry or contact support', it is not. Retry is not a design. Contact support is not a design. Both are admissions that you didn't think about the failure case.
If your fallback flow is 'retry' or 'contact support', you haven't designed a fallback. You've named one.
Good fallbacks share four properties.
First, **the user stays in the flow.** No modal apology. No 'something went wrong.' The user can correct, edit, or override without leaving the page they're on.
Second, **the AI's output is editable, not replaceable.** If the user gets 80% of what they wanted, they shouldn't have to start from scratch. They should be able to tweak the 20% that's wrong. This is harder than it sounds; many AI feature UX patterns force a full regenerate cycle, which is the wrong default.
Third, **the correction is captured.** If the user edits the AI output, record both versions: what the model produced and what the user shipped. That pair becomes the training signal for future model improvements, even if you don't have a fine-tuning pipeline yet. Capture now; figure out how to use it later (sketched in code after the fourth property).
Fourth, **confidence is exposed when it would help the user.** Not always — sometimes confidence is noise. But when the user is about to take an irreversible action on AI-generated output, showing 'medium confidence' is information they need. Hiding it is paternalism.
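To make the third and fourth properties concrete, here is a minimal TypeScript sketch. Every name in it (`CorrectionRecord`, `logCorrection`, `CONFIDENCE_FLOOR`) is a hypothetical, not an existing API; the point is the shape: store the model's version and the user's version together, and gate the indicator on a threshold you tune per feature.

```typescript
// Hypothetical shapes: illustrative assumptions, not an existing API.

interface CorrectionRecord {
  ticketId: string;
  modelOutput: string;     // what the model produced
  userOutput: string;      // what the user actually shipped
  modelConfidence: number; // 0..1, as reported by the model or a calibrator
  editedAt: Date;
}

const CONFIDENCE_FLOOR = 0.7; // assumed threshold; tune per feature

// Record both versions whenever the user ships something.
// An identical pair is signal too: it means the model got it right.
function logCorrection(record: CorrectionRecord, store: CorrectionRecord[]): void {
  store.push(record);
}

// Surface the confidence indicator only when it would change behavior,
// i.e. when the score falls below the floor.
function shouldShowConfidenceWarning(confidence: number): boolean {
  return confidence < CONFIDENCE_FLOOR;
}

// Usage sketch:
const store: CorrectionRecord[] = [];
logCorrection(
  {
    ticketId: "T-1042",
    modelOutput: "Hi, your refund was processed yesterday.",
    userOutput: "Hi, your refund will be processed within 3 business days.",
    modelConfidence: 0.62,
    editedAt: new Date(),
  },
  store,
);
console.log(shouldShowConfidenceWarning(0.62)); // true: show the indicator
```

Note the asymmetry: capture is unconditional, the indicator is conditional. Logging everything costs you a table; showing everything costs the user attention.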
Consider an AI feature that generates a draft email response to a customer support ticket.
Bad fallback: 'AI generated a response. Click send or rewrite.' If the response is wrong, the user has to delete and rewrite from scratch. Worst case, they send the wrong response because they didn't read carefully.
Good fallback: the AI response appears in the response field, fully editable inline. The user reads, edits the half that's wrong, and sends. The send button is greyed out for 1 second after the AI populates the field, forcing a beat where the user actually reads. The original AI version is logged separately so we can train on the corrections later. If model confidence is below threshold, a small indicator appears: 'Lower confidence — please review carefully.'
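Here is roughly what that good fallback looks like as a React component, in TypeScript. This is a sketch under assumptions: `DraftResponse`, `onSend`, and `CONFIDENCE_FLOOR` are made-up names, and the real logging and confidence plumbing live elsewhere. What matters is the structure: the draft seeds an editable field, send is briefly disabled, and both versions travel together so the correction can be captured.

```tsx
import { useEffect, useState } from "react";

// Hypothetical props: the AI draft and its confidence arrive from elsewhere.
interface DraftProps {
  aiDraft: string;
  confidence: number; // 0..1
  onSend: (finalText: string, originalAiText: string) => void;
}

const CONFIDENCE_FLOOR = 0.7; // assumed threshold
const READ_DELAY_MS = 1000;   // the forced beat before send is enabled

export function DraftResponse({ aiDraft, confidence, onSend }: DraftProps) {
  // The AI output seeds the field, but the user edits it in place.
  const [text, setText] = useState(aiDraft);
  const [sendEnabled, setSendEnabled] = useState(false);

  // Grey out send for one second each time the AI populates the field.
  useEffect(() => {
    setText(aiDraft);
    setSendEnabled(false);
    const timer = setTimeout(() => setSendEnabled(true), READ_DELAY_MS);
    return () => clearTimeout(timer);
  }, [aiDraft]);

  return (
    <div>
      {confidence < CONFIDENCE_FLOOR && (
        <p role="alert">Lower confidence: please review carefully.</p>
      )}
      <textarea value={text} onChange={(e) => setText(e.target.value)} />
      {/* Both versions travel together so the original can be logged. */}
      <button disabled={!sendEnabled} onClick={() => onSend(text, aiDraft)}>
        Send
      </button>
    </div>
  );
}
```

Notice that the original `aiDraft` never mutates; the editable copy lives in state. That separation is what makes the correction capturable at send time.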
The second version takes roughly 30% more design and engineering work. It is the only one that actually ships well in production.