Designing Better AI Chat: A Deep Dive (Part 2 of 2)

Citations, memory, thread navigation, export, and recovery patterns that turn AI chat from a demo into dependable infrastructure.

If Part I asked, “Do I understand what is happening and can I steer it?”, Part II asks the next question users carry after they close the tab:

Will this still make sense when I come back tomorrow?
And when something breaks, do I get a path forward?

This is the reliability layer of chat UX: evidence, memory controls, navigation for long threads, and recovery paths that keep work moving.

Part I covered the first layer. This article is the continuation.

1) Trust is not a vibe. It is evidence.

Confidence without inspectable proof feels like branding. Users do not need more text. They need a fast path from claim to source.

Pattern: Citations

Citations turn chat from monologue into something users can audit. Inline markers reduce blind trust and make review behavior more deliberate.

Design note: if you cannot cite a claim, say so explicitly.

Interactive Demo: Citations
The James Webb Space Telescope was launched in 2021[1]. It orbits the Sun at L2[2].
1 NASA Mission Page
2 ESA Operations
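
A minimal sketch of the plumbing behind a demo like this, assuming answers carry `[n]` markers and sources arrive as a numbered list. The `Source` type, the `resolve_citations` function, and the naive sentence split are all illustrative assumptions, not an existing library API. The sketch resolves each marker to its source and collects sentences that carry no marker, so the UI can say so explicitly.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

MARKER = re.compile(r"\[(\d+)\]")


@dataclass(frozen=True)
class Source:
    index: int
    title: str


def resolve_citations(answer: str, sources: List[Source]) -> Dict[str, list]:
    """Map [n] markers to sources and list sentences that cite nothing."""
    by_index = {s.index: s for s in sources}
    cited, missing, uncited = [], [], []
    # Naive split after each full stop; a real product would need a
    # proper sentence segmenter.
    sentences = [s.strip() for s in re.split(r"(?<=\.)\s+", answer) if s.strip()]
    for sentence in sentences:
        markers = [int(m) for m in MARKER.findall(sentence)]
        if not markers:
            uncited.append(sentence)  # surface as "no source for this claim"
        for n in markers:
            if n in by_index:
                cited.append(by_index[n])
            else:
                missing.append(n)  # marker points at a source that was never supplied
    return {"cited": cited, "missing": missing, "uncited": uncited}
```

The `uncited` and `missing` lists matter as much as `cited`: they are where the interface admits a gap instead of implying evidence it does not have.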

→ Explore the Citations pattern

Pattern: Confidence Score

Confidence cues help users decide where to verify closely and where to move fast. Confidence UI fails when it never changes or never affects behavior.

Interactive Demo: Confidence Score
Confidence Score
Question
What is 847 × 23?
Answer
19,481
Confidence Level
98%
High Confidence
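
One way to make a confidence score affect behavior rather than just decorate the answer is sketched below. The thresholds (0.9 and 0.6), the "Medium" and "Low" labels, and the `ConfidenceCue` fields are assumptions chosen for illustration; only the "High Confidence" label comes from the demo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfidenceCue:
    label: str
    suggest_verification: bool  # show a "check sources" prompt next to the answer
    require_review: bool        # block one-click actions until the user confirms


def confidence_cue(score: float, high: float = 0.9, low: float = 0.6) -> ConfidenceCue:
    """Turn a 0..1 score into a label plus the UI behavior it should trigger."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score >= high:
        return ConfidenceCue("High Confidence", False, False)
    if score >= low:
        return ConfidenceCue("Medium Confidence", True, False)
    return ConfidenceCue("Low Confidence", True, True)
```

If every answer lands in the same band, the thresholds or the underlying score need attention; a cue that never changes teaches users to ignore it.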

→ Explore the Confidence Score pattern

For trust-sensitive products, pair these with Trust, sources & truthfulness and the Trust Stack article.

2) Memory is the difference between a session and a relationship

Stateful chat creates a new failure mode: the system remembers what users did not mean to teach it, or forgets what they assumed was stable.

Pattern: Memory Management

Treat memory as an explicit product surface: visible entries, provenance, edit/delete, and clear boundaries for scope and retention.

Design note: the win is negotiated recall, not perfect recall.
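
Treating memory as a product surface can be sketched as a small store in which every entry carries its provenance, scope, and expiry. All names here (`MemoryEntry`, `MemoryStore`, `source_message_id`, the `"global"` scope) are hypothetical, and a real system would persist entries rather than hold them in a dict.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, List, Optional


@dataclass
class MemoryEntry:
    entry_id: int
    kind: str                # e.g. "preference", "fact", "context"
    text: str
    source_message_id: str   # provenance: where the system learned this
    scope: str = "global"    # boundary: "global" or a project/thread id
    expires_at: Optional[datetime] = None  # retention: None means no expiry


class MemoryStore:
    def __init__(self) -> None:
        self._entries: Dict[int, MemoryEntry] = {}
        self._next_id = 1

    def remember(self, kind: str, text: str, source_message_id: str,
                 scope: str = "global", ttl: Optional[timedelta] = None,
                 now: Optional[datetime] = None) -> MemoryEntry:
        now = now or datetime.now()
        entry = MemoryEntry(self._next_id, kind, text, source_message_id,
                            scope, now + ttl if ttl else None)
        self._entries[entry.entry_id] = entry
        self._next_id += 1
        return entry

    def edit(self, entry_id: int, text: str) -> None:
        self._entries[entry_id].text = text  # user corrects what was learned

    def forget(self, entry_id: int) -> None:
        del self._entries[entry_id]          # user deletes an entry outright

    def visible(self, scope: str, now: Optional[datetime] = None) -> List[MemoryEntry]:
        """Entries the assistant may use in this scope: global ones plus
        scope-specific ones, excluding anything past its expiry."""
        now = now or datetime.now()
        return [e for e in self._entries.values()
                if e.scope in ("global", scope)
                and (e.expires_at is None or e.expires_at > now)]
```

The same `visible` list that feeds the model can feed the memory panel, so what users see is what the system actually recalls.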

Interactive Demo: Memory Management
3 Memories Stored
preference
User prefers Python over JavaScript
fact
Located in San Francisco
context
Budget cap: $200 per month

→ Explore the Memory Management pattern

See related patterns in Memory, personalization & data.

3) Long threads need navigation, not just scroll

Chat transcripts work until the task outlasts working memory. Then users need retrieval surfaces, not longer timelines.

Pattern: Conversation Summary

Summaries are recovery tools: return after a weekend, onboard a teammate, or re-anchor after a long branch. The best summaries invite correction.

Interactive Demo: Conversation Summary
Conversation (10 messages)
What are the best practices for building AI chatbots?
Great question! Key practices include: clear error handling, streaming responses for better UX, providing context about AI limitations, and implementing feedback mechanisms.
Can you explain more about streaming responses?
Streaming allows the AI to send responses incrementally as they're generated, rather than waiting for the complete response. This makes the interface feel more responsive and natural, similar to how humans type.
What about handling errors gracefully?
Error handling is crucial. Always provide clear, actionable error messages. Use retry mechanisms, show fallback options, and explain what went wrong in user-friendly language. Never expose technical details directly to users.
How do I implement trust indicators?
Trust indicators include: showing confidence levels, citing sources, displaying processing time, explaining reasoning when possible, and providing transparency about data usage and limitations.
Thanks! This is really helpful.
You're welcome! Feel free to ask if you need clarification on any of these patterns.

→ Explore the Conversation Summary pattern

Pattern: Message Pinning

Pinning stabilizes long-running threads by preserving key instructions and decisions in a visible place. It lowers rediscovery cost for both users and teams.

Interactive Demo: Message Pinning
Conversation
Pinned Messages (2)
Key practices include: clear error handling, streaming respo...
Streaming allows the AI to send responses incrementally as t...
What are the best practices for building AI chatbots?
Key practices include: clear error handling, streaming responses, providing context about AI limitations, and implementing feedback mechanisms.
Pinned
Can you explain more about streaming responses?
Streaming allows the AI to send responses incrementally as they're generated, making the interface feel more responsive.
Pinned
Thanks!
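
The pinned strip in the demo can be sketched as an ordered list of message ids plus a truncated preview. The `Thread` class and its method names are hypothetical; the 60-character preview limit is an assumption inferred from where the demo's previews cut off.

```python
from dataclasses import dataclass, field
from typing import Dict, List

PREVIEW_CHARS = 60  # assumption: matches the cut-off seen in the demo previews


def preview(text: str, limit: int = PREVIEW_CHARS) -> str:
    """Shorten a message for the pinned strip, marking the cut with '...'."""
    return text if len(text) <= limit else text[:limit] + "..."


@dataclass
class Thread:
    messages: Dict[str, str] = field(default_factory=dict)  # message id -> text
    pinned: List[str] = field(default_factory=list)         # ids, in pin order

    def pin(self, message_id: str) -> None:
        if message_id not in self.messages:
            raise KeyError(message_id)
        if message_id not in self.pinned:  # pinning twice is a no-op
            self.pinned.append(message_id)

    def unpin(self, message_id: str) -> None:
        if message_id in self.pinned:
            self.pinned.remove(message_id)

    def pinned_previews(self) -> List[str]:
        return [preview(self.messages[m]) for m in self.pinned]
```

Storing ids rather than copies keeps a pin pointing at the original message, so jumping from the strip back to context stays cheap.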

→ Explore the Message Pinning pattern

4) Export and handoff are part of the product

High-stakes work rarely ends inside chat. If handoff is clumsy, users manually rebuild context in docs, tickets, and email.

Pattern: Error Recovery Strategies

Failure is inevitable. Good recovery gives users believable next steps: retry with context, narrow scope, or switch approach without losing the thread.
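
A rough sketch of that recovery loop: try the primary path, retry a bounded number of times, then switch approach while recording a timeline the UI can show. The function and field names are hypothetical, and the default of two retries is an assumption; backoff delays are left out to keep the sketch short.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RecoveryResult:
    value: str
    attempts: int
    used_fallback: bool
    timeline: List[str] = field(default_factory=list)


def run_with_recovery(primary: Callable[[], str],
                      fallback: Callable[[], str],
                      max_retries: int = 2) -> RecoveryResult:
    """Run primary once, retry up to max_retries times, then use fallback."""
    timeline: List[str] = []
    attempts = 0
    for attempt in range(1, max_retries + 2):  # 1 initial try + max_retries retries
        attempts = attempt
        try:
            value = primary()
        except Exception as exc:  # broad on purpose: any failure counts here
            timeline.append(f"attempt {attempt}: failed ({exc})")
            continue
        timeline.append(f"attempt {attempt}: ok")
        return RecoveryResult(value, attempts, False, timeline)
    # Every attempt failed: switch approach instead of dead-ending the user.
    timeline.append("fallback: used")
    return RecoveryResult(fallback(), attempts, True, timeline)
```

Returning the timeline alongside the value lets the interface explain what happened ("tried three times, switched to a simpler method") instead of showing a bare result or a bare error.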

Interactive Demo: Error Recovery

Error Recovery
ID: AGENT_091   LAT: 42MS   VER: 2.1.0
Status: IDLE
Retry Maximum: 2
Escalation Threshold: Sensitive / Balanced / Lenient
Fallback Strategy: Switch to static heuristics if logic fails.
Recovery Timeline (Real-time Stream): Idle, no incidents
System initialized. Waiting for trigger...

→ Explore the Error Recovery Strategies pattern

Pattern: Human Handoff

Sometimes the best UX is not another model response. It is a clean escalation path with the right context and clear resolution ownership.
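
What "the right context" might look like as a payload is sketched below. The `HandoffPacket` fields and the `build_handoff` helper are assumptions for illustration; the point is that the packet carries a reason, a summary, recent messages, and a named owner, so the human does not have to ask the user to start over.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class HandoffPacket:
    conversation_id: str
    reason: str                  # why the assistant is escalating
    summary: str                 # short recap the human can read first
    recent_messages: List[str]   # last few turns, oldest first
    owner: str                   # who is responsible for resolution


def build_handoff(conversation_id: str, messages: Sequence[str], reason: str,
                  summary: str, owner: str, keep_last: int = 5) -> HandoffPacket:
    """Bundle the context a human needs to continue the conversation."""
    if not owner:
        raise ValueError("a handoff needs a named owner")
    if keep_last < 1:
        raise ValueError("keep_last must be at least 1")
    return HandoffPacket(conversation_id, reason, summary,
                         list(messages[-keep_last:]), owner)
```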

→ Explore the Human Handoff pattern

More in Errors & recovery.

5) The system layer: safety, limits, and honesty

Trust is also defined by what users see when things are constrained.

Pattern: Rate Limit Warnings

Name the constraint in plain language and offer a next step. A specific warning is more trustworthy than a generic failure state.

Interactive Demo: Rate Limit Warnings
Rate Limit Status
API Requests: 45 / 50
5 requests remaining
Approaching Limit

Consider pacing your requests to avoid hitting the limit.

Reset Timer
5:00
Until limit resets
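
The status shown in the demo can be derived from three numbers: requests used, the limit, and seconds until reset. The sketch below assumes a warning threshold of 80% of the limit; the `rate_limit_status` function, its field names, and the message wording are illustrative, not a real API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateLimitStatus:
    state: str       # "OK", "Approaching Limit", or "Limit Reached"
    remaining: int
    reset_in: str    # m:ss until the limit resets
    message: str


def rate_limit_status(used: int, limit: int, reset_seconds: int,
                      warn_ratio: float = 0.8) -> RateLimitStatus:
    """Name the constraint in plain language and offer a next step."""
    if limit <= 0:
        raise ValueError("limit must be positive")
    remaining = max(limit - used, 0)
    minutes, seconds = divmod(reset_seconds, 60)
    reset_in = f"{minutes}:{seconds:02d}"
    if remaining == 0:
        return RateLimitStatus("Limit Reached", 0, reset_in,
                               f"Request limit reached. You can continue in {reset_in}.")
    if used / limit >= warn_ratio:
        return RateLimitStatus("Approaching Limit", remaining, reset_in,
                               f"{remaining} requests remaining. "
                               f"Consider pacing your requests; the limit resets in {reset_in}.")
    return RateLimitStatus("OK", remaining, reset_in, "")
```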

→ Explore the Rate Limit Warnings pattern

Pattern: Model Selection UI

When defaults fail, users need a deliberate tradeoff control: faster vs deeper, cheaper vs stronger, lighter vs more capable.

Interactive Demo: Model Selection UI
Select AI Model
Fast Model
Quick responses, good for simple tasks
Speed: 9/10
Quality: 6/10
$0.01/1k tokens
Balanced Model
Best balance of speed and quality
Speed: 7/10
Quality: 8/10
$0.05/1k tokens
Quality Model
Highest quality, slower responses
Speed: 4/10
Quality: 10/10
$0.15/1k tokens
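
The same tradeoff can be expressed as a rule that picks a default and leaves the override to the user. In this sketch the three options mirror the demo's numbers; the `pick_model` function and its policy (fastest model that meets a quality floor and a cost ceiling) are assumptions, one of several reasonable policies.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class ModelOption:
    name: str
    speed: int          # 1-10, higher is faster
    quality: int        # 1-10, higher is better
    cost_per_1k: float  # dollars per 1k tokens


# Values mirror the demo above.
MODELS = (
    ModelOption("Fast Model", 9, 6, 0.01),
    ModelOption("Balanced Model", 7, 8, 0.05),
    ModelOption("Quality Model", 4, 10, 0.15),
)


def pick_model(min_quality: int = 0,
               max_cost_per_1k: Optional[float] = None,
               models: Sequence[ModelOption] = MODELS) -> Optional[ModelOption]:
    """Return the fastest model that meets the quality floor and cost ceiling."""
    candidates = [
        m for m in models
        if m.quality >= min_quality
        and (max_cost_per_1k is None or m.cost_per_1k <= max_cost_per_1k)
    ]
    # None means no option satisfies both constraints; the UI should say so
    # rather than silently relaxing one of them.
    return max(candidates, key=lambda m: m.speed) if candidates else None
```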

→ Explore the Model Selection UI pattern

Related hub: Cost, models & limits.

A quick audit you can run this week

For your current chat experience, ask:

  • Can users verify claims quickly with inspectable sources?
  • Can users see and control what the system remembers over time?
  • Can users recover key moments in long threads without endless scrolling?
  • When things fail, can users recover, hand off, or continue without starting over?

If those answers are mostly yes, chat stops feeling like a clever demo and starts feeling like dependable infrastructure.

In Part I, we covered legibility, explicit control, output shaping, and branching. Together these two layers separate chat that performs from chat people can trust.
