Multimodal Input
Grok provides comprehensive multimodal input options through a paperclip menu, offering file upload, text content, sketching, cloud storage integration, and voice input capabilities.

The paperclip menu centralizes all content input methods (files, text, sketches, and cloud storage), reducing cognitive load by grouping related actions.
What's happening
Grok implements multimodal input through a paperclip menu with six options: Upload a file, Add text content, Draw a sketch, Connect Google Drive, Connect Microsoft OneDrive, and Recent files. The drawing interface provides a full-featured canvas with a color palette, drawing tools, and undo/redo. Text content can be added via a modal dialog. All attachments appear as dismissible chips above the composer.
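One way to picture the "every attachment becomes a dismissible chip" behavior is as a single list of tagged attachment records. The sketch below is only an illustration under that assumption: the `Attachment` type, `chipLabel`, `addAttachment`, and `dismissAttachment` are hypothetical names, not anything taken from Grok's implementation.

```typescript
// Hypothetical data model; none of these names come from Grok's code.
type Attachment =
  | { kind: "file"; id: string; name: string }
  | { kind: "text"; id: string; content: string }
  | { kind: "sketch"; id: string; strokes: number }
  | { kind: "cloud"; id: string; provider: "google-drive" | "onedrive"; name: string };

// A single label function covering every kind keeps chips visually uniform.
function chipLabel(a: Attachment): string {
  switch (a.kind) {
    case "file":
      return a.name;
    case "text":
      // Truncate long pasted text so the chip stays compact.
      return a.content.length > 20 ? a.content.slice(0, 20) + "..." : a.content;
    case "sketch":
      return `Sketch (${a.strokes} strokes)`;
    case "cloud":
      return `${a.name} (${a.provider})`;
  }
}

// Adding returns a new list, so the UI can re-render from immutable state.
function addAttachment(list: readonly Attachment[], a: Attachment): Attachment[] {
  return [...list, a];
}

// The dismiss button on a chip removes the attachment with the matching id.
function dismissAttachment(list: readonly Attachment[], id: string): Attachment[] {
  return list.filter((a) => a.id !== id);
}
```

Because each attachment kind shares the same `id` field and goes through the same label function, a chip row can treat files, text, sketches, and cloud documents identically when rendering and dismissing them.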
Patterns
The paperclip opens a menu with file upload, text, sketch, and cloud storage options
Complete workflow from menu selection → creation interface → chip attachment
Direct integration with Google Drive and Microsoft OneDrive
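The menu selection → creation interface → chip attachment workflow can be read as a small state machine. The following is a minimal sketch under that reading; `ComposerState`, `ComposerEvent`, and `transition` are hypothetical names chosen for illustration, and the real product may model this flow quite differently.

```typescript
// Hypothetical names throughout; this models the described flow, not Grok's code.
type Modality = "file" | "text" | "sketch";

type ComposerState =
  | { step: "idle"; chips: string[] }
  | { step: "menuOpen"; chips: string[] }
  | { step: "creating"; modality: Modality; chips: string[] };

type ComposerEvent =
  | { type: "openMenu" }
  | { type: "select"; modality: Modality }
  | { type: "confirm"; label: string }
  | { type: "cancel" };

// Pure transition function: events that do not apply to the current step are ignored.
function transition(state: ComposerState, event: ComposerEvent): ComposerState {
  switch (event.type) {
    case "openMenu":
      return state.step === "idle" ? { step: "menuOpen", chips: state.chips } : state;
    case "select":
      return state.step === "menuOpen"
        ? { step: "creating", modality: event.modality, chips: state.chips }
        : state;
    case "confirm":
      // Confirming in the creation interface attaches a new chip and returns to idle.
      return state.step === "creating"
        ? { step: "idle", chips: [...state.chips, event.label] }
        : state;
    case "cancel":
      // Cancelling from any step keeps the existing chips untouched.
      return { step: "idle", chips: state.chips };
  }
}
```

Keeping the chips in every state means a cancelled sketch or text modal never disturbs attachments that were already added.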
UX Insights
- Paperclip menu centralizes all content input methods
- Drawing canvas offers full color palette and editing tools
- Text content modal allows pre-composing context before adding
- All attachments display as chips with dismiss buttons
- Cloud storage integration removes friction for external files
- Recent files provide quick access to previously used content
- Clean visual separation between content inputs (paperclip) and AI tools (toolbar)
Design Decisions
Grok treats content input as a first-class feature by providing dedicated interfaces for each modality. The drawing canvas with full color palette and tools shows investment in creative input beyond simple file uploads. The "Add text content" modal lets users prepare context separately before attachment, useful for pasting documentation or formatted content. Chips provide consistent feedback across all input types (files, text, sketches).