Quick definition · 2 min AI term

Multimodal

Multimodal AI can handle more than one kind of input or output: text, images, sound, even video.

Think of it like

A person who can read, see, and listen, not just read. It works across senses, not only one.

Example

Show a multimodal model a photo of a recipe and ask “What is wrong with this recipe?” It reads the picture and replies in words.

Why it matters

Modern AI is increasingly multimodal, which is why you can now paste an image into a chat and ask about it.

Where you’ll see it

GPT-4o, Gemini, Claude

Related terms