Multimodal
Multimodal AI can take in or produce more than one kind of data, such as text, images, audio, or even video.
Think of it like
A person who can read, look at pictures, and listen, instead of only reading. It draws on several senses rather than just one.
Example
Show a multimodal model a photo of a recipe and ask “What is wrong with this recipe?” It reads the picture and answers in words.
Why it matters
Modern AI systems are increasingly multimodal. That is why you can now paste an image into a chat and ask questions about it.
Where you’ll see it
GPT-4o, Gemini, Claude