Common Model Types: Text, Image, Audio, Video, Multimodal
Different generative AI (Gen AI) models are optimized for different input and output formats.
| Model Type | Typical Input | Typical Output | Use Case |
| --- | --- | --- | --- |
| Text (LLM) | Text prompt | Text/code | Chatbot, summarizer, coding helper |
| Image generation | Text prompt | Image | Creative design, marketing visuals |
| Speech/audio | Audio/text | Transcript/speech | Voice bot, call analytics |
| Video generation | Text/image | Video clip | Education and promotional videos |
| Multimodal | Text + image + audio | Mixed output | Image QA, UI screenshot analysis |
Choosing the right model
Match model type to user task first.
Then compare quality, speed, and cost.
For enterprise apps, prioritize controllability and security.
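The selection order above (task match first, then quality, speed, and cost, with an extra gate for enterprise apps) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Candidate` fields, the candidate names, the 0 to 1 scores, the `enterprise_ready` flag, and the default weights are all made up for the example and do not describe any real model or vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str               # hypothetical model identifier
    model_type: str         # e.g. "text", "image", "multimodal"
    quality: float          # 0..1, higher is better (made-up score)
    speed: float            # 0..1, higher is faster (made-up score)
    cost: float             # 0..1, higher is MORE expensive (made-up score)
    enterprise_ready: bool  # assumed flag for controllability/security needs


def choose_model(candidates, task_type, weights=(0.5, 0.25, 0.25), enterprise=False):
    """Filter by task type first, then rank by quality, speed, and cost."""
    # Step 1: keep only models whose type matches the user task.
    matching = [c for c in candidates if c.model_type == task_type]
    # Optional gate: enterprise apps only consider vetted candidates.
    if enterprise:
        matching = [c for c in matching if c.enterprise_ready]
    if not matching:
        raise ValueError(f"no candidate fits task type {task_type!r}")

    # Step 2: weighted score; cost is subtracted because lower cost is better.
    w_quality, w_speed, w_cost = weights

    def score(c):
        return w_quality * c.quality + w_speed * c.speed - w_cost * c.cost

    return max(matching, key=score)


# Illustrative candidates only; the numbers are invented.
CANDIDATES = [
    Candidate("big-text", "text", 0.9, 0.4, 0.8, True),
    Candidate("small-text", "text", 0.7, 0.9, 0.2, False),
    Candidate("image-gen", "image", 0.8, 0.5, 0.6, True),
]

print(choose_model(CANDIDATES, "text").name)                   # small-text
print(choose_model(CANDIDATES, "text", enterprise=True).name)  # big-text
```

With the default weights the cheaper, faster text model wins. Turning on the enterprise gate removes it from consideration, so the larger model is chosen instead. In practice the weights would come from the application's own priorities.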
Learning paradigms (related ML foundation)
| Paradigm | Data labels | Typical tasks |
| --- | --- | --- |
| Supervised learning | Has labels | Classification, regression |
| Unsupervised learning | No labels | Clustering, dimensionality reduction |
```mermaid
flowchart LR
    A[Raw data] --> B{Labels available?}
    B -- Yes --> C[Supervised model]
    B -- No --> D[Unsupervised methods]
    C --> E[Predict class/value]
    D --> F[Find hidden structure]
```
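To make the two branches concrete, here is a minimal sketch in plain Python using the same toy data both ways. The supervised path uses a nearest-centroid classifier and the unsupervised path uses a small 1-D k-means. Both are stand-ins chosen for brevity, not the only methods in each family, and the function names and data values are invented for illustration.

```python
from statistics import mean


def fit_nearest_centroid(points, labels):
    """Supervised: use the labels to compute one centroid (mean) per class."""
    groups = {}
    for x, y in zip(points, labels):
        groups.setdefault(y, []).append(x)
    return {label: mean(xs) for label, xs in groups.items()}


def predict(centroids, x):
    """Predict the class whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def kmeans_1d(points, k=2, iterations=10):
    """Unsupervised: group unlabeled 1-D points into k clusters."""
    ordered = sorted(points)
    # Deterministic start: spread the initial centers across the sorted data.
    step = max(k - 1, 1)
    centers = [ordered[i * (len(ordered) - 1) // step] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for x in points:
            nearest = min(range(k), key=lambda i: abs(centers[i] - x))
            clusters[nearest].append(x)
        # Keep the previous center if a cluster ends up empty.
        centers = [mean(c) if c else centers[i] for i, c in enumerate(clusters)]
    return sorted(centers)


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]

# Labels available -> supervised model -> predict a class.
labels = ["low", "low", "low", "high", "high", "high"]
centroids = fit_nearest_centroid(points, labels)
print(predict(centroids, 4.0))  # low
print(predict(centroids, 9.0))  # high

# No labels -> unsupervised method -> find hidden structure.
print(kmeans_1d(points, k=2))   # [2.0, 11.0]
```

Both paths end up with the same two group centers here, but only the supervised one can name them ("low", "high"), because those names come from the labels. The unsupervised path finds the structure without knowing what the groups mean.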