Common Model Types: Text, Image, Audio, Video, Multimodal

Generative AI models are each optimized for particular input and output modalities.

Model Type	Typical Input	Typical Output	Use Case
Text (LLM)	Text prompt	Text/code	Chatbot, summarizer, coding helper
Image generation	Text prompt	Image	Creative design, marketing visuals
Speech/audio	Audio/text	Transcript/speech	Voice bot, call analytics
Video generation	Text/image	Video clip	Education and promotional videos
Multimodal	Text + image + audio	Mixed output	Image QA, UI screenshot analysis

Choosing the right model

First, match the model type to the user's task.
Then compare the remaining candidates on quality, speed, and cost.
For enterprise apps, give controllability and security priority over raw quality.
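
The selection order above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Candidate` fields, the model names, and every number are invented placeholders, and `self_hosted` is a crude stand-in for "controllability and security" rather than a real measure of either.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str             # hypothetical model name, not a real product
    model_type: str       # "text", "image", "audio", "video", or "multimodal"
    quality: float        # placeholder score in 0..1, higher is better
    latency_s: float      # seconds per request, lower is better
    cost_per_call: float  # arbitrary currency units, lower is better
    self_hosted: bool     # assumed proxy for controllability/security


def choose_model(candidates, task_type, enterprise=False,
                 max_latency_s=None, max_cost=None) -> Optional[Candidate]:
    # Step 1: keep only models whose type matches the task.
    pool = [c for c in candidates if c.model_type == task_type]
    # Enterprise apps: apply the control/security requirement before ranking.
    if enterprise:
        pool = [c for c in pool if c.self_hosted]
    # Step 2: apply speed and cost limits, then rank what is left by quality.
    if max_latency_s is not None:
        pool = [c for c in pool if c.latency_s <= max_latency_s]
    if max_cost is not None:
        pool = [c for c in pool if c.cost_per_call <= max_cost]
    return max(pool, key=lambda c: c.quality) if pool else None


# Made-up catalog used only to exercise the function.
CANDIDATES = [
    Candidate("text-a", "text", 0.90, 2.0, 0.020, False),
    Candidate("text-b", "text", 0.80, 0.5, 0.005, True),
    Candidate("image-a", "image", 0.85, 6.0, 0.040, False),
]

best = choose_model(CANDIDATES, "text", enterprise=True)
print(best.name if best else "no suitable model")
```

Filtering on hard requirements before ranking keeps a high-quality model from winning when it cannot be deployed at all. A weighted score across quality, speed, and cost would be an equally reasonable design.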

Learning paradigms (related ML foundation)

Paradigm	Data labels	Typical tasks
Supervised learning	Has labels	Classification, regression
Unsupervised learning	No labels	Clustering, dimensionality reduction

flowchart LR
    A[Raw data] --> B{Labels available?}
    B -- Yes --> C[Supervised model]
    B -- No --> D[Unsupervised methods]
    C --> E[Predict class/value]
    D --> F[Find hidden structure]
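
To make the two branches of the flowchart concrete, here is a minimal standard-library sketch. It assumes one-dimensional data and deliberately simple methods: a 1-nearest-neighbor lookup stands in for "supervised model" and a two-cluster k-means stands in for "unsupervised methods". The latency values and the "fast"/"slow" labels are invented for illustration.

```python
def nearest_neighbor_predict(train_x, train_y, x):
    """Supervised: return the label of the closest labeled training point."""
    idx = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[idx]


def two_means(values, iterations=10):
    """Unsupervised: split 1-D values into two clusters (k-means, k=2)."""
    lo, hi = min(values), max(values)   # initial cluster centers
    left, right = list(values), []
    for _ in range(iterations):
        # Assign each value to the nearer center.
        left = [v for v in values if abs(v - lo) <= abs(v - hi)]
        right = [v for v in values if abs(v - lo) > abs(v - hi)]
        if not right:                   # all values identical: one cluster
            break
        # Move each center to the mean of its cluster.
        lo = sum(left) / len(left)
        hi = sum(right) / len(right)
    return left, right


# Invented example data: request latencies in milliseconds.
latencies = [12, 15, 14, 210, 190, 205]
labels = ["fast", "fast", "fast", "slow", "slow", "slow"]

# Labels available: predict a class for a new value.
print(nearest_neighbor_predict(latencies, labels, 20))

# No labels: find structure in the same values without using `labels`.
print(two_means(latencies))
```

The supervised call needs `labels` to answer; the unsupervised call never sees them and can only report that the values fall into two groups.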

Data types you see in ML

Categorical (unordered labels): color, city, product type
Ordinal (ordered categories): low, medium, high
Continuous (numeric values on a scale): salary, temperature, latency
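
Each data type usually needs a different numeric encoding before a model can use it. The sketch below shows one common choice for each: one-hot encoding for categorical values, an integer rank for ordinal values, and min-max scaling for continuous values. The helper names, the `encode_row` function, and the sample row are assumptions made for this example.

```python
def one_hot(value, categories):
    """Categorical: one slot per category, with no implied order."""
    return [1 if value == c else 0 for c in categories]


def ordinal_rank(value, ordered_levels):
    """Ordinal: position in the ordered list preserves low < medium < high."""
    return ordered_levels.index(value)


def min_max_scale(value, lo, hi):
    """Continuous: map the range [lo, hi] onto [0, 1]."""
    return (value - lo) / (hi - lo)


COLORS = ["red", "green", "blue"]
LEVELS = ["low", "medium", "high"]


def encode_row(row):
    # Assumes latency is known to lie between 0 and 200 ms.
    return (one_hot(row["color"], COLORS)
            + [ordinal_rank(row["priority"], LEVELS)]
            + [min_max_scale(row["latency_ms"], 0.0, 200.0)])


print(encode_row({"color": "blue", "priority": "medium", "latency_ms": 50.0}))
```

Using an integer rank for a categorical field such as city would suggest an ordering that does not exist, which is why the two types are encoded differently.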