Common Model Types: Text, Image, Audio, Video, Multimodal

Different generative AI models are optimized for different input and output formats.

| Model Type | Typical Input | Typical Output | Use Case |
| --- | --- | --- | --- |
| Text (LLM) | Text prompt | Text/code | Chatbot, summarizer, coding helper |
| Image generation | Text prompt | Image | Creative design, marketing visuals |
| Speech/audio | Audio/text | Transcript/speech | Voice bot, call analytics |
| Video generation | Text/image | Video clip | Education and promotional videos |
| Multimodal | Text + image + audio | Mixed output | Image QA, UI screenshot analysis |
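
One way to read the table above is as a lookup from the input and output you need to the model types that fit. The sketch below encodes the rows as plain Python data. The names `MODEL_TYPES` and `suggest_model_types`, and the lowercase modality keywords, are hypothetical choices made for illustration and do not belong to any library.

```python
# Rows of the table, encoded as (model type, typical inputs, typical outputs).
# The keyword spellings ("text", "image", ...) are an assumption of this sketch.
MODEL_TYPES = [
    ("Text (LLM)", {"text"}, {"text", "code"}),
    ("Image generation", {"text"}, {"image"}),
    ("Speech/audio", {"audio", "text"}, {"transcript", "speech"}),
    ("Video generation", {"text", "image"}, {"video"}),
    ("Multimodal", {"text", "image", "audio"}, {"mixed"}),
]


def suggest_model_types(input_kind, output_kind):
    """Return every model type whose row lists both the input and the output."""
    return [
        name
        for name, inputs, outputs in MODEL_TYPES
        if input_kind in inputs and output_kind in outputs
    ]


if __name__ == "__main__":
    print(suggest_model_types("text", "image"))
    print(suggest_model_types("audio", "transcript"))
```

A lookup like this only narrows the candidates. The final choice still depends on the use case listed in the last column.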

Choosing the right model

Learning paradigms (related ML foundation)

| Paradigm | Data labels | Typical tasks |
| --- | --- | --- |
| Supervised learning | Has labels | Classification, regression |
| Unsupervised learning | No labels | Clustering, dimensionality reduction |

flowchart LR
    A[Raw data] --> B{Labels available?}
    B -- Yes --> C[Supervised model]
    B -- No --> D[Unsupervised methods]
    C --> E[Predict class/value]
    D --> F[Find hidden structure]
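
The branch in the flowchart can be sketched in a few lines of standard-library Python. This is a toy illustration on 1-D numbers, not a recommended implementation. The helper names (`choose_paradigm`, `fit_nearest_centroid`, `predict`, `two_means`) and the sample data are invented for the example. A nearest-centroid classifier stands in for "supervised model", and a two-cluster k-means stands in for "unsupervised methods".

```python
from statistics import mean


def choose_paradigm(labels):
    """Mirror the flowchart: labels available -> supervised, else unsupervised."""
    return "supervised" if labels is not None else "unsupervised"


def fit_nearest_centroid(xs, labels):
    """Supervised: learn one centroid (average value) per label."""
    groups = {}
    for x, y in zip(xs, labels):
        groups.setdefault(y, []).append(x)
    return {y: mean(values) for y, values in groups.items()}


def predict(centroids, x):
    """Predict the class whose centroid lies closest to x."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))


def two_means(xs, iterations=10):
    """Unsupervised: split unlabeled points into two clusters (k-means, k=2)."""
    lo, hi = min(xs), max(xs)  # start the two centers at the extremes
    a, b = list(xs), []
    for _ in range(iterations):
        a = [x for x in xs if abs(x - lo) <= abs(x - hi)]
        b = [x for x in xs if abs(x - lo) > abs(x - hi)]
        if not b:  # all points identical, nothing to split
            break
        lo, hi = mean(a), mean(b)
    return sorted(a), sorted(b)


if __name__ == "__main__":
    xs = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]
    labels = ["small", "small", "small", "large", "large", "large"]

    # Labels available: fit a classifier and predict a class for a new point.
    print(choose_paradigm(labels))
    print(predict(fit_nearest_centroid(xs, labels), 4.1))

    # No labels: look for hidden structure in the same points.
    print(choose_paradigm(None))
    print(two_means(xs))
```

The same six numbers feed both branches. With labels, the model learns what "small" and "large" mean and predicts one of them. Without labels, the clustering step can only report that the points fall into two groups.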

Data types you see in ML