Evaluation: Accuracy, Latency, Cost, Safety

A practical AI system is not only accurate. It must also be fast, affordable, and safe.

| Metric | Question | Example measurement |
| --- | --- | --- |
| Accuracy | Is the output correct? | Pass rate on a golden dataset |
| Latency | Is it fast enough? | P95 response time |
| Cost | Is it sustainable? | Cost per request or session |
| Safety | Does it follow policy? | Violation rate |
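
As a rough illustration of how the four measurements in the table could be computed from logged evaluation runs, here is a minimal Python sketch. The `RequestRecord` fields, the `summarize` helper, and the nearest-rank percentile method are assumptions made for this example, not part of any particular framework.

```python
import math
from dataclasses import dataclass


@dataclass
class RequestRecord:
    # Hypothetical log schema, one record per evaluated request.
    passed: bool           # output matched the golden answer
    latency_ms: float      # end-to-end response time
    cost_usd: float        # spend attributed to this request
    violated_policy: bool  # flagged by a safety check


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(records):
    """Return one number per row of the metric table."""
    n = len(records)
    if n == 0:
        raise ValueError("no records to summarize")
    return {
        "pass_rate": sum(r.passed for r in records) / n,
        "p95_latency_ms": percentile([r.latency_ms for r in records], 95),
        "cost_per_request_usd": sum(r.cost_usd for r in records) / n,
        "violation_rate": sum(r.violated_policy for r in records) / n,
    }
```

Reporting all four numbers together makes trade-offs visible: a change that raises the pass rate but also doubles P95 latency or cost per request shows up in the same summary.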

Evaluation strategy

Classification metrics (ML foundation)

When your task is classification, confusion-matrix-based metrics are essential.

| Metric | Formula | When it matters |
| --- | --- | --- |
| Accuracy | (TP + TN) / Total | Balanced datasets |
| Precision | TP / (TP + FP) | When false alarms are costly |
| Recall | TP / (TP + FN) | When missing positives is risky |
| F1 score | 2PR / (P + R) | Imbalanced datasets |

flowchart TD
    A[Model predictions] --> B[Confusion matrix]
    B --> C[Accuracy]
    B --> D[Precision]
    B --> E[Recall]
    D --> F[F1 score]
    E --> F
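
The formulas above can be sketched in plain Python for the binary case. The function name `classification_metrics`, the `positive` parameter, and the choice to return 0.0 when a denominator is zero are assumptions made for this example; libraries such as scikit-learn provide equivalent, more complete implementations.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from binary labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Build the four confusion-matrix counts.
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    def safe_div(num, den):
        # Assumption: report 0.0 when a metric is undefined (zero denominator).
        return num / den if den else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Imbalanced case: 5 positives out of 100, and a model that always predicts 0.
labels = [1] * 5 + [0] * 95
always_negative = [0] * 100
imbalanced = classification_metrics(labels, always_negative)
```

In the imbalanced case, `imbalanced["accuracy"]` is 0.95 while recall and F1 are both 0.0, which is why accuracy alone is a poor guide when one class is rare.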

Overfitting quick check