A practical AI system is not only accurate. It must also be fast, affordable, and safe.
| Metric | Question | Example Measurement |
|---|---|---|
| Accuracy | Is output correct? | Pass rate on golden dataset |
| Latency | Is it fast enough? | P95 response time |
| Cost | Is it sustainable? | Cost per request/session |
| Safety | Does it follow policy? | Violation rate |
When your task is classification, confusion-matrix-based metrics are essential.
| Metric | Formula | When important |
|---|---|---|
| Accuracy | (TP + TN) / Total | Balanced datasets |
| Precision | TP / (TP + FP) | When false alarms are costly |
| Recall | TP / (TP + FN) | When missing positives is risky |
| F1 score | 2PR / (P + R) | Imbalanced datasets |