Best NER Models in 2026

NER is still "find spans in text", but production work rarely stops there. You also need labels, normalization, schema checks, latency budgets, and review loops. The right model depends on how stable your entity schema is.

My default: use spaCy for stable labels and fast pipelines. Use GLiNER when labels change often. Train a Transformer token classifier when you have labeled spans. Use LLM structured extraction when the output is a complex record, not just a list of spans.

Decision table

Need | Best starting point | Why
Fast known-label NER | spaCy | Mature pipelines, good ergonomics, fast CPU deployment, rule integration.
New labels without full training | GLiNER | Label-conditioned extraction works well when the label set changes.
Highest quality for stable domain labels | Fine-tuned Transformer token classifier | Trained sequence labeling is still strong when you have data.
Complex schema extraction | LLM with structured outputs | Better for nested fields, sparse attributes, and cross-sentence reasoning.
Regulated production workflow | Hybrid model plus rules and review | Determinism, auditability, and confidence routing matter.
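The decision table can be read as a small routing function. The sketch below is one possible precedence order, not a rule from the table itself. The `Requirements` flags are hypothetical names chosen for illustration. The sketch also simplifies the "regulated" case to a single label, even though the hybrid option still wraps one of the other models.

```python
from dataclasses import dataclass


@dataclass
class Requirements:
    # Hypothetical flags; these names are illustrative, not part of any library.
    labels_stable: bool
    has_labeled_spans: bool
    output_is_record: bool    # nested fields or a cross-sentence record, not flat spans
    regulated: bool           # determinism, auditability, and review are required
    quality_over_speed: bool  # willing to pay latency/GPU cost for accuracy


def starting_point(req: Requirements) -> str:
    """Map requirements to a starting point from the decision table (one possible order)."""
    if req.regulated:
        return "hybrid model plus rules and review"
    if req.output_is_record:
        return "LLM with structured outputs"
    if not req.labels_stable:
        return "GLiNER"
    if req.has_labeled_spans and req.quality_over_speed:
        return "fine-tuned Transformer token classifier"
    return "spaCy"
```

The regulated and record-shaped checks come first because they change the shape of the whole system, not just the extractor.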

Tool classes

Class | Strength | Weakness | Good fit
spaCy NER | Fast, production-friendly, rule-aware | Needs training or rules for custom labels | Known labels in high-throughput systems.
GLiNER | Flexible labels at inference time | Quality depends on label wording and domain mismatch | Rapid ontology iteration and long-tail labels.
Transformer token classifier | Strong supervised accuracy | Requires labeled spans and retraining | Stable domain extraction with enough data.
LLM extraction | Schema flexibility and reasoning | Higher latency, cost, and nondeterminism | Complex records, nested fields, low-volume workflows.
Rules and dictionaries | Deterministic precision | Brittle recall | Compliance, IDs, product codes, and post-filters.
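The "rules and dictionaries" row is easy to picture with the standard library alone. The following is a minimal sketch under stated assumptions. The regex formats for `PRODUCT_CODE` and `INVOICE_ID` are hypothetical. The `GAZETTEER` of company names is also invented for the example. Every match is exact and repeatable, which gives the precision. Anything outside the patterns is missed, which is the recall problem.

```python
import re
from typing import NamedTuple


class Span(NamedTuple):
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    label: str
    text: str


# Hypothetical ID formats; real patterns depend on your domain.
PATTERNS = {
    "PRODUCT_CODE": re.compile(r"\b[A-Z]{3}-\d{4}\b"),
    "INVOICE_ID": re.compile(r"\bINV\d{6}\b"),
}

# Hypothetical dictionary of known names mapped to labels.
GAZETTEER = {"Acme Corp": "COMPANY", "Globex": "COMPANY"}


def rule_spans(text: str) -> list[Span]:
    """Return deterministic regex and dictionary matches, sorted by position."""
    spans = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            spans.append(Span(m.start(), m.end(), label, m.group()))
    for term, label in GAZETTEER.items():
        # Word boundaries stop "Globex" from matching inside "Globexx".
        for m in re.finditer(rf"\b{re.escape(term)}\b", text):
            spans.append(Span(m.start(), m.end(), label, m.group()))
    return sorted(spans)
```

For example, `rule_spans("Acme Corp ordered ABC-1234 on invoice INV000123.")` yields one `COMPANY`, one `PRODUCT_CODE`, and one `INVOICE_ID` span. The same functions also work as post-filters on model output.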

How to choose

Start with the entity schema, not the model.

If labels are stable and examples are available, train or fine-tune a token classifier. It is easier to evaluate, cheaper to run, and more predictable than an LLM extraction chain.
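Training a token classifier usually starts by converting labeled character spans into per-token BIO tags. The sketch below makes a simplifying assumption: tokens are split on whitespace. A real Transformer pipeline would align spans to the subword tokenizer's offsets instead. Spans are assumed not to overlap.

```python
import re


def bio_tags(text, spans):
    """Convert (start, end, label) character spans into whitespace tokens and BIO tags.

    Assumes non-overlapping spans and whitespace tokenization (a simplification).
    """
    tokens = [(m.start(), m.end(), m.group()) for m in re.finditer(r"\S+", text)]
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        # Every token that overlaps the span belongs to the entity.
        inside = [i for i, (ts, te, _) in enumerate(tokens) if ts < end and te > start]
        for j, i in enumerate(inside):
            tags[i] = ("B-" if j == 0 else "I-") + label
    return [tok for _, _, tok in tokens], tags


words, tags = bio_tags(
    "Jane Doe joined Acme Corp in Berlin",
    [(0, 8, "PER"), (16, 25, "ORG"), (29, 35, "LOC")],
)
```

Here `tags` is `["B-PER", "I-PER", "O", "B-ORG", "I-ORG", "O", "B-LOC"]`. Trailing punctuation such as "Berlin." would stay attached to the token under this naive split.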

If labels change weekly, use GLiNER or an LLM extractor while the ontology stabilizes. The goal is to learn what the schema should be before spending annotation budget.

If the output is a structured record rather than spans, use an LLM with structured outputs or a hybrid pipeline. Many business extraction tasks are not pure NER. "Find the parties, obligations, effective date, termination clause, and governing law" is document understanding with entity fields.
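When the target is a record, the application still has to check what the model returns. The sketch below assumes an LLM has already produced JSON text; no model call is shown. The `ContractRecord` schema and its field names are hypothetical and mirror the contract example above. The parser rejects missing parties, wrong types, and invalid dates, so bad output fails loudly instead of flowing downstream.

```python
import json
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ContractRecord:
    # Hypothetical schema for the contract example; field names are illustrative.
    parties: list[str]
    obligations: list[str]
    effective_date: date
    termination_clause: Optional[str]
    governing_law: Optional[str]


def _optional_str(data: dict, key: str) -> Optional[str]:
    value = data.get(key)
    if value is not None and not isinstance(value, str):
        raise ValueError(f"{key} must be a string or null")
    return value


def parse_record(raw: str) -> ContractRecord:
    """Validate JSON text (assumed to come from an LLM) against the record schema."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("top-level JSON must be an object")
    parties = data.get("parties")
    if not isinstance(parties, list) or len(parties) < 2 or not all(isinstance(p, str) for p in parties):
        raise ValueError("parties must be a list of at least two strings")
    obligations = data.get("obligations", [])
    if not isinstance(obligations, list) or not all(isinstance(o, str) for o in obligations):
        raise ValueError("obligations must be a list of strings")
    try:
        effective = date.fromisoformat(data["effective_date"])
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError("effective_date must be an ISO date (YYYY-MM-DD)") from exc
    return ContractRecord(
        parties=parties,
        obligations=obligations,
        effective_date=effective,
        termination_clause=_optional_str(data, "termination_clause"),
        governing_law=_optional_str(data, "governing_law"),
    )
```

Structured-output features in LLM APIs can enforce the JSON shape. Business constraints such as "at least two parties" still usually need a check like this one.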

Production pattern

A production NER system usually has three tiers:

  1. Candidate extraction: spaCy, GLiNER, Transformer model, LLM, rules, or a combination.
  2. Normalization: map spans to canonical IDs, product codes, users, companies, or ontology entries.
  3. Validation: reject impossible labels, enforce schema constraints, deduplicate spans, and route low-confidence cases to review.
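The validation tier can be sketched as a plain function over candidate spans from any extractor. Several pieces here are hypothetical: `ALLOWED_LABELS`, `REVIEW_THRESHOLD`, and the `Candidate` shape. Overlaps are resolved greedily, keeping the higher-scoring span, which also removes exact duplicates produced by two extractors. Surviving spans below the threshold go to review instead of being dropped.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    start: int
    end: int
    label: str
    score: float  # extractor confidence, assumed to be in [0, 1]


ALLOWED_LABELS = {"PERSON", "ORG", "PRODUCT_CODE"}  # hypothetical schema
REVIEW_THRESHOLD = 0.6  # hypothetical; tune on held-out data


def validate(candidates):
    """Return (accepted, review) lists after schema checks and overlap resolution."""
    # Reject labels outside the schema and empty or inverted spans.
    kept = [c for c in candidates if c.label in ALLOWED_LABELS and c.start < c.end]
    # Greedy resolution: highest score first; drop anything overlapping a kept span.
    resolved = []
    for c in sorted(kept, key=lambda c: (-c.score, c.start)):
        if all(c.end <= r.start or c.start >= r.end for r in resolved):
            resolved.append(c)
    resolved.sort(key=lambda c: c.start)
    accepted = [c for c in resolved if c.score >= REVIEW_THRESHOLD]
    review = [c for c in resolved if c.score < REVIEW_THRESHOLD]
    return accepted, review
```

Greedy resolution is simple and predictable. If nested entities are part of the schema, this step has to allow containment instead of rejecting every overlap.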

The normalization layer is where many NER systems create product value. A span that says "Apple" is only a start. The system still has to decide whether it means the company, the fruit, a product family, or a ticker.
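A toy version of the "Apple" problem shows the shape of a normalization step. A surface form maps to several canonical entries. Context words then pick one, or the case goes to review when the evidence is missing or tied. This bag-of-cue-words approach is a stand-in for illustration; production systems typically use a proper entity linker. The knowledge base, IDs, and cue words below are all hypothetical.

```python
import re

# Hypothetical knowledge base: surface form -> (canonical ID, context cue words).
CANDIDATES = {
    "apple": [
        ("company:apple_inc", {"iphone", "shares", "ceo", "revenue", "cupertino"}),
        ("food:apple", {"eat", "pie", "juice", "orchard", "fruit"}),
        ("ticker:AAPL", {"nasdaq", "ticker", "stock", "traded"}),
    ],
}


def normalize(span_text, context):
    """Return the canonical ID whose cue words best match the context, or None."""
    options = CANDIDATES.get(span_text.lower())
    if not options:
        return None  # unknown surface form
    words = set(re.findall(r"[a-z]+", context.lower()))
    scored = sorted(((len(cues & words), cid) for cid, cues in options), reverse=True)
    best_score, best_id = scored[0]
    runner_up_score = scored[1][0] if len(scored) > 1 else 0
    if best_score == 0 or best_score == runner_up_score:
        return None  # no evidence or a tie: route to review instead of guessing
    return best_id
```

Returning `None` rather than a default entry keeps ambiguous cases visible to the validation and review tier.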

Evaluation checklist

Measure more than a single entity-level F1 score:

  • exact span F1
  • relaxed span F1
  • label confusion matrix
  • nested entity handling
  • entity normalization accuracy
  • false positives by label
  • confidence calibration
  • latency and cost per document
  • human correction rate
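The difference between exact and relaxed span F1 is easiest to see side by side. The sketch below uses one common definition, which is an assumption and not the only convention. An exact match requires identical `(start, end, label)`. A relaxed match requires the same label and any character overlap, with each gold span matched at most once. Relaxed matching here is greedy, so it can undercount in rare many-to-many overlap cases.

```python
def f1(tp, n_pred, n_gold):
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def exact_f1(gold, pred):
    """Spans are (start, end, label); a match needs identical boundaries and label."""
    gold_set, pred_set = set(gold), set(pred)
    return f1(len(gold_set & pred_set), len(pred_set), len(gold_set))


def relaxed_f1(gold, pred):
    """Same label plus any overlap counts; each gold span is matched at most once."""
    unmatched = list(gold)
    tp = 0
    for ps, pe, pl in pred:
        for g in unmatched:
            gs, ge, gl = g
            if pl == gl and ps < ge and gs < pe:
                unmatched.remove(g)  # safe: we break immediately after removing
                tp += 1
                break
    return f1(tp, len(pred), len(gold))


gold = [(0, 8, "PER"), (16, 25, "ORG"), (29, 35, "LOC")]
pred = [(0, 4, "PER"), (16, 25, "ORG"), (29, 35, "ORG")]
```

On this data, exact F1 is 1/3 and relaxed F1 is 2/3. The truncated `PER` boundary counts only under relaxed matching. The `LOC` predicted as `ORG` fails both and belongs in the label confusion matrix.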
