Introduction
The objective of this article is to explain how we can quickly qualify an error in an advanced Named Entity Recognition (NER) pipeline.
In this article, the NER pipeline combines:
- A NER model that extracts basic named entities (Person, Location, Organization)
- A component that automatically extracts entities and links them to the Wikidata knowledge base
- A technical component that reconciles all the information, giving priority to the NER model for performance reasons
NER model
We use here a custom-built NER model trained on a high-quality dataset.
The labels are: Person, Location, Organization.

Entity Linking
We perform entity extraction and linking with an open-source component, entity-fishing, which provides a set of services (term lookup, text disambiguation…) that leverage Wikidata, Wikipedia and AI models.
Kairntech runs a SaaS instance of entity-fishing that is updated every month: https://sherpa-entityfishing.kairntech.com/

NER pipeline
We build a pipeline that combines:
- A NER model to detect entities (Person, Location, Organization)
- entity-fishing to extract entities and link each one to its Wikidata entry (QID)
- A reconciliation script that prioritizes the NER model over entity-fishing when the NER model labels the text more accurately
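The reconciliation step can be sketched as follows. This is a minimal illustration, not the actual script: the `Entity` structure, the `reconcile` function and the merge policy (the NER span and label win, the QID is borrowed from the entity-fishing entity with the largest overlap, and entity-fishing entities with no NER counterpart are kept only if they carry a label) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entity:
    start: int                 # character offset, inclusive
    end: int                   # character offset, exclusive
    text: str
    label: Optional[str]       # "Person", "Location", "Organization" or None
    qid: Optional[str] = None  # Wikidata identifier, when linked
    source: str = ""           # "ner" or "entity-fishing"


def overlaps(a: Entity, b: Entity) -> bool:
    """True when the two character spans share at least one character."""
    return a.start < b.end and b.start < a.end


def reconcile(ner: List[Entity], linked: List[Entity]) -> List[Entity]:
    """Merge both outputs, giving priority to the NER model (assumed policy)."""
    merged = []
    for n in ner:
        candidates = [e for e in linked if overlaps(n, e)]
        # Borrow the QID from the linked entity that overlaps the most.
        best = max(
            candidates,
            key=lambda e: min(n.end, e.end) - max(n.start, e.start),
            default=None,
        )
        merged.append(
            Entity(n.start, n.end, n.text, n.label, best.qid if best else None, "ner")
        )
    for e in linked:
        # Keep labeled entity-fishing entities that the NER model did not cover.
        if e.label is not None and not any(overlaps(e, n) for n in ner):
            merged.append(e)
    return sorted(merged, key=lambda ent: ent.start)


# Offsets refer to "president-elect Donald Trump visited Eindhoven".
# QIDs are shown for illustration; verify them against Wikidata.
ner_out = [Entity(16, 28, "Donald Trump", "Person", source="ner")]
ef_out = [
    Entity(0, 28, "president-elect Donald Trump", "Person", "Q22686", "entity-fishing"),
    Entity(37, 46, "Eindhoven", "Location", "Q9832", "entity-fishing"),
]
result = reconcile(ner_out, ef_out)
```

With this policy the merged output keeps the NER span “Donald Trump” with the QID found by entity-fishing, and “Eindhoven” comes from entity-fishing alone.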

For Entity Linking, we use rule-based mappings to assign a label to each term extracted by entity-fishing:
- The term has a GeoNames ID and a GPS coordinates property => the term is labeled as Location
- The term is an instance of Human => the term is labeled as Person
- The term is an instance of one of a list of 30 items => the term is labeled as Organization
For instance, “Eindhoven” is extracted by entity-fishing and, since this term has a GeoNames ID property, the Location label is automatically assigned.
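A possible shape for these mapping rules is sketched below. The Wikidata identifiers are the standard ones (P31 "instance of", P1566 "GeoNames ID", P625 "coordinate location", Q5 "human"), but everything else is an assumption: the `claims` dictionary format, the function name, the rule order, the choice to require both location properties, and the two-item `ORGANIZATION_CLASSES` set, which only stands in for the real list of 30 items.

```python
from typing import Dict, List, Optional

P_INSTANCE_OF = "P31"
P_GEONAMES_ID = "P1566"
P_COORDINATES = "P625"
Q_HUMAN = "Q5"

# Hypothetical stand-in for the list of 30 organization classes.
ORGANIZATION_CLASSES = {"Q43229", "Q4830453"}  # organization, business


def label_from_wikidata(claims: Dict[str, List[str]]) -> Optional[str]:
    """Map the Wikidata claims of a linked term to a pipeline label.

    `claims` is an assumed format: property ID -> list of values.
    Returns None when no rule applies (the situation behind Error #3).
    """
    instance_of = set(claims.get(P_INSTANCE_OF, []))
    if P_GEONAMES_ID in claims and P_COORDINATES in claims:
        return "Location"
    if Q_HUMAN in instance_of:
        return "Person"
    if instance_of & ORGANIZATION_CLASSES:
        return "Organization"
    return None


# Placeholder values: only the presence of the properties matters here.
eindhoven_like = {P_GEONAMES_ID: ["<geonames-id>"], P_COORDINATES: ["<lat,lon>"]}
label = label_from_wikidata(eindhoven_like)
```

A term whose classes are missing from the rules comes back as `None`, which makes gaps in the mapping easy to spot.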


Possible errors & limitations
This pipeline can produce errors in entity extraction and linking when analyzing text.
Below is a list of 6 possible error categories:
- Entity has not been extracted by the NER model
- Entity has not been extracted by entity-fishing
- Entity has been extracted by entity-fishing but it was not classified into any of the labels (Person, Location, Organization)
- The labels identified by the NER model and entity-fishing do not correspond
- The entity span (text segment) differs between entity-fishing and the NER model
- The Wikidata link for the extracted entity is wrong
The following sections explain how to identify the error, give its root cause, and implement a solution.
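The six categories can be turned into a small triage helper. The sketch below is one way to do it, under stated assumptions: the `Mention` structure and the `qualify` function are hypothetical, each component's output is represented as a list of character spans, and the expected entity (span, label, QID) is provided by the person reviewing the text.

```python
from typing import List, NamedTuple, Optional


class Mention(NamedTuple):
    start: int                  # character offset, inclusive
    end: int                    # character offset, exclusive
    label: Optional[str] = None
    qid: Optional[str] = None


def first_overlap(target: Mention, mentions: List[Mention]) -> Optional[Mention]:
    """Return the first mention sharing at least one character with target."""
    for m in mentions:
        if m.start < target.end and target.start < m.end:
            return m
    return None


def qualify(expected: Mention, ner: List[Mention], linked: List[Mention]) -> List[int]:
    """Return the error category numbers (1 to 6) that apply to one entity."""
    errors = []
    n = first_overlap(expected, ner)
    e = first_overlap(expected, linked)
    if n is None:
        errors.append(1)  # not extracted by the NER model
    if e is None:
        errors.append(2)  # not extracted by entity-fishing
    elif e.label is None:
        errors.append(3)  # extracted by entity-fishing but not labeled
    if n is not None and e is not None:
        if e.label is not None and n.label != e.label:
            errors.append(4)  # labels do not correspond
        if (n.start, n.end) != (e.start, e.end):
            errors.append(5)  # spans differ
    if e is not None and expected.qid is not None and e.qid != expected.qid:
        errors.append(6)  # wrong Wikidata link
    return errors


# "Donald Trump" inside "president-elect Donald Trump visited Eindhoven"
expected = Mention(16, 28, "Person", "Q22686")
found = qualify(
    expected,
    ner=[Mention(16, 28, "Person")],
    linked=[Mention(0, 28, "Person", "Q22686")],
)
```

Here only the span check fails, so the entity is qualified as an Error #5 case.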
Error #1: Entity has not been extracted by the NER model
- How to identify?
  - Apply the NER model to the selected text
- Root cause:
  - Insufficient accuracy of the NER model
- Solution:
  - Enrich the dataset and retrain the NER model (simple)
Error #2: Entity has not been extracted by entity-fishing
- How to identify?
  - Apply entity-fishing to the selected text
- Root cause:
  - Terms treated as stopwords, like “US”
  - Presence or absence of hyphens in compound nouns
  - Entity unknown to Wikidata in the given language
  - Entity with no Wikipedia page
- Solution:
  - Add the entity (possibly with its QID) to a lexicon (simple)
  - Enrich Wikidata yourself and wait for the monthly update
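The lexicon solution could look like the sketch below. It is an illustration only: the lexicon format (surface form mapped to a label and an optional QID), the function names and the matching strategy are assumptions. Matching is case-sensitive so that “US” is not confused with “us”, and a hyphen or a space is accepted between the parts of a compound noun.

```python
import re
from typing import Dict, List, Optional, Pattern, Tuple

# Hypothetical lexicon: surface form -> (label, QID or None).
LEXICON: Dict[str, Tuple[str, Optional[str]]] = {
    "US": ("Location", "Q30"),
    "Saint-Denis": ("Location", None),  # QID left out on purpose
}


def compile_lexicon(
    lexicon: Dict[str, Tuple[str, Optional[str]]]
) -> List[Tuple[Pattern[str], str, Optional[str]]]:
    """Build one case-sensitive regex per entry, tolerant to hyphen/space."""
    compiled = []
    for term, (label, qid) in lexicon.items():
        parts = re.split(r"[-\s]+", term)
        pattern = r"\b" + r"[-\s]".join(re.escape(p) for p in parts) + r"\b"
        compiled.append((re.compile(pattern), label, qid))
    return compiled


def lexicon_matches(text: str, compiled) -> List[Tuple[int, int, str, str, Optional[str]]]:
    """Return (start, end, matched text, label, qid) for every lexicon hit."""
    hits = []
    for regex, label, qid in compiled:
        for m in regex.finditer(text):
            hits.append((m.start(), m.end(), m.group(), label, qid))
    return sorted(hits, key=lambda h: h[:2])


text = "The US team trained in Saint Denis with us."
hits = lexicon_matches(text, compile_lexicon(LEXICON))
```

The entry written “Saint-Denis” matches the unhyphenated “Saint Denis” in the text, while the lowercase “us” at the end is ignored.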
Error #3: Entity has been extracted by entity-fishing but it was not classified into any of the labels (Person, Location, Organization)
- How to identify?
  - Apply entity-fishing to the selected text and check the label
- Root cause:
  - Incomplete mapping rules
- Solution:
  - Improve the mapping rules (simple)
Error #4: The labels identified by the NER model and entity-fishing do not correspond
- How to identify?
  - Apply the NER model and entity-fishing to the selected text separately and compare the labels
- Root cause:
  - NER model accuracy
  - entity-fishing mapping rules
- Solution:
  - Improve the NER model (simple)
  - Improve the entity-fishing mapping rules (simple)
Error #5: The entity span (text segment) differs between entity-fishing and the NER model
- How to identify?
  - Apply the NER model and entity-fishing to the selected text separately and compare the spans
- Root cause:
  - NER model accuracy
  - entity-fishing longest-match issue, such as extracting “president-elect Donald Trump” instead of “Donald Trump”
- Solution:
  - Improve the NER model (simple)
  - Improve entity-fishing (difficult)
  - Add the entity (possibly with its QID) to a lexicon (simple)
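When comparing spans, it helps to see exactly which text one component included and the other did not. The helper below is a small sketch for that purpose; the function name and the (start, end) span format with an exclusive end offset are assumptions.

```python
from typing import Dict, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def span_difference(text: str, ner_span: Span, linked_span: Span) -> Dict[str, str]:
    """Return the text covered by only one of two overlapping spans."""
    (ns, ne), (ls, le) = ner_span, linked_span
    if ns >= le or ls >= ne:
        raise ValueError("spans do not overlap")
    return {
        "left_extra": text[min(ns, ls):max(ns, ls)],   # extra text before the shared part
        "right_extra": text[min(ne, le):max(ne, le)],  # extra text after the shared part
    }


text = "president-elect Donald Trump visited Eindhoven"
diff = span_difference(text, ner_span=(16, 28), linked_span=(0, 28))
```

In this example `diff["left_extra"]` holds the prefix “president-elect ” that only entity-fishing included, which points to the longest-match issue rather than to the NER model.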

Error #6: The Wikidata link for the extracted entity is wrong
- How to identify?
  - Check the QID of the extracted term
- Root cause:
  - Homonyms
  - Error in the disambiguation model
- Solution:
  - Refine the disambiguation models… (difficult)