How to create a custom NER model in spaCy

Nikita sharma
3 min read · Nov 30, 2019

Named Entity Recognition (NER)

NER, also known as entity identification or entity extraction, is the process of identifying predefined entity types in a text, such as person names, organisations and locations. An NER system is typically a statistical model that is trained on a labelled data set and then used to extract information from new text.

Sometimes we want to extract information that is specific to our domain or industry. For example, in the medical domain we may want to extract diseases, symptoms or medications. In that case we need to create our own custom NER model.

spaCy

spaCy is an open-source software library for advanced Natural Language Processing (NLP).

The spaCy NER model uses a word-embedding strategy based on sub-word features and Bloom embeddings, followed by a 1D Convolutional Neural Network (CNN).

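  • Bloom embedding: It is similar to a word embedding but is a more space-efficient representation. Each word is hashed to several rows of a small, fixed-size table and those rows are summed, so most words still get a distinct vector without the table needing one row per vocabulary entry.
  • 1D CNN: It is applied over the sequence of word vectors, so that each word's representation reflects the context it appears in. The result is used to classify a sentence or word into a set of predetermined categories.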
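
To make the Bloom embedding idea more concrete, here is a minimal sketch in plain Python. It is not spaCy's actual implementation: the table size, vector width, number of hashes and the use of MD5 are all assumptions chosen for illustration, and the table holds random values where a real model would hold learned weights.

```python
import hashlib
import random

# Illustrative values only (assumptions, not spaCy's real settings).
N_ROWS = 1000   # table size, far smaller than a real vocabulary
DIM = 4         # width of each vector
N_HASHES = 4    # number of hash functions applied to each word

# A small table of random vectors; in a trained model these rows are learned.
rng = random.Random(0)
TABLE = [[rng.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(N_ROWS)]


def bloom_rows(word):
    """Map a word to N_HASHES row indices, one per seeded hash."""
    rows = []
    for seed in range(N_HASHES):
        digest = hashlib.md5(f"{seed}:{word}".encode("utf-8")).hexdigest()
        rows.append(int(digest, 16) % N_ROWS)
    return rows


def bloom_embed(word):
    """Sum the selected table rows to build the word's vector."""
    vector = [0.0] * DIM
    for row in bloom_rows(word):
        for i, value in enumerate(TABLE[row]):
            vector[i] += value
    return vector


if __name__ == "__main__":
    for word in ["fever", "paracetamol", "diabetes"]:
        print(word, bloom_rows(word), bloom_embed(word))
```

Two different words may share one row, but they are unlikely to share all of their rows, which is why a small table can still tell a large vocabulary apart.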

How spaCy works

  1. It tokenises the text, i.e. it breaks the input sentence up into words, which are then represented as word embeddings.
  2. Each word is broken up into features, which are then aggregated into a single numeric representation.
  3. This representation is then fed to a fully connected neural network, which makes a classification based on the weights assigned to each feature in the text.
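
The first two steps above can be sketched roughly as follows. This is a toy illustration in plain Python, not spaCy code: the regex tokenizer, the simplified word-shape function, the choice of features (normalised form, prefix, suffix, shape) and the CRC32 hashing are assumptions made for the example, and the classifier in step 3 is left out.

```python
import re
import zlib


def tokenize(text):
    """Split text into word and punctuation tokens (a crude stand-in for a real tokenizer)."""
    return re.findall(r"\w+|[^\w\s]", text)


def word_shape(token):
    """Simplified word shape: uppercase -> X, lowercase -> x, digit -> d."""
    shape = []
    for ch in token:
        if ch.isdigit():
            shape.append("d")
        elif ch.isalpha():
            shape.append("X" if ch.isupper() else "x")
        else:
            shape.append(ch)
    return "".join(shape)


def features(token):
    """Break a token up into a few sub-word features."""
    return {
        "norm": token.lower(),
        "prefix": token[:1],
        "suffix": token[-3:],
        "shape": word_shape(token),
    }


def feature_ids(token):
    """Hash each feature to an integer ID that an embedding table could look up."""
    return {
        name: zlib.crc32(f"{name}={value}".encode("utf-8"))
        for name, value in features(token).items()
    }


if __name__ == "__main__":
    for token in tokenize("Paracetamol 500mg reduces fever."):
        print(token, features(token), feature_ids(token))
```

In a real model, the integer IDs would be used to look up vectors, which are then combined and passed on to the network that makes the final classification.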

How to train spaCy

  • Training data: Annotated data containing both the text and its labels.
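
As an illustration, annotated data for a medical NER model might look like the sketch below. It follows the tuple-based layout used in spaCy v2 training examples (text plus a dictionary of character-offset entities); the sentences and the labels MEDICATION, SYMPTOM and DISEASE are made up for this example. The helper `entity_spans` is hypothetical and only checks that each offset pair covers the intended words.

```python
# Each example is (text, {"entities": [(start_char, end_char, label), ...]}).
# The sentences and label names are invented for illustration.
TRAIN_DATA = [
    (
        "Paracetamol reduces fever.",
        {"entities": [(0, 11, "MEDICATION"), (20, 25, "SYMPTOM")]},
    ),
    (
        "The patient has diabetes.",
        {"entities": [(16, 24, "DISEASE")]},
    ),
]


def entity_spans(example):
    """Return the (substring, label) pairs that the character offsets point at."""
    text, annotations = example
    return [(text[start:end], label) for start, end, label in annotations["entities"]]


if __name__ == "__main__":
    for example in TRAIN_DATA:
        print(example[0], "->", entity_spans(example))
```

Checking the offsets this way before training is worthwhile, because an offset that is off by one character silently labels the wrong text.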