Named Entity Recognition (NER)
NER, also known as entity identification or entity extraction, is the process of identifying predefined types of entities in a text, such as person names, organisations and locations. It is usually performed by a statistical model that is trained on a labelled data set and then used to extract the same kinds of information from new text.
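The extraction step can be pictured as follows: a trained model assigns one tag per token, and consecutive tags are grouped into entities. The sketch below assumes the common BIO tagging scheme (B- begins an entity, I- continues it, O is outside) and uses hand-written tags in place of real model predictions; the function name `bio_to_entities` and the labels are illustrative only.

```python
from typing import List, Optional, Tuple


def bio_to_entities(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group per-token BIO tags into (entity_text, label) pairs."""
    entities: List[Tuple[str, str]] = []
    current: List[str] = []
    label: Optional[str] = None
    for token, tag in zip(tokens, tags):
        # A "B-" tag, or an "I-" tag whose label differs from the open entity, starts a new entity
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if starts_new or tag == "O":
            # Close the entity that is currently open, if any
            if current:
                entities.append((" ".join(current), label))
            current, label = ([token], tag[2:]) if starts_new else ([], None)
        else:
            # "I-" tag with the same label: extend the open entity
            current.append(token)
    if current:
        entities.append((" ".join(current), label))
    return entities


tokens = ["Ada", "Lovelace", "was", "born", "in", "London", "."]
tags = ["B-PERSON", "I-PERSON", "O", "O", "O", "B-LOC", "O"]
print(bio_to_entities(tokens, tags))
# [('Ada Lovelace', 'PERSON'), ('London', 'LOC')]
```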
Sometimes we want to extract information specific to our domain or industry. For example, in the medical domain we may want to extract diseases, symptoms or medications. In that case we need to create our own custom NER.
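A custom NER mainly means defining your own entity labels. As a rough stand-in for a trained model, the sketch below tags medical terms with a hand-made lexicon. `MEDICAL_TERMS`, `match_terms` and the labels `DISEASE`, `SYMPTOM` and `MEDICATION` are hypothetical, and a dictionary lookup like this cannot generalise to unseen terms the way a statistical model can; its output could, however, serve as a starting point for annotating training data.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical domain lexicon: a trained custom NER would learn these labels
# from annotated examples instead of relying on a hand-written list.
MEDICAL_TERMS: Dict[str, str] = {
    "type 2 diabetes": "DISEASE",
    "fatigue": "SYMPTOM",
    "blurred vision": "SYMPTOM",
    "metformin": "MEDICATION",
}


def match_terms(text: str, lexicon: Dict[str, str]) -> List[Tuple[int, int, str]]:
    """Return (start_char, end_char, label) for every lexicon hit, sorted by position."""
    spans = []
    for term, label in lexicon.items():
        # \b stops a term from matching inside a longer word; matching ignores case
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((m.start(), m.end(), label))
    return sorted(spans)


text = "Patient with type 2 diabetes reports fatigue and blurred vision; started on Metformin."
for start, end, label in match_terms(text, MEDICAL_TERMS):
    print(text[start:end], label)
```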
spaCy is an open-source software library for advanced Natural Language Processing (NLP).
spaCy's NER model represents words with a word-embedding strategy built on sub-word features and Bloom embeddings, and combines it with a 1D Convolutional Neural Network (CNN).
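- Bloom embedding: similar to a word embedding, but a more space-efficient representation. Each word is hashed several times into a small, fixed-size table of vectors and the selected rows are combined, so the table does not need one row per word in the vocabulary.
- 1D CNN: a convolution applied over the sequence of word vectors, so that each word's representation also reflects its neighbouring words; these context-aware vectors are then used to classify each word into a set of predefined categories.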
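The two components above can be sketched in plain Python. This is a toy illustration under stated assumptions, not spaCy's implementation: it uses MD5 only as a stable hash (spaCy uses its own hash functions), sums the selected rows, and uses a single convolution layer with a ReLU where spaCy's encoder stacks several layers with maxout units. `BloomEmbedding`, `conv1d` and all sizes are hypothetical.

```python
import hashlib
import random
from typing import List

Vector = List[float]


class BloomEmbedding:
    """Toy hashing-trick embedding: a word is hashed with several seeds into a
    fixed-size table and the selected rows are summed."""

    def __init__(self, n_rows: int = 1000, width: int = 8, n_hashes: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.table = [[rng.uniform(-0.1, 0.1) for _ in range(width)] for _ in range(n_rows)]
        self.width = width
        self.n_hashes = n_hashes

    def rows(self, word: str) -> List[int]:
        # One row index per hash seed k; MD5 serves only as a stable hash here
        return [
            int.from_bytes(hashlib.md5(f"{k}:{word}".encode("utf-8")).digest()[:8], "little")
            % len(self.table)
            for k in range(self.n_hashes)
        ]

    def __call__(self, word: str) -> Vector:
        vec = [0.0] * self.width
        for r in self.rows(word):
            vec = [v + t for v, t in zip(vec, self.table[r])]
        return vec


def conv1d(vectors: List[Vector], weights: List[Vector], window: int = 1) -> List[Vector]:
    """Each output is a ReLU of a linear map applied to the word's vector
    concatenated with `window` neighbours on each side (zero-padded at the edges)."""
    width = len(vectors[0])
    pad = [[0.0] * width for _ in range(window)]
    padded = pad + vectors + pad
    out = []
    for i in range(len(vectors)):
        concat = [x for vec in padded[i : i + 2 * window + 1] for x in vec]
        out.append([max(0.0, sum(w * x for w, x in zip(row, concat))) for row in weights])
    return out


emb = BloomEmbedding(n_rows=1000, width=8)
words = ["Ada", "Lovelace", "was", "born", "in", "London"]
vectors = [emb(w) for w in words]
rng = random.Random(1)
# With window=1 each filter row sees 3 words x 8 dims = 24 inputs
weights = [[rng.uniform(-0.5, 0.5) for _ in range(3 * 8)] for _ in range(8)]
encoded = conv1d(vectors, weights, window=1)
print(len(encoded), len(encoded[0]))  # 6 8
```

Because the table has a fixed number of rows, unseen or misspelled words still receive a vector; the cost is occasional hash collisions, which combining several hashes per word makes less harmful.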
How spaCy works
- It tokenises the text, i.e. breaks the input sentence into words (tokens).
- Each word is then broken up into features, and these are aggregated into a single representative vector (the word embedding).
- This vector is then fed to a fully connected neural network, which makes a classification based on the weights it has learned for each feature.
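The three steps above can be traced end to end with a small, untrained toy model. The feature set (lowercase form, first character, last three characters, word shape) is similar to the kind of sub-word features spaCy uses, but the tokenizer, the hashing, the weights and all names here are hypothetical, so the predicted labels are meaningless until the parameters are trained.

```python
import hashlib
import math
import random
import re
from typing import List, Tuple

LABELS = ["O", "PERSON", "LOC"]  # hypothetical label set
N_ROWS, WIDTH = 512, 16
FEATURES = ("norm", "prefix", "suffix", "shape")

_rng = random.Random(0)
# Untrained, randomly initialised parameters: one hashed table per feature,
# plus one fully connected layer producing one score per label.
TABLES = {f: [[_rng.gauss(0.0, 0.1) for _ in range(WIDTH)] for _ in range(N_ROWS)] for f in FEATURES}
DENSE_W = [[_rng.gauss(0.0, 0.1) for _ in range(WIDTH)] for _ in LABELS]
DENSE_B = [0.0] * len(LABELS)


def tokenize(text: str) -> List[str]:
    # Step 1 (naive): words and single punctuation marks become tokens
    return re.findall(r"\w+|[^\w\s]", text)


def word_shape(word: str) -> str:
    # "London" -> "Xxxxx", "2024" -> "dddd": runs of the same class are capped at 4
    out: List[str] = []
    for ch in word:
        c = "X" if ch.isupper() else "x" if ch.islower() else "d" if ch.isdigit() else ch
        if len(out) >= 4 and all(o == c for o in out[-4:]):
            continue
        out.append(c)
    return "".join(out)


def stable_hash(s: str) -> int:
    return int.from_bytes(hashlib.md5(s.encode("utf-8")).digest()[:8], "little")


def embed(word: str) -> List[float]:
    # Step 2: break the word into features and sum their hashed vectors
    feats = {"norm": word.lower(), "prefix": word[:1], "suffix": word[-3:], "shape": word_shape(word)}
    vec = [0.0] * WIDTH
    for name, value in feats.items():
        row = TABLES[name][stable_hash(value) % N_ROWS]
        vec = [v + r for v, r in zip(vec, row)]
    return vec


def classify(vec: List[float]) -> Tuple[str, List[float]]:
    # Step 3: fully connected layer, then softmax over the label scores
    scores = [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(DENSE_W, DENSE_B)]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    return LABELS[probs.index(max(probs))], probs


for token in tokenize("Ada Lovelace was born in London."):
    label, _ = classify(embed(token))
    print(token, label)  # arbitrary labels until the weights are trained
```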
How to train spaCy
- Training data: annotated data containing both the text and its labels, i.e. the position and entity type of every entity in the text.
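Annotations are usually stored as character offsets, for example in the `(text, {"entities": [(start, end, label)]})` layout used in spaCy v2 training examples, and are then converted to per-token tags. The sketch below performs that conversion into BILUO tags (Begin, Inside, Last, Unit, Outside) with a naive tokenizer and rejects spans that do not line up with token boundaries, a common annotation mistake. spaCy ships its own helper for this (`spacy.training.offsets_to_biluo_tags` in v3); the function names and the example sentence below are hypothetical.

```python
import re
from typing import List, Tuple

Span = Tuple[int, int, str]  # (start_char, end_char, label), end exclusive

# One annotated example in the (text, {"entities": [...]}) layout
TRAIN_DATA = [
    ("Metformin is used to treat type 2 diabetes.",
     {"entities": [(0, 9, "MEDICATION"), (27, 42, "DISEASE")]}),
]


def tokens_with_offsets(text: str) -> List[Tuple[str, int, int]]:
    # Naive tokenizer that also records each token's character span
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def offsets_to_biluo(text: str, entities: List[Span]) -> Tuple[List[str], List[str]]:
    tokens = tokens_with_offsets(text)
    tags = ["O"] * len(tokens)
    for start, end, label in entities:
        inside = [i for i, (_, s, e) in enumerate(tokens) if s >= start and e <= end]
        # The span must start and end exactly on token boundaries
        if not inside or tokens[inside[0]][1] != start or tokens[inside[-1]][2] != end:
            raise ValueError(f"entity {(start, end, label)} does not align with tokens")
        if len(inside) == 1:
            tags[inside[0]] = f"U-{label}"
        else:
            tags[inside[0]] = f"B-{label}"
            for i in inside[1:-1]:
                tags[i] = f"I-{label}"
            tags[inside[-1]] = f"L-{label}"
    return [tok for tok, _, _ in tokens], tags


for text, annotations in TRAIN_DATA:
    words, tags = offsets_to_biluo(text, annotations["entities"])
    print(list(zip(words, tags)))
```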