How to build a deep neural network for custom NER with Keras
--
Introduction
In this post, we will learn how to create a simple neural network with Keras that extracts information (NER) from unstructured text data.
Named Entity Recognition (NER)
NER is also known as entity identification or entity extraction. It is the process of identifying predefined entities in a text, such as person names, organisations, locations, etc. An NER system is typically a statistical model that is trained on a labelled data set and then used to extract information from new text.
Sometimes we want to extract information that is specific to our domain or industry. For example, in the medical domain we may want to extract diseases, symptoms or medications. In that case we need to create our own custom NER.
Model Architecture
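Here we will use BI-LSTM + CRF layers. The BI-LSTM layer reads each sentence in both directions, filtering out unwanted information and keeping only the important features for each word. The CRF layer then models the dependencies between neighbouring tags, so that the predicted tag sequence is consistent as a whole.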
BI-LSTM Layer
The BI-LSTM is used to produce a vector representation of our words. It takes each word in a sentence as input and produces a vector representation of that word in both directions (i.e. forward and backward), where the forward direction accesses past information and the backward direction accesses future information. Its output is then passed to the CRF layer.
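To make the description above concrete, here is a minimal sketch of an embedding layer followed by a bidirectional LSTM in Keras. The vocabulary size, sentence length and layer sizes are hypothetical values picked only for illustration, and the sketch stops before the CRF layer, so it is not the complete model.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Hypothetical sizes, chosen only for illustration.
VOCAB_SIZE = 1000   # number of distinct word indices
MAX_LEN = 50        # padded sentence length
EMBED_DIM = 32      # size of each word embedding
LSTM_UNITS = 64     # hidden units per direction

# Each sentence is a padded sequence of word indices.
inputs = keras.Input(shape=(MAX_LEN,), dtype="int32")
x = layers.Embedding(input_dim=VOCAB_SIZE, output_dim=EMBED_DIM)(inputs)

# return_sequences=True keeps one output vector per word, which is
# what a tagging model needs. Bidirectional runs one LSTM forward and
# one backward over the sentence and concatenates their outputs.
x = layers.Bidirectional(layers.LSTM(LSTM_UNITS, return_sequences=True))(x)

encoder = keras.Model(inputs, x)
print(encoder.output_shape)
```

Because the forward and backward outputs are concatenated by default, each word ends up with a vector of size 2 * LSTM_UNITS.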
CRF Layer
The CRF layer is an optimisation on top of the BI-LSTM layer. It can be used to efficiently predict the current tag while taking the previously assigned tags into account, instead of tagging each word independently. Here is a great post on why a CRF layer is useful on top of a BI-LSTM.
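The effect of taking neighbouring tags into account can be illustrated with a small Viterbi decoder in plain Python. This is only a sketch of the decoding step of a linear-chain CRF, not of its training, and the tag set, emission scores and transition scores below are made up for the example.

```python
def viterbi_decode(emissions, transitions):
    """Return the indices of the highest-scoring tag sequence.

    emissions[t][j]   : score of tag j for word t (e.g. from the BI-LSTM)
    transitions[i][j] : score of moving from tag i to tag j
    """
    n_tags = len(transitions)
    scores = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(n_tags):
            # Best previous tag for ending in tag j at this word.
            best_prev = max(range(n_tags),
                            key=lambda i: scores[i] + transitions[i][j])
            pointers.append(best_prev)
            new_scores.append(scores[best_prev] + transitions[best_prev][j] + emit[j])
        scores = new_scores
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    path = [max(range(n_tags), key=lambda j: scores[j])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


TAGS = ["O", "B-PER", "I-PER"]  # hypothetical tag set

# Made-up per-word scores for the sentence "John Smith spoke".
emissions = [
    [0.1, 2.0, 0.3],  # "John"  : clearly B-PER
    [1.0, 0.2, 0.9],  # "Smith" : O scores slightly above I-PER
    [2.0, 0.1, 0.1],  # "spoke" : clearly O
]
# Made-up transition scores (rows: from tag, columns: to tag).
transitions = [
    [0.5, 0.0, -5.0],  # from O     : O -> I-PER is heavily penalised
    [0.0, -1.0, 1.0],  # from B-PER : B-PER -> I-PER is encouraged
    [0.0, -1.0, 0.5],  # from I-PER
]

greedy = [TAGS[max(range(len(TAGS)), key=lambda j: row[j])] for row in emissions]
decoded = [TAGS[i] for i in viterbi_decode(emissions, transitions)]
print(greedy)
print(decoded)
```

With these scores, picking the best tag for each word on its own labels "Smith" as O, while the decoder that uses transition scores labels it I-PER because it follows a B-PER.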
Data Preprocessing
Data Format
For this example I have used this Kaggle dataset. For our model, we need a data frame that contains a ‘Sentence_id’/‘Sentence’ column, a ‘word’ column and a ‘tag’ column.
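As a rough illustration of that layout, the sketch below builds a tiny data frame with one row per word and groups it back into sentences. The rows and tag names are invented for the example and are not taken from the Kaggle dataset.

```python
import pandas as pd

# Made-up rows, one per word, using the three columns described above.
df = pd.DataFrame({
    "Sentence_id": [1, 1, 1, 1, 2, 2, 2],
    "word": ["John", "lives", "in", "Paris", "Aspirin", "relieves", "headache"],
    "tag": ["B-PER", "O", "O", "B-LOC", "B-MED", "O", "B-SYM"],
})

# Collect the (word, tag) pairs of each sentence, in sentence order.
sentences = [
    list(zip(group["word"], group["tag"]))
    for _, group in df.groupby("Sentence_id", sort=True)
]

for sentence in sentences:
    print(sentence)
```

Each element of sentences is then one training example: a list of words together with the tag of each word.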