In our previous post on getting started with knowledge graphs, we only covered how to install Neo4j in Docker.
In this post, we will discuss how to model data as a graph. We use the Stack Overflow 2018 Developer Survey from Kaggle to show how we can push data into our graph database.
Let’s look at the data
Let’s get an overview of the data to see which columns we can use for our knowledge graph, and how.
Here are a few of the column names in the data, with their descriptions:
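Before choosing columns, it can help to summarise the raw file. The sketch below uses only Python's standard `csv` module to list the column names, count the rows and count empty answers per column. It is an illustration, not the method used in this post. It assumes the survey file is a CSV with a header row, and that unanswered questions appear either as blank cells or as the string `NA`. The file name `survey_results_public.csv` and the column names in the sample are assumptions to check against your download.

```python
import csv
import io
from collections import Counter


def overview(csv_file):
    """Summarise a CSV file: column names, row count and empty cells per column."""
    reader = csv.DictReader(csv_file)
    columns = list(reader.fieldnames or [])
    n_rows = 0
    empty = Counter()
    for row in reader:
        n_rows += 1
        for col in columns:
            # Assumption: unanswered questions are stored as "" or "NA".
            # DictReader fills cells missing from short rows with None.
            if row.get(col) in (None, "", "NA"):
                empty[col] += 1
    return columns, n_rows, {col: empty[col] for col in columns}


# Small in-memory sample with assumed column names.
sample = io.StringIO(
    "Respondent,Country,DevType\n"
    "1,Kenya,Back-end developer\n"
    "3,United Kingdom,NA\n"
)
cols, n, empty = overview(sample)
print(cols)   # ['Respondent', 'Country', 'DevType']
print(n)      # 2
print(empty)  # {'Respondent': 0, 'Country': 0, 'DevType': 1}

# On the real file (name assumed), open it with newline="" as the csv docs advise:
# with open("survey_results_public.csv", newline="", encoding="utf-8") as f:
#     cols, n, empty = overview(f)
```

Columns with many empty answers are usually poor candidates for nodes, because most respondents would not be connected to them.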
Modelling data for a graph representation
Nodes represent the important entities (subjects or objects) in our graph. Having multiple node types lets us take advantage of the connected nature of the graph. For example, if our dataset contains data about users and places, with some metadata/attributes for both, we create User and Place nodes and connect them with a relationship (edge), e.g. User LIVES-IN Place.
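The User/Place example can be expressed in Neo4j's Cypher query language. Below is a minimal sketch that builds one parameterised query for a batch of (user, place) pairs. The labels `User` and `Place`, the `name` property and the relationship type `LIVES_IN` are illustrative choices. An underscore is used because a Cypher relationship type containing a hyphen, such as `LIVES-IN`, would have to be wrapped in backticks.

```python
def lives_in_batch(pairs):
    """Build one parameterised Cypher query linking each (user, place) pair.

    MERGE only creates a node or relationship if a matching one does not
    already exist, so a place shared by many users becomes a single Place
    node with many incoming LIVES_IN edges.
    """
    query = (
        "UNWIND $rows AS row "
        "MERGE (u:User {name: row.user}) "
        "MERGE (p:Place {name: row.place}) "
        "MERGE (u)-[:LIVES_IN]->(p)"
    )
    params = {"rows": [{"user": user, "place": place} for user, place in pairs]}
    return query, params


query, params = lives_in_batch([("alice", "Nairobi"), ("bob", "Nairobi")])
print(query)
print(params)  # {'rows': [{'user': 'alice', 'place': 'Nairobi'}, {'user': 'bob', 'place': 'Nairobi'}]}
```

Passing the data as a parameter (`$rows`) instead of formatting it into the query string avoids quoting problems. With the official Neo4j Python driver, the pair could be sent with something like `session.run(query, params)`. Running it here would create two User nodes that both point to one Place node.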
We will do the same for our dataset by identifying its important nodes. In this project, we select only a few columns for our knowledge graph.
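Selecting only a few columns can be done while reading the file, so the rest of the survey never reaches the graph. The sketch below keeps a hypothetical set of columns and drops unanswered values. It also splits one assumed multi-valued column on `;` so that each answer could later become its own node. The column names `Respondent`, `Country` and `LanguageWorkedWith`, the `;` separator and the `NA` marker are all assumptions, not the selection made in this post.

```python
import csv
import io

# Hypothetical selection: check these column names against your copy of the data.
SELECTED = ("Respondent", "Country", "LanguageWorkedWith")
MULTI_VALUED = ("LanguageWorkedWith",)  # assumed to hold ";"-separated answers


def select_columns(csv_file, selected=SELECTED, multi_valued=MULTI_VALUED):
    """Yield one dict per respondent containing only the selected columns."""
    for row in csv.DictReader(csv_file):
        record = {}
        for col in selected:
            value = row.get(col)
            if value in (None, "", "NA"):
                # Treat unanswered questions as missing, not as a node value.
                record[col] = [] if col in multi_valued else None
            elif col in multi_valued:
                record[col] = [v.strip() for v in value.split(";") if v.strip()]
            else:
                record[col] = value
        yield record


sample = io.StringIO(
    "Respondent,Hobby,Country,LanguageWorkedWith\n"
    "1,Yes,Kenya,JavaScript;Python\n"
    "3,No,United Kingdom,NA\n"
)
for record in select_columns(sample):
    print(record)
# {'Respondent': '1', 'Country': 'Kenya', 'LanguageWorkedWith': ['JavaScript', 'Python']}
# {'Respondent': '3', 'Country': 'United Kingdom', 'LanguageWorkedWith': []}
```

Using a generator keeps memory use flat, so the full survey file never has to be loaded at once.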
Identifying Nodes for our knowledge graph
For the person node, we will collect all the attributes related to a person, such as: