Knowledge Graph Part 2: Modelling tabular data as a graph

How to build a knowledge graph from tabular data

Nikita Sharma


In our previous post, we got started with knowledge graphs, covering only how to install Neo4j in Docker.

In this post, we will discuss modelling data as a graph. We use the Stack Overflow 2018 Developer Survey from Kaggle to explain how to push tabular data into our graph database.

Let’s look at the data

Let’s get an overview of the data and decide which columns we can use for our knowledge graph, and how.
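One quick way to get such an overview is to count, for each column, how many rows have a value and how many distinct values appear. The sketch below uses only Python's standard library. The inline sample and its column names (Respondent, Hobby, OpenSource, Student, Employment) are assumptions standing in for the real survey CSV, so check them against the file you download from Kaggle.

```python
import csv
import io

# Tiny stand-in for the survey CSV. The column names and values are
# assumptions for illustration; verify them against the real Kaggle file.
SAMPLE_CSV = """Respondent,Hobby,OpenSource,Student,Employment
1,Yes,No,No,Employed full-time
2,Yes,Yes,"Yes, full-time",
3,No,No,No,Employed part-time
"""


def summarize_columns(csv_file):
    """Return {column: (filled_count, distinct_value_count)} for a CSV file object."""
    reader = csv.DictReader(csv_file)
    summary = {name: [0, set()] for name in reader.fieldnames}
    for row in reader:
        for name, value in row.items():
            if value:  # empty cells are treated as missing answers
                summary[name][0] += 1
                summary[name][1].add(value)
    return {name: (count, len(values)) for name, (count, values) in summary.items()}


overview = summarize_columns(io.StringIO(SAMPLE_CSV))
for column, (filled, distinct) in overview.items():
    print(f"{column}: {filled} filled, {distinct} distinct")
```

For the real data, the same function should work on a file opened with `open(path, newline="", encoding="utf-8")`. Columns with few distinct values tend to be good candidates for their own node types, while columns that are unique per row tend to work better as identifiers.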

Here are a few of the column names in the data, with their descriptions:

Modelling Data for graphical representation

Nodes represent the important entities, subjects, or objects in our graph. By having multiple types of nodes, we can take advantage of the connected nature of the graph. For example, if our dataset contains data about users and places, with some metadata/attributes about both, we will create nodes for User and Place and connect them with a relation/edge (e.g. User LIVES-IN Place).
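To make the User and Place example concrete, here is a minimal in-memory sketch of turning tabular rows into nodes and edges. It is not Neo4j code: the `Node` and `Graph` classes, the sample rows, and the attribute names (`age`, `country`) are all hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Node:
    label: str  # node type, e.g. "User" or "Place"
    key: str    # value that identifies the node within its label


@dataclass
class Graph:
    properties: dict = field(default_factory=dict)  # Node -> attribute dict
    edges: set = field(default_factory=set)         # (source, relation, target)

    def add_node(self, label, key, **attrs):
        node = Node(label, key)
        # Adding the same node again merges attributes instead of duplicating it.
        self.properties.setdefault(node, {}).update(attrs)
        return node

    def add_edge(self, source, relation, target):
        self.edges.add((source, relation, target))


# Hypothetical tabular data: one row per user, place columns repeated per row.
rows = [
    {"user": "alice", "age": 31, "place": "Pune", "country": "India"},
    {"user": "bob", "age": 27, "place": "Pune", "country": "India"},
    {"user": "carol", "age": 40, "place": "Berlin", "country": "Germany"},
]

graph = Graph()
for row in rows:
    user = graph.add_node("User", row["user"], age=row["age"])
    place = graph.add_node("Place", row["place"], country=row["country"])
    graph.add_edge(user, "LIVES_IN", place)

# Two rows mention Pune, but the graph holds a single Place node for it.
places = [node for node in graph.properties if node.label == "Place"]
residents = sorted(s.key for s, r, t in graph.edges if r == "LIVES_IN" and t.key == "Pune")
print(len(places))  # 2
print(residents)    # ['alice', 'bob']
```

The relation is spelled `LIVES_IN` here because a hyphenated relationship type would need backtick quoting in Cypher. The point of the sketch is that a value repeated across many rows collapses into one shared node, which is what lets us ask questions such as "who lives in the same place?" by following edges.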

We will do the same for our dataset: identify the important nodes it contains. In this project, we will select only a few columns for our knowledge graph.

Identifying Nodes for our knowledge graph

Person Node

This node holds all the attributes related to a person, such as:

  • User_id
  • code_as_hobby
  • contributes_to_open_source
  • is_student
  • employment_status
  • company_size
  • total_years_of_coding_experience