How to Visualize a Social Network in Python with a Graph Database: Flask + Docker + D3.js

How to Visualize a Social Network in Python with a Graph Database: Flask + Docker + D3.js

·

15 min read

When you think about a web application, a graph database doesn’t usually spring to mind. Instead, most people just take the familiar route of using an SQL database to store information. While this is perfectly acceptable for most use cases there are sometimes those that would see tremendous benefits by using a graph database. In this tutorial, I will show you how to make a basic web application using Flask that stores all of its information in a graph database. To be more precise we are using Memgraph DB, an in-memory database that can easily handle a lot of information and perform read/write instructions quite quickly.

Our use case is a Social Network Graph (in the code referred to as SNG for convenience) representing users and the connections between them. Usually, such a graph would contain millions of relationships and the algorithms that are performed on them don’t do well with data being stored in relational databases.

In this tutorial, I will show you step by step how to build a simple Python web application from the bottom up so you get a basic understanding of the technologies that are used. You can also find all of the code here if you don't want to work on it as you go through the tutorial. If at any point in this tutorial you have a question or something is not working for you, feel free to post on StackOverflow with the tag memgraphdb.

Prerequisites

Because we are building a complete web application there is a number of tools that you will need to install before we begin:

  • Poetry: a tool for dependency management and packaging in Python. It allows you to declare the libraries your project depends on and it will manage (install/update) them for you.
  • Flask: a very powerful web framework that provides you with tools, libraries and technologies used in web development. A Flask application can be as small as a single web page or as complex as a management interface.
  • Docker and Compose: an open platform for developing, shipping, and running applications. It enables us to separate our application from our infrastructure (host machine). If you are installing Docker on Windows, Compose will be already included. For Linux and macOS visit this site.
  • Memgraph DB: a native fully distributed in-memory graph database built to handle real-time use-cases at enterprise scale. Follow the Docker Installation instructions on the Quick Start page. While it's completely optional, I encourage you to also install Memgraph Lab so you can execute openCypher queries on the database directly and see visualized results.


Creating the Project Structure and Handling Dependencies

Sometimes standard packaging systems and dependency management in Python can be confusing for beginners so we decided to use Poetry.
To start building our project structure choose a working directory and run:

poetry new sng-demo

Now you should have a directory with the following content:

sng-demo
├── pyproject.toml
├── README.rst
├── sng_demo
│  └── __init__.py
└── tests
   ├── __init__.py
   └── test_poetry_demo.py

In this tutorial, we won’t use the testing functionalities so go on ahead and delete the directory tests as well as the file README.rst.

Now we need to add the dependencies for our project. Given that we are going to run the app inside a Docker container we don't need the dependencies installed locally, only inside the container. Copy the files project.toml and poetry.lock and place them in the root directory of the project. The only other thing we need to do about dependency management is to tell Docker how to run Poetry on startup so it can install/update all the necessary dependencies inside the container.

Dockerizing an Application

In the root directory of the project create two files, Dockerfile and docker-compose.yml. At the beginning of the Dockerfile, we specify the Python version and instruct the container to install CMake, poetry, mgclient and pymgclient. Poetry is necessary to manage our dependencies inside the container while CMake and mgclient are required for pymgclient, the Python driver for Memgraph DB.
You don’t have to focus too much on this part just copy the code to your Dockerfile:

FROM python:3.7

#Install CMake
RUN apt-get update && \
  apt-get --yes install cmake

#Install poetry
RUN pip install -U pip \
  && curl -sSL https://raw.githubusercontent.com/python-poetry/poetry/master/get-poetry.py | python
ENV PATH="${PATH}:/root/.poetry/bin"

#Install mgclient
RUN apt-get install -y git cmake make gcc g++ libssl-dev && \
  git clone https://github.com/memgraph/mgclient.git /mgclient && \
  cd mgclient && \
  git checkout 5ae69ea4774e9b525a2be0c9fc25fb83490f13bb && \
  mkdir build && \
  cd build && \
  cmake .. && \
  make && \
  make install

#Install pymgclient
RUN git clone https://github.com/memgraph/pymgclient /pymgclient && \
  cd pymgclient && \
  python3 setup.py build && \
  python3 setup.py install

Next, we define the working directory with:

WORKDIR /app
COPY poetry.lock pyproject.toml /app/

The second command will enable us to cache the project requirements and only reinstall them when pyproject.toml or poetry.lock are changed.

RUN poetry config virtualenvs.create false && \
  poetry install --no-interaction --no-ansi

We don’t need to create a virtual environment because our application is already isolated by being in a Docker container. To disable it virtualenvs.create needs to be set to false.
The second line in the command ensures that Poetry asks us no interactive questions while installing/updating dependencies and it makes the output more log friendly.

COPY . /app
EXPOSE 5000
ENTRYPOINT [ "poetry", "run" ]

This is where we essentially create all the directories and files inside of our container. The EXPOSE command informs Docker that the container listens on the specified network port at runtime.

Next, we need to create a docker-compose.yml file. Compose is a tool for defining and running multi-container Docker applications. With Compose, you use a YAML file to configure your application’s services. Then, with a single command, you create and start all the services from your configuration. For our project, we need two services. One is the web application (sng_demo) and the other a database instance (memgraph).

If you followed the instructions on how to setup Memgraph DB with Docker correctly you only need to add the following code to your docker-compose.yml file to run the container:

version: '3'
services:
  memgraph:
    image: "memgraph"
    ports:
      - "7687:7687"
  sng_demo:
    build: .
    volumes:
      - .:/app
    ports:
      - "5000:5000"
    environment:
      MG_HOST: memgraph
      MG_PORT: 7687
    depends_on:
      - memgraph

When it comes to the ports key, there is an important distinction between the HOST_PORT and the CONTAINER_PORT. The first number in the key is the HOST_PORT and it can be used to connect from your host machine to the service (for example with Memgraph Lab). The second number specifies the CONTAINER_PORT which is used for service-to-service communication. More precisely, our service sng_db can use this port to access the service memgraph and connect to the database.

The environment key contains MG_HOST and MG_PORT which represent environment variables in the service’s container. They store the memgraph service address and port which are needed to establish a database connection. The depends_on key is used to start services in dependency order because we need the database to start before the web application.

The build key allows us to tell Compose where to find the build instructions as well as the files and/or folders used during the build process. By using the volumes key, we bypass the need to constantly restart our image to load new changes to it from the host machine.

Finally, we have a dockerized project that utilizes Poetry! This approach is great for development because it enables us to run our project on completely different operating systems and environments without having to worry about compatibility issues.

Web Development with Flask

Flask is very simple to use so why not create a Hello World! page to try out our Docker+Poetry setup.
In the project root directory create a file called app.py with the following code:

from flask import Flask

app = Flask(__name__)


@app.route('/')
@app.route('/index')
def index():
   return "Hello World"

First, we imported the Flask class and then created an instance of it. The route() decorator tells Flask what URL should trigger our function. Now, we need to tell Docker how to run our app. This can be done by creating a simple script in the project root directory. Let’s call it start.sh:

#!/bin/bash
export FLASK_APP=app.py
export FLASK_ENV=development
flask run --host 0.0.0.0

Setting FLASK_ENV to development will enable the debug mode. This makes Flask use an interactive debugger and reloader.
Setting FLASK_APP to app.py specifies how to start the application.
We need to tell Docker when and how to run this script so put the following code in your Dockerfile after the line EXPOSE 5000 :

ADD start.sh /
RUN chmod +x /start.sh

The command chmod +x makes the script executable by setting the right permission.
To execute the script, add the following command after the line ENTRYPOINT [ "poetry", "run" ]:

CMD ["/start.sh"]

That’s it! Our first web page is ready so let’s start our app to make sure we don’t have any errors.
In the project root directory execute:

docker-compose build

The first build will take some time because Docker has to download and install a lot of dependencies.
After it finishes run:

docker-compose up

The URL of our web application is localhost:5000. When you open it there should be a message Hello World! which means that the app is up and running.

Now it’s time to create a more complex web page that will contain our Social Network Graph. In the project root directory create a folder called templates and in it a file with the name base.html. This will be our base HTML template for other pages. Copy the code:

<!doctype html>
<html lang="en">

<head>
    <meta charset="utf-8">
    <meta name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no">

    <link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/css/bootstrap.min.css">
    <link rel="stylesheet" href="/static/css/style.css">

    <script src="https://code.jquery.com/jquery-3.3.1.slim.min.js"></script>
    <script src="https://stackpath.bootstrapcdn.com/bootstrap/4.3.1/js/bootstrap.min.js"></script>
    <script src="https://d3js.org/d3.v4.min.js" charset="utf-8"></script>

    <title>Social Network Graph Demo</title>
</head>

<body>
    {% block content %} {% endblock %}
</body>

</html>

We also need to create an HTML file for our actual landing site that utilizes this base file and an accompanying JavaScript file. Create the HTML file in the same location with the name index.html and copy the following code into it:

{% extends 'base.html' %} {% block content %}
<div class="container">
  Hello World!
</div>
<script src="/static/js/index.js" charset="utf-8"></script>
{% endblock %}

In the project root directory create a folder called static with one subfolder called js and another called css. The js folder will contain all of the needed local JavaScript files while the css folder will contain all the CSS stylesheets. In the js folder create a file called index.js and in the css folder one called style.css. Just leave them empty for now.

If you want to find out more about web development with Flask I suggest you try out this tutorial.
Your current project structure should like this:

sng-demo
├── sng_demo
│  └── __init__.py
├── templates
│  ├── base.html
│  └── index.html
├── static
│  ├── css
│  │  └── style.css
│  └── js
│     └── index.js
├── app.py
├── docker-compose.yml
├── Dockerfile
├── poetry.lock
├── pyproject.toml
└── start.sh


The Data Model and Database Connection

In the app directory sng-demo create a folder called database. This folder will contain all of the modules that we need to communicate with the database. You can find them here and just copy their contents. They are closely related to the database driver and if you wish to examine them a bit more I suggest you look up the driver documentation here. In the app directory sng-demo create the module db_operations.py. This is where all the custom database related commands will be located.
The sng_demo directory should look like this:

sng_demo
├── __init__.py
├── db_operations.py
└── database
   ├── __init__.py
   ├── memgraph.py
   ├── connection.py
   └── models.py

We will use a very simple data model that can be easily upgraded later on.
There is only one node with the label User and each User has two properties, a numerical id and a string name. Nodes are connected with edges of the type FRIENDS:

There are several methods to populate our database (more on that here) but we will be doing it manually by executing openCypher queries so you can get a better understanding of how to communicate with the database. You will find all the necessary queries to populate the database in the files data_big.txt and data_small.txt. The former just has a larger dataset than the latter.
In the project root directory create a folder called resources and place the files in it. Now you can add an import method to your web application.
In the module db_operations.py add the following import and method:

import json

def clear(db):
  command = "MATCH (node) DETACH DELETE node"
  db.execute_query(command)

def populate_database(db, path):
  file = open(path)
  lines = file.readlines()
  file.close()
  for line in lines:
      if len(line.strip()) != 0 and line[0] != '/':
          db.execute_query(line)

The method clear() deletes any data that might have been left in the database before populating it.
The method populate_database() reads all of the openCypher queries in the specified file and executes them.
In the module app.py change the imports and method index() to:

from flask import Flask, render_template, request, jsonify, make_response
from sng_demo.database import Memgraph
from sng_demo import db_operations

app = Flask(__name__)


@app.route('/')
@app.route('/index')
def index():
    db = Memgraph()
    db_operations.clear(db)
    db_operations.populate_database(db, "resources/data_small.txt")
    return render_template('index.html')

Now every time we refresh our index page the database is cleared and repopulated with new data. While this is not suitable for the production stage, it is highly useful during development because it will enable us to make changes in the data without having to restart the whole application or working directly on the database.
If you want to examine the graph before proceeding, I suggest you open Memgraph Lab and run the query MATCH (n1)-[e:FRIENDS]-(n2) RETURN n1,n2,e;.
The result should be:



We also need a method in our app to fetch all the relevant data from the database when a client requests it.
Let’s call it get_graph() and place it in the db_operations.py module:

def get_graph(db):
   command = "MATCH (n1)-[e:FRIENDS]-(n2) RETURN n1,n2,e;"
   relationships = db.execute_and_fetch(command)

   link_objects = []
   node_objects = []
   added_nodes = []
   for relationship in relationships:
       e = relationship['e']
       data = {"source": e.nodes[0], "target": e.nodes[1]}
       link_objects.append(data)

       n1 = relationship['n1']
       if not (n1.id in added_nodes):
           data = {"id": n1.id, "name": n1.properties['name']}
           node_objects.append(data)
           added_nodes.append(n1.id)

       n2 = relationship['n2']
       if not (n2.id in added_nodes):
           data = {"id": n2.id, "name": n2.properties['name']}
           node_objects.append(data)
           added_nodes.append(n2.id)
   data = {"links": link_objects, "nodes": node_objects}

   return json.dumps(data)

First, we need to execute the openCypher query MATCH (n1)-[e:FRIENDS]-(n2) RETURN n1,n2,e; and return its results from the database. These results will contain all the edges in the graph as well as all the nodes that are connected to those edges. Nodes that don't have connections will not be returned and that's ok for now.

The results (the object relationships) are in the form of a generator which we can iterate over and access its contents by using the node/edge names specified in our initial query (n1,n2 and e).
We also need to check if a node has already been appended to the node_objects list because multiple edges can contain (point to or from) the same node. All of the objects are stored in key-value pairs suitable for later JSON conversion.
The final result is a JSON object containing:

  • links: all the relationships that are in the graph as pairs of source and target id properties,
  • nodes: all the nodes from the graph that form relationships with other nodes.

In your app.py module add the following method:

@app.route("/get-graph", methods=["POST"])
def get_graph():
   db = Memgraph()
   response = make_response(
       jsonify(db_operations.get_graph(db)), 200)
   return response

This method is responsible for responding to POST requests from the client. It returns the graph data that we fetched from the server in the previous method.

Now let's do something with this data! Copy the contents for your index.js file from here and the style.css file from here.
We also need to add the actual SVG graphic to our page so change the index.html file to:

{% extends 'base.html' %} {% block content %}
<div class="container">
    <svg class="border rounded mt-3" width="960" height="600" style="background-color:white"></svg>
</div>
<script src="/static/js/index.js" charset="utf-8"></script>
{% endblock %}

I won't go into much detail about how to use D3.js so if you want to find out more I encourage you to visit their website.

In short, we fetch all the nodes and edges from the database and add them to an SVG element. The visual representation of the graph is made by simulating how physical forces act on particles (charge and gravity). You can drag and drop the nodes, hover over them to see the value of their name property, zoom in and out of the graph and move the SVG graphic.


Additional Functionalities

Go ahead and copy the file query.js to the directory static/js and query.html to the directory templates. You can find the updated base.html file here. Copy the necessary methods from the db_operations.py module and app.py module.
After you made the changes, just open localhost:5000/query and see the results.
This page will make your life easier if you want to debug the data being fetched from the server. It returns all the nodes or edges and shows them in a JSON highlighted format.


Your current project structure should like this:

sng-demo
├── resources
│  ├── data_big.py
│  └── data_small.txt
├── sng_demo
│  ├── __init__.py
│  ├── db_operations.py
│  └── database
│     ├── __init__.py
│     ├── memgraph.py
│     ├── connection.py
│     └── models.py
├── templates
│  ├── base.html
│  ├── index.html
│  └── query.html
├── static
│   ├── css
│   │  └── style.css
│   └── js
│      ├── index.js
│      └── query.js
├── app.py
├── docker-compose.yml
├── Dockerfile
├── poetry.lock
├── pyproject.toml
└── start.sh


Conclusion

Even though graph databases have been around for a long time, they are still not considered a mainstream tool in software development. Relational database-management systems model data as a set of predetermined structures. Complex joins and self-joins are necessary when the dataset becomes too inter-related. Modern datasets require technically complex queries which are often very inefficient in real-time scenarios.

Graph databases offer powerful data modelling and analysis capabilities for many real-world problems such as social networks, business relationships, dependencies, shipping, logistics... and they have been adopted by many of the worlds leading tech companies. With this tutorial, I hope to shed some light on how easy it is to integrate a graph database in your development process and I encourage you to try it out yourself.

As I said at the beginning, feel free to ask us any questions about this tutorial or Memgraph in general on StackOverflow with the tag memgraphdb or on our Discord server. Good luck with your coding!