Table of contents
- 1. Prerequisites
- 2. Installation using Docker
- 3. Connecting to Memgraph with GQLAlchemy
- 4. Define a graph schema
- 5. Creating and returning nodes
- 6. Creating and returning relationships
- 7. Importing data from CSV files
- 8. Querying the database and retrieving results
- 9. Calculating PageRank
- 10. Visualizing the graph in Memgraph Lab
- What's next?
Through this short tutorial, you will learn how to install Memgraph, connect to it from a Jupyter Notebook and perform data analysis using graph algorithms. You can find the original Jupyter Notebook in our open-source GitHub repository.
If at any point you experience problems with this tutorial or something is unclear to you, reach out on our Discord server. The dataset from this tutorial is also available in the form of a Playground sandbox which you can query from your browser.
1. Prerequisites
For this tutorial, you will need to install:
- Jupyter: Jupyter is necessary to run the notebook available here.
- Docker: Docker is used because Memgraph is a native Linux application and cannot be installed natively on Windows or macOS.
- GQLAlchemy: A Python OGM (Object Graph Mapper) that connects to Memgraph.
- Pandas: A popular data science library.
2. Installation using Docker
After you install Docker, you can set up Memgraph by running:
docker run -it -p 7687:7687 -p 3000:3000 memgraph/memgraph-platform
This command will download the Memgraph Platform image and, once the download finishes, start the Memgraph container.
3. Connecting to Memgraph with GQLAlchemy
We will be using the GQLAlchemy object graph mapper (OGM) to connect to Memgraph and execute Cypher queries easily. GQLAlchemy also serves as a Python driver/client for Memgraph. You can install it using:
pip install gqlalchemy
Hint: You may need to install CMake before installing GQLAlchemy.
Maybe you got confused when we mentioned Cypher. You can think of Cypher as SQL for graph databases: it shares many of the same language constructs, like CREATE, DELETE and WHERE, and it's used to query the database.
from gqlalchemy import Memgraph
memgraph = Memgraph("127.0.0.1", 7687)
Let's make sure that Memgraph is empty before we start with anything else.
memgraph.drop_database()
Now let's see if the database is empty:
results = memgraph.execute_and_fetch(
    """
    MATCH (n) RETURN count(n) AS number_of_nodes;
    """
)
print(next(results))
Output:
{'number_of_nodes': 0}
4. Define a graph schema
We are going to create Python classes that will represent our graph schema. This way, all the objects that are returned from Memgraph will be of the correct type if the class definition can be found.
from typing import Optional
from gqlalchemy import Field, Node, Relationship
class User(Node):
    username: str = Field(index=True, unique=True, db=memgraph)

class Retweeted(Relationship, type="RETWEETED"):
    pass
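Because we passed db=memgraph to Field, GQLAlchemy should create the label index and the uniqueness constraint for username as soon as the class is defined. If you want to double-check, a quick sanity check like the sketch below should work (SHOW INDEX INFO and SHOW CONSTRAINT INFO are Memgraph commands; the output format may differ slightly between versions):
# Optional sanity check: list the index and the uniqueness constraint
# that GQLAlchemy created for the User label.
for row in memgraph.execute_and_fetch("SHOW INDEX INFO;"):
    print(row)
for row in memgraph.execute_and_fetch("SHOW CONSTRAINT INFO;"):
    print(row)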
5. Creating and returning nodes
We are going to create User
nodes, save them to the database and return them
to our program:
user1 = User(username="ivan_g_despot")
user2 = User(username="supe_katarina")
user1.save(memgraph)
user2.save(memgraph)
print(user1)
print(user2)
Output:
Now, let's try to create a node using the Cypher query language. We are going to create a node with an existing username just to check that the uniqueness constraint on the property username is set correctly.
try:
    results = memgraph.execute(
        """
        CREATE (:User {username: "supe_katarina"});
        """
    )
except Exception:
    print("Error: A user with the username supe_katarina is already in the database.")
Output:
Error: A user with the username supe_katarina is already in the database.
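Instead of creating a duplicate, you can load the existing node. A minimal sketch, assuming the User class and memgraph connection defined above (load() looks the node up by its properties; check the GQLAlchemy docs for the exact behavior in your version):
# Load the already-saved node by its unique username instead of creating it again.
existing_user = User(username="supe_katarina").load(db=memgraph)
print(existing_user)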
6. Creating and returning relationships
We are going to create a Retweeted
relationship, save it to the database and
return it to our program:
retweeted = Retweeted(_start_node_id=user1._id, _end_node_id=user2._id)
retweeted.save(memgraph)
print(retweeted)
Output:
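Before moving on, we can verify that the relationship was stored with a quick Cypher query. A small check, assuming the two users and the RETWEETED relationship created above:
# Check that the RETWEETED relationship between the two users was stored.
results = memgraph.execute_and_fetch(
    """
    MATCH (a:User {username: "ivan_g_despot"})-[:RETWEETED]->(b:User {username: "supe_katarina"})
    RETURN a.username AS source, b.username AS target;
    """
)
print(next(results))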
7. Importing data from CSV files
You will need to download this file which contains a simple dataset of
scraped tweets. To import it into Memgraph, we will first need to copy it to the
Docker container where Memgraph is running. Find the CONTAINER_ID
by running:
docker ps
Copy the file with the following command (don't forget to replace CONTAINER_ID):
docker cp scraped_tweets.csv CONTAINER_ID:scraped_tweets.csv
We are going to see what our CSV file looks like with the help of the pandas library. To install it, run:
pip install pandas
Now let's see what the CSV file looks like:
import pandas as pd
data = pd.read_csv("scraped_tweets.csv")
data.head()
| | source_username | target_username |
|---|---|---|
| 0 | CapeCodGiftShop | RetroCEO |
| 1 | CodeAttBot | LeeHillerLondon |
| 2 | BattlegroundHs | getwhalinvest |
| 3 | botpokemongofr1 | TrevorAllenPKMN |
| 4 | AnyaSha13331181 | WORLDMUSICAWARD |
Now, we can execute the Cypher command LOAD CSV, which is used for loading data from CSV files:
memgraph.execute(
    """
    LOAD CSV FROM "/scraped_tweets.csv" WITH HEADER AS row
    MERGE (u1:User {username: row.source_username})
    MERGE (u2:User {username: row.target_username})
    MERGE (u1)-[:RETWEETED]->(u2);
    """
)
You can think of the LOAD CSV
clause as a loop that will go over every row in
the CSV file and execute the specified Cypher commands.
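If it helps to see that loop spelled out, here is a rough Python equivalent of the LOAD CSV query above, using the pandas DataFrame loaded earlier. This is only an illustration, not part of the import, and it assumes your GQLAlchemy version supports query parameters in execute(); LOAD CSV remains the faster way to import the data:
# Rough Python equivalent of the LOAD CSV query above:
# iterate over every row and run the same MERGE pattern for each one.
for _, row in data.iterrows():
    memgraph.execute(
        """
        MERGE (u1:User {username: $source})
        MERGE (u2:User {username: $target})
        MERGE (u1)-[:RETWEETED]->(u2);
        """,
        {"source": row["source_username"], "target": row["target_username"]},
    )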
8. Querying the database and retrieving results
Let's make sure that our data was imported correctly by retrieving it:
results = memgraph.execute_and_fetch(
    """
    MATCH (u:User)
    RETURN u
    ORDER BY u.username DESC
    LIMIT 10;
    """
)
results = list(results)
for result in results:
    print(result["u"])
Output:
We can also check the type of the retrieved records:
u = results[0]["u"]
print("User: ", u.username)
print("Type: ", type(u))
Output:
User: zziru67 Type:
Let's try to execute the same query with the GQLAlchemy query builder:
from gqlalchemy import match
results_from_qb = (
    match()
    .node(labels="User", variable="u")
    .return_()
    .order_by("u.username DESC")
    .limit(10)
    .execute()
)
results_from_qb = list(results_from_qb)
for result in results_from_qb:
    print(result["u"])
Output:
9. Calculating PageRank
Now, let's do something clever with our graph. For example, let's calculate PageRank for each node and store the value in a rank property on each node:
results = memgraph.execute_and_fetch(
    """
    CALL pagerank.get()
    YIELD node, rank
    SET node.rank = rank
    RETURN node, rank
    ORDER BY rank DESC
    LIMIT 10;
    """
)
for result in results:
    print("The PageRank of node ", result["node"].username, ": ", result["rank"])
Output:
The PageRank of node WORLDMUSICAWARD : 0.13278838151391434
The PageRank of node Kidzcoolit : 0.018924764871246207
The PageRank of node HuobiGlobal : 0.011314994833838172
The PageRank of node ChloeLe39602964 : 0.010011755296388128
The PageRank of node getwhalinvest : 0.007228675936490175
The PageRank of node Cooper_Lechat : 0.005577971882231625
The PageRank of node Phemex_official : 0.005413803151353543
The PageRank of node HamleysOfficial : 0.005325936307836382
The PageRank of node bmstores : 0.00524546649693655
The PageRank of node TheStourbridge : 0.004422198431576731
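To confirm that the rank property was actually written to the nodes, you can read it back for one of the users from the output above:
# Fetch the stored rank property for one of the top-ranked users.
results = memgraph.execute_and_fetch(
    """
    MATCH (u:User {username: "WORLDMUSICAWARD"})
    RETURN u.username AS username, u.rank AS rank;
    """
)
print(next(results))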
Visit the Memgraph MAGE graph library (and throw us a star ⭐) and take a look at all of the graph algorithms that have been implemented. You can also implement and submit your own algorithms and utility procedures.
10. Visualizing the graph in Memgraph Lab
Open Memgraph Lab in your browser at localhost:3000 and execute the following Cypher query:
MATCH (n)-[r]-(m)
RETURN n, r, m
LIMIT 100;
Now apply the following graph style to make your graph look more descriptive:
@NodeStyle {
size: Sqrt(Mul(Div(Property(node, "rank"), 1), 200000))
border-width: 1
border-color: #000000
shadow-color: #1D9BF0
shadow-size: 10
image-url: "https://i.imgur.com/UV7Nl0i.png"
}
@NodeStyle Greater(Size(Labels(node)), 0) {
label: Format(":{}", Join(Labels(node), " :"))
}
@NodeStyle HasLabel(node, "User") {
color: #1D9BF0
color-hover: Darker(#dd2222)
color-selected: #dd2222
}
@NodeStyle HasProperty(node, "username") {
label: AsText(Property(node, "username"))
}
@EdgeStyle {
width: 1
}
What's next?
Now it's time for you to use Memgraph on a graph problem!
You can always check out Memgraph Playground for some cool use cases and examples. If you have any questions, or want to share your work with the rest of the community, join our Discord Server.