Table of contents
- 1. Prerequisites
- 2. Installation using Docker
- 3. Connecting to Memgraph with GQLAlchemy
- 4. Define a graph schema
- 5. Creating and returning nodes
- 6. Creating and returning relationships
- 7. Importing data from CSV files
- 8. Querying the database and retrieving results
- 9. Calculating PageRank
- 10. Visualizing the graph in Memgraph Lab
- What's next?
Through this short tutorial, you will learn how to install Memgraph, connect to it from a Jupyter Notebook and perform data analysis using graph algorithms. You can find the original Jupyter Notebook in our open-source GitHub repository.
If at any point you experience problems with this tutorial or something is unclear to you, reach out on our Discord server. The dataset from this tutorial is also available in the form of a Playground sandbox which you can query from your browser.
1. Prerequisites
For this tutorial, you will need to install:
- Jupyter: Jupyter is necessary to run the notebook available here.
- Docker: Docker is used because Memgraph is a native Linux application and cannot be installed natively on Windows or macOS.
- GQLAlchemy: A Python OGM (Object Graph Mapper) that connects to Memgraph.
- Pandas: A popular data science library.
2. Installation using Docker
After you install Docker, you can set up Memgraph by running:
docker run -it -p 7687:7687 -p 3000:3000 memgraph/memgraph-platform
This command will download the Memgraph Platform image and, once the download finishes, start the Memgraph container.
3. Connecting to Memgraph with GQLAlchemy
We will be using the GQLAlchemy object graph mapper (OGM) to connect to Memgraph and execute Cypher queries easily. GQLAlchemy also serves as a Python driver/client for Memgraph. You can install it using:
pip install gqlalchemy
Hint: You may need to install CMake before installing GQLAlchemy.
Maybe you got confused when we mentioned Cypher. You can think of Cypher as SQL for graph databases: it shares many of the same language constructs, like CREATE, DELETE and WHERE, and it's used to query the database.
from gqlalchemy import Memgraph
memgraph = Memgraph("127.0.0.1", 7687)
Let's make sure that Memgraph is empty before we start with anything else.
memgraph.drop_database()
Now let's see if the database is empty:
results = memgraph.execute_and_fetch(
    """
    MATCH (n) RETURN count(n) AS number_of_nodes;
    """
)
print(next(results))
Output:
{'number_of_nodes': 0}
4. Define a graph schema
We are going to create Python classes that will represent our graph schema. This way, all the objects that are returned from Memgraph will be of the correct type if the class definition can be found.
from typing import Optional
from gqlalchemy import Field, Node, Relationship
class User(Node):
    username: str = Field(index=True, unique=True, db=memgraph)

class Retweeted(Relationship, type="RETWEETED"):
    pass
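Because we passed db=memgraph to Field, GQLAlchemy should create the label index and the uniqueness constraint for username as soon as the class is defined. If you want to double-check, a quick sanity check like the sketch below should work (SHOW INDEX INFO and SHOW CONSTRAINT INFO are Memgraph commands; the output format may differ slightly between versions):
# Optional sanity check: list the index and the uniqueness constraint
# that GQLAlchemy created for the User label.
for row in memgraph.execute_and_fetch("SHOW INDEX INFO;"):
    print(row)
for row in memgraph.execute_and_fetch("SHOW CONSTRAINT INFO;"):
    print(row)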
5. Creating and returning nodes
We are going to create User
nodes, save them to the database and return them
to our program:
user1 = User(username="ivan_g_despot")
user2 = User(username="supe_katarina")
user1.save(memgraph)
user2.save(memgraph)
print(user1)
print(user2)
Output:
Now, let's try to create a node using the Cypher query language. We are going to create a node with an existing username just to check that the uniqueness constraint on the property username is set correctly.
try:
    results = memgraph.execute(
        """
        CREATE (:User {username: "supe_katarina"});
        """
    )
except Exception:
    print("Error: A user with the username supe_katarina is already in the database.")
Output:
Error: A user with the username supe_katarina is already in the database.
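Instead of creating a duplicate, you can load the existing node. A minimal sketch, assuming the User class and memgraph connection defined above (load() looks the node up by its properties; check the GQLAlchemy docs for the exact behavior in your version):
# Load the already-saved node by its unique username instead of creating it again.
existing_user = User(username="supe_katarina").load(db=memgraph)
print(existing_user)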
6. Creating and returning relationships
We are going to create a Retweeted
relationship, save it to the database and
return it to our program:
retweeted = Retweeted(_start_node_id=user1._id, _end_node_id=user2._id)
retweeted.save(memgraph)
print(retweeted)
Output:
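Before moving on, we can verify that the relationship was stored with a quick Cypher query. A small check, assuming the two users and the RETWEETED relationship created above:
# Check that the RETWEETED relationship between the two users was stored.
results = memgraph.execute_and_fetch(
    """
    MATCH (a:User {username: "ivan_g_despot"})-[:RETWEETED]->(b:User {username: "supe_katarina"})
    RETURN a.username AS source, b.username AS target;
    """
)
print(next(results))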
7. Importing data from CSV files
You will need to download this file which contains a simple dataset of
scraped tweets. To import it into Memgraph, we will first need to copy it to the
Docker container where Memgraph is running. Find the CONTAINER_ID
by running:
docker ps
Copy the file with the following command (don't forget to replace CONTAINER_ID):
docker cp scraped_tweets.csv CONTAINER_ID:scraped_tweets.csv
We are going to see what our CSV file looks like with the help of the pandas library. To install it, run:
pip install pandas
Now let's see what the CSV file looks like:
import pandas as pd
data = pd.read_csv("scraped_tweets.csv")
data.head()
| | source_username | target_username |
|---|---|---|
| 0 | CapeCodGiftShop | RetroCEO |
| 1 | CodeAttBot | LeeHillerLondon |
| 2 | BattlegroundHs | getwhalinvest |
| 3 | botpokemongofr1 | TrevorAllenPKMN |
| 4 | AnyaSha13331181 | WORLDMUSICAWARD |
Now, we can execute the Cypher command LOAD CSV, which is used for loading data from CSV files:
memgraph.execute(
    """
    LOAD CSV FROM "/scraped_tweets.csv" WITH HEADER AS row
    MERGE (u1:User {username: row.source_username})
    MERGE (u2:User {username: row.target_username})
    MERGE (u1)-[:RETWEETED]->(u2);
    """
)
You can think of the LOAD CSV
clause as a loop that will go over every row in
the CSV file and execute the specified Cypher commands.
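If it helps to see that loop spelled out, here is a rough Python equivalent of the LOAD CSV query above, using the pandas DataFrame loaded earlier. This is only an illustration, not part of the import, and it assumes your GQLAlchemy version supports query parameters in execute(); LOAD CSV remains the faster way to import the data:
# Rough Python equivalent of the LOAD CSV query above:
# iterate over every row and run the same MERGE pattern for each one.
for _, row in data.iterrows():
    memgraph.execute(
        """
        MERGE (u1:User {username: $source})
        MERGE (u2:User {username: $target})
        MERGE (u1)-[:RETWEETED]->(u2);
        """,
        {"source": row["source_username"], "target": row["target_username"]},
    )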
8. Querying the database and retrieving results
Let's make sure that our data was imported correctly by retrieving it:
results = memgraph.execute_and_fetch(
    """
    MATCH (u:User)
    RETURN u
    ORDER BY u.username DESC
    LIMIT 10;
    """
)
results = list(results)
for result in results:
    print(result["u"])
Output:
We can also check the type of the retrieved records:
u = results[0]["u"]
print("User: ", u.username)
print("Type: ", type(u))
Output:
User: zziru67 Type:
Let's try to execute the same query with the GQLAlchemy query builder:
from gqlalchemy import match
results_from_qb = (
    match()
    .node(labels="User", variable="u")
    .return_()
    .order_by("u.username DESC")
    .limit(10)
    .execute()
)
results_from_qb = list(results_from_qb)
for result in results_from_qb:
    print(result["u"])
Output:
9. Calculating PageRank
Now, let's do something clever with our graph. For example, let's calculate PageRank for each node and store the value in a rank property on each node:
results = memgraph.execute_and_fetch(
    """
    CALL pagerank.get()
    YIELD node, rank
    SET node.rank = rank
    RETURN node, rank
    ORDER BY rank DESC
    LIMIT 10;
    """
)
for result in results:
    print("The PageRank of node ", result["node"].username, ": ", result["rank"])
Output:
The PageRank of node WORLDMUSICAWARD : 0.13278838151391434
The PageRank of node Kidzcoolit : 0.018924764871246207
The PageRank of node HuobiGlobal : 0.011314994833838172
The PageRank of node ChloeLe39602964 : 0.010011755296388128
The PageRank of node getwhalinvest : 0.007228675936490175
The PageRank of node Cooper_Lechat : 0.005577971882231625
The PageRank of node Phemex_official : 0.005413803151353543
The PageRank of node HamleysOfficial : 0.005325936307836382
The PageRank of node bmstores : 0.00524546649693655
The PageRank of node TheStourbridge : 0.004422198431576731
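To confirm that the rank property was actually written to the nodes, you can read it back for one of the users from the output above:
# Fetch the stored rank property for one of the top-ranked users.
results = memgraph.execute_and_fetch(
    """
    MATCH (u:User {username: "WORLDMUSICAWARD"})
    RETURN u.username AS username, u.rank AS rank;
    """
)
print(next(results))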
Visit the Memgraph MAGE graph library (and throw us a star ⭐) and take a look at all of the graph algorithms that have been implemented. You can also implement and submit your own algorithms and utility procedures.
10. Visualizing the graph in Memgraph Lab
Open Memgraph Lab in your browser at localhost:3000 and execute the following Cypher query:
MATCH (n)-[r]-(m)
RETURN n, r, m
LIMIT 100;
Now apply the following graph style to make your graph look more descriptive:
@NodeStyle {
size: Sqrt(Mul(Div(Property(node, "rank"), 1), 200000))
border-width: 1
border-color: #000000
shadow-color: #1D9BF0
shadow-size: 10
image-url: "https://i.imgur.com/UV7Nl0i.png"
}
@NodeStyle Greater(Size(Labels(node)), 0) {
label: Format(":{}", Join(Labels(node), " :"))
}
@NodeStyle HasLabel(node, "User") {
color: #1D9BF0
color-hover: Darker(#dd2222)
color-selected: #dd2222
}
@NodeStyle HasProperty(node, "username") {
label: AsText(Property(node, "username"))
}
@EdgeStyle {
width: 1
}
What's next?
Now it's time for you to use Memgraph on a graph problem!
You can always check out Memgraph Playground for some cool use cases and examples. If you have any questions, or want to share your work with the rest of the community, join our Discord Server.