Using On-Disk Storage With an In-Memory Graph Database
Since Memgraph is a graph database that stores data only in memory, the GQLAlchemy library provides an on-disk storage solution for large properties not used in graph algorithms.
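```python
from typing import Optional

from gqlalchemy import Memgraph, SQLitePropertyDatabase, Node, Field

# Connect to a running Memgraph instance (in-memory graph database)
graphdb = Memgraph()
# Attach an SQLite database for on-disk properties to that connection
SQLitePropertyDatabase("path-to-my-db.db", graphdb)


class User(Node):
    # Stored in Memgraph, with UNIQUE and EXISTS constraints and an index
    id: int = Field(unique=True, exists=True, index=True, db=graphdb)
    # Stored in the SQLite database instead of Memgraph
    huge_string: Optional[str] = Field(on_disk=True)


my_secret = "I LOVE DUCKS" * 1000

john = User(id=5, huge_string=my_secret).save(graphdb)
john2 = User(id=5).load(graphdb)
print(john2.huge_string)  # prints "I LOVE DUCKS" repeated 1000 times
```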
What’s happening here?
- `graphdb` creates a connection to an in-memory graph database.
- `SQLitePropertyDatabase` attaches to `graphdb` in its constructor.
- When creating a definition for a node with the label `User`, two properties are defined:
  - `User.id` is a required property of type `int` that creates UNIQUENESS and EXISTS constraints and an index inside Memgraph.
  - `User.huge_string` is an optional `User` property that is saved to and loaded from an SQLite database.
- `my_secret` is an example of a huge string that would unnecessarily slow down a graph database.
- `User().save()` saves the node with the label `User` in the graph database and stores `huge_string` in the `SQLitePropertyDatabase`.
- When loading the data, the inverse happens: the node is fetched from the graph database and the `huge_string` property from the `SQLitePropertyDatabase`.
Saving large properties in an on-disk database
Many graphs used in graph databases have nodes with a lot of metadata that isn't used in graph computations. Graph databases aren't designed to perform well with large properties such as long strings or Parquet files. The problem is usually solved by using a separate SQL database or key-value store that links large properties to the ID of the node. Although this solution is straightforward, it is cumbersome to implement and maintain, and it has to be rebuilt from scratch for every project. We've identified the problem and decided to take action. With the release of GQLAlchemy 1.1, you can easily define which properties are saved in the graph database and which in an on-disk storage solution. You do that once, in the model definition, and never have to worry again whether properties are saved to or loaded from the correct database.
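The hand-rolled approach described above can be sketched with Python's built-in `sqlite3` module: large properties live in a key-value table keyed by the node's ID, while the graph database keeps only the small properties. This is a minimal illustration, not GQLAlchemy code; the table name `node_properties` and the helpers `open_store`, `save_property` and `load_property` are hypothetical.

```python
import sqlite3
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a SQLite database with one key-value table for large node properties.

    The schema is hypothetical: one row per (node_id, property name) pair.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS node_properties ("
        " node_id INTEGER NOT NULL,"
        " name TEXT NOT NULL,"
        " value TEXT,"
        " PRIMARY KEY (node_id, name))"
    )
    return conn


def save_property(conn: sqlite3.Connection, node_id: int, name: str, value: str) -> None:
    # INSERT OR REPLACE overwrites any existing value for the same (node_id, name)
    conn.execute(
        "INSERT OR REPLACE INTO node_properties (node_id, name, value) VALUES (?, ?, ?)",
        (node_id, name, value),
    )
    conn.commit()


def load_property(conn: sqlite3.Connection, node_id: int, name: str) -> Optional[str]:
    # Returns None when the node has no stored value for this property
    row = conn.execute(
        "SELECT value FROM node_properties WHERE node_id = ? AND name = ?",
        (node_id, name),
    ).fetchone()
    return row[0] if row else None


conn = open_store()
save_property(conn, 5, "huge_string", "I LOVE DUCKS" * 1000)
print(len(load_property(conn, 5, "huge_string")))  # 12000
```

Every project that takes this route has to repeat the same bookkeeping: remembering which properties go where and calling both stores on every save and load.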
How does it work?
GQLAlchemy is a Python library that aims to be the go-to Object Graph Mapper (OGM): a link between graph database objects and Python objects. It is built on top of Pydantic and provides object modeling, validation, serialization and deserialization out of the box. With GQLAlchemy, you can define Python classes that map to graph objects such as nodes and relationships in graph databases. Every such class has properties, or fields, that hold data about the graph object. When you want a property to be saved on disk instead of in the in-memory database, you specify that with the `on_disk` argument.
```python
from typing import Optional

from gqlalchemy import Node, Field


class User(Node):
    # Saved in the graph database
    graphdb_property: Optional[str] = Field()
    # Saved in the on-disk property database
    on_disk_property: Optional[str] = Field(on_disk=True)
```
The `on_disk` argument affects how a node is serialized and deserialized when it is saved to or loaded from a database. Before you can use it, you have to specify which implementation of `OnDiskPropertyDatabase` you'd like to use. In this example, we'll use the SQLite implementation.
```python
from gqlalchemy import Memgraph, SQLitePropertyDatabase

db = Memgraph()
# Attach the SQLite property database to the Memgraph connection
SQLitePropertyDatabase("property_database.db", db)
```
From now on, every time you save or load a graph object, its `on_disk` properties are handled automatically by the `SQLitePropertyDatabase`.
```python
user = User(
    graphdb_property="This property goes into the graph database",
    on_disk_property="This property goes into the sqlite database",
).save(db)
```
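Conceptually, the automatic handling amounts to splitting an object's properties by an `on_disk` flag on save and merging them back on load. The sketch below models that with a standard-library dataclass, recording the flag in field metadata. It is an illustration under that assumption, not GQLAlchemy's actual implementation; `on_disk_field_names`, `split_properties` and `merge_properties` are hypothetical names.

```python
from dataclasses import asdict, dataclass, field, fields
from typing import Any, Dict, Optional, Set, Tuple


@dataclass
class User:
    # Stand-in for a GQLAlchemy Node; here on_disk is stored in field metadata
    graphdb_property: Optional[str] = None
    on_disk_property: Optional[str] = field(default=None, metadata={"on_disk": True})


def on_disk_field_names(cls) -> Set[str]:
    """Names of fields flagged with on_disk=True in the class definition."""
    return {f.name for f in fields(cls) if f.metadata.get("on_disk", False)}


def split_properties(obj) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """On save: route each property to the graph store or the disk store."""
    disk_names = on_disk_field_names(type(obj))
    props = asdict(obj)
    graph_props = {k: v for k, v in props.items() if k not in disk_names}
    disk_props = {k: v for k, v in props.items() if k in disk_names}
    return graph_props, disk_props


def merge_properties(cls, graph_props: Dict[str, Any], disk_props: Dict[str, Any]):
    """On load: rebuild the object from both stores (inverse of split_properties)."""
    return cls(**graph_props, **disk_props)


user = User(
    graphdb_property="This property goes into the graph database",
    on_disk_property="This property goes into the sqlite database",
)
graph_props, disk_props = split_properties(user)
print(graph_props)  # {'graphdb_property': 'This property goes into the graph database'}
print(disk_props)   # {'on_disk_property': 'This property goes into the sqlite database'}
print(merge_properties(User, graph_props, disk_props) == user)  # True
```

Because the flag lives in the class definition, the routing decision is made once per model rather than at every call site, which is the same design idea the `on_disk` argument follows.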
Conclusion
Now you know how to use on-disk properties to keep your in-memory graph from eating up too much RAM. Graph algorithms should also run faster, since large properties like these are rarely needed for graph analytics. If you have questions about how to use the on-disk storage, visit our Discord server and drop us a message.