Using On-Disk Storage With an In-Memory Graph Database


Since Memgraph is a graph database that stores data only in memory, the GQLAlchemy library provides an on-disk storage solution for large properties not used in graph algorithms.

from gqlalchemy import Memgraph, SQLitePropertyDatabase, Node, Field
from typing import Optional


graphdb = Memgraph()
SQLitePropertyDatabase('path-to-my-db.db', graphdb)

class User(Node):
    id: int = Field(unique=True, exists=True, index=True, db=graphdb)
    huge_string: Optional[str] = Field(on_disk=True)

my_secret = "I LOVE DUCKS" * 1000
john = User(id=5, huge_string=my_secret).save(graphdb)
john2 = User(id=5).load(graphdb)
print(john2.huge_string)  # prints "I LOVE DUCKS" repeated 1000 times

What’s happening here?

  • graphdb creates a connection to an in-memory graph database

  • SQLitePropertyDatabase attaches to the graphdb in its constructor

  • When defining the class for nodes with the label User, two properties are specified

  • User.id is a required property of type int that creates UNIQUENESS and EXISTS constraints and an index inside Memgraph

  • User.huge_string is an optional User property that is saved to and loaded from an SQLite database

  • my_secret is an example of a huge string that would unnecessarily slow down a graph database

  • User().save() saves the node with the label User in a graph database and stores the huge_string in the SQLitePropertyDatabase

  • When loading the data, the inverse happens: the node is fetched from the graph database and the huge_string property from the SQLitePropertyDatabase
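The save/load split described above can be sketched in plain Python. This is a hypothetical illustration of the idea, not GQLAlchemy's actual internals: properties marked on_disk are routed to the on-disk store, and everything else goes to the graph database.

```python
# Hypothetical sketch of the property split (not GQLAlchemy's real code):
# fields flagged with on_disk=True are separated out before saving.

on_disk_fields = {"huge_string"}  # fields marked with on_disk=True

def split_properties(properties: dict) -> tuple[dict, dict]:
    """Separate graph-bound properties from on-disk properties."""
    graph_props = {k: v for k, v in properties.items() if k not in on_disk_fields}
    disk_props = {k: v for k, v in properties.items() if k in on_disk_fields}
    return graph_props, disk_props

graph_props, disk_props = split_properties(
    {"id": 5, "huge_string": "I LOVE DUCKS" * 1000}
)
print(graph_props)          # {'id': 5} -> saved in Memgraph
print(list(disk_props))     # ['huge_string'] -> saved in SQLite
```

Loading reverses the split: the node is fetched from the graph database and the on-disk properties are merged back in by the node's unique id.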

Saving large properties in an on-disk database

Many graphs used in graph databases have nodes with a lot of metadata that isn't used in graph computations. Graph databases aren't designed to perform effectively with large properties like strings or Parquet files. The problem is usually solved by using a separate SQL database or a key-value store to connect large properties with the ID of the node. Although the solution is straightforward, it is cumbersome to implement and maintain, and you have to do it from scratch for each project.

We've identified the problem and decided to take action. With the release of GQLAlchemy 1.1, you can easily define which properties will be saved in the graph database and which in an on-disk storage solution. You do that once, in the model definition, and never worry again whether properties are saved to and loaded from the correct database.
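The manual workaround mentioned above can be sketched with Python's standard sqlite3 module: a key-value table that maps a node's unique id to each large property. The node_properties table and helper names here are illustrative, not part of any library.

```python
# A minimal sketch of the manual approach: store large properties in a
# separate SQLite table keyed by the node's id and the property name.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE IF NOT EXISTS node_properties "
    "(node_id INTEGER, name TEXT, value TEXT, PRIMARY KEY (node_id, name))"
)

def save_property(node_id: int, name: str, value: str) -> None:
    # Upsert so repeated saves overwrite the previous value.
    conn.execute(
        "INSERT OR REPLACE INTO node_properties VALUES (?, ?, ?)",
        (node_id, name, value),
    )

def load_property(node_id: int, name: str):
    row = conn.execute(
        "SELECT value FROM node_properties WHERE node_id = ? AND name = ?",
        (node_id, name),
    ).fetchone()
    return row[0] if row else None

save_property(5, "huge_string", "I LOVE DUCKS" * 1000)
print(len(load_property(5, "huge_string")))  # 12000
```

Every project that takes this route has to write and maintain code like this, plus keep the two stores in sync on every save, load, and delete, which is exactly the boilerplate GQLAlchemy 1.1 removes.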

How does it work?

GQLAlchemy is a Python library that aims to be the go-to Object Graph Mapper (OGM) -- a link between graph database objects and Python objects. It is built on top of Pydantic and provides object modeling, validation, serialization, and deserialization out of the box. With GQLAlchemy, you can define Python classes that map to graph objects like Nodes and Relationships in graph databases. Every such class has properties or fields that hold data about the graph objects. When you want a property to be saved on disk instead of in the in-memory database, you specify that with the on_disk argument.

from gqlalchemy import Node, Field
from typing import Optional


class User(Node):
    graphdb_property: Optional[str] = Field()
    on_disk_property: Optional[str] = Field(on_disk=True)

This instruction influences Node serialization and deserialization when the node is being saved to or loaded from a database. Before you can use it, you have to specify which implementation of the OnDiskPropertyDatabase you'd like to use. For example, we'll use the SQLite implementation.

from gqlalchemy import Memgraph, SQLiteOnDiskPropertyDatabase


db = Memgraph()
SQLiteOnDiskPropertyDatabase("property_database.db", db)

Now, every time you save or load a graph object from the graph database, the on_disk properties are handled automatically using the SQLiteOnDiskPropertyDatabase.

user = User(
    graphdb_property="This property goes into the graph database",
    on_disk_property="This property goes into the sqlite database"
).save(db)

Conclusion

Now you know how to use on-disk properties, so your in-memory graph doesn't eat up too much RAM. Graph algorithms should also run faster because large properties usually aren't needed for graph analytics. If you have questions about how to use the on-disk storage, visit our Discord server and drop us a message.

Read more about Python and graph databases on memgraph.com