Working with Graph Databases and Neo4J

Although not something new to many folks, working with graph databases and Neo4J is something I have not done. I finally have a decent use case! So starting out with training sources and installation, I will be learning the pros/cons of this database technology. I will primarily be working with Neo4J. It has a couple of free versions, works great on macOS, and has a pretty large and well established community. So when I hit issues Stack Overflow will most likely have the answers for me.

Installation and Configuration

I’ll be using my typical machine for this – which is now up to macOS Big Sur 11.6.6. Neo4J has several different versions including an Enterprise Server. There is a Community Server and a Desktop edition available. You can download and install either manually. I’m going to go for the server edition because the training I’m looking at uses that edition. I may install Desktop later on and see what the differences are.

I’m also going to install Community Server with Homebrew and not do the manual thing. Once installed, you can run the server from the command line with “brew services start neo4j” or if you want to run it in the foreground use this command: /usr/local/opt/neo4j/bin/neo4j console from Terminal.

I didn’t have to touch the configuration to get things working – yay! The server started up right away and I could login to the browser based environment. You’ll get prompted for a userid/pwd combo (neo4j/neo4j) and you will have to update that on the first login.

One handy feature when running the server in the foreground is that it tells you where all the directories are on the filesystem. This is a homebrew install so your paths might not match if you manually installed.

There’s not much more to configuration, which is quite nice. Yay, again!

Working with Neo4J Community Server

Ne04J Browser GUI

Neo4J has a slick browser GUI that allows you to see what’s going on, execute queries, load data, etc. It is fairly intuitive. When writing queries in “cypher” there is decent syntax checking that includes deprecation warnings. Those warnings include links to documentation so you do not have to look up fixes and solutions on your own.

The training I am using is for version 3 of Neo4J and I’m using version 4.4.7 so there are quite a few deprecation warnings, but that’s ok. Solving for these issues makes me have to learn more and not just follow the training in zombie mode. The course I am using so far is Linkedin Learning’s DB Clinic: Neo4J. There are a ton of tutorials on the web and Neo4J has excellent documentation and training too.

I did finally figure out that Community Server is limited to a single database. This is something new in V 4 of Neo4J. RTFM I guess…Desktop can have multiple databases. Performance is quite snappy. Loading up data is pretty quick. I added 3646089 labels, created 3646089 nodes, set 3646089 properties, created 3646089 relationships, in just 130920 ms.

Importing and Working with Data in Neo4J

It is simple, but I suppose a bit archaic, to import data into Neo4J. The simplest way is via CSV files. Files to be imported must be in a specific directory. See the screen capture above with the directory structure, but keep in mind that these directories are for the Homebrew installation and may not match a manual installation. Place files into the import directory. You can use the LOAD_CSV command to accomplish the task, however it is a good idea to model out your relationships before importing data in case you have any CONSTRAINTS you may need – for example any uniqueness constraints. Neo4J’s create constraint syntax changed with version 4 to this:

CREATE CONSTRAINT FOR (a:Accident) REQUIRE a.accidentIndex IS UNIQUE

Bulk loading techniques have also changed. Because Neo4J loads up everything into memory before creating all the relationships between objects, bulk loading can consume more memory than is available. To handle this issue prior to version 4 you would use “USING PERIODIC COMMIT”, but this method is deprecated. When importing data, wrap the persistence in a CALL block like this:

:auto LOAD CSV WITH HEADERS FROM "file:///CA_DRU_proj_2010-2060.csv" AS row
CALL {
    WITH row
    MATCH (c:County {code: toInteger(row.`County Code`)})
    CREATE (po: PopulationPrediction)
    SET po.year = toInteger(row.year),
        po.race = row.`Race name`,
        po.age = toInteger(row.age),
        po.gender = row.gender,
        po.population = toInteger(row.Population)

    CREATE (po)-[:FOR_COUNTY]->(c)
} IN TRANSACTIONS OF 1000 ROWS

Replace what is in the CALL block with whatever your model requires. You can also adjust the number of rows on the TRANSACTIONS line. 1000 is pretty small.

Querying and Searching for Data in Neo4J

This is not going to be a tutorial or complete reference on how to query data in Neo4J, there are a ton of those online. Querying requires learning Cypher (Neo4J’s query language). Cypher feels familiar to SQL, but it is significantly different that you will really need to study it to become proficient. Here’s a query:

MATCH (a:Accident)-[:INVOLVES]->(v:Vehicle)-[:IS_TYPE]->(t:VehicleType)
WHERE t.name CONTAINS "Motorcycle"
RETURN avg(a.severity) AS avg_severity
MS VSCode

This query says “Find the accidents that involved a vehicle of a certain type (Motorcycle) and return the average severity of each vehicle type. Kinda similar to SQL but also quite different. There are 3 objects and two relationships in this graph. An accident is connected to a vehicle and a vehicle is connected to a vehicle type.

Although you can do a lot of things in the Neo4J browser interface, I have found it super easy to move over to MS VS Code (VSC) or Jetbrains’ PyCharm and work with Python instead. In particular writing your code in a Jupyter notebook makes it very easy to debug and optimize your code as well as saving it. Of course you an also download the full Jupyter server and run your notebooks in that, but since I’ve always got some IDE open that is where I tend to do most things. Fairly easy to get this working. PyCharm is a really nice IDE for Python and automates all the setup tasks with pyenv, which is super convenient. Creating a project in PyCharm and then working on the project in VSC is what I’ve been doing. That probably seems lame, but I work with so many frameworks and languages, anything that can help me get configured properly is super helpful. Working with notebooks is easier in VSC, just create an .ipynb file and it will recognize that and install everything you need to get to work and install it for you.

How to Completely Delete all Neo4J Data

When learning, mistakes are easy. I found it useful to be able to completely wipe all data from the server and start over with a clean slate. This is specific to Neo4J Community. In order to delete all data, use Terminal. Stop the server and then go to /usr/local/var/neo4j/data. This is for a Hombrew based install. If you did not install with Hombrew then your directories are probably not the same. Remove everything in the databases and transactions folders. Now restart the server and boom –> clean slate. This will wipe out your password – to log back in use neo4j/neo4j and set a new password.

Conclusions – GraphDB’s and Neo4J

Working with Neo4J is fun! Its stable, the documentation is great, there is a ton of community support online. It is obviously a robust and mature tool. It is great when you can focus on really learning and not debugging.

I have only scratched the surface of what can be done with this technology. As with many of my posts here on Tarn Aeluin, I’ll come back to this one and keep updating it. What I would like to do is try to model out my “Big Beer List” as a graph database and see if I can replace my static list on the site. My list is really not that big, but modeling and then learning how to query would be interesting. Maybe untappd is doing something similar with its list. That would be challenging I think in terms of hosting the database and then figuring out how to integrate it into WordPress. Its all about time, but that would be pretty cool. Until then, keep checking here for updates.

Leave a Reply