Wednesday, January 6, 2016

Using Neo4j Cypher MERGE Effectively

One of the topics in Neo4j and Cypher that raises the most questions and discussion, whether I’m giving Neo4j trainings or working with other engineers building with Neo4j, is how to use Cypher MERGE correctly and efficiently. Here is how I’ve explained Cypher MERGE to all our engineers and training attendees.
There are a few simple things to understand about how Neo4j handles Cypher MERGE operations to avoid undesired or unexpected behavior when using it.
1. The Cypher MERGE operation is a MATCH or CREATE of the entire pattern. This means that if any element of the pattern does NOT exist, Neo4j will attempt to create the entire pattern (identifiers already bound earlier in the query are reused rather than re-created).
2. Always MERGE separately on each segment of the pattern that may already exist.
3. After a MERGE operation you are guaranteed to have a usable reference to all identifiers established during the Cypher MERGE operation, because they were either found or created.
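As a small illustration of the third point, here is a minimal sketch in which the identifier bound by MERGE is used right away, whether the node was found or created. The username "morpheus" and the lastSeen property are hypothetical and chosen only for this example.

```cypher
// MERGE either finds or creates the node, so `u` is always bound afterwards.
MERGE (u:User {username: "morpheus"})
// Safe to use `u` here in both cases; lastSeen is an illustrative property.
SET u.lastSeen = timestamp()
RETURN u.username, u.lastSeen;
```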

Simple Cypher MERGE Examples

Let’s look at a couple of examples of Cypher MERGE operations:
Assuming that a unique constraint exists on username for the User label and that (u:User {username: "neo"}) exists in the graph, do you think these two statements are equivalent?
Statement 1:
MERGE (neo:User {username: "neo"})-[:KNOWS]->(trinity:User {username: "trinity"});
Statement 2:
MERGE (neo:User {username: "neo"})
MERGE (trinity:User {username: "trinity"})
MERGE (neo)-[:KNOWS]->(trinity);
The answer is no; they’re not equivalent. Here’s why they’re not:
In Statement 1, Neo4j will find that the entire pattern doesn’t MATCH because -[:KNOWS]->(trinity:User {username: "trinity"}) doesn’t exist in the graph.
This will cause Neo4j to attempt to create the entire pattern, which includes the already existing User node with a uniquely constrained username field, causing an exception about violating the unique constraint.
In Statement 2, Neo4j is able to MATCH the ‘neo’ User node in the first line and establishes a reference to it. Then in the second line Neo4j doesn’t find the ‘trinity’ User node, so a CREATE is performed, which establishes the reference. Finally, in the third line, using the references established in the two preceding MERGE clauses, Neo4j successfully connects the ‘neo’ and ‘trinity’ User nodes with the KNOWS relationship.
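To try the scenario yourself, something like the following sketch should reproduce the starting state and then run Statement 2. It assumes an otherwise empty graph and a tool that accepts several semicolon-separated statements (for example cypher-shell). The constraint name user_username_unique is made up for this example, and the constraint syntax shown is the one used by Neo4j 4.4 and later; older releases used a different form, noted in the comment.

```cypher
// Unique constraint on User.username (Neo4j 4.4+ syntax).
// Older releases used: CREATE CONSTRAINT ON (u:User) ASSERT u.username IS UNIQUE
CREATE CONSTRAINT user_username_unique IF NOT EXISTS
FOR (u:User) REQUIRE u.username IS UNIQUE;

// Seed the node that the example assumes already exists.
MERGE (:User {username: "neo"});

// Statement 2: MERGE each node on its own, then MERGE the relationship.
MERGE (neo:User {username: "neo"})
MERGE (trinity:User {username: "trinity"})
MERGE (neo)-[:KNOWS]->(trinity);

// Inspect the result: list everyone 'neo' KNOWS.
MATCH (:User {username: "neo"})-[:KNOWS]->(friend:User)
RETURN friend.username;
```

Running Statement 1 in place of Statement 2 against the same starting state is the case where the constraint violation described above would be expected.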

Simple Cypher MERGE Best Practices

Knowing what we know now that we’ve looked at the examples above, here are some best practices for your most common, simple Cypher MERGE operations:
In scenarios where node existence in the graph is optional, the best general strategy for Cypher MERGE operations is to always MERGE on the uniquely constrained identifier for each node involved in the total pattern in isolation and then MERGE the relationships using the node references already established.
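A parameterized sketch of this strategy might look like the following. The parameter names $fromUsername and $toUsername are hypothetical, and the $name parameter syntax assumes Neo4j 3.0 or later (earlier versions wrote parameters as {name}).

```cypher
// Each node is MERGEd in isolation on its uniquely constrained property,
// so an existing node is matched and a missing one is created.
MERGE (a:User {username: $fromUsername})
MERGE (b:User {username: $toUsername})
// Only the relationship is left to find or create between the bound nodes.
MERGE (a)-[:KNOWS]->(b);
```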
In scenarios where node existence in the graph is required, the best general strategy for Cypher MERGE operations is to always MATCH the nodes that are expected to exist and MERGE only the relationship using the node references established by the MATCH operation.
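For the required-existence case, a minimal sketch using the same ‘neo’ and ‘trinity’ users from the earlier examples could look like this:

```cypher
// Both nodes are expected to exist, so they are looked up, never created.
MATCH (neo:User {username: "neo"})
MATCH (trinity:User {username: "trinity"})
// MERGE is limited to the relationship between the two bound nodes.
MERGE (neo)-[:KNOWS]->(trinity);
```

If either MATCH finds nothing, the query produces no rows, the MERGE never runs, and nothing is created, which is usually the behavior you want when the nodes are supposed to exist already.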
There are more advanced Cypher MERGE patterns and strategies, but if you’re just starting out, using Cypher MERGE in this way will help you consistently get the desired result.
