Monday, October 31, 2016

Using Neo4j Cypher MERGE Effectively

One of the areas in Neo4j and Cypher that brings the most questions and discussion, whether I’m giving Neo4j trainings or working with other engineers building on Neo4j, is how to use Cypher MERGE correctly and efficiently. The following is the way I’ve explained Cypher MERGE to all our engineers and training attendees.
There are a few simple things to understand about how Neo4j handles Cypher MERGE operations to avoid undesired or unexpected behavior when using it.
1. The Cypher MERGE operation is a MATCH or CREATE of the entire pattern. This means that if any element of the pattern does NOT exist, Neo4j will attempt to create the entire pattern.
2. Always MERGE on each segment of the pattern that has the possibility to already exist.
3. After a MERGE operation you are guaranteed to have a usable reference to all identifiers established during the Cypher MERGE operation, because they were either found or created.

Simple Cypher MERGE Examples

Let’s look at a couple examples of Cypher MERGE operations:
Assuming that a unique constraint exists on username for the User label, and that (u:User {username: "neo"}) already exists in the graph, do you think these two statements are equivalent?
Statement 1:
MERGE (neo:User {username: "neo"})-[:KNOWS]->(trinity:User {username: "trinity"});
Statement 2:
MERGE (neo:User {username: "neo"})
MERGE (trinity:User {username: "trinity"})
MERGE (neo)-[:KNOWS]->(trinity);
The answer is no, they’re not equivalent. Here’s why:
In Statement 1, Neo4j will find that the entire pattern doesn’t MATCH, because -[:KNOWS]->(trinity:User {username: "trinity"}) doesn’t exist in the graph.
This will cause Neo4j to attempt to create the entire pattern, which includes the already existing User node with a uniquely constrained username, so the statement will fail with a unique constraint violation rather than reuse the existing "neo" node.
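For reference, the unique constraint assumed in these examples would have been created with something like the following (Neo4j 3.x-era syntax); it’s this constraint that makes Statement 1 fail rather than silently create a duplicate user:

// Ensure each User node has a unique username (Neo4j 3.x syntax).
CREATE CONSTRAINT ON (u:User) ASSERT u.username IS UNIQUE;

Statement 2, by contrast, follows rule 2 above: each node that might already exist gets its own MERGE, and only then is the relationship merged between the bound identifiers.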

Data Modeling with Neo4j: “School Immunization in California” CSV to Graph

1 state, over 9 million children, and 42,981 rows of CSV immunization data. After many rough drafts, I was finally able to land on an efficient and aesthetically pleasing way to map out the immunization data of children in California (found and downloaded online from the California Department of
Education*). In this post our goal is to walk through the data modeling process to show how this CSV data can be connected meaningfully with Neo4j. What makes this data so interesting is its varying degrees of location, three distinct grade levels, and a dense record of immunization numbers and percentages, all spanning two separate school years.
After successfully mapping the data, I could then easily explore it, answering questions such as: Where in California is the rate of vaccinated children lowest? Are fewer parents vaccinating their children in 2015 compared to 2014? And which age group is most up to date on its vaccinations? Furthermore, I was able to clearly visualize the data in small and large quantities using the Neo4j graph.
* The public school dataset can be found at http://www.cde.ca.gov/ds/si/ds/pubschls.asp. The private school dataset can be found at http://www.cde.ca.gov/ds/si/ps/.

Preparing the Files

Before anything else, we have to modify the CSVs by:
  • Changing the column names: capitalize only the first letter of each word and leave the rest of the word in lower case.
  • Replacing # with Number, % with Percent, 1 with One, + with Plus, and taking out any dashes and parentheses. More simply, spell out the symbols and numbers found in the headers (see the sketch below).
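For example (these header names are hypothetical stand-ins for the actual columns), a header like "% UP-TO-DATE" would become "PercentUpToDate" and "# ENROLLED" would become "NumberEnrolled", which lets LOAD CSV address each column as a clean key:

// Minimal sketch: with renamed headers, each column is addressable
// directly on the row map. File name and header names are hypothetical.
LOAD CSV WITH HEADERS FROM "file:///immunization.csv" AS row
RETURN row.SchoolName, row.NumberEnrolled, row.PercentUpToDate
LIMIT 5;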

Mapping the Data: Determining the Relationships Between Locations and Schools

Before getting into the code, I took pen to paper by simply drawing out the nodes and their relationships to one another.


Data Modeling with Neo4j: “Chemicals in Cosmetics” Step-by-Step Process

Take this unique dataset in a CSV format and transform it into a graph using Neo4j. Using the Neo4j model, we can compact the vast number of relationships and properties within the Chemicals in Cosmetics dataset, creating more meaningful and easily applicable data.

Meet the Data

Since 2005, all Californian cosmetic companies have been required to report any cosmetic product that contains chemicals that cause, or are suspected of causing, cancer, developmental birth defects, or harm to the reproductive system. This list of the cosmetics and chemicals in question is openly provided on the California government website*. Even more intriguing are the numerous properties of the products and chemicals, such as important dates and times, whether the product is still being sold, whether the chemical is still being used, and much more. The interconnectedness of this data illustrates the power of the property graph model and its ability to succinctly store information by prioritizing both the nodes and the relationships.
Below is a list of the more ambiguous or elusive headers found in the CSV. Going through and understanding the headers is a vital step for developing an accurate graph model.
Headers & Descriptions:
1. CDPHId (#): CA Dept. of Public Health identification number for the product. May occur more than once.
2. CSFId (#): CDPH identification number for CSF.
3. CSF: Color, scent, and/or flavor. Not all products have specific colors, scents, or flavors.
4. CompanyId (#): CDPH internal identification number for company.
5. PrimaryCategoryId (#): CDPH identification number for category.
6. PrimaryCategory: Type of product (13 primary categories).
7. SubCategoryId (#): CDPH internal identification number for subcategory.
8. SubCategory: Type of product within one of the 13 primary categories.
9. CASId (#): CDPH identification number for the chemical.
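With the headers decoded, a first pass at connecting products to their chemicals might look like the following. This is a minimal sketch assuming Product and Chemical labels and hypothetical ProductName and ChemicalName columns, not the post’s actual import script:

// Hypothetical sketch: MERGE each product and chemical on its CDPH id,
// then connect the product to the chemical it contains.
LOAD CSV WITH HEADERS FROM "file:///chemicals_in_cosmetics.csv" AS row
MERGE (p:Product {cdphId: row.CDPHId})
  ON CREATE SET p.name = row.ProductName
MERGE (c:Chemical {casId: row.CASId})
  ON CREATE SET c.name = row.ChemicalName
MERGE (p)-[:CONTAINS]->(c);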

Neo4j for Your Modern Data Architecture

We recently sat down with Neo Technology, the company behind Neo4j, the world’s leading graph database, to talk more about our role as a trusted Neo4j solution partner and to dive deeper into our Neo4j Enterprise offerings.
Talk to me about GraphGrid. What’s your story?

So to understand GraphGrid, let’s dive into a little back story: We co-founded AtomRain nearly seven years ago with the vision to create an elite engineering team capable of providing enterprises worldwide with real business value by solving their most complex business challenges. As we figured out what that looked like practically, we found ourselves moving deeper down the technology stack into the services and data layer where we handled all the heavy lifting necessary to integrate data sources and provide the functionality, performance and scale needed to deliver powerful enterprise service APIs.

In early 2012, we had our first exposure to Neo4j and experienced firsthand the potential of graph technology.

MySQL to Neo4J

You’ve probably heard that an effective way to move data from an existing relational database to a graph is using LOAD CSV. But what exactly does the process of converting all or part of the database tables from MySQL to Neo4j using LOAD CSV involve, start to finish? We’ll be using the MySQL5 Northwind database as our example. There is a Neo4j tutorial with a similar explanation using Postgres that also discusses the graph modeling aspects, so it’s definitely worth reading through. Here we’ll focus on MySQL and the CSV export in preparation for the Neo4j import.
First we’ll install and connect to the MySQL database:
$ brew install mysql
$ mysql.server restart
*Note: We’re skipping all MySQL server security because, for this demonstration, it’s simply an intermediary to get the data we need for the Neo4j LOAD CSV process.
Now, using the freely available MySQL Workbench or Sequel Pro, connect to your localhost MySQL server. You should be able to connect directly on 127.0.0.1 without any username or password, because we skipped the normal process of securing the server.
Import the Northwind.MySQL5.sql that you downloaded above. If you’re using Sequel Pro, you do this by choosing File -> Import… -> browse to your download and select Northwind.MySQL5.sql
When the import is finished you’ll see all the tables available for export to Neo4j. The specific tables we are interested in for our Neo4j graph model are Categories, Customers, Order Details, Orders, Products and Suppliers.
Export each table by right-clicking it and selecting Export -> As csv file.
Customize the CSV file with settings that import smoothly into Neo4j (most should be selected by default):
1. NULL fields should export as a blank, because it’s more efficient to check for actual existence (or IS NULL) than to deal with a property created with the literal string "NULL" as its value.
2. Escape values such as quotes with \ so quotes in the middle of a field do not break the CSV parsing during import.
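Once the CSVs are exported with these settings, each table can be pulled into Neo4j with LOAD CSV. Here’s a minimal sketch for the Customers table, assuming the file has been placed in Neo4j’s import directory and the standard Northwind column names:

// Sketch: import Northwind customers as nodes, merging on the primary key
// so the statement is safe to re-run.
LOAD CSV WITH HEADERS FROM "file:///customers.csv" AS row
MERGE (c:Customer {customerId: row.CustomerID})
  ON CREATE SET c.companyName = row.CompanyName,
                c.country     = row.Country;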

Modeling Time Series Data with Neo4j

I’ve been receiving many questions recently at trainings and meetups regarding how to effectively model time series data, with use cases ranging from hour level precision to microsecond level precision. In assessing the various approaches possible, I landed on a tree structure as the model that best fit the problem. The two key questions I found myself asking as I went through the process of building the time tree to connect the time series events were, “How granular do I really need to make this to efficiently work with and expose the time-based data being analyzed?” and “Do I need to generate all time nodes down to the desired precision level?” The balance that needs to be considered is the initialization and maintainability of all the time series nodes versus the dynamic creation as time series events require their existence, and the impact the missing nodes may or may not have when querying time series events by various date and time ranges.

Originating the Time Tree

I ultimately decided that it would be most effective to create the hour, minute and second level nodes only when needed to connect an event into a day. So I expanded on the work done by Mark Needham in his post Neo4j: Cypher – Creating a time tree down to the day. The main modeling change in this step was to use a single CONTAINS relationship going from the higher tree level toward the lower tree level, to simplify movement up and down the entire tree through depth-based pattern matching.
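A minimal sketch of that approach, with illustrative labels and property names. Note that each MERGE below the root is anchored to the node above it, so a month is matched or created within its year rather than globally, and the hour level only comes into existence because an event needs it:

// Build the year -> month -> day branch for a date on demand, then
// attach the hour level and connect the event that required it.
MERGE (y:Year {value: 2016})
MERGE (y)-[:CONTAINS]->(m:Month {value: 10})
MERGE (m)-[:CONTAINS]->(d:Day {value: 31})
MERGE (d)-[:CONTAINS]->(h:Hour {value: 9})
CREATE (e:Event {name: "example event"})-[:OCCURRED_IN]->(h);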

Monday, October 24, 2016

Graph Advantage: Personal Gamification

User attention spans online are quite short today, with all the options vying for their eyeballs. Gamification has surfaced as an effective way to interact with users in a more engaging and interesting manner. Gamification in a non-generic, personal manner, with the right incentives for each individual, requires a very complete and connected understanding of that user.

Defining Gamification

Gamification involves integrating gaming mechanics into marketing and user interaction strategies. It’s a method that drives consumer engagement and participation since it produces positive behavior via incentives and rewards. As a matter of fact, this form of “gaming” can drive recurring user engagement and positive perception, as long as the approach is based on the right incentives.

Benefits of Gamification with Neo4j

Gamification is becoming more widely seen in today’s user strategy and is on the minds of those driving product and marketing decisions. Getting gamification right can transform the way businesses deal with their customers. Businesses are developing game mechanics such as rankings and customizations for their target users, putting those mechanics to work on customers’ behalf. The mechanics apply to situations that help promote customer loyalty, motivate buyers to continue making purchases, and provide attractive incentives to maintain their interest.
Involving not only the customer and your products, but also rewards, challenges, and purpose as part of the interaction, introduces a completely new and dynamic layer of complexity with a very real-time and responsive nature to it. All these components working together in real time can engage the user in higher and continued levels of participation.
When it comes to querying connected data in real-time, there is no database better than Neo4j. As a native graph database, Neo4j is designed for graph traversals in constant time, which means many data types can 

Graph Advantage: User Personalization

User personalization is intended to tailor each individual’s experience to them and really provide a more human element to the interaction. Providing this aspect of feeling known and understood rather than just being a generic set of eyes can go a long way towards more fulfilling and continued engagement with your users.

Digital retail is a major space where personalization surfaces, because it’s challenging to find the right balance and approach for the interaction. By concentrating on the online retail journey of your customer and making it a personalized experience, you and your customers both benefit from the more personal and meaningful interaction.

Personalization through Digitizing Retail Knowledge

Many strategies for personalization involve long-running offline batch processes that take a considerable amount of time to complete before changes to what the system understands about an individual user are taken into account. This delay in response is a major barrier to a personalized engagement with your user. The amount of data to be considered in total is quite astounding for the large established retail chains. However, even with all this data, it is still possible to provide a real-time user personalization experience for your customers. The information that is relevant for enhancing the online experience of each individual customer is quite small, although complex from a data connectedness perspective.
Customers today have become quite willing to share personal details in exchange for improved shopping offers and experiences. By using these details in a meaningful way, you demonstrate an understanding of your customers and show how you can use that knowledge to enhance their experience; they’ll then be more likely to lean into the experience by providing even more insight into their preferences.
A lot of online retail websites offer a plethora of navigation trees, but people have been trained to make use 

Neo4j is for the Non-Technical

Neo4j unifies organizations across departments and across teams, both technical and non-technical, enabling a greater level of understanding and clarity in communication than previously possible. A Neo4j graph model is whiteboard friendly and allows everyone from business to engineering groups to speak the same language of connections. Communicating in contextually relevant connections that bring together business concepts reduces the potential for misunderstandings that cause delays and rework later.

Neo4j Connects Your Organization by Connecting Your Data

The world today is highly connected. Graph databases are whiteboard friendly and effective at modeling irregular and varied relationships in an intuitive way. They help provide insight and understanding by creating connections within complex big data sets. As enterprises become increasingly data driven, it is essential that all individuals, especially the non-technical groups, have the ability to collaborate with engineering in a more integrated fashion. Neo4j removes the intimidation factor of the technology typically required to deal with complex data and enables more unified collaboration, because we can all relate to connections.
There are a number of reasons why both technical and non-technical teams within an organization can all agree on Neo4j:
  • It offers incredible performance
    The more connected data gets in typical RDBMS and NoSQL databases, the faster query performance degrades. It’s a fact that data within all organizations is growing rapidly in size and connectedness. Neo4j provides constant-time navigation through your connected data, whether you’re one level deep or ten levels deep.
  • It guarantees data reliability

Graph Advantage: Research Organizations

Many enterprises today build their business around research that involves piecing together meaningful data from the public domain for their customers. When trying to connect data across a domain in a meaningful way, building around a graph database is a great choice, because it models exactly how the business analysts at these research organizations piece together the real-world data they find during their research.

A business analyst may begin with one person and from there move to the company that person works for, then shift to colleagues, then to the places where those colleagues previously worked, and finally to their past colleagues there. Suddenly the business analyst has nearly finished building out an intuitive network of complex connections around this person of interest, a network that would have been challenging and time consuming to represent in Excel.
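That exploration reads almost word for word as a Cypher traversal. Here’s a sketch, assuming a hypothetical model with (:Person)-[:WORKS_AT]->(:Company) for current employment and (:Person)-[:WORKED_AT]->(:Company) for past employment:

// From one person of interest, expand to current colleagues and the
// companies those colleagues previously worked at.
MATCH (p:Person {name: "Person of Interest"})-[:WORKS_AT]->(c:Company)
MATCH (c)<-[:WORKS_AT]-(colleague:Person)-[:WORKED_AT]->(prior:Company)
WHERE colleague <> p
RETURN colleague.name AS colleague, collect(prior.name) AS previousCompanies;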

Graph Database in Research Organizations

Research organizations do more than manage large data volumes; their core goal is the understanding that comes through research, gaining insight from the available data. To properly leverage data relationships, a research organization requires a database technology that houses data relationships as first-class entities.
As a native graph database, Neo4j provides several essential advantages for businesses today:
  • Neo4j structures data connections precisely as they exist in the world around us, with the contextually specific connections between entities treated as first-class citizens that can be explored in constant time.
  • Real-time results for queries that are exploring the many different and complex connections around and 

Graph Advantage: Connected Enterprise

The connected enterprise is the new norm. The traditional chain paradigm, with sequential and siloed operations lacking a connection between customer and factory, is no longer cutting it. Today enterprises are expected to be sufficiently in touch and aware of how to interact with each uniquely individual person they are fortunate to call their customer. Technologies and operational procedures are rapidly changing to enable information to be connected and taken together to drive decision making, direction, and interaction with the customer.

Connected Enterprise: Data Essentials

Connected data is the lifeblood of today’s enterprise. Yet it’s frequently isolated in silos across an organization, with differing accessibility, redundancy, quality, and data formats. Managing connected data involves identifying, cleaning, storing, and governing increased data volumes within an enterprise. Connected data involves essential information such as customers, users, products, services, sites, and business units.
Practices for connected data management differ along a wide range of approaches. On one end, many believe that connected data should be united in one location; on the other end, some recommend managing data assets from one application or service, even if the information is housed in multiple locations.
In both cases, data architects require a data model that’s versatile and fluid when exceptions arise and business needs change. And the only model that can answer this is the graph database.

Data Management and Graph Databases

Enterprises today are flooded with “big data”, a majority of which is master data. Dealing with


Every Organization Needs a Knowledge Graph

A knowledge graph as it relates to individual organizations is a unification of information across that organization enriched with contextual and semantic relevance. Introducing a knowledge graph creates a comprehensive and baseline set of knowledge accessible by personnel, applications and customers alike to gain understanding and drive actions and direction.

This foundational knowledge graph is not only useful for people and applications, but provides a relevant and evolving dataset for sophisticated learning and intelligence software systems to utilize in providing personalized internal guidance as well as highly engaging interactions with customers.

Knowledge Sharing Falling Short

To engage all personnel in collaboration and knowledge sharing, a majority of organizations today have adopted social networking trends, offering different kinds of internal tools. However, such applications can generate large volumes of unstructured organization data stored in isolated systems across an organization. This attempt at creating a holistic understanding falls short because all this knowledge sharing and information isn’t actually being connected together.
The main result of this approach is a complex infrastructure containing data silos filled with duplicated, expired, and redundant information. This makes it hard to find the right information and acquire important insights. Organizations today need a graph data platform to support increasingly complex data management needs; to deal with information flow, data infrastructure, and communication problems; and to allow next-generation systems to effectively seek, share, filter, and review data.

Knowledge Graph: Understanding and Growth

By embracing the nuanced complexities, semantics and contextual connections within an organization, a knowledge graph can be a catalyst for understanding and growth. The diverse and complex aspects of an 


Thursday, October 20, 2016

Master Data Management with Neo4j: Merging Two Financial Institutions

Master Data Management (MDM) is an increasingly complex topic for organizations today. The rate at which data in an enterprise is flowing and evolving as a business asset requires a more flexible and connection-centric master data storage solution.

This simple demonstration shows some of the benefits of a schema-free, flexible graph model that treats relationships as first-class citizens.

Master Data Management is a practice that involves discovering, cleaning, housing, and governing data. Data architects for enterprises require a data model that offers ad hoc, variable, and flexible structures, as business needs are constantly changing. A graph database ideally fits this rapidly changing model.

Read more: https://www.graphgrid.com/graph-advan...

Better Insights from Your Master Data - Graph Database LA Meetup Demo

Master Data Management is a practice that involves discovering, cleaning, housing, and governing data. Data architects for enterprises require a data model that offers ad hoc, variable, and flexible structures, as business needs are constantly changing.

We'll be discussing the benefits of using the Neo4j graph database for Master Data Management, including its flexible schema-free data model, concepts of layering in data, keeping your data current and flowing, and the connected data analytics and real-time recommendations that can result.

An overview of MDM with Neo4j https://www.graphgrid.com/graph-advan...

Neo4j Integration with ElasticSearch - Elastic User Group LA Meetup Presentation

Complementary technologies are very important, as it's rarely possible for a single technology to be optimized for everything. There are always trade-offs; "jack of all trades, master of none" also applies to technology solutions: https://www.graphgrid.com/jack-of-all....

In this increasingly polyglot world of data storage options, one natural pairing that has proven quite effective is Neo4j and ElasticSearch. Neo4j is designed for optimal graph traversal (i.e., querying highly connected data), and ElasticSearch is a powerful search server with great language analyzers and aggregation strategies out of the box. While you could try to rebuild the search capabilities in Neo4j, since it's built on Lucene, you'd be re-laying much of the foundation you already have in ElasticSearch, with a subpar result. Similarly, while you could try to represent connected graph structures in ElasticSearch, you'd be expending much effort and accepting shortcomings in reliability, due to the underlying storage mechanisms, compared to using Neo4j for the connected data aspect. In this talk we'll look at pairing them together and reaping the benefits of both doing what they're best at.