Monday, August 29, 2016

Every Organization Needs a Knowledge Graph

A knowledge graph, as it relates to an individual organization, is a unification of information across that organization, enriched with contextual and semantic relevance. Introducing a knowledge graph creates a comprehensive baseline set of knowledge that personnel, applications and customers alike can access to gain understanding and drive actions and direction.
This foundational knowledge graph is not only useful for people and applications, but provides a relevant and evolving dataset for sophisticated learning and intelligence software systems to utilize in providing personalized internal guidance as well as highly engaging interactions with customers.

Knowledge Sharing Falling Short

To engage all personnel in collaboration and knowledge sharing, a majority of organizations today have adopted social networking trends and offer different kinds of internal tools. However, such applications can generate large volumes of unstructured organizational data stored in isolated systems across the organization. This attempt at creating a holistic understanding falls short because all this knowledge sharing 

Getting Ready for Neo4j 3.0

As we’ve been deploying and testing the Neo4j 3.0 Milestone releases on GraphGrid, we’ve been excited about some of the shiny new features and have also taken note of the operational differences that we’ll need to account for in preparation for supporting Neo4j 3.0.
Neo4j is the most popular option in today’s graph database space thanks to its native graph reliability, performance and expressive query language, Cypher. Neo4j 3.0 takes strides in continuing that trend by being both enterprise ready and more developer friendly with your preferred language.

Neo4j 3.0 Feature Highlights

Some of the features in Neo4j 3.0 that we’re enjoying are the following in no particular order:
The upgrade to Lucene 5 brings the many performance and other improvements made to the Lucene library since version 3, which Neo4j previously used. Lucene is a high-performance, full-featured text search engine library written entirely in Java that Neo4j uses for some indexing

Graph Advantage: The Connected Customer

At this point in time, we’re living in the connected customer age. Frequently exposed to the internet, shoppers can easily find important details on nearly every product and company, make comparisons to competitors, and share their input through online ratings and/or social media. And yet there is largely a disconnect between what companies know about their customers and how they engage personally with them.
For the last several years, companies would try sizing up customers based on basic household data and demographics. At this time though, consumers have the advantage of sizing up businesses and their products. To keep pace with this trend, companies will need to be more personable in their digital interactions. This requires a higher level of intelligence to be built into those systems that facilitate personalized interactions with the customer. The data to do this is likely being collected and simply needs to be leveraged holistically to provide a personalized profile of their connected customer. Being able to

Investigative Journalism: On the Face of a Woman

All data tells a story: a story to be investigated. The beautiful thing about data is its ability to avoid all subtext and ambiguity and go straight to the facts. Swimming through large amounts of it, however, can easily become overwhelming and unproductive. In order to critically investigate the data and its story, one must be able to model and query it.
Last year we created a property graph model for a data set released to the public through the CHHS Open Data Portal. What struck us about this data was its enormous potential to be controversial. The data contained all cosmetic products sold in California “known or suspected to cause cancer, birth defects, or other developmental or reproductive harm” (CHHS). Many Californians assume that any makeup product they see on the shelves has passed every safety and health code before it reaches the public. This dataset said otherwise. Using the property graph model, we were able to bring meaning to the numbers.
The Chemicals in Cosmetics data set was richly interconnected

Synthetic Identities: Intro to Real-Time Detection

Synthetic identities are built through identity theft for use in fraud requiring individually valid identifying details. Identity criminals establish new identities through the combined use of false and actual data, or at times, independently valid information. Criminals use this synthetic identity to open deposit accounts and obtain credit, driver’s licenses, and even passports.

Typically, identity criminals take an SSN (Social Security Number) and associate it with a name not related to that number. The use of real pieces of identity mixed together in new ways means that each piece of identifying information will pass a validation check on its own, which makes the fraud hard to discover. Fraudsters know that synthetic identity theft is a simple yet lucrative act that can easily be carried out.

Synthetic Identities Factors

Bank fraud typically involves identity criminals applying for loans, credit cards, unsecured bank credit lines, and overdrafts without any intention of repaying them. It’s a major issue for today’s banking institutions.
This problem can be the result of two main factors. The first factor involves first-party fraud being

Graph Advantage: Business Recommendation Engines

The most common interactions we have with recommendations today involve social, the people we may know, and retail, the products we may also like, but some of the more interesting recommendation engines are the ones that operate internal to an organization, providing business recommendations around strategy, direction and execution. Designing and building business recommendation engines that leverage a comprehensively connected data view within an enterprise can provide many competitive advantages. These advantages can include increased efficiencies in how subject matter experts on the business domain prioritize their daily efforts, as well as helping the organization transition to being a more data-driven enterprise, with these insights guiding internal business use cases that go deep in offering a business-based direction on a holistic data view.

Business Recommendation Engines Guide Engagement

Whether it involves leveraging direct or indirect customer feedback through social media platforms, business supply chain details from the manufacturing plant to the logistics network, or inferring relationships from an activity and using the network to determine the confidence in that assertion, the Neo4j graph database offers significant advantages when it comes to making an enterprise data driven through business recommendation engines.
A known strategy for business recommendations internal to a business is the design

Monday, August 22, 2016

Fitness and Nutrition Recommendation Engine

We’ve recently been exploring the new ways we as consumers would like brands to interact with us as we go through our daily lives. Fitness and nutrition are key parts of our lives, and being able to do them well is crucial to remaining of balanced mind and body. Active lifestyle brands have captivated us in recent years and we think it’s time that the engagement became more personal. They have enough information about us, and we’d be willing to answer some simple questions about our fitness and nutrition goals if they asked. This simple interaction would enable them to provide us with a personalized fitness and nutrition recommendation engine.
We have deep expertise in building recommendation engines and in a recent experiment we thought it would be interesting to build not just a fitness recommendation engine or a nutrition recommendation engine, but a personalized fitness and nutrition recommendation with a human engagement element. We picked the BeachBody website and used all publicly available data to see if with some creative rework and introduction of personalization concepts we could create a fitness and nutrition recommendation engine that would recommend me fitness programs and nutritional supplements to help me achieve my active 

Jack of All Data; Master of None

In building any technology there are always trade-offs when squeezing maximum performance out of the implementation, so knowing what the guiding light is for a technology becomes very important. If you’re trying to do everything, or even too many things, odds are none of it will be great because you can’t go all out for one primary objective. Your focus will be split.
This becomes especially important when evaluating the graph database landscape where you have implementations that range from
  • doing only edge caching or single hop queries by using focused indexing strategies
  • to graph layers on top of an existing database that will still suffer during complex graph traversals due to the underlying data storage implementation, which also restricts them from being able to guarantee the consistency of a relationship between two nodes on write
  • to various hybrids such as combining document with graph that try to do it all, but ultimately end up doing neither well
  • to fully native graph implementations that are designed for optimal graph traversal and navigation throughout the set of connected data
It’s key to understand these aspects of the underlying database implementation and the guiding light. Be wary of graph hybrids because they have a split focus on where they optimize. The saying, “Jack of all trades; master of none” isn’t just true for people. If you’ve been creating technology for any period of time you

Graph Advantage: Interest Feed

What might interest you? In this age, it’s safe to assume that you have given at least some digital declaration of what definitely interests you, and that it is being used to power an interest feed. Like any good detective, we are in the business of following these clues to reveal as yet undiscovered conclusions: posts, topics, pictures, videos, people, and other assorted data that will likely interest you. Fortunately, unlike detective work, digital sleuthing can be undertaken as a hard science, where algorithms and numbers are our forensic tools, culminating in an interest feed.

Neo4j Interest Feed Using Cypher

Finding commonalities among our digital footprints can be an intense task. In this example, we’ll use simple posts as our data type, with hashtags being the connecting clues. Let’s look at the user Adam Rainn in this graph. He’s made four posts, each with a different hashtag. He’s also following four famous users: Selah Britty, Brock Starr, Athena Leet, and Seamus Aktor. It’s only natural that posts from them should appear in Adam’s interest feed, especially those with similar hashtags. These can be found easily using this query:
MATCH (u:User {userId: "u1"})-[:FOLLOWS]-(v)
WITH u, v
MATCH (u)-[:POST]->()-[:HASHTAG]->()<-[:HASHTAG]-(p)<-[:POST]-(v)
RETURN p;
This would get us some posts that we’re looking for, but this case is a bit trivial. We can sort through these much better if we weigh the posts according to how many hashtags they have in common:
MATCH (u:User {userId: "u1"})-[:FOLLOWS]-(v)
WITH u, v
MATCH (u)-[:POST]->()-[r:HASHTAG]->(h)
WITH u, v, h, COUNT(r) AS cr
MATCH (v)-[:POST]->(p)-[s:HASHTAG]->(h)
WITH p, cr, COUNT(s) AS cs
RETURN p ORDER BY cr * cs DESC;
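The weighting idea can also be sketched outside the database. The Python below is a minimal in-memory illustration, not the query above translated line for line: it assumes a simplified score where each post from a followed user earns, for every hashtag it carries, the number of times the reader has used that hashtag. The data, user names and the rank_feed helper are all hypothetical.

```python
from collections import Counter

# Hypothetical toy data: user -> list of (post_id, set of hashtags).
posts_by_user = {
    "adam": [("a1", {"graphs"}), ("a2", {"neo4j"}),
             ("a3", {"graphs"}), ("a4", {"coffee"})],
    "selah": [("s1", {"graphs", "neo4j"}), ("s2", {"travel"})],
    "brock": [("b1", {"coffee"})],
}
# Hypothetical follow list: user -> users they follow.
follows = {"adam": ["selah", "brock"]}


def rank_feed(user, posts_by_user, follows):
    """Rank posts from followed users by hashtag overlap with `user`.

    Assumed scoring: sum, over a post's hashtags, of how often
    `user` has used that hashtag in their own posts.
    """
    # How many of the user's own posts carry each hashtag.
    usage = Counter(
        tag for _, tags in posts_by_user.get(user, []) for tag in tags
    )
    scored = []
    for followed in follows.get(user, []):
        for post_id, tags in posts_by_user.get(followed, []):
            score = sum(usage[tag] for tag in tags)
            if score > 0:  # keep only posts sharing at least one hashtag
                scored.append((score, post_id))
    # Highest score first; ties broken by post id for a stable order.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [post_id for _, post_id in scored]


feed = rank_feed("adam", posts_by_user, follows)
```

With this toy data, the post sharing two of the reader's hashtags ranks ahead of the post sharing one, and the post with no shared hashtag is dropped.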
But what good is an interest feed if it only tells us that we like things from people we’ve already declared we like? Let’s see what other discoveries we can make. You’ll notice a sixth user in this graph, Yun Nowne. Apparently, he’s made several posts with hashtags that Adam’s been using, and he even follows the same celebrities. How well can Cypher detect such users and reveal them to Adam?
MATCH (u:User {userId: "u1"})-[:POST]->()-[r:HASHTAG]->(h)
WITH u, h, COUNT(r) AS cr
MATCH (h)<-[s:HASHTAG]-(p)<-[:POST]-(v)
WHERE u <> v
WITH p, cr, COUNT(s) AS cs
RETURN p ORDER BY cr * cs DESC;
The third and fifth posts on this list come from Yun. And there’s no reason to stop here. As always, more data can yield more accurate (and more meaningful) predictions. In a denser graph, we could include the weight of the users we’re following based on how many of their posts we’ve liked, or how frequently our

Thinking in Patterns in Neo4j with Cypher

Thinking in patterns is the key to interacting with a graph database like Neo4j. One of the main challenges I see for those with deep relational database experience when transitioning to a graph database is the use of a relational approach for querying data. To query a graph database most efficiently, the mental model for how database query interactions are approached needs to be updated. We’ll look at some examples of this and at making the transition to thinking in patterns.
The overuse of relational query techniques most often manifests itself in a tendency to use WHERE clauses exclusively for filtering and comparisons from multiple complete sets of nodes, rather than enabling Neo4j to begin ignoring nodes as it expands the starting set in the MATCH clause. The goal of querying in the Neo4j graph database should be to get to the smallest starting set as quickly as possible to maximize the benefits of constant-time, index-free adjacency traversals within the local network around each starting node.
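To make the contrast concrete, here is a minimal Python sketch, assuming a toy in-memory model rather than Neo4j's actual storage engine. Each node keeps its relationships grouped by type, so following one type from a small starting set touches only the connected nodes, while the filter-style approach has to inspect every node. The Node class, the LIVES_IN relationship name and the sample people are all hypothetical.

```python
from collections import defaultdict


class Node:
    """Toy node whose relationships are grouped by type (an assumption
    made for illustration, loosely mirroring the idea described above)."""

    def __init__(self, name, **props):
        self.name = name
        self.props = props
        self.rels = defaultdict(list)  # relationship type -> connected nodes

    def connect(self, rel_type, other):
        # Record the relationship on both ends so it can be walked either way.
        self.rels[rel_type].append(other)
        other.rels[rel_type].append(self)


wooster = Node("Wooster")
alice = Node("Alice", city="Wooster")
bob = Node("Bob", city="Akron")
carol = Node("Carol", city="Wooster")
all_people = [alice, bob, carol]

for person in (alice, carol):
    person.connect("LIVES_IN", wooster)


def residents_by_scan(people, city):
    # Relational-style filter: every person is inspected.
    return [p.name for p in people if p.props.get("city") == city]


def residents_by_traversal(city_node):
    # Pattern-style: start at one node and follow a single relationship type.
    return [p.name for p in city_node.rels.get("LIVES_IN", [])]


scan_result = residents_by_scan(all_people, "Wooster")
traversal_result = residents_by_traversal(wooster)
```

Both functions return the same names here; the difference is that the scan's cost grows with the total number of people, while the traversal's cost grows only with the number of relationships on the starting node.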

Thinking in Patterns Starts at Data Modeling

In order to query Neo4j in a pattern-centric manner that is sympathetic to the data layout, the data model must account for the patterns that are important. One key in modeling the data is to know that each relationship off a node is literally a memory pointer to another node and the relationships around a node are grouped by their type. This allows constant-time traversal and targeting from one node to a set of nodes all connected by a single type. Let’s look at an example…
Assume we want to see individuals from Wooster, Ohio who were actors in a movie and see if any of them worked with any of the same directors. The non-normalized RDBMS approach to modeling this could be putting isActor, isDirector, city, state and movies properties on the Person node. Here’s a bit of an extreme example of how this could look:
MATCH (actor:Person) WHERE actor.isActor = true AND actor.state = "Ohio" AND actor.city = "Wooster"
WITH actor, actor.movies AS movies UNWIND movies AS movie
MATCH (director:Person) WHERE director.isDirector = true AND movie IN director.movies
RETURN director, collect(actor) AS actors;
The issue with such an approach is that it requires you to go through each node within the Person label to find the intersection of the values within the movies array for the Person nodes that have been determined to be