Monday, November 21, 2016

Connected Data Analytics: Basics

As organizations adopt graph databases, their available connected data will grow, which will drive the need for analytics that leverage connected data as a core component of their analysis. The key to unlocking new insights is to leverage the connectedness of the data as part of a graph analytics solution. Through graph analytics, enterprises have gained competitive advantages because they are now discovering the cause, effect, and influence of certain patterns present in their organization's data.

When it comes to exploring how graph analytics can be used in solving problems, it boils down to its ability to compare “many to many to many.” For example, it makes it possible to ask not only about “friends” of a person, but also about friends of their friends, with details included beyond the fact that they’re connected. Building on such scenarios allows you to see influencers within a network. Graph analytics can infer paths via complex relationships to determine connections that aren’t easy to find and surface these to human analysts for confirmation, validation and action.
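To make the "friends of their friends" idea concrete, here is a minimal Python sketch of a bounded breadth-first traversal. It does not use Neo4j at all; the FRIENDS data and the function names within_hops and friends_of_friends are made up for illustration.

```python
from collections import deque

# Hypothetical sample data: an undirected friendship graph as an adjacency dict.
FRIENDS = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "dave", "erin"},
    "dave": {"bob", "carol"},
    "erin": {"carol"},
}


def within_hops(graph, start, max_hops):
    """Return {person: hop_count} for everyone reachable in 1..max_hops hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if seen[person] == max_hops:
            continue  # do not expand beyond the requested depth
        for friend in graph.get(person, ()):
            if friend not in seen:
                seen[friend] = seen[person] + 1
                queue.append(friend)
    del seen[start]
    return seen


def friends_of_friends(graph, start):
    """People exactly two hops away, excluding direct friends."""
    return {p for p, hops in within_hops(graph, start, 2).items() if hops == 2}
```

Calling friends_of_friends(FRIENDS, "alice") returns the people two hops away who are not already direct friends. Raising max_hops is how the "many to many to many" comparison extends to deeper networks.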

Graph Analytics: Connected Data Discovery and Business Impact

One aspect where graph analytics is an advantage is data discovery. It allows you to see patterns within data when you have no idea what question you want to ask. This makes it possible to find a needle in a haystack. As patterns start emerging from data sets, you are able to surface a clear picture of the precise elements and 

Data Validation and Testing Your Graph Data State

Data validation lets you gain insight into the quality of your data assets. This involves grading your organization consistently to monitor your progress. When testing data, it’s essential to set metrics, as well as follow-up steps and goals to drive improvements. Data testing is even more crucial when loading data into a schema-free graph database like Neo4j. So how do we do it efficiently and continuously?

Schema-Free Nature of Neo4j and Data Validation

Neo4j is schema-free by nature, but does provide some schema concepts that can be enforced. This means that when your data flows through your Neo4j data pipeline and into the graph, there won’t be enforced constraints on data type. It also means Neo4j will try to pick the best data type when a property is written unless a type is specifically enforced, which matters for variations in numerical precision and for numerical values that you want stored as strings. So if you load data into Neo4j using LOAD CSV and you write a property consisting only of a numerical value that you want stored as a string, it’s essential that you always wrap it in the Cypher toString() function to ensure you won’t end up with properties of varying data types.
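One way to picture the kind of check this calls for is a small Python sketch that scans records for properties stored under more than one type, then forces the chosen properties to strings. It works on plain dictionaries, not on a live Neo4j database. The RECORDS data and the helper names mixed_type_properties and force_strings are assumptions for illustration.

```python
from collections import defaultdict

# Hypothetical records as they might arrive from different input sources.
RECORDS = [
    {"name": "Ada", "account_id": 1001},
    {"name": "Bob", "account_id": "1002"},
    {"name": "Cy", "account_id": 1003.0},
]


def mixed_type_properties(records):
    """Return {property: [type names]} for properties seen with more than one type."""
    types_seen = defaultdict(set)
    for record in records:
        for key, value in record.items():
            types_seen[key].add(type(value).__name__)
    return {key: sorted(names) for key, names in types_seen.items() if len(names) > 1}


def force_strings(records, keys):
    """Return copies of the records with the named properties converted to str."""
    return [
        {k: (str(v) if k in keys else v) for k, v in record.items()}
        for record in records
    ]
```

Running mixed_type_properties before and after force_strings shows the inconsistency disappearing, which is the same effect the toString() wrapping is meant to achieve at write time.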

Data Validation with Postman, REST Requests, and Newman

For large-scale automated data validation it’s beneficial to make use of a REST client like Postman to create a 

Graph Advantage: Fraud Detection

Financial institutions and insurance firms with traditional fraud detection capabilities lose billions of dollars to fraud. Traditional approaches to detecting fraud play a critical role in minimizing financial losses. However, an increasing number of fraudsters have created different methods to avoid being discovered. In order to gain the upper hand again, these financial institutions need to combine the traditional subject matter expertise of an analyst with enhanced exploration and discovery capabilities enabled through a highly connected data set in a graph database.

Real-Time Fraud Detection with Graph Databases

Graph databases provide new ways of unearthing fraud rings and other high-tech scams with incredible precision. This predictive assistance allows your company to focus on the important data necessary to uncover and halt advanced fraudulent activities in real time. At the same time, a graph database can offer insight based on data relationships to help you create advanced fraud detection systems built on connected intelligence.
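As a rough illustration of what "unearthing fraud rings" can mean in graph terms, the Python sketch below groups accounts that are linked through shared identifiers such as a phone number. This is a simple connected-components pass over made-up data, not Neo4j's or any vendor's detection method. ACCOUNTS and candidate_rings are hypothetical names.

```python
from collections import defaultdict

# Hypothetical data: each account and the identifiers it was opened with.
ACCOUNTS = {
    "acct1": {"phone:555-0100", "addr:1 Main St"},
    "acct2": {"phone:555-0100", "id:A-01"},
    "acct3": {"id:A-01"},
    "acct4": {"phone:555-0199"},
}


def candidate_rings(accounts, min_size=2):
    """Group accounts connected through shared identifiers (connected components)."""
    by_identifier = defaultdict(set)
    for account, identifiers in accounts.items():
        for identifier in identifiers:
            by_identifier[identifier].add(account)

    unvisited = set(accounts)
    rings = []
    while unvisited:
        start = unvisited.pop()
        ring, stack = {start}, [start]
        while stack:
            account = stack.pop()
            # Hop account -> identifier -> other accounts using that identifier.
            for identifier in accounts[account]:
                for other in by_identifier[identifier]:
                    if other not in ring:
                        ring.add(other)
                        stack.append(other)
        unvisited -= ring
        if len(ring) >= min_size:
            rings.append(ring)
    return rings
```

Here acct1 and acct3 share nothing directly, yet they end up in the same group through acct2. A group like this is only a candidate for an analyst to review, not proof of fraud.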

Why Neo4j is Effective at Detecting Fraud

With fraudulent activity becoming more sophisticated and disconnected, enterprises have augmented their fraud-detecting capabilities with Neo4j to discover fraud rings and other scams accurately and in real time. Whether it’s money laundering, e-commerce fraud, or bank fraud, Neo4j aids you in detecting elements 

Neo4j Production Ready: Enterprise Cloud

The cloud today has become the primary deployment option for startups and is gaining adoption across the world’s largest enterprises. As with other critical infrastructure holding sensitive organization or customer data, there are several key questions enterprises must consider when evaluating Neo4j graph database cloud deployments.

When running Neo4j in production, especially for an enterprise, the obvious baseline is to use the Neo4j Enterprise edition because it offers high availability clustering, cache sharding, hot backup, enhanced monitoring, and several other critical production features. Taking a shortcut here will leave your enterprise vulnerable to outages and data loss, with little ability to upgrade as new versions of Neo4j are released.

Essential Neo4j Enterprise Database Features

In addition to high availability clustering and hot backups, a few of the key benefits of Neo4j Enterprise include:
  • Enterprise Lock Manager
    The enterprise lock manager enables high levels of concurrency through fast lock resolution, which provides vertical scaling of concurrent applications beyond 5 CPU cores.
  • Cache Sharding
    For large graphs this is very useful when paired with sticky sessions because it provides a high cache hit

Neo4j Production Ready: Security

With cloud adoption consistently accelerating across organizations and industries, it is important to select a Neo4j cloud platform that offers your business security and scalability while eliminating the lead time of building it internally. To simplify the process of utilizing Neo4j Enterprise, the GraphGrid Data Platform provides a Neo4j Amazon Web Services (AWS) cloud offering. This Neo4j Enterprise data platform not only enables management of global Neo4j Enterprise clusters but also frees you from laying the operational foundation, letting you concentrate on product and services development for your business.

VPC Security for Neo4j Enterprise

Security is the biggest question on any organization’s mind when transitioning to the cloud, which is why it has been at the core of the architecture and design for enabling a Neo4j Enterprise cloud offering in AWS.
In AWS it is important to utilize a VPC, which guarantees that all your Neo4j Enterprise resources can be launched in an isolated network that only your authorized personnel, infrastructure and services can access.
The VPC must be properly configured to adhere to your enterprise security requirements. For instance, a public subnet is created so your servers can reach the internet while your backend systems within a private subnet have virtually no internet access. It is important to establish controls at multiple security layers, including security groups and network access control lists, for controlled access to Neo4j Enterprise clusters.
Furthermore, you can establish peering or direct VPN connectivity between your enterprise data center/VPC 

Neo4j Production Ready: Deployment Basics

If you intend to perform a Neo4j production deployment successfully, you’ll likely think about the best application architecture to use and how you’ll operate your Neo4j Enterprise deployment at scale. Some things you’ll need to think about include how you intend to guarantee uptime, handle failures and efficiently facilitate zero-downtime upgrades, which is really just the required baseline to be considered production ready. It may go without saying, but going to production without using Neo4j Enterprise is a huge risk to your application’s availability.

Neo4j Deployment Options

In terms of deployment options, there are two ways in which you can incorporate the graph database Neo4j Enterprise version within your app. These can be:
  • Using Neo4j embedded: This means you’ll be utilizing the Neo4j Java libraries and packaging them with the rest of your application code into a WAR or JAR file that is deployed to the Java server of your choice, such as JBoss or Tomcat.
  • Using Neo4j server: This means you’ll be utilizing the default Jetty server wrapper that is provided with Neo4j and communicating with the database over REST, which is the recommended approach for almost all applications because it keeps your database decoupled from your application and enables the two to 

Monday, November 14, 2016

Neo4j Enterprise Cluster Basics

Neo4j Enterprise enables a high availability cluster using the Paxos protocol for cluster communication prior to 3.x, and the Raft protocol with the core-edge clustering model is now available in the current milestone releases. If you’re interested in diving deeper into the specifications and implementation of the new Raft protocol, I suggest you check out Jim Webber’s great overview in his keynote from GraphConnect SF 2015. One very useful feature coming in 3.x is the ability to read your own writes, meaning you can require that the transaction containing the write you made to core is available on the edge server handling the read request before it responds to your request.

So while that is coming in 3.x, what is the current landscape in 2.x?

Neo4j Enterprise Write Operations

When operating a Neo4j Enterprise cluster, there will always be one master instance and some number of slaves. Neo4j is capable of handling write requests on all instances, but that requires the slave to proxy the request to the master, so it is best to separate reads and writes to ensure the master is the only Neo4j instance handling write requests.
Writes to the Neo4j master instance will be optimistically pushed to zero or more slaves as configured. This means the master will try pushing the successfully written transaction to the specified number of slaves prior to the write request completion. If the replication fails for any reason, the transaction on the master will still remain successful, although the number of copies will temporarily fall short of the configured replication factor. The Neo4j slave instances will continue to pull updates at the configured interval, so the writes will still eventually replicate and be available for read requests.
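The push-then-pull behavior described above can be modeled with a toy simulation. The Python sketch below is not Neo4j code and makes no claim about its internals. The Instance and Cluster classes, the push_factor parameter, and the reachable flag are all invented to show that a write succeeds on the master even when the push fails, and that a later pull closes the gap.

```python
class Instance:
    """A toy database instance holding an ordered transaction log."""

    def __init__(self, name):
        self.name = name
        self.log = []
        self.reachable = True  # set to False to simulate a network fault


class Cluster:
    """Toy master/slave cluster with optimistic push and periodic pull."""

    def __init__(self, slave_names, push_factor):
        self.master = Instance("master")
        self.slaves = [Instance(name) for name in slave_names]
        self.push_factor = push_factor  # slaves to push to on each write

    def _catch_up(self, slave):
        # Copy every transaction the slave has not yet seen, in order.
        slave.log.extend(self.master.log[len(slave.log):])

    def write(self, tx):
        """Commit on the master, then try to push; return the number of pushes."""
        self.master.log.append(tx)  # the write succeeds regardless of pushes
        pushed = 0
        for slave in self.slaves:
            if pushed == self.push_factor:
                break
            if slave.reachable:
                self._catch_up(slave)
                pushed += 1
        return pushed

    def pull(self, slave):
        """What a slave does at its configured pull interval."""
        if slave.reachable:
            self._catch_up(slave)
```

A write that returns 0 pushes is still committed on the master. Once the slave is reachable again, its next pull brings it up to date, which is the eventual replication the paragraph describes.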

Neo4j Enterprise Master Re-Election

Whenever a Neo4j Enterprise graph database instance becomes unavailable (as a result of network outages 

Graph Advantage: Real-Time Recommendations

We all receive recommendations presented to us on a daily basis. From the products that we should be buying to the movies we should be watching to the people we should be dating…the list goes on. You’re capable of recommending just about anything — as long as you have the right data in place. Graph databases are naturally well-suited for building real-time recommendation engines thanks to the native graph traversal performance when traversing the network around and between the desired starting node such as a person that already bought a set of products.

Whether an enterprise functions within the social, media or retail sector, providing users with targeted, real-time recommendations is important for delivering customer value through a personalized experience, which is quickly becoming the baseline for remaining competitive. Unlike business data, recommendations should be contextual and inductive so they are deemed relevant by the end consumer. Achieving this requires a “good enough” level of data classification with sufficient connectedness between the data points in the system.
With a graph database where relationships are treated as first-class citizens, you can connect a customer’s browsing history with their purchase history and their offline product and brand interactions. This enables the real-time recommendation algorithm to utilize their present choices and offer personalized recommendations without any offline pre-compute delaying the interaction, which lowers the potential for the consumer to purchase from a competitor.
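A small Python sketch can show the shape of such a traversal-based recommendation: start from a customer, walk to what they bought, walk to other customers who bought the same things, and score what those customers bought. The PURCHASES data and the recommend function are hypothetical, and the overlap-count scoring is just one simple choice among many.

```python
from collections import Counter

# Hypothetical purchase history: customer -> products bought.
PURCHASES = {
    "ann": {"tent", "stove"},
    "ben": {"tent", "stove", "lantern"},
    "cat": {"tent", "lantern", "boots"},
    "dan": {"novel"},
}


def recommend(purchases, customer, limit=3):
    """Rank products bought by customers who share purchases with this one."""
    owned = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        overlap = len(owned & items)  # how similar the two baskets are
        if overlap == 0:
            continue
        for item in items - owned:
            scores[item] += overlap
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in ranked[:limit]]
```

Because the scores are computed from the customer's current purchases at call time, there is no offline pre-compute step, which is the property the paragraph above highlights.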

Neo4j for Real-Time Recommendations

Whether you’re leveraging social connections or connecting data across digital and physical customer touch points, the Neo4j graph database makes it possible to provide relevant real-time recommendations 

Neo4j Data Pipeline

Every enterprise has a constant flow of new data that needs to be processed and stored, which can be done effectively using a data pipeline. Upon introducing Neo4j into an enterprise data architecture it becomes necessary to efficiently transform and load data into the Neo4j graph database. Doing this efficiently at scale with the enterprise integration patterns involved requires an intimate understanding of Neo4j write operations along with routing and queuing frameworks such as Apache Camel and ActiveMQ. Managing this requirement with its complexity proves to be a common challenge from enterprise to enterprise.

One of the common needs we’ve observed over the years is that an enterprise that wants to move forward efficiently with a Neo4j graph database needs to be able to rapidly create a reliable and robust data pipeline that can aggregate, manage and write its ever-increasing volumes of data. The primary reason for this is to make it possible to write data in a consistent and reliable manner at a known flow rate. Solving this once and providing a robust solution for all is the driving force behind the creation of GraphGrid Data Pipeline.

GraphGrid Data Pipeline

The GraphGrid Data Platform offers a robust data pipeline that manages high write throughput to Neo4j from varying input sources. The data pipeline is capable of batch operations management, keeps highly 

Keeping Your Data Current and Flowing into Neo4j

For an enterprise to excel today, a key aspect is how well it utilizes its data-based business assets. To grow and succeed as a whole, an enterprise must enable the usability, quality, and constant flow of its data into a connected state. Sometimes, an enterprise data architecture has to deal with a complex data life cycle involving varying transformation processes. This makes it difficult to track the origin and flow of data as well as to manage changes, audit trails, history, and a host of other critical processes.

Distributed Graph Database Platform with Neo4j

The dynamics of an increasingly distributed and connected world are shining the spotlight on a new generation of database focused on more efficiently modeling, storing and querying the connected nature of the data enterprises deal with in the real world. But as graph database usage grows, solving the issue of handling large volumes of read and write operations at scale will pose a serious challenge for the growing market.
Graph databases like Neo4j are a perfect aggregation and landing place for data across the enterprise because they effectively deal with the challenges presented by variations in data. Enterprises are relying on Neo4j, a leading graph database, to effectively connect data for usage by real-time enterprise applications. The big challenge, though, is efficiently and continuously flowing data into your Neo4j graph database.
To do this effectively, data connectors need to be utilized to perform ETL. The data extraction will come from 

Introducing a Graph Database into Your Data Architecture

A graph database is capable of offering long-lasting competitive advantages for organizations worldwide, from startups to the largest enterprises. Interest within the enterprise sector has surged dramatically over the past two years, and Forrester recently projected that graph databases will reach over 80% of leading enterprises within two years. Graph databases provide business benefits because they make use of intuitive principles of the connections experienced between everything and everyone as a realistic representation of the way the world interacts. Even with all the benefits discussed in Graph Advantage: Why Every Enterprise Should Use a Graph Database, the introduction of a graph database into an enterprise, especially one that may have just finished getting its Hadoop implementation into production, can seem risky.

Graph Database Data Model Flexibility

One of the great strengths of the Neo4j graph database is its schema-free, flexible data model, which provides a very low-risk entry point for an enterprise to begin to explore the benefits of using a graph database. The Neo4j graph database is made to model and navigate connected data with high performance. The Neo4j graph database processes and stores data within the node and relationship structure defined by the written data, making it flexible enough to accommodate the many data models of the existing databases within an enterprise.

Enterprise Data Challenge

We know it’s not reasonable for an enterprise to go all in on a graph and try to replace existing SQL or 

Graph Advantage: Why Every Enterprise Should Use a Graph Database

Data in the enterprise today is a bi-directional, always-flowing, continuously changing business asset. Yet it remains largely segmented and disconnected. For enterprises to begin converting their data into business value this data must be connected, understood and acted upon.

Enterprise Need for Graph Databases

Enterprise data stored in graph databases with explicit nodes and edges provides competitive advantages to organizations adopting graph databases in all industries, beyond the common use case of social media companies today. The increasing number of connected devices producing data, together with the need for an advancing enterprise to be data driven in its decision making, creates a deep necessity for an enterprise to connect and understand its data in a meaningful way. When data is connected and accessible across the departments of an enterprise by using a graph database like Neo4j, its teams will benefit from a more comprehensive awareness of the business and make more informed decisions to help the enterprise grow.
Today, CIOs and CTOs aren’t just after large data volume management. They also need to gain insight and direction from their current data. In this case, relationships between data points are a lot more important than the individual data points. To effectively leverage data relationships, enterprises should rely on a graph database that treats relationship information as a first-class citizen. Additionally, a graph database like Neo4j does more than just store data relationships effectively. It also lets you enrich each relationship through a flexible property model, which provides important details about the connection between the two data points.

Enterprise Graph Database Advantages

Many leading enterprises today already experience the benefits of the Neo4j graph database to gain a 

Monday, November 7, 2016

How Do I Load Data Into Neo4j?

The ability to load data into Neo4j is enabled through a variety of data loading APIs and tools. For processes where big data sets flow in or out of the Neo4j graph database, consideration needs to be taken to group these read and write operations into batch sizes that are sympathetic to the master instance’s memory capacity as well as the transactional overhead of data writes.
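The batching idea itself is independent of any particular API. As a minimal sketch, the Python helper below splits any stream of rows into fixed-size batches and hands each one to a caller-supplied write function. The names batches, load_in_batches and write_batch are placeholders, and the right batch size depends on your own memory and transaction constraints.

```python
from itertools import islice


def batches(rows, batch_size):
    """Yield lists of at most batch_size rows without loading everything at once."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load_in_batches(rows, batch_size, write_batch):
    """Call write_batch once per batch (one transaction each); return the batch count."""
    count = 0
    for batch in batches(rows, batch_size):
        write_batch(batch)  # hypothetical callback that commits one transaction
        count += 1
    return count
```

In practice write_batch would wrap whatever mechanism you use to commit a single transaction, so each batch carries the transactional overhead once rather than once per row.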
Neo4j provides a number of APIs to import big data sets including:
  • the Cypher transactional endpoint, which uses the Cypher query language and is simple to utilize from any programming language because files containing CQL can be structured to bulk load data and write consistently.
  • the Cypher data import capabilities exposed through LOAD CSV enable CSV files from a specified remote or local URL to be loaded and batched into desirable transaction sizes for importing massive data 

Cypher is Awesome

Cypher is a declarative pattern matching language created by Neo4j for the purpose of describing graph data representations effectively. Cypher is considered to be one of the most powerful features for effectively expressing graph database traversals for reading and writing graph database data in Neo4j.

Cypher makes it possible for queries to express something like “bring back my friends’ friends right now” or “give me back all pages this page has linked to within the last day” in just a few lines of code. As such, graph database queries and operations across all languages and integrations with Neo4j are able to query in a consistent manner.
The reception to Cypher has been so great that Neo4j launched the OpenCypher initiative to make Cypher the SQL for graph databases. The organizations that are joining OpenCypher are very important to the graph database movement because supporting a common graph database query language means there will be more utilization and commonality across graph database implementations.

Cypher Query Syntax

Cypher looks a lot like ASCII art because it uses textual pattern representations to express nodes and relationships. The nodes are surrounded with parentheses, which look like circles, and the relationships consist of dashes with square brackets. Here’s an example: (graphs)-[:ARE]-(everywhere)
If we want to find all people and their preferences, the query will involve identifiers for the person and thing. A pattern like “(person)-[:LIKES]->(thing)” could be utilized so that both can be referred to later, for instance to gain access to properties such as “person.name” and “thing.title.”
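To see what the “(person)-[:LIKES]->(thing)” pattern is asking for, here is a toy Python sketch that performs the same match over an in-memory list of relationships. It is only an analogy and says nothing about how Neo4j evaluates Cypher. The NODES and RELATIONSHIPS data and the function names are made up.

```python
# Hypothetical in-memory graph: node id -> properties, plus typed relationships.
NODES = {
    1: {"name": "Ann"},
    2: {"name": "Ben"},
    3: {"title": "Graph Databases"},
    4: {"title": "Cypher"},
}
RELATIONSHIPS = [
    (1, "LIKES", 3),
    (2, "LIKES", 3),
    (2, "LIKES", 4),
    (1, "KNOWS", 2),
]


def match(relationships, rel_type):
    """Yield (start, end) node ids for every relationship of the given type."""
    for start, kind, end in relationships:
        if kind == rel_type:
            yield start, end


def likes_report(nodes, relationships):
    """Roughly: MATCH (person)-[:LIKES]->(thing) RETURN person.name, thing.title"""
    return [
        (nodes[person]["name"], nodes[thing]["title"])
        for person, thing in match(relationships, "LIKES")
    ]
```

The identifiers person and thing play the same role here as in the pattern: they name each end of the matched relationship so its properties can be read afterwards.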
Writing and representing depth-based queries is one place where Cypher really shines. To look at my friends of friends is as simple as “(me)-[:FRIEND*2]->(fof).” It’s actually fun to create queries because its 

Querying Your Neo4j Graph Database

There are different ways of querying, storing, managing and retrieving data in the Neo4j graph database. For a majority of users, querying with Cypher is a great experience when it comes to performing efficient and effective graph database traversals and interactions within the graph data model. For specific applications that need further control over how graphs are stored and queried in a high-performance, multi-threaded manner, a native Java API offers low-level access to the graph database for granular control over traversal and retrieval from the Neo4j graph database. When making use of the Java API, you’ll realize that you’re given great freedom and flexibility to tell the Neo4j graph database how best to query your data for optimal results.

Querying with Cypher

The Cypher query language is an innovative SQL-like syntax designed for graphs that takes a more declarative approach. That means you tell Neo4j what you want, not how to acquire it. When running a Cypher query, you’re expressing to the graph database what you need from it. In return, Neo4j has a compiler that translates the query into an executable plan describing a set of data operations. The plan is arranged so that the data obtained from the graph is processed by each operation in turn until a result is returned from the Cypher query.
The usual way of communicating with Neo4j consists of sending a Cypher query and parameters via a POST request to the Neo4j database server. Frameworks or libraries that wrap the Neo4j REST API methods for a programming language are called “drivers.” These drivers function by moving queries and results over the network. Neo4j then operates by making further translation 

Native Graph Database Benefits

Choosing a native graph database provides granular control over all operations, from the transactional behavior to on-disk data organization to clustering and driver protocols. With complete control over every aspect of the native graph database, fine-tuned graph traversal optimizations can be performed, and choices sympathetic to graph principles for reliability and ACID transactional support can be made and implemented without restriction.
Durability and certainty of the graph database records are crucial to preserve. Choosing reliability and making sure failed graph database transactions roll back maintains a consistent data state in the native graph database.

Graph Database Reliability

For graph databases, reliability is far more essential than availability since the connectedness of the data makes them more demanding than aggregate databases. The issue with placing a graph layer over an existing datastore boils down to how data is written and which record is factual, since within a graph there are two perspectives on every relationship: the node on each side.
If mutations are made through multiple requests simultaneously, it’ll lead to an uncertain relationship status. A non-native graph database will attempt to resolve this by means of complex algorithms, but in the end, they simply don’t work, leaving you with erroneous data. Such incorrectness can spread through the graph as well as 

Native Graph Databases versus Non-Native Graph Databases

As with any graph database management system, native graph databases revolve around storage and query engines that deal specifically with connected data persistence and traversal queries. The database query engine is in charge of running queries and of modifying and extracting data. Native graph databases rely on traversal of the graph data model paired with strategic index usage for locating the starting nodes for such operations. Storage involves how data is physically housed and how it is represented during extraction. Understanding graph database storage nuances is key to selecting the right graph database for your use case.

Relationships Matter: Non-Native and Native Graph Databases

Relationships are integral in any domain and require frequent traversal. In a graph database, relationships are strictly explicit instead of being inferred. Explicit relationships can be provided either through the query engine on top of non-native storage or through native graph storage.
A graph database that depends on non-native graph storage has relationships that will need to be inferred at runtime. For instance, if we intend to model a graph in an RDBMS, the processing engine will need to infer relationships through foreign keys, making each relationship concrete only at runtime. This is an expensive approach and isn’t feasible for traversing relationships because of the recursive joins involved.
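The contrast can be sketched in a few lines of Python. The first lookup rescans a table of foreign-key pairs on every hop, standing in for a join. The second follows relationships that were made explicit up front. This is a deliberately simplified toy: a real relational engine would normally use an index rather than a full scan, and FRIEND_ROWS and the function names are invented for illustration.

```python
# Hypothetical join table: (person_id, friend_id) foreign-key pairs.
FRIEND_ROWS = [(1, 2), (2, 3), (3, 4), (1, 5)]


def neighbours_by_scan(rows, person_id):
    """Infer relationships at query time by scanning the join table."""
    return [friend for person, friend in rows if person == person_id]


def build_adjacency(rows):
    """Make relationships explicit once: node id -> list of directly linked ids."""
    adjacency = {}
    for person, friend in rows:
        adjacency.setdefault(person, []).append(friend)
    return adjacency


def reachable(neighbours, start, depth):
    """Return the nodes exactly `depth` hops from start using a lookup function."""
    frontier = {start}
    for _ in range(depth):
        frontier = {n for node in frontier for n in neighbours(node)}
    return frontier


ADJACENCY = build_adjacency(FRIEND_ROWS)
```

Both lookups give the same answers through reachable, but the scan repeats its inference work at every hop, while the adjacency version simply follows links that already exist. That difference grows with the depth of the traversal.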

Native Graph Databases and Index-Free Adjacency

As a native graph database, Neo4j turns relationships into first-class entities in data records at the store level. It doesn’t place a graph layer on top of an existing database storage engine. At the store level, Neo4j writes and reads data from the disk using techniques that are optimized for graph traversal.
Native graph databases utilize a method known as “index-free adjacency.” It means that every data element points directly to its incoming and outgoing relationships. These, in turn, point towards related