Amazon Neptune

A fully managed graph database service supporting Gremlin, openCypher, and SPARQL that can traverse billions of relationships with low latency

Overview

Amazon Neptune is a fully managed graph database service that supports both property graphs and RDF (Resource Description Framework). It efficiently stores and queries data with complex relationships between entities, such as friend connections in social networks, transaction patterns in fraud detection, and linked concepts in knowledge graphs. With up to 15 read replicas for read scaling, six copies of data replicated across three Availability Zones, and point-in-time recovery, it is built for mission-critical workloads.

Choosing a Graph Model and Query Language

Neptune supports two graph models, property graphs and RDF, and you choose between them based on your use case. Property graphs attach properties (key-value pairs) to nodes (vertices) and edges, making them well suited to social networks, recommendations, and fraud detection. For property graphs you can choose between two query languages: Apache TinkerPop's Gremlin and openCypher (an open specification based on the Cypher language originally developed by Neo4j). Gremlin is traversal-based: you describe the path to walk step by step, which enables flexible graph exploration but has a steeper learning curve. openCypher is declarative: you describe the pattern to match, in a syntax reminiscent of SQL, which makes it more approachable for developers with relational database experience. RDF represents data as triples (subject-predicate-object) and is used for building ontologies and knowledge graphs. Its query language is SPARQL, a W3C standard. In practice, property graphs are typically chosen for application data, while RDF is chosen for knowledge representation and the Semantic Web.
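To make the difference between the two models concrete, here is a minimal in-memory sketch in Python. It does not connect to Neptune; the vertex IDs, the `ex:` prefix, the `since` property, and the helper function names are all made up for illustration. The three query strings at the end show how the same question ("whom does Alice know?") might be phrased in each language, and they are shown only as text, not executed.

```python
# Property graph: vertices and edges both carry key-value properties.
vertices = {
    "u1": {"label": "person", "name": "Alice"},
    "u2": {"label": "person", "name": "Bob"},
}
edges = [
    # The relationship itself can hold properties such as "since".
    {"from": "u1", "to": "u2", "label": "knows", "since": 2020},
]

# RDF: every fact is one (subject, predicate, object) triple.
# A plain triple has no direct slot for a property on the relationship.
triples = {
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:alice", "ex:knows", "ex:bob"),
}


def knows_property_graph(name):
    """Names of the people that the named person 'knows' (property graph)."""
    ids = {vid for vid, props in vertices.items() if props["name"] == name}
    return sorted(
        vertices[e["to"]]["name"]
        for e in edges
        if e["label"] == "knows" and e["from"] in ids
    )


def knows_rdf(name):
    """Names of the people that the named person 'knows' (triples)."""
    subjects = {s for s, p, o in triples if p == "ex:name" and o == name}
    friends = {o for s, p, o in triples if p == "ex:knows" and s in subjects}
    return sorted(o for s, p, o in triples if p == "ex:name" and s in friends)


# The same question in each query language (illustrative strings only).
GREMLIN = "g.V().has('person', 'name', 'Alice').out('knows').values('name')"
OPENCYPHER = "MATCH (a:person {name: 'Alice'})-[:knows]->(b) RETURN b.name"
SPARQL = (
    "PREFIX ex: <http://example.org/> "
    "SELECT ?n WHERE { ?a ex:name 'Alice' . ?a ex:knows ?b . ?b ex:name ?n }"
)

print(knows_property_graph("Alice"), knows_rdf("Alice"))
```

Note how the Gremlin string reads as a sequence of steps, while the openCypher and SPARQL strings each describe a pattern to match.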

Graph-Specific Problems That RDBs Cannot Solve

The true strength of graph databases lies in multi-hop relationship traversals that relational database JOINs struggle to process at practical speeds. For example, a query to recommend products purchased by friends-of-friends-of-friends that the user hasn't bought yet requires three levels of self-joins in an RDB, and because each join multiplies the intermediate result set, the query can become impractically slow once the user count reaches the millions. Neptune is purpose-built for storing and navigating relationships, so the cost of a traversal tends to track the part of the graph the query actually visits rather than the total size of the data, and latency degrades more gracefully as the number of hops increases. In fraud detection, a typical query detects circular patterns in money transfer networks (A to B to C to A), which can be concisely expressed using Gremlin's repeat().until() step. In knowledge graphs, SPARQL can execute inference queries such as finding patients who are prescribed drugs that interact with a given medication, supporting medical safety decision-making. With a key-value store like DynamoDB, such relationship traversals require multiple round trips issued from the application side, which is a disadvantage in both complexity and latency.
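The two traversals described above can be sketched in plain Python over adjacency lists. This is only an illustration of the traversal logic, not how Neptune executes queries; the sample data and the function names (`within_hops`, `recommend`, `find_cycles`) are hypothetical. The cycle search is roughly the kind of bounded, repeated walk that a Gremlin repeat().until() traversal expresses.

```python
from collections import deque

# Undirected friendship graph and per-user purchases (made-up sample data).
friends = {
    "alice": ["bob"],
    "bob": ["alice", "carol"],
    "carol": ["bob", "dave"],
    "dave": ["carol"],
}
purchases = {
    "alice": {"lamp"},
    "carol": {"stove"},
    "dave": {"tent", "lamp"},
}

# Directed money transfers: A -> B -> C -> A forms a cycle.
transfers = {"A": ["B"], "B": ["C", "D"], "C": ["A"], "D": []}


def within_hops(graph, start, max_hops):
    """Breadth-first search returning {vertex: shortest hop distance}."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        if dist[v] == max_hops:
            continue  # do not expand beyond the hop limit
        for n in graph.get(v, ()):
            if n not in dist:
                dist[n] = dist[v] + 1
                queue.append(n)
    return dist


def recommend(user, hops=3):
    """Products bought by people exactly `hops` away that `user` lacks."""
    reached = within_hops(friends, user, hops)
    owned = purchases.get(user, set())
    found = set()
    for person, d in reached.items():
        if d == hops:
            found |= purchases.get(person, set())
    return sorted(found - owned)


def find_cycles(graph, start, max_len=4):
    """Simple cycles start -> ... -> start using at most max_len edges."""
    cycles = []

    def walk(path):
        for nxt in graph.get(path[-1], ()):
            if nxt == start:
                cycles.append(path + [start])
            elif nxt not in path and len(path) < max_len:
                walk(path + [nxt])

    walk([start])
    return cycles


print(recommend("alice"))
print(find_cycles(transfers, "A"))
```

Each additional hop only expands the neighbors of vertices already reached, which is why the work depends on the neighborhood visited rather than on the total number of users.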

Neptune Serverless vs. Neptune Analytics

Beyond traditional provisioned instances, Neptune offers two newer options: Neptune Serverless and Neptune Analytics. Neptune Serverless automatically scales with workload demand and shrinks to a configured minimum NCU (Neptune Capacity Unit) level during idle periods, making it ideal for applications with variable traffic. Capacity can be configured from a minimum of 1 NCU to a maximum of 128 NCUs, with scaling completing in seconds. Neptune Analytics, launched in 2023, is an analytics-focused engine that loads the entire graph into memory for high-speed analytical queries. It includes built-in graph algorithms such as PageRank, shortest path, and community detection, and returns results at interactive speed even on graphs with billions of edges. You can load data from Neptune Database snapshots into Neptune Analytics, enabling a pattern where OLTP runs on Neptune Database and analytics runs on Neptune Analytics. From a cost perspective, Neptune Serverless charges for the NCU capacity actually consumed, making it suitable for development and test environments, while Neptune Analytics charges for the memory-optimized capacity (m-NCUs) provisioned for a graph for as long as it runs, making it suitable for large-scale analytics that can be spun up for the duration of the analysis.
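For readers unfamiliar with the kind of algorithm Neptune Analytics provides built in, the following is a small pure-Python sketch of PageRank by power iteration. It is not Neptune Analytics code and does not use its API; the `pagerank` function, its parameters, and the sample edges are assumptions chosen for illustration.

```python
def pagerank(edge_list, damping=0.85, iterations=50):
    """Power-iteration PageRank over a list of directed (src, dst) edges."""
    nodes = sorted({n for edge in edge_list for n in edge})
    out_links = {n: [] for n in nodes}
    for src, dst in edge_list:
        out_links[src].append(dst)

    count = len(nodes)
    rank = {n: 1.0 / count for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1.0 - damping) / count + damping * dangling / count
        new_rank = {n: base for n in nodes}
        for src in nodes:
            for dst in out_links[src]:
                new_rank[dst] += damping * rank[src] / len(out_links[src])
        rank = new_rank
    return rank


# Made-up sample graph: "c" is linked to by every other node.
sample_edges = [("a", "c"), ("b", "c"), ("c", "a"), ("d", "c")]
ranks = pagerank(sample_edges)

for node, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(node, round(score, 4))
```

Every iteration touches every edge, which is why an engine that keeps the whole graph in memory is a natural fit for this class of algorithm.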
