Amazon Neptune | Specialized | Since 2017

A managed graph database service for processing highly connected datasets

What It Does

Amazon Neptune is a fully managed graph database that supports both property graphs (queried with Apache TinkerPop Gremlin) and RDF (queried with SPARQL). It models data as nodes (vertices) connected by edges, so queries that follow relationships across many hops run as graph traversals rather than as repeated joins. Data is automatically replicated across up to three Availability Zones (AZs) for high availability.

Use Cases

Typical uses include analyzing friend and follower relationships in social networks, detecting suspicious transaction patterns for fraud detection, building knowledge graphs, powering recommendation engines, and managing network topologies.

Everyday Analogy

Think of it as a relationship diagram. Where a relational database (RDB) manages data as tables, Neptune stores the relationships themselves as data: 'Person A is friends with Person B, and Person B is a colleague of Person C.' It can quickly traverse multi-hop relationships such as 'friends of friends.'

What Is Neptune?

Amazon Neptune is a graph database that efficiently stores and queries relationships between data. A query such as 'products purchased by friends of friends,' which requires several JOIN operations in an RDB, can be expressed in Neptune as a short graph traversal. With Neptune Serverless, capacity scales automatically with the workload and scales back down during quiet periods to reduce cost.
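
To make the 'products purchased by friends of friends' query concrete, here is a minimal in-memory sketch in Python. It does not use Neptune or any Neptune API. It only illustrates the idea of a traversal: each hop is a lookup on outgoing edges rather than a table join. The sample people, products, edge labels, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data: each edge is a (source, label, target) tuple.
# FRIEND edges are treated as directed here to keep the sketch short.
EDGES = [
    ("alice", "FRIEND", "bob"),
    ("bob", "FRIEND", "carol"),
    ("bob", "FRIEND", "dave"),
    ("bob", "PURCHASED", "keyboard"),
    ("carol", "PURCHASED", "laptop"),
    ("dave", "PURCHASED", "headphones"),
]


def build_index(edges):
    """Index targets by (source, label) so each hop is a dictionary lookup."""
    index = defaultdict(set)
    for source, label, target in edges:
        index[(source, label)].add(target)
    return index


def hop(index, start_nodes, label):
    """Follow every outgoing edge with the given label from a set of nodes."""
    reached = set()
    for node in start_nodes:
        reached |= index.get((node, label), set())
    return reached


def products_bought_by_friends_of_friends(index, person):
    """Two FRIEND hops, then one PURCHASED hop."""
    friends = hop(index, {person}, "FRIEND")
    # Exclude direct friends and the starting person from the second hop.
    friends_of_friends = hop(index, friends, "FRIEND") - friends - {person}
    return hop(index, friends_of_friends, "PURCHASED")


INDEX = build_index(EDGES)
print(sorted(products_bought_by_friends_of_friends(INDEX, "alice")))
# prints ['headphones', 'laptop']
```

Bob's keyboard is not in the result because Bob is a direct friend of Alice, not a friend of a friend. In an RDB the same question would usually join a friendship table to itself and then to a purchases table.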

Gremlin and SPARQL

This section covers two of the query languages Neptune supports (it also accepts openCypher for property graphs). Gremlin is a traversal language for property graphs, in which nodes and edges can each carry arbitrary properties. SPARQL is a query language for RDF (Resource Description Framework), which expresses data as subject-predicate-object triples. SPARQL suits knowledge graphs and ontologies, while Gremlin is generally the better fit for application data modeling.
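
The difference between the two data models may be easier to see side by side. The Python sketch below stores the same small fact ('Alice is a friend of Bob') once as a property graph and once as RDF triples, and includes one query string per language. The `http://example.org/` namespace, the sample data, and the `match` helper are assumptions made for illustration. The query strings are not sent anywhere.

```python
# Property graph model: vertices and edges both carry key/value properties.
vertices = {
    "alice": {"label": "person", "name": "Alice"},
    "bob": {"label": "person", "name": "Bob"},
}
edges = [
    # The edge itself has a property ("since").
    {"from": "alice", "to": "bob", "label": "friend", "since": 2019},
]

# RDF model: every fact is a (subject, predicate, object) triple.
EX = "http://example.org/"  # hypothetical namespace
triples = [
    (EX + "alice", EX + "name", "Alice"),
    (EX + "bob", EX + "name", "Bob"),
    (EX + "alice", EX + "friend", EX + "bob"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Gremlin: start at the vertex named Alice, follow 'friend' edges, read names.
GREMLIN_QUERY = "g.V().has('person', 'name', 'Alice').out('friend').values('name')"

# SPARQL: the same question expressed as triple patterns.
SPARQL_QUERY = """
PREFIX ex: <http://example.org/>
SELECT ?name WHERE {
  ?person ex:name "Alice" .
  ?person ex:friend ?friend .
  ?friend ex:name ?name .
}
"""

print(match(triples, p=EX + "friend"))
```

Gremlin reads as a sequence of steps along a path, while SPARQL reads as a set of patterns that must all match. This is one reason the first tends to feel natural for application code and the second for knowledge graphs.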

Getting Started

Create a cluster in the Neptune console and select an instance class. The cluster is deployed inside a VPC, so connect from resources in that VPC, such as an EC2 instance or a Lambda function attached to it. You can try queries in the Gremlin console or in Neptune Workbench (Jupyter Notebook). Choose Neptune Serverless to get started without pre-configuring capacity.
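
Once the cluster exists, a client inside the VPC connects to its endpoint. The sketch below only builds the connection URLs and opens no connection. It assumes the default Neptune port 8182 and the usual `/gremlin` (WebSocket) and `/sparql` (HTTPS) paths. The hostname is a placeholder, and `neptune_endpoints` is a hypothetical helper, not part of any AWS SDK.

```python
def neptune_endpoints(cluster_endpoint: str, port: int = 8182) -> dict:
    """Build the Gremlin and SPARQL URLs for a Neptune cluster endpoint.

    Assumes the default port 8182; pass a different port if the cluster
    was created with one.
    """
    return {
        # Gremlin clients connect over a secure WebSocket.
        "gremlin": f"wss://{cluster_endpoint}:{port}/gremlin",
        # SPARQL queries are sent over HTTPS.
        "sparql": f"https://{cluster_endpoint}:{port}/sparql",
    }


# Placeholder hostname; copy the real cluster endpoint from the Neptune console.
urls = neptune_endpoints("my-cluster.cluster-example.us-east-1.neptune.amazonaws.com")
print(urls["gremlin"])
print(urls["sparql"])
```

A Gremlin client library would be pointed at the `gremlin` URL and an HTTP client at the `sparql` URL, in both cases from a machine that can reach the cluster's VPC.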

Things to Watch Out For

  • Neptune is deployed inside a VPC, so it cannot be reached directly from the public internet. Access from outside the VPC typically requires a bastion host or VPN
  • Neptune Serverless auto-scales with the workload, but latency may increase while capacity scales up after an idle period