Channel: Baeldung

Introduction to JanusGraph

1. Introduction

In this tutorial, we’re going to look at JanusGraph and Gremlin.

JanusGraph is an open-source, massively scalable graph database. It has been designed to support huge graphs – large enough to require multiple database nodes working together – whilst still allowing us to work with them efficiently.

This is achieved by building on top of other well-supported technologies as needed, such as Apache Cassandra or Apache HBase for storage and Elasticsearch for indexing. We also get native integration with the Apache TinkerPop stack, including the Gremlin console and query language.

2. Running JanusGraph and Gremlin

To run JanusGraph locally, we first need to download the latest release, version 1.1.0 at the time of writing. Once downloaded, we can unpack this and we’re ready to run. This requires a Java 8+ JVM to be already installed.

Once unpacked, we can start a Gremlin console session by running ./bin/gremlin.sh (or ./bin/gremlin.bat on Windows) from inside the unpacked directory:

-> % ./bin/gremlin.sh
         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.tinkergraph
08:45:56 INFO org.apache.tinkerpop.gremlin.hadoop.jsr223.HadoopGremlinPlugin.getCustomizers - HADOOP_GREMLIN_LIBS is set to: /Users/baeldung/janusgraph-1.1.0/lib
08:45:56 WARN org.apache.hadoop.util.NativeCodeLoader.<clinit> - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.spark
plugin activated: tinkerpop.utilities
plugin activated: janusgraph.imports

This Gremlin console can also host a JanusGraph database in-process, which is very useful for trying things out. We open one with the JanusGraphFactory.open() method, pointing it at an appropriate configuration file:

gremlin> graph = JanusGraphFactory.open('conf/janusgraph-inmemory.properties')
08:46:06 INFO  org.apache.commons.beanutils.FluentPropertyBeanIntrospector.introspect - Error when creating PropertyDescriptor for public final void org.apache.commons.configuration2.AbstractConfiguration.setProperty(java.lang.String,java.lang.Object)! Ignoring this property.
08:46:06 INFO  org.janusgraph.diskstorage.configuration.builder.ReadConfigurationBuilder.setupTimestampProvider - Set default timestamp provider MICRO
08:46:06 INFO  org.janusgraph.graphdb.idmanagement.UniqueInstanceIdRetriever.getOrGenerateUniqueInstanceId - Generated unique-instance-id=c0a801777851
08:46:06 INFO  org.janusgraph.diskstorage.configuration.ExecutorServiceBuilder.buildFixedExecutorService - Initiated fixed thread pool of size 16
08:46:06 INFO  org.janusgraph.graphdb.database.StandardJanusGraph.<init> - Gremlin script evaluation is disabled
08:46:06 INFO  org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller.initializeTimepoint - Loaded unidentified ReadMarker start time 2025-02-24T08:46:06.291970Z into org.janusgraph.diskstorage.log.kcvs.KCVSLog$MessagePuller@21eedcde
==>standardjanusgraph[inmemory:[127.0.0.1]]

This creates a new graph variable in our console. It represents our graph database and acts as the entry point for all future calls.

The distribution includes a number of standard configuration files for different storage and indexing backends, but janusgraph-inmemory.properties is the simplest. It configures an in-memory database with no persistence, so everything is lost when the process exits.

2.1. Standalone Server

Alternatively, we can start a standalone JanusGraph server using the ./bin/janusgraph-server.sh start command:

-> % ./bin/janusgraph-server.sh start
/Users/baeldung/janusgraph-1.1.0/conf/gremlin-server/gremlin-server.yaml will be used to start JanusGraph Server in background
Server started 8163

By default, this starts a server on port 8182, backed by the same janusgraph-inmemory.properties configuration (the 8163 printed above is the server's process ID, not its port). Because this is now a running server, we can connect our clients to it instead.

We can point our Gremlin client at a remote server by using the :remote connect command:

gremlin> :remote connect tinkerpop.server conf/remote.yaml session
08:54:19 INFO  org.apache.tinkerpop.gremlin.driver.Connection.<init> - Created new connection for ws://localhost:8182/gremlin
08:54:19 INFO  org.apache.tinkerpop.gremlin.driver.ConnectionPool.<init> - Opening connection pool on Host{address=localhost/127.0.0.1:8182, hostUri=ws://localhost:8182/gremlin} with core size of 1
==>Configured localhost/127.0.0.1:8182-[96b90c45-4aef-405d-a336-5823bcde3995]

Once connected, we also need to tell the console to send all commands to the remote server:

gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8182]-[96b90c45-4aef-405d-a336-5823bcde3995] - type ':remote console' to return to local mode

At this point, anything we do will work with this remote server instead.

Note that in this mode we don’t need to assign a graph variable ourselves: the server already provides one, pointing at the database it hosts.

3. Graph Structure

Graph databases such as JanusGraph represent data as a graph: a set of vertices, plus edges joining them.

In JanusGraph, edges connect to exactly two vertices and have a direction – they always point from one vertex to another. As such, these are always directed graphs. However, they need not be acyclic – it’s valid for us to have cycles of any length between our vertices.

We represent data in our database as labels and properties on vertices and edges. Edges are required to have a label, and vertices can optionally have one. The label defines the type of data we’re representing, e.g. “article” or “written_by”. In addition, each vertex or edge can carry an arbitrary number of key/value pairs, e.g. “title: Introduction to JanusGraph”.
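
To make these rules concrete, here’s a minimal sketch of a property graph in plain Python. It’s a toy model, not JanusGraph’s API or storage format: the Vertex, Edge, and PropertyGraph classes are hypothetical, and the “wrote” edge is made up purely to show that a cycle is allowed. Like TinkerPop, it falls back to the label “vertex” when none is given.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    id: int
    # Vertex labels are optional; fall back to "vertex" like TinkerPop does
    label: str = "vertex"
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    label: str    # edges must always carry a label
    out_v: Vertex  # the vertex the edge points away from
    in_v: Vertex   # the vertex the edge points into
    properties: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Hypothetical toy graph: directed, labeled edges; cycles allowed."""

    def __init__(self) -> None:
        self.vertices: List[Vertex] = []
        self.edges: List[Edge] = []

    def add_vertex(self, label: str = "vertex", **props: Any) -> Vertex:
        vertex = Vertex(len(self.vertices), label, dict(props))
        self.vertices.append(vertex)
        return vertex

    def add_edge(self, label: str, out_v: Vertex, in_v: Vertex, **props: Any) -> Edge:
        if not label:
            raise ValueError("every edge needs a label")
        edge = Edge(label, out_v, in_v, dict(props))
        self.edges.append(edge)
        return edge


graph = PropertyGraph()
article = graph.add_vertex("article", title="Introduction to JanusGraph")
author = graph.add_vertex("author")
untyped = graph.add_vertex()  # no label supplied
graph.add_edge("written_by", article, author)
# Edges are directed, but nothing stops us closing a cycle back to the start
graph.add_edge("wrote", author, article)
```

Every edge records which vertex it leaves and which it enters, which is all that’s needed to answer questions about direction later on.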

4. Loading Example Data

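JanusGraph ships with a helper class, GraphOfTheGodsFactory, for loading some example data into a database. We use its loadWithoutMixedIndex() method here because our in-memory configuration has no indexing backend, such as Elasticsearch, to hold a mixed index: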

gremlin> GraphOfTheGodsFactory.loadWithoutMixedIndex(graph, true)
==>null

This loads a sample data set called The Graph of the Gods. It represents a small portion of the Roman pantheon, along with related characters, monsters, and locations, but it’s enough to demonstrate how to use JanusGraph.

5. Querying Data

Now that we have a database with sample data, we’re ready to query it.

Before this, we need to create a traversal source:

gremlin> g = graph.traversal()
==>graphtraversalsource[standardjanusgraph[inmemory:[127.0.0.1]], standard]

This creates a new variable as the entry point for traversing our graph data.

5.1. Querying Vertices

The first thing we can do is to query for individual vertices.

The simplest form of this is to list every vertex without any filtering, using the V() step on our traversal source:

gremlin> g.V()
08:13:56 WARN  org.janusgraph.graphdb.transaction.StandardJanusGraphTx$3.execute - Query requires iterating over all vertices [[]]. For better performance, use indexes
==>v[4136]
==>v[8232]
==>v[12328]
==>v[4184]
==>v[8280]
==>v[4216]
==>v[8312]
==>v[12408]
==>v[4256]
==>v[4288]
==>v[8384]
==>v[4304]

However, this isn’t very useful, and as the warning shows, it means iterating over every vertex in the graph. Instead, we can filter down to only the vertices that meet certain criteria. For example, we can keep only the vertices with a given property value using the has() step:

gremlin> g.V().has('name', 'hercules')
==>v[4136]
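
The warning printed by g.V() hints at the cost involved: without an index, any filter has to look at every vertex in the graph. Here’s a rough Python sketch contrasting a full scan with an index lookup, using a plain dict as a stand-in for an index. The helper names and vertex data are hypothetical; JanusGraph’s real indexes are defined through its schema and stored in the configured backends.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical vertex records: property maps keyed by vertex id
vertices: Dict[int, Dict[str, Any]] = {
    1: {"name": "hercules", "age": 30},
    2: {"name": "jupiter", "age": 5000},
    3: {"name": "cerberus"},
}


def has_by_scan(key: str, value: Any) -> List[int]:
    """Check every vertex: the cost grows with the size of the graph."""
    return [vid for vid, props in vertices.items() if props.get(key) == value]


def build_index(key: str) -> Dict[Any, List[int]]:
    """Precompute a value -> vertex ids mapping for one property key."""
    index: Dict[Any, List[int]] = defaultdict(list)
    for vid, props in vertices.items():
        if key in props:
            index[props[key]].append(vid)
    return index


name_index = build_index("name")


def has_by_index(value: Any) -> List[int]:
    """Jump straight to the matching vertices without touching the others."""
    return name_index.get(value, [])
```

Both return the same answer, but only the scan gets slower as the graph grows, which is why JanusGraph recommends using indexes.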

It would also be useful to see more about the returned vertex, since so far the console only shows its internal ID. We can see the entire map of property values for it:

gremlin> g.V().has('name', 'hercules').valueMap()
==>[name:[hercules],age:[30]]

Or we can get individual details instead:

gremlin> g.V().has('name', 'hercules').label()
==>demigod
gremlin> g.V().has('name', 'hercules').values('name')
==>hercules
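
valueMap() wraps every value in a list because, in TinkerPop’s model, a single vertex property key can hold more than one value. Here’s a toy Python model of the three steps above acting on one vertex; the record layout and the function names are hypothetical and only mimic the shape of Gremlin’s results.

```python
from typing import Any, Dict, List

# Hypothetical vertex record: values are stored as lists, because a
# vertex property key may hold several values
hercules: Dict[str, Any] = {
    "label": "demigod",
    "properties": {"name": ["hercules"], "age": [30]},
}


def value_map(vertex: Dict[str, Any]) -> Dict[str, List[Any]]:
    """Like valueMap(): every key maps to the list of its values."""
    return {key: list(vals) for key, vals in vertex["properties"].items()}


def label(vertex: Dict[str, Any]) -> str:
    """Like label(): just the vertex label."""
    return vertex["label"]


def values(vertex: Dict[str, Any], key: str) -> List[Any]:
    """Like values('key'): the bare values, one result per value."""
    return list(vertex["properties"].get(key, []))
```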

We can also assign the results of our queries to a variable for later use:

gremlin> hercules = g.V().has('name', 'hercules').next()
==>v[4128]

We’re using the next() call here to get the actual vertex instead of the traversal. A traversal is evaluated lazily and can only be iterated once, so we need the vertex itself to be able to reference it later:

gremlin> g.V(hercules).valueMap()
==>[name:[hercules],age:[30]]
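
The reason is that a Gremlin traversal is an iterator: it’s evaluated lazily, produces its results one at a time, and is exhausted once they’ve been consumed. Python generators behave in much the same way, so they make a convenient analogy. The traverse_has() function and the vertex data below are hypothetical, and this isn’t how Gremlin works internally.

```python
from typing import Any, Dict, Iterator, List

# Hypothetical vertex data
vertices: List[Dict[str, Any]] = [
    {"name": "hercules", "age": 30},
    {"name": "jupiter", "age": 5000},
]


def traverse_has(key: str, value: Any) -> Iterator[Dict[str, Any]]:
    """Lazy filter: nothing runs until a result is requested."""
    for vertex in vertices:
        if vertex.get(key) == value:
            yield vertex


traversal = traverse_has("name", "hercules")
# Like next() in Gremlin: pull the first result out as a concrete value
hercules = next(traversal)

# The traversal is now used up, but the vertex itself can be reused freely
remaining = list(traversal)
```

Calling next() on a traversal with no results is an error in both worlds: Python raises StopIteration, while Gremlin throws a NoSuchElementException.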

5.2. Traversing Edges

Without being able to traverse edges, there’s no benefit to representing our data as a graph.

We traverse edges in our query using the out() and in() steps, depending on the direction of the edge we want to follow. out() follows edges that point away from the current vertex:

gremlin> g.V().has('name', 'hercules').out('father').valueMap()
==>[name:[jupiter],age:[5000]]

Equally, in() follows edges that point into the current vertex:

gremlin> g.V().has('name', 'jupiter').in('father').valueMap()
==>[name:[hercules],age:[30]]
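
Since every edge has a direction, out() and in() are two ways of reading the same edge list: one matches edges by the vertex they start from, the other by the vertex they end at. Here’s a minimal Python sketch, with edges stored as hypothetical (out vertex, label, in vertex) tuples based on the father and mother relationships in the example data:

```python
from typing import List, Tuple

# Directed edges as (out vertex, label, in vertex)
edges: List[Tuple[str, str, str]] = [
    ("hercules", "father", "jupiter"),
    ("jupiter", "father", "saturn"),
    ("hercules", "mother", "alcmene"),
]


def out(vertex: str, label: str) -> List[str]:
    """Follow matching edges that point away from `vertex`."""
    return [dst for src, lbl, dst in edges if src == vertex and lbl == label]


def in_(vertex: str, label: str) -> List[str]:
    """Follow matching edges that point into `vertex` (`in` is a Python keyword)."""
    return [src for src, lbl, dst in edges if dst == vertex and lbl == label]
```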

This also works when there are several matching edges to follow, giving us one result per edge:

gremlin> g.V().has('name', 'hercules').out('battled').valueMap()
==>[name:[cerberus]]
==>[name:[hydra]]
==>[name:[nemean]]

As such, we may need to filter these further. As before, this is done using the has() function:

gremlin> g.V().has('name', 'hercules').out('battled').has('name', 'hydra').valueMap()
==>[name:[hydra]]
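
Each step works on a stream of results rather than a single vertex: out('battled') emits one result for every matching edge, and a following has() drops the results that don’t match. This Python sketch models those streams as lists; the data mirrors the output above, and the out() and has() helpers are hypothetical.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical property maps, keyed by vertex name for readability
props: Dict[str, Dict[str, Any]] = {
    "hercules": {"name": "hercules", "age": 30},
    "nemean": {"name": "nemean"},
    "hydra": {"name": "hydra"},
    "cerberus": {"name": "cerberus"},
}

# Directed edges as (out vertex, label, in vertex)
edges: List[Tuple[str, str, str]] = [
    ("hercules", "battled", "nemean"),
    ("hercules", "battled", "hydra"),
    ("hercules", "battled", "cerberus"),
]


def out(current: List[str], label: str) -> List[str]:
    """Each input vertex yields one result per matching outgoing edge."""
    return [dst for v in current
            for src, lbl, dst in edges if src == v and lbl == label]


def has(current: List[str], key: str, value: Any) -> List[str]:
    """Keep only the results whose property `key` equals `value`."""
    return [v for v in current if props[v].get(key) == value]


battled = out(["hercules"], "battled")
only_hydra = has(battled, "name", "hydra")
```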

Unsurprisingly, we can follow these as far as we need to as well, allowing for more complex discoveries:

gremlin> g.V().has('name', 'hercules').out('battled').out('lives').in('lives').valueMap()
==>[name:[cerberus]]
==>[name:[pluto],age:[4000]]

This query will:

  • Find the vertex with the name “hercules”.
  • Follow all outgoing edges labeled ‘battled’ from this vertex.
  • Follow all outgoing edges labeled ‘lives’ from the resulting vertices.
  • Follow all incoming edges labeled ‘lives’ into the resulting vertices.

So effectively, it gives us everyone who lives in the same location as one of the monsters Hercules battled, including the monster itself.
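
Chaining steps like this amounts to function composition over those result streams. The sketch below replays the same query in Python over a hand-copied subset of The Graph of the Gods edges, using hypothetical out() and in_() helpers; as in the console, Cerberus appears in its own results because it also lives in Tartarus.

```python
from typing import List, Tuple

# Subset of The Graph of the Gods: (out vertex, label, in vertex)
edges: List[Tuple[str, str, str]] = [
    ("hercules", "battled", "nemean"),
    ("hercules", "battled", "hydra"),
    ("hercules", "battled", "cerberus"),
    ("cerberus", "lives", "tartarus"),
    ("pluto", "lives", "tartarus"),
    ("jupiter", "lives", "sky"),
]


def out(current: List[str], label: str) -> List[str]:
    """Follow matching edges away from each vertex in `current`."""
    return [dst for v in current
            for src, lbl, dst in edges if src == v and lbl == label]


def in_(current: List[str], label: str) -> List[str]:
    """Follow matching edges back into each vertex in `current`."""
    return [src for v in current
            for src, lbl, dst in edges if dst == v and lbl == label]


# Mirrors g.V().has('name', 'hercules').out('battled').out('lives').in('lives')
monsters = out(["hercules"], "battled")
homes = out(monsters, "lives")
neighbours = in_(homes, "lives")
```

Neither this model nor Gremlin removes duplicates along the way: if two of the monsters shared a home, its residents would be listed twice, and Gremlin’s dedup() step is the usual way to collapse such repeats.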

6. Adding and Editing Data

Being able to query our data is important, but if we can’t manipulate it then it’s not very useful.

We can add new vertices to our graph using the graph.addVertex() call:

gremlin> theseus = graph.addVertex('human')
==>v[16552]

Note that, because this is called on the graph itself and not the graph traversal, we get the new vertex back directly.

We can also specify some properties in the addVertex() call:

gremlin> theseus = graph.addVertex(T.label, 'human', 'name', 'theseus')
==>v[12528]

Because addVertex() takes an alternating list of keys and values, the optional label is passed as if it were a property, using the special key T.label.
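
The alternating key/value convention is easy to mimic. Here’s a hypothetical Python helper that splits arguments shaped like addVertex(T.label, 'human', 'name', 'theseus') into a label and a property map. LABEL is a stand-in sentinel for T.label and isn’t part of any real API; the fallback label mirrors TinkerPop’s default of “vertex”.

```python
from typing import Any, Dict, Tuple

LABEL = object()  # hypothetical stand-in for TinkerPop's T.label token


def parse_key_values(*key_values: Any) -> Tuple[str, Dict[str, Any]]:
    """Split alternating key/value arguments into (label, properties)."""
    if len(key_values) % 2 != 0:
        raise ValueError("expected an even number of arguments: key, value, ...")
    label = "vertex"  # used when no label is supplied
    properties: Dict[str, Any] = {}
    # Pair up positions 0/1, 2/3, ... as key/value
    for key, value in zip(key_values[::2], key_values[1::2]):
        if key is LABEL:
            label = value
        else:
            properties[key] = value
    return label, properties
```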

We can also add or update properties on an existing vertex later, using the property() call:

gremlin> theseus.property('name', 'theseus')
==>vp[name->theseus]
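
Under JanusGraph’s default SINGLE cardinality, setting a property that already exists replaces its value rather than adding another, so the call above leaves theseus with exactly one name. This toy Python function models that rule alongside the LIST and SET alternatives; set_property() and the cardinality strings are hypothetical and only illustrate the behaviour.

```python
from typing import Any, Dict, List


def set_property(props: Dict[str, List[Any]], key: str, value: Any,
                 cardinality: str = "single") -> None:
    """Toy model of vertex.property(key, value) under a given cardinality."""
    if cardinality == "single":
        props[key] = [value]                     # replace any existing value
    elif cardinality == "list":
        props.setdefault(key, []).append(value)  # keep duplicates
    elif cardinality == "set":
        existing = props.setdefault(key, [])
        if value not in existing:                # ignore repeated values
            existing.append(value)
    else:
        raise ValueError(f"unknown cardinality: {cardinality}")


theseus: Dict[str, List[Any]] = {"name": ["theseus"]}
# Same call as in the console: the single value is replaced, not duplicated
set_property(theseus, "name", "theseus")
```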

We can also create edges using the addEdge() method:

gremlin> cerberus = g.V().has('name', 'cerberus').next()
==>v[12496]
gremlin> theseus.addEdge('met', cerberus)
08:00:36 INFO  org.janusgraph.graphdb.relations.RelationIdentifier.<clinit> - Use default relation delimiter: -
==>e[3z2-9o0-hed-9n4][12528-met->12496]
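
The edge ID isn’t arbitrary. Decoding its parts as base-36 numbers recovers the vertex IDs at either end, 9o0 as 12528 for theseus and 9n4 as 12496 for cerberus, which suggests the ID packs together the relation ID, out-vertex ID, type ID, and in-vertex ID, joined by the delimiter mentioned in the log line. The Python sketch below decodes it on that assumption; the field order is inferred from this output rather than taken from documentation, and decode_edge_id() is a hypothetical helper.

```python
from typing import Dict


def decode_edge_id(edge_id: str, delimiter: str = "-") -> Dict[str, int]:
    """Split an ID like '3z2-9o0-hed-9n4' and decode each part as base 36.

    The field order (relation, out vertex, type, in vertex) is an assumption
    inferred from console output, not a documented contract.
    """
    parts = edge_id.split(delimiter)
    if len(parts) != 4:
        raise ValueError(f"expected 4 parts, got {len(parts)}: {edge_id!r}")
    names = ("relation_id", "out_vertex_id", "type_id", "in_vertex_id")
    return {name: int(part, 36) for name, part in zip(names, parts)}


decoded = decode_edge_id("3z2-9o0-hed-9n4")
```

The second and fourth parts come back as 12528 and 12496, the same pair shown in [12528-met->12496].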

We can now make use of these in querying our data:

gremlin> g.V().has('name', 'theseus').out('met').valueMap()
==>[name:[cerberus]]

7. Conclusion

In this article, we’ve taken a very brief look at JanusGraph and what we can do with it. There’s a lot more that can be achieved using this database, so next time you need to use a graph database, why not take a look?

The post Introduction to JanusGraph first appeared on Baeldung.