
insurance, healthcare, retail, energy, power, manufacturing, government, marketing, supply chain, transportation, etc. This broad applicability of graphs across many domains is also observed in [42]. Concrete use cases of graph databases are described in [51, 39, 48, 46]. Perhaps the most common example of graph database usage is fraud detection. For example, [47] presented a detailed scenario of traversing a graph containing insurance claims and patients' medical records to detect fraudulent claims.
As with relational databases, graph databases serve two different types of workloads. The first type focuses on low-latency graph traversal and pattern matching; these are often called graph queries. Such queries touch only small, local regions of a graph, for example, finding the 2-hop neighbors of a vertex or the shortest path between two vertices. Because of their low-latency requirements and interactive nature, graph queries are also called graph OLTP. Graph OLTP is often used in exploratory analysis and case studies. The second type of graph workload is graph algorithms, which usually involve iterative, long-running processing over the entire graph. Good examples are PageRank and community detection algorithms. Graph algorithms are often used for business intelligence (BI) style applications; for this reason, they are also called graph OLAP. Recently, a new trend has emerged that combines graphs with machine learning, called graph ML. For example, graph embedding (or vertex embedding) transforms graph structures into a vector space, and the resulting vectors are then included as features for ML model training. Graph neural networks (GNNs) are another example of graph ML. Graph ML is often lumped together with the graph OLAP workload.
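The contrast between the two workload types can be sketched in a few lines of Python. The graph below is a small hypothetical adjacency list, and `two_hop_neighbors`, `shortest_path`, and `pagerank` are illustrative helpers, not the API of any graph database. The first two behave like graph OLTP queries that start from one vertex and explore outward (the shortest path uses breadth-first search, which assumes unweighted edges). PageRank, written here as plain power iteration with an assumed damping factor of 0.85, sweeps every vertex and edge on each iteration, as a graph OLAP workload does.

```python
from collections import deque
from typing import Dict, List, Optional, Set

Graph = Dict[str, List[str]]

# A small, hypothetical directed graph stored as an adjacency list.
GRAPH: Graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d", "e"],
    "d": ["a"],
    "e": [],
}


def two_hop_neighbors(graph: Graph, v: str) -> Set[str]:
    """Graph OLTP: endpoints of all 2-edge paths starting at v (a local query)."""
    return {w for u in graph.get(v, []) for w in graph.get(u, [])}


def shortest_path(graph: Graph, src: str, dst: str) -> Optional[List[str]]:
    """Graph OLTP: breadth-first search, assuming unweighted edges."""
    parent: Dict[str, Optional[str]] = {src: None}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if u == dst:
            # Walk the parent pointers back to src to rebuild the path.
            path: List[str] = []
            node: Optional[str] = u
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for w in graph.get(u, []):
            if w not in parent:
                parent[w] = u
                queue.append(w)
    return None  # dst is unreachable from src


def pagerank(graph: Graph, damping: float = 0.85, iterations: int = 50) -> Dict[str, float]:
    """Graph OLAP: power iteration; every pass visits every vertex and edge."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in graph}
        for u, outs in graph.items():
            if outs:
                share = damping * rank[u] / len(outs)
                for w in outs:
                    new_rank[w] += share
            else:
                # Dangling vertex: spread its rank evenly over all vertices.
                for w in graph:
                    new_rank[w] += damping * rank[u] / n
        rank = new_rank
    return rank


print(sorted(two_hop_neighbors(GRAPH, "a")))    # ['d', 'e']
print(shortest_path(GRAPH, "b", "e"))           # ['b', 'd', 'a', 'c', 'e']
print(round(sum(pagerank(GRAPH).values()), 6))  # 1.0
```

Real graph databases add indexing, persistence, and parallelism, but even this toy version shows the difference in access pattern: the OLTP helpers work outward from a single vertex, while each PageRank iteration scans the whole graph.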
3. GRAPH MODELS
[Figure 1: RDF and property graph models. (a) RDF Model; (b) Property Graph Model]
Any discussion of a graph database must begin with the graph model(s) it supports. The two prominent graph models supported by most commercial graph databases are the RDF model and the property graph model.
RDF Model. RDF is part of the suite of W3C standards supporting Linked Data and knowledge graphs [52]. An RDF graph is a directed, edge-labeled graph represented as a set of subject–predicate–object triples. Figure 1(a) shows an example graph in the RDF model. It captures the following information: a patient named Alice Brown, with patient ID 198076, was diagnosed on March 24, 2020 with Type 2 Diabetes (disease ID 64572326), and Type 2 Diabetes is a subtype of Diabetes (disease ID 64572345). For example, in the (Patient 1) −[hasName]→ (Alice Brown) triple, Patient 1 is the subject, hasName is the predicate, and Alice Brown is the object. The RDF model is often used for knowledge representation and inference as well as semantic web applications. For example, DBpedia [21] and YAGO [56] both use RDF to represent their knowledge graphs and support queries over the knowledge bases using SPARQL [53].
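To make the triple representation concrete, the following minimal Python sketch stores the Figure 1(a) graph as (subject, predicate, object) tuples and answers a question with a naive matcher for triple patterns, similar in spirit to a SPARQL basic graph pattern. It is not a SPARQL engine: the `match` helper and its nested-loop evaluation are illustrative assumptions, the `?`-prefixed variables merely borrow SPARQL's notation, and the wiring of the diagnosis node (Patient 1 hasDiagnosis Diagnosis 1, Diagnosis 1 diagnosedWith Disease 1) is our reading of the figure.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Triples read off Figure 1(a). Assumption: the diagnosis node is wired as
# Patient 1 -hasDiagnosis-> Diagnosis 1 -diagnosedWith-> Disease 1.
TRIPLES: List[Triple] = [
    ("Patient 1", "hasName", "Alice Brown"),
    ("Patient 1", "hasID", "198076"),
    ("Patient 1", "hasDiagnosis", "Diagnosis 1"),
    ("Diagnosis 1", "happensOn", "03/24/2020"),
    ("Diagnosis 1", "diagnosedWith", "Disease 1"),
    ("Disease 1", "hasName", "Type 2 Diabetes"),
    ("Disease 1", "hasID", "64572326"),
    ("Disease 1", "isa", "Disease 2"),
    ("Disease 2", "hasName", "Diabetes"),
    ("Disease 2", "hasID", "64572345"),
]


def is_var(term: str) -> bool:
    """Terms starting with '?' are variables (notation borrowed from SPARQL)."""
    return term.startswith("?")


def match(patterns: List[Triple], triples: List[Triple],
          binding: Optional[Dict[str, str]] = None) -> Iterator[Dict[str, str]]:
    """Yield every variable binding that satisfies all patterns, using a
    naive nested-loop join over the triple list."""
    binding = binding if binding is not None else {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if is_var(term):
                # Bind a fresh variable, or check it agrees with an earlier binding.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(rest, triples, candidate)


# "When, and with what, was Alice Brown diagnosed?"
QUERY: List[Triple] = [
    ("?p", "hasName", "Alice Brown"),
    ("?p", "hasDiagnosis", "?dx"),
    ("?dx", "happensOn", "?date"),
    ("?dx", "diagnosedWith", "?disease"),
    ("?disease", "hasName", "?diseaseName"),
]

for row in match(QUERY, TRIPLES):
    print(row["?date"], row["?diseaseName"])  # 03/24/2020 Type 2 Diabetes
```

Note how even a simple question requires joining five triple patterns, because every attribute (name, ID, date) is a separate node reached through its own edge.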
Property Graph Model. In comparison, a property graph is a directed graph in which each vertex and edge can have an arbitrary number of properties. Vertices and edges can also be tagged with labels to distinguish the different types of objects and relationships in the graph. Figure 1(b) shows how the information captured in the RDF graph of Figure 1(a) is represented in the property graph model. Here, instead of representing the ID and the name of a patient or disease as separate nodes, the property graph model folds them in as properties of the patient and disease nodes. Similarly, the diagnosis time can be represented as a property of the diagnosedWith edge, eliminating the need to create a separate diagnosis node and its connecting edges to the patient and disease nodes. In general, the property graph model can capture the same information with fewer nodes and edges than the RDF model, as this example illustrates. This is because the RDF model can represent a piece of information only as a node or an edge, whereas the property graph model can also attach it as an attribute of an existing node or edge. The property graph model is often used for applications that require graph traversal, pattern matching, and path and graph analysis.
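For comparison, the following sketch encodes Figure 1(b) with two small Python dataclasses. The `Vertex` and `Edge` classes, the vertex keys `p1`, `d1`, and `d2`, and the direction of the isa edge (from the subtype to its parent) are hypothetical choices for illustration, not the data model of any particular graph database. IDs and names live in vertex property maps and the diagnosis date lives on the diagnosedWith edge, so a two-hop traversal recovers the diagnosis, its date, and the broader disease category.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Vertex:
    label: str
    props: Dict[str, str] = field(default_factory=dict)


@dataclass
class Edge:
    label: str
    src: str  # key of the source vertex
    dst: str  # key of the destination vertex
    props: Dict[str, str] = field(default_factory=dict)


# Figure 1(b): IDs and names are vertex properties; the diagnosis date is an
# edge property. The keys "p1", "d1", "d2" are hypothetical identifiers.
VERTICES: Dict[str, Vertex] = {
    "p1": Vertex("patient", {"ID": "198076", "name": "Alice Brown"}),
    "d1": Vertex("disease", {"ID": "64572326", "name": "Type 2 diabetes"}),
    "d2": Vertex("disease", {"ID": "64572345", "name": "Diabetes"}),
}
EDGES: List[Edge] = [
    Edge("diagnosedWith", "p1", "d1", {"time": "03/24/2020"}),
    Edge("isa", "d1", "d2"),  # assumed direction: subtype -> parent
]


def out_edges(v: str, label: str) -> List[Edge]:
    """All outgoing edges of vertex v that carry the given label."""
    return [e for e in EDGES if e.src == v and e.label == label]


def diagnoses_with_categories(patient: str) -> List[Tuple[str, str, List[str]]]:
    """Traverse patient -diagnosedWith-> disease -isa-> broader disease."""
    results = []
    for dx in out_edges(patient, "diagnosedWith"):
        disease = VERTICES[dx.dst]
        parents = [VERTICES[e.dst].props["name"] for e in out_edges(dx.dst, "isa")]
        results.append((disease.props["name"], dx.props["time"], parents))
    return results


print(diagnoses_with_categories("p1"))
# [('Type 2 diabetes', '03/24/2020', ['Diabetes'])]
print(len(VERTICES), len(EDGES))  # 3 2
```

Folding attributes into properties is what shrinks this graph to 3 vertices and 2 edges, against the 11 nodes and 10 edges of the RDF version in Figure 1(a).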
Although both models are supported in the graph database industry today, the property graph model enjoys overwhelming endorsement, as we will show in Section 5, even though RDF is a much older model. All the major offerings we surveyed in this paper support the property graph model, and two of them also support the RDF model. In [27], Hartig proposed formal transformations between the RDF and property graph models in the hope of reconciling the two.
4. GRAPH QUERY LANGUAGES
On the graph OLTP side, RDF graphs have the standard SPARQL query language [53]. For property graphs, many languages have been used and proposed, but there is no clear winner. One of the top contenders is TinkerPop Gremlin [1], which is supported