图数据库原理下载_在线阅读_10

is_680203

暂无简介

图数据库原理 Report Graph Databases CS341 Distributed Information Systems University of Basel Lukas Beck lukas.beck@stud.unibas.ch 21. December 2012 Contents 1 Introduction & Motivation 2 2 Graph Theory 3 2.1 Single-Relational Graph . . . . . . . . . . . . . . . . . . . ...

Report Graph Databases CS341 Distributed Information Systems University of Basel Lukas Beck lukas.beck@stud.unibas.ch 21. December 2012 Contents 1 Introduction & Motivation 2 2 Graph Theory 3 2.1 Single-Relational Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 2.2 Multi-Relational Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 2.3 Property Graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4 3 Graph Databases 5 3.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 3.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.3 Graph Traversal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.3.1 Local Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 3.3.2 Local Recommendation . . . . . . . . . . . . . . . . . . . . . . . . . . 7 3.3.3 Collaborative Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . 8 3.4 Real World Use Cases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 4 Implementations 9 5 Conclusion 9 2 1 Introduction & Motivation When working with large graphs, one realizes that traditional databases – for example relational databases – reach their limits. The main problem lies in the storage of graphs. To illustrate the problem, let us look at an example: Figure 1 illustrates our graph and problem. User Alex is interested in further rec- ommendations for Pizza places. The figure shows one possible (graphical) solution by looking at other user’s recommendations based on Alex’ favorites. Figure 1: An example graph illustrating a graphical solution for finding additional rec- ommendations for user Alex based on his existing recommendations. user recommends Alex C Alex D Luke D Luke E Table 1 Table 1 shows the recommends edges for the presented graph in a relational database. By a simple lookup, we find all recommendations of Alex. Then, we have to reach other users that recommended the same places (Luke in our case). This is done by self-joining the table. Now, we again have to use a self-join to ‘traverse’ from Luke to his recommendations. The following SQL query shows a possible solution: 3 SELECT * FROM recommendations r1 JOIN recommendations r2 ON r2.user <> r1.user AND r2.recommends = r1.recommends JOIN recommendations r3 ON r3.user = r2.user Notice that the filtering of already recommended places is omitted for reasons of readability. The main problem with this approach are the very expensive join operations. Imagine that there are hundred of thousands of recommendations in the database and the database system has to do several self-joins to answer the query. Thus, we need another approach to answer such queries efficiently: graph databases. However, before introducing these types of databases we first need to have a look at different types of graphs used in this context. 2 Graph Theory There exists a variety of graph models. This section serves as an introduction to the most widely known and relevant ones used in graph databases. Notice that the list of presented types is by no means complete. 2.1 Single-Relational Graph This type of graph is the classical textbook graph. Formally, it is defined as follows: Definition 1. A single-relational graph is given as a pair G = ⟨V,E⟩ with V : a set of vertices, also called nodes E: a set of pairs of vertices denoted as edges If the pairs of E are ordered, the graph is directed, otherwise it is undirected. Figure 2: A single-relational graph with undirected edges. Nodes are represented by circles. Edges connect different nodes. 4 The main limitation of this graph is that it only models one type of relation – thus the name single-relational. This means that there is only one type of nodes and one type of edges. Graphs of this type are very restraining to use in real world problems. 2.2 Multi-Relational Graph Multi-relational graphs overcome the main limitation of single-relational graphs by introducing labels to every edge. This enables to model explicitly different types of edges. Also, these labels can introduce different types of nodes implicitly. Formally, it is defined as a single-relational graph G = ⟨V,E⟩ with a function ` that maps edges to labels. In graph theory, G is called an edge-labeled graph. Figure 3: An example figure of a directed multi-relational graph. In this example, the labeling is defined as ` ∶ E → {follows, recommends, employs}. The types of nodes are implicitly given by the labels where follows maps users to users, recommends users to places and employs places to users. 2.3 Property Graph Multi-relational graph are already very suited to use as a basis for modelling problems. However, it is complicated to store additional, non-relational data in these graphs1. Property graphs solve exactly this problem by enabling to store additional key-value pairs to vertices and edges. 1This is done by introducing additional nodes and relating them to the initial node. 5 Figure 4: An example of a directed property graph. In addition to labels, we can store key-value pairs for each node and edge. Property graphs are very powerful and can be used to model other graph types. This or similar types of graphs are most often used in actual graph database implementations. Property graphs – as defined here – are binary edge graphs which means that an edge connects two nodes. This simplifies the actual implementation as from a node’s point of view, an edge is reduced to a link to another node instead of a list of nodes. However, it is possible but cumbersome to model n-ary edge graphs like hypergraphs. 3 Graph Databases 3.1 Definition Sadly, there is currently no textbook definition of what a graph database is. Additionally, almost every database can model graphs and in particular property graphs too2. This fact obviously raises the question what graph databases are if every database is able to model graphs. Rodriguez [2010] provided a very natural definition of graph databases to capture the basic intention: “A graph database is any storage system that provides index-free adjacency.” [Rodriguez, 2010, Slide 27] Effectively, this means that a graph database provides some sort of pointer to follow edges to other nodes. As we saw in the introduction, in relational databases this is not the case. An edge is saved using some sort of identifier which is then used to find the actual edge or node using an index. Figure 5 illustrates this difference between explicit and implicit adjacency: In both figures, a index is built to access the nodes. However, when storing the graph implicitly, in each node only reference ids to adjacent nodes are saved. When following an edge, this id is used to find the actual node B through the index. This lookup scales logarithmically with the number of nodes in the index. 2We saw one example for relational databases in the introduction. 6 On the other hand, when storing the graph explicitly there exists a direct ‘pointer’ to adjacent nodes. In this case, we do not have to search through the index again but just ‘follow’ the pointer. As this lookup is done locally, it has constant time complexity. (a) (b) Figure 5: Difference between storing edges of graphs implicitly (5a) and explicitly (5b). 3.2 Properties Out of the above definition, we can derive directly some of the properties that graph databases provide. First and most importantly graph databases emphasis on relation between things. Compared to expensive join operations known in relational databases, graph databases scale better when looking up adjacent elements as the explicit storage enables a local and not a global lookup. Because of the underlying, flexible graph model, graph databases are suited for fast changing and evolving data sets. Graph databases enable to address and solve graph problems in a more natural way. Examples of such problems are path finding algorithms or graph traversal. Some examples of graph traversals are shown in the following section. 3.3 Graph Traversal Graph databases provide a natural way to solve problems that can be solved by traversing over the graph. This sections illustrates some of the possible problems. To give an idea of how an actual query looks like, examples using Gremlin3 are provided as well. 3.3.1 Local Search Local search is a simple problem where one is interested (is searching) for nodes around the neighborhood of given nodes. For example, Figure 6 answers the question what Pizza Places did users that are being followed by Mike recommend. The traversal starts at user Mike, follows every edge labeled follows and then follows every recommends edge. The result contains every node found at the end of the traversal. 3Gremlin is graph traversal language which provides a simple interface to work with graphs. More information on Gremlin can be found at http://gremlin.tinkerpop.com. 7 In Gremlin, the following query answers the example: gremlin > v.outE(’follows ’).inV.outE(’recommends ’).inV.name ==>A ==>B Figure 6: An example of locally searching on a graph. We are interested in the Pizza places that users followed by Mike did recommend. 3.3.2 Local Recommendation A more sophisticated example assumes that if you like a Pizza place, you will also like other recommendations of users that like your Pizza place. Using this assumption, we can provide recommendations given a Pizza place as seen in Figure 7. To answer such a query, we follow every recommendation and list every other recommendation of the found users as a result: gremlin > v.inE(’recommends ’).outV.outE(’recommends ’). inV.filter{it != v}.name ==>C ==>E 8 Figure 7: An example of a simple recommendation query: “What Pizza Places may you also like if you like D?” 3.3.3 Collaborative Filtering Instead of defining a given Pizza place for recommendation, we can use the recommenda- tions an user made directly. This is one possible way of defining collaborative filtering and can be used to give users automatically additional recommendations based on their existing ones. This is the example we used in the introduction in Figure 1 where we were interested in recommendations for user Alex. Effectively, we simply add a step before doing local recommendation and use every Pizza place the user recommended as an ‘input’ for local recommendation: gremlin > v.outE(’recommends ’).inV.inE(’recommends ’). outV.filter{it != v}.outE(’recommends ’).inV.name ==>E 3.4 Real World Use Cases Several characteristics of the data or the problems can lead in using graph databases. If data is highly connected and/or emphasizes on relational aspects, a graph databases may be a suitable solution. Similarly if the problem can be solved in traversing the graph or exploiting relational properties, a graph database may be optimal. One real world example which relies on graph database is fraud detection. The problem in detecting fraudulent activities involves following multiple connections among persons which is natural task for graph databases. A similar problem faces criminal detection and identifying groups or communication of criminals across multiple hops. 9 4 Implementations There exists a variety of graph database implementations. Wikipedia4 lists already 20 different projects around graph databases. The most notable (or known) is arguably Neo4j5 which uses the property graph as its underlying model. Important to notice is that there is no standardized access to the graph database. The most prominent approaches are Cypher6 – the query language used in Neo4j – and Blueprints7 – a generic Java API similar to JDBC. Cypher is only supported in Neo4j whereas Blueprint provides several implementations – among others Neo4j, OrientDB and Dex. 5 Conclusion We showed several problems with traditional databases when working with large graphs. To approach these problems, we then introduced graph databases, their most common graph model and their properties. To further illustrate these systems, we presented several graph traversal examples that can be addressed and solved efficiently in graph databases. Afterwards, we presented real world problems that use such systems. 4http://en.wikipedia.org/wiki/Graph_database#Graph_database_projects 5http://neo4j.org 6http://docs.neo4j.org/chunked/milestone/cypher-query-lang.html 7http://blueprints.tinkerpop.com/ 10 References Marko A. Rodriguez. The graph traversal programming pattern. WindyCity DB confer- ence, 2010. URL http://slideshare.net/slidarko/graph-windycitydb2010. Introduction & Motivation Graph Theory Single-Relational Graph Multi-Relational Graph Property Graph Graph Databases Definition Properties Graph Traversal Local Search Local Recommendation Collaborative Filtering Real World Use Cases Implementations Conclusion

本文档为【图数据库原理】，请使用软件OFFICE或WPS软件打开。作品中的文字与图均可以修改和编辑，图片更改请在作品中右键图片并更换，文字修改请直接点击文字进行修改，也可以新增和删除文档中的内容。

图数据库原理

热门搜索

历史搜索