Comparing Samos Document Search Performance between Apache Solr and Neo4j

Title: Comparing Samos Document Search Performance between Apache Solr and Neo4j.
Name(s): Stallard, Adam Preston, author
Zhao, Peixiang, professor co-directing thesis
Smith, Shawn R., professor co-directing thesis
Haiduc, Sonia, committee member
Nistor, Adrian, committee member
Florida State University, degree granting institution
College of Arts and Sciences, degree granting college
Department of Computer Science, degree granting department
Type of Resource: text
Genre: Text
Master Thesis
Issuance: monographic
Date Issued: 2017
Publisher: Florida State University
Place of Publication: Tallahassee, Florida
Physical Form: computer
Physical Form: online resource
Extent: 1 online resource (59 pages)
Language(s): English
Abstract/Description: The Distributed Oceanographic Match-Up Service (DOMS), currently under development, is a centralized service that allows researchers to easily match in situ and satellite oceanographic data from distributed sources to facilitate satellite calibration, validation, and retrieval algorithm development. The Shipboard Automated Meteorological and Oceanographic System (SAMOS) initiative provides routine access to high-quality marine meteorological and near-surface oceanographic observations from research vessels. SAMOS is one of several endpoints connected into the DOMS network, providing in-situ data for the match-up service. DOMS in-situ endpoints currently use Apache Solr as a backend search engine on each node in the distributed network. While Solr is a high-performance solution that facilitates creation and maintenance of indexed data, it is limited in the sense that its schema is fixed. The property graph model escapes this limitation by removing any prohibiting requirements on the data model and permitting relationships between data objects. This paper documents the development of the SAMOS Neo4j property graph database including new search possibilities that take advantage of the property graph model, performance comparisons with Apache Solr, and a vision for graph databases as a storage tool for oceanographic data. The integration of the SAMOS Neo4j graph into DOMS is also described. Various data models are explored including spatial-temporal records from SAMOS added to a time tree using GraphAware technology. This extension provides callable Java procedures within the CYPHER query language that generate in-graph structures used in data retrieval. Neo4j excels at performing relationship and path-based queries, which challenge relational SQL databases because they require memory-intensive joins due to the limitations of their design. Consider a user who wants to find records over several years, but only for specific months.
If a traditional database only stores timestamps, this type of query could be complex and likely prohibitively slow. Using the time tree model in a graph, one can specify a path from the root to the data that restricts resolutions to certain time frames (e.g., months). This query can be executed without joins, unions, or other compute-intensive operations, giving Neo4j a computational advantage over the SQL database alternative. That said, while this advantage may be useful, it should not be interpreted as an advantage over Solr in the context of DOMS. Solr makes use of Apache Lucene indexing at its core, while Neo4j provides its own native schema indexes. Ultimately they each provide unique solutions for data retrieval that are geared for specific tasks. In the DOMS setting it would appear that Solr is the most suitable option, as there seem to be very few use cases where Neo4j outperforms Solr. This is primarily because the use case as a subsetting tool does not require the flexibility and path-based queries that graph database tools offer. Rather, DOMS nodes are using high-performance indexing structures to quickly filter large amounts of raw data that are not deeply connected; deep connectivity is the feature of large data sets for which graph queries would indeed become useful.
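Illustrative note: the time tree idea described in the abstract can be sketched in a few lines of Python. This is only a toy in-memory stand-in under stated assumptions, not the thesis's Neo4j or GraphAware implementation. The class name `TimeTree`, its methods, and the record labels are all hypothetical, and the levels are assumed to be year, month, and day.

```python
from collections import defaultdict
from datetime import datetime


class TimeTree:
    """Hypothetical in-memory time tree: root -> year -> month -> day -> records."""

    def __init__(self):
        # Each level is a dictionary keyed by the time unit at that level.
        self.root = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

    def add(self, timestamp, record):
        """Attach a record to the leaf reached by the year/month/day path."""
        self.root[timestamp.year][timestamp.month][timestamp.day].append(record)

    def find(self, years, months):
        """Collect records whose path from the root passes through the
        requested years and months, without scanning every timestamp."""
        hits = []
        for year in years:
            if year not in self.root:
                continue
            for month in months:
                if month not in self.root[year]:
                    continue
                for day in sorted(self.root[year][month]):
                    hits.extend(self.root[year][month][day])
        return hits


# Made-up records, purely for illustration.
tree = TimeTree()
tree.add(datetime(2014, 7, 4), "rec-a")
tree.add(datetime(2015, 7, 19), "rec-b")
tree.add(datetime(2015, 12, 1), "rec-c")
tree.add(datetime(2016, 1, 9), "rec-d")

# "Several years, but only for specific months": July of 2014 through 2016.
july_records = tree.find(range(2014, 2017), [7])
```

The lookup follows only the year and month branches that were asked for, which mirrors the path restriction the abstract describes. A store holding only flat timestamps would instead have to test every record against the month condition.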
Identifier: FSU_SUMMER2017_Stallard_fsu_0071N_13933 (IID)
Submitted Note: A Thesis submitted to the Department of Computer Science in partial fulfillment of the requirements for the degree of Master of Science.
Degree Awarded: Spring Semester 2017.
Date of Defense: April 17, 2017.
Keywords: CYPHER, database, graph, Neo4j, SAMOS, Solr
Bibliography Note: Includes bibliographical references.
Advisory Committee: Peixiang Zhao, Professor Co-Directing Thesis; Shawn Smith, Professor Co-Directing Thesis; Sonia Haiduc, Committee Member; Adrian Nistor, Committee Member.
Subject(s): Computer science
Persistent Link to This Record: http://purl.flvc.org/fsu/fd/FSU_SUMMER2017_Stallard_fsu_0071N_13933
Owner Institution: FSU

Stallard, A. P. (2017). Comparing Samos Document Search Performance between Apache Solr and Neo4j. Retrieved from http://purl.flvc.org/fsu/fd/FSU_SUMMER2017_Stallard_fsu_0071N_13933