Databases: Difference between revisions

    From Consumerium development wiki R&D Wiki
    (move introductory stuff to the introductory paragraph + expand the preferences of what we want to select for databases)
    m (Juboxi moved page Database to Databases: pluralization is rational as per content)
     
    (22 intermediate revisions by the same user not shown)
    Line 1: Line 1:
    This article is about choice of database models and implementations.  
    '''This article is about choice of database models and implementations.'''


    [[Copyleft]] free to modify and free in cost software is strongly preferred over other solutions. Minimal modifications required could be another preference as that means maintenance of the chosen solution is minimized in that aspect.
    ''' Types of databases of interest being evaluated'''
    * [[#Relational databases|Relational databases]] provided by an [[w:Relational Database Management System|RDBMS]] and queried with [[w:SQL|SQL]]. A proven technology, in use since the early 1970s.
    * [[#NoSQL databases|NoSQL]]-type databases ("Not Only SQL", or take your pick of the other proposed expansions). Variants include, among others:
    ** [[#Subject-predicate-object databases|Subject-predicate-object databases]] are implemented by [[w:graph databases|graph databases]], specialized native [[w:Triplestore|triplestore]]s and piggybacking solutions that use an RDBMS to store and query the triples and the networks they compose.
    ** [[#Graph databases|Graph databases]] would intuitively appear more advanced than semantic networks composed of RDF triples, but are not much different on the outside. Both jump through the same hoops but with different efficiency and grace.
    ** [[#Object databases|Object databases]] are old but on the rise with NoSQL-based thinking and modern needs such as leanness, real-time operation and scalability, for which the other solutions might be too limiting.


    '''Known types of databases'''
    All of these may be used to store semantic data though advantages and disadvantages vary depending on task at hand.
    * [[w:Relational database|Relational database]] provided by a [[w:RDBMS|RDBMS]]
    * Subject-predicate-object database are provided by [[w:Triplestore|Triplestore]]s , native or on top of RDBMS
    * [[w:Graph databases|Graph databases]] would intuitively appear more advanced than using RDF-triplet composed semantic networks.
    * [[w:Object database|Object database]]s were a supposed fad in the late 80's and early 90's as [[w:relational algebra]] based systems are quite old.


    == Relational database ==
    For more options and information see the '''[[w:Database model|Wikipedia article on database models]]'''.
    Together Consumerium and Consumium run all the 3 major free full fledged RDBMS there are: [[w:MariaDB]], [[w:MySQL]] and [[w:Postgresql]].
    ----


    === Relevant relational database powered software ===
    == Relational databases ==
    * [[MediaWiki]]s run on [[w:MySQL]] or the better and more ethical binary compatible drop-in replacement [[w:MariaDB]]. A recent fork by the original MySQL founders from MySQL.


    The Netherlands server serving the [[Consumium]] free social media run on MariaDB 10
    Relational databases store data in [[w:Table (database)|tabular form]]: [[w:column (database)|columns]] represent attributes of a predetermined type and each [[w:row (database)|row]] holds the values of one item. Relational databases are accessed mainly with [[w:SQL|SQL]] (Structured Query Language). The RDBMS translates each SQL query into [[w:relational algebra|relational algebra]], optimizes it, and evaluates the optimized expression to produce the result table with the columns and rows you requested.


    * https://d.consumium.org since 2013 and the rest mentioned on the landing page over there since 2016
    Together Consumerium and Consumium run all three major free, full-fledged RDBMSs:


    * [[w:MariaDB]], a binary compatible drop-in replacement for MySQL that provides some technical advantages and the warm feeling that this is a fork of MySQL by the original MySQL AB founders.
    * [[w:MySQL]], the most widely known of free databases powering this [[Development Wiki]]
    * [[w:PostgreSQL]] powering the https://media.consumium.org [[w:GNU MediaGoblin]]


    * Postgresql is also in use working as data storage for [[w:GNU MediaGoblin]] at https://media.consumium.org
    ----


    == Subject-predicate-object database ==
    == NoSQL databases ==
    {{Q|A '''NoSQL''' (originally referring to "non SQL" or "non relational") database provides a mechanism for storage and retrieval of data which is modeled in means other than the tabular relations used in relational databases.|Wikipedia|[[w:NoSQL|NoSQL]]}}
     
    {{Q|Honestly 'Not Only SQL' sounds best from what I've read.|Lowest Troll|[[w:NoSQL|NoSQL]]}}
    All the following database types can be considered variations of NoSQL.
    * [[#Subject-predicate-object databases|Subject-predicate-object databases]]
    * [[#Graph databases|Graph databases]]
    * [[#Object databases|Object databases]]
     
    ----
     
    === Subject-predicate-object databases ===
    Subject-predicate-object databases construct [[w:semantic]] networks from interlinked atomic units called [[w:triplet]]s, so they are not fundamentally different from graph databases in the functionality and utility they offer.


    These networks may be queried with a suitable query language such as [[w:SPARQL]] which in practice allows you to compose semantic queries.


    === Relevant subject-predicate-object database powered systems to interoperate with ===
    {{Q|'''SPARQL''' is a [[w:recursive acronym|recursive acronym]] and stands for '''SPARQL Protocol and RDF Query Language'''. It is an [[w:RDF query language|RDF query language]], that is, a [[w:Semantic_Query|semantic]] [[w:query language|query language]] for [[w:database|database]]s, able to retrieve and manipulate data stored in [[w:Resource Description Framework|Resource Description Framework (RDF)]] format.|Wikipedia|[[w:SPARQL|SPARQL]]}}
    * [[Semantic MediaWiki]]
     
    * [[DBpedia]]
    {{Q|A '''triplestore''' or '''RDF store''' is a purpose-built [[w:database|database]] for the storage and retrieval of [[w:Resource Description Framework#Overview|triples]] through [[w:Semantic Query|semantic queries]].|Wikipedia|[[w:Triplestore]]}}
    * [[Wikidata]]


    === Things to consider in selection of triplestore ===
    ==== Relevant subject-predicate-object database powered systems to interoperate with ====
    * '''[[Semantic MediaWiki]]''' is a system for entering and querying semantic data within MediaWiki, implemented as one or more extensions.
    * '''[[DBpedia]]''' is the original structured-data harvesting effort for MediaWiki wikis.
    * '''[[Wikidata]]''' is an effort by the [[Wikimedia Foundation]], running since 2012, to provide central storage for data items instead of manually replicating them across the various language versions.


    A [[w:triplestore]] maybe a native implementation from ground up or be standing on the shoulders of a standard RDBMS system where actual [[w:SQL]] is formulated by the interpreter and then queried from SQL. This probably has upsides and downsides.


    === Lists and comparisons of subject-predicate-object databases and SPARQL implementations ===
    ==== Things to consider in selection of triplestore ====
    {{Q|Some '''subject-predicate-object databases''' (also known as ''[[w:triplestore|triplestore]]s'') have been built as database engines from scratch, while others have been built on top of existing commercial relational database engines (e.g., SQL-based).|Wikipedia|[[w:list of subject-predicate-object databases|list of subject-predicate-object databases]]}}
     
    ==== Lists and comparisons of subject-predicate-object databases and SPARQL implementations ====
    * [[w:List of subject-predicate-object databases|Wikipedia's list of subject-predicate-object databases]]
    * [[w:List of SPARQL implementations|Wikipedia's list of SPARQL implementations]]


    == Graph database ==
    ----
     
    === Graph databases ===
    A [[w:graph database|graph database]] stores and queries [[w:Graph (abstract data type)|graphs]].
     
    These graphs can readily be stored as, and constructed from, RDF triples, so the two approaches overlap heavily in the functionality they offer, but query performance varies (see talk page for more).
     
    ''' Lists of graph databases '''
    * [[w:Graph_database#List_of_graph_databases|Wikipedia's list of graph databases]]


    == Object database ==
    ''' Free reading on graph databases '''
    * [[w:Object_database#Timeline|Wikipedia's chronological list of object databases]]
    * [https://neo4j.com/graph-databases-book/ Free Graph Databases book from the great O'Reilly] kindly provided by [[w:Neo4j]]
     
    ----
     
    === Object databases ===
    {{Q|An object database stores complex data and relationships between data directly, without mapping to relational rows and columns, and this makes them suitable for applications dealing with very complex data.|Wikipedia|[[w:Object_database#Comparison_with_RDBMSs|functional difference between object and relational databases]]}}
    ''' Lists of object databases '''
    * [[w:Object_database#Timeline|Wikipedia's list of object databases by publication date.]]
     
    ----
    == Useful resources for database related matters ==
    * [http://swat.cse.lehigh.edu/projects/lubm/ Lehigh University Benchmark] may be useful in evaluating [[w:semantic query|semantic query]] performance
    * [http://www.cambridgesemantics.com/semantic-university/sparql-by-example A very good SPARQL tutorial from Cambridge Semantics]

    Latest revision as of 10:53, 27 August 2016

    This article is about choice of database models and implementations.

    Types of databases of interest being evaluated

    • Relational databases provided by an RDBMS and queried with SQL. A proven technology, in use since the early 1970s.
    • NoSQL-type databases ("Not Only SQL", or take your pick of the other proposed expansions). Variants include, among others:
      • Subject-predicate-object databases are implemented by graph databases, specialized native triplestores and piggybacking solutions that use an RDBMS to store and query the triples and the networks they compose.
      • Graph databases would intuitively appear more advanced than semantic networks composed of RDF triples, but are not much different on the outside. Both jump through the same hoops but with different efficiency and grace.
      • Object databases are old but on the rise with NoSQL-based thinking and modern needs such as leanness, real-time operation and scalability, for which the other solutions might be too limiting.

    All of these may be used to store semantic data though advantages and disadvantages vary depending on task at hand.

    For more options and information see the Wikipedia article on database models.


    Relational databases

    Relational databases store data in tabular form: columns represent attributes of a predetermined type and each row holds the values of one item. Relational databases are accessed mainly with SQL (Structured Query Language). The RDBMS translates each SQL query into relational algebra, optimizes it, and evaluates the optimized expression to produce the result table with the columns and rows you requested.
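
    The division of labour described above can be sketched with Python's built-in sqlite3 module. SQLite is used here only because it ships with Python; it is not one of the databases named in this article, and the product table and its columns are hypothetical. The query states which rows are wanted, and the engine decides how to evaluate it.

```python
import sqlite3

# Hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product ("
    "id INTEGER PRIMARY KEY, name TEXT NOT NULL, price_cents INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO product (name, price_cents) VALUES (?, ?)",
    [("coffee", 450), ("tea", 300), ("cocoa", 520)],
)

QUERY = "SELECT name FROM product WHERE price_cents < ? ORDER BY name"


def names_cheaper_than(connection, limit_cents):
    """Return the names of products below a price limit, sorted by name."""
    rows = connection.execute(QUERY, (limit_cents,)).fetchall()
    return [name for (name,) in rows]


print(names_cheaper_than(conn, 500))

# Ask the engine how it plans to evaluate the same declarative query.
for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, (500,)):
    print(row[-1])
```

    The caller never says how to scan the table or sort the result; that choice is left to the query planner, which is the point the paragraph above makes about SQL being translated and optimized before execution.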

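    Together Consumerium and Consumium run all three major free, full-fledged RDBMSs:

    • MariaDB, a binary-compatible drop-in replacement for MySQL that provides some technical advantages and the warm feeling that this is a fork of MySQL by the original MySQL AB founders
    • MySQL, the most widely known of the free databases, powering this Development Wiki
    • PostgreSQL, powering the GNU MediaGoblin instance at https://media.consumium.org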


    NoSQL databases

    “A NoSQL (originally referring to "non SQL" or "non relational") database provides a mechanism for storage and retrieval of data which is modeled in means other than the tabular relations used in relational databases.” (Wikipedia, NoSQL)

    “Honestly 'Not Only SQL' sounds best from what I've read.” (Lowest Troll, NoSQL)

    All the following database types can be considered variations of NoSQL.

    • Subject-predicate-object databases
    • Graph databases
    • Object databases


    Subject-predicate-object databases

    Subject-predicate-object databases construct semantic networks from interlinked atomic units called triplets, so they are not fundamentally different from graph databases in the functionality and utility they offer.

    These networks may be queried with a suitable query language such as SPARQL, which in practice allows you to compose semantic queries.

    “SPARQL is a recursive acronym and stands for SPARQL Protocol and RDF Query Language. It is an RDF query language, that is, a semantic query language for databases, able to retrieve and manipulate data stored in Resource Description Framework (RDF) format.”

    “A triplestore or RDF store is a purpose-built database for the storage and retrieval of triples through semantic queries.”
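
    To make the idea of querying a network of triples more concrete, here is a minimal in-memory sketch in plain Python. It is not SPARQL and not a real triplestore; the facts, the predicate names and the helper functions match and products_made_in are all invented for illustration. A pattern with a wildcard plays roughly the role that a variable plays in a SPARQL query.

```python
# Each fact is a (subject, predicate, object) triple. All names are made up.
TRIPLES = [
    ("acme", "produces", "widget"),
    ("acme", "located_in", "finland"),
    ("globex", "produces", "gadget"),
    ("globex", "located_in", "sweden"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if (subject is None or s == subject)
        and (predicate is None or p == predicate)
        and (obj is None or o == obj)
    ]


def products_made_in(triples, country):
    """Join two patterns on the shared company.

    Roughly the shape of this SPARQL query (prefixes omitted):
      SELECT ?product WHERE {
        ?company :located_in :finland .
        ?company :produces ?product .
      }
    """
    companies = {s for (s, _, _) in match(triples, predicate="located_in", obj=country)}
    return sorted(o for (s, _, o) in match(triples, predicate="produces") if s in companies)


print(products_made_in(TRIPLES, "finland"))
```

    A real triplestore adds indexing, a query planner and a standard query language on top of this basic pattern-matching idea.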

    Relevant subject-predicate-object database powered systems to interoperate with

    • Semantic MediaWiki is a system for entering and querying semantic data within MediaWiki, implemented as one or more extensions.
    • DBpedia is the original structured-data harvesting effort for MediaWiki wikis.
    • Wikidata is an effort by the Wikimedia Foundation, running since 2012, to provide central storage for data items instead of manually replicating them across the various language versions.


    Things to consider in selection of triplestore

    “Some subject-predicate-object databases (also known as triplestores) have been built as database engines from scratch, while others have been built on top of existing commercial relational database engines (e.g., SQL-based).”
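
    The second approach in the quote, a triplestore layered on a relational engine, can be sketched as a single three-column table plus a function that turns a triple pattern into SQL. This is an illustrative assumption about how such a layer might look, not the design of any particular product; the table name triple and the helpers open_store, add and find are hypothetical, and SQLite stands in for the RDBMS.

```python
import sqlite3


def open_store():
    """Create an in-memory relational store with one table of triples."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triple ("
        "s TEXT NOT NULL, p TEXT NOT NULL, o TEXT NOT NULL, "
        "PRIMARY KEY (s, p, o))"
    )
    return conn


def add(conn, s, p, o):
    """Insert one triple; an exact duplicate is silently ignored."""
    conn.execute("INSERT OR IGNORE INTO triple (s, p, o) VALUES (?, ?, ?)", (s, p, o))


def find(conn, s=None, p=None, o=None):
    """Translate a triple pattern into SQL; None means 'any value'."""
    clauses, params = [], []
    for column, value in (("s", s), ("p", p), ("o", o)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "SELECT s, p, o FROM triple"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY s, p, o"
    return conn.execute(sql, params).fetchall()


store = open_store()
add(store, "acme", "produces", "widget")
add(store, "acme", "located_in", "finland")
add(store, "acme", "produces", "widget")  # duplicate, ignored
print(find(store, s="acme"))
```

    One trade-off to weigh when selecting a triplestore follows directly from this sketch: a layered store inherits the maturity of the underlying RDBMS, but every multi-pattern query becomes a self-join on one large table, which a native engine may handle differently.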

    Lists and comparisons of subject-predicate-object databases and SPARQL implementations

    • Wikipedia's list of subject-predicate-object databases
    • Wikipedia's list of SPARQL implementations


    Graph databases

    A graph database stores and queries graphs.

    These graphs can readily be stored as, and constructed from, RDF triples, so the two approaches overlap heavily in the functionality they offer, but query performance varies (see talk page for more).
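
    As a rough illustration of how a graph can be constructed from triples, the sketch below builds an adjacency list from a handful of made-up triples and runs a breadth-first traversal over it. The data and the function names build_graph and reachable are hypothetical; a graph database would keep a structure like this on disk and index it so that traversals stay fast.

```python
from collections import defaultdict, deque


def build_graph(triples):
    """Turn (subject, predicate, object) triples into an adjacency list.

    Each subject maps to a list of (predicate, object) edges.
    """
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph


def reachable(graph, start):
    """Return every node reachable from start by following edges (BFS)."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for _, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    seen.discard(start)
    return seen


# Made-up supply chain: d supplies a, a supplies b, b supplies c.
edges = [("a", "supplies", "b"), ("b", "supplies", "c"), ("d", "supplies", "a")]
graph = build_graph(edges)
print(sorted(reachable(graph, "a")))
```

    The same traversal over a table of triples would need one self-join per hop, which is one plausible source of the performance difference mentioned above.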

    Lists of graph databases

    • Wikipedia's list of graph databases

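    Free reading on graph databases

    • Free Graph Databases book from the great O'Reilly, kindly provided by Neo4j (https://neo4j.com/graph-databases-book/)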


    Object databases

    “An object database stores complex data and relationships between data directly, without mapping to relational rows and columns, and this makes them suitable for applications dealing with very complex data.”
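
    The contrast with relational rows and columns can be hinted at with Python's standard shelve module, which persists whole Python objects under string keys. shelve is only a stand-in here, since it offers no queries, transactions or schema evolution, and the Supplier and Product classes are invented for illustration. The point is that a nested object is stored and read back as it is, with no mapping layer.

```python
import os
import shelve
import tempfile
from dataclasses import dataclass, field


@dataclass
class Supplier:
    name: str
    country: str


@dataclass
class Product:
    name: str
    suppliers: list = field(default_factory=list)


def save_and_load(product):
    """Store a nested object, reopen the store, and read the object back."""
    with tempfile.TemporaryDirectory() as folder:
        path = os.path.join(folder, "objects")
        with shelve.open(path) as db:
            db[product.name] = product  # the whole object graph is stored
        with shelve.open(path) as db:
            return db[product.name]


original = Product("widget", [Supplier("acme", "finland")])
restored = save_and_load(original)
print(restored.suppliers[0].country)
```

    In a relational design the same data would typically be split across a product table, a supplier table and a link table, and reassembled with joins on every read.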

    Lists of object databases

    • Wikipedia's list of object databases by publication date


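    Useful resources for database related matters

    • Lehigh University Benchmark (http://swat.cse.lehigh.edu/projects/lubm/), which may be useful in evaluating semantic query performance
    • A very good SPARQL tutorial from Cambridge Semantics (http://www.cambridgesemantics.com/semantic-university/sparql-by-example)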