Install Virtuoso if it is not already present; the virtuoso package provides vos_install() for this.

Tabular Data

We start up our Virtuoso server, wait for it to come up, and then connect:
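A minimal sketch of this step, assuming the virtuoso package's vos_start() and vos_connect() helpers with their default arguments, and assuming the resulting connection is the `con` object used in the calls that follow:

```r
library(virtuoso)

vos_start()            # launch the local Virtuoso server process
con <- vos_connect()   # open a connection; reused by vos_import() and vos_query()
```

The defaults are assumed to point at a local server; a non-default installation may need different connection settings.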

We can represent any data as RDF with a little care. For instance, consider the nycflights13 data. First, every primary or foreign key in every table must be represented as a URI, marked by a prefix, rather than as a bare string, so that a foreign key in one table resolves to the subject of the matching row in another:

uri_flights <- flights %>% 
  mutate(tailnum = paste0("planes:", tailnum),
         carrier = paste0("airlines:", carrier))

We write the data.frames out as nquads. Recall that each cell of a data.frame can be represented as a triple, in which the column is the predicate, the primary key (or row number) the subject, and the cell value the object. We turn column names and primary keys into URIs using a prefix based on the table name.

write_nquads(airlines, "airlines.nq", key = "carrier", prefix = "airlines:")
write_nquads(planes, "planes.nq", key = "tailnum", prefix = "planes:")
# No key given for flights, so row numbers serve as subjects
write_nquads(uri_flights, "flights.nq", prefix = "flights:")
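To make the cell-to-triple mapping concrete, here is a small base-R sketch. The helper `cells_to_triples()` and the toy table are hypothetical and only illustrate the idea; this is not how write_nquads() is implemented, and it may differ in details such as how the key column itself is handled.

```r
# Toy table standing in for nycflights13::airlines
airlines_toy <- data.frame(
  carrier = c("AA", "DL"),
  name    = c("American Airlines Inc.", "Delta Air Lines Inc."),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one triple per non-key cell.
#   subject   = prefix + primary key of the row
#   predicate = prefix + column name
#   object    = cell value
cells_to_triples <- function(df, key, prefix) {
  value_cols <- setdiff(names(df), key)
  do.call(rbind, lapply(value_cols, function(col) {
    data.frame(
      subject   = paste0(prefix, df[[key]]),
      predicate = paste0(prefix, col),
      object    = as.character(df[[col]]),
      stringsAsFactors = FALSE
    )
  }))
}

triples <- cells_to_triples(airlines_toy, key = "carrier", prefix = "airlines:")
print(triples)
```

Each row of `triples` pairs a subject such as `airlines:AA` with the predicate `airlines:name` and the corresponding cell value. This is why a `carrier` value of `airlines:AA` in the flights table can later be matched against the same subject.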

We’re ready to import all these triples. This may take a few minutes:

system.time(
  vos_import(con, c("flights.nq", "planes.nq", "airlines.nq"))
)
#>    user  system elapsed 
#>   0.047   0.009 133.654

The data from all three tables are now combined into a single triplestore graph, with one triple for each data point. Rather than joining tables, we can write a SPARQL query that names the columns we want.
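As a sketch of such a query, the following asks for a few columns drawn from all three tables. It assumes that the predicates take the form `<prefix:column>`, following the prefixes passed to write_nquads(), and that `con` is the open connection used for the import; the choice of columns is only illustrative.

```r
# Predicate names assume the "flights:", "planes:" and "airlines:" prefixes
# used when the tables were written out
query <- "
SELECT ?carrier ?name ?manufacturer ?model ?dep_delay
WHERE {
  ?flight  <flights:tailnum>     ?tailnum .
  ?flight  <flights:carrier>     ?carrier .
  ?flight  <flights:dep_delay>   ?dep_delay .
  ?tailnum <planes:manufacturer> ?manufacturer .
  ?tailnum <planes:model>        ?model .
  ?carrier <airlines:name>       ?name
}"

# Run only when the connection is available
if (exists("con")) {
  df <- vos_query(con, query)
  head(df)
}
```

The shared variables `?tailnum` and `?carrier` do the work of the joins: the object of a flights triple is the subject of the matching planes or airlines triples.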

List Data

We can also transform JSON (or list data) into triples. In this case, we have a large JSON blob (or R list) containing metadata on all rOpenSci packages:

download.file("https://raw.githubusercontent.com/ropensci/roregistry/gh-pages/raw_cm.json", "raw_cm.json")
nq <- jsonld::jsonld_to_rdf("raw_cm.json") # drops implicit URIs if not base URIs
writeLines(nq, gzfile("ro.nq.gz"))

And bulk-import the result:

vos_import(con, "ro.nq.gz")

Find all packages where “Carl Boettiger” is an “author”, and return: package name, license, and co-author surnames:

query <-
"PREFIX schema: <http://schema.org/>
SELECT DISTINCT ?coauthor  ?license ?package 
 WHERE {
 ?s schema:name ?package ;
    schema:author ?author ;
    schema:license ?license ;
    schema:author ?coauth .
 ?author schema:givenName 'Carl' .
 ?author schema:familyName 'Boettiger' .
 ?coauth schema:familyName ?coauthor
}"

vos_query(con, query) %>% distinct() %>%
  # Tidy up URIs into names; note that basename() also trims any package title containing "/"
  mutate(license = basename(license), package = basename(package))
#>       coauthor      license
#> 1    Boettiger          MIT
#> 2         Hart      CC0-1.0
#> 3         Lapp BSD-3-Clause
#> 4          Vos BSD-3-Clause
#> 5          Ram      CC0-1.0
#> 6    Boettiger BSD-3-Clause
#> 7    Boettiger          MIT
#> 8    Boettiger      GPL-3.0
#> 9    Boettiger          MIT
#> 10   Boettiger      GPL-3.0
#> 11   Boettiger          MIT
#> 12   Boettiger          MIT
#> 13   Boettiger      CC0-1.0
#> 14   Boettiger      CC0-1.0
#> 15   Boettiger          MIT
#> 16      Salmon      GPL-3.0
#> 17       Jones          MIT
#> 18 Chamberlain BSD-3-Clause
#> 19 Chamberlain          MIT
#> 20 Chamberlain          MIT
#> 21 Chamberlain      CC0-1.0
#> 22 Chamberlain      CC0-1.0
#> 23 Chamberlain          MIT
#> 24  Shumelchyk BSD-3-Clause
#> 25   Boettiger          MIT
#> 26 Chamberlain          MIT
#> 27         Zhu          MIT
#> 28        Jahn          MIT
#> 29   Boettiger          MIT
#> 30         Ram          MIT
#> 31 Temple Lang      CC0-1.0
#> 32  Wainwright      CC0-1.0
#> 33         Ram          MIT
#> 34   Boettiger          MIT
#> 35        Dyck          MIT
#> 36   Boettiger      CC0-1.0
#> 37       Harte      CC0-1.0
#> 38 Chamberlain      CC0-1.0
#> 39         Ram      CC0-1.0
#> 40   Boettiger          MIT
#> 41         Ram          MIT
#> 42 Chamberlain          MIT
#> 43   Boettiger          MIT
#> 44   Boettiger      CC0-1.0
#> 45 Temple Lang      CC0-1.0
#>                                                                   package
#> 1                                emld: Ecological Metadata as Linked Data
#> 2                                 rfigshare: An R Interface to 'figshare'
#> 3                                                O for the 'NeXML' Format
#> 4                                                O for the 'NeXML' Format
#> 5                                 rfigshare: An R Interface to 'figshare'
#> 6                                                O for the 'NeXML' Format
#> 7                 arkdb: Archive and Unarchive Databases Using Flat Files
#> 8                  codemetar: Generate 'CodeMeta' Metadata for R Packages
#> 9  EML: Create and Manipulate Data using the Ecological Metadata Language
#> 10                 piggyback: Managing Larger Data on a GitHub Repository
#> 11                    rdflib: Tools to Manipulate and Query Semantic Data
#> 12                                  rdryad: Access for Dryad Web Services
#> 13                                rfigshare: An R Interface to 'figshare'
#> 14                                   rfishbase: R Interface to 'FishBase'
#> 15                             virtuoso: Interface to Virtuoso using ODBC
#> 16                 codemetar: Generate 'CodeMeta' Metadata for R Packages
#> 17 EML: Create and Manipulate Data using the Ecological Metadata Language
#> 18                                               O for the 'NeXML' Format
#> 19                        rcrossref: Client for Various 'CrossRef' 'APIs'
#> 20                                  rdryad: Access for Dryad Web Services
#> 21                                rfigshare: An R Interface to 'figshare'
#> 22                                   rfishbase: R Interface to 'FishBase'
#> 23                 rplos: Interface to the Search API for 'PLoS' Journals
#> 24                                               O for the 'NeXML' Format
#> 25           datasauce: Create and manipulate Schema.org Dataset metadata
#> 26           datasauce: Create and manipulate Schema.org Dataset metadata
#> 27                        rcrossref: Client for Various 'CrossRef' 'APIs'
#> 28                        rcrossref: Client for Various 'CrossRef' 'APIs'
#> 29                        rcrossref: Client for Various 'CrossRef' 'APIs'
#> 30                        rcrossref: Client for Various 'CrossRef' 'APIs'
#> 31                                   rfishbase: R Interface to 'FishBase'
#> 32                                   rfishbase: R Interface to 'FishBase'
#> 33      rfisheries: Programmatic Interface to the 'openfisheries.org' API
#> 34      rfisheries: Programmatic Interface to the 'openfisheries.org' API
#> 35      rfisheries: Programmatic Interface to the 'openfisheries.org' API
#> 36          rgpdd: R Interface to the Global Population Dynamics Database
#> 37          rgpdd: R Interface to the Global Population Dynamics Database
#> 38          rgpdd: R Interface to the Global Population Dynamics Database
#> 39          rgpdd: R Interface to the Global Population Dynamics Database
#> 40                 rplos: Interface to the Search API for 'PLoS' Journals
#> 41                 rplos: Interface to the Search API for 'PLoS' Journals
#> 42                      taxview: Tools for Vizualizing Data Taxonomically
#> 43                      taxview: Tools for Vizualizing Data Taxonomically
#> 44 treebase: Discovery, Access and Manipulation of 'TreeBASE' Phylogenies
#> 45 treebase: Discovery, Access and Manipulation of 'TreeBASE' Phylogenies