Main data types and classes

The data served by the NBA consists of four main data types:

  • Specimen
  • Taxon
  • Multimedia
  • Geo

Additionally, the data type Metadata stores miscellaneous information about NBA settings. Each data type is modelled as an R6 class and therefore has its own members, such as fields and methods. Documentation for a specific class can be retrieved in the standard manner, e.g. ?Specimen. Every class of the data model can be instantiated. Each has the methods toJSONString and toList, which return the object's JSON representation and the object's data as a list, respectively.

API Client classes

Interaction with the API is handled by the API client classes:

  • SpecimenClient
  • TaxonClient
  • MultimediaClient
  • GeoClient
  • MetadataClient

By default, each client class is initialized to connect to the base URL http://api.biodiversitydata.nl/v2. For testing purposes, a different URL can be set; see ?SpecimenClient for details.
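The clients send their requests to endpoints below this base URL. The following rough sketch shows how endpoint URLs could be composed from a configurable base URL. It assumes the v2 service layout <base URL>/<data type>/<service>, which should be checked against the NBA documentation. The helper nba_endpoint is hypothetical and not part of the package.

```r
# Hypothetical helper (not part of the package): compose an NBA endpoint URL.
# Assumes the layout <base_url>/<datatype>/<service>; check the NBA documentation.
nba_endpoint <- function(datatype, service = "query",
                         base_url = "http://api.biodiversitydata.nl/v2") {
  paste(base_url, datatype, service, sep = "/")
}

# default base URL
nba_endpoint("specimen")
# "http://api.biodiversitydata.nl/v2/specimen/query"

# a different base URL, e.g. a hypothetical local test server
nba_endpoint("taxon", base_url = "http://localhost:8080/v2")
# "http://localhost:8080/v2/taxon/query"
```

Keeping the base URL as a parameter with a default mirrors the behaviour described above: the production URL is used unless a different one is supplied.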

Queries

Concept

With a SpecimenClient object (stored as client in the code below), the query endpoint for specimens can be reached via the query method. Query parameters can be specified as a list with named parameters. For instance, to query for specimen records with type status holotype and sex female, pass a named list as the queryParams argument of the query method:

# create a specimen client connected to the default base URL
client <- SpecimenClient$new()

# specify two query conditions
l <- list(identifications.typeStatus="holotype", sex="female")

# run query
res <- client$query(queryParams=l)

The query method returns an object of class Response, whose field content is of class QueryResult. The individual result items are stored in the resultSet of the QueryResult. Checking the class of a single item confirms that it is a Specimen object:

## [1] "Specimen" "R6"

Note that by default, the two conditions are combined with a logical AND. Query parameters passed as a list thus correspond to basic, human-readable queries. Some queries contain other logical operators or nested sub-queries; these can be specified in a QuerySpec object instead.
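A named list maps naturally onto a human-readable query string. The sketch below assumes that each name/value pair becomes one URL parameter of the <base URL>/specimen/query/ endpoint, and that the NBA combines these parameters with AND. The helper to_query_url is hypothetical: it only illustrates the idea and is not how the client is implemented.

```r
# Hypothetical helper: render a named list of query parameters as a
# human-readable NBA query URL. Assumes <base_url>/<datatype>/query/?name=value&...
to_query_url <- function(params, datatype = "specimen",
                         base_url = "http://api.biodiversitydata.nl/v2") {
  # percent-encode each value; the names are NBA field paths used verbatim
  values <- vapply(params,
                   function(v) URLencode(as.character(v), reserved = TRUE),
                   character(1))
  pairs <- paste0(names(params), "=", values)
  # join the pairs with "&"; the NBA combines the conditions with AND
  paste0(base_url, "/", datatype, "/query/?", paste(pairs, collapse = "&"))
}

l <- list(identifications.typeStatus = "holotype", sex = "female")
to_query_url(l)
# "http://api.biodiversitydata.nl/v2/specimen/query/?identifications.typeStatus=holotype&sex=female"
```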

Advanced queries

Using the QuerySpec object

Queries that use operators other than AND, or that nest query conditions, cannot be expressed by simply passing the query parameters as a list. Instead, such a query is modelled as a QuerySpec object, which captures the relationships between multiple query terms. Please also refer to the NBA QuerySpec documentation for more information.

A QuerySpec object usually consists of one or more QueryCondition objects, each specifying a query term. A QueryCondition object usually contains the fields field, operator, and value (see also ?QueryCondition), which can be specified in the constructor. For example, a query for records with a unitID equal to L.4304195 needs a QueryCondition with field unitID, operator EQUALS, and value L.4304195.

A QuerySpec object is then assembled from the QueryCondition(s), passed as a list.

Multiple query conditions can also be combined and nested. As an example, consider query conditions that select specimens of sex female, family Equidae, and rank Species.

To also include specimens of rank Subspecies, the rank condition can be combined with an additional condition using the method or. Records of either rank then match.
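The logic of this nested query can be mirrored locally on a small data frame. The records below are made up for illustration only. The columns are simplified stand-ins for the sex, family, and rank fields used in the query conditions. The NBA evaluates the real query server-side.

```r
# Made-up toy records (not NBA data); columns stand in for the sex,
# family and rank fields used in the query conditions
recs <- data.frame(
  sex    = c("female",  "female",     "male",    "female"),
  family = c("Equidae", "Equidae",    "Equidae", "Felidae"),
  rank   = c("Species", "Subspecies", "Species", "Species"),
  stringsAsFactors = FALSE
)

# conditions shared by both queries: sex AND family
base_cond <- recs$sex == "female" & recs$family == "Equidae"

# original query: sex AND family AND rank = Species
which(base_cond & recs$rank == "Species")
# 1

# extended query: sex AND family AND (rank = Species OR rank = Subspecies)
which(base_cond & (recs$rank == "Species" | recs$rank == "Subspecies"))
# 1 2
```

The parentheses around the OR are what make the rank alternatives a nested sub-condition. Without them, the AND terms would bind differently.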

Size of the query result set

By default, the NBA returns only the first 10 hits of a query. A query without any parameters, for instance, has many hits:

## [1] 35279515

but only the first 10 are returned in the resultSet:

length(res$content$resultSet)
## [1] 10

To increase the size of the resultSet, a size parameter can be passed to the constructor of a QuerySpec object. With size set to 1000, the query above returns its first 1000 records:

## [1] 1000
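Even 1000 records cover only a small fraction of the 35279515 hits reported above. The sketch below computes how many requests a from/size paging scheme would need. It assumes that a QuerySpec can also carry a from offset, which should be checked in the NBA QuerySpec documentation. It also assumes that the server does not cap the paging depth. page_offsets is a hypothetical helper.

```r
# Hypothetical helper: start offsets ("from" values) needed to page through
# `total` hits in pages of `size` records each (assumed from/size paging)
page_offsets <- function(total, size) {
  if (total <= 0) return(numeric(0))
  seq(0, total - 1, by = size)
}

page_offsets(2500, 1000)
# 0 1000 2000

# number of requests of size 1000 for the 35279515 hits reported above
length(page_offsets(35279515, 1000))
# 35280
```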

Operators

In the examples above, we searched for fields that exactly match a given string using the operator EQUALS, specified in the user-defined QueryCondition. For most fields, however, further matching operators are available, for example for partial matching or case-insensitive matching:

## [1] 0
## [1] 505
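As a rough local analogue, exact, case-insensitive, and prefix matching can be compared on a character vector. The helper functions below are hypothetical stand-ins named after the NBA operators EQUALS, EQUALS_IC, STARTS_WITH, and STARTS_WITH_IC. The NBA applies its operators server-side, so edge cases may behave differently.

```r
# Hypothetical local stand-ins for a few NBA string operators
op_equals         <- function(x, v) x == v
op_equals_ic      <- function(x, v) tolower(x) == tolower(v)
op_starts_with    <- function(x, v) startsWith(x, v)
op_starts_with_ic <- function(x, v) startsWith(tolower(x), tolower(v))

# made-up genus values, for illustration only
genus <- c("Phoenix", "Trachycarpus", "Phoenicophorium")

sum(op_equals(genus, "phoenix"))        # 0: case differs
sum(op_equals_ic(genus, "phoenix"))     # 1
sum(op_starts_with(genus, "phoen"))     # 0: case differs
sum(op_starts_with_ic(genus, "phoen"))  # 2
```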

The function get_field_info lists, for a given field of a given data type, which operators are allowed for that field. For the field identifications.defaultClassification.genus, for example, the allowed operators are:

##  [1] "EQUALS"             "NOT_EQUALS"         "EQUALS_IC"         
##  [4] "NOT_EQUALS_IC"      "CONTAINS"           "NOT_CONTAINS"      
##  [7] "IN"                 "NOT_IN"             "MATCHES"           
## [10] "NOT_MATCHES"        "STARTS_WITH"        "NOT_STARTS_WITH"   
## [13] "STARTS_WITH_IC"     "NOT_STARTS_WITH_IC"

The operator IN is often useful: it matches against multiple values given as a vector. The JSON representation of a QuerySpec with an IN condition on the genus field looks as follows:

## {
##   "conditions": [
##     {
##       "field": "identifications.defaultClassification.genus",
##       "operator": "IN",
##       "value": ["Phoenix", "Trachycarpus"]
##     }
##   ]
## }
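Locally, IN corresponds roughly to R's %in% operator, and NOT_IN to its negation. The following small analogue uses a made-up vector of genus names and is for illustration only.

```r
# Made-up genus values, for illustration only
genus  <- c("Phoenix", "Cocos", "Trachycarpus", "Phoenix")
wanted <- c("Phoenix", "Trachycarpus")

genus %in% wanted      # IN:     TRUE FALSE TRUE TRUE
!(genus %in% wanted)   # NOT_IN: FALSE TRUE FALSE FALSE
```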

For numeric and date fields, common comparison operators such as LT (less than), GT (greater than), and BETWEEN are available:

##  [1] "EQUALS"        "NOT_EQUALS"    "EQUALS_IC"     "NOT_EQUALS_IC"
##  [5] "LT"            "LTE"           "GT"            "GTE"          
##  [9] "BETWEEN"       "NOT_BETWEEN"   "IN"            "NOT_IN"
## [1] 521
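The comparison operators can be mirrored locally on a Date vector. The dates below are made up. Treating BETWEEN as inclusive of both bounds is an assumption made for this sketch and should be checked against the NBA documentation on operators.

```r
# Made-up dates, for illustration only
d     <- as.Date(c("1850-06-01", "1901-03-15", "1999-12-31"))
lower <- as.Date("1900-01-01")
upper <- as.Date("2000-01-01")

d < lower                 # LT:  TRUE FALSE FALSE
d >= lower                # GTE: FALSE TRUE TRUE
d >= lower & d <= upper   # BETWEEN (assumed inclusive): FALSE TRUE TRUE
```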

For more information, please also refer to the NBA documentation on operators.