GBIF Downloads

GBIF provides two ways to get occurrence data: through the /occurrence/search route (see occ_search() and occ_data()), or via the /occurrence/download route (many functions, see below). occ_search()/occ_data() are more appropriate for smaller data, while occ_download*() functions are more appropriate for larger data requests. Note that the download service is equivalent to downloading a dataset from the GBIF website - but doing it here makes it reproducible (and easier once you learn the ropes)!

The download functions are covered one at a time below, starting with the one that creates a request.

occ_download() is the function to start off with when using the GBIF download service. With it you can specify what query you want. Unfortunately, the interfaces to the search vs. download services are different, so we couldn’t make the rgbif interface to occ_search/occ_data the same as occ_download.

Be aware that you can only perform 3 downloads simultaneously, so plan wisely. To help with this limitation, we are working on a queue helper, but it’s not ready yet.

Let’s take a look at how to use the download functions:

Load rgbif

library("rgbif")

Kick off a download

Instead of passing parameters like taxonKey = 12345 as in occ_search, for downloads we pass each condition as a whole character string (for example "taxonKey = 12345"), because you can use operators other than = (equal to).

What occ_download returns is not the data itself! When you send the request to GBIF, they have to prepare it first, then when it’s done you can download it.

What it does return is metadata describing the download request, which you use to check when the download is done.
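
As a minimal sketch of this step, assuming the string-based query interface described above (newer rgbif releases may build queries differently, so check the documentation for your installed version) and assuming your GBIF credentials are stored in the GBIF_USER, GBIF_PWD and GBIF_EMAIL environment variables. The wrapper name start_download is made up for illustration and is not part of rgbif:

```r
# Hypothetical wrapper around occ_download(); needs the rgbif package,
# a GBIF account and network access when it is actually called.
start_download <- function(query = "taxonKey = 3119195") {
  res <- rgbif::occ_download(
    query,                            # one condition, passed as a string
    user  = Sys.getenv("GBIF_USER"),  # assumption: credentials live in env vars
    pwd   = Sys.getenv("GBIF_PWD"),
    email = Sys.getenv("GBIF_EMAIL")
  )
  print(res)      # metadata about the request, not the occurrence records
  invisible(res)  # keep this object: the later steps need it
}
```

Calling res <- start_download() submits the request and keeps the returned metadata for the status check in the next step.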

Check download status

After running occ_download, we can pass the resulting object to occ_download_meta, with the primary goal of checking the download status.

Continue running occ_download_meta until the Status value is SUCCEEDED or KILLED. If it is KILLED that means something went wrong - get in touch with us. If SUCCEEDED, then you can proceed to the next step (downloading the data with occ_download_get).
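
The "continue running until done" advice can be sketched as a small polling loop. This is an illustration under assumptions, not an rgbif function: wait_for_download, its arguments and the default interval are invented here, and the status lookup is passed in as a function so that the loop itself does not depend on a network connection. Only the two end states named above are treated as final, and max_checks stops the loop if neither is ever reached.

```r
# Hypothetical helper: poll until the status is SUCCEEDED or KILLED.
# get_status is any function that takes a download key and returns its status.
wait_for_download <- function(key, get_status, interval = 30, max_checks = 120) {
  for (i in seq_len(max_checks)) {
    status <- get_status(key)
    if (status %in% c("SUCCEEDED", "KILLED")) {
      return(status)
    }
    Sys.sleep(interval)  # wait before asking GBIF again
  }
  stop("download still not finished after ", max_checks, " status checks")
}

# With rgbif installed and a request object `res` from occ_download,
# the lookup could be (assuming the metadata has a status element):
# wait_for_download(res, function(k) rgbif::occ_download_meta(k)$status)
```

Passing the lookup as an argument is a design choice made for this sketch; it keeps the waiting logic separate from the GBIF call and easy to try out with a stand-in function.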

Before we go to the next step, there’s another function to help you out.

With occ_download_list you can get an overview of all your download requests:

x <- occ_download_list()
x$results <- tibble::as_tibble(x$results)
x
#> $meta
#>   offset limit endofrecords count
#> 1      0    20        FALSE   211
#>
#> $results
#> # A tibble: 20 x 18
#>                        key                    doi
#>  *                   <chr>                  <chr>
#>  1 0000796-171109162308116 doi:10.15468/dl.nv3r5p
#>  2 0000739-171109162308116 doi:10.15468/dl.jmachn
#>  3 0000198-171109162308116 doi:10.15468/dl.t5wjpe
#>  4 0000122-171020152545675 doi:10.15468/dl.yghxj7
#>  5 0000119-171020152545675 doi:10.15468/dl.qiowtc
#>  6 0000115-171020152545675 doi:10.15468/dl.tdbkzn
#>  7 0010067-170714134226665 doi:10.15468/dl.ro6qj1
#>  8 0010066-170714134226665 doi:10.15468/dl.bhekhi
#>  9 0010065-170714134226665 doi:10.15468/dl.xy4nfp
#> 10 0010064-170714134226665 doi:10.15468/dl.hsqp84
#> 11 0010062-170714134226665 doi:10.15468/dl.h2apik
#> 12 0010061-170714134226665 doi:10.15468/dl.1srstq
#> 13 0010059-170714134226665 doi:10.15468/dl.2me5hk
#> 14 0010058-170714134226665 doi:10.15468/dl.sjmxvf
#> 15 0010057-170714134226665 doi:10.15468/dl.f28182
#> 16 0010056-170714134226665 doi:10.15468/dl.4t2qim
#> 17 0010055-170714134226665 doi:10.15468/dl.lumz7s
#> 18 0010054-170714134226665 doi:10.15468/dl.wfkgqm
#> 19 0010053-170714134226665 doi:10.15468/dl.fintow
#> 20 0010050-170714134226665 doi:10.15468/dl.a2h9gu
#> # ... with 16 more variables: license <chr>, created <chr>, modified <chr>,
#> #   status <chr>, downloadLink <chr>, size <dbl>, totalRecords <int>,
#> #   numberDatasets <int>, request.creator <chr>, request.format <chr>,
#> #   request.notificationAddresses <list>, request.sendNotification <lgl>,
#> #   request.predicate.type <chr>, request.predicate.predicates <list>,
#> #   request.predicate.key <chr>, request.predicate.value <chr>
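
The status column of this listing can also help you stay within the limit of 3 simultaneous downloads. The following is a rough sketch: count_active is a made-up helper, not part of rgbif, and the demo data frame is a stand-in with invented keys for x$results. It counts requests in the PREPARING or RUNNING stage.

```r
# Hypothetical helper: how many listed requests are still in progress?
count_active <- function(results) {
  sum(results$status %in% c("PREPARING", "RUNNING"))
}

# Stand-in for x$results, with invented keys and only the columns used here.
demo <- data.frame(
  key = c("a", "b", "c"),
  status = c("SUCCEEDED", "RUNNING", "PREPARING"),
  stringsAsFactors = FALSE
)
count_active(demo)  # 2
```

With real data the call would be count_active(x$results). Keep in mind that the listing is paged, so requests beyond the returned rows are not counted.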

Canceling downloads

If for some reason you need to cancel a download you can do so with occ_download_cancel or occ_download_cancel_staged.

occ_download_cancel cancels a job by download key, while occ_download_cancel_staged cancels all jobs in PREPARING or RUNNING stage.
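
A short sketch of both variants, again assuming that credentials are available in the GBIF_USER and GBIF_PWD environment variables. The two wrapper names are hypothetical; only occ_download_cancel and occ_download_cancel_staged come from rgbif.

```r
# Hypothetical wrapper: cancel one job, identified by its download key.
cancel_one <- function(key) {
  rgbif::occ_download_cancel(
    key,
    user = Sys.getenv("GBIF_USER"),  # assumption: credentials live in env vars
    pwd  = Sys.getenv("GBIF_PWD")
  )
}

# Hypothetical wrapper: cancel every job that is still PREPARING or RUNNING.
cancel_all_staged <- function() {
  rgbif::occ_download_cancel_staged(
    user = Sys.getenv("GBIF_USER"),
    pwd  = Sys.getenv("GBIF_PWD")
  )
}
```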

Fetch data

After you see the SUCCEEDED status on calling occ_download_meta, you can then download the data using occ_download_get.

This only downloads the data to your machine; it does not read it into R. You can now move on to importing it into R.
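
The two steps, fetching and then importing, might look like the sketch below. The wrapper fetch_and_import is invented for illustration. occ_download_import is the rgbif function used here for the import step; it is not named in the text above, so treat its use as an assumption about how you want to read the file.

```r
# Hypothetical wrapper; needs the rgbif package and network access when called.
fetch_and_import <- function(key, path = tempdir()) {
  # Step 1: download the file for a SUCCEEDED request into `path`.
  dl <- rgbif::occ_download_get(key, path = path, overwrite = TRUE)
  # Step 2: read the downloaded file into R.
  rgbif::occ_download_import(dl)
}
```

Here key is the download key of a request whose status is SUCCEEDED.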

Citing download data

The nice thing about data retrieved via GBIF’s download service is that they provide DOIs for each download, so that you can give a link that resolves to the download with metadata on GBIF’s website. And it makes for a nice citation.

Using the function gbif_citation, we can get citations for our downloads from the output of occ_download_get or occ_download_meta.

With output from occ_download_meta, you’ll notice that the datasets slot is NULL, because at that point we don’t yet have any information about which datasets are in the download.

But if you use occ_download_get, you then have the individual datasets, and we can get citations for each individual dataset in addition to the entire download.

Here, we get the overall citation as well as citations (and data rights) for each dataset.
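
A hedged sketch of how the two cases could be handled in one place, assuming gbif_citation returns a list with download and datasets elements as described above. The wrapper show_citations is hypothetical, and x stands for the output of either occ_download_meta or occ_download_get.

```r
# Hypothetical wrapper; needs the rgbif package when it is called.
show_citations <- function(x) {
  cit <- rgbif::gbif_citation(x)
  print(cit$download)  # citation for the download as a whole
  if (is.null(cit$datasets)) {
    # Expected for occ_download_meta output, per the text above.
    message("No per-dataset citations available for this input.")
  } else {
    print(cit$datasets)  # citations for the individual datasets
  }
  invisible(cit)
}
```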

Please do cite the data you use from GBIF!