Spin up a download request for GBIF occurrence data.

occ_download(..., body = NULL, type = "and", format = "DWCA",
  user = NULL, pwd = NULL, email = NULL, curlopts = list())

occ_download_prep(..., body = NULL, type = "and", format = "DWCA",
  user = NULL, pwd = NULL, email = NULL, curlopts = list())

Arguments

...

One or more query arguments to kick off a download job. If you use this, don't use the body parameter. All inputs must be character strings. See Details.

body

If you prefer to pass in the payload yourself, use this parameter. If you use this, don't pass anything to the dots. Accepts either an R list or JSON. JSON is likely easier, since the JSON library jsonlite requires that you unbox strings that shouldn't be auto-converted to arrays, which is a bit tedious for large queries. Optional.

type

(character) One of equals (=), and (&), or (|), lessThan (<), lessThanOrEquals (<=), greaterThan (>), greaterThanOrEquals (>=), in, within, not (!), like

format

(character) The download format. One of DWCA (default), SIMPLE_CSV, or SPECIES_LIST

user

(character) User name within GBIF's website. Required. See Details.

pwd

(character) User password within GBIF's website. Required. See Details.

email

(character) Email address to receive the notification email when the download is done. Required. See Details.

curlopts

(list) Named curl options passed on to HttpClient. See curl_options for the available options.

Details

Arguments have to be passed as character strings (e.g., 'country = US'), with a space between the key ('country'), the operator ('='), and the value ('US'). See the type parameter for possible options for the operator. This character string is parsed internally.

The value can be comma-separated, in which case we'll turn that into a predicate combined with the OR operator. For example, "taxonKey = 2480946,5229208" will turn into

'{
   "type": "or",
   "predicates": [
     {
      "type": "equals",
      "key": "TAXON_KEY",
      "value": "2480946"
     },
     {
      "type": "equals",
      "key": "TAXON_KEY",
      "value": "5229208"
     }
   ]
}'
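
The parsing described above can be sketched in a few lines of base R. This is an illustration only, not the package's internal code: parse_predicate(), key_lookup, and op_lookup are hypothetical names, and the lookup tables cover just a small subset of the keys and operators.

```r
# Illustrative sketch only; not the parser the package uses internally.
# key_lookup, op_lookup and parse_predicate() are hypothetical names.
key_lookup <- c(taxonKey = "TAXON_KEY", country = "COUNTRY", year = "YEAR")
op_lookup <- c("=" = "equals", "<" = "lessThan", "<=" = "lessThanOrEquals",
  ">" = "greaterThan", ">=" = "greaterThanOrEquals")

parse_predicate <- function(x) {
  # split "key operator value" on whitespace
  parts <- strsplit(x, "\\s+")[[1]]
  stopifnot(length(parts) >= 3)
  key <- key_lookup[[parts[1]]]
  type <- op_lookup[[parts[2]]]
  # re-join the remainder in case the value contained spaces,
  # then split a comma-separated value into its individual values
  values <- strsplit(paste(parts[-(1:2)], collapse = " "), ",")[[1]]
  preds <- lapply(values, function(v) {
    list(type = type, key = key, value = trimws(v))
  })
  # a single value stays a plain predicate; several are combined with "or"
  if (length(preds) == 1) return(preds[[1]])
  list(type = "or", predicates = preds)
}

p <- parse_predicate("taxonKey = 2480946,5229208")
str(p)
```

A single value such as parse_predicate("country = US") returns one plain predicate rather than an "or" wrapper.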

Acceptable arguments to ... are:

  • taxonKey = 'TAXON_KEY'

  • scientificName = 'SCIENTIFIC_NAME'

  • country = 'COUNTRY'

  • publishingCountry = 'PUBLISHING_COUNTRY'

  • hasCoordinate = 'HAS_COORDINATE'

  • hasGeospatialIssue = 'HAS_GEOSPATIAL_ISSUE'

  • typeStatus = 'TYPE_STATUS'

  • recordNumber = 'RECORD_NUMBER'

  • lastInterpreted = 'LAST_INTERPRETED'

  • continent = 'CONTINENT'

  • geometry = 'GEOMETRY'

  • basisOfRecord = 'BASIS_OF_RECORD'

  • datasetKey = 'DATASET_KEY'

  • eventDate = 'EVENT_DATE'

  • catalogNumber = 'CATALOG_NUMBER'

  • year = 'YEAR'

  • month = 'MONTH'

  • decimalLatitude = 'DECIMAL_LATITUDE'

  • decimalLongitude = 'DECIMAL_LONGITUDE'

  • elevation = 'ELEVATION'

  • depth = 'DEPTH'

  • institutionCode = 'INSTITUTION_CODE'

  • collectionCode = 'COLLECTION_CODE'

  • issue = 'ISSUE'

  • mediatype = 'MEDIA_TYPE'

  • recordedBy = 'RECORDED_BY'

Note

See downloads for an overview of GBIF download methods.

geometry

When using the geometry parameter, make sure that your well-known text (WKT) is formatted as GBIF expects it. GBIF expects WKT to have a counter-clockwise winding order. For example, the following is clockwise: POLYGON((-19.5 34.1, -25.3 68.1, 35.9 68.1, 27.8 34.1, -19.5 34.1)), whereas GBIF expects the other order: POLYGON((-19.5 34.1, 27.8 34.1, 35.9 68.1, -25.3 68.1, -19.5 34.1)).

Note that coordinate pairs are given as longitude latitude: longitude first, then latitude.

You should not get any results if you supply WKT that has clockwise winding order.

Also note that occ_search()/occ_data() behave differently with respect to WKT: you can supply clockwise WKT to those functions, but they treat it as an exclusion, so you get all data not inside the WKT area.
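
One way to check winding order before submitting a download is to compute the signed area of the ring with the shoelace formula. The following is a minimal base R sketch, not part of the package: wkt_signed_area() and is_counter_clockwise() are hypothetical helpers, and they assume a simple POLYGON with a single closed ring, no holes, and planar treatment of the longitude/latitude pairs.

```r
# Hypothetical helpers; assume a single-ring "POLYGON((x y, x y, ...))".
wkt_signed_area <- function(wkt) {
  # strip everything up to "((" and from "))" onwards
  coords <- gsub("^[^(]*\\(\\(|\\)\\).*$", "", wkt)
  # split into "x y" pairs, then into numeric x and y
  pairs <- strsplit(trimws(strsplit(coords, ",")[[1]]), "\\s+")
  xy <- do.call(rbind, lapply(pairs, as.numeric))
  x <- xy[, 1]
  y <- xy[, 2]
  n <- length(x)
  # shoelace formula; the ring is closed, so the first point equals the last
  sum(x[-n] * y[-1] - x[-1] * y[-n]) / 2
}

# positive signed area means counter-clockwise (x = longitude, y = latitude)
is_counter_clockwise <- function(wkt) wkt_signed_area(wkt) > 0

cw <- "POLYGON((-19.5 34.1, -25.3 68.1, 35.9 68.1, 27.8 34.1, -19.5 34.1))"
ccw <- "POLYGON((-19.5 34.1, 27.8 34.1, 35.9 68.1, -25.3 68.1, -19.5 34.1))"
is_counter_clockwise(cw)
is_counter_clockwise(ccw)
```

The first polygon has a negative signed area (clockwise) and the second a positive one (counter-clockwise), matching the two examples given in this section.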

Methods

  • occ_download_prep: prepares a download request, but DOES NOT execute it. Meant for use with occ_download_queue()

  • occ_download: prepares a download request and DOES execute it

Authentication

For user, pwd, and email parameters, you can set them in one of three ways:

  • Set them in your .Rprofile file with the names gbif_user, gbif_pwd, and gbif_email

  • Set them in your .Renviron/.bash_profile (or similar) file with the names GBIF_USER, GBIF_PWD, and GBIF_EMAIL

  • Simply pass strings to each of the parameters in the function call

We strongly recommend the second option, storing your details as environment variables, as it's the most widely used way to store secrets.

See ?Startup for help.
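
As a sketch of the environment variable approach, the commented lines below show what entries in a .Renviron file might look like (the values are placeholders), followed by a small check of which variables are visible in the current session. gbif_creds_set() is a hypothetical helper, not a package function.

```r
# Entries you might add to ~/.Renviron (placeholder values, not real ones):
# GBIF_USER=your_username
# GBIF_PWD=your_password
# GBIF_EMAIL=you@example.org

# Hypothetical helper: report which of the three variables are set
gbif_creds_set <- function() {
  vars <- c("GBIF_USER", "GBIF_PWD", "GBIF_EMAIL")
  vals <- Sys.getenv(vars, unset = NA)
  # a variable counts as set only if it exists and is not an empty string
  stats::setNames(!is.na(vals) & nzchar(vals), vars)
}

gbif_creds_set()
```

Changes to .Renviron only take effect in a new R session, since the file is read at startup.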

Query length

GBIF has a limit of 12,000 characters for a download query. This means that you can have a pretty long query, but at some point it may lead to an error on GBIF's side, and you'll have to split your query into several smaller ones.
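
If a long list of taxon keys is what pushes a query over the limit, one option is to split the keys into groups and submit one download per group. The sketch below is a hypothetical base R helper, not part of the package. It budgets the length of the comma-separated string, and max_chars is an arbitrary illustrative value. Each comma-separated value expands into a full predicate in the request, so the request itself is considerably longer than the string, and the budget should be chosen well below the limit.

```r
# Hypothetical helper: group character taxon keys so that each
# comma-separated value string stays within max_chars characters.
chunk_keys <- function(keys, max_chars = 1000) {
  stopifnot(is.character(keys))
  groups <- list()
  current <- character(0)
  for (k in keys) {
    candidate <- paste(c(current, k), collapse = ",")
    if (length(current) > 0 && nchar(candidate) > max_chars) {
      # adding this key would exceed the budget: start a new group
      groups[[length(groups) + 1]] <- current
      current <- k
    } else {
      current <- c(current, k)
    }
  }
  if (length(current) > 0) groups[[length(groups) + 1]] <- current
  # one "taxonKey = k1,k2,..." query string per group
  vapply(groups, function(g) {
    paste("taxonKey =", paste(g, collapse = ","))
  }, character(1))
}

keys <- c("2480946", "5229208", "7264332", "3119195", "2343454")
queries <- chunk_keys(keys, max_chars = 15)
queries
```

Each element of queries could then be passed to occ_download() or occ_download_prep() as a separate request.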

References

See the API docs http://www.gbif.org/developer/occurrence#download for more info, and the predicates docs http://www.gbif.org/developer/occurrence#predicates

Examples

# NOT RUN {
# occ_download("basisOfRecord = LITERATURE")
# occ_download('taxonKey = 3119195')
# occ_download('decimalLatitude > 50')
# occ_download('elevation >= 9000')
# occ_download('decimalLatitude >= 65')
# occ_download("country = US")
# occ_download("institutionCode = TLMF")
# occ_download("catalogNumber = Bird.27847588")

# download format
# z <- occ_download('decimalLatitude >= 75', format = "SPECIES_LIST")

# res <- occ_download('taxonKey = 7264332', 'hasCoordinate = TRUE')

# pass output directly, or later, to occ_download_meta for more information
# occ_download('decimalLatitude > 75') %>% occ_download_meta

# Multiple queries
# occ_download('decimalLatitude >= 65', 'decimalLatitude <= -65', type="or")
# gg <- occ_download('depth = 80', 'taxonKey = 2343454', type="or")

# complex example with many predicates
# shows example of how to do date ranges for both year and month
# res <- occ_download(
#  "taxonKey = 2480946,5229208",
#  "basisOfRecord = HUMAN_OBSERVATION,OBSERVATION,MACHINE_OBSERVATION",
#  "country = US",
#  "hasCoordinate = true",
#  "hasGeospatialIssue = false",
#  "year >= 1999",
#  "year <= 2011",
#  "month >= 3",
#  "month <= 8"
# )

# Using body parameter - pass in your own complete query
## as JSON
query1 <- '{"creator":"sckott",
  "notification_address":["myrmecocystus@gmail.com"],
  "predicate":{"type":"and","predicates":[
    {"type":"equals","key":"TAXON_KEY","value":"7264332"},
    {"type":"equals","key":"HAS_COORDINATE","value":"TRUE"}]}
 }'
# res <- occ_download(body = query1, curlopts=list(verbose=TRUE))

## as a list
library(jsonlite)
query <- list(
  creator = unbox("sckott"),
  notification_address = "myrmecocystus@gmail.com",
  predicate = list(
    type = unbox("and"),
    predicates = list(
      list(type = unbox("equals"), key = unbox("TAXON_KEY"),
        value = unbox("7264332")),
      list(type = unbox("equals"), key = unbox("HAS_COORDINATE"),
        value = unbox("TRUE"))
    )
  )
)
# res <- occ_download(body = query, curlopts = list(verbose = TRUE))

# Prepared query
occ_download_prep("basisOfRecord = LITERATURE")
occ_download_prep("basisOfRecord = LITERATURE", format = "SIMPLE_CSV")
occ_download_prep("basisOfRecord = LITERATURE", format = "SPECIES_LIST")
# }