Advanced search and citation of occurrences
Hannah L. Owens
Cory Merow
Brian Maitner
Jamie M. Kass
Vijay Barve
Robert Guralnick
2024-10-28
Source:vignettes/b_Advanced.Rmd
Advanced features
This vignette demonstrates more advanced features and customization available in occCite. We recommend you read vignette("Simple.Rmd", package = "occCite") first, if you have not already done so.
Loading data from previous GBIF searches
Querying GBIF can take quite a bit of time, especially for multiple species and/or well-known species. In this case, you may wish to access previously downloaded data sets from your computer by specifying the general location of your downloaded .zip files. occQuery will crawl through your specified GBIFDownloadDirectory to collect all the .zip files contained in that folder and its subfolders. It will then import the most recent downloads that match your taxon list. These GBIF data will be appended to a BIEN search, just as in a simple real-time search (if you chose BIEN as well as GBIF). checkPreviousGBIFDownload is TRUE by default, but if loadLocalGBIFDownload is TRUE, occQuery will ignore checkPreviousGBIFDownload. It is also worth noting that occCite does not currently support mixed data download sources. That is, you cannot do GBIF queries for some taxa, download previously prepared data sets for others, and load the rest from local data sets on your computer.
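The directory crawl described above can be pictured with a few lines of base R. This is only a minimal sketch under stated assumptions: listZipsNewestFirst is a hypothetical helper, not an occCite function, and it ranks files by modification time, which may differ from how occQuery decides which download is "most recent". The directory and file names are made up for the demonstration.

```r
# Hypothetical helper, not part of occCite: list every .zip under a
# directory (including subfolders), newest first by modification time.
listZipsNewestFirst <- function(dir) {
  zips <- list.files(dir, pattern = "\\.zip$", recursive = TRUE,
                     full.names = TRUE)
  zips[order(file.mtime(zips), decreasing = TRUE)]
}

# Demonstration on a throwaway directory with made-up file names
demoDir <- file.path(tempdir(), "gbifDemo")
dir.create(file.path(demoDir, "older"), recursive = TRUE,
           showWarnings = FALSE)
oldZip <- file.path(demoDir, "older", "download_A.zip")
newZip <- file.path(demoDir, "download_B.zip")
file.create(oldZip, newZip)

# Give the two files clearly different modification times
Sys.setFileTime(oldZip, as.POSIXct("2022-03-02 12:00:00", tz = "UTC"))
Sys.setFileTime(newZip, as.POSIXct("2024-06-20 12:00:00", tz = "UTC"))

zipsFound <- listZipsNewestFirst(demoDir)
basename(zipsFound)
```

The file in the subfolder is found because recursive = TRUE, which mirrors the statement that subfolders are searched as well.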
# Simple search
# GBIFLogin is assumed to exist already (e.g. created with GBIFLoginManager())
myOldOccCiteObject <- occQuery(x = "Protea cynaroides",
                               datasources = c("gbif", "bien"),
                               GBIFLogin = GBIFLogin,
                               GBIFDownloadDirectory =
                                 system.file('extdata/', package = 'occCite'),
                               checkPreviousGBIFDownload = TRUE)
Here is the result. Look familiar?
# GBIF search results
head(myOldOccCiteObject@occResults$`Protea cynaroides`$GBIF$OccurrenceTable)
## name longitude latitude coordinateUncertaintyInMeters day month
## 1 Protea cynaroides 18.43928 -33.95440 8 17 2
## 2 Protea cynaroides 22.12754 -33.91561 4 11 2
## 3 Protea cynaroides 18.43927 -33.95429 8 17 2
## 4 Protea cynaroides 18.43254 -34.29275 31 6 2
## 5 Protea cynaroides 18.42429 -34.02934 2167 10 2
## 6 Protea cynaroides 18.43529 -34.10545 2 8 2
## year datasetKey dataService
## 1 2022 50c9509d-22c7-4a22-a47d-8c48425ef4a7 GBIF
## 2 2022 50c9509d-22c7-4a22-a47d-8c48425ef4a7 GBIF
## 3 2022 50c9509d-22c7-4a22-a47d-8c48425ef4a7 GBIF
## 4 2022 50c9509d-22c7-4a22-a47d-8c48425ef4a7 GBIF
## 5 2022 50c9509d-22c7-4a22-a47d-8c48425ef4a7 GBIF
## 6 2022 50c9509d-22c7-4a22-a47d-8c48425ef4a7 GBIF
## datasetName
## 1 iNaturalist Research-grade Observations
## 2 iNaturalist Research-grade Observations
## 3 iNaturalist Research-grade Observations
## 4 iNaturalist Research-grade Observations
## 5 iNaturalist Research-grade Observations
## 6 iNaturalist Research-grade Observations
#The full summary
summary(myOldOccCiteObject)
##
## OccCite query occurred on: 20 June, 2024
##
## User query type: User-supplied list of taxa.
##
## Sources for taxonomic rectification: GBIF Backbone Taxonomy
##
##
## Taxonomic cleaning results:
##
## Input Name Best Match Taxonomic Databases w/ Matches
## 1 Protea cynaroides Protea cynaroides (L.) L. GBIF Backbone Taxonomy
##
## Sources for occurrence data: gbif, bien
##
## Species Occurrences Sources
## 1 Protea cynaroides (L.) L. 2334 17
##
## GBIF dataset DOIs:
##
## Species GBIF Access Date GBIF DOI
## 1 Protea cynaroides (L.) L. 2022-03-02 10.15468/dl.ztbx8c
Getting citation data works exactly the same way with previously downloaded data as it does with a fresh data set.
#Get citations
myOldOccCitations <- occCitation(myOldOccCiteObject)
print(myOldOccCitations)
## Writing 5 Bibtex entries ... OK
## Results written to file 'temp.bib'
## AFFOUARD A, JOLY A, LOMBARDO J, CHAMP J, GOEAU H, CHOUET M, GRESSE H, BONNET P (2023). Pl@ntNet observations. Version 1.8. Pl@ntNet. https://doi.org/10.15468/gtebaa. Accessed via GBIF on 2022-03-02.
## AFFOUARD A, JOLY A, LOMBARDO J, CHAMP J, GOEAU H, CHOUET M, GRESSE H, BOTELLA C, BONNET P (2023). Pl@ntNet automatically identified occurrences. Version 1.8. Pl@ntNet. https://doi.org/10.15468/mma2ec. Accessed via GBIF on 2022-03-02.
## Chamberlain, S., Barve, V., Mcglinn, D., Oldoni, D., Desmet, P., Geffert, L., Ram, K. (2024). rgbif: Interface to the Global Biodiversity Information Facility API. R package version 3.8.1. https://CRAN.R-project.org/package = rgbif.
## Chamberlain, S., Boettiger, C. (2017). R Python, and Ruby clients for GBIF species occurrence data. PeerJ PrePrints.
## Fatima Parker-Allie, Ranwashe F (2018). PRECIS. South African National Biodiversity Institute. https://doi.org/10.15468/rckmn2. Accessed via GBIF on 2022-03-02.
## iNaturalist contributors, iNaturalist (2024). iNaturalist Research-grade Observations. iNaturalist.org. https://doi.org/10.15468/ab3s5x. Accessed via GBIF on 2022-03-02.
## Maitner, B. (2023). . R package version 1.2.6. https://CRAN.R-project.org/package = BIEN.
## Missouri Botanical Garden,Herbarium. Accessed via BIEN on NA.
## MNHN, Chagnoux S (2024). The vascular plants collection (P) at the Herbarium of the Muséum national d'Histoire Naturelle (MNHN - Paris). Version 69.384. MNHN - Museum national d'Histoire naturelle. https://doi.org/10.15468/nc6rxy. Accessed via GBIF on 2022-03-02.
## MNHN. Accessed via BIEN on NA.
## naturgucker.de. naturgucker. https://doi.org/10.15468/uc1apo. Accessed via GBIF on 2022-03-02.
## Observation.org (2024). Observation.org, Nature data from around the World. https://doi.org/10.15468/5nilie. Accessed via GBIF on 2022-03-02.
## Owens, H., Merow, C., Maitner, B., Kass, J., Barve, V., Guralnick, R. (2024). occCite: Querying and Managing Large Biodiversity Occurrence Datasets. R package version 0.5.9. https://CRAN.R-project.org/package = occCite.
## Ranwashe F (2024). Botanical Database of Southern Africa (BODATSA): Botanical Collections. Version 1.25. South African National Biodiversity Institute. https://doi.org/10.15468/2aki0q. Accessed via GBIF on 2022-03-02.
## Rob Cubey (2022). Royal Botanic Garden Edinburgh Living Plant Collections (E). Royal Botanic Garden Edinburgh. https://doi.org/10.15468/bkzv1l. Accessed via GBIF on 2022-03-02.
## SANBI. Accessed via BIEN on NA.
## Senckenberg (2020). African Plants - a photo guide. https://doi.org/10.15468/r9azth. Accessed via GBIF on 2022-03-02.
## Taylor S (2019). G. S. Torrey Herbarium at the University of Connecticut (CONN). University of Connecticut. https://doi.org/10.15468/w35jmd. Accessed via GBIF on 2022-03-02.
## Team}, {.C. (2024). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/.
## Teisher J, Stimmel H (2024). Tropicos MO Specimen Data. Missouri Botanical Garden. https://doi.org/10.15468/hja69f. Accessed via GBIF on 2022-03-02.
## Tela Botanica. Carnet en Ligne. https://doi.org/10.15468/rydcn2. Accessed via GBIF on 2022-03-02.
## UConn. Accessed via BIEN on NA.
Note that you can also load multiple species using either a vector of species names or a phylogeny (provided you have previously downloaded data for all of the species of interest), and you can load occurrences from non-GBIF data sources (e.g. BIEN) in the same query.
Performing a Multi-Species Search
In addition to doing a simple, single-species search, you can also use occCite to search for and manage occurrence datasets for multiple species. You can either submit a vector of species names, or you can submit a phylogeny! The occCitation function will return a named list of citation tables in the case of multiple species.
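To make the idea of a named list of citation tables concrete, here is a minimal base R sketch. It assumes a mock structure: the list citationTables, its Citation column, and the species and dataset labels are all placeholders, not real occCite output, and the object returned by occCitation may be organized differently.

```r
# Mock stand-in for a named list of citation tables, one per species.
# Column names and values are placeholders, not real occCite output.
citationTables <- list(
  "Species A" = data.frame(Citation = c("Dataset 1", "Dataset 2"),
                           stringsAsFactors = FALSE),
  "Species B" = data.frame(Citation = "Dataset 2",
                           stringsAsFactors = FALSE)
)

# Number of cited datasets per species
nPerSpecies <- vapply(citationTables, nrow, integer(1))

# Stack the tables into one, recording the species each row came from
tagged <- Map(function(tbl, sp) {
  data.frame(Species = sp, tbl, stringsAsFactors = FALSE)
}, citationTables, names(citationTables))
combined <- do.call(rbind, unname(tagged))
rownames(combined) <- NULL

# Datasets cited for more than one species
shared <- unique(combined$Citation[duplicated(combined$Citation)])
```

Keeping the tables in a list named by species makes per-species reporting straightforward, while the stacked table is handier for finding datasets shared across species.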
occCite with a Phylogeny
Here is an example of how such a search is structured, using an unpublished phylogeny of billfishes.
library(ape)
#Get tree
treeFile <- system.file("extdata/Fish_12Tax_time_calibrated.tre", package='occCite')
phylogeny <- ape::read.nexus(treeFile)
tree <- ape::extract.clade(phylogeny, 22)
# Query databases for names
myPhyOccCiteObject <- studyTaxonList(x = tree,
                                     datasources = "GBIF Backbone Taxonomy")
## handled warning: Package taxize unavailable. Skipping taxonomic rectification.
## handled warning: Package taxize unavailable. Skipping taxonomic rectification.
## handled warning: Package taxize unavailable. Skipping taxonomic rectification.
# Query GBIF for occurrence data
myPhyOccCiteObject <- occQuery(x = myPhyOccCiteObject,
                               datasources = "gbif",
                               GBIFDownloadDirectory = system.file('extdata/', package = 'occCite'),
                               loadLocalGBIFDownload = TRUE,
                               checkPreviousGBIFDownload = FALSE)
## Warning in gbifRetriever(searchTaxa[[i]]): GBIF unreachable; please try again later.
## Warning in gbifRetriever(searchTaxa[[i]]): GBIF unreachable; please try again later.
## Warning in gbifRetriever(searchTaxa[[i]]): GBIF unreachable; please try again later.
# What does a multispecies query look like?
summary(myPhyOccCiteObject)
##
## OccCite query occurred on: 28 October, 2024
##
## User query type: User-supplied phylogeny.
##
## Sources for taxonomic rectification: GBIF Backbone Taxonomy
##
##
## Taxonomic cleaning results:
##
## Input Name Best Match
## 1 Tetrapturus_angustirostris Tetrapturus_angustirostris
## 2 Tetrapturus_belone Tetrapturus_belone
## 3 Tetrapturus_pfluegeri Tetrapturus_pfluegeri
## Taxonomic Databases w/ Matches
## 1 Not rectified.
## 2 Not rectified.
## 3 Not rectified.
##
## Sources for occurrence data: gbif
##
## Species Occurrences Sources
## 1 Tetrapturus_angustirostris 0 0
## 2 Tetrapturus_belone 0 0
## 3 Tetrapturus_pfluegeri 0 0
##
## GBIF dataset DOIs:
##
## Species GBIF Access Date GBIF DOI
## 1 Tetrapturus_angustirostris <NA> <NA>
## 2 Tetrapturus_belone <NA> <NA>
## 3 Tetrapturus_pfluegeri <NA> <NA>
When you have results for multiple species, as in this case, you can also plot the summary figures either for the whole search…
plot(myPhyOccCiteObject)
## Error in d.tbl[[i]]: subscript out of bounds
or you can plot the results by species!
## Error in d.tbl[[i]]: subscript out of bounds
And then you can print out the citations, separated by species (or not, but in this example, they’re separate).
#Get citations
myPhyOccCitations <- occCitation(myPhyOccCiteObject)
## Error in strsplit(occResults$GBIF$Metadata$modified, "T"): non-character argument
# Print citations as text with accession dates.
print(myPhyOccCitations, bySpecies = TRUE)
## Error: object 'myPhyOccCitations' not found
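The warnings above report that GBIF was unreachable, and the summary lists zero occurrences for every species, which is presumably why the plotting and citation calls stop with errors. In a script that should keep running when a query comes back empty, one option is to wrap such calls in tryCatch. The sketch below is an assumption-laden illustration: tryOrNull is a hypothetical helper, not part of occCite, and a bare strsplit call stands in for the failing step so that the example runs without the package or a network connection.

```r
# Hypothetical helper, not part of occCite: evaluate an expression and
# return NULL (with a message) instead of stopping on an error.
tryOrNull <- function(expr) {
  tryCatch(expr, error = function(e) {
    message("Skipped: ", conditionMessage(e))
    NULL
  })
}

# Stand-in for a failing call; strsplit() on a non-character input raises
# the same kind of "non-character argument" error shown above.
failed <- tryOrNull(strsplit(NULL, "T"))

# A call that succeeds is returned unchanged
ok <- tryOrNull(strsplit("2022-03-02T10:15:00", "T")[[1]][1])

is.null(failed)
ok
```

With occCite loaded, the same wrapper could be applied as tryOrNull(occCitation(myPhyOccCiteObject)), followed by a check for NULL before printing.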