taxizedb - Tools for Working with Taxonomic Databases on your machine
Docs: https://docs.ropensci.org/taxizedb/
taxize is a heavily used taxonomic toolbelt package in R - However, it makes web requests for nearly all methods. That is fine for most cases, but when the user has many, many names it is much more efficient to do requests to a local SQL database.
Data sources
Not all taxonomic databases are publicly available, or possible to mash into a SQLized version. Taxonomic DB’s supported:
- NCBI: text files are provided by NCBI, which we stitch into a sqlite db
 
- ITIS: they provide a sqlite dump, which we use here
 
- The PlantList: created from stitching together csv files. this source is no longer updated as far as we can tell. they say they’ve moved focus to the World Flora Online
 
- Catalogue of Life: created from Darwin Core Archive dump.
 
- GBIF: created from Darwin Core Archive dump. right now we only have the taxonomy table (called gbif), but will add the other tables in the darwin core archive later
 
- Wikidata: aggregated taxonomy of Open Tree of Life, GLoBI and Wikidata. On Zenodo, created by Joritt Poelen of GLOBI.
 
- World Flora Online: http://www.worldfloraonline.org/
 
Update schedule for databases:
- NCBI: since 
db_download_ncbi creates the database when the function is called, it’s updated whenever you run the function 
- ITIS: since ITIS provides the sqlite database as a download, you can delete the old file and run 
db_download_itis to get a new dump; they I think update the dumps every month or so 
- The PlantList: no longer updated, so you shouldn’t need to download this after the first download. hosted on Amazon S3
 
- Catalogue of Life: a GitHub Actions job runs once a day at 00:00 UTC, building the lastest COL data into a SQLite database thats hosted on Amazon S3
 
- GBIF: a GitHub Actions job runs once a day at 00:00 UTC, building the lastest GBIF data into a SQLite database thats hosted on Amazon S3
 
- Wikidata: last updated April 6, 2018. Scripts are available to update the data if you prefer to do it yourself.
 
- World Flora Online: since 
db_download_wfo creates the database when the function is called, it’s updated whenever you run the function 
Links:
Get in touch in the issues with any ideas on new data sources.
All databases are SQLite.
 
Package API
This package for each data sources performs the following tasks:
- Downloaded taxonomic databases 
db_download_*
 
- Create 
dplyr SQL backend via dbplyr::src_dbi - src_*
 
- Query and get data back into a data.frame - 
sql_collect
 
- Manage cached database files - 
tdb_cache
 
- Retrieve immediate descendents of a taxon - 
children
 
- Retrieve the taxonomic hierarchies from local database - 
classification
 
- Retrieve all taxa descending from a vector of taxa - 
downstream
 
- Convert species names to taxon IDs - 
name2taxid
 
- Convert taxon IDs to species names - 
taxid2name
 
- Convert taxon IDs to ranks - 
taxid2rank
 
You can use the src connections with dplyr, etc. to do operations downstream. Or use the database connection to do raw SQL queries.
 
install
cran version
dev version