About the package
We are developing taxize as a package that lets users search many online data sources for species names (scientific and common), download upstream and downstream taxonomic hierarchy information, and more.
The functions in the package that hit a specific API have a prefix and suffix separated by an underscore. They follow the format of service_whatitdoes. For example, gnr_resolve uses the Global Names Resolver API to resolve species names. General functions in the package that don't hit a specific API don't have two words separated by an underscore, e.g., classification.
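The naming convention can be illustrated with a small base R sketch. The helper split_fn_name below is hypothetical (it is not part of taxize); it only splits a function name at the underscore to show which part names the service and which part names the action.

```r
# Hypothetical helper, not part of taxize: split a function name of the
# form service_whatitdoes into its two parts.
split_fn_name <- function(fn) {
  parts <- strsplit(fn, "_", fixed = TRUE)[[1]]
  if (length(parts) < 2) {
    # General functions such as classification() have no service prefix
    return(list(service = NA_character_, action = fn))
  }
  list(service = parts[1], action = paste(parts[-1], collapse = "_"))
}

split_fn_name("gnr_resolve")     # service "gnr", action "resolve"
split_fn_name("classification")  # no service prefix, service is NA
```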
You need API keys for Encyclopedia of Life (EOL), the Universal Biological Indexer and Organizer (uBio), Tropicos, and Plantminer.
Installing taxize
A stable version is now available on CRAN.
# Stable version from CRAN
install.packages('taxize')
library(taxize)
# or install the development version from our GitHub repo.
install.packages('devtools')
library(devtools)
install_github('taxize_', 'ropensci')
library(taxize)
Phylomatic - get a phylogeny of plant species
library(doMC)
registerDoMC(cores=4)
# input the taxonomic names
taxa <- c("Poa annua", "Abies procera", "Helianthus annuus")
# fetch the tree - the formatting of names and higher taxonomy is done within the function
tree <- phylomatic_tree(taxa = taxa, get = "POST", informat = "newick",
                        method = "phylomatic", storedtree = "R20120829",
                        taxaformat = "slashpath", outformat = "newick",
                        clean = "true")
# plot the tree
plot(tree)
Integrated Taxonomic Information System (ITIS)
Search by term and search type (Note: TSN stands for Taxonomic Serial Number)
searchbyscientificname("Tardigrada")
          combinedname    tsn
1   Rotaria tardigrada  58274
2 Notommata tardigrada  58898
3  Pilargis tardigrada  65562
4           Tardigrada 155166
5     Heterotardigrada 155167
6     Arthrotardigrada 155168
7       Mesotardigrada 155358
8         Eutardigrada 155362
9  Scytodes tardigrada 866744
Show higher taxonomy for a given TSN
classification(685566, ID = "tsn")
      parentName parentTsn  rankName        taxonName    tsn
1                            Kingdom         Animalia 202423
2       Animalia    202423    Phylum         Chordata 158852
3       Chordata    158852 Subphylum       Vertebrata 331030
4     Vertebrata    331030     Class         Amphibia 173420
5       Amphibia    173420     Order          Caudata 173584
6        Caudata    173584    Family   Plethodontidae 173631
7 Plethodontidae    173631 Subfamily   Plethodontinae 550197
8 Plethodontinae    550197     Genus        Plethodon 173648
9      Plethodon    173648   Species Plethodon asupak 685566
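Once a classification is in hand, a common next step is to pull out the name at one rank. The sketch below works offline on a hand-built copy of the output above; the column names are taken from that output, the column types are an assumption, and taxon_at_rank is a hypothetical helper rather than a taxize function.

```r
# Mock of the classification() output shown above (ITIS rows for TSN 685566);
# the real call needs network access.
cl <- data.frame(
  rankName  = c("Kingdom", "Phylum", "Subphylum", "Class", "Order",
                "Family", "Subfamily", "Genus", "Species"),
  taxonName = c("Animalia", "Chordata", "Vertebrata", "Amphibia", "Caudata",
                "Plethodontidae", "Plethodontinae", "Plethodon",
                "Plethodon asupak"),
  tsn       = c(202423, 158852, 331030, 173420, 173584,
                173631, 550197, 173648, 685566),
  stringsAsFactors = FALSE
)

# Hypothetical helper: look up the name recorded at a given rank
taxon_at_rank <- function(x, rank) x$taxonName[x$rankName == rank]

taxon_at_rank(cl, "Family")  # "Plethodontidae"
```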
Get a TSN from a species name
tsn <- get_tsn(searchterm = "Quercus_douglasii", searchtype = "sciname")
# use classification with get_tsn together:
classification(tsn)
$`1`
        parentName parentTsn      rankName         taxonName    tsn
 1                                 Kingdom           Plantae 202422
 2         Plantae    202422    Subkingdom    Viridaeplantae 846492
 3  Viridaeplantae    846492  Infrakingdom      Streptophyta 846494
 4    Streptophyta    846494      Division      Tracheophyta 846496
 5    Tracheophyta    846496   Subdivision   Spermatophytina 846504
 6 Spermatophytina    846504 Infradivision      Angiospermae 846505
 7    Angiospermae    846505         Class     Magnoliopsida  18063
 8   Magnoliopsida     18063    Superorder           Rosanae 846548
 9         Rosanae    846548         Order           Fagales  19273
10         Fagales     19273        Family          Fagaceae  19275
11        Fagaceae     19275         Genus           Quercus  19276
12         Quercus     19276       Species Quercus douglasii  19322
NCBI Taxonomy Browser
## Get Unique Identifier
uids <- get_uid(c("Chironomus riparius", "Chaetopteryx"))
## And retrieve classification
out <- classification(uids)
lapply(out, head)
[[1]]
      ScientificName         Rank    UID
1 cellular organisms      no rank 131567
2          Eukaryota superkingdom   2759
3       Opisthokonta      no rank  33154
4            Metazoa      kingdom  33208
5          Eumetazoa      no rank   6072
6          Bilateria      no rank  33213
[[2]]
      ScientificName         Rank    UID
1 cellular organisms      no rank 131567
2          Eukaryota superkingdom   2759
3       Opisthokonta      no rank  33154
4            Metazoa      kingdom  33208
5          Eumetazoa      no rank   6072
6          Bilateria      no rank  33213
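Both taxa share the same first six lineage entries. A minimal offline sketch of finding the taxa common to every lineage follows; it assumes each list element is a data.frame with the columns shown above and that rows run from root to tip. The mock data only contains the six rows printed by head, so it is illustrative rather than a full lineage.

```r
# Mock of two classification() results, limited to the rows shown above
lineage <- data.frame(
  ScientificName = c("cellular organisms", "Eukaryota", "Opisthokonta",
                     "Metazoa", "Eumetazoa", "Bilateria"),
  Rank = c("no rank", "superkingdom", "no rank",
           "kingdom", "no rank", "no rank"),
  UID  = c("131567", "2759", "33154", "33208", "6072", "33213"),
  stringsAsFactors = FALSE
)
out <- list(lineage, lineage)

# UIDs present in every lineage
shared <- Reduce(intersect, lapply(out, function(x) x$UID))

# Deepest shared taxon, assuming rows are ordered from root to tip
deepest <- tail(lineage$ScientificName[lineage$UID %in% shared], 1)
deepest
```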
Global Names Resolver (GNR)
Get just the ids and names of the data sources in a data.frame
tail(gnr_datasources(todf = TRUE))
    id                                title
82 164                            BioLib.cz
83 165 Tropicos - Missouri Botanical Garden
84 166                                nlbif
85 167  The International Plant Names Index
86 168              Index to Organism Names
87 169                        uBio NameBank
Give me the id for EOL (Encyclopedia of Life)
out <- gnr_datasources(todf = TRUE)
out[out$title == "EOL", "id"]
[1] 12
## Fuzzy search for sources with the word "zoo"
out <- gnr_datasources(todf = TRUE)
outdf <- out[agrep("zoo", out$title, ignore.case = TRUE), ]
outdf[1:2, ]
    id             title
20 100 Mushroom Observer
25 105           ZooKeys
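Because agrep does approximate matching, the result can include titles that do not literally contain "zoo", such as "Mushroom Observer" above. A small offline sketch, using three titles taken from the outputs in this section, contrasts it with exact matching via grep:

```r
titles <- c("Mushroom Observer", "ZooKeys", "BioLib.cz")

# Approximate matching: tolerates small edits, so "zoo" can match "roo"
fuzzy <- agrep("zoo", titles, ignore.case = TRUE)

# Exact substring matching
exact <- grep("zoo", titles, ignore.case = TRUE)

titles[fuzzy]
titles[exact]  # only "ZooKeys" contains "zoo" literally
```

If the fuzzy hits are too loose for a given search, grep is the stricter choice.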
Resolve some names
### Search for _Helianthus annuus_ and _Homo sapiens_, return a data.frame
gnr_resolve(names = c("Helianthus annuus", "Homo sapiens"), returndf = TRUE)[1:2, ]
  data_source_id    submitted_name       name_string score    title
1              4 Helianthus annuus Helianthus annuus 0.988     NCBI
3             10 Helianthus annuus Helianthus annuus 0.988 Freebase
### Search for the same species, using only data source 12 (i.e., EOL)
gnr_resolve(names = c("Helianthus annuus", "Homo sapiens"), data_source_ids = "12", returndf = TRUE)
  data_source_id    submitted_name       name_string score title
1             12 Helianthus annuus Helianthus annuus 0.988   EOL
2             12      Homo sapiens      Homo sapiens 0.988   EOL
Tropicos
Get accepted names
head(tp_acceptednames(id = 25503923))
                   variable             value category
1                    NameId          25503923  Synonym
2            ScientificName       Aira pumila  Synonym
3 ScientificNameWithAuthors Aira pumila Pursh  Synonym
4                    Family           Poaceae  Synonym
5                    NameId          25509881 Accepted
6            ScientificName         Poa annua Accepted
Get synonyms
head(tp_synonyms(id = 25509881))
                   variable             value category
1                    NameId          25503923  Synonym
2            ScientificName       Aira pumila  Synonym
3 ScientificNameWithAuthors Aira pumila Pursh  Synonym
4                    Family           Poaceae  Synonym
5                    NameId          25509881 Accepted
6            ScientificName         Poa annua Accepted
uBio
Search uBio for elephant using ubio_namebank
ubio_namebank(searchName = "elephant", sci = 1, vern = 0)
  namebankID               nameString           fullNameString packageID packageName
1    6938660 Q2VyeWxvbiBlbGVwaGFudA== Q2VyeWxvbiBlbGVwaGFudA==        80 Cerylonidae
  basionymUnit rankID rankName
1      6938660     24  species
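The nameString and fullNameString values in the output above appear to be base64-encoded. A minimal sketch of decoding one of them follows; it assumes the jsonlite package is installed, which is not something this README requires, and other base64 decoders would work just as well.

```r
library(jsonlite)  # provides base64_dec()

encoded <- "Q2VyeWxvbiBlbGVwaGFudA=="  # nameString value from the output above

# base64_dec() returns raw bytes; rawToChar() turns them into a string
decoded <- rawToChar(base64_dec(encoded))
decoded  # "Cerylon elephant"
```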
head(ubio_namebank(searchName = "Helianthus annuus", sci = 1, vern = 0)[,-c(2, 3)])
  namebankID packageID packageName basionymUnit rankID   rankName
1    2658020       598   Asterales            0     24    species
2     462478       598   Asterales            0     24    species
3    9073865      5634  Asteraceae            0    444         fo
4    9073864      5634  Asteraceae            0    444         fo
5    8624412      5634  Asteraceae      8722942    444         fo
6    8291505      5634  Asteraceae            0    265 subspecies
Search for two plant species
out <- lapply(list("Helianthus debilis", "Astragalus aduncus"), function(x) ubio_namebank(searchName = x, sci = 1, vern = 0))
head(out[[2]][, -c(2, 3)]) # just Astragalus aduncus output
  namebankID packageID packageName basionymUnit rankID rankName
1    2843601       663     Rosales            0     24  species
2    9261962      5780    Fabaceae            0    261       SP
3    9261999      5780    Fabaceae            0    261       SP
4    9261975      5780    Fabaceae            0    261       SP
5     704148       663     Rosales            0     24  species
6    8374424      5780    Fabaceae       722833    444       fo
IUCN Red List
Query the Red List API
ia <- iucn_summary(c("Panthera uncia", "Lynx lynx"))
Extract Status
library(plyr)
laply(ia, function(x) x$status)
[1] "EN" "LC"
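The same extraction can be done in base R without plyr. The sketch below runs offline on a mock list; it assumes iucn_summary returns a list whose elements each carry a status field, which is what the laply call above relies on, and the element names are an assumption.

```r
# Mock with the shape the laply() call above relies on:
# a list whose elements each carry a `status` field.
ia <- list(
  "Panthera uncia" = list(status = "EN"),
  "Lynx lynx"      = list(status = "LC")
)

# vapply() checks that every element yields exactly one character value
status <- vapply(ia, function(x) x$status, character(1))
status
```

Unlike laply, vapply fails with an error if any element returns something other than a single string, which can surface a failed lookup early.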
Support and Bugs
For bugs, feature requests and other issues, please submit an issue via GitHub.
For general comments, email scott at ropensci.gmail.com. This package is part of the rOpenSci suite of R tools. For similar packages, visit ropensci.org.