name2taxid() returns a vector and stops with an error if any of the names are ambiguous.
name2taxid_map() returns a data.frame mapping names to IDs.
name2taxid(x, db = "ncbi", verbose = TRUE, out_type = c("uid", "summary"), ...)
Argument | Description |
---|---|
x | (character) Vector of taxon names to look up in the given database |
db | (character) The database to search, one of ncbi, itis, gbif, wfo, or tpl |
verbose | (logical) Print verbose messages |
out_type | (character) "uid" for an ID vector, "summary" for a table with columns 'tax_id' and 'tax_name'. |
... | Additional arguments passed to database-specific classification functions. |
The NCBI taxonomy database includes common names, synonyms, and misspellings, but it is not fully consistent. For some species, such as Arabidopsis thaliana, the misspelling Arabidopsis_thaliana is included, but the same is NOT done for humans. Underscores are nevertheless supported when querying through entrez, as is done in taxize, which implies that entrez replaces underscores with spaces. So I do the same. A corner case arises when an organism uses underscores as part of the name itself, not just as a stand-in for a space ("haloarchaeon 3A1_DGR"). To handle this case, underscores are replaced with spaces ONLY if there are no spaces in the original name.
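The underscore rule described above can be illustrated with a small sketch. The helper name `normalize_taxon_name()` is hypothetical and is not part of the package; it only shows one way the rule could be written in base R, under the assumption that "no spaces in the original name" means no literal space character.

```r
# Hypothetical helper (not the package's actual code): replace underscores
# with spaces only for names that contain no space to begin with.
normalize_taxon_name <- function(x) {
  # TRUE for names that already contain at least one literal space
  has_space <- grepl(" ", x, fixed = TRUE)
  # Names without any space: treat underscores as stand-ins for spaces
  x[!has_space] <- gsub("_", " ", x[!has_space], fixed = TRUE)
  # Names with a space are returned unchanged, underscores included
  x
}

normalize_taxon_name(c("Arabidopsis_thaliana", "haloarchaeon 3A1_DGR"))
# "Arabidopsis thaliana" "haloarchaeon 3A1_DGR"
```

The first name has no spaces, so its underscore is converted. The second already contains a space, so its underscore is kept as part of the name.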
if (FALSE) {
  name2taxid(c('Arabidopsis thaliana', 'pig'))
  name2taxid(c('Arabidopsis thaliana', 'pig'), out_type="summary")
  name2taxid(x=c('Arabidopsis thaliana', 'Apis mellifera'), db = "itis")
  name2taxid(x=c('Arabidopsis thaliana', 'Apis mellifera'), db = "itis", out_type="summary")
  name2taxid(x=c('Arabidopsis thaliana', 'Quercus kelloggii'), db = "wfo")
  name2taxid(x=c('Arabidopsis thaliana', 'Quercus kelloggii'), db = "wfo", out_type="summary")
  name2taxid("Austrobaileyaceae", db = "wfo")
  name2taxid("Quercus kelloggii", db = "gbif")
  name2taxid(c("Quercus", "Fabaceae", "Animalia"), db = "gbif")
  name2taxid(c("Abies", "Pinales", "Tracheophyta"), db = "col")
  name2taxid(c("Abies mangifica", "Acanthopale aethiogermanica", "Acanthopale albosetulosa"), db = "tpl")
}