Retrieve nucleotide sequences from NCBI.

Usage

get_seqs(taxon_name, gene, seqrange, getrelated, writetodf = TRUE, filetowriteto)

Arguments

taxon_name
Scientific name to search for (character).
gene
Gene (character) or genes (character vector) to search for.
seqrange
Sequence range, e.g., "1:1000" (character).
getrelated
Logical. If TRUE and no sequence is found for the species searched for, gets the longest sequence of a species in the same genus. If FALSE, no related species are searched.
writetodf
Logical. If TRUE, writes the resulting data.frame to a file on your machine. Default: TRUE.
filetowriteto
Name of the file to write to when writetodf = TRUE.
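
The string conventions behind taxon_name and seqrange can be illustrated with a small base R sketch. Both helpers, genus_of and make_seqrange, are hypothetical names introduced here for illustration only; they are not part of get_seqs, and this page does not describe how the function handles these values internally. The sketch also assumes that "1:1000" reads as "minimum:maximum".

```r
# Hypothetical helper (not part of get_seqs): genus of a binomial name,
# i.e. the first word of "Genus species".
genus_of <- function(taxon_name) {
  strsplit(taxon_name, " ", fixed = TRUE)[[1]][1]
}

# Hypothetical helper (not part of get_seqs): build a "min:max" string
# in the same shape as the seqrange argument, assuming min:max semantics.
make_seqrange <- function(min_len, max_len) {
  stopifnot(min_len >= 1, max_len >= min_len)
  paste0(as.integer(min_len), ":", as.integer(max_len))
}

genus_of("Perdita trisignata")   # "Perdita"
make_seqrange(1, 2000)           # "1:2000"
```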

Value

Data.frame of results.

Description

This function retrieves one sequence for each species, picking the longest available for the given gene.
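
The "longest available" rule can be pictured with a minimal base R sketch. It is not the code used by get_seqs: the helper name longest_per_taxon is hypothetical, the column names are borrowed from the example output on this page, and the accession values are placeholders rather than real records.

```r
# Hypothetical helper (not part of get_seqs): keep the longest record per taxon.
longest_per_taxon <- function(df) {
  pieces <- split(df, df$taxon)                      # one data.frame per taxon
  picked <- lapply(pieces, function(d) {
    d[which.max(d$length), , drop = FALSE]           # first row with the maximum length
  })
  out <- do.call(rbind, picked)
  rownames(out) <- NULL
  out
}

candidates <- data.frame(
  taxon  = c("Colletes similis", "Colletes similis", "Halictus ligatus"),
  acc_no = c("toy_A", "toy_B", "toy_C"),             # placeholders, not real accessions
  length = c(600, 654, 1478),
  stringsAsFactors = FALSE
)

best <- longest_per_taxon(candidates)
best
```

Ties are resolved by which.max(), which keeps the first row with the maximum length; how get_seqs itself breaks ties is not stated here.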

Details

Predicted sequences are removed automatically, so you do not have to filter them out yourself. Predicted sequences are those whose accession numbers have "XM_" or "XR_" prefixes.
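
A minimal base R sketch of this kind of prefix filter follows. The helper name drop_predicted is hypothetical, the acc_no column name is borrowed from the example output on this page, and the "XM_" and "XR_" accession values are made-up placeholders; the actual filtering code inside get_seqs may differ.

```r
# Hypothetical helper (not part of get_seqs): drop rows whose accession number
# starts with "XM_" or "XR_", the prefixes that mark predicted sequences.
drop_predicted <- function(df, acc_col = "acc_no") {
  predicted <- grepl("^(XM_|XR_)", df[[acc_col]])
  df[!predicted, , drop = FALSE]
}

hits <- data.frame(
  acc_no = c("EU523877.1", "XM_000000001.1", "XR_000000002.1", "JQ909720.1"),
  length = c(652, 1200, 900, 654),    # XM_/XR_ rows are placeholders, not real records
  stringsAsFactors = FALSE
)

kept <- drop_predicted(hits)
kept$acc_no
```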

Examples

# A single species
get_seqs(taxon_name = "Acipenser brevirostrum", gene = c("coi", "co1"), seqrange = "1:3000", getrelated = TRUE, writetodf = FALSE)
Working on Acipenser brevirostrum...
...retrieving sequence IDs...
...retrieving sequence ID with longest sequence length...
...retrieving sequence...
...done.
                   taxon
1 Acipenser brevirostrum
  gene_desc
1 Acipenser brevirostrum voucher BIOUG:BCF-699-10 cytochrome oxidase subunit I (COI) gene, partial cds; mitochondrial
      gi_no     acc_no length
1 186883022 EU523877.1    652
  sequence
1 CCTGTATTTAGTATTTGGTGCCTGAGCAGGCATAGTCGGCACAGCCCTCAGCCTTCTGATCCGTGCCGAACTGAGCCAACCCGGTGCCCTGCTTGGCGATGATCAGATCTACAATGTTATCGTCACAGCCCACGCCTTTGTCATGATTTTCTTTATAGTAATACCCATCATAATTGGCGGATTCGGAAACTGACTGGTCCCCCTAATAATTGGGGCCCCAGACATGGCATTTCCTCGCATGAACAATATGAGCTTCTGACTCCTACCCCCATCCTTCCTACTCCTTTTAGCCTCCTCTGGGGTAGAGGCCGGAGCCGGCACAGGGTGAACTGTTTACCCCCCACTGGCGGGAAACCTGGCCCATGCAGGAGCCTCTGTAGACCTAACCATTTTCTCCCTTCACCTGGCTGGGGTTTCGTCCATTTTGGGGGCTATTAATTTTATTACCACCATTATTAACATGAAACCCCCCGCAGTATCCCAATATCAGACACCTCTATTTGTGTGATCTGTATTAATCACGGCCGTACTTCTCCTACTATCACTGCCAGTGCTAGCTGCAGGGATCACAATGCTCCTAACAGACCGAAATTTAAACACCACCTTCTTTGACCCAGCCGGAGGAGGAGACCCCATCCTCTACCAACACCTA
                  spused
1 Acipenser brevirostrum
# Many species, can run in parallel or not using plyr
library(plyr)
species <- c("Colletes similis", "Halictus ligatus", "Perdita trisignata")
# notice different sp. output for Perdita
llply(species, get_seqs, gene = c("coi", "co1"), seqrange = "1:2000", getrelated = TRUE, writetodf = FALSE)
Working on Colletes similis...
...retrieving sequence IDs...
...retrieving sequence ID with longest sequence length...
...retrieving sequence...
...done.
Working on Halictus ligatus...
...retrieving sequence IDs...
...retrieving sequence ID with longest sequence length...
...retrieving sequence...
...done.
Working on Perdita trisignata...
...retrieving sequence IDs...
no sequences of coi for Perdita trisignata - getting other sp.
no sequences of co1 for Perdita trisignata - getting other sp.
...retrieving sequence IDs for related species...
...retrieving sequence ID with longest sequence length...
...retrieving sequence...
...done.
[[1]]
             taxon
1 Colletes similis
  gene_desc
1 Colletes similis voucher TCDB-T540 cytochrome oxidase subunit 1 (COI) gene, partial cds; mitochondrial
      gi_no     acc_no length
1 387931795 JQ909720.1    654
  sequence
1 TTTATTTTTGCTATATGAACAGGAATAATTGGTTCTTCTTTAAGAATAATTATTCGAATAGAATTAAGATCTCCTGGTATATGAATTAATAATGATCAAATTTATAATTCTATTGTTACTGCACATGCTTTTATTATAATTTTTTTTATAGTTATACCTTTTTTAATTGGRGGRTTTGGTAATTGATTAATTCCATTAATAATTGGAGCTCCTGATATAGCATTTCCTCGTATAAATAATATAAGATTTTGATTATTACCTCCTTCTTTAATTTTATTATTATTAGGAAGAATTTTATAYTCAGGAAGAGGAACTGGATGAACTGTWTATCCTCCATTATCTTCTTTAATATATCATTCTTCTTTATCTGTTGATTTAACTATTTTTTCTTTACATATTGCAGGTATTTCATCTATTATAGGATCAATAAATTTTATTGTAACAATTTTAAAAATAAAAAATTATAATTTAAATTATGATCAATTAACATTATTTTCATGATCAGTTTTTATTACAACTATTTTATTATTATTATCTTTACCTGTATTAGCRGGTGCAATTACAATATTATTAACTGATCGTAATTTAAATACTTCTTTTTTTGATCCATCTGGTGGAGGWGATCCAATTTTATATCAACATTTATTTTGATTT
            spused
1 Colletes similis

[[2]]
             taxon
1 Halictus ligatus
  gene_desc
1 Halictus ligatus Hali-c cytochrome oxidase subunit I (COI) gene, partial cds; and tRNA-Leu gene, complete sequence; mitochondrial genes for mitochondrial products
     gi_no     acc_no length
1 19913235 AF438426.1   1478
  sequence
1 CAATTTACCCTCCATTATCATCGATTATATATCATTCATCTTTTTCAGTAGATTTTTCTATCTTCTCCTTACATATAGCAGGAATCTCTTCAATCATAGGAGCTATTAATTTTATTGTAACTATTATTTTAATAAAAAATATTTCACTTAATATAAATCAAATCCCTCTATTTCCTTGATCAGTAAAAATTACTGCAATTTTACTTCTTCTCTCTCTTCCAGTTTTAGCAGGAGCTATTACAATATTATTAACAGATCGAAATTTAAATACATCATTTTTTGACCCTTCAGGAGGTGGAGACCCAATTTTATATCAACATTTATTTTGATTTTTCGGTCATCCTGAAGTTTACATTCTAATTTTACCTGGATTTGGATTAATTTCTCACATTATTTTTAATGAAAGAGGAAAAAAAGAAATTTTTGGTAAATTAGGTATAATTTATGCAATAATAGGAATTGGATTTTTAGGATTTATTGTATGAGCCCATCATATATTTACTGTAGGATTAGATGTAGATACACGAGCTTATTTCACATCTGCAACTATAATTATTGCTGTCCCCACAGGAATTAAAGTATTTAGATGATTAGCCACATACTGTGGTTCAAAAATTAAATTAAATCCTTCAATTAATTGATCTTTAGGTTTTATTTTTTTATTTACTATAGGAGGATTAACTGGTATTATACTATCAAATTCTTCTATTGATATTATACTACATGATACATACTACGTAATTGGTCATTTTCATTATGTTCTATCTATAGGAGCAGTATTTGCAATTATTGCAAGATTAATTCATTGATACCCCTTATTTACTGGATTAACTTTAAATAAAAAATTATTAAATATTCAATTTATCATAATATTTACTGGTGTAAATTTAACATTTTTTCCACAACATTTTTTAGGATTAATAGGTATACCTCGACGATATTCAGACTATCCTGATGCCTATTATTGTTGAAATTTAATTTCATCTATTGGTTCTTTAATTACTTTTAATAGATTAATTTTATTAATCTTTACTATATTAGAAAGATTAATTATAAAACGATTAATTTTATTTAAATATTTTCAATCATCATTAGAATGATTACAAAACTATCCCCCTTTAAGACATTCATACAATGAACTTCCTATCATTATTTTTAAATTTTAATATGGCAGAATAGTGTAATGAATTTAAGACTCATAAATAAAATTGATTAATTTTTATTAAAATTTATATTTTTATATAAATATTGCTACATGAAATATATATTCCTTTCAAGATCCTAACTCACCATTTGCAGATAATTTATTTTATTTTTATAATTTTACTATAATTACTTTAACAATAATTACTATTTCTACATTATATATATTGATTTTTATTTTAAATAATAAATATTTAAATTTAAATTTATTAAAAAATCATAATATCGAAATTCTATGAACTATTACCCCTATAATTATACTCTTAATTATCTCT
            spused
1 Halictus ligatus

[[3]]
               taxon
1 Perdita trisignata
  gene_desc
1 Perdita halictoides voucher Peha84 cytochrome oxidase subunit I (COI) gene, partial cds; mitochondrial
      gi_no     acc_no length
1 156631315 EF594406.1    785
  sequence
1 ATTTTACCAGGATTTGGTTTAATTTCTCATATTATTTCAAATGAAAGAGGAAAAAAAGAAACCTTTGGTAATTTAGGAATAATTTATGCTATATTAGGAATTGGATTTTTAGGATTTGTAGTATGAGCTCATCATATATTTACTGTTGGAATAGATGTTGATACACGAGCATATTTTACTTCAGCTACTATAATTATTGCAGTACCTACAGGAATTAAAGTATTTAGATGATTAACTACATTTCATGGAGCAAAAATTATAAATAAACCTACATTTTTATGATCAATAGGATTTATTTTTTTATTTACAATAGGTGGTTTAACTGGAATTATACTATCAAATTCTTCAATTGATATTATTTTACATGATACTTATTATGTTGTAGGTCATTTTCATTATGTATTATCTATAGGAGCTGTTTTTGCTATTTTTAGTAGATTAGTTTTCTGATATCCTTCAATTATAAGATTAACTTTAAATAATAATTTATTAAAAATTCAATTTTATTTAATATTTATTGGTGTAAATATAACATTTTTTCCTCAACATTTTTTAGGATTAATAGGTATACCTCGACGATATTCAGATTATCCAGATGCATATATATGTTGAAATATAATTTCTTCTATAGGTTCAATTATTTCAACTAACAGAATATTTTTATTTATTTACATTATTCTAGAAAGAATAATTAAAAAACGATTAGTTTTATATAAATTTTCATTAAATTCATTAGAATGATTACAAAATTTTCCTCCAACAACTCATACATTTAATGAAATTCC
               spused
1 Perdita halictoides
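
Because each call returns a one-row data.frame, a list of results like the one above can be stacked into a single table, and rows where a related species was substituted can be flagged by comparing taxon with spused. The sketch below uses toy stand-ins for the returned list (only four of the columns, with values copied from the output above) so that it runs without contacting NCBI; the substituted column is an illustrative addition, not something get_seqs produces.

```r
# Toy stand-ins for the list returned by llply(); values are taken from the
# example output, and the long gene_desc and sequence columns are omitted.
results <- list(
  data.frame(taxon = "Colletes similis", acc_no = "JQ909720.1", length = 654,
             spused = "Colletes similis", stringsAsFactors = FALSE),
  data.frame(taxon = "Halictus ligatus", acc_no = "AF438426.1", length = 1478,
             spused = "Halictus ligatus", stringsAsFactors = FALSE),
  data.frame(taxon = "Perdita trisignata", acc_no = "EF594406.1", length = 785,
             spused = "Perdita halictoides", stringsAsFactors = FALSE)
)

# Stack the one-row data.frames into a single table.
combined <- do.call(rbind, results)

# Illustrative extra column: TRUE where a related species was used instead.
combined$substituted <- combined$taxon != combined$spused
combined[combined$substituted, c("taxon", "spused")]
```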

Author

Scott Chamberlain myrmecocystus@gmail.com