# Build a database of all rodents

In this first tutorial we are going to build a database of all rodents. Rodents are a good test case for experimenting with restez: they make up a relatively small division of GenBank, yet they include charismatic organisms that most people are familiar with. To keep things extra fast, we will also limit the number of sequences in the database by only including sequences between 100 and 1000 base pairs long.

The database you build here will be used again in later tutorials, and you may wish to experiment with it yourself. It is therefore best to choose a suitable location on your hard drive where you can keep it for later use. In this and other tutorials, we will always refer to the rodents' restez path with the variable `rodents_path`.

Setting up the rodents database will likely take a long time; the exact time depends on your internet speed and machine specs. For reference, this vignette was written on a MacBook Air (2013) over WiFi with a download speed of 13 Mbps. With this setup, downloading the data took 26 minutes and creating the database took 59 minutes.
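As a rough sanity check on these timings, a lower bound on the download time can be estimated by dividing the data size by the connection speed. The sketch below is a back-of-the-envelope calculation, not part of restez: `estimate_download_minutes()` is a hypothetical helper, and it assumes 1 GB = 10^9 bytes and 1 Mbps = 10^6 bits per second while ignoring protocol overhead. It uses the 2.04 GB of downloaded files reported by `restez_status()` at the end of this tutorial.

```r
# Hypothetical helper (not part of restez): idealised download time in minutes.
# Assumes 1 GB = 1e9 bytes and 1 Mbps = 1e6 bits per second; ignores overhead.
estimate_download_minutes <- function(size_gb, speed_mbps) {
  size_bits <- size_gb * 1e9 * 8               # gigabytes -> bits
  seconds <- size_bits / (speed_mbps * 1e6)    # bits / (bits per second)
  seconds / 60                                 # seconds -> minutes
}

# 2.04 GB of rodent files at 13 Mbps
estimate_download_minutes(2.04, 13)
#> [1] 20.92308
```

At roughly 21 minutes, this idealised figure sits below the 26 minutes actually measured, as one would expect when real throughput falls short of the nominal speed. Adding the 59-minute build gives about 85 minutes end to end on that machine.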

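## Download

```r
library(restez)
# placeholder location: replace with a memorable folder of your own
rodents_path <- '~/rodents'
dir.create(rodents_path, showWarnings = FALSE)  # create the folder if needed
restez_path_set(rodents_path)
# '15' supplies the download menu choice for the rodent division in advance
db_download(preselection = '15')
```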

## Build

```r
library(restez)
restez_path_set(rodents_path)
# build the local database, keeping only sequences of 100 to 1000 bp
db_create(min_length = 100, max_length = 1000)
```
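The `min_length` and `max_length` arguments act as a length window on the sequences that go into the database. The toy sketch below shows the idea in base R on made-up lengths; `in_length_window()` is a hypothetical helper, not restez's internal code, and it assumes both bounds are inclusive (consistent with the minimum of 100 and maximum of 1000 reported in the status check below).

```r
# Hypothetical helper illustrating a length window such as
# db_create(min_length = 100, max_length = 1000); bounds assumed inclusive.
in_length_window <- function(seq_lengths, min_length = 100, max_length = 1000) {
  seq_lengths >= min_length & seq_lengths <= max_length
}

# Made-up sequence lengths (bp) for five records
lengths_bp <- c(45, 100, 650, 1000, 25000)
in_length_window(lengths_bp)
#> [1] FALSE  TRUE  TRUE  TRUE FALSE
lengths_bp[in_length_window(lengths_bp)]
#> [1]  100  650 1000
```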

## Check status

```r
library(restez)
#> -------------
#> restez v1.0.0
#> -------------
#> Remember to restez_path_set() and, then, restez_connect()
restez_path_set(rodents_path)
restez_connect()
#> Remember to run restez_disconnect()
restez_status()
#> Checking setup status at  ...
#> ───────────────────────────────────────────────────────────────────────────────────────────────
#> Restez path ...
#> ... Path '[RODENTS PATH]/restez'
#> ... Does path exist? 'Yes'
#> ───────────────────────────────────────────────────────────────────────────────────────────────
#> ... Does path exist? 'Yes'
#> ... N. files 32
#> ... N. GBs 2.04
#> ... GenBank division selections 'Rodent'
#> ... GenBank Release 228
#> ... Last updated '2018-11-15 10:42:40'
#> ───────────────────────────────────────────────────────────────────────────────────────────────
#> Database ...
#> ... Path '[RODENTS PATH]/restez/sql_db'
#> ... Does path exist? 'Yes'
#> ... N. GBs 0.6
#> ... Is database connected? 'Yes'
#> ... Does the database have data? 'Yes'
#> ... Number of sequences 197553
#> ... Min. sequence length 100
#> ... Max. sequence length 1000
#> ... Last_updated '2018-11-15 13:50:11'
restez_disconnect()
```
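Because the connection should always be closed with `restez_disconnect()`, it can help to pair the two calls so that the disconnect happens even if something in between fails. The sketch below uses base R's `on.exit()`; `with_connection()` is a hypothetical helper, not part of restez, and it takes the connect and disconnect functions as arguments so it can be tried without a database.

```r
# Hypothetical helper (not part of restez): open a connection, run `f`, and
# guarantee the matching disconnect runs even if `f` raises an error.
with_connection <- function(f, connect, disconnect) {
  connect()
  on.exit(disconnect(), add = TRUE)  # runs on normal return and on error
  f()                                # the value of f() is returned
}

# Usage with restez (requires the rodents database built above):
# library(restez)
# restez_path_set(rodents_path)
# with_connection(restez_status, restez_connect, restez_disconnect)
```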

## Next up

How to search for and fetch sequences