• Build a database of all rodents
  • " />

    In this first tutorial we are going to build a database for all rodents. The rodents are a good test case for playing with restez as they are a relatively small domain in GenBank but still have charismatic organisms that people are familiar enough with to understand. To keep things extra fast, we will also limit the number of sequences in the database by limiting the sequence sizes between 100 and 1000.

    The database you build here will be used again in later tutorials and you may wish to experiment with it yourself. Therefore it is best to locate a suitable place in your harddrive where you would like to store it for later reference. In this tutorial and in others, we will always refer to the rodents’ restez path with the variable rodents_path.

    Setting up the rodents database will likely take a long time. The exact time will depend on your internet speeds and machine specs. For reference, this vigenette was written on a MacBook Air (2013) via WiFi with a download speed of 13 MBPS. With this setup, downloading the database took 26 minutes and creating the database took 59 minutes.

    Download

    library(restez)
    # set the restez path to a memorable location
    restez_path_set(rodents_path)
    # download for domain 15
    db_download(preselection = '15')

    Build

    library(restez)
    restez_path_set(rodents_path)
    db_create(min_length = 100, max_length = 1000)

    Check status

    library(restez)
    #> -------------
    #> restez v1.0.0
    #> -------------
    #> Remember to restez_path_set() and, then, restez_connect()
    restez_path_set(rodents_path)
    restez_connect()
    #> Remember to run `restez_disconnect()`
    restez_status()
    #> Checking setup status at  ...
    #> ───────────────────────────────────────────────────────────────────────────────────────────────
    #> Restez path ...
    #> ... Path '[RODENTS PATH]/restez'
    #> ... Does path exist? 'Yes'
    #> ───────────────────────────────────────────────────────────────────────────────────────────────
    #> Download ...
    #> ... Path '[RODENTS PATH]/restez/downloads'
    #> ... Does path exist? 'Yes'
    #> ... N. files 32
    #> ... N. GBs 2.04
    #> ... GenBank division selections 'Rodent'
    #> ... GenBank Release 228
    #> ... Last updated '2018-11-15 10:42:40'
    #> ───────────────────────────────────────────────────────────────────────────────────────────────
    #> Database ...
    #> ... Path '[RODENTS PATH]/restez/sql_db'
    #> ... Does path exist? 'Yes'
    #> ... N. GBs 0.6
    #> ... Is database connected? 'Yes'
    #> ... Does the database have data? 'Yes'
    #> ... Number of sequences 197553
    #> ... Min. sequence length 100
    #> ... Max. sequence length 1000
    #> ... Last_updated '2018-11-15 13:50:11'
    restez_disconnect()