For all sequences in one or more clusters, calculate the frequency of individual words in either the sequence definition lines or the reported feature names.

calc_wrdfrq(phylota, cid, min_frq = 0.1, min_nchar = 1,
  type = c("dfln", "nm"), ignr_pttrn = "[^a-z0-9]")

Arguments

phylota

Phylota object

cid

Cluster ID(s)

min_frq

Minimum frequency for a word to be reported

min_nchar

Minimum number of characters for a word

type

Definitions (dfln) or features (nm)

ignr_pttrn

Ignore pattern: a regular expression matching text to ignore.

Value

List named by cluster ID; each element holds the word frequencies for that cluster.

Details

By default, anything that is not alphanumeric is ignored. 'dfln' and 'nm' match the slot names in a SeqRec; see list_seqrec_slots().
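
The effect of the default ignore pattern and the two minimum thresholds can be illustrated with a minimal base-R sketch. This is not phylotaR's implementation: the helper name `sketch_wrdfrq`, the lower-casing step, the replacement of ignored characters with spaces, and the `>=` comparison against `min_frq` are all assumptions made for illustration.

```r
# Hypothetical helper: NOT part of phylotaR, for illustration only.
sketch_wrdfrq <- function(txt, min_frq = 0.1, min_nchar = 1,
                          ignr_pttrn = "[^a-z0-9]") {
  # Assumption: text is lower-cased before the ignore pattern is applied
  txt <- tolower(txt)
  # Replace everything matching the ignore pattern with a space
  txt <- gsub(pattern = ignr_pttrn, replacement = " ", x = txt)
  # Split on whitespace and pool the words from all input strings
  wrds <- unlist(strsplit(txt, split = "\\s+"))
  # Drop empty strings and words shorter than min_nchar
  wrds <- wrds[nchar(wrds) >= max(min_nchar, 1)]
  if (length(wrds) == 0) {
    return(numeric(0))
  }
  # Relative frequency of each distinct word
  frqs <- table(wrds) / length(wrds)
  # Keep words at or above the threshold, most frequent first
  frqs <- sort(frqs[frqs >= min_frq], decreasing = TRUE)
  # Return a plain named numeric vector
  stats::setNames(as.numeric(frqs), names(frqs))
}

# Two made-up definition lines, used only to exercise the sketch
dflns <- c("Histone H3 gene, partial cds",
           "histone H3 (H3) gene, partial cds")
wrd_frqs <- sketch_wrdfrq(dflns)
print(wrd_frqs)
```

Punctuation such as commas and parentheses never reaches the word counts because it matches the ignore pattern, so "H3" and "(H3)" are counted as the same word.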

See also

Examples

data('dragonflies')
# work out what gene region the cluster is likely representing with word freqs.
random_cids <- sample(dragonflies@cids, 10)
# most frequent words in definition line
(calc_wrdfrq(phylota = dragonflies, cid = random_cids, type = 'dfln'))
#> $`685`
#> named numeric(0)
#>
#> $`136`
#> named numeric(0)
#>
#> $`26`
#> wrds
#>        h3      gene   partial       cds   histone
#> 0.1487768 0.1118323 0.1118323 0.1113330 0.1113330
#>
#> $`250`
#>  sequence
#> 0.1363636
#>
#> $`313`
#> named numeric(0)
#>
#> $`152`
#> wrds
#>      rrna       and
#> 0.1543739 0.1029160
#>
#> $`779`
#> wrds
#>      rrna       and
#> 0.1578947 0.1052632
#>
#> $`49`
#> named numeric(0)
#>
#> $`301`
#> wrds
#>       18s      gene   partial ribosomal       rna  sequence
#> 0.1181818 0.1181818 0.1181818 0.1181818 0.1181818 0.1181818
#>
#> $`756`
#> wrds
#>      gene  sequence
#> 0.1451613 0.1451613
#>
# most frequent words in feature name
(calc_wrdfrq(phylota = dragonflies, cid = random_cids, type = 'nm'))
#> $`685`
#> numeric(0)
#>
#> $`136`
#> numeric(0)
#>
#> $`26`
#> numeric(0)
#>
#> $`250`
#> wrds
#>         28s         and    contains    internal   ribosomal         rna
#>       0.125       0.125       0.125       0.125       0.125       0.125
#>      spacer transcribed
#>       0.125       0.125
#>
#> $`313`
#> numeric(0)
#>
#> $`152`
#> wrds
#>    internal      spacer transcribed
#>   0.3333333   0.3333333   0.3333333
#>
#> $`779`
#> wrds
#>    internal      spacer transcribed
#>   0.3333333   0.3333333   0.3333333
#>
#> $`49`
#> numeric(0)
#>
#> $`301`
#> numeric(0)
#>
#> $`756`
#> wrds
#>       16s ribosomal       rna
#> 0.3333333 0.3333333 0.3333333
#>
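
Because the result is a list keyed by cluster ID, it can be summarised with base R. The following is a minimal sketch that does not call phylotaR: the list `res` is built by hand to mimic the shape of the output above (a subset of the definition-line values shown, stored as plain named numeric vectors), and the names `res` and `top_wrds` are hypothetical.

```r
# Hand-built stand-in for a calc_wrdfrq() result (values copied from the
# definition-line output above; not computed here).
res <- list(
  `685` = numeric(0),
  `26` = c(h3 = 0.1487768, gene = 0.1118323, partial = 0.1118323),
  `152` = c(rrna = 0.1543739, and = 0.1029160)
)

# Keep only clusters for which at least one word was reported
res <- res[lengths(res) > 0]

# Most frequent word per cluster, returned as a named character vector
top_wrds <- vapply(res, function(x) names(x)[which.max(x)], character(1))
print(top_wrds)
```

A summary like this gives a quick first guess at which gene region each cluster represents.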