Finding communities in large datasets
Community detection in very large datasets
With larger datasets of tree-ring series, calculating the similarity
table can take a long time, and finding communities takes even longer.
It is therefore recommended to use parallel computing for
Clique Percolation:
clique_community_names_par(network, k=3, n_core = 4). This
reduces the computation time significantly. For most datasets
clique_community_names() is sufficiently fast, and for
smaller datasets clique_community_names_par() can even be
slower because of the overhead of parallelisation. Therefore, start with
clique_community_names() and switch to
clique_community_names_par() only if the serial version proves very slow.
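One way to decide between the two functions is to time a single call on your own network before switching. The sketch below is a minimal illustration in base R. The helper time_call() is hypothetical (it is not part of dendroNetwork), slow_square() merely stands in for a slow detection call, and leaving one core free when choosing n_core is an assumption rather than a package recommendation.

```r
# Hypothetical helper (not part of dendroNetwork): run a function once and
# return both its result and the elapsed wall-clock time in seconds.
time_call <- function(fun, ...) {
  elapsed <- system.time(result <- fun(...))[["elapsed"]]
  list(result = result, elapsed = elapsed)
}

# Stand-in for a slow detection call, so the sketch runs without any package.
slow_square <- function(x) {
  Sys.sleep(0.1)
  x^2
}

timing <- time_call(slow_square, 4)
timing$elapsed  # wall-clock seconds for this single call

# Assumption: leave one core free for the rest of the system.
# detectCores() can return NA, hence na.rm = TRUE.
n_core <- max(1, parallel::detectCores() - 1, na.rm = TRUE)

# With dendroNetwork installed and a network object available, the same
# pattern could be applied to the real functions:
# serial   <- time_call(dendroNetwork::clique_community_names, network, k = 3)
# parallel <- time_call(dendroNetwork::clique_community_names_par, network,
#                       k = 3, n_core = n_core)
```

If the serial call finishes in an acceptable time, there is little reason to pay the start-up cost of the parallel version.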
The workflow is similar to the one described in
vignette("dendroNetwork"), with minor changes:
- Load the network.
- Compute the similarities.
- Find the maximum clique size: igraph::clique_num(network).
- Detect communities for each clique size separately:
  - com_cpm_k3 <- clique_community_names_par(network, k=3, n_core = 6).
  - com_cpm_k4 <- clique_community_names_par(network, k=4, n_core = 6).
  - and so on until the maximum clique size.
- Merge these into a single data frame by com_cpm_all <- rbind(com_cpm_k3, com_cpm_k4, com_cpm_k5, ...).
- Create a table for use in Cytoscape with all communities: com_cpm_all <- com_cpm_all |> dplyr::count(node, com_name) |> tidyr::spread(com_name, n).
- Continue with the visualisation in Cytoscape; see the relevant section in vignette("dendroNetwork").
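The per-clique-size calls and the rbind() step can also be written as a loop, so that no object has to be typed out for every value of k. The following is a sketch under stated assumptions: detect_all_k() is a hypothetical helper (not part of dendroNetwork), and toy_detect() with its made-up community names only imitates the node and com_name columns used in the steps above so that the example runs without the package.

```r
# Hypothetical helper (not part of dendroNetwork): call a detection function
# for every clique size from 3 up to k_max and stack the results row-wise.
detect_all_k <- function(network, k_max, detect_fun, ...) {
  stopifnot(k_max >= 3)  # guards against 3:k_max counting downwards
  per_k <- lapply(3:k_max, function(k) detect_fun(network, k = k, ...))
  do.call(rbind, per_k)
}

# Toy stand-in for clique_community_names_par(); the returned values are
# invented and only mimic the `node` and `com_name` columns.
toy_detect <- function(network, k, n_core = 1) {
  data.frame(node = c("A", "B", "C"),
             com_name = paste0("CPM_K", k, "_1"))
}

com_cpm_all <- detect_all_k(network = NULL, k_max = 5,
                            detect_fun = toy_detect, n_core = 2)
com_cpm_all  # long table: one row per node and community

# With igraph and dendroNetwork installed, the same helper could be used as:
# k_max <- igraph::clique_num(network)
# com_cpm_all <- detect_all_k(network, k_max,
#                             dendroNetwork::clique_community_names_par,
#                             n_core = 6)
```

The resulting long table can then be passed to the dplyr::count() and tidyr::spread() step unchanged.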