SparCC is a network inference tool that was specifically designed to be robust
to data compositionality. The method is described in
PLoS Comp 8(9): e1002687.
Step 1 - Compute correlations
SparCC is a python program that runs on command line. Please choose the input file "arctic_soils_filtered.txt". Compute the compositionality-robust correlations as the median of 10 iterations as follows:
python SparCC.py arctic_soils_filtered.txt -i 10 --cor_file=arctic_soils_sparcc.txt > sparcc.log
where -i gives the number of iterations over which the correlations are averaged.
SparCC averages its results over several estimates of the true fractions, which it estimates
from the counts using the Dirichlet distribution.
Step 2 - Compute bootstraps
You can then generate bootstraps from the input data using the following command:
python MakeBootstraps.py arctic_soils_filtered.txt -n 100 -o Resamplings/boot
where Resamplings is a directory and boot is the prefix of all resampled data sets. You then have to launch SparCC on each of the resampled data sets. This is best done in a script. Here is a simple script that does the job. As an example, this bash script would generate 10 correlation matrices from the first 10 resampled data sets:
for i in 0 1 2 3 4 5 6 7 8 9 do python SparCC.py Resamplings/boot_$i.txt -i 10 --cor_file=Bootstraps/sim_cor_$i.txt >> sparcc.log done
where Bootstraps is the directory into which correlation matrices computed from the resampled data
matrices will be written.
To compute p-values, more than 10 iterations are needed.
Precomputed bootstrap correlations for 100 iterations
can be downloaded here.
Step 3 - Compute p-values
Once the bootstrapped correlation scores have been computed, the p-values can be generated using command:
python PseudoPvals.py arctic_soils_sparcc.txt Bootstraps/sim_cor 10 -o pvals_two_sided.txt -t 'two_sided' >> sparcc.log
Step 4 - Visualization
Note that you need to threshold the p-value matrix at the desired cut-off and to convert it into a network using a script of your own. For example, below is a simple R script that will perform this task. In the script, the p-value matrix is converted into a matrix of significances. Do you know why? You can find the answer here.
# load R graph library igraph library(igraph) path="pvals_two_sided.txt" pvals=read.table(path,header=TRUE,sep="\t") pvals.mat=pvals[,2:ncol(pvals)] # set p-values of 0 to a non-zero, small p-value so we can take the logarithm pvals.mat[pvals.mat==0]=0.000000001 # convert into significance sig.mat=-1*log10(pvals.mat) # remove all edges with significance below 1 sig.mat[sig.mat<1]=0 sig.mat=as.matrix(sig.mat) # convert adjacency matrix into a graph sparcc.graph=graph.adjacency(sig.mat,mode="undirected") # display the graph layout=layout.spring plot(sparcc.graph, layout=layout)