CoNet on command line
Using CoNet on command line - step by step tutorial
When a high number of iterations is requested (especially with renormalization enabled),
CoNet is best run on command line.
To make this easier, the Cytoscape plugin can generate the command line call from
the current settings.
Prerequisites
This tutorial demonstrates how to run CoNet on command line. It assumes that you
have downloaded and unzipped the CoNet.zip file. It also assumes that you have Java
Runtime Environment 1.6 or higher installed on your machine
(if Cytoscape 2.8 runs on your computer, you don't need to worry about this).
Optional: Set the Classpath
Optionally, you can add CoNet.jar located in the lib folder of CoNet to your class path.
Java will find the jar by looking up this class path variable.
Check this tutorial
on how to do this on different systems.
Example for MacOS and UNIX-based systems (on command line):
export LIB=/Users/me/Documents/CoNet/lib
export CLASSPATH=${CLASSPATH}:${LIB}/CoNet.jar
Example for Windows (in command prompt window):
set CLASSPATH=%CLASSPATH%;C:\Users\me\Documents\CoNet\lib\CoNet.jar;
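For MacOS/UNIX, the export above can be wrapped in a small helper that checks that the jar exists before adding it. This is only a sketch: the function name add_jar_to_classpath is made up for this example and is not part of CoNet.

```shell
# Hypothetical helper (not part of CoNet): append a jar to CLASSPATH
# only if the file actually exists.
add_jar_to_classpath() {
    jar=$1
    if [ ! -f "$jar" ]; then
        echo "jar not found: $jar" >&2
        return 1
    fi
    # Only insert the ':' separator when CLASSPATH already has a value,
    # so that no empty leading entry is created.
    CLASSPATH="${CLASSPATH:+${CLASSPATH}:}${jar}"
    export CLASSPATH
}

# Example call (replace the path with the location of your own CoNet.jar):
# add_jar_to_classpath /Users/me/Documents/CoNet/lib/CoNet.jar
```

The setting only lasts for the current shell session unless you add the call to your shell configuration file.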
Tutorial steps
- Start Cytoscape and open the CoNet plugin. Load the demo settings
(the file is located in the demo folder of the CoNet directory).
- Open CoNet's Data menu and select "Costello_2009_oral.txt" located in the demo folder
as input matrix.
- Click the "Generate command line call" button in CoNet's main menu. This
will open another window with some text in it.
- Copy the line(s) of the text starting with "java". These lines represent
the commands you want to carry out. They call CoNet's command line version with the
parameters you have selected or loaded in the Cytoscape plugin.
- If you did not set the class path, paste the command into an editor and replace
"java" by "java -cp C:\Users\me\Documents\CoNet\lib\CoNet.jar" in Windows and by
"java -cp /Users/me/Documents/CoNet/lib/CoNet.jar" in MacOS/UNIX. For either OS, please do not
copy the example paths, but give the path in which your CoNet.jar is located.
Then copy the modified command.
- Open a command shell. In Windows, you can do so by
clicking "Start", then "Run...". A window will open into which you write "cmd".
This will open the command prompt window. Alternatively, you can also open the "Command prompt"
program located in "Accessories".
In MacOS, you can open the terminal application located
in /Applications/Utilities.
- In the command shell, go to the demo folder of the CoNet directory using the command "cd".
Example for MacOS:
cd /Users/me/Documents/CoNet/demo
Example for Windows:
cd C:\Users\me\Documents\CoNet\demo
- Paste the command. On MacOS/UNIX, you can add an ampersand (&) at the end of the command
to send it to the background.
- Push Enter to start the execution of the command.
- After a few seconds, a network file starting with "cooccurrence" and ending with ".gdl"
should have been generated in the demo folder.
- You can load this network into Cytoscape by selecting it using the "Load" button in CoNet's
main menu and then pushing "GO".
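If you did not set the class path, the replacement of "java" by "java -cp ..." described in the steps above can also be done in the shell on MacOS/UNIX instead of in an editor. The following is a minimal sketch; the helper name add_cp is hypothetical, and it assumes that the generated command starts with "java " followed by the class name.

```shell
# Hypothetical helper (not part of CoNet): insert -cp "<jar>" after the
# leading "java" of a command generated by the Cytoscape plugin.
add_cp() {
    cmd=$1
    jar=$2
    case $cmd in
        "java "*)
            # Strip the leading "java " and print the call with the class path added.
            printf 'java -cp "%s" %s\n' "$jar" "${cmd#java }"
            ;;
        *)
            echo "command does not start with 'java '" >&2
            return 1
            ;;
    esac
}

# Example (use the path to your own CoNet.jar):
# add_cp "java be.ac.vub.bsb.cooccurrence.cmd.CooccurrenceAnalyser -h" \
#        /Users/me/Documents/CoNet/lib/CoNet.jar
```

The helper only prints the modified command, so you can check it before pasting it into the command shell.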
Command line tips and tricks
Command line help
You can get a short and a long version of the command line help. For the short version,
please type:
java be.ac.vub.bsb.cooccurrence.cmd.CooccurrenceAnalyser -h
For the long version, type:
java be.ac.vub.bsb.cooccurrence.cmd.CooccurrenceAnalyser -H
Runtime memory
You can increase java runtime memory with option -Xmx. Example:
java -Xmx2000M be.ac.vub.bsb.cooccurrence.cmd.CooccurrenceAnalyser -h
Alias
In MacOS/UNIX, you can set an alias to shorten the CoNet call on command line.
For this, add the line below to your bash shell configuration file:
alias conet="java be.ac.vub.bsb.cooccurrence.cmd.CooccurrenceAnalyser"
Then you can call the program as:
conet -h
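If you also want to control the runtime memory, a shell function can be used instead of the alias. This is a sketch under assumptions: the function name conet_run and the variable CONET_MEM are invented for this example and are not CoNet settings. As with the alias, CoNet.jar has to be on your class path.

```shell
# Hypothetical wrapper (not part of CoNet): calls CoNet with an adjustable
# -Xmx value and passes all remaining arguments through unchanged.
conet_run() {
    # CONET_MEM is an assumed variable name; it falls back to 2000M if unset.
    java "-Xmx${CONET_MEM:-2000M}" \
        be.ac.vub.bsb.cooccurrence.cmd.CooccurrenceAnalyser "$@"
}

# Examples:
# conet_run -h
# CONET_MEM=4000M conet_run -H
```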
Running CoNet on command line with a configuration file
Some options of the CoNet Cytoscape plugin are given to CoNet via a configuration file.
This concerns the following options (the corresponding command line option is given in brackets):
- RserveHost (host)
- RservePort (port)
- LineageSeparator (lineage_separator)
- MiImplementation (mi_implementation)
- NoRserveDependency (no_rserve)
- PseudoCounts (pseudocount)
- Poolvar (poolvar)
- DisableSpeedup (disable_speedup)
In addition, a number of other variables can be set via the configuration file
(check the command line help for more details).
Here's an example for a configuration file:
############ CONET CONFIG #############
# rserve config
rserve_host=127.0.0.1
rserve_port=6311
# phylogenetic lineage
lineage_separator=--
# mutual information computation
mi_implementation=minet
minet_r_batch=true
# access to R
no_rserve=false
no_r=false
# p-value computation
poolvar=false
# speed-up disabled
disable_speedup=true
This configuration file indicates that CoNet makes use of Rserve at the specified host and port.
It then specifies the lineage separator string (which is needed for taxon metadata specifying
phylogenetic lineages). Furthermore, it enables mutual information computation in minet
and, since R is needed for minet usage, allows calls to R via command line and via Rserve.
In addition, the configuration switches off the CoNet speed-up, so that
CoNet uses the previous (slower) implementation of various similarity measures (disable_speedup).
Example for CoNet with configuration file
Here is an example of running CoNet command line with a configuration file. The example
assumes that Rserve is running and minet is installed in R. Please copy the command into one line.
java be.ac.vub.bsb.cooccurrence.cmd.CooccurrenceAnalyser
-Z CoNetConfig.txt --method ensemble
--input /CoNet/demo/Costello_2009_oral.txt --matrixtype abundance
--ensemblemethods correl_pearson/dist_bray/
dist_kullbackleibler/sim_mutInfo --minetdisc equalfreq
--format gdl --nantreatment pairwise_omit
--nantreatmentparam 5 --networkmergestrategy union
--stand col_norm --multigraph
--ensembleparams correl_pearson~upperThreshold=0.72/
correl_pearson~lowerThreshold=-0.6/dist_bray~upperThreshold=0.89/
dist_bray~lowerThreshold=0.25/dist_kullbackleibler~upperThreshold=7.16/
dist_kullbackleibler~lowerThreshold=0.45/sim_mutInfo~lowerThreshold=0.6
--filter row_minocc --filterparameter 5.0
--output cooccurrenceNetworkDemo.gdl
where CoNetConfig.txt is a file located in the directory where the command is carried out. CoNetConfig.txt has
the following content:
############ CONET CONFIG #############
# rserve config
no_rserve=false
rserve_host=127.0.0.1
rserve_port=6311
# mutual information computation with minet
mi_implementation=minet
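On MacOS/UNIX, the example can also be kept in a small script, so that the long command does not have to be pasted into one line by hand. The sketch below uses the options and values of the example above; the function names are hypothetical, and it assumes that the line breaks inside the --ensemblemethods and --ensembleparams values are only due to wrapping, so that the parts are joined without spaces.

```shell
# Sketch only: function names are hypothetical, options and values are
# copied from the example above.

# Write the configuration file into the current directory.
write_conet_config() {
    cat > CoNetConfig.txt <<'EOF'
############ CONET CONFIG #############
# rserve config
no_rserve=false
rserve_host=127.0.0.1
rserve_port=6311
# mutual information computation with minet
mi_implementation=minet
EOF
}

# Run the ensemble example; backslashes continue the command on the next line.
run_conet_demo() {
    # Slash-separated lists are built in variables so that they contain no spaces.
    methods="correl_pearson/dist_bray/dist_kullbackleibler/sim_mutInfo"
    params="correl_pearson~upperThreshold=0.72/correl_pearson~lowerThreshold=-0.6"
    params="$params/dist_bray~upperThreshold=0.89/dist_bray~lowerThreshold=0.25"
    params="$params/dist_kullbackleibler~upperThreshold=7.16"
    params="$params/dist_kullbackleibler~lowerThreshold=0.45/sim_mutInfo~lowerThreshold=0.6"
    java be.ac.vub.bsb.cooccurrence.cmd.CooccurrenceAnalyser \
        -Z CoNetConfig.txt --method ensemble \
        --input /CoNet/demo/Costello_2009_oral.txt --matrixtype abundance \
        --ensemblemethods "$methods" --minetdisc equalfreq \
        --format gdl --nantreatment pairwise_omit \
        --nantreatmentparam 5 --networkmergestrategy union \
        --stand col_norm --multigraph \
        --ensembleparams "$params" \
        --filter row_minocc --filterparameter 5.0 \
        --output cooccurrenceNetworkDemo.gdl
}

# Usage: write_conet_config && run_conet_demo
```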
Advanced users: Submitting CoNet jobs to a SGE cluster (MacOS/UNIX)
CoNet can send jobs to a SunGridEngine (SGE) cluster (command line option -b).
This feature is needed when thousands of permutation iterations need to be carried out.
A number of options in the configuration file allow you to manage the cluster submission
capabilities of CoNet (see list below). Most important is the job_num option, which defines how iterations
are split into jobs. For example, if 1,000 permutations need to be carried out, a
job_num of 100 will result in the submission of 100 jobs, each run with 10 iterations.
In consequence, 100 temporary score files will be created (having 10 lines each), which will be merged into a final
randomization score file after completion of all jobs. When submitting jobs to a SGE cluster,
the lib_dir, jar_file, tmp_dir, queue, job_num and memory options should all be set, and
it is advisable to set both keep_tmpscores and no_rserve to true.
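The relation between the number of iterations and job_num in the example above can be checked with shell arithmetic. This sketch only covers the case where the iterations divide evenly among the jobs; how CoNet distributes a remainder is not described here.

```shell
# Worked example with the numbers from the text above.
iterations=1000
job_num=100
per_job=$((iterations / job_num))
echo "$job_num jobs with $per_job iterations each"
# prints: 100 jobs with 10 iterations each
```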
Here's an example configuration:
############ CONET CONFIG #############
no_rserve=true
no_r=true
####### cluster configuration
# location of the CoNet jar
lib_dir=/path/to/my/conet/lib/folder/
# name of the CoNet jar
jar_file=CoNet.jar
# temporary score files are stored here
tmp_dir=/path/to/my/temp/folder
# SGE queue name
queue=all.q
# memory allocated to java (via option -Xmx)
memory=4000
# number of jobs into which iterations are split
job_num=100
# keep the temporary score files
keep_tmpscores=true
# dry run: test run that does not submit jobs to the cluster
dry_run=false
# do not keep launcher scripts
keep_scripts=false
If the CoNet command exits before the jobs have completed, the temporary score files
can be concatenated after completion of the jobs using command line option --restorefromscorefolder.
List of cluster options
Here is the list of all cluster-related options, to be set via the configuration file.
- lib_dir = location of the CoNet jar file (not including the jar file name)
- jar_file = the name of the jar file
- tmp_dir = location of temporary score files
- keep_tmpscores = if true, temporary score files are kept in the temp directory
- job_num = the number of jobs submitted to the cluster (the number of iterations run in each job is determined automatically from the number of requested iterations)
- queue = the name of the queue
- memory = the memory in MB allocated to the job (defaults to 1000)
- user_cmd = a bash command carried out before the job is started (e.g. user_cmd=module load java). The user command can include additional SGE directives (e.g. user_cmd=#$ -m ae -M me@somewhere.com). Several commands can be given by emulating the new line separator with three question marks (e.g. user_cmd=. /etc/profile.d/modules.sh???module load java).
- launch_dir = directory in which cluster submission is launched (defaults to the current directory)
- keep_scripts = if true, job submission scripts are kept
- dry_run = if true, job submission scripts are created but not launched
- quiet_run = if true, output and error streams will be directed to /dev/null, so the creation of job-specific log files is suppressed
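To see what a user_cmd value with three question marks stands for, you can expand the separator yourself. The sketch below only illustrates the convention; the function name expand_user_cmd is hypothetical, and this is not how CoNet processes the option internally.

```shell
# Hypothetical helper (not part of CoNet): show a user_cmd value with each
# "???" separator replaced by a new line.
expand_user_cmd() {
    printf '%s\n' "$1" | awk '{ gsub(/[?][?][?]/, "\n"); print }'
}

expand_user_cmd '. /etc/profile.d/modules.sh???module load java'
# prints:
# . /etc/profile.d/modules.sh
# module load java
```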
Advanced users: Parallelizing CoNet (MacOS/UNIX)
CoNet randomization can be split into several jobs to speed it up.
CoNet supports this parallelization in two ways: SGE-cluster submission (see above)
and user-made wrappers. If you would like to run several CoNet jobs in parallel using your
own wrapper, the following options are of interest to you:
You can specify -f with value "randscore". This causes CoNet to write random scores
instead of a network to the output. The original network can be provided via option -I.
If it is provided (and -f is set to randscore), the original network will not be recomputed, but read in from the file
given via -I. If the file given via -I does not exist yet, the original network will be exported to this file.
Thus, you can set up parallelization in the following way:
The first step is to compute the original network scores:
java -Xmx2000m -cp CoNet.jar
be.ac.vub.bsb.cooccurrence.cmd.CooccurrenceAnalyser
-i input.txt -f randscore -E correl_spearman/dist_bray --method ensemble
--ensembleparamfile thresholds.txt --multigraph --pvaluemerge brown
-F rand --iterations 1 -g 0.05 --resamplemethod shuffle_rows
-I oriscores.txt -K edgeScores --scoreexport
--output randomScores.0 > oriscores.log &
Next, you can create N jobs (where i is the job index going from 1 to N), by launching the following command N times:
java -Xmx2000m -cp CoNet.jar
be.ac.vub.bsb.cooccurrence.cmd.CooccurrenceAnalyser
-i input.txt -f randscore -E correl_spearman/dist_bray --method ensemble
--ensembleparamfile thresholds.txt --multigraph --pvaluemerge brown
-F rand --iterations 10 -g 0.05 --resamplemethod shuffle_rows
-I oriscores.txt -K edgeScores --scoreexport
--output randomScores.i > randscores_i.log &
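A simple wrapper for launching the N jobs could look like the sketch below. The function name launch_random_jobs is hypothetical; the CoNet call itself is the one shown above, with the job index i filled in by the loop.

```shell
# Hypothetical wrapper (not part of CoNet): launch N randomization jobs in
# the background and wait until all of them have finished.
launch_random_jobs() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        java -Xmx2000m -cp CoNet.jar \
            be.ac.vub.bsb.cooccurrence.cmd.CooccurrenceAnalyser \
            -i input.txt -f randscore -E correl_spearman/dist_bray --method ensemble \
            --ensembleparamfile thresholds.txt --multigraph --pvaluemerge brown \
            -F rand --iterations 10 -g 0.05 --resamplemethod shuffle_rows \
            -I oriscores.txt -K edgeScores --scoreexport \
            --output "randomScores.$i" > "randscores_$i.log" &
        i=$((i + 1))
    done
    # Block until all background jobs are done.
    wait
}

# Example: launch_random_jobs 10
```

Note that this starts all N jobs at the same time, so N should not exceed what your machine can handle in terms of cores and memory.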
Finally, you can merge all separately generated random score files (randomScores.i with i from 1 to N)
by appending them to the original score file (oriscores.txt), thus creating your final permutation or bootstrap score file.
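The manual merge can be sketched as follows. The function name merge_random_scores is hypothetical. As a precaution, the sketch writes the result to a separate file instead of appending to oriscores.txt directly, and it assumes that the score files can simply be concatenated as they are.

```shell
# Hypothetical helper (not part of CoNet): concatenate the original scores
# and the random score files 1..N into one output file.
merge_random_scores() {
    n=$1
    out=$2
    # Start from a copy of the original score file, leaving it untouched.
    cp oriscores.txt "$out" || return 1
    i=1
    while [ "$i" -le "$n" ]; do
        cat "randomScores.$i" >> "$out" || return 1
        i=$((i + 1))
    done
}

# Example: merge_random_scores 10 mergedScores.txt
```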
If all your random score files are in one directory and if their names start with "randomScores.", you can let CoNet do the final merge.
For this, point to the random score directory by setting tmp_dir in the configuration file to this directory
and add option --restorefromscorefolder on command line.
An example command could look like this:
java -Xmx2000m -cp CoNet.jar
be.ac.vub.bsb.cooccurrence.cmd.CooccurrenceAnalyser
-i input.txt -f gdl -E correl_spearman/dist_bray --method ensemble
--ensembleparamfile thresholds.txt --multigraph --pvaluemerge brown
-F rand --iterations 100 -g 0.05 --resamplemethod shuffle_rows
-I oriscores.txt -K edgeScores --restorefromscorefolder
-Z CoNetConfig.txt --output network.gdl > restore.log &
Advanced users: Ensemble Pipeline Bash Script (MacOS/UNIX)
In the cmd folder of the CoNet distribution, there is an example for a bash script that runs the steps of the ensemble part
of the pipeline published in
PLoS Computational Biology 8, e1002606.
The example bash script uses input data from the third CoNet tutorial.
It assumes that it is located in a folder that has two sub-folders "Input" and "Output",
where input files (input matrix and metadata) are located in the input folder and
the output files will be written to the output folder.
Don't forget to give the script execution permission (chmod 755 cooc.sh).
The bash script runs all required network construction steps in one go, that is computation of
initial thresholds, generation of renormalized permutation and bootstrap scores
and final network construction.
The renormalized permutation score computation is the longest step;
it takes around 5 minutes on an 8GB RAM machine.
You can run a first test by setting PERMUT, BOOT and RESTORE to false and
enabling COMPUTE_THRESHOLDS and TEST instead.
If you want to enable CLUSTER, make sure you have the SGE cluster management system installed.
Then, add required cluster options to the CoNetConfig.txt and CoNetConfigBoot.txt configuration files.
Both configuration files should specify the location of the CoNet jar file (lib_dir and jar_file)
as well as the requested job number (job_num) and memory (memory).
The CoNetConfig.txt configuration file should in addition point to the location of the directory where
temporary permutation score files will be saved (tmp_dir). Likewise, the CoNetConfigBoot.txt configuration
file should point to a different location, where temporary bootstrap score files will be saved.
Note that in case you keep temporary score files (keep_tmpscores=true),
you can restore random score distributions from these files using option --restorefromscorefolder later on.
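The cluster options for the two configuration files can be added with a small helper like the one sketched below. The function name add_cluster_options is hypothetical, and the paths and values are the placeholders from the cluster configuration example above; replace them with your own. The helper appends to the files, so existing settings are kept.

```shell
# Hypothetical helper (not part of CoNet): append cluster options to a
# configuration file, with a score directory specific to that file.
add_cluster_options() {
    config_file=$1
    score_dir=$2
    cat >> "$config_file" <<EOF
# location and name of the CoNet jar
lib_dir=/path/to/my/conet/lib/folder/
jar_file=CoNet.jar
# temporary score files are stored here
tmp_dir=$score_dir
# number of jobs and memory in MB
job_num=100
memory=4000
EOF
}

# Example (placeholder directories, one per configuration file):
# add_cluster_options CoNetConfig.txt /path/to/my/permutation/scores
# add_cluster_options CoNetConfigBoot.txt /path/to/my/bootstrap/scores
```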