Interactive Xander Example on MSU's HPCC
This exercise is applicable to a the installation of RDPTools on MSU's HPCC. Exercise steps are:
Create a working directory.
Copy the configuration and command scripts into the working directory.
Edit the configuration script.
From the working directory, enter a run command of the form:
Copy Scripts into Working Directory
1. Create a working directory in your home directory, e.g. ~/test_xander
2. Move into your working directory.
3. Copy xander_setenv.sh
to your working directory renaming it to my_xander_setenv.sh
in the process.
4. Copy run_xander_skel.sh
to your working directory, renaminig it to run_xander.sh
in the processl.
Edit the Configuration Script File
The configuration script is divided into three sections. The first section gives the absolute paths to the sequence (data) file, output directory, HMM files, and programs. The second section gives a short name that will be prefixed to the contig names. This name must be unique for each sample in an experiment. The third section gives parameters for each step in assembling the contigs. These parameters are specific to the data set, but usually no more than two or three need to be changed. (See the section Choosing Xander Parameters.) The configuraton file is commented to help you edit it for your particular installation and data set.
To run this exercise on MSU's HPCC, there is only one line in the configuration script that you need to change. With the editor of your choice, edit the path variable for your working directory in the First section of my_xander_setenv.sh
to the absolute path, e.g ~/test_xander
. After your edits, this section should look like this:
To make this change using the editor nano, begin by entering the following:
Use the arrow keys to move the cursor to the appropriate place in the text. Insert characters by typing. Delete characters with the delete or backspace keys. There is a menu at the bottom of the nano screen indicating keys to use to write out the changed file and exit nano. The "^" in this menu means the "Ctrl" key. When you are finished making changes, hold down the Ctrl key and type o. You will be offered the opportunity to change the file name. To keep the same name, just hit the Enter key. Then hold down the Ctrl key and type x to exit nano. You may check that your changes have been made with less:
Do Not Edit the Run Script File
To run this example on MSU's HPCC, you do not need to edit anything in run_xander.sh
. The path for BASEDIR
is already set correctly.
Run Xander
Make sure that the two script files are executable. Change the file permissions if necessary.
The following example command will attempt to run all three steps (build, find and search) for the genes nifH, nirK, rplB, and nosZ specified in the input parameters. It creates an assembly output directory k45
for kmer length of 45. It makes an output directory for each gene inside the k45
directory and saves all the output in the gene output directories. This toy data set should take approximately 5 minutes to run. Messages will be echoed to the screen during this time. The program is finished when the input prompt reappears.
You can also run the three steps separately, or search multiple genes in parallel.
IMPORTANT: If you want to rebuild the bloom graph structure, you need to manually delete the bloom file (k45.bloom
) in the output directory. If you want to rerun finding the starting kmers for a gene, you need to manually delete that gene's output directory (in this case under k45
). As a safety precaution, the script will not automatically over-write these files, but they are specific to the data being searched in subsequent assembly (search) steps. That is, they have to be created for each data set.
Last updated
Was this helpful?