# Interactive Xander Example on MSU's HPCC

This exercise applies to the installation of **RDPTools** on MSU's HPCC. The exercise steps are:

1. Create a working directory.
2. Copy the configuration and command scripts into the working directory.
3. Edit the configuration script.
4. From the working directory, enter a run command of the form:

```
./run_xander.sh my_xander_setenv.sh "build find search" "nifH nirK nosZ rplB"
```

## Copy Scripts into Working Directory

**1.** Create a working directory in your home directory, e.g. `~/test_xander`.

```
mkdir ~/test_xander
```

**2.** Move into your working directory.

```
cd ~/test_xander
```

**3.** Copy `xander_setenv.sh` to your working directory, renaming it to `my_xander_setenv.sh` in the process.

```
cp /mnt/research/rdp/public/RDPTools/Xander_assembler/bin/xander_setenv.sh ~/test_xander/my_xander_setenv.sh
```

**4.** Copy `run_xander_skel.sh` to your working directory, renaming it to `run_xander.sh` in the process.

```
cp /mnt/research/rdp/public/RDPTools/Xander_assembler/bin/run_xander_skel.sh ~/test_xander/run_xander.sh
```
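If you set up working directories often, steps 1 to 4 can be wrapped in a small shell function. This is a minimal sketch, not part of RDPTools: the function name `setup_xander_workdir` and the `XANDER_BIN` variable are hypothetical, and `XANDER_BIN` defaults to the HPCC location used in the commands above.

```shell
# Hypothetical helper (not part of RDPTools): performs steps 1-4 in one call.
# XANDER_BIN defaults to the HPCC directory used above; override it if needed.
XANDER_BIN=${XANDER_BIN:-/mnt/research/rdp/public/RDPTools/Xander_assembler/bin}

setup_xander_workdir() {
  local workdir=$1
  # -p: no error if the directory already exists
  mkdir -p "$workdir" || return 1
  # Copy the configuration script, renaming it
  cp "$XANDER_BIN/xander_setenv.sh" "$workdir/my_xander_setenv.sh" || return 1
  # Copy the run script, renaming it
  cp "$XANDER_BIN/run_xander_skel.sh" "$workdir/run_xander.sh" || return 1
}

# Usage:
#   setup_xander_workdir ~/test_xander && cd ~/test_xander
```

Note that `cp` silently overwrites existing files, so calling the function again on the same directory replaces an already edited `my_xander_setenv.sh`.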

## Edit the Configuration Script File

The configuration script is divided into three sections. The first section gives the absolute paths to the sequence (data) file, output directory, HMM files, and programs. The second section gives a short name that will be prefixed to the contig names. This name must be unique for each sample in an experiment. The third section gives parameters for each step in assembling the contigs. These parameters are specific to the data set, but usually no more than two or three need to be changed. (See the section **Choosing Xander Parameters**.) The configuration file is commented to help you edit it for your particular installation and data set.

To run this exercise on MSU's HPCC, you need to change only one line in the configuration script. With the editor of your choice, set the `WORKDIR` variable in the first section of `my_xander_setenv.sh` to the absolute path of your working directory, e.g. `~/test_xander` (the shell expands an unquoted `~` to your home directory). After your edit, this section should look like this:

```
## First section
## THIS SECTION MUST BE MODIFIED FOR YOUR FILE SYSTEM. MUST BE ABSOLUTE PATH
## SEQFILE can use wildcards to point to multiple files (fasta, fastq, or 
## gz format), as long as there are no spaces in the names.
SEQFILE=/mnt/research/rdp/public/RDPTools/Xander_assembler/testdata/test_reads.fa
WORKDIR=~/test_xander
REF_DIR=/mnt/research/rdp/public/RDPTools/Xander_assembler
JAR_DIR=/mnt/research/rdp/public/RDPTools
UCHIME=/mnt/research/rdp/public/thirdParty/uchime-4.2.40/uchime
HMMALIGN=/opt/software/HMMER/3.1b1--GCC-4.4.5/bin/hmmalign
```

To make this change using the editor **nano**, begin by entering the following:

```
nano my_xander_setenv.sh
```

Use the arrow keys to move the cursor to the appropriate place in the text. Insert characters by typing. Delete characters with the delete or backspace keys. There is a menu at the bottom of the **nano** screen indicating keys to use to write out the changed file and exit **nano**. The "^" in this menu means the "Ctrl" key. When you are finished making changes, hold down the Ctrl key and type o. You will be offered the opportunity to change the file name. To keep the same name, just hit the Enter key. Then hold down the Ctrl key and type x to exit **nano**. You may check that your changes have been made with **less**:

```
less my_xander_setenv.sh
```
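Alternatively, the one-line change can be made without an interactive editor, and the paths in the first section can be checked before you run anything. The sketch below assumes GNU `sed` (for in-place editing with `-i`) and bash (for the `${!var}` indirection). `set_workdir` and `check_config_paths` are hypothetical helper names, and the check assumes the configuration file only sets variables, as the excerpt above suggests. `SEQFILE` is not checked because it may contain wildcards.

```shell
# Hypothetical helpers; GNU sed (for -i) and bash (for ${!var}) are assumed.

# Rewrite the WORKDIR= line of the configuration file.
# '|' is the sed delimiter so slashes in the path need no escaping;
# the path must not contain '|' or '&'.
set_workdir() {
  local config=$1 workdir=$2
  sed -i "s|^WORKDIR=.*|WORKDIR=${workdir}|" "$config"
}

# Source the configuration in a subshell (so nothing leaks into your
# login shell) and report whether each path exists. Returns 1 if any is missing.
check_config_paths() {
  local config=$1
  (
    . "$config" || exit 1
    rc=0
    for var in WORKDIR REF_DIR JAR_DIR UCHIME HMMALIGN; do
      if [ -e "${!var}" ]; then
        echo "ok:      $var=${!var}"
      else
        echo "missing: $var=${!var}"
        rc=1
      fi
    done
    exit "$rc"
  )
}

# Usage (from ~/test_xander). The shell expands the unquoted ~ before sed sees it,
# and the ./ makes "." read the local file instead of searching PATH:
#   set_workdir my_xander_setenv.sh ~/test_xander
#   check_config_paths ./my_xander_setenv.sh
```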

## Do Not Edit the Run Script File

To run this example on MSU's HPCC, you do not need to edit anything in `run_xander.sh`. The path for `BASEDIR` is already set correctly.

## Run Xander

Make sure that the two script files are executable. Change the file permissions if necessary.

```
ls -l
chmod 755 my_xander_setenv.sh
chmod 755 run_xander.sh
ls -l
```
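To check and fix the permissions in one step instead of reading the `ls -l` output, a small loop over the two files is enough. `ensure_executable` is a hypothetical name; it applies the same `chmod 755` as above, but only to files that are not already executable.

```shell
# Hypothetical helper: make each named file executable (mode 755) only if it is not already.
ensure_executable() {
  local f
  for f in "$@"; do
    if [ ! -x "$f" ]; then
      chmod 755 "$f" || return 1
      echo "made executable: $f"
    fi
  done
}

# Usage (from ~/test_xander):
#   ensure_executable my_xander_setenv.sh run_xander.sh
```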

The following example command will attempt to run all three steps (build, find, and search) for the genes nifH, nirK, rplB, and nosZ specified in the input parameters. It creates an assembly output directory `k45` for a kmer length of 45, makes an output directory for each gene inside `k45`, and saves all the output in the gene output directories. This toy data set should take approximately 5 minutes to run. Messages will be echoed to the screen during this time. The program is finished when the input prompt reappears.

```
./run_xander.sh my_xander_setenv.sh "build find search" "nifH nirK rplB nosZ"
```
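Two optional conveniences for this run, sketched as hypothetical bash helpers: `run_logged` keeps a copy of the screen messages in a log file, and `check_gene_dirs` confirms that an output directory was created for each gene. The `~/test_xander/k45` path in the usage lines assumes the assembly output directory is created inside `WORKDIR`; adjust it if your run reports a different location. The log file name `xander_run.log` is arbitrary.

```shell
# run_logged: show a command's output on screen AND save it to a log file.
run_logged() {
  local log=$1
  shift
  # Merge stderr into stdout, then copy the stream to the screen and the log
  "$@" 2>&1 | tee "$log"
  # Return the exit status of the command itself, not of tee (bash-specific)
  return "${PIPESTATUS[0]}"
}

# check_gene_dirs: report which per-gene output directories exist under a
# k-mer output directory. Returns 1 if any is missing.
check_gene_dirs() {
  local kdir=$1
  shift
  local gene missing=0
  for gene in "$@"; do
    if [ -d "$kdir/$gene" ]; then
      echo "found:   $kdir/$gene"
    else
      echo "missing: $kdir/$gene"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (from ~/test_xander):
#   run_logged xander_run.log ./run_xander.sh my_xander_setenv.sh "build find search" "nifH nirK rplB nosZ"
#   check_gene_dirs ~/test_xander/k45 nifH nirK rplB nosZ
```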

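You can also run the three steps separately, or search multiple genes in parallel. The search step uses the bloom file and starting kmers created by the build and find steps, so let build and find finish before starting the searches.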

```
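# Build the bloom file and find starting kmers for all genes first (in the foreground)
./run_xander.sh my_xander_setenv.sh "build find" "nifH nirK rplB nosZ"
# Then search each gene in the background, in parallel
./run_xander.sh my_xander_setenv.sh "search" "nifH" &
./run_xander.sh my_xander_setenv.sh "search" "nirK" &
./run_xander.sh my_xander_setenv.sh "search" "rplB" &
./run_xander.sh my_xander_setenv.sh "search" "nosZ" &
# Wait until all background searches have finished
wait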
```
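The same pattern can be written as a loop that also waits for every background search and reports whether any of them failed. This is a sketch in bash; `search_genes_in_parallel` is a hypothetical name, and it should only be called after the build and find steps have completed.

```shell
# Hypothetical helper (bash): run "search" for several genes at once, then wait.
# Call it only after "build find" has finished for all of these genes.
search_genes_in_parallel() {
  local runner=$1 config=$2
  shift 2
  local gene pid status=0
  local -a pids=()
  for gene in "$@"; do
    # Start one background search per gene and remember its process ID
    "$runner" "$config" "search" "$gene" &
    pids+=("$!")
  done
  # Wait for every search; remember whether any exited with an error
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1
  done
  return "$status"
}

# Usage (from ~/test_xander):
#   ./run_xander.sh my_xander_setenv.sh "build find" "nifH nirK rplB nosZ"
#   search_genes_in_parallel ./run_xander.sh my_xander_setenv.sh nifH nirK rplB nosZ
```

Waiting on each process ID individually, rather than calling a bare `wait`, lets the function return a nonzero status when any single gene's search fails.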

**IMPORTANT:** If you want to rebuild the bloom graph structure, you need to manually delete the bloom file (`k45.bloom`) in the output directory. If you want to rerun finding the starting kmers for a gene, you need to manually delete that gene's output directory (in this case under `k45`). As a safety precaution, the script will not automatically overwrite these files. Because they are specific to the data being searched in the subsequent assembly (search) steps, they must be created anew for each data set.
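The manual deletions can be scripted as below. This sketch assumes the bloom file is `k45/k45.bloom` inside your working directory and that each gene's output directory is `k45/<gene>`; list the directory with `ls` and confirm these locations before deleting anything. `reset_bloom` and `reset_gene` are hypothetical names.

```shell
# ASSUMPTION: kdir is the k-mer output directory (e.g. ~/test_xander/k45),
# the bloom file inside it is named after it (k45 -> k45.bloom), and each
# gene has its own subdirectory. Check with ls before running either function.

# Delete the bloom file so the next "build" step recreates it
reset_bloom() {
  local kdir=$1
  [ -n "$kdir" ] || return 1
  rm -f -- "$kdir/$(basename "$kdir").bloom"
}

# Delete one gene's output directory so "find" (and then "search") run again for it
reset_gene() {
  local kdir=$1 gene=$2
  # Refuse empty arguments, which could otherwise delete the whole k-mer directory
  [ -n "$kdir" ] && [ -n "$gene" ] || return 1
  rm -rf -- "$kdir/$gene"
}

# Usage:
#   reset_bloom ~/test_xander/k45
#   reset_gene ~/test_xander/k45 nifH
```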

