Adding SeqMatch Databases

Add a SeqMatch Database

Click on this link to download the archive file SeqMatch_DBs.zip. Place the file in the folder you mapped to the container (see creating-the-container) and decompress it. This creates the directory SeqMatch_DBs with the following contents:

| SeqMatch_DBs  
├── release11_4_types.trainee  
├── release11_4_type_descriptions.txt  
├── release11_4_bac_isolate_descriptions.txt  
├── isolate_trainee_files  
  ├──── release11_4_bac_isolates_1.trainee  
  ├──── release11_4_bac_isolates_2.trainee  
  ├──── release11_4_bac_isolates_3.trainee  
  ├──── release11_4_bac_isolates_4.trainee  
  ├──── release11_4_bac_isolates_5.trainee  
  ├──── release11_4_bac_isolates_6.trainee  
  ├──── release11_4_bac_isolates_7.trainee  
  ├──── release11_4_bac_isolates_8.trainee  

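As you can see, there are two databases: one for type strains and another for bacterial isolates. The type strains database is a single trainee file, while the isolates database is split across the eight trainee files in isolate_trainee_files. Each database has its own description file.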
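Before running any searches, it can help to confirm that the archive unpacked completely. The following is a minimal sketch, not part of RDPTools: check_seqmatch_dbs is a hypothetical helper name, and it assumes the isolate description file is named release11_4_bac_isolate_descriptions.txt, as in the isolates command on this page.

```shell
# Hypothetical helper (not part of RDPTools): report any expected file that is
# missing from the unpacked SeqMatch_DBs directory.
# Assumption: the isolate description file is named
# release11_4_bac_isolate_descriptions.txt, as in the isolates command.
check_seqmatch_dbs() {
  db_dir=$1
  missing=0
  # Top-level files: type strains trainee file and the two description files.
  for f in \
    release11_4_types.trainee \
    release11_4_type_descriptions.txt \
    release11_4_bac_isolate_descriptions.txt
  do
    if [ ! -f "$db_dir/$f" ]; then
      echo "missing: $db_dir/$f" >&2
      missing=1
    fi
  done
  # The isolates database is split across eight numbered trainee files.
  i=1
  while [ "$i" -le 8 ]; do
    f="isolate_trainee_files/release11_4_bac_isolates_$i.trainee"
    if [ ! -f "$db_dir/$f" ]; then
      echo "missing: $db_dir/$f" >&2
      missing=1
    fi
    i=$((i + 1))
  done
  # Exit status 0 means every expected file was found.
  return "$missing"
}

# Example call (edit the path as necessary):
#   check_seqmatch_dbs ~/SeqMatch_DBs && echo "all files present"
```

The function only checks that the files exist. It does not validate their contents.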

Using SeqMatch

Enter the following command from within the container:

java -jar /usr/local/RDPTools/SequenceMatch.jar seqmatch

This prints a help message for the seqmatch program:



usage: seqmatch <refseqs | trainee_file_or_dir> <query_file>
                trainee_file_or_dir is a single trainee file or a
                directory containing multiple trainee files
 -d,--desc <arg>      A tab-delimited description file containing seqID
                      and description
 -k,--knn <arg>       Find k nearest neighbors [default = 20]
 -o,--outFile <arg>   Write output to a file
 -s,--sab <arg>       Minimum sab score [default = .5]

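To classify sequences in a FASTA file with the type strains database, use a command of the form shown here (edit paths as necessary). The --knn and --sab values are the defaults listed in the help message, so they are spelled out only to make them easy to change: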

java -jar /usr/local/RDPTools/SequenceMatch.jar seqmatch \
~/SeqMatch_DBs/release11_4_types.trainee \
query.fasta \
--desc ~/SeqMatch_DBs/release11_4_type_descriptions.txt \
--knn 20 \
--sab 0.5 \
--outFile query_classified.tsv

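To classify sequences in a FASTA file with the isolates database, pass the directory containing the isolate trainee files instead of a single trainee file, together with the isolate description file (edit paths as necessary):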

java -jar /usr/local/RDPTools/SequenceMatch.jar seqmatch \
~/SeqMatch_DBs/isolate_trainee_files \
query.fasta \
--desc ~/SeqMatch_DBs/release11_4_bac_isolate_descriptions.txt \
--knn 20 \
--sab 0.5 \
--outFile query_classified.tsv
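When there are several query files, the same command can be wrapped in a loop that derives a separate output name for each one. The following is a minimal sketch for the type strains database under stated assumptions: seqmatch_types_batch and the DRY_RUN variable are hypothetical names introduced here, the output name is built by replacing a trailing .fasta with _classified.tsv, and only the options from the help message above are used.

```shell
# Hypothetical wrapper (not part of RDPTools): run the type strains search
# once per query file, writing <name>_classified.tsv for each <name>.fasta.
# If DRY_RUN is set to a non-empty value, each command is printed, not run.
seqmatch_types_batch() {
  db_dir=$1
  shift
  for query in "$@"; do
    # Strip a trailing ".fasta" (if present) to build the output file name.
    out="${query%.fasta}_classified.tsv"
    # ${DRY_RUN:+echo} expands to "echo" only when DRY_RUN is non-empty.
    ${DRY_RUN:+echo} java -jar /usr/local/RDPTools/SequenceMatch.jar seqmatch \
      "$db_dir/release11_4_types.trainee" \
      "$query" \
      --desc "$db_dir/release11_4_type_descriptions.txt" \
      --knn 20 \
      --sab 0.5 \
      --outFile "$out"
  done
}

# Example call (edit paths as necessary):
#   DRY_RUN=1; seqmatch_types_batch ~/SeqMatch_DBs sample1.fasta sample2.fasta
```

Running once with DRY_RUN set shows the exact commands before anything is executed. For the isolates database, the first argument to seqmatch would be the isolate_trainee_files directory and --desc would point at the isolate description file, as in the command above.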
