Protocols

Take a moment to familiarize yourself with some of the available programs. These programs are intended to be run through the "Terminal" app.

Protocol1

Protocol1 takes any protein sequence, gi number, or accession number and performs an NCBI BLAST search, returning a list of close homologs with E-values of 0.005 or better. The user is shown the results and given the option to perform a second iteration. Once the user is satisfied with the results, Protocol1 removes any redundancies from the gi list and submits it to NCBI's Protein Entrez tool, from which it downloads a TinyXML format file containing information about each gi number. This file is then run through Make_Table5, which removes short and abnormally long sequences, removes similar sequences, and annotates the remaining sequences.

Usage: Type "protocol1" to begin...
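
The filtering step in Protocol1 (keep hits with E-values of 0.005 or better, then remove redundant gi numbers) could look roughly like the Python sketch below. This is an illustration only, not Protocol1's actual code: the (gi, e_value) pair layout, the function name filter_hits, and the example hits are all assumptions.

```python
E_VALUE_CUTOFF = 0.005  # threshold quoted in the Protocol1 description


def filter_hits(hits, cutoff=E_VALUE_CUTOFF):
    """Return unique gi numbers whose E-value is at or below the cutoff.

    `hits` is a hypothetical list of (gi, e_value) pairs, for example
    parsed from BLAST output. Order of first appearance is preserved.
    """
    seen = set()
    unique = []
    for gi, e_value in hits:
        if e_value <= cutoff and gi not in seen:
            seen.add(gi)
            unique.append(gi)
    return unique


# Made-up hits from two iterations; gi "111" appears twice.
example_hits = [("111", 1e-30), ("222", 0.5), ("333", 0.004), ("111", 1e-28)]
filtered = filter_hits(example_hits)
```

Here the hit with an E-value of 0.5 is dropped and the repeated gi number is kept only once.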

Protocol2

Protocol2 takes two FASTA files or one alignment file and compares all sequences to each other using the Smith-Waterman algorithm to determine their best local alignments. Results are reported as standard deviation (S.D.) values. The user is then given the option to run GSAT or GAP on selected binary comparisons generated by the Smith-Waterman search. The user can select any result, or let the program automatically pick “good” scores of 10 S.D. or better. Protocol2 automatically runs HMMGAP on results analyzed by GSAT or GAP, and generates a detailed HTML report displaying the best results, as S.D. values, in descending order. Each analysis shows the Smith-Waterman local alignment, a Needleman-Wunsch global alignment, numbered TMS regions in the global alignment, and tools such as MPVT, WHAT, and HMMTOP.

Usage: Type "protocol2" to begin...
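
The automatic selection of “good” scores can be pictured with a short Python sketch: keep the comparisons at 10 S.D. or better and list them in descending order, as in the report. The (name_a, name_b, sd) tuple layout, the function name pick_good_pairs, and the sample values are assumptions made for illustration; Protocol2's real data structures are not described here.

```python
GOOD_SCORE_SD = 10.0  # "good" threshold quoted in the Protocol2 description


def pick_good_pairs(results, threshold=GOOD_SCORE_SD):
    """Keep comparisons scoring at or above the threshold, best first.

    `results` is a hypothetical list of (name_a, name_b, sd) tuples.
    """
    good = [r for r in results if r[2] >= threshold]
    return sorted(good, key=lambda r: r[2], reverse=True)


# Made-up pairwise comparison results.
example_results = [
    ("seqA", "seqB", 8.2),
    ("seqA", "seqC", 14.7),
    ("seqB", "seqC", 10.0),
]
selected = pick_good_pairs(example_results)
```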

GSAT

GSAT stands for Global Sequence Alignment Tool. This program performs a Needleman-Wunsch global alignment on two protein sequences. GSAT uses a BLOSUM62 scoring matrix provided by EMBOSS. Protein A is compared to protein B to calculate a Needleman-Wunsch (N.W.) score. Protein B is then shuffled twice for every amino acid when the protein is over 60 residues long. Thus, if protein B is 100 residues long, it will be shuffled 200 times. However, if protein B is under 60 residues long, it will be shuffled 3 times per amino acid. Shorter sequences tend to give a wider range of results, so to ensure consistency, GSAT analyzes these more rigorously. After each shuffle, protein B is compared to protein A again, so that each shuffle gets a new N.W. score.

The average quality is then calculated as: AverageQuality = MeanNWScores +/- StandardDeviation. The “Z” value, or “Standard Score”, is then calculated as follows: Z = (OriginalScore - AverageQuality) / StandardDeviation. Three Z values are reported: one for each of the two possible AverageQuality values (the mean plus the standard deviation and the mean minus the standard deviation), and the average of those two Z values. Z values measure how many standard deviations the original N.W. score is from the average N.W. score. A result of 10 standard deviations or higher is strongly suggestive of homology.

Usage: Type "gsat" to begin...
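
The shuffle-and-score procedure can be sketched in Python as follows. This is not GSAT's code. It makes several assumptions that are flagged in the comments: a toy match/mismatch/gap scheme stands in for the EMBOSS BLOSUM62 matrix, the sample standard deviation is used, a protein of exactly 60 residues is treated like a longer one (the description does not cover that case), and all function names and the seed parameter are made up for the example.

```python
import random
import statistics


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score.

    Assumption: a toy scoring scheme replaces BLOSUM62 to keep this short.
    """
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def shuffle_count(length):
    """2 shuffles per residue for long proteins, 3 per residue for short ones.

    Assumption: exactly 60 residues is treated as "long".
    """
    per_residue = 2 if length >= 60 else 3
    return per_residue * length


def gsat_z(protein_a, protein_b, seed=0):
    """Return (z_low, z_high, z_mean) for protein A against shuffled protein B."""
    if len(protein_b) < 2:
        raise ValueError("protein B is too short to shuffle meaningfully")
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    original = nw_score(protein_a, protein_b)
    residues = list(protein_b)
    scores = []
    for _ in range(shuffle_count(len(protein_b))):
        rng.shuffle(residues)
        scores.append(nw_score(protein_a, "".join(residues)))
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # assumption: sample standard deviation
    if sd == 0:
        raise ValueError("all shuffled scores are identical; Z is undefined")
    z_low = (original - (mean + sd)) / sd   # AverageQuality = mean + S.D.
    z_high = (original - (mean - sd)) / sd  # AverageQuality = mean - S.D.
    return z_low, z_high, (z_low + z_high) / 2


# Made-up 20-residue sequences differing at one position.
z_low, z_high, z_mean = gsat_z("MKTAYIAKQRQISFVKSHFS", "MKTLYIAKQRQISFVKSHFS")
```

Note that with these formulas the two outer Z values always sit exactly one unit on either side of their average, which is the usual Z score computed from the plain mean.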

TMSAlign

TMSAlign takes a single FASTA file or an alignment file and extracts all sequences and removes redundancies. It then performs a “Multiple Sequence Alignment” on all the sequences using ClustalW. Aligned sequences are then compared to their original FASTA sequence through HMMGAP, and TMS regions are marked and numbered. TMSAlign generates an HTML report containing the multiple alignment with TMS regions highlighted in red with their numeric positions indicated directly above them.

Usage: Type "tmsalign" to begin...

HMMGAP

HMMGAP is a tool that is used within several other programs and can also be run as a standalone program. HMMGAP takes two parameters: “Input Sequence” and “Gap Segment”. Gap Segment refers to a segment of the input sequence in which gaps are marked with periods (“.”). HMMGAP returns the segment with highlighted TMS regions that are numbered relative to the entire sequence. The input segment does not have to contain gaps, but it is usually taken from a Needleman-Wunsch comparison. TMS regions are determined by HMMTOP.

Usage: hmmgap FULLSEQUENCE GAP_SEGMENT OUTFILE
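
The idea of numbering TMS regions relative to the entire sequence while displaying only a gapped segment can be illustrated with the Python sketch below. It is not HMMGAP's implementation. The TMS regions are assumed to be already known as 1-based, inclusive residue ranges (as a predictor such as HMMTOP might report them), the segment is located by a simple first-match search, and the function name map_tms_to_segment and the sequences are invented for the example.

```python
def map_tms_to_segment(full_sequence, gap_segment, tms_regions):
    """Map TMS regions of the full sequence onto columns of a gapped segment.

    tms_regions: list of (start, end) residue positions, 1-based and
    inclusive, relative to full_sequence (assumed input format).
    Returns (tms_number, first_column, last_column) tuples, where columns
    are 0-based indexes into gap_segment and tms_number counts TMSs from
    the start of the full sequence.
    """
    ungapped = gap_segment.replace(".", "")
    offset = full_sequence.find(ungapped)  # assumption: first match is the right one
    if offset < 0:
        raise ValueError("segment not found in full sequence")

    # Full-sequence residue position for each column; None marks a gap.
    positions = []
    next_pos = offset + 1
    for ch in gap_segment:
        if ch == ".":
            positions.append(None)
        else:
            positions.append(next_pos)
            next_pos += 1

    marked = []
    for number, (start, end) in enumerate(tms_regions, start=1):
        cols = [i for i, p in enumerate(positions)
                if p is not None and start <= p <= end]
        if cols:
            marked.append((number, cols[0], cols[-1]))
    return marked


# Made-up sequence with two assumed TMS regions at residues 5-10 and 15-20.
full = "AAAALLLLLLGGGGVVVVVV"
tms = [(5, 10), (15, 20)]
whole = map_tms_to_segment(full, "LL.LLGG..GGVV", tms)
tail = map_tms_to_segment(full, "GGVV", tms)
```

In the second call the segment starts after the first TMS, yet the region it contains is still reported as TMS 2, because numbering follows the full sequence.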

MPVT

MPVT is the Movable Protein Visualizer Tool. Provided with two sequences and their GSAT/GAP output (global & local alignment), MPVT allows users to view and manipulate the two full sequences in a drag and drop setting. Their best local and global alignments are highlighted in bright pink, and TMS regions are annotated using HMMGAP.

SortTree

SortTree is a tool designed to sort Make_Table5's TAB file according to a phylogenetic tree. It can also sort FASTA sequences.

Usage: Type "SortTree" to begin...
- Use the SortTree program on BioTools to sort FASTA files.

ALN2FAS

This tool converts an alignment file to FASTA format. Users may select a starting position and an end position, and ALN2FAS includes only the selected range in the exported FASTA file.

Usage: Type "aln2fas" to begin...
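
As a rough picture of what exporting a selected range involves, the Python sketch below cuts a column range out of aligned sequences and formats the result as FASTA. It is a sketch under assumptions, not ALN2FAS itself: the alignment is assumed to be already parsed into a name-to-row dictionary, positions are assumed to be 1-based and inclusive, gap characters are removed by default (the description does not say whether ALN2FAS does this), and the function name is hypothetical.

```python
def slice_alignment_to_fasta(aligned, start, end, strip_gaps=True, width=60):
    """Return FASTA text for columns start..end (1-based, inclusive).

    aligned: dict mapping sequence name to its aligned row (assumed format).
    strip_gaps: drop "-" and "." characters from each exported fragment.
    """
    if start < 1 or end < start:
        raise ValueError("need 1 <= start <= end")
    lines = []
    for name, row in aligned.items():
        fragment = row[start - 1:end]
        if strip_gaps:
            fragment = fragment.replace("-", "").replace(".", "")
        if not fragment:
            continue  # nothing left for this sequence in the chosen range
        lines.append(">" + name)
        for i in range(0, len(fragment), width):
            lines.append(fragment[i:i + width])
    return "\n".join(lines) + ("\n" if lines else "")


# Made-up two-sequence alignment; export columns 3 through 7.
example_alignment = {"seqA": "MK--TAYI", "seqB": "MKLLTA-I"}
fasta_text = slice_alignment_to_fasta(example_alignment, 3, 7)
```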

GBLAST2

GBLAST2 takes a genome file in FASTA format and compares it against TCDB to find the transport proteins in that genome that match TCDB entries above a custom threshold value. GBLAST2 uses the NCBI BLAST tool to generate results, and produces a Tab Separated Value (TSV) report containing annotations for the best hits for each Transport Classification ID (TCID).

Usage: Located on the 51 server.
% cd genome
% ./gblast.pl <input_genome>

GenomeCompare

GenomeCompare allows a user to compare one subject genome file to multiple target genomic proteome files simultaneously. This program generates a TSV result page displaying the best hit from the target genome(s) for each subject sequence. This program also annotates results, and reports total predicted TMS numbers.

Usage: Located on the 51 server.
% cd genome/blaster
% ./genome_compare.php

k, have fun,

-V

Saier Lab Protocols