SHELL Scripts for Simple Bioinformatics Analysis
Simple SHELL script for parsing BLAST output
1. Parsing the sequence names from the BLAST output
"grep" is a powerful Unix command that retrieves the lines matching a given pattern from a file.
Syntax:
grep "pattern" input_file
Example: grep ">" Blast_output.txt
In the example above, grep retrieves every line that contains the ">" symbol. In a BLAST output file, each sequence name starts with ">", so this command lists all the sequence names in the file.
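As a quick check, the command can be tried on a small made-up file. The sketch below assumes a hypothetical file sample_hits.txt with two hits; its content is invented for illustration. Anchoring the pattern with "^" restricts the match to lines that begin with ">", and the -c option counts the matching lines instead of printing them.

```shell
# Create a tiny, made-up BLAST-like file (hypothetical content, for illustration only).
cat > sample_hits.txt <<'EOF'
Query= test_query
>gi|111|ref|NM_000001.1| Hypothetical gene A
Score = 100 bits (50), Expect = 1e-20
Query: 1 ACGTACGTAC 10
Sbjct: 5 ACGTACGTAC 14
>gi|222|ref|NM_000002.1| Hypothetical gene B
Score = 80 bits (40), Expect = 1e-10
Query: 1 ACGTAC 6
Sbjct: 9 ACGTAC 14
EOF

# Keep only the lines that begin with ">" (the sequence names).
grep "^>" sample_hits.txt > names.txt

# Count the hits instead of printing them.
hit_count=$(grep -c "^>" sample_hits.txt)

echo "Number of hits: $hit_count"
cat names.txt
```

Redirecting the names to a file, as done here with names.txt, makes it easy to feed them into a later step.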
2. Parsing the sequence names and the sequences from the BLAST output
"egrep" is a powerful command for retrieving the lines that match any one of several patterns from a file. It is equivalent to "grep -E".
Syntax:
egrep "pattern1|pattern2|pattern3" filename
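Note that any space inside the quotes becomes part of the pattern. The sketch below, which uses a hypothetical file alt_demo.txt with three invented lines, compares the two spellings by counting matches with grep -E (the equivalent of egrep).

```shell
# Three made-up lines to search (hypothetical content).
printf '%s\n' '>seq1 example' 'Sbjct: 1 ACGT 4' 'Query: 1 ACGT 4' > alt_demo.txt

# Without spaces around "|": matches lines containing ">" or "Sbjct".
no_space=$(grep -E -c ">|Sbjct" alt_demo.txt)

# With spaces: the alternatives become "> " and " Sbjct", so no line matches here.
# grep exits with a non-zero status when nothing matches, hence the "|| true".
with_space=$(grep -E -c "> | Sbjct" alt_demo.txt || true)

echo "without spaces: $no_space, with spaces: $with_space"
```

On this input the first command counts two lines and the second counts none, which is why the patterns are written without surrounding spaces.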
Example:
Below is a combination of a shell command and a Perl script for parsing the BLAST output.
egrep ">|Sbjct" Blast_output.txt | sed 's/Sbjct://' > output.txt
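# Read the file produced by the egrep and sed step.
open(FH, "<", "output.txt") or die "Cannot open output.txt: $!";
while ($ln = <FH>)
{
    if ($ln !~ m/>/)
    {
        # Sbjct line: split on whitespace (BLAST pads columns with spaces).
        # Field 0 is the start position and field 1 is the sequence.
        @temp = split(' ', $ln);
        print "$temp[1]\n";
    }
    else
    {
        # Sequence name line: print it unchanged.
        print $ln;
    }
}
close(FH);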
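In the above example, egrep retrieves the lines that match ">" or "Sbjct", sed removes the "Sbjct:" label, and the result is stored in output.txt. The Perl script then reads output.txt, prints the sequence name lines unchanged, and prints only the sequence from each Sbjct line.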
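The same extraction can also be sketched in a single step with awk, swapped in here for the egrep, sed and Perl combination. This is only an illustration: the file sample_align.txt and its content are made up, and it assumes the older output style in which each subject line reads "Sbjct: start sequence end".

```shell
# Made-up input in the older "Sbjct:" style (hypothetical content).
cat > sample_align.txt <<'EOF'
>gi|111|ref|NM_000001.1| Hypothetical gene A
Query: 1 ACGTACGTAC 10
Sbjct: 5 ACGTACGTAC 14
Sbjct: 15 GGTTAA 20
>gi|222|ref|NM_000002.1| Hypothetical gene B
Query: 1 ACGTAC 6
Sbjct: 9 ACGTAC 14
EOF

# Print name lines unchanged. For Sbjct lines print the third field, the
# aligned sequence (field 1 is "Sbjct:", field 2 is the start position).
awk '/^>/ { print; next } /^Sbjct/ { print $3 }' sample_align.txt > parsed.txt

cat parsed.txt
```

Because awk splits each line on runs of whitespace by default, no separate step is needed to remove the "Sbjct:" label.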