AWK one-liners for Bioinformatics

January 24th, 2012

Adding a column in a file:

cat filename
4
4
3
1
8

To get the sum:

awk '{ for (i = 1; i <= NF; i++) s = s + $i }; END { print s + 0 }' filename

This prints the sum of all fields. You need not initialize the variable s to 0, because awk variables come into existence dynamically and start out empty, which counts as 0 in arithmetic.

Also notice that it calls "print s + 0" and not just "print s". The "+ 0" forces a numeric result, so an empty file prints 0 instead of a blank line.
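As a quick check of the behaviour described above, the sketch below wraps the one-liner in a shell function. The name sum_fields is made up for this example, and the input is generated with printf instead of a real file.

```shell
# sum_fields is a hypothetical helper name, not part of awk.
# It sums every field on every line read from standard input.
sum_fields() {
    awk '{ for (i = 1; i <= NF; i++) s = s + $i }; END { print s + 0 }'
}

# The same five numbers as the sample file, one per line.
printf '4\n4\n3\n1\n8\n' | sum_fields    # prints 20

# Empty input: "s + 0" still yields a number.
printf '' | sum_fields                    # prints 0
```

Without the "+ 0", the second call would print an empty line, because s would never have been assigned.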

Double-space a file:

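awk '1; { print "" }' filename

OR

awk '1 { print } { print "" }' filename

OR

awk '{ print } { print "" }' filename

The first print statement with no arguments is equivalent to "print $0", where $0 is a variable holding the entire line. The second print statement prints an empty string, which produces a blank line. In the first two versions, the pattern "1" is always true, so it selects every line.

awk 'NF { print $0 "\n" }' filename

This one-liner says: "If the line has any fields, print the whole line followed by a newline." Blank lines have no fields, so they are skipped, and the output never contains more than one blank line in a row.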

 

Print every line with more than 4 fields.

awk 'NF > 4' filename

This one-liner omits the action statement. A missing action statement is equivalent to '{ print }'.

Print every line where the value of the last field is greater than 4.

awk '$NF > 4' filename

This one-liner is similar to the one above. NF holds the number of fields on the line, so $NF references the last field. If the last field is greater than 4, the whole line is printed.
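The difference between NF and $NF is easy to miss, so here is a small side-by-side sketch. The sample function and its three input lines are invented for illustration.

```shell
# Sample input (made up for this example): lines with 5, 2 and 2 fields.
sample() {
    printf '1 2 3 4 5\n9 9\n1 2\n'
}

# NF > 4: more than four fields. Only the first line qualifies.
sample | awk 'NF > 4'

# $NF > 4: last field greater than 4. The first two lines qualify.
sample | awk '$NF > 4'
```

The line "9 9" has only two fields, so it fails the first test, but its last field is 9, so it passes the second.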

Print the maximum number of fields on any input line.

awk '{ if (NF > max) max = NF } END { print max }' filename

Print Random Numbers.

awk 'BEGIN { for (i = 1; i <= 7; i++) print int(101 * rand()) }'

This program prints 7 random numbers from 0 to 100, inclusive.
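One caveat: in many awk implementations (gawk, for example), rand() produces the same sequence on every run unless srand() is called first. A minimal sketch of a seeded variant follows; random_ints is a hypothetical helper name, and its argument is the number of values to print.

```shell
# random_ints is a hypothetical helper; $1 is how many numbers to print.
random_ints() {
    awk -v n="$1" 'BEGIN {
        srand()                      # seed from the current time of day
        for (i = 1; i <= n; i++)
            print int(101 * rand())  # integer in the range 0..100
    }'
}

random_ints 7
```

Because the seed comes from the clock, two runs within the same second may still print the same numbers.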

Print a sorted list

awk 'BEGIN { FS = ":" } { print $1 | "sort" }' /etc/passwd

This program prints a sorted list of the login names of all users.

Print the even-numbered lines in the file.

awk 'NR % 2 == 0' filename

Insert 5 blank spaces at beginning of each line.

awk '{ sub(/^/, "     "); print }' filename

This one-liner substitutes the zero-length beginning-of-line anchor "^" with five spaces.

Align all text flush right on a 79-column width.

awk '{ printf "%79s\n", $0 }' filename

This one-liner asks printf() to print the string in the $0 variable and left-pad it with spaces until the total length is 79 characters.
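To see the padding on a smaller scale, the sketch below makes the width a parameter. The name right_align is hypothetical, and the format string is built by concatenation so that it does not rely on any awk-specific printf extension.

```shell
# right_align is a hypothetical helper; $1 is the column width.
right_align() {
    # Build the format "%<width>s\n" by string concatenation.
    awk -v w="$1" '{ printf "%" w "s\n", $0 }'
}

printf 'abc\n' | right_align 6         # prints "   abc"
printf 'abcdefgh\n' | right_align 6    # prints "abcdefgh"
```

Lines longer than the width are printed in full; printf pads but never truncates here.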

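Remove duplicate, consecutive lines (emulates "uniq").

awk 'NR == 1 || $0 != a; { a = $0 }' filename

The variable a holds the previous line. A plain string comparison is used rather than the regex match "a !~ $0", which would treat the current line as a regular expression and misbehave on lines containing characters such as "." or "*".
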
Delete all blank lines from a file (same as "grep '.'").

awk NF filename

OR

awk '/./' filename

Substitute “foo” with “bar” only on lines that do not contain “baz”.

awk '!/baz/ { gsub(/foo/, "bar") }; { print }' filename

Print the fields in reverse order on every line.

awk '{ for (i=NF; i>0; i--) printf("%s ", $i); printf ("\n") }' filename

Awk sets the NF variable to the number of fields found on that line. This one-liner loops in reverse order from NF down to 1 and outputs the fields one by one. It starts with field $NF, then $(NF-1), …, $1. After that it prints a newline character.
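The loop above leaves a trailing space after the last field printed. A possible variant that avoids it is sketched below; reverse_fields is a made-up name for this example.

```shell
# reverse_fields is a hypothetical helper name.
reverse_fields() {
    # Print fields NF..2 each followed by a space, then $1 with the newline.
    awk '{ for (i = NF; i > 1; i--) printf "%s ", $i; print $1 }'
}

printf 'one two three\n' | reverse_fields    # prints: three two one
```

On a blank line NF is 0, the loop does not run, and an empty line is printed.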

Print the first two fields in reverse order on each line.

awk '{ print $2, $1 }' filename

This one-liner is obvious. It prints the fields $1 and $2 in reverse order.

Join a line ending with a backslash with the next line.

awk '/\\$/ { sub(/\\$/,""); getline t; print $0 t; next }; 1' filename

Swap first field with second on every line.

awk '{ temp = $1; $1 = $2; $2 = temp; print }' filename

This one-liner uses a temporary variable called "temp". It assigns the first field $1 to "temp", then it assigns the second field to the first field, and finally it assigns "temp" to $2. This procedure swaps the first two fields on every line.
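A side effect worth knowing: assigning to a field makes awk rebuild the line with single spaces (the default output separator) between fields. The sketch below shows both the swap and that side effect; swap_first_two is a hypothetical name.

```shell
# swap_first_two is a hypothetical helper name.
swap_first_two() {
    awk '{ temp = $1; $1 = $2; $2 = temp; print }'
}

printf 'alpha beta gamma\n' | swap_first_two    # prints: beta alpha gamma

# Tabs and runs of spaces in the input come out as single spaces.
printf 'a\tb   c\n' | swap_first_two            # prints: b a c
```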

Delete the second field on each line.

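awk '{ $2 = ""; print }' filename

This one-liner just assigns an empty string to the second field. The field's contents are gone, but the separators on either side remain, so the output has two spaces where the field used to be.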
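If the leftover separator matters, one option is to rebuild the line without the second field. The sketch below compares the two approaches; both function names are invented for this example.

```shell
# Both helper names are hypothetical.

# Empties field 2 but keeps its surrounding separators.
blank_second_field() {
    awk '{ $2 = ""; print }'
}

# Rebuilds the line from field 1 and fields 3..NF, joined by OFS.
drop_second_field() {
    awk '{ out = $1; for (i = 3; i <= NF; i++) out = out OFS $i; print out }'
}

printf 'a b c d\n' | blank_second_field    # prints "a  c d" (two spaces)
printf 'a b c d\n' | drop_second_field     # prints "a c d"
```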

Print the first 10 lines of a file (emulates “head -10”).

awk 'NR < 11' filename

OR

awk '1; NR == 10 { exit }' filename

Print the last 2 lines of a file (emulates “tail -2”).

awk '{ y=x "\n" $0; x=$0 }; END { print y }' filename

Print the last line of a file (emulates “tail -1”).

awk '{ rec=$0 } END{ print rec }' filename

Print only the lines that match a regular expression “/regex/” (emulates “grep”).

awk '/regex/' filename
 

Print only the lines that do not match a regular expression “/regex/” (emulates “grep -v”).

awk '!/regex/' filename

Print the line immediately before a line that matches “/regex/” (but not the line that matches itself).

awk '/regex/ { print x }; { x=$0 }' filename
 

This one-liner always saves the current line in the variable “x”. When it reads in the next line, the previous line is still available in “x”. If the new line matches “/regex/”, it prints x, and as a result, the previous line gets printed. If the very first line matches, x is still empty and a blank line is printed.
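A sketch of the same idea with a guard for the first line is shown below. The function name line_before and the use of -v to pass the pattern as a string are choices made for this example, not part of the original one-liner.

```shell
# line_before is a hypothetical helper; $1 is the regular expression.
line_before() {
    # NR > 1 skips the case where the very first line matches,
    # since there is no previous line to print.
    awk -v re="$1" 'NR > 1 && $0 ~ re { print x } { x = $0 }'
}

printf 'alpha\nbeta\ngamma\n' | line_before 'gam'    # prints: beta
```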

Print the line immediately after a line that matches “/regex/” (but not the line that matches itself).

awk '/regex/ { getline; print }' filename

Print lines that match any of “AAA” or “BBB”, or “CCC”.

awk '/AAA|BBB|CCC/' filename

Print lines that contain “AAA” and “BBB”, and “CCC” in this order.

awk '/AAA.*BBB.*CCC/' filename

Print a section of file from regular expression to end of file.

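awk '/regex/,0' filename

This uses a range pattern of the form "start,end". The range starts at the first line that matches “/regex/” anywhere in the line, and because the end condition 0 is always false, it prints from that line until EOF.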
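A small sketch of the range pattern in action follows. The helper name from_match and the sample input are made up; the pattern is passed in with -v so the function can be reused.

```shell
# from_match is a hypothetical helper; $1 is the regular expression.
from_match() {
    # Range pattern: starts at the first matching line, and the end
    # condition 0 is never true, so the range runs to end of input.
    awk -v re="$1" '$0 ~ re, 0'
}

printf 'intro\nxx START xx\nbody\nend\n' | from_match 'START'
```

This prints the last three lines. Note that "START" sits in the middle of its line; the match does not have to be at the beginning.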

Print lines 8 to 12 (inclusive).

awk 'NR==8,NR==12' filename

Print line number 52.

awk 'NR==52' filename

Quit after line 52:

awk 'NR==52 { print; exit }' filename

Print section of a file between two regular expressions (inclusive).

awk '/Iowa/,/Montana/' filename

Substitute (find and replace) “black” with “red” on each line.

awk '{ sub(/black/,"red"); print }' filename

It uses the sub() function to replace “black” with “red”. Note that it replaces just the first match on each line.

OR

To replace all “black”s with “red”s, use the gsub() function:

awk '{ gsub(/black/,"red"); print }' filename
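The difference between sub() and gsub() shows up only when a line contains more than one match. A minimal sketch, with a sample line invented for this example:

```shell
# Sample input (made up): one line with two occurrences of "black".
sample_line() {
    printf 'black cat, black hat\n'
}

# sub() replaces only the first match on the line.
sample_line | awk '{ sub(/black/, "red"); print }'     # prints: red cat, black hat

# gsub() replaces every match on the line.
sample_line | awk '{ gsub(/black/, "red"); print }'    # prints: red cat, red hat
```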

What is Perl ?

January 24th, 2012

Perl:

* Perl is a stable, cross-platform programming language.
* Perl is often expanded as Practical Extraction and Report Language, although the name is not officially an acronym.
* It is used for mission-critical projects in the public and private sectors.
* Perl is Open Source software, licensed under its Artistic License or the GNU General Public License (GPL).
* Perl was created by Larry Wall.
* Perl 1.0 was released to Usenet's alt.comp.sources in 1987.
* PC Magazine named Perl a finalist for its 1998 Technical Excellence Award in the Development Tool category.
* Perl is listed in the Oxford English Dictionary.

Supported Operating Systems:

* Unix systems
* Macintosh – (OS 7-9 and X) see The MacPerl Pages.
* Windows – see ActiveState Tools Corp.
* VMS
* And many more…

Best Features Of Perl :

* Perl takes the best features from other languages, such as C, awk, sed, sh, and BASIC, among others.
* Perl's database integration interface (DBI) supports third-party databases including Oracle, Sybase, Postgres, MySQL, and others.
* Perl works with HTML, XML, and other mark-up languages.
* Perl supports Unicode.
* Perl is Y2K compliant.
* Perl supports both procedural and object-oriented programming.
* Perl interfaces with external C/C++ libraries through XS or SWIG.
* Perl is extensible. There are over 500 third-party modules available from the Comprehensive Perl Archive Network (CPAN).
* The Perl interpreter can be embedded into other systems.

PERL and the Web

* Perl has long been one of the most popular web programming languages, thanks to its text manipulation capabilities and rapid development cycle.
* Perl is widely known as "the duct tape of the Internet".
* Perl's CGI.pm module, part of Perl's standard distribution, makes handling HTML forms simple.
* Perl can handle encrypted Web data, including e-commerce transactions.
* Perl can be embedded into web servers to speed up processing by as much as 2000%.
* mod_perl allows the Apache web server to embed a Perl interpreter.
* Perl's DBI package makes web-database integration easy.

Python PIP package

November 14th, 2011

sudo apt-get install python-pip

pip is a tool for installing and managing Python packages.

Example:

sudo pip install "numpy>1.4"

Bioinformatics Definition / Bioinformatics Definitions / What is Bioinformatics ?

November 20th, 2008


Bioinformatics is a tool for solving biological problems using existing data.

Bioinformatics is a method for predicting biological outcomes from existing experimental results.

Bioinformatics = Biology + Informatics + Statistics + (Biochemistry + Biophysics).

Bioinformatics gives biologists a way to store all their data.

Bioinformatics makes some lab experiments easier by predicting their outcome.

Sometimes bioinformatics shows how to start a lab experiment from existing results.

Bioinformatics helps researchers get an idea about a lab experiment before they start it.