Monday, December 10, 2012

Comparing two files using grep/awk in Unix

If you have file1:

1
3
5
6
8

and file2:
1       a
2       b
3       c
4       d
5       e
6       f
7       g
8       h

And you want to get all the rows from file2 whose column 1 value also appears in file1:
grep "$(cat file1)" file2 | awk '{print $0}' > file3
nano file3
1       a
3       c
5       e
6       f
8       h


In case you want the opposite, i.e. all the rows from file2 whose column 1 value does not appear in file1, add -v:

grep -v "$(cat file1)" file2 | awk '{print $0}' > file3
nano file3
2       b
4       d
7       g

In case you want to output only column 1 of those rows from file2:
grep -v "$(cat file1)" file2 | awk '{print $1}' > file3
nano file3

2
4
7

Or to output just column 2:
grep -v "$(cat file1)" file2 | awk '{print $2}' > file3
nano file3

b
d
g

____________


Now in case of file1 with more than one column:
1       y
3       y
5       y
6       y
8       y

and file2:
1       a
2       b
3       c
4       d
5       e
6       f
7       g
8       h

You have to compare only column 1 of file1 with file2, so extract it with cut (which splits on tabs by default) and type:
grep "$(cut -f1 file1)" file2 | awk '{print $0}' > file3
nano file3
1       a
3       c
5       e
6       f
8       h

___________

The problem with this code is that grep matches the pattern anywhere on the line, not only in column 1. If file2 contains a line with e.g. the number 88, it will be picked up because it contains the character 8, which is present in file1:
1       a
2       b
3       c
4       d
5       e
6       f
7       g
8       h
88       h

grep "$(cut -f1 file1)" file2 | awk '{print $0}' > file3
nano file3
1       a
3       c
5       e
6       f
8       h 
88       h

So this code is only reliable for simple files in which no value from file1 can occur as part of another value in file2.
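
If you prefer to stay with grep, one possible workaround is to anchor every pattern to the start of the line and require whitespace right after it. The following is only a sketch: it assumes tab-separated files, keys without regex special characters, and uses patterns.txt as a made-up scratch file name.

```shell
# Recreate the sample files from the text (assumed to be tab-separated).
printf '1\ty\n3\ty\n5\ty\n6\ty\n8\ty\n' > file1
printf '1\ta\n2\tb\n3\tc\n4\td\n5\te\n6\tf\n7\tg\n8\th\n88\th\n' > file2

# Turn each key K from column 1 of file1 into the anchored pattern
# "^K[[:space:]]", which can only match a complete first field.
# patterns.txt is a hypothetical scratch file.
cut -f1 file1 | sed 's/.*/^&[[:space:]]/' > patterns.txt

# -f reads one pattern per line from the given file.
grep -f patterns.txt file2 > file3
cat file3
```

With these sample files the line starting with 88 is no longer selected, because the pattern for 8 requires whitespace directly after the 8.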
___________

SOLUTION

One solution is to use awk: first load column 1 of file1 into an array, then print only the lines of file2 whose column 1 is in that array:

awk -F " " 'BEGIN { while ((getline < "file1") > 0) a[$1] = 1 } a[$1] == 1 { print $0 }' file2 > file3
nano file3
1    a
3    c
5    e
6    f
8    h


To output lines of file2 that do not contain the same field in column1 as file1:

awk -F " " 'BEGIN { while ((getline < "file1") > 0) a[$1] = 1 } a[$1] != 1 { print $0 }' file2 > file3
nano file3
2       b
4       d
7       g
88      h
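
The same comparison is often written without getline, by passing both files to awk and using NR == FNR to recognise the first one. This is a sketch of that common idiom, not a replacement for the commands above. It assumes file1 is not empty, and file4 is a made-up name for the second output.

```shell
# Recreate the sample files from the text (assumed to be tab-separated).
printf '1\ty\n3\ty\n5\ty\n6\ty\n8\ty\n' > file1
printf '1\ta\n2\tb\n3\tc\n4\td\n5\te\n6\tf\n7\tg\n8\th\n88\th\n' > file2

# NR == FNR is true only while awk reads the first file: remember each
# column 1 value there, then skip to the next line. For the second file,
# print the lines whose column 1 was remembered.
awk 'NR == FNR { seen[$1]; next } $1 in seen' file1 file2 > file3

# Negated test: lines of file2 whose column 1 is absent from file1.
awk 'NR == FNR { seen[$1]; next } !($1 in seen)' file1 file2 > file4

cat file3
echo "--"
cat file4
```

Because the whole first field is used as an array key, 88 is treated as different from 8 and ends up in file4.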

Graph subtitle in R

To add a subtitle to your graph in R, type this after you have plotted the graph:
mtext("sub title")

Friday, December 7, 2012

Keep unique lines in file using only first field of the line

To keep only unique lines in a file, judging uniqueness by the first field of the line alone, use the command below. Note that for each value of the first field, the first line in the file will be kept and the rest will be discarded.
 
awk '!x[$1]++' filename > filename2

e.g.

Myh7    -3.41856493500000000000 0.00000000005092670000
Mest    -2.74194127200000000000 0.00003431680000000000
Mest    -2.67886271200000000000 0.00005941270000000000
Mest    -2.45088556100000000000 0.00000001772790000000
Mest    -2.43988796300000000000 0.00000001257030000000
Mest    -2.41913470400000000000 0.00000000293480000000
Mest    -2.41640532100000000000 0.00000001299950000000
Mest    -2.40872435300000000000 0.00000001836740000000
Mest    -2.37917197000000000000 0.00000002830130000000
Mest    -2.37905761900000000000 0.00000002079510000000
Mest    -2.74194127200000000000 0.00003431680000000000
Pde4dip -2.70488951700000000000 0.00000185955000000000
...

will give:

Myh7    -3.41856493500000000000 0.00000000005092670000
Mest    -2.74194127200000000000 0.00003431680000000000
Pde4dip -2.70488951700000000000 0.00000185955000000000
...
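
To see why the one-liner works, it may help to compare it with a spelled-out version. The sketch below uses a small made-up sample (genes.txt, short.txt and long.txt are hypothetical file names, and the values are shortened for readability).

```shell
# Small made-up sample; genes.txt is a hypothetical file name.
printf 'Myh7\t-3.4\nMest\t-2.7\nMest\t-2.6\nPde4dip\t-2.7\nMest\t-2.4\n' > genes.txt

# Short form from the text: x[$1]++ returns the old count, which is 0
# (false) the first time a key is seen, so !x[$1]++ is true only then.
awk '!x[$1]++' genes.txt > short.txt

# The same logic written out step by step.
awk '{ if (count[$1] == 0) print $0; count[$1]++ }' genes.txt > long.txt

cat short.txt
```

Both commands should keep the first Myh7, Mest and Pde4dip lines and drop the later Mest lines.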

Wednesday, December 5, 2012

How to select lines from a file with two or more numeric criteria in Unix using awk

To select the lines of a file that meet numeric criteria in two or more columns, use awk:

awk -F "|" '{ if ( ($2 > 10 || $3 > 10 || $4 > 10) && $5 < 10 ) print $0 }' filename

-F "|" defines the delimiter. For a tab-delimited file it is not necessary.

$2 is the second column, etc.

This command selects all lines that have a number greater than 10 in the 2nd, 3rd, or 4th column and a number less than 10 in the 5th column. The parentheses around the || tests are needed because && binds more tightly than || in awk.
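
Because && binds more tightly than || in awk, where the parentheses go changes which lines are selected. A minimal sketch comparing the two groupings on a made-up pipe-delimited sample (data.txt, loose.txt and strict.txt are hypothetical file names):

```shell
# Made-up pipe-delimited sample; data.txt is a hypothetical file name.
printf 'a|11|0|0|5\nb|11|0|0|20\nc|0|0|11|5\nd|0|0|0|5\n' > data.txt

# Without parentheses, awk reads the test as
#   $2 > 10 || $3 > 10 || ($4 > 10 && $5 < 10)
# so line b passes on column 2 alone, whatever column 5 holds.
awk -F "|" '$2 > 10 || $3 > 10 || $4 > 10 && $5 < 10' data.txt > loose.txt

# With parentheses, column 5 must be below 10 for every selected line.
awk -F "|" '($2 > 10 || $3 > 10 || $4 > 10) && $5 < 10' data.txt > strict.txt

cat loose.txt
echo "--"
cat strict.txt
```

On this sample the ungrouped test selects lines a, b and c, while the grouped test selects only a and c.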

How to index BAM files for viewing in IGV browser

If you try to load BAM files into IGV, you may get an error stating that the files are not indexed.
Install samtools.
http://samtools.sourceforge.net/

Go to the folder where your BAM file is and run:
samtools index accepted_hits.bam

After this, the index file accepted_hits.bam.bai will be created and the original BAM file can be loaded into IGV.

Monday, December 3, 2012

How to create a matrix starting from variables in R


> c1 <- c(2.2, 6.123, 4.333, 5.34567)
> c2 <- c(5, 1.123456, 12.1123, 0.322)
> c3 <- c1/c2
> c4 <- c1*c2
> 
> x <- data.frame(c1, c2, c3, c4)
> row.names(x) <- c("r1", "r2", "r3", "r4")
> x
        c1        c2         c3        c4
r1 2.20000  5.000000  0.4400000 11.000000
r2 6.12300  1.123456  5.4501467  6.878921
r3 4.33300 12.112300  0.3577355 52.482596
r4 5.34567  0.322000 16.6014596  1.721306
> 
> fx <- format(x, digits=3)
> fx
     c1     c2     c3    c4
r1 2.20  5.000  0.440 11.00
r2 6.12  1.123  5.450  6.88
r3 4.33 12.112  0.358 52.48 
r4 5.35  0.322 16.601  1.72

To print the first column:

> fx[,1]
[1] "2.20" "6.12" "4.33" "5.35"

To print the second row:

> fx[2,]
     c1     c2     c3    c4
r2 6.12  1.123  5.450  6.88

Web browsers as file managers

To view files from your computer in a web browser, type in the address bar:
file://
followed by the folder path.

E.g. in Mac OS X, to view the files in my Downloads folder I type:
file:///Users/mpjanic/Downloads/

To view text files in the browser, just click on them.

How to prevent accidental changing of the brightness (F1, F2 keys) on Mac keyboard

If you have a Mac, you may accidentally press the brightness keys (F1, F2) and lower the brightness of the screen. You may realize this only at the end of the day, after you have spent the whole day deteriorating your vision.

Do the following:

Go to System Preferences,
then Keyboard,
and select "Use all F1, F2, etc. keys as standard function keys". (When this option is selected, press the Fn key to use the special features printed on each key.)

The Fn key on the Mac keyboard is located below the F13 key!