Tuesday, April 29, 2014

Quickly reverse a matrix in R

If you have a matrix in R, and want to reverse it do the following brilliant step:
matrix <- matrix[nrow(matrix):1, ]
This will list rows from the matrix in reverse order from nrow(matrix) to 1


   Number Year per100k
1     419 2013  37.394
2     342 2012  32.038
3     241 2011  23.931
4     155 2010  16.618
5      76 2009   8.736
6      34 2008   4.096
7       5 2007   0.640
8       0 2006   0.000
9       0 2005   0.000
10      0 2004   0.000
11      0 2003   0.000
12      0 2002   0.000
13      0 2001   0.000
14      0 2000   0.000
15      0 1999   0.000

Will become

   Number Year per100k
15      0 1999   0.000
14      0 2000   0.000
13      0 2001   0.000
12      0 2002   0.000
11      0 2003   0.000
10      0 2004   0.000
9       0 2005   0.000
8       0 2006   0.000
7       5 2007   0.640
6      34 2008   4.096
5      76 2009   8.736
4     155 2010  16.618
3     241 2011  23.931
2     342 2012  32.038
1     419 2013  37.394

Monday, April 28, 2014

Problems connecting to SSH server via Filezilla

If you try to connect to the SSH server using by entering host username/password and then pressing quickconnect, you may not be able to connect to the server:
Connection attempt failed with "ECONNREFUSED - Connection refused by server".
Error:    Could not connect to server

This is because you didn't specify this is a SSH connection.
Open File/Site Manager
New site
Enter name

Change the protocol to SFTP.
Enter Host and username/password.

You should be able to log in.

Friday, April 25, 2014

My blog stats

If I look at my blog stats I see this monthly profile:
We can see peoples work habits in science from this graph, i.e. when people will search for solutions to their work-related problems.
So it appears that weekdays are preferred days of people visiting my blog, and among them Tuesdays and Wednesdays are the most visited days. I would summarize it like this: Mondays are always painful, Tuesday  and Wednesday work work, Thursday work less weekend is close, Friday who cares can't wait to go home.
Sutraday and Sunday relax.

Thursday, April 24, 2014

FastQC to check fastq files from terminal via SSH

If you want to check quality of fastq files and get a full report use FastQC.
Download FastQC http://www.bioinformatics.babraham.ac.uk/projects/fastqc/
Unpack unzip fastqc_v0.10.1.zip
Check if you have java runtime environment JRE, above 1.6 or 1.7: java -version
Change permissions to fastqc: chmod 755 fastqc
Run FastQC: ./fastqc
This will run fastqc as a graphical interaction interface.

If you run it on a server, log in with: ssh -X

Choose file you want to analyze and run the analysis.

If you want to run it in a non interactive mode to analyze multiple files at once or to pipe it in a script use: ./fastqc filename1.fastq filename2.fastq filename3.fastq
Each file will be analyzed and an html report will be created and no interactive mode will be launched when you put the filename after ./fastqc

If you copy the executable file fastqc to a folder where you have your fastq files and try to run it with ./fastqc you will get this message:
Error: Could not find or load main class uk.ac.babraham.FastQC.FastQCApplication

as java cannot find the initial class file FastQCApplication.class that is located in the FastQC folder.

You may want instead to create a link in that folder to the fastqc file or place a link in usr/local/bin/:
sudo ln -s /path/to/FastQC/fastqc /usr/local/bin/fastqc

Thursday, April 17, 2014

How to see a line in a file

To see line 50201 in a file combine head and tail commands

head -50201 filename | tail -1

head -50201 filename
will list you all lines in a file starting from the beggining to the line 50201
But when combined with
tail -1 
it will show only the last line of this head selection and that is line 50201

How to change spaces to tabs in a text file

I tried to use bedtools closest to get the nearest genes to the peaks from one chipseq experiment. Bedtools required tab-delimited bed file.

So, how to change spaces from a text file into tabs.
Use tr, type in terminal

tr ' ' '\t' <inputfile >outputfile

How to swap two or more columns in a text file in UNIX using awk

If you need to swap two columns in a text file use awk.
To swap colums 1 and 2, in Unix terminal type:
awk '{ t=$1 ; $1=$2; $2=t; print }' inputfile > outputfile

To swap columns 1, 2, 3 to the order 3, 1, 2, type:
awk '{ t=$1 ; $1=$3; $3=t; print }' inputfile > temp
awk '{ t=$2 ; $2=$3; $3=t; print }' temp > outputfile

Monday, April 14, 2014

Find command in terminal history

If you had typed a command and want to repeat it in UNIX terminal, you can either press up arrow till you find your command,

Or if this is to long, press ctrl R, type the beginning of your command and the first command in history that starts like that will appear.

Keep pressing ctrl R till you find that one command in history that you need.

Thursday, April 10, 2014

Merge multiple BAM files

If you have multiple lanes in the sequencing experiment after mapping to the genome using bowtie you will end up with multiple SAM files, as bowtie will allow only one fastq file as input. To continue with the pipeline you will need to merge SAM files with samtools. First convert them to BAM:

samtools view -Sb file.sam > file.bam

samtools merge output.bam file1.bam file2.bam file3.bam

Convert SAM to BAM

Use samtools to convert SAM to BAM format

samtools view -Sb file.sam > file.bam

-S Input is in SAM
-b Output in the BAM format

Wednesday, April 9, 2014

Running multiple shells using screen on a remote UNIX server

If you want to run one script overnight on a UNIX server while logging off and going home for a cosy sleep, see the previous post.

However if you want to run two or more scripts overnight on a same server you have to create two or more shells after running screen and creating a session.

To create new shells within a screen session use ctrl+a c
Create as many shells as you have scripts.

Switch between shells using command ctrl+a n. It will move to the next screen.
Go to the previous screen, ctrl+a p

Run scripts.

Detach from session ctrl+a d
Go home

Using command screen to run scripts on a remote UNIX server

Screen is a useful tool if you want to log on to the server and run a script, e.g overnight, log off, and take your laptop home. Without screen after losing connection with the server the script would simply stop. However running screen first on the server and then running your script allows you to detach from screen, then log off from the server, however your commands will still run on the server within the screen session you created. When home (or tomorrow at work) log in to the server, run screen and reattach session. You will have your script (possibly) finished.

> screen -S name
> run your script
ctrl-a d

Log out.

Next day:
> screen -r name

and the last screen comes back to your laptop!

Friday, April 4, 2014

Create barplot in R

If you have a simple table e.g. expression values of two genes in different tissues and want to plot it as a barplot in R, do the following:

WhiteBloodCells 36 4
LymphNode 37 11
Brain 23 1
Heart 46 9
SkeletalMuscle 45 2
Colon 50 29
Adipocyte 36 7
Kidney 31 30
Liver 19 6
Lung 42 70
Thyroid 37 4
AdrenalGland 42 5
Breast 31 3
Ovary 49 40
Prostate 49 24
Testis 35 22

Save table as a text file e.g. gene1_gene2_expressiondata
In R use read.table

gene1_gene2 <- read.table("gene1_gene2_expressiondata", header=T, sep=" ")

Make a rotate function to inverse the table:

rotate <- function(x) t(apply(x, 2, rev))

Rotate the table:
gene1_gene2_rotate <- rotate(gene1_gene2)

Start a graphics device driver

Plot the graph. Use cex to regulate size of main, lab and axis, font=2 for bold font, col=rainbow(2) to pick two colors from the rainbow color palette.

barplot(as.matrix(gene1_gene2_rotate), beside=T, main="RNA-Seq data from human tissues - Illumina Human BodyMap 2.0", ylab="FPKMx100", xlab="Tissue", col=rainbow(2), cex.main=2, font.main=2, cex.lab=1.2, font.lab=2, cex.axis=1.2)

Use res, width and height to adjust the size of the graph and resolution you want.
dev.copy(jpeg,filename="gene1_gene2_plot.jpg", res=120, units="in", width=22, height=8)

Make a legend with rownames of gene1_gene2_rotate matrix, bty="o" to draw box or bty="n" without box

legend("topleft", rownames(gene1_gene2_rotate), cex=1.5, bty="n", fill=rainbow(2), text.font=2);

The output look like this: