Monday, April 20, 2015

Adding a folder to PATH in Ubuntu Linux

To add a folder to PATH for the current terminal session, write:
export PATH=$PATH:/path/to/folder
However, export is only needed if PATH is not already in the environment, and it usually is. So you can write just:
PATH=$PATH:/path/to/folder
or
PATH=/path/to/folder:$PATH

The first form puts the folder at the end of PATH, so it is searched after all other folders; the second form puts it at the beginning, so it is searched first. The order matters: if an older version of a binary sits in a folder that is already in PATH and you append the folder with the new version, the older version is found first and you won't be using the new one. In that case put the folder first (the second form).
To make the change permanent, edit your ~/.bashrc file and add the line above to it.
~/.bashrc gets sourced every time you open a terminal.
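
The effect of appending versus prepending can be checked with a small throwaway experiment. This is only a sketch: the tool name mytool and the temporary "old" and "new" folders are made up for illustration and stand in for two versions of the same binary.

```shell
#!/bin/sh
# Two temporary folders, each holding a different "version" of a hypothetical tool.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/old" "$tmpdir/new"
printf '#!/bin/sh\necho old\n' > "$tmpdir/old/mytool"
printf '#!/bin/sh\necho new\n' > "$tmpdir/new/mytool"
chmod +x "$tmpdir/old/mytool" "$tmpdir/new/mytool"

saved_path=$PATH

# The old version is already in PATH; the new folder is appended (searched last).
PATH="$tmpdir/old:$saved_path:$tmpdir/new"
appended=$(mytool)

# The new folder is prepended (searched first).
PATH="$tmpdir/new:$tmpdir/old:$saved_path"
prepended=$(mytool)

echo "appended: $appended, prepended: $prepended"

# Restore PATH and clean up.
PATH=$saved_path
rm -rf "$tmpdir"
```

With the folder appended the old version still wins; with the folder prepended the new version is picked up.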

Some examples of directlabels with ggplot2

library(ggplot2)
library(directlabels)
# Label placement method: "smart.grid"
# Each + must end a line; a line that starts with + is parsed as a separate statement and fails.
p <- ggplot(test4, aes(x=FC, y=PVALUE)) +
  geom_point(shape=19, alpha=1/8, color="red", size=8) +
  geom_dl(aes(label=test3, colour=CATEGORY), method=list("smart.grid", cex=1.5, hjust=1)) +
  scale_colour_discrete(name="CATEGORIES") +
  scale_x_continuous(limits=c(-5, 25)) +
  xlab("Fold change") +
  ylab("-log P-value") +
  ggtitle("Title of the graph")



# Label placement method: "first.bumpup"
p <- ggplot(test4, aes(x=FC, y=PVALUE)) +
  geom_point(shape=19, alpha=1/8, color="red", size=8) +
  geom_dl(aes(label=test3, colour=CATEGORY), method=list("first.bumpup", cex=1.5, hjust=1)) +
  scale_colour_discrete(name="CATEGORIES") +
  scale_x_continuous(limits=c(-5, 25)) +
  xlab("Fold change") +
  ylab("-log P-value") +
  ggtitle("Title of the graph")






# Label placement method: "last.bumpup"
p <- ggplot(test4, aes(x=FC, y=PVALUE)) +
  geom_point(shape=19, alpha=1/8, color="red", size=8) +
  geom_dl(aes(label=test3, colour=CATEGORY), method=list("last.bumpup", cex=1.5, hjust=1)) +
  scale_colour_discrete(name="CATEGORIES") +
  scale_x_continuous(limits=c(-5, 25)) +
  xlab("Fold change") +
  ylab("-log P-value") +
  ggtitle("Title of the graph")






# Label placement method: "top.bumptwice"
p <- ggplot(test4, aes(x=FC, y=PVALUE)) +
  geom_point(shape=19, alpha=1/8, color="red", size=8) +
  geom_dl(aes(label=test3, colour=CATEGORY), method=list("top.bumptwice", cex=1.5, hjust=1)) +
  scale_colour_discrete(name="CATEGORIES") +
  scale_x_continuous(limits=c(-5, 25)) +
  xlab("Fold change") +
  ylab("-log P-value") +
  ggtitle("Title of the graph")






Create matrix with repeating element in R

gwas<-matrix(rep("GWAS phenotype", 20),nrow=20,ncol=1)

> gwas
      [,1]
 [1,] "GWAS phenotype"
 [2,] "GWAS phenotype"
 [3,] "GWAS phenotype"
 [4,] "GWAS phenotype"
 [5,] "GWAS phenotype"
 [6,] "GWAS phenotype"
 [7,] "GWAS phenotype"
 [8,] "GWAS phenotype"
 [9,] "GWAS phenotype"
[10,] "GWAS phenotype"
[11,] "GWAS phenotype"
[12,] "GWAS phenotype"
[13,] "GWAS phenotype"
[14,] "GWAS phenotype"
[15,] "GWAS phenotype"
[16,] "GWAS phenotype"
[17,] "GWAS phenotype"
[18,] "GWAS phenotype"
[19,] "GWAS phenotype"
[20,] "GWAS phenotype"

Tuesday, April 14, 2015

Thursday, April 9, 2015

List the file backwards

In Unix, the cat command prints your file from the first line to the last, and the tac command (cat spelled backwards) prints it in reverse, last line first.
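
A quick way to see the difference, as a minimal sketch on a throwaway three-line file. It assumes tac is installed, which is the case on Ubuntu since it ships with GNU coreutils.

```shell
#!/bin/sh
# Build a small temporary file with three lines.
tmpfile=$(mktemp)
printf 'first\nsecond\nthird\n' > "$tmpfile"

forward=$(cat "$tmpfile")    # lines in their original order
reversed=$(tac "$tmpfile")   # same lines, last one first

echo "$reversed"

rm -f "$tmpfile"
```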

How to limit search to subfolders in shell

If you have a script that searches for a specific group of files, it may also pick up files in subfolders that you did not intend to modify. To prevent this, use the -maxdepth 1 option of the find command.

E.g. this script finds all files with gwasbed in their name, cuts the first 3 columns, sorts and deduplicates them, and saves the result with a _uniq suffix. It then intersects the _uniq files with a peaks file using bedtools, without picking up files from subfolders.

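# Note: looping over $(find ...) assumes file names without spaces.
# Skip existing _uniq files so that re-running the script does not process its own output.
for f in $(find . -maxdepth 1 -type f -name '*gwasbed*' ! -name '*_uniq')
do
echo "$f"
cut -f1,2,3 "$f" | sort -k1,2 | uniq > "$f"_uniq
done

# The output folder must exist before redirecting into it.
mkdir -p ./INTERSECT_output

for f in $(find . -maxdepth 1 -type f -name '*_uniq*')
do
bedtools intersect -a "$f" -b L2_TCCTGAGC_L002_peaks.bed > ./INTERSECT_output/"$f"_ATAC
done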
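
To see what -maxdepth 1 changes, here is a small sketch on a temporary folder. The file names a_gwasbed.txt and b_gwasbed.txt and the subfolder sub are made up for the demonstration.

```shell
#!/bin/sh
# One matching file at the top level and one inside a subfolder.
tmpdir=$(mktemp -d)
mkdir "$tmpdir/sub"
touch "$tmpdir/a_gwasbed.txt" "$tmpdir/sub/b_gwasbed.txt"

# Without -maxdepth, find descends into subfolders.
all=$(find "$tmpdir" -name '*gwasbed*' | wc -l | tr -d ' ')

# With -maxdepth 1, only the starting folder itself is searched.
top=$(find "$tmpdir" -maxdepth 1 -name '*gwasbed*' | wc -l | tr -d ' ')

echo "all levels: $all, top level only: $top"

rm -rf "$tmpdir"
```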

Monday, April 6, 2015

How to get number of reads from fastq.gz

If you have a gzipped fastq file and want to get the number of reads without unpacking it, use zcat to read the file, pipe it to wc -l to get the number of lines, then divide by 4 (since each read occupies 4 lines in a standard fastq file):

echo $(( $(zcat file.fastq.gz | wc -l) / 4 ))
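
The same count can be tried on a tiny made-up file. This sketch writes two invented reads to a temporary fastq.gz and uses gzip -dc, which does the same job as zcat on Linux.

```shell
#!/bin/sh
# Create a gzipped fastq with two made-up reads (4 lines each).
tmpdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n@read2\nTTGA\n+\nIIII\n' | gzip -c > "$tmpdir/file.fastq.gz"

# Count lines in the decompressed stream, then divide by 4 lines per read.
lines=$(gzip -dc "$tmpdir/file.fastq.gz" | wc -l | tr -d ' ')
reads=$((lines / 4))

echo "$reads"

rm -rf "$tmpdir"
```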

Converting ^M linebreak with sed

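To convert the ^M line breaks that show up when you view a file with vi or less into Unix line breaks, use the sed command:

sed -e 's/\r/\n/g' file.txt > file.txt_sed

which globally substitutes ^M (carriage return, \r) with \n.

(This works for a file transferred from classic Mac OS to Linux, since \r alone is the old Mac line break. For DOS-type files use:

sed -e 's/\r$//' file.txt > file.txt_sed

since \r\n is the line break in DOS files. sed processes one line at a time with the trailing \n already stripped, so a pattern containing \r\n never matches; deleting the \r at the end of each line is enough. Both commands rely on GNU sed understanding \r and \n.)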
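
The DOS case can be checked on a small made-up file. In this sketch the carriage return is passed to sed through a shell variable instead of the \r escape, so it does not depend on GNU sed; the file names dos.txt and unix.txt are placeholders.

```shell
#!/bin/sh
# Write a two-line file with DOS (\r\n) line breaks.
tmpdir=$(mktemp -d)
printf 'chr1\t100\r\nchr2\t200\r\n' > "$tmpdir/dos.txt"

# Hold a literal carriage return in a variable.
cr=$(printf '\r')

# Delete the carriage return at the end of every line.
sed "s/${cr}\$//" "$tmpdir/dos.txt" > "$tmpdir/unix.txt"

# Count the lines that still contain a carriage return (grep -c exits 1 when the count is 0).
before=$(grep -c "$cr" "$tmpdir/dos.txt")
after=$(grep -c "$cr" "$tmpdir/unix.txt" || true)

echo "lines with CR before: $before, after: $after"

rm -rf "$tmpdir"
```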

Converting sam to bam

If your system's getopt() does not allow options and file names to be intermingled (the GNU version does), all options must come before the file names. On such a system the command:

samtools view -bS file.sam -o file.bam

would print some binary output followed by the error "random alignment retrieval only works for indexed BAM files".

Instead, order the arguments so that all options come before the file names:

samtools view -bS -o Aligned.out.bam Aligned.out.sam

Or use > to redirect the output to a bam file:

samtools view -bS Aligned.out.sam > Aligned.out.bam