Wednesday, November 28, 2012

How to modify properties of wig custom tracks in UCSC Genome Browser

If you have uploaded a wig file as a custom track in UCSC and you want to modify its properties, do the following.
In the Genome Browser, click Manage custom tracks below, then click the wig track you want to modify.
Then enter, for example:

track type=wiggle_0 name="NSC_H2AU_vs_IgG_control_all" smoothingWindow=4 description="Extended tag pileup from MACS version 1.4.2 20120305 for every 10 bp" color=123,100,50 autoScale=off viewLimits=1:20 visibility=full windowingFunction=maximum

name="", name of the track displayed on the side of the Genome Browser
description="", description of the track displayed above the track
color=x,y,z, color of the track in RGB decimal values

See this link for the examples of colors and their respective RGB decimal values:

autoScale=off, turn autoscaling off if you want to compare different tracks; otherwise each track is drawn on its own scale
viewLimits=1:x, lower and upper limits of the displayed range; keep these the same across the wig files that you want to compare
visibility=full, set to full to expand the track, or to dense to collapse it
windowingFunction=maximum, when the individual data points of the wig file cannot each be drawn in the plot, UCSC combines nearby values into a single plot point. Set this to maximum, mean, minimum, or mean+whiskers.
smoothingWindow=4, width in pixels of the smoothing window that passes over the plot to smooth the edges of the peaks
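
The track line can also be written as the first line of the wig file before uploading, so the settings travel with the file. Below is a minimal sketch of that idea; the file name example.wig, its data values and the track name are made up for illustration, and the settings should be adapted to your own track.

```shell
#!/bin/sh
# Sketch: prepend a track line to a wig file before uploading it to UCSC.
# example.wig, its data values and the track name are made up for illustration.
set -eu
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

# A tiny fixedStep wig file without a track line.
printf 'fixedStep chrom=chr1 start=1 step=10 span=10\n3\n7\n12\n' > "$tmpdir/example.wig"

# The track line must be one single line at the top of the file.
track_line='track type=wiggle_0 name="example" color=123,100,50 autoScale=off viewLimits=1:20 visibility=full windowingFunction=maximum smoothingWindow=4'

# Write the track line first, then the original data.
{ printf '%s\n' "$track_line"; cat "$tmpdir/example.wig"; } > "$tmpdir/example_with_track.wig"

first_line=$(head -n 1 "$tmpdir/example_with_track.wig")
line_count=$(wc -l < "$tmpdir/example_with_track.wig" | tr -d ' ')
printf '%s\n' "$first_line"
```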

Monday, November 26, 2012

How to set limits to a graph in R using plot

If you use the plot command to draw a scatterplot of two variables in R and one of the variables contains an extreme value, the axes will be stretched to cover the complete range of that variable (including the outlier), and the plot will not show the relationship between the variables well.

For example, x and y are variables with 9986 numerical values each.

To draw a scatterplot, type:

plot(x,y)

What you get is a graph in which one outlier distorts the whole plot (that point has an x value of about -400000 and completely masks the relationship between x and y).

To fix this, type:

plot(x,y, ylim=c(-10,10), xlim=c(-10,10))

to limit the x and y axes to the range from -10 to 10.

How to change numerical values from scientific to standard notation in R

It happened to me that I imported a data table into R with read.table and one of the columns was printed in scientific notation rather than standard decimal notation.

comp_rn_rnra <- read.table("rn_rn+ra_Galaxy8-[Concatenate_datasets_on_data_5_and_data_7](1).tabular")

The data in column 18 were printed in scientific notation:

 comp_rn_rnra [,18]


[9961] -2.733851e+00 -2.927036e+00 -2.675253e+00 -2.007405e+00 -2.049330e+00
[9966] -1.604401e+00 -1.940523e+00 -4.006117e+00 -4.449443e+00 -2.298376e+00
[9971] -2.298376e+00 -6.736356e+00 -3.351846e+00 -1.122227e+01 -1.873104e+00
[9976] -5.679507e+00 -2.303457e+00 -2.079998e+04 -2.694311e+00 -1.539045e+00
[9981] -4.985677e+00 -5.308146e+00 -2.291038e+00 -1.578426e+00 -2.274986e+00
[9986] -2.439064e+01

How to change it back to standard decimal notation? Raise the scipen option, which biases R toward fixed notation when printing:


options (scipen=10)

Now print the 18th column:

comp_rn_rnra [,18] 


[9961]      -2.733851      -2.927036      -2.675253      -2.007405
[9965]      -2.049330      -1.604401      -1.940523      -4.006117
[9969]      -4.449443      -2.298376      -2.298376      -6.736356
[9973]      -3.351846     -11.222271      -1.873104      -5.679507
[9977]      -2.303457  -20799.979589      -2.694311      -1.539045
[9981]      -4.985677      -5.308146      -2.291038      -1.578426
[9985]      -2.274986     -24.390642

To revert to scientific notation, type:

options (scipen=0)

How to change file attributes in workflows in Galaxy

If you extract a workflow in Galaxy from a history in which you changed the datatype of the files you are working with, you need to specify this change in the workflow (otherwise the workflow will not work).

For example, sometimes you need to specify a datatype change from txt to tabular:

In the sidebar of the step that involves the datatype change, click Edit Step Actions/Change datatype, then click Create.

Then click New Datatype, change it to tabular, and you are done.

Tuesday, November 20, 2012

How to select lines with a numeric value in a specific column greater than a certain value in Unix

Let's say that you have a text file with 3 columns and you want to select the lines whose value in column 2 is greater than a certain value. Use awk for this task:

awk '{ if ( $2 >= 1.5 ) print $0 }' file.txt

This command prints all lines from file.txt that have a numerical value greater than or equal to 1.5 in the second column. Use > instead of >= to select strictly greater values.
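
Here is a small self-contained sketch of this filter. The sample file and its gene names are made up for illustration; the shorter form `$2 >= 1.5` is an equivalent way to write the same condition, since awk prints the line by default when a condition has no action.

```shell
#!/bin/sh
# Sketch: select lines by a numeric threshold on column 2 with awk.
# The sample data are made up for illustration.
set -eu
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

printf 'geneA\t0.7\tno\ngeneB\t1.5\tyes\ngeneC\t2.3\tyes\ngeneD\t-3.1\tno\n' > "$tmpdir/file.txt"

# Long form, as in the post: print the whole line when column 2 >= 1.5.
original_form=$(awk '{ if ( $2 >= 1.5 ) print $0 }' "$tmpdir/file.txt")

# Short form: a bare condition prints matching lines by default.
at_least=$(awk '$2 >= 1.5' "$tmpdir/file.txt")

# Strictly greater than 1.5 (excludes geneB).
strictly=$(awk '$2 > 1.5' "$tmpdir/file.txt")

printf '%s\n' "$at_least"   # geneB and geneC lines
printf '%s\n' "$strictly"   # geneC line only
```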

Thursday, November 15, 2012

How to create workflow in Galaxy

If you need to repeat an analysis in Galaxy with other input dataset/s, create a workflow and let the job run automatically; it will save you a lot of time.

Click the small gear icon in the right corner, then click Extract workflow. Select the steps that you want in the workflow, enter a name for the workflow and click Create workflow.

After this, click Workflow at the top, then click the workflow you created and click Run. Select the input file/s for Step 1 and click Run workflow.

How to filter files using the sed command in Unix

If you need to filter a file and keep only those lines that contain a specific character/string, you can use the grep command

but you can also use sed

sed -n 's/yes/&/p' file.txt>file2.txt

sed -n suppresses all output unless the print command (p) is given. In this case sed finds the lines that contain "yes" and substitutes the match with itself (&), so nothing is changed, but every line in which the substitution succeeded is printed (p).

Output will be saved in file2.txt.
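
The sketch below compares this sed filter with two equivalent ways of getting the same lines: the plain sed pattern form `/yes/p` and grep. The sample file is made up for illustration.

```shell
#!/bin/sh
# Sketch: three ways to keep only the lines that contain "yes".
# The sample data are made up for illustration.
set -eu
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

printf 'gene1\tyes\ngene2\tno\ngene3\tyes\n' > "$tmpdir/file.txt"

# Substitute the match with itself and print lines where that succeeded.
with_substitute=$(sed -n 's/yes/&/p' "$tmpdir/file.txt")

# Print lines matching the pattern, without any substitution.
with_pattern=$(sed -n '/yes/p' "$tmpdir/file.txt")

# The grep equivalent.
with_grep=$(grep 'yes' "$tmpdir/file.txt")

printf '%s\n' "$with_substitute"   # gene1 and gene3 lines
```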

Nice tutorial on sed command:

How to substitute a character or string using sed command in Unix

If you need to substitute a character or string/s in a file in Unix, use the sed command in Terminal.

sed 's/string/string2/' file.txt >file2.txt

This command puts sed in substitute mode (s); it substitutes string with string2 in every line of file.txt and saves the output in file2.txt

If string is present two or more times in a line, only the first occurrence is substituted before sed moves to the next line. To substitute every occurrence, add g to tell sed to do the substitution globally.

sed 's/string/string2/g' file.txt >file2.txt

To substitute a character or string with a new line, split the command across two lines, ending the first line with a backslash:

sed 's/,/\
/g' file.txt > file2.txt

This command will substitute every "," with a new line.

If you want to give multiple substitute commands in one go, use -e

sed -e 's/string1/string2/g' -e 's/string3/string4/g' file.txt >file2.txt
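
The following sketch runs the variants described above on a tiny made-up file, so the difference between first-occurrence, global, newline and multiple substitutions can be seen side by side.

```shell
#!/bin/sh
# Sketch: sed substitution variants on a made-up two-line file.
set -eu
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

printf 'a,b,c\nd,e\n' > "$tmpdir/file.txt"

# Without g: only the first comma on each line is replaced.
first_only=$(sed 's/,/;/' "$tmpdir/file.txt")

# With g: every comma on each line is replaced.
global=$(sed 's/,/;/g' "$tmpdir/file.txt")

# Backslash followed by a real line break inserts a newline.
one_per_line=$(sed 's/,/\
/g' "$tmpdir/file.txt")

# Several substitutions in one call with -e.
multi=$(sed -e 's/a/A/g' -e 's/e/E/g' "$tmpdir/file.txt")

printf '%s\n' "$first_only"   # a;b,c and d;e
printf '%s\n' "$global"       # a;b;c and d;e
printf '%s\n' "$multi"        # A,b,c and d,E
```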

Wednesday, November 14, 2012

Comparing two lists of genes

If you have two gene lists, e.g. differentially regulated genes from two experiments, and you want to find the genes that are common to both lists (i.e. that are differentially regulated in both conditions), use this web tool

Simply paste the two or three lists.
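
As an alternative to the web tool, two lists with one gene name per line can also be compared in Terminal. This is only a sketch of a different technique (sort plus comm), not what the web tool does internally; the file names and gene lists are made up for illustration.

```shell
#!/bin/sh
# Sketch: find the genes common to two lists with sort and comm.
# File names and list contents are made up for illustration.
set -eu
LC_ALL=C; export LC_ALL   # same collation for sort and comm
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

printf 'Lypla1\nSox2\nPax6\nNanog\n' > "$tmpdir/list1.txt"
printf 'Pax6\nGapdh\nLypla1\n' > "$tmpdir/list2.txt"

# comm needs sorted input; -u also drops duplicate names.
sort -u "$tmpdir/list1.txt" > "$tmpdir/list1.sorted"
sort -u "$tmpdir/list2.txt" > "$tmpdir/list2.sorted"

# -12 hides lines unique to either file, leaving only the shared ones.
common=$(comm -12 "$tmpdir/list1.sorted" "$tmpdir/list2.sorted")

printf '%s\n' "$common"   # Lypla1 and Pax6
```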

How to select lines in a file that do not contain specific string/s in Unix

In Unix you can use the grep command to select lines from a file that contain specific string/s

If you want grep to select the lines that do not contain specific string/s, use grep -v

E.g. in the file knockdown_without_ra_vs_knockdown_with_ra_genelist.diff I needed to remove the lines containing "-" from the list of genes.


I used the following command: 
grep -v "-" knockdown_without_ra_vs_knockdown_with_ra_genelist.diff > knockdown_without_ra_vs_knockdown_with_ra_genelist_clean.diff
which removed the lines containing the "-" character and returned only the lines with gene names (saved in the new file knockdown_without_ra_vs_knockdown_with_ra_genelist_clean.diff)
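
A small sketch of this clean-up on made-up data follows. Two choices here are assumptions on my side, not part of the original command: the pattern is passed with -e so that it cannot be mistaken for an option, and the -x variant shows how to drop only lines that consist of "-" alone, which matters if some gene names themselves contain a hyphen.

```shell
#!/bin/sh
# Sketch: remove placeholder "-" lines from a gene list with grep -v.
# The sample gene list is made up for illustration.
set -eu
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

printf 'Lypla1\n-\nNkx2-1\n-\nMrpl15\n' > "$tmpdir/genelist.diff"

# Drops every line that contains "-" anywhere, including Nkx2-1.
no_hyphen_anywhere=$(grep -v -e '-' "$tmpdir/genelist.diff")

# -x matches whole lines only, so just the bare "-" lines are dropped.
no_bare_hyphen=$(grep -v -x -e '-' "$tmpdir/genelist.diff")

printf '%s\n' "$no_hyphen_anywhere"   # Lypla1 and Mrpl15
printf '%s\n' "$no_bare_hyphen"       # Lypla1, Nkx2-1 and Mrpl15
```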

How to cut columns from files in Unix via Terminal

If you have a file that is too big to manipulate in Excel (>1,000,000 rows) and you need to extract column/s from that file, use the Unix command cut. By default cut treats tabs as the column separator.

E.g. if you need to cut column 3 from the file and write it into a new file:
cut -f 3 file.txt > newfile.txt

If you need to cut multiple columns separate them with comma:
cut -f 3,4 file.txt > newfile.txt
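
The sketch below runs these cut commands on a small made-up tab-separated file. The last command is an addition for files that use another separator: -d sets the delimiter (a comma here) and a range such as 2-3 selects consecutive columns.

```shell
#!/bin/sh
# Sketch: extracting columns with cut from made-up sample data.
set -eu
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

printf 'a\tb\tc\td\ne\tf\tg\th\n' > "$tmpdir/file.txt"

# Column 3 only (tab is the default delimiter).
col3=$(cut -f 3 "$tmpdir/file.txt")

# Columns 3 and 4; the output stays tab-separated.
col34=$(cut -f 3,4 "$tmpdir/file.txt")

# Comma-separated input: set the delimiter with -d, select a range.
csv_cols=$(printf 'a,b,c,d\n' | cut -d ',' -f 2-3)

printf '%s\n' "$col3"       # c and g
printf '%s\n' "$csv_cols"   # b,c
```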

How to use grep to select specific lines from a file in Unix

Let's say you have a file with over 1,000,000 lines that you cannot load into and manipulate in Excel (as this is the limit in Excel for the number of rows).
Use Terminal in Unix and the powerful grep command to select lines with specific characters, strings of characters or even specific combinations of strings.

E.g. the file gene_exp.diff contains 2289014 lines. Each line contains multiple strings:
XLOC_000001    XLOC_000001    Lypla1    chr1:4797973-4836816    Random_without_RA    Knockdown_without_RA    OK    125.173    61.8913    -1.01611    6.08719    1.1491e-09    1.67416e-08    yes

To select only the lines containing a specific string, use grep:
grep "Knockdown_without_RA" gene_exp.diff
This will select lines that contain "Knockdown_without_RA" from the file gene_exp.diff

To select lines containing a specific combination of strings:
grep "Knockdown_without_RA.*Knockdown_with_RA.*OK.*yes" gene_exp.diff
This will select lines that contain the strings "Knockdown_without_RA", "Knockdown_with_RA", "OK" and "yes", in this order, no matter what characters/strings are in between them (.* matches any run of characters).

To page through the output with less (rather than printing it to the screen, which may list an endless number of lines):
grep "Knockdown_without_RA.*Knockdown_with_RA.*OK.*yes" gene_exp.diff | less

To write the output to a file, use > filename
grep "Knockdown_without_RA.*Knockdown_with_RA.*OK" gene_exp.diff > knockdown_witout_ra_vs_knockdown_with_ra.diff
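
The following sketch applies the same patterns to a four-line made-up file with a reduced set of columns, and uses grep -c to count the matching lines instead of printing them. The gene names other than Lypla1 and all values are invented for illustration.

```shell
#!/bin/sh
# Sketch: single-string and combined-string grep patterns on made-up data.
set -eu
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT

{
  printf 'XLOC_1\tLypla1\tRandom_without_RA\tKnockdown_without_RA\tOK\tyes\n'
  printf 'XLOC_2\tGeneB\tKnockdown_without_RA\tKnockdown_with_RA\tOK\tyes\n'
  printf 'XLOC_3\tGeneC\tKnockdown_without_RA\tKnockdown_with_RA\tOK\tno\n'
  printf 'XLOC_4\tGeneD\tKnockdown_without_RA\tKnockdown_with_RA\tNOTEST\tno\n'
} > "$tmpdir/gene_exp.diff"

# Lines that contain the string anywhere: all four.
n_single=$(grep -c "Knockdown_without_RA" "$tmpdir/gene_exp.diff")

# Both conditions, status OK and significant "yes", in this order.
n_combo=$(grep -c "Knockdown_without_RA.*Knockdown_with_RA.*OK.*yes" "$tmpdir/gene_exp.diff")

# Same comparison with status OK, significant or not.
n_ok=$(grep -c "Knockdown_without_RA.*Knockdown_with_RA.*OK" "$tmpdir/gene_exp.diff")

printf '%s %s %s\n' "$n_single" "$n_combo" "$n_ok"   # 4 1 2
```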

Monday, November 12, 2012

How to extract DNA sequence from the genome of interest via UCSC Genome Browser

To extract a sequence of interest if you have the coordinates:
Go to
Select the Genome and Assembly that correspond to your coordinates
Paste the coordinates into Search term

When the location opens in the Genome Browser
Click View / DNA

Alternatively, if you don't have the coordinates, you can simply navigate to the region of interest in the UCSC Genome Browser and then:
Click View / DNA