Friday, August 19, 2011

Widespread sequence differences between exome and transcriptome in human

Review of the paper:

Science. 2011 Jul 1;333(6038):53-8. Epub 2011 May 19.
Widespread RNA and DNA sequence differences in the human transcriptome.
Li M, Wang IX, Li Y, Bruzel A, Richards AL, Toung JM, Cheung VG.

http://www.ncbi.nlm.nih.gov/pubmed/21596952

In this paper, the authors show widespread differences between sequences derived from DNA and from RNA in humans (DNA and RNA were extracted from B cells of 27 individuals). The differences between exome and transcriptome are quite significant and may indicate that unknown RNA editing mechanisms exist in human cells, in addition to the already known editing mechanisms that mediate A->G and C->U changes.

In total, they discovered 28,766 editing events in exons of 4,741 known genes, covering all 12 possible categories of base changes.

However, it is important to point out that their criterion for defining an editing event is rather weak: at least 10% of the RNA-Seq reads at a site need to carry a base that differs from the DNA sequence. They considered all sites covered by at least 10 RNA-Seq reads, which theoretically leaves open the possibility that they analyzed sites where only one RNA-Seq read contains the mismatched base (10% of 10 reads). This suggests that they may actually have been analyzing a significant number of sequencing errors.
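
To make the threshold arithmetic concrete, here is a minimal Python sketch of the filter as described above (at least 10 reads covering the site, at least 10% of them mismatching). The function names are my own, not from the paper. The per-base error rate used in the last function is purely an illustrative assumption, and the binomial model treats any error at the site as a mismatch, which is a simplification.

```python
from math import comb

MIN_COVERAGE = 10  # minimum RNA-Seq reads covering a site (criterion described above)
MIN_PERCENT = 10   # minimum percentage of reads carrying the non-DNA base


def min_supporting_reads(coverage, min_percent=MIN_PERCENT):
    """Smallest number of mismatching reads that reaches min_percent of coverage."""
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-coverage * min_percent // 100)


def passes_filter(coverage, mismatch_reads,
                  min_coverage=MIN_COVERAGE, min_percent=MIN_PERCENT):
    """True if a site would count as an editing event under the stated criteria."""
    return coverage >= min_coverage and mismatch_reads * 100 >= min_percent * coverage


def prob_at_least_k_errors(coverage, k, error_rate):
    """Binomial probability that at least k of `coverage` reads carry an error.

    error_rate is a hypothetical per-base sequencing error rate (an assumption,
    not a value reported in the paper).
    """
    p_fewer = sum(
        comb(coverage, i) * error_rate**i * (1 - error_rate) ** (coverage - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


if __name__ == "__main__":
    for cov in (10, 25, 100):
        print(cov, "reads ->", min_supporting_reads(cov), "mismatching read(s) needed")
    # Illustrative only: assumed 1% per-base error rate at a site with 10 reads.
    print(prob_at_least_k_errors(10, min_supporting_reads(10), 0.01))
```

At the minimum coverage of 10 reads a single mismatching read is enough to pass, while at 100 reads ten are needed, so the concern about sequencing errors applies mainly to the low-coverage sites.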

The examples of the same editing events being present in multiple individuals or in other tissues, in combination with the identification of the edited sequences in the human proteome, strengthen the claims made in this paper and indicate that many of these events might be real.

Although the total number of editing events is 28,766, not every event is present in all individuals, so each individual had a smaller number of editing events, ranging from 282 to 1,863. Editing events were more common in 3' UTR exons than in other exons. The level of mRNA that actually contains the changed base was presented as the distribution of the percentage of RNA-Seq reads carrying the changed base; this distribution is bimodal, with peaks close to 0% and 100%. As many of these edited mRNAs were minor variants, it is not surprising that their corresponding peptides were not found in large numbers by the mass-spec analysis in this study.

Thursday, August 18, 2011

Transcriptome assembly from RNA-Seq data using annotated reference transcripts

Review of the paper:

Identification of novel transcripts in annotated genomes using RNA-Seq. Bioinformatics. 2011 Jun 21. Roberts A, Pimentel H, Trapnell C, Pachter L.
Department of Computer Science, UC Berkeley, Berkeley, CA.

http://www.ncbi.nlm.nih.gov/pubmed/21697122

Assembly of a transcriptome using only a reference genome (for mapping of the sequenced reads), without a prior reference transcript annotation, suffers from several problems.

First, lowly expressed genes have low sequence coverage and thus the assembly of transcripts that originate from such genes is quite difficult and error prone.

Second, the exact positions of the 5' and 3' ends of transcripts are sometimes difficult to establish, possibly due to insufficient sequence coverage at the ends (5' ends in particular are covered by fewer reads if poly-A selection has been included in the sequencing protocol).

The picture shows an example of inconsistent 5' end detection in several human RNA-Seq samples for the highly expressed GAPDH gene. The correct position of the 5' end could not be established, and each transcript would be called as a unique transcript if data were pooled from these RNA-Seq experiments.


Third, assemblers may output several transcript fragments (transfrags) that actually originate from a single transcript, simply because the connecting reads needed to assemble the transfrags into a single transcript are missing.

In this paper, the authors suggest using a reference annotation of transcripts to correct for these issues. However, this use of a reference is different from simply taking already annotated transcripts and calculating their sequence coverage. The method uses the reference only to correct for the known problems. In other words, novel RNA transcripts will still be detected in addition to already annotated ones.

The correction seems to work quite well for the example given in the paper. Four incomplete transfrags originating from two actual transcripts were called by the Cufflinks assembler, whereas the authors' RABT assembler produced two complete transcripts from this locus.

Altogether, the authors' RABT assembly, when applied to human brain tissue, found 70,241 transfrags (transcripts) and 36,494 gene loci. Of these, 36,494 transfrags and 15,504 genes were novel. On average, each gene had 1.92 transfrags (transcripts).
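
As a quick sanity check of the numbers quoted above, the average number of transfrags per gene can be recomputed from the totals. This is only a small sketch: the variable names are mine, and I am assuming that "genes" and "gene loci" refer to the same units.

```python
# Counts as quoted in this post for the RABT assembly of human brain tissue.
total_transfrags = 70_241
total_gene_loci = 36_494
novel_transfrags = 36_494
novel_genes = 15_504  # assumed to be counted in the same units as gene loci

# Average number of transfrags (transcripts) per gene locus.
transfrags_per_gene = total_transfrags / total_gene_loci

# Share of the assembly that was novel (not present in the reference annotation).
novel_transfrag_fraction = novel_transfrags / total_transfrags
novel_gene_fraction = novel_genes / total_gene_loci

print(f"transfrags per gene: {transfrags_per_gene:.2f}")
print(f"novel transfrags:    {novel_transfrag_fraction:.1%}")
print(f"novel genes:         {novel_gene_fraction:.1%}")
```

The first value agrees with the reported average of 1.92 transfrags per gene.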

Monday, August 1, 2011

Open a file with a program other than the default in Total Commander

In Total Commander, to open a file in a program other than the default one (for example, a text file in Notepad++ instead of Notepad), add the program to the Button Bar:

Configuration/Button Bar
Add,
Browse for the exe file of the program (click on >>)
Parameters: %P%N (%P is the path of the current directory and %N is the name of the file under the cursor)

After this, an icon of the program will appear in the Button Bar.
To open a file in that program, navigate to the file in Total Commander and simply click the program icon in the Button Bar.