Thursday, March 31, 2016

Use ls to grab multiple paired file names, assign them to variables using nested for loops - example of fastq paired files

Let's say we have a set of files with similar names, for example a set of paired fastq files that are each marked with a barcode.

Let's assume we want to process these files in pairs in a for loop, assign their full paths to the variables Reads1 and Reads2, and execute some code for each pair (in the example below, just echo $Reads1 and echo $Reads2).

Make an array variable with barcodes that will be used to identify files. Create a for loop that will span the range of the array variable. Make another for loop that goes from 1 to 2 to grab each file of the pair.

Within the second loop, use the ls command to get the full name of a file and capture its output in a variable (tmp). Use the export command to assign tmp to Reads1 and, on the next pass, to Reads2. Then use Reads1 and Reads2 once the inner loop is done (here they are just echoed, but you could, for example, map the fastq files to the genome).

The advantage here is that you don't have to write out full file names; you can find and grab them by their respective barcodes.


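# Barcodes that identify each pair of fastq files
a=(TTAGGC ACTTGA ACAGTG AGTCAA ATCACG AGTTCC CAGATC ATGTCA CCGTCC CGATGT CTTGTA GATCAG GGCTAC GTCCGC GTGAAA TAGCTT TGACCA)

# Loop over every index of the barcode array
for i in "${!a[@]}"
do
    # j is the read number within the pair (1 or 2)
    for j in 1 2
    do
        # ls expands the wildcard to the full path of the matching file
        tmp="$(ls /home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/*${j}_${a[$i]}*)"
        # Assign the path to Reads1 or Reads2, depending on j
        export "Reads${j}=$tmp"
    done
    echo "$Reads1"
    echo "$Reads2"
    echo done
done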
Output:

mpjanic@valkyr:/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data$ source tmp2
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1401.1_26437_merged_1_TTAGGC.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1401.1_26437_merged_2_TTAGGC.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/2315_26438_merged_1_ACTTGA.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/2315_26438_merged_2_ACTTGA.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/59386143_26427_merged_1_ACAGTG.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/59386143_26427_merged_2_ACAGTG.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/2115_26430_merged_1_AGTCAA.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/2115_26430_merged_2_AGTCAA.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/9071501.8_26436_merged_1_ATCACG.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/9071501.8_26436_merged_2_ATCACG.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1483_26431_merged_1_AGTTCC.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1483_26431_merged_2_AGTTCC.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1060602_26428_merged_1_CAGATC.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1060602_26428_merged_2_CAGATC.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1795_26432_merged_1_ATGTCA.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1795_26432_merged_2_ATGTCA.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/2108_26433_merged_1_CCGTCC.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/2108_26433_merged_2_CCGTCC.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/59885590_26425_merged_1_CGATGT.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/59885590_26425_merged_2_CGATGT.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/200212_26429_merged_1_CTTGTA.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/200212_26429_merged_2_CTTGTA.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/102901.8_26439_merged_1_GATCAG.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/102901.8_26439_merged_2_GATCAG.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/24635_26441_merged_1_GGCTAC.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/24635_26441_merged_2_GGCTAC.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/3101801.2_26434_merged_1_GTCCGC.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/3101801.2_26434_merged_2_GTCCGC.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1522_26435_merged_1_GTGAAA.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1522_26435_merged_2_GTGAAA.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1587_26440_merged_1_TAGCTT.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/1587_26440_merged_2_TAGCTT.fastq.gz
done
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/8072501_26426_merged_1_TGACCA.fastq.gz
/home/diskstation/RNAseq/HCASMC/Merged_Raw_Data/8072501_26426_merged_2_TGACCA.fastq.gz
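
The same pairing can also be sketched without calling ls, by letting the shell expand the wildcard straight into an array. This is only a sketch under some assumptions: the temporary folder, the placeholder files and the extra barcode GGGGGG are made up for the demo, and the check for exactly one match per read is my addition, not part of the loop above.

```shell
#!/usr/bin/env bash
# Hypothetical demo folder with empty placeholder files (not real data)
demo_dir=$(mktemp -d)
touch "$demo_dir/1401.1_26437_merged_1_TTAGGC.fastq.gz" \
      "$demo_dir/1401.1_26437_merged_2_TTAGGC.fastq.gz" \
      "$demo_dir/2315_26438_merged_1_ACTTGA.fastq.gz" \
      "$demo_dir/2315_26438_merged_2_ACTTGA.fastq.gz"

# GGGGGG is a made-up barcode with no files, to show the skip branch
barcodes=(TTAGGC ACTTGA GGGGGG)
pairs=()

for bc in "${barcodes[@]}"; do
    # The glob expands into an array, one element per matching file
    r1=("$demo_dir"/*_1_"$bc"*)
    r2=("$demo_dir"/*_2_"$bc"*)

    # An unmatched glob stays as literal text, so also test that the file exists
    if [[ ${#r1[@]} -ne 1 || ${#r2[@]} -ne 1 || ! -e ${r1[0]} || ! -e ${r2[0]} ]]; then
        echo "skipping $bc: expected exactly one file per read" >&2
        continue
    fi

    Reads1=${r1[0]}
    Reads2=${r2[0]}
    echo "$Reads1"
    echo "$Reads2"
    pairs+=("$(basename "$Reads1") $(basename "$Reads2")")
done

rm -r "$demo_dir"
```

Compared with capturing the output of ls, this version notices when a barcode matches no file or more than one file, instead of silently putting an empty or multi-line string into Reads1 or Reads2.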

Removing folders with corrupt files

Let's say that you have a folder that you cannot remove. You issue an rm -r command:


mpjanic@valkyr:/home/diskstation/RNAseq/HCASMC/Raw_Data/Penn_2nd_round_RNAseq_GOOD_cvrg_8-11-15$ rm -r 2913_26442_GCCAAT/
rm: cannot remove 2913_26442_GCCAAT/GenomeForPass2/cifsfde9: No such file or directory
rm: cannot remove 2913_26442_GCCAAT/GenomeForPass2/cifsfdf1: No such file or directory
rm: cannot remove 2913_26442_GCCAAT/GenomeForPass2/cifsfdf9: No such file or directory
rm: cannot remove 2913_26442_GCCAAT/GenomeForPass2/cifsfe01: No such file or directory
rm: cannot remove 2913_26442_GCCAAT/GenomeForPass2/cifsfe0d: No such file or directory
rm: cannot remove 2913_26442_GCCAAT/GenomeForPass2/cifsfe19: No such file or directory
rm: cannot remove 2913_26442_GCCAAT/GenomeForPass2/cifsfe21: No such file or directory
rm: cannot remove 2913_26442_GCCAAT/GenomeForPass2/cifsfe2b: No such file or directory
rm: cannot remove 2913_26442_GCCAAT/GenomeForPass2/cifsfe33: No such file or directory
rm: cannot remove 2913_26442_GCCAAT/GenomeForPass2/cifsfe3b: No such file or directory
rm: cannot remove 2913_26442_GCCAAT/GenomeForPass2/cifsfe43: No such file or directory
Apparently there are files in that folder, but when rm tries to remove them it reports "No such file or directory".

Similarly, it is not possible to remove the files one by one, nor by looking up the inode of each entry with ls -i and deleting it with find . -inum NUMBER -delete:


ls -i
24422 cifsfde9  24423 cifsfdf1  24424 cifsfdf9  24425 cifsfe01  24426 cifsfe0d  24427 cifsfe19  24428 cifsfe21  24429 cifsfe2b  24430 cifsfe33  24431 cifsfe3b  24432 cifsfe43
find . -inum 24422 -delete
find: cannot delete `./cifsfde9': No such file or directory
One could possibly use fsck to check the filesystem, or run sudo debugfs to delete files by their inodes (read up on this in detail first, as you can severely damage your filesystem).

However, a simple solution worked for me: rename the folder with mv and then delete it.

mv 2913_26442_GCCAAT/ 2913_26442_GCCAAT2
rm -r 2913_26442_GCCAAT2/
ls -l 2913_26442_GCCAAT*
ls: cannot access 2913_26442_GCCAAT*: No such file or directory

Wednesday, March 30, 2016

Finding SAM files in subfolders using find and regex, and convert to BAM format, sort, index within subfolders

Continuing from the previous post: if you want to find all SAM files in subdirectories, convert them to BAM, and then sort and index them within those subdirectories, use the following code.
The find command with the -regex option will find all SAM files. Make sure the pattern starts with .* or .*\/, because -regex is matched against the whole path of each file, not just its name.

List SAM files:

root@valkyr:~/tmp_rnaseq# sudo find ./  -regex '^.*\/.*sam$' -exec ls -l {} \;
-rw-r--r-- 1 root root 39437555424 Mar 29 03:07 ./9071501.8_26436_ATCACG/Pass2/Aligned.out.sam
-rw-r--r-- 1 root root 53567653787 Jan 15 20:01 ./59885590_26425_CGATGT/Pass2/Aligned.out.sam
-rw-r--r-- 1 root root 49754863117 Jan 15 21:01 ./8072501_26426_TGACCA/Pass2/Aligned.out.sam

Convert to BAM:

sudo find ./ -regex '^.*\/.*sam$' -execdir sh -c 'samtools view -bS {} > Aligned.out.bam' \;
Sort BAM files:

sudo find ./ -regex '^.*\/.*bam$' -execdir sh -c 'samtools sort {} {}.sort' \;
List sorted BAM files:

sudo find ./ -regex '^.*\/.*sort.bam$' -execdir sh -c 'ls -l {}' \;
-rw-r--r-- 1 root root 4516449286 Mar 30 13:05 ./Aligned.out.bam.sort.bam
-rw-r--r-- 1 root root 5993809569 Mar 30 13:44 ./Aligned.out.bam.sort.bam
-rw-r--r-- 1 root root 5404073437 Mar 30 14:18 ./Aligned.out.bam.sort.bam

Index BAM files:

sudo find ./ -regex '^.*\/.*sort.bam$' -execdir sh -c 'samtools index {}' \;
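
The sort command above uses the older "samtools sort input output-prefix" form. As far as I know, samtools 1.3 and later expect the output file to be given with -o instead, so here is a hedged sketch of the three steps in that newer form. It only prints the commands instead of running them, and plan_conversion and the .sorted.bam naming are my own hypothetical choices, not something from the commands above.

```shell
#!/usr/bin/env bash
# Print (do not run) the samtools commands for one SAM file.
# Assumes samtools >= 1.3, where sort takes -o FILE instead of an output prefix.
plan_conversion() {
    local sam=$1
    local base=${sam%.sam}    # strip the trailing .sam from the name

    printf 'samtools view -b -o %s.bam %s\n' "$base" "$sam"
    printf 'samtools sort -o %s.sorted.bam %s.bam\n' "$base" "$base"
    printf 'samtools index %s.sorted.bam\n' "$base"
}

plan=$(plan_conversion ./Aligned.out.sam)
echo "$plan"
```

Printing first makes it easy to check the derived file names before committing to hours of conversion. The printed names are not quoted, so this sketch assumes paths without spaces.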

Tuesday, March 29, 2016

Find SAM files in subfolders and convert them to BAM format by executing samtools within the subfolders

Let's say you have SAM files in subfolders of the current folder and you would like to convert them to BAM format using samtools. One way is to copy them all to another folder with the find command and then run samtools in a loop from that folder. However, you can instead find each SAM file within its subfolder and run samtools inside that subfolder by using the -execdir option of the find command.

If you don't specify -execdir but only -exec, the output BAM file will be written to the folder from which find was started, and will probably be overwritten on each iteration.

Find all SAM files in subfolders:

root@valkyr:~/tmp_rnaseq# sudo find ./  -regex '^.*\/.*sam$' -exec ls -l {} \;
-rw-r--r-- 1 root root 39416554017 Mar 29 02:23 ./9071501.8_26436_ATCACG/Pass1/Aligned.out.sam
-rw-r--r-- 1 root root 53346566430 Jan 15 19:11 ./59885590_26425_CGATGT/Pass1/Aligned.out.sam
-rw-r--r-- 1 root root 49704309954 Jan 15 20:13 ./8072501_26426_TGACCA/Pass1/Aligned.out.sam
Execute samtools within subfolders:

sudo find ./ -regex '^.*\/.*sam$' -execdir sh -c 'samtools view -bS {} > Aligned.out.bam' \;
You could achieve the same with a for loop that goes into each subdirectory and runs samtools, but writing such a loop can be challenging if the subfolder names are complicated. The find one-liner is quicker and easier.
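
The difference between -exec and -execdir can be seen on a small throwaway example. This is only a sketch: the folder layout and file contents are invented, and cat stands in for samtools so that it runs anywhere. The file name is passed to sh as "$1" rather than pasted into the command string, which is a slightly safer variant of the one-liner above.

```shell
#!/usr/bin/env bash
# Hypothetical layout: two sample folders, each with its own Aligned.out.sam
top=$(mktemp -d)
mkdir -p "$top/sampleA/Pass1" "$top/sampleB/Pass1"
echo "reads from A" > "$top/sampleA/Pass1/Aligned.out.sam"
echo "reads from B" > "$top/sampleB/Pass1/Aligned.out.sam"

start_dir=$PWD
cd "$top" || exit 1

# -exec: sh runs in the folder where find was started,
# so both files are written to the same ./converted.txt
find . -regex '.*\.sam$' -exec sh -c 'cat "$1" > converted.txt' sh {} \;
with_exec=$(find . -name converted.txt | sort)

# -execdir: sh runs in the folder that holds each match,
# so every subfolder gets its own converted.txt
find . -regex '.*\.sam$' -execdir sh -c 'cat "$1" > converted.txt' sh {} \;
with_execdir=$(find . -name converted.txt | sort)

echo "after -exec:";    echo "$with_exec"
echo "after -execdir:"; echo "$with_execdir"

result_a=$(cat sampleA/Pass1/converted.txt)

cd "$start_dir" || exit 1
rm -r "$top"
```

Note that GNU find refuses to run -execdir when PATH contains the current directory or another relative entry, so if the command aborts with a PATH warning, that is the place to look.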


Friday, March 25, 2016

Scatter plot in R ggplot2 using wesanderson palette

To draw a scatter plot in R with the ggplot2 package, use geom_point. To add a visually appealing color palette, install and load the wesanderson package and add it to your ggplot code with scale_color_manual.


> library (ggplot2)
> library (wesanderson)

> x<- read.delim("shown-GOBiologicalProcess.short.tsv", header=F)

> x
                                                         V1       V2        V3
1                                              ossification 8.868518  2.500906
2                             skeletal system morphogenesis 8.550280  2.381594
3                                      response to nutrient 8.495432  2.987107
4  regulation of protein import into nucleus, translocation 8.426276  9.436530
5                                        tyrosine transport 7.810678 21.485540
6                                          response to heat 7.566548  3.999154
7                                cell junction organization 7.435220  2.296243
8                   embryonic skeletal system morphogenesis 7.358300  2.930280
9                                    osteoblast development 7.150353  5.649584
10                                      response to vitamin 6.760223  3.561107
11                          cell-cell junction organization 6.756272  2.283825
12                 calcium-independent cell-matrix adhesion 6.739017 14.800580
13                         T cell differentiation in thymus 6.604179  3.837571
14                             cellular response to vitamin 6.595923  8.535726
15   dichotomous subdivision of an epithelial terminal unit 6.577308  6.239491
16                                      gland morphogenesis 6.556537  2.381006
17 establishment of protein localization to plasma membrane 6.492166  4.960975
18                   response to purine-containing compound 6.463773  2.947403
19                                    response to vitamin K 6.402319 34.013860
20                         eye pigment biosynthetic process 6.373886 13.015510


> p<- ggplot(x, aes(x=V3, y=V2, colour = V1)) + geom_point(shape=19, alpha=3/3, size=8) + xlab("Fold change") + ylab("-log P-value") + ggtitle ("GO Biological process enrichment in AHR-ARNT/TCF12 PWM overlaps")+xlim(0, 35)+scale_color_manual(name="GO Biological process",values = c(wes_palette("Cavalcanti"),wes_palette("GrandBudapest"), wes_palette("Royal1"), wes_palette("Royal2"), wes_palette("Moonrise2")))+ theme_gray()
> p
The resulting plot:




Wednesday, March 23, 2016

Remove multiple consecutive characters in a file - example of removing multiple space separators

This may come in handy for those who like to paste tab-separated files into nano, where the tabs get converted into runs of spaces. To collapse each run back to a single space delimiter, use tr with the squeeze option -s:

mpjanic@valkyr:~/REBUTTAL_F1000RESEARCH$ cat tmp
rs1951351       14      91832947        G       A       7.1E-02 1.04    1.00    1.08    9580    53810
rs4900109       14      91833144        T       G       1.6E-01 1.03    0.99    1.07    9580    53810
rs4904864       14      91834272        G       A       6.6E-02 1.04    1.00    1.08    9580    53810
rs1885193       14      91835107        G       A       3.3E-01 1.02    0.98    1.05    12171   56862
rs12892379      14      91836910        T       C       6.6E-02 1.04    1.00    1.08    9580    53810
rs1957283       14      91837433        A       G       5.9E-01 1.02    0.95    1.09    9580    53810
rs4904866       14      91838256        T       C       1.5E-01 1.03    0.99    1.07    9580    53810
rs7145248       14      91839145        T       C       5.5E-01 1.02    0.96    1.09    9580    53810
rs12434570      14      91840166        T       C       5.6E-01 1.02    0.96    1.09    9580    53810
rs8021744       14      91843048        T       C       5.6E-01 1.02    0.96    1.09    9580    53810
rs12896399      14      91843416        T       G       1.3E-01 1.03    0.99    1.07    9580    53810
rs746588        14      91845133        T       C       5.7E-01 1.02    0.95    1.09    9580    53810
rs11622569      14      91845155        A       T       5.7E-02 1.04    1.00    1.08    9580    53810
rs746586        14      91845720        T       C       1.2E-01 1.03    0.99    1.07    9580    53810
rs1075830       14      91845915        C       A       5.1E-01 1.01    0.98    1.05    12171   56862
rs8018017       14      91846171        C       G       7.2E-01 1.01    0.95    1.07    11223   52455
rs941799        14      91846578        T       C       1.2E-01 1.03    0.99    1.07    9580    53810
rs1885194       14      91847215        C       T       1.2E-01 1.03    0.99    1.07    9580    53810
rs10484035      14      91848310        C       T       2.6E-01 1.02    0.99    1.06    12171   56862
rs17128162      14      91850110        A       G       3.0E-01 1.06    0.95    1.19    3788    20758
rs17184180      14      91850140        A       T       1.2E-01 1.03    0.99    1.07    9580    53810
mpjanic@valkyr:~/REBUTTAL_F1000RESEARCH$ tr -s " " <tmp >tmp2
mpjanic@valkyr:~/REBUTTAL_F1000RESEARCH$ cat tmp2
rs1951351 14 91832947 G A 7.1E-02 1.04 1.00 1.08 9580 53810
rs4900109 14 91833144 T G 1.6E-01 1.03 0.99 1.07 9580 53810
rs4904864 14 91834272 G A 6.6E-02 1.04 1.00 1.08 9580 53810
rs1885193 14 91835107 G A 3.3E-01 1.02 0.98 1.05 12171 56862
rs12892379 14 91836910 T C 6.6E-02 1.04 1.00 1.08 9580 53810
rs1957283 14 91837433 A G 5.9E-01 1.02 0.95 1.09 9580 53810
rs4904866 14 91838256 T C 1.5E-01 1.03 0.99 1.07 9580 53810
rs7145248 14 91839145 T C 5.5E-01 1.02 0.96 1.09 9580 53810
rs12434570 14 91840166 T C 5.6E-01 1.02 0.96 1.09 9580 53810
rs8021744 14 91843048 T C 5.6E-01 1.02 0.96 1.09 9580 53810
rs12896399 14 91843416 T G 1.3E-01 1.03 0.99 1.07 9580 53810
rs746588 14 91845133 T C 5.7E-01 1.02 0.95 1.09 9580 53810
rs11622569 14 91845155 A T 5.7E-02 1.04 1.00 1.08 9580 53810
rs746586 14 91845720 T C 1.2E-01 1.03 0.99 1.07 9580 53810
rs1075830 14 91845915 C A 5.1E-01 1.01 0.98 1.05 12171 56862
rs8018017 14 91846171 C G 7.2E-01 1.01 0.95 1.07 11223 52455
rs941799 14 91846578 T C 1.2E-01 1.03 0.99 1.07 9580 53810
rs1885194 14 91847215 C T 1.2E-01 1.03 0.99 1.07 9580 53810
rs10484035 14 91848310 C T 2.6E-01 1.02 0.99 1.06 12171 56862
rs17128162 14 91850110 A G 3.0E-01 1.06 0.95 1.19 3788 20758
rs17184180 14 91850140 A T 1.2E-01 1.03 0.99 1.07 9580 53810
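
If you would rather get the original tab delimiter back than a single space, tr can translate and squeeze in one pass. A minimal sketch, assuming that no field contains a space of its own; the sample line is shortened from the file above.

```shell
#!/usr/bin/env bash
# One line as it looks after pasting into nano: tabs replaced by runs of spaces
spaced='rs1951351       14      91832947        G       A'

# Translate every space to a tab, then squeeze each run of tabs into one
restored=$(printf '%s\n' "$spaced" | tr -s ' ' '\t')

# cut splits on tabs by default, so the third column is addressable again
third_field=$(printf '%s\n' "$restored" | cut -f 3)
echo "$third_field"
```

With the tabs restored, tools that default to tab-separated input, such as cut, work on the file without extra delimiter options.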

Selecting lines before and after search pattern using grep

To select lines before and after a pattern you want to match, use grep with the options -A (lines after the match) and -B (lines before the match).


mpjanic@valkyr:~/REBUTTAL_F1000RESEARCH$ grep rs12896399 DIAGRAMv3.2012DEC17.txt
rs12896399      14      91843416        T       G       1.3E-01 1.03    0.99    1.07    9580    53810
mpjanic@valkyr:~/REBUTTAL_F1000RESEARCH$ grep rs12896399 -A 10 -B 10 DIAGRAMv3.2012DEC17.txt
rs1951351       14      91832947        G       A       7.1E-02 1.04    1.00    1.08    9580    53810
rs4900109       14      91833144        T       G       1.6E-01 1.03    0.99    1.07    9580    53810
rs4904864       14      91834272        G       A       6.6E-02 1.04    1.00    1.08    9580    53810
rs1885193       14      91835107        G       A       3.3E-01 1.02    0.98    1.05    12171   56862
rs12892379      14      91836910        T       C       6.6E-02 1.04    1.00    1.08    9580    53810
rs1957283       14      91837433        A       G       5.9E-01 1.02    0.95    1.09    9580    53810
rs4904866       14      91838256        T       C       1.5E-01 1.03    0.99    1.07    9580    53810
rs7145248       14      91839145        T       C       5.5E-01 1.02    0.96    1.09    9580    53810
rs12434570      14      91840166        T       C       5.6E-01 1.02    0.96    1.09    9580    53810
rs8021744       14      91843048        T       C       5.6E-01 1.02    0.96    1.09    9580    53810
rs12896399      14      91843416        T       G       1.3E-01 1.03    0.99    1.07    9580    53810
rs746588        14      91845133        T       C       5.7E-01 1.02    0.95    1.09    9580    53810
rs11622569      14      91845155        A       T       5.7E-02 1.04    1.00    1.08    9580    53810
rs746586        14      91845720        T       C       1.2E-01 1.03    0.99    1.07    9580    53810
rs1075830       14      91845915        C       A       5.1E-01 1.01    0.98    1.05    12171   56862
rs8018017       14      91846171        C       G       7.2E-01 1.01    0.95    1.07    11223   52455
rs941799        14      91846578        T       C       1.2E-01 1.03    0.99    1.07    9580    53810
rs1885194       14      91847215        C       T       1.2E-01 1.03    0.99    1.07    9580    53810
rs10484035      14      91848310        C       T       2.6E-01 1.02    0.99    1.06    12171   56862
rs17128162      14      91850110        A       G       3.0E-01 1.06    0.95    1.19    3788    20758
rs17184180      14      91850140        A       T       1.2E-01 1.03    0.99    1.07    9580    53810
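
When the same number of lines is wanted on both sides, grep also has -C as a shorthand for equal -A and -B. A small sketch on a made-up list of IDs; the -w option is my addition so that rs4 would not also match a longer ID such as rs41.

```shell
#!/usr/bin/env bash
# Hypothetical input: seven made-up IDs, one per line
snps=$(mktemp)
printf '%s\n' rs1 rs2 rs3 rs4 rs5 rs6 rs7 > "$snps"

# -C 2 is the same as -A 2 -B 2: two lines of context on each side
context=$(grep -w -C 2 rs4 "$snps")
echo "$context"

# -A and -B can also differ: one line before, three lines after
asym=$(grep -w -B 1 -A 3 rs4 "$snps")
echo "$asym"

rm "$snps"
```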

Saturday, March 12, 2016

Overcome problems with sorting on x-axes and legend in ggplot2

This is one example of R ggplot2 code that loads a data matrix with three columns and makes a bar chart with a legend. The data come from GO analysis of intersecting ChIP-Seq peaks from two experiments and contain the GO term ID, GO name, and p-value.
Suppose you want to plot -log p-value in decreasing order. ggplot2 will not order the x-axis by p-value but alphabetically by category, so you have to specify the order yourself. This is relatively easy: set limits=factor(x$V2) in scale_x_discrete, where x$V2 is the vector of categories in the desired order. If the data frame is not already sorted by p-value this vector will not be in the right order, so first sort the data frame with:
x<-x[order (x$value, decreasing=TRUE),]
For the legend, set breaks = x$V2 in scale_fill_discrete.

> library("ggplot2")
> library("reshape2")

> x<-read.delim("tmp.csv", sep=",", header=F)
> x
           V1                    V2       V3
1  GO:0001523   metabolic process 1 0.000129
2  GO:0016101   metabolic process 2 0.000287
3  GO:0010035 inorganic substance 1 0.000293
4  GO:0042981   metabolic process 3 0.000433
5  GO:0043067          cell procces 0.000504
6  GO:0032270   metabolic process 5 0.000520
7  GO:0007204         concentration 0.000548
8  GO:0006721   metabolic process 6 0.000550
9  GO:0051716              stimulus 0.000554
10 GO:0031401  modification process 0.000597
11 GO:0071248             metal ion 0.000673
12 GO:0034754   metabolic process 7 0.000947
13 GO:0006720   metabolic process 8 0.000947
14 GO:0051480     ion concentration 0.000963
> x$V3=-log(x$V3)
> x<-melt(x)
Using V1, V2 as id variables
> x
           V1                    V2 variable    value
1  GO:0001523   metabolic process 1       V3 8.955698
2  GO:0016101   metabolic process 2       V3 8.156028
3  GO:0010035 inorganic substance 1       V3 8.135338
4  GO:0042981   metabolic process 3       V3 7.744773
5  GO:0043067          cell procces       V3 7.592934
6  GO:0032270   metabolic process 5       V3 7.561682
7  GO:0007204         concentration       V3 7.509235
8  GO:0006721   metabolic process 6       V3 7.505592
9  GO:0051716              stimulus       V3 7.498346
10 GO:0031401  modification process       V3 7.423593
11 GO:0071248             metal ion       V3 7.303765
12 GO:0034754   metabolic process 7       V3 6.962211
13 GO:0006720   metabolic process 8       V3 6.962211
14 GO:0051480     ion concentration       V3 6.945457
>  ggplot(x, aes(V2,value, fill = V2)) + geom_bar(stat="identity")+ scale_x_discrete(labels=x$V1, limits=factor(x$V2))+ scale_fill_discrete(breaks = x$V2, name="GO term")+theme(legend.text = element_text(colour="black", size = 11, face = "bold"))+ theme(axis.text.x = element_text(angle = 90, hjust = 1))+ggtitle("ChIP-Seq sites intersection")+labs(x="GO term ID",y="-log p-value")
The resulting figure looks like this:

If you do not set limits and breaks in this way, the graph will not be sorted:

>  ggplot(x, aes(V2,value, fill = V2)) + geom_bar(stat="identity")+ scale_x_discrete(labels=x$V1)+ scale_fill_discrete(name="GO term")+theme(legend.text = element_text(colour="black", size = 11, face = "bold"))+ theme(axis.text.x = element_text(angle = 90, hjust = 1))+ggtitle("ChIP-Seq sites intersection")+labs(x="GO term ID",y="-log p-value")