Tuesday, September 30, 2014

Delete all files except those that match a pattern

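If you need to delete all files that do not match a certain pattern, e.g. all files whose names do not end in gz, use the find command. Note that find descends into every subdirectory of the starting directory (here ., the current directory).

First check that the command picks up exactly the files you want to delete (removal is permanent, so it pays to be sure what you are removing):

find . -type f \! -name '*gz'

-type f restricts the match to regular files, so directories (including . itself) are not selected. Once the list looks right, add -delete at the end of the command. It has to come after the tests: placed first, it would delete everything find visits.

find . -type f \! -name '*gz' -delete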

Similarly, to delete the files that do match the pattern, drop the \!

Check the selection:

find . -type f -name '*gz'

Then delete:

find . -type f -name '*gz' -delete
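To try these commands without risking real data, here is a minimal sketch that runs them in a throwaway directory created with mktemp -d. The file names are made up for the demo, and the sketch assumes a find that supports -delete (GNU and BSD find both do).

```shell
#!/bin/sh
set -e
# Throwaway sandbox so nothing outside it can be deleted
demo=$(mktemp -d)
cd "$demo"
mkdir sub
# Hypothetical files: some end in gz, some do not
touch keep1.gz keep2.gz drop.txt drop.bed sub/keep3.gz sub/drop.log

# Dry run: list the regular files NOT ending in gz
find . -type f \! -name '*gz' | sort

# Delete them; -delete comes last so the tests filter first
find . -type f \! -name '*gz' -delete

# What is left: only the gz files, and sub/ survives because of -type f
find . | sort
```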

How to download multiple files from a URL using wget

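If you want to download multiple files from a URL using wget:

wget -e robots=off -r -l1 --no-parent -A.gz http://hapmap.ncbi.nlm.nih.gov/downloads/ld_data/latest/

This downloads all gz archives linked from http://hapmap.ncbi.nlm.nih.gov/downloads/ld_data/latest/. Here -r -l1 follows links recursively but only one level deep, --no-parent keeps wget from climbing into the parent directory, -A.gz accepts only files ending in .gz, and -e robots=off tells wget to ignore robots.txt. By default the files are saved under a directory tree that mirrors the remote path (starting with hapmap.ncbi.nlm.nih.gov/).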

Thursday, September 18, 2014

Delete all rows whose value in column N is below a threshold

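To delete all rows of a file whose numerical value in column N is below a certain threshold, use awk:

awk '$12>=0.5' file > filenew

awk prints only the lines for which the condition is true, so this keeps the rows whose value in column 12 is at least 0.5 and removes all rows with a value below 0.5. The result goes to filenew; the original file is left unchanged.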
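Since the column number and the threshold are the parts you usually change, they can be passed in as awk variables with -v. A minimal sketch on made-up data; col and min are hypothetical variable names, and the demo filters on column 2 because the sample has only two columns.

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
# Hypothetical three-row input; the value to filter on is in column 2
printf 'geneA 0.1\ngeneB 0.5\ngeneC 0.9\n' > "$tmp/file"

# $col is the field whose number is stored in col; keep rows with value >= min
awk -v col=2 -v min=0.5 '$col >= min' "$tmp/file" > "$tmp/filenew"
cat "$tmp/filenew"
```

For the command in this post, col=12 and min=0.5 select the same rows as '$12>=0.5'.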

Friday, September 12, 2014

Substitute spaces with tabs using awk

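To substitute spaces with tabs in your file, use awk:

awk -v OFS='\t' '{$1=$1} 1' file > file_tab

Assigning $1 to itself makes awk rebuild each line using the output field separator (OFS), here a tab. Because awk's default field separator splits on runs of spaces and tabs, several consecutive spaces become a single tab. The trailing 1 prints every line; using '$1=$1' as the condition instead would silently drop lines whose first field is empty or 0.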
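A quick check on made-up input, including runs of spaces and a line whose first field is 0 (the case a '$1=$1' condition would drop):

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
# Hypothetical input: double space in line 1, a 0 in column 1 of line 2
printf 'chr1  100 200\n0 5 6\n' > "$tmp/file"

# Rebuild every record with a tab as OFS; the pattern 1 prints each line
awk -v OFS='\t' '{ $1 = $1 } 1' "$tmp/file" > "$tmp/file_tab"
cat "$tmp/file_tab"
```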

Remove first line in file using tail

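If you need to remove the first line of your file, use the tail command:

tail -n+2 RefSeq_genes_order.bed > RefSeq_genes_order_removefirstline.bed

tail -n+2 outputs all lines starting from line 2, i.e. everything except the first line.

Similarly, to remove the first K lines, start the output at line K+1 (e.g. tail -n+4 removes the first 3 lines). With K held in a shell variable:
tail -n+$((K+1)) file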
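A small sketch on a hypothetical five-line file in a temporary directory, with K set to 3:

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
printf '%s\n' 1 2 3 4 5 > "$tmp/lines.txt"   # hypothetical file: lines 1..5

# Skip line 1: output starts at line 2
tail -n +2 "$tmp/lines.txt" > "$tmp/skip1.txt"

# Skip the first K lines: output starts at line K+1
K=3
tail -n +"$((K + 1))" "$tmp/lines.txt" > "$tmp/skipK.txt"

cat "$tmp/skip1.txt" "$tmp/skipK.txt"
```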

Change order of columns using awk in Unix

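If you want to change the order of columns, you cannot do it with the cut command. For example,
cut -f 2,4,5,3,1,6 RefSeq_genes.bed
prints the selected columns in the same order as in the original file, whatever order you list them in.

With awk, instead, you can easily rearrange the columns of your file:

awk 'BEGIN{FS=OFS="\t"} {print $2,$4,$5,$3,$1,$6}' RefSeq_genes.bed > RefSeq_genes_order.bed

Setting FS and OFS to a tab reads and writes tab-separated columns. The commas in print insert OFS between the fields, so no explicit "\t" is needed; with the default OFS (a space), passing "\t" as a separate argument would leave a space on each side of every tab.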
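A minimal sketch comparing the two approaches on a single made-up tab-separated row with six columns:

```shell
#!/bin/sh
set -e
tmp=$(mktemp -d)
# Hypothetical one-row BED-like file with six tab-separated columns
printf 'c1\tc2\tc3\tc4\tc5\tc6\n' > "$tmp/in.bed"

# cut ignores the order of the field list: columns come out as 1..6
cut -f 2,4,5,3,1,6 "$tmp/in.bed" > "$tmp/cut.bed"

# awk prints them in the requested order, tab-separated
awk 'BEGIN { FS = OFS = "\t" } { print $2, $4, $5, $3, $1, $6 }' "$tmp/in.bed" > "$tmp/awk.bed"

cat "$tmp/cut.bed" "$tmp/awk.bed"
```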