Thursday, May 7, 2015

Select columns that are delimited by an irregular number of spaces

To select columns from a file whose fields are separated by an irregular number of consecutive spaces, you might think of using the cut command, but this will not work.

In a file whose fields are separated by runs of 7, 6, 5, 4, etc. space characters, the following command will not select the needed columns, because cut treats every single space as a field separator:

cut -d ' ' -f 2,3,4,5 snp142.txt | head

585     chr1    10019   10020   rs376643643     0       +       A       A       -/A     genomic deletion        unknown 0       0       near-gene-5     exact   1               1       SSMP,   0
585     chr1    10056   10056   rs373328635     0       +       -       -       -/A     genomic insertion       unknown 0       0       near-gene-5     between 1               1       SSMP,   0
585     chr1    10107   10108   rs62651026      0       +       C       C       C/T     genomic single  unknown 0       0       near-gene-5     exact   1               1       BCMHGSC_JDW,    0
585     chr1    10108   10109   rs376007522     0       +       A       A       A/T     genomic single  unknown 0       0       near-gene-5     exact   1               1       BILGI_BIOE,     0
585     chr1    10138   10139   rs368469931     0       +       A       A       A/T     genomic single  unknown 0       0       near-gene-5     exact   1               1       BILGI_BIOE,     0
585     chr1    10144   10145   rs144773400     0       +       A       A       -/A     genomic deletion        unknown 0       0       near-gene-5     exact   1               1       BL,     0
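
A minimal sketch of the underlying behaviour, using a made-up line (not taken from the file) with three spaces and then two spaces between its fields. Each run of spaces produces empty fields, so the real columns end up at positions that depend on how wide each gap is.

```shell
# Made-up sample line: 3 spaces after 585, 2 spaces after chr1
line='585   chr1  10019'

# Field 2 is the empty string between the first and second space
printf '%s\n' "$line" | cut -d ' ' -f 2

# The second real column only shows up as field 4
printf '%s\n' "$line" | cut -d ' ' -f 4
```

Here the first command prints an empty line and the second prints chr1.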



Specifying several spaces as the delimiter is not possible either:

cut -d '     ' -f 2,3,4,5 snp142.txt | head
 

cut: the delimiter must be a single character
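
If you would rather keep cut, one possible workaround (a sketch, not part of the recipe in this post) is to squeeze every run of spaces into a single space with tr -s first. The helper name squeeze_cut is hypothetical, and the sample line is a shortened copy of the first record shown above.

```shell
# Hypothetical helper: collapse runs of spaces, then cut the requested fields
squeeze_cut() {
    tr -s ' ' | cut -d ' ' -f "$1"
}

# Shortened sample record with irregular gaps between fields
printf '585     chr1    10019   10020   rs376643643     0\n' | squeeze_cut 2,3,4,5
```

This prints chr1 10019 10020 rs376643643. Note that a line starting with spaces keeps one leading space after squeezing, so its field 1 becomes empty and the numbering shifts by one.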


Instead, use the awk command, which by default splits each line into fields on runs of whitespace:


awk '{print $2,$3,$4,$5}' snp142.txt | head

chr1 10019 10020 rs376643643
chr1 10056 10056 rs373328635
chr1 10107 10108 rs62651026
chr1 10108 10109 rs376007522
chr1 10138 10139 rs368469931
chr1 10144 10145 rs144773400
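
awk joins the printed fields with its output field separator OFS, which is a single space by default, and that is why the output above is single-spaced. A small sketch, assuming you want tab-separated output for further processing. The helper name pick_cols and the sample line are made up for illustration.

```shell
# Hypothetical helper: print columns 2-5, joined by tabs instead of spaces
pick_cols() {
    awk -v OFS='\t' '{print $2, $3, $4, $5}' "$@"
}

# Made-up sample line with leading spaces and irregular gaps
printf '  585     chr1    10019   10020   rs376643643\n' | pick_cols
```

Leading spaces do not matter here, because awk ignores them when it splits on whitespace. On the real file this would be called as pick_cols snp142.txt | head.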
