Solution to messy gene tables - compare the contents of two files using grep
If you have a gene list and want to compare it with another gene list, you can use awk, as discussed previously: read the first file into a hash table (an awk associative array), then, while reading the second file, check whether a given column of the second file is present in that table.
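A minimal sketch of this awk approach, assuming a hypothetical gene list `genes.txt` (one gene per line) and a tab-separated table `table.tsv` with the gene symbol in column 1. The file names, the column and the contents are made up for illustration:

```shell
#!/bin/sh
# Hypothetical example files (made-up contents):
# genes.txt has one gene per line; table.tsv is tab-separated
# with the gene symbol assumed to be in column 1.
printf 'TP53\nBRCA1\nMYC\n' > genes.txt
printf 'TP53\t7157\tp53\nEGFR\t1956\tERBB1\nMYC\t4609\tc-Myc\n' > table.tsv

# NR==FNR is true only while awk reads the first file: store each
# gene as a key of the array "seen" and skip to the next line.
# For the second file, print the lines whose column 1 is a key of "seen".
awk -F'\t' 'NR==FNR { seen[$1]; next } ($1 in seen)' genes.txt table.tsv
```

With these files, the TP53 and MYC lines of `table.tsv` are printed. Note that `NR==FNR` relies on the first file being non-empty: if `genes.txt` is empty, awk treats every line of the second file as part of the "first" file and prints nothing.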
However, if the second file has a messy structure, you will want to scan each complete line rather than focusing on an individual column. For example, here I have a gene list that I want to compare with a list I created from BioGRID, which contains gene names, alternative gene names and other information. A column-based awk lookup would not work in this case, because a gene from the first list may appear in any column, for example as an alternative name.
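To illustrate the problem, here is a sketch using a made-up, simplified table `messy.tsv`: an ID, an official symbol in column 2 and `|`-separated alternative names in column 3. This is an assumption for illustration and not the actual BioGRID format. A lookup on column 2 finds TP53 but misses HER1, which appears only as an alternative name of EGFR:

```shell
#!/bin/sh
# Hypothetical, simplified files (not the real BioGRID layout):
# genes.txt holds one gene name per line; messy.tsv has an ID,
# an official symbol and a "|"-separated list of alternative names.
printf 'TP53\nHER1\n' > genes.txt
printf '7157\tTP53\tBCC7|LFS1|P53\n1956\tEGFR\tERBB|ERBB1|HER1\n' > messy.tsv

# Column-based lookup: only column 2 is compared with the gene list.
# HER1 occurs only inside column 3, so the EGFR line is missed
# and only the TP53 line is printed.
awk -F'\t' 'NR==FNR { seen[$1]; next } ($2 in seen)' genes.txt messy.tsv
```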
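Instead, an easy solution is to use grep with -Fwf to search the whole second file for lines that contain a gene name from the first file. Here -f reads the patterns (the gene names) from a file, one per line; -F treats each pattern as a fixed string rather than a regular expression; and -w requires each match to be a whole word. As a result, every line of file 2 that contains a gene name from file 1 is printed.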
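A minimal sketch of the grep approach, using hypothetical files `genes.txt` and `messy.tsv` with made-up, simplified contents (not the real BioGRID format). MYC is included in the gene list to show the effect of -w: it does not match inside MYCN or NMYC:

```shell
#!/bin/sh
# Hypothetical example files (made-up contents).
printf 'TP53\nHER1\nMYC\n' > genes.txt
printf '7157\tTP53\tBCC7|LFS1|P53\n1956\tEGFR\tERBB|ERBB1|HER1\n4613\tMYCN\tNMYC|bHLHe37\n' > messy.tsv

# -f genes.txt : read the patterns from genes.txt, one per line
# -F           : treat each pattern as a fixed string, not a regex
# -w           : match whole words only, so MYC does not hit MYCN or NMYC
# Prints the TP53 line and the EGFR line (matched via its alias HER1).
grep -Fwf genes.txt messy.tsv
```

Because grep searches the whole line, a gene name that happens to appear in a descriptive column will also produce a match. Also, -w treats characters such as `-` as word boundaries, so a pattern like HLA would match HLA-A. If the gene list was saved with Windows line endings, each pattern ends with a carriage return and may fail to match; removing it first (for example with `tr -d '\r'`) avoids this.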