
BLAST tabular format with extra columns + Query coverage

케이든 2016. 3. 12. 17:31

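BLAST+ tabular output (-outfmt 6) has 12 columns by default. If you list the fields explicitly, you can append more. Here qlen and slen (marked with * below) are added:

-outfmt "6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen"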


1. qseqid          Query Seq-id

2. sseqid          Subject Seq-id

3. pident           Percentage of identical matches

4. length           Alignment length

5. mismatch      Number of mismatches

6. gapopen       Number of gap openings

7. qstart           Start of alignment in query

8. qend            End of alignment in query

9. sstart           Start of alignment in subject

10. send           End of alignment in subject

11. evalue         Expect value

12. bitscore       Bit score

*13. qlen          Query sequence length

*14. slen          Subject sequence length
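Format 6 output has no header row, so it is easy to lose track of which column is which. One option is to reuse the same field string as a header. The following is a minimal sketch. It assumes bash. The toy record, the output name blast_out.with_header.tsv, and the blastn arguments in the comment (query.fa, mydb) are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Same field list as the -outfmt string above (format 6 + qlen, slen).
fmt="6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore qlen slen"

# The BLAST call would look roughly like this (not run here; file and db names are hypothetical):
#   blastn -query query.fa -db mydb -outfmt "$fmt" -out blast_out.tab

# Hypothetical one-line result so the sketch runs without BLAST installed.
printf 'q1\ts1\t98.50\t200\t3\t0\t1\t200\t51\t250\t1e-90\t350\t210\t400\n' > blast_out.tab

# Drop the leading "6 ", turn the spaces into tabs, and write that as the header row.
header=$(printf '%s\n' "${fmt#6 }" | tr ' ' '\t')
{ printf '%s\n' "$header"; cat blast_out.tab; } > blast_out.with_header.tsv

# Sanity check: the header and the data row should both have 14 fields.
awk -F'\t' '{ print NF }' blast_out.with_header.tsv
```

Because the header is derived from the same string that goes to -outfmt, adding or removing a field keeps the two in sync.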


Query coverage

Filter the BLAST tabular output with awk. Only hits that meet all four conditions are kept:


awk -F'\t' '$4/$13 > 0.75 && $4/$14 > 0.75 && $3 > 55 && $11 < 1e-15' blast_out.tab


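$4/$13 > 0.75: alignment length must be greater than 75% of the query length (qlen);

$4/$14 > 0.75: alignment length must be greater than 75% of the subject length (slen);

$3 > 55: percent identity must be greater than 55%;

$11 < 1e-15: e-value must be less than 1e-15.

Alignment length ($4) includes gap columns, so these ratios approximate coverage rather than measure it exactly.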
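The thresholds above are hard-coded in the one-liner. The sketch below passes them in with awk's -v option instead, so they can be changed without editing the program. It assumes the 14-column layout above. The variable names min_cov, min_pid and max_evalue and the four toy records are hypothetical.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical hits in the 14-column layout (qlen = $13, slen = $14).
{
  printf 'q1\ts1\t98.50\t200\t3\t0\t1\t200\t51\t250\t1e-90\t350\t210\t250\n'  # passes all four tests
  printf 'q2\ts2\t90.00\t100\t10\t0\t1\t100\t1\t100\t1e-40\t170\t300\t110\n' # query coverage too low
  printf 'q3\ts3\t50.00\t100\t50\t0\t1\t100\t1\t100\t1e-30\t80\t110\t110\n'  # identity too low
  printf 'q4\ts4\t90.00\t100\t10\t0\t1\t100\t1\t100\t1e-10\t60\t110\t110\n'  # e-value too high
} > blast_out.tab

awk -F'\t' -v min_cov=0.75 -v min_pid=55 -v max_evalue=1e-15 '
  # force the thresholds to numbers so every comparison is numeric
  BEGIN { min_cov += 0; min_pid += 0; max_evalue += 0 }
  $4/$13 > min_cov && $4/$14 > min_cov && $3 > min_pid && $11 < max_evalue
' blast_out.tab > blast_filtered.tab

cat blast_filtered.tab   # only the q1 line is expected
```

The pattern has no action, so awk's default action applies and each matching line is printed unchanged.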


https://www.biostars.org/p/57602/



Query coverage of the Subject (fraction of the subject covered by the alignment)


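The command below keeps hits whose aligned subject span, (send - sstart + 1)/slen, is more than 70% of the subject length, and it appends that fraction as an extra column. On the minus strand of a nucleotide subject, sstart is larger than send, so the span is taken in whichever direction gives a positive length:

awk -F'\t' -v OFS='\t' '{ span = ($10 >= $9) ? $10 - $9 + 1 : $9 - $10 + 1; if (span / $14 > 0.7) print $0, span / $14 }' blast_out.tab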
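The same span calculation works on the query side with qstart/qend ($7/$8) and qlen ($13). The following is a minimal sketch. It appends both values as two extra columns: query coverage, then subject coverage. The span is taken as an absolute length, so reversed coordinates still give a positive value. The two toy records are hypothetical, and the second one is a minus-strand hit. Each line is one HSP, so the sketch does not merge several HSPs between the same query and subject.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical hits: q1 is on the plus strand, q2 hits the subject's minus strand (sstart > send).
printf 'q1\ts1\t98.50\t200\t3\t0\t1\t200\t51\t250\t1e-90\t350\t200\t400\n' > blast_out.tab
printf 'q2\ts2\t95.00\t100\t5\t0\t11\t110\t300\t201\t1e-40\t170\t200\t400\n' >> blast_out.tab

awk -F'\t' -v OFS='\t' '
  # inclusive length of an alignment span, whatever the coordinate order
  function span(a, b) { return (a <= b) ? b - a + 1 : a - b + 1 }
  {
    qcov = span($7, $8) / $13    # fraction of the query covered by this HSP
    scov = span($9, $10) / $14   # fraction of the subject covered by this HSP
    print $0, qcov, scov
  }' blast_out.tab > blast_out.cov.tab

cat blast_out.cov.tab
```

For these records, the appended columns are 1 and 0.5 for q1, and 0.5 and 0.25 for q2.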