
CViT: Chromosome Visualization Tool

케이든 2016. 9. 19. 14:53

CViT, a chromosome visualization tool

CViT is a Perl-based tool that renders an entire genome in a single image. It builds the chromosomes from GFF3-formatted data and draws features on top of them.
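Since every input to CViT is GFF3, it helps to see what one record contains. The following is a minimal Python sketch, not part of CViT itself, that splits a record into the nine tab-separated GFF3 columns and parses the attribute column. The names `GffFeature` and `parse_gff3_line` are hypothetical helpers introduced for illustration; the sample record is taken from the gene file shown later in this post.

```python
from typing import NamedTuple


class GffFeature(NamedTuple):
    """The nine GFF3 columns, with column 9 parsed into a dict."""
    seqid: str
    source: str
    type: str
    start: int
    end: int
    score: str
    strand: str
    phase: str
    attributes: dict


def parse_gff3_line(line: str) -> GffFeature:
    """Split one tab-delimited GFF3 record into its nine columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    # Column 9 holds semicolon-separated key=value pairs.
    attrs = {}
    for pair in cols[8].split(";"):
        if pair:
            key, _, value = pair.partition("=")
            attrs[key] = value
    return GffFeature(cols[0], cols[1], cols[2], int(cols[3]), int(cols[4]),
                      cols[5], cols[6], cols[7], attrs)


record = parse_gff3_line(
    "1\tensembl\tgene\t4854\t9652\t.\t-\t.\t"
    "ID=GRMZM2G059865;Name=GRMZM2G059865;biotype=protein_coding"
)
print(record.seqid, record.type, record.start, record.end,
      record.attributes["biotype"])
```

A quick check like this is useful before running CViT, because files whose columns are separated by spaces rather than tabs will not split into nine columns.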


Figure (373875.fig.001): duplicated regions on the soybean chromosomes.


Figure (373875.fig.002): centromeres and gene density can also be displayed.


Paper: https://www.hindawi.com/journals/ijpg/2011/373875/

Download: https://sourceforge.net/projects/cvit/


Example code execution:


Plot all genes:

> perl cvit.pl -c cvit_genes.ini -o gene_plot chrs.gff ZmB73_5a.59_genes.gff ZmCentromeresV2.gff


Plot as counts/histograms:

> sort -k 1,1 -k 4,4n ZmB73_5a.59_genes.gff > ZmB73_5a.59_genes.sorted.gff


> perl binCounter.pl 1000000 ZmB73_5a.59_genes.sorted.gff | awk 'BEGIN{OFS="\t"}{print $1,$2,"WGS",$4,$5,$6,$7,$8,$9}' > ZmB73_5a.59_genes.counts.gff


> perl cvit.pl -c cvit_genes.ini -o gene_density chrs.gff ZmCentromeresV2.gff ZmB73_5a.59_genes.counts.gff
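To make the histogram step concrete, here is a rough Python sketch of what counting genes in 1,000,000 bp windows amounts to. It is not the code of `binCounter.pl`, whose internals are not shown in this post. It assumes that a feature is counted in the window containing its start coordinate, and it writes only a `value=N` attribute, without the `ID` attribute that appears in the real count file. The function names `bin_counts` and `counts_to_gff` are hypothetical.

```python
from collections import Counter


def bin_counts(gff_lines, bin_size):
    """Count features per fixed-width window, keyed by (seqid, bin_start)."""
    counts = Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comments
        cols = line.rstrip("\n").split("\t")
        seqid, start = cols[0], int(cols[3])
        # Assumption: a feature belongs to the window containing its start.
        bin_start = (start // bin_size) * bin_size
        counts[(seqid, bin_start)] += 1
    return counts


def counts_to_gff(counts, bin_size, source="ensembl", feature_type="WGS"):
    """Format the counts as GFF lines carrying a value=N attribute."""
    lines = []
    for (seqid, bin_start), n in sorted(counts.items()):
        cols = [seqid, source, feature_type, str(bin_start),
                str(bin_start + bin_size), ".", ".", ".", f"value={n}"]
        lines.append("\t".join(cols))
    return lines


# Five chromosome 1 genes from the gene file, binned at 10 kb for brevity.
genes = [
    "1 ensembl gene 3 3807 . + . ID=GRMZM2G060082".replace(" ", "\t"),
    "1 ensembl gene 4854 9652 . - . ID=GRMZM2G059865".replace(" ", "\t"),
    "1 ensembl gene 9856 10388 . + . ID=GRMZM2G059856".replace(" ", "\t"),
    "1 ensembl gene 9882 10387 . - . ID=GRMZM5G888250".replace(" ", "\t"),
    "1 ensembl gene 11455 14988 . - . ID=GRMZM2G059843".replace(" ", "\t"),
]
for out in counts_to_gff(bin_counts(genes, 10_000), 10_000):
    print(out)
```

The sketch sets the third column to `WGS` directly, which is the job the `awk` step does in the pipeline above. Because it accumulates counts in a dictionary, it does not need sorted input; the real script may well depend on the preceding `sort` step.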


File information
Chromosome file: chrs.gff
1	ensembl	chromosome	0	301354135	.	.	.	Name=Chr1
2	ensembl	chromosome	0	237068873	.	.	.	Name=Chr2
3	ensembl	chromosome	0	232140174	.	.	.	Name=Chr3
4	ensembl	chromosome	0	241473504	.	.	.	Name=Chr4
5	ensembl	chromosome	0	217872852	.	.	.	Name=Chr5
6	ensembl	chromosome	0	169174353	.	.	.	Name=Chr6
7	ensembl	chromosome	0	176764762	.	.	.	Name=Chr7
8	ensembl	chromosome	0	175793759	.	.	.	Name=Chr8
9	ensembl	chromosome	0	156750706	.	.	.	Name=Chr9
10	ensembl	chromosome	0	150189435	.	.	.	Name=Chr10
UNKNOWN	ensembl	chromosome	0	7140151	.	.	.	Name=UNKNOWN
Pt	ensembl	chromosome	0	140384	.	.	.	Name=Cp
Mt	ensembl	chromosome	0	569630	.	.	.	Name=Mt
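A chromosome file like this one can be written by hand, but for a genome with many sequences it may be easier to generate. The sketch below is one possible way to do that in Python, assuming you already have each sequence's length and display name in a dictionary. The function name `chromosome_gff_lines` is hypothetical, and the start coordinate of 0 simply mirrors the file above.

```python
def chromosome_gff_lines(chromosomes, source="ensembl"):
    """Build tab-delimited 'chromosome' records from {seqid: (length, name)}."""
    lines = []
    for seqid, (length, name) in chromosomes.items():
        # Start is 0 to match the layout of the chrs.gff shown above.
        cols = [seqid, source, "chromosome", "0", str(length),
                ".", ".", ".", f"Name={name}"]
        lines.append("\t".join(cols))
    return lines


# Three of the maize sequences listed above.
maize = {
    "1": (301354135, "Chr1"),
    "10": (150189435, "Chr10"),
    "Pt": (140384, "Cp"),
}
for line in chromosome_gff_lines(maize):
    print(line)
```

Joining the columns with `"\t"` guarantees real tab separators, which is easy to get wrong when the file is edited in a text editor.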


Gene file: ZmB73_5a.59_genes.gff

9       ensembl gene    156711985       156712748       .       -       .       ID=GRMZM5G877733;Name=GRMZM5G877733;biotype=protein_coding

9       ensembl gene    156716639       156716899       .       -       .       ID=GRMZM2G376150;Name=GRMZM2G376150;biotype=transposable_element

9       ensembl gene    156729561       156729662       .       -       .       ID=GRMZM2G545152;Name=GRMZM2G545152;biotype=pseudogene

9       ensembl gene    156729861       156730034       .       +       .       ID=GRMZM2G404827;Name=GRMZM2G404827;biotype=pseudogene

9       ensembl gene    156743193       156743307       .       -       .       ID=GRMZM2G545162;Name=GRMZM2G545162;biotype=pseudogene

1       ensembl gene    3       3807    .       +       .       ID=GRMZM2G060082;Name=GRMZM2G060082;biotype=transposable_element

1       ensembl gene    4854    9652    .       -       .       ID=GRMZM2G059865;Name=GRMZM2G059865;biotype=protein_coding

1       ensembl gene    9856    10388   .       +       .       ID=GRMZM2G059856;Name=GRMZM2G059856;biotype=protein_coding

1       ensembl gene    9882    10387   .       -       .       ID=GRMZM5G888250;Name=GRMZM5G888250;biotype=protein_coding

1       ensembl gene    11455   14988   .       -       .       ID=GRMZM2G059843;Name=GRMZM2G059843;biotype=transposable_element


Count file: ZmB73_5a.59_genes.counts.gff

1       ensembl WGS     296000000       297000000       .       .       .       value=67;ID=16046

1       ensembl WGS     297000000       298000000       .       .       .       value=82;ID=16128

1       ensembl WGS     298000000       299000000       .       .       .       value=57;ID=16185

1       ensembl WGS     299000000       300000000       .       .       .       value=61;ID=16246

1       ensembl WGS     300000000       301000000       .       .       .       value=78;ID=16324

1       ensembl WGS     301000000       302000000       .       .       .       value=20;ID=16344

10      ensembl WGS     0       1000000 .       .       .       value=25;ID=16370

10      ensembl WGS     1000000 2000000 .       .       .       value=85;ID=16455

10      ensembl WGS     2000000 3000000 .       .       .       value=75;ID=16530


Centromere file: ZmCentromeresV2.gff

#bp -- from Gernot Presting

1 Pchr centromere 132900000 134300000 . . . Name=

2 Pchr centromere 89500000 90900000 . . . Name=

3 Pchr centromere 94300000 95300000 . . . Name=

4 Pchr centromere 104200000 105000000 . . . Name=

5 Pchr centromere 101300000 108400000 . . . Name=

6 Pchr centromere 49800000 50400000 . . . Name=

7 Pchr centromere 55100000 55500000 . . . Name=

8 Pchr centromere 45900000 47100000 . . . Name=

9 Pchr centromere 68300000 69200000 . . . Name=

10 Pchr centromere 59300000 60500000 . . . Name=
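Because the centromere coordinates come from a different source than the chromosome lengths, a quick consistency check before plotting can be worthwhile. The Python sketch below flags any centromere interval that is reversed, empty, or extends past the end of its chromosome. The function name `centromeres_out_of_bounds` is hypothetical; the numbers are copied from the chromosome and centromere files above.

```python
def centromeres_out_of_bounds(chrom_lengths, centromeres):
    """Return seqids whose centromere is missing a chromosome or out of range."""
    bad = []
    for seqid, (start, end) in centromeres.items():
        length = chrom_lengths.get(seqid)
        if length is None or not (0 <= start < end <= length):
            bad.append(seqid)
    return bad


chrom_lengths = {
    "1": 301354135, "2": 237068873, "3": 232140174, "4": 241473504,
    "5": 217872852, "6": 169174353, "7": 176764762, "8": 175793759,
    "9": 156750706, "10": 150189435,
}
centromeres = {
    "1": (132900000, 134300000), "2": (89500000, 90900000),
    "3": (94300000, 95300000), "4": (104200000, 105000000),
    "5": (101300000, 108400000), "6": (49800000, 50400000),
    "7": (55100000, 55500000), "8": (45900000, 47100000),
    "9": (68300000, 69200000), "10": (59300000, 60500000),
}
print(centromeres_out_of_bounds(chrom_lengths, centromeres))  # prints []
```

An empty list means every centromere falls inside its chromosome, as is the case for the data in this post.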


