Documentation

Citation

For all resources provided on the Intensification website, including the motifs, amino acid frequencies and supplementary materials, please cite:

Chen J, Wang B, Regan L, Gerstein M. Intensification: A resource for amplifying population-genetic signals with protein repeats (in submission)

Querying the Database

Please follow the instructions on the 'Query' page to retrieve SNV information for motifs of the repeat protein domains (RPDs) in our database. Users can now find SNVs within an RPD-containing protein by (1) entering a genomic region or an SNV position (1-based), (2) choosing one of our 12 SMART database RPDs, or (3) entering a PDB ID. For more information on the SMART motifs, please visit the 'Download' page.
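As a rough illustration of the 1-based convention mentioned above, the sketch below converts a 1-based SNV position into the 0-based, half-open interval used by BED files, and checks whether a position falls inside a genomic region. The function names and example coordinates are hypothetical and are not part of the Intensification query interface. The sketch also assumes that regions are given as 1-based, inclusive coordinates, which the 'Query' page may or may not require.

```python
def snv_to_bed_interval(pos_1based):
    """Convert a 1-based SNV position to a 0-based, half-open BED interval."""
    if pos_1based < 1:
        raise ValueError("1-based positions start at 1")
    # BED start is 0-based and inclusive; BED end is exclusive.
    return pos_1based - 1, pos_1based


def snv_in_region(pos_1based, region_start, region_end):
    """Return True if a 1-based position lies in a 1-based, inclusive region."""
    return region_start <= pos_1based <= region_end


# Hypothetical coordinates, for illustration only.
start, end = snv_to_bed_interval(1000)
print(start, end)  # 999 1000
print(snv_in_region(1000, 900, 1000))
```

Keeping the conversion in one small function makes it harder to mix the two coordinate systems when comparing query positions against BED-formatted files.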

Resource

For each repeat protein motif, the following are available in the zip file under the 'Download' section:

Amino acid frequencies

(1) .aamat (2) .rfreq (3) .global (4) .re (5) .pdf

SNV profiles, with ExAC SNV, SMART domain and VEP annotation

(6) .sift.bed (7) .sift.enrich (8) .ddaf.txt
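Once the zip file has been downloaded, its contents can be sorted by the file extensions listed above. The following is a minimal sketch using only the Python standard library. The member names in the example (such as 'motifX.aamat') are hypothetical placeholders, and the internal layout of the actual archive is an assumption.

```python
import io
import zipfile
from collections import defaultdict

# File extensions taken from the list above.
KNOWN_SUFFIXES = [
    ".aamat", ".rfreq", ".global", ".re", ".pdf",
    ".sift.bed", ".sift.enrich", ".ddaf.txt",
]


def list_zip_members(zip_file):
    """Return the member names of a zip archive (path or file-like object)."""
    with zipfile.ZipFile(zip_file) as zf:
        return zf.namelist()


def group_by_suffix(names):
    """Group file names by the first known suffix they end with."""
    groups = defaultdict(list)
    for name in names:
        for suffix in KNOWN_SUFFIXES:
            if name.endswith(suffix):
                groups[suffix].append(name)
                break  # names with no known suffix are simply skipped
    return dict(groups)


# Build a small in-memory archive with hypothetical member names.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    for member in ["motifX.aamat", "motifX.sift.bed", "motifX.ddaf.txt"]:
        zf.writestr(member, "")
buffer.seek(0)

for suffix, members in group_by_suffix(list_zip_members(buffer)).items():
    print(suffix, members)
```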

Data sources

Genomic Variation:

1000 Genomes Project (Abecasis G. et al., Nature 2012); PMID: 23128226

ESP6500, Exome Sequencing Project (Tennessen et al., Science 2012); PMID: 22604720

ExAC (Exome Aggregation Consortium et al., bioRxiv, 2016); DOI: http://dx.doi.org/10.1101/030338


Protein Motifs:

SMART database (Letunic I. et al., Nucleic Acids Res, 2014); PMID: 25300481

Ensembl (Yates A. et al., Nucleic Acids Res, 2016); PMID: 26687719

Scripts

The Intensification pipeline and resource use the following scripts, which we call the 'MotifVar' suite (short for 'Motif Variation'), to integrate heterogeneous datasets and to generate motif-MSAs and SNV profiles from various public data sources, including the SMART domain database, ExAC, VEP and Ensembl:

Questions/Comments

Please contact J. Chen (jieming dot chen at yale dot edu) or B. Wang (wang dot bo at yale dot edu) for questions, comments or feedback on the Intensification resource and website.