A Harvard Medical School Affiliate
Welcome to the Joslin Research Website
Joslin Investigator:
Igor Leykin, MD, PhD
Member of Section:
Vascular Cell Biology
Igor Leykin, MD, PhD
Investigator
Joslin Diabetes Center
Instructor in Medicine
Harvard Medical School
Director: Bioinformatics
Joslin Diabetes Center
10/1/2004 - 10/1/2009
Please visit the Bioinformatics Web Resources site at Joslin Diabetes Center.
I. Human and Mouse Upstream Sequence Databases. Regulatory Sequence Analysis Tool
Gene promoters are essential regulatory structures that control the initiation and level of transcription of a gene. A promoter is an integral part of its gene and often makes sense only in the context of that gene, especially if important parts of the regulation are determined outside of the promoter. Computational prediction of eukaryotic promoters solely from the nucleotide sequence is an attractive but difficult aspect of sequence analysis. Polymerase II promoters usually consist of multiple binding sites for transcription factors that must occur in a specific context, apparently shared only by a small group of promoters (Werner T, 1999).
The majority of the known transcription factors recognize short DNA sequences of 5-15 bp in length with different degrees of internal variation. The individual binding of a transcription factor to a regulatory element is rarely sufficient to confer context-specific expression. Thus, the combination and orientation of the transcription factors are the crucial information, rather than the mere occurrence of several binding sites (Werner T, 2000). Cooperation between multiple factors interacting at multiple sites appears to be essential for gene regulation, but the biochemical rules governing these interactions remain largely unknown (Fickett JW, 1998).
We have constructed databases of upstream sequences based on information from the genomes of the respective species and the NCBI database of expressed sequence tags (dbEST). Our starting point was the Ensembl ESTGenes, which are assemblies of ESTs and full or partial mRNAs. These virtual transcripts and one kilobase of their upstream regions were obtained from Ensembl MartView. Each ESTGene sequence was then aligned against all known 5' sequences in dbEST to decide whether it contains the 5' end of the transcript. In this way we obtained databases of reliable promoters, called HumanUpstream and MouseUpstream.
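The upstream regions themselves were obtained from Ensembl MartView, so no extraction code is described here. As a rough illustration of what "one kilobase upstream" means on each strand, the sketch below slices a chromosome sequence held in a Python string. The function name, the 0-based coordinate convention and the toy sequence are assumptions made for illustration and are not part of the original pipeline.

```python
# Hypothetical helper; NOT the code used to build HumanUpstream/MouseUpstream.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def upstream_region(chrom_seq, tss, strand, length=1000):
    """Return up to `length` bases immediately upstream of a transcript start.

    chrom_seq : forward-strand chromosome sequence as a string
    tss       : 0-based index of the first transcribed base (assumed convention)
    strand    : "+" or "-"
    The result is always written 5'->3' on the strand of the transcript.
    """
    if strand == "+":
        # Upstream lies at lower coordinates; clip at the chromosome start.
        start = max(0, tss - length)
        return chrom_seq[start:tss]
    if strand == "-":
        # Upstream lies at higher coordinates on the forward strand,
        # so take that slice and reverse-complement it.
        region = chrom_seq[tss + 1 : tss + 1 + length]
        return region.translate(_COMPLEMENT)[::-1]
    raise ValueError("strand must be '+' or '-'")


toy = "ACGTTGCAAG"  # invented 10-base sequence
plus_upstream = upstream_region(toy, 4, "+", length=3)
minus_upstream = upstream_region(toy, 4, "-", length=3)
```

With the default `length=1000` the same call would return the one-kilobase region described above, shortened only when the transcript start lies closer than 1 kb to the end of the sequence.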
Co-regulated genes may share common regulatory elements.
Promoters of such genes can be analyzed for a common organizational framework of transcription factor (TF) binding sites using the Morpheus Regulatory Sequence Analysis Tool:
(1) For each promoter sequence, a list of TF binding sites and their positions is created (using TRANSFAC weight matrices).
(2) Each list is computationally analyzed to find all combinations of two or three TF binding sites in which the distance between two neighboring factors is less than 100 bp.
(3) The program returns the individual elements and the combinations that are common to all, or to a user-defined percentage, of the analyzed promoters.
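The steps above can be sketched in a few lines of Python. This is a minimal illustration and not the Morpheus implementation: it assumes the binding sites have already been predicted and are supplied as (factor name, position) pairs, it ignores site orientation, and it interprets "neighboring" as consecutive members of a combination. The function names and the example promoters are invented.

```python
from collections import Counter


def site_combinations(sites, max_gap=100):
    """Single factors plus ordered 2- and 3-site combinations in one promoter.

    sites: list of (factor_name, position) tuples (assumed input format).
    Consecutive members of a combination must be less than max_gap bp apart.
    """
    ordered = sorted(sites, key=lambda s: s[1])
    found = {(name,) for name, _ in ordered}
    n = len(ordered)
    for i in range(n):
        for j in range(i + 1, n):
            if ordered[j][1] - ordered[i][1] >= max_gap:
                break  # sites are sorted, so later ones are farther still
            found.add((ordered[i][0], ordered[j][0]))
            for k in range(j + 1, n):
                if ordered[k][1] - ordered[j][1] >= max_gap:
                    break
                found.add((ordered[i][0], ordered[j][0], ordered[k][0]))
    return found


def common_elements(promoters, min_fraction=1.0, max_gap=100):
    """Elements and combinations present in at least min_fraction of promoters."""
    counts = Counter()
    for sites in promoters.values():
        # A set is used per promoter, so each promoter counts at most once.
        counts.update(site_combinations(sites, max_gap))
    threshold = min_fraction * len(promoters)
    return {combo: n for combo, n in counts.items() if n >= threshold}


# Invented example data: three promoters with predicted sites.
promoters = {
    "geneA": [("SP1", 10), ("NFKB", 60), ("AP1", 140)],
    "geneB": [("SP1", 200), ("NFKB", 250), ("AP1", 500)],
    "geneC": [("NFKB", 5), ("SP1", 30)],
}
in_all = common_elements(promoters)
in_most = common_elements(promoters, min_fraction=0.6)
```

Here `in_all` holds only the elements found in every promoter, while lowering `min_fraction` plays the role of the user-defined percentage and also admits combinations shared by a subset of promoters.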
II. Comparative Linkage Analysis and Visualization of High-Density Oligonucleotide SNP Array Data
The identification of disease-associated genes using single nucleotide polymorphisms (SNPs) has been reported. In particular, the Affymetrix Mapping 10K microarray platform uses one PCR primer to determine the genotype of more than 11,000 SNPs in the human genome. However, the analysis of such datasets is nontrivial because of the number of markers and the potential size of the pedigrees, and the integration of linkage scores with genome maps and the visualization of the results remain poorly automated.
To analyze large pedigrees rapidly and to compare the linkage analysis results of different software packages, we developed a software tool called CompareLinkage to automate the following processes: (1) Convert Affymetrix Mapping 10K genotype data, pedigree files and marker information into Linkage format, and detect and fix incompatibilities in pedigree genotypes. The input genotype text file for CompareLinkage can be a single text file containing genotypes for each sample or a combined text file as exported by Affymetrix GDAS 3.0 software. (2) Automatically call the software packages Merlin and Allegro for linkage analysis and convert the analysis results (LOD or non-parametric linkage (NPL) scores) into input files for dChip, which visualizes the results in the context of genes and cytobands. (3) Convert genotype data in Linkage format into dChip input files (genotype, pedigree and marker information files) to perform parametric linkage analysis with dChipLinkage.
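As an illustration of step (1), the sketch below turns SNP array calls into one line of a Linkage-style pedigree file and checks a parent-child trio for Mendelian incompatibility. It is not CompareLinkage code. It assumes the calls arrive as the strings "AA", "AB", "BB" (anything else is treated as a missing call), and it uses the common pre-makeped column order: pedigree, individual, father, mother, sex, affection status, then two alleles per marker with 0 for missing. All function names and sample values are hypothetical.

```python
# Assumed call encoding; alleles A and B are recoded as 1 and 2.
CALL_TO_ALLELES = {"AA": ("1", "1"), "AB": ("1", "2"), "BB": ("2", "2")}
MISSING = ("0", "0")


def linkage_line(family, person, father, mother, sex, status, calls):
    """Build one pedigree-file line from a list of genotype calls."""
    fields = [family, person, father, mother, str(sex), str(status)]
    for call in calls:
        # Unrecognized calls (e.g. a no-call) become the missing genotype 0 0.
        fields.extend(CALL_TO_ALLELES.get(call, MISSING))
    return " ".join(fields)


def mendelian_consistent(child, father, mother):
    """True if the child's alleles can be inherited one from each parent.

    Each argument is an allele pair such as ("1", "2"); ("0", "0") is missing.
    A missing parent is treated as able to transmit either allele.
    """
    if "0" in child:
        return True  # nothing to check
    from_father = {"1", "2"} if "0" in father else set(father)
    from_mother = {"1", "2"} if "0" in mother else set(mother)
    a, b = child
    return (a in from_father and b in from_mother) or (
        b in from_father and a in from_mother
    )


line = linkage_line("F1", "3", "1", "2", 1, 2, ["AA", "AB", "NoCall"])
ok = mendelian_consistent(("1", "2"), ("1", "1"), ("2", "2"))
bad = mendelian_consistent(("2", "2"), ("1", "1"), ("1", "2"))
```

One simple way to "fix" a marker flagged by such a check is to set the genotypes of the affected family members to missing for that marker; whether CompareLinkage does exactly this is not stated in the text.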
The new software can visualize the results of all these programs in dChip in the context of genome annotations and cytoband information.
In addition, we implemented a variant of the Lander-Green algorithm in the dChipLinkage module of the dChip software (V1.3) to perform parametric linkage analysis and haplotyping of SNP array data. These functions are integrated with the existing modules of dChip to visualize SNP genotype data together with LOD score curves. We have analyzed three families with recessive and dominant diseases using the new software, and the results of this analysis were recently submitted for publication.
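As background for the LOD scores mentioned above, the sketch below computes a two-point LOD score in the simplest textbook setting: fully informative, phase-known meioses, where the likelihood at recombination fraction theta is theta^k (1 - theta)^(n - k) for k recombinants among n meioses. This is far simpler than the multipoint Lander-Green computation in dChipLinkage and is shown only to make the quantity concrete; the function name and the counts are invented.

```python
import math


def two_point_lod(recombinants, meioses, theta):
    """log10 of L(theta) / L(0.5) for phase-known, fully informative meioses."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    k, n = recombinants, meioses
    log_l_theta = k * math.log10(theta) + (n - k) * math.log10(1 - theta)
    log_l_null = n * math.log10(0.5)  # no linkage: theta = 0.5
    return log_l_theta - log_l_null


# Invented counts: 1 recombinant observed in 10 meioses.
lod_curve = {t: two_point_lod(1, 10, t) for t in (0.05, 0.1, 0.2, 0.3, 0.5)}
best_theta = max(lod_curve, key=lod_curve.get)
```

Evaluating the function over a grid of theta values, as in `lod_curve`, gives a small analogue of the LOD score curves that dChip displays along the genome.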