
Category: Microbial Genetics and Molecular Biology
Transcription Factor Binding Site Mapping Using ChIP-Seq
Abstract:
This chapter describes a chromatin immunoprecipitation followed by sequencing (ChIP-Seq) method tailored for the study of Mycobacterium tuberculosis transcription factors (TFs) but amenable to the study of other prokaryotes. Noteworthy features of this method include the following: (i) it is conducted using a standard and readily reproducible growth condition that is the same for each TF studied; (ii) the binding behavior of each TF is studied using a tagged variant of the TF whose production is driven by an inducible promoter; (iii) each TF is studied using the same concentration of an exogenously added chemical inducer where the strength of TF gene induction is varied to systematically measure the effect of TF concentration on binding site strength and location; (iv) the resulting binding site data are spatially well resolved and highly reproducible, and binding strength is correlated with the degree of binding site motif conservation; and (v) because this method requires neither knowledge of the physiological conditions that normally cause TF gene expression nor antibodies specific to each TF, it is applicable to the high-throughput study of multiple TFs; thus far it has generated binding site data for over 119 annotated M. tuberculosis TFs. Despite the use of a standard growth condition for all TFs and an inducible promoter system, the binding site data were found to agree well with data from a subset of TFs expressed from their own promoters under physiologically relevant conditions and immunoprecipitated using antibodies specific to the native TF. The development and use of this method has been combined with the development of a data analysis pipeline; features of this pipeline are described below.
Taken together, these binding data provide additional evidence that the long-accepted spatial relationship between TF binding site, promoter motif, and the corresponding regulated gene may be too simple a paradigm, failing to adequately capture the variety of TF binding sites found in prokaryotes.
Diagram of the ChIP-Seq method. ChIP-Seq is performed on log-phase M. tuberculosis cells. The cross-linked cells are first lysed using a BeadBeater. The cells are further lysed, and DNA is sheared, using a Covaris instrument. Anti-FLAG antibody is used to immunoprecipitate the protein of interest, and the protein-DNA complexes are further captured using protein-G agarose beads. The cross-links are reversed using proteinase K, and the DNA is purified using a PCR purification kit. The standard Illumina protocol is used to prepare the library, which is then sent for next-generation sequencing.
Overview of analysis pipeline. First, sequencing reads are aligned to the genome of the organism, and a profile of coverage is calculated by counting the number of reads that overlap a given position along the chromosome. This profile is used to fit a log normal background model of coverage. This model is used to identify regions of the genome that are statistically enriched relative to background. A cross-correlation filter is applied to identify regions consistent with the bimodal profile associated with ChIP-Seq binding, and comparison to control experiments is used to identify regions specific to the transcription factor of interest. Finally, a blind deconvolution approach combined with motif identification is used to identify binding sites at high resolution.
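The first three pipeline steps (coverage profile, log-normal background model, enrichment call) can be illustrated with a minimal sketch. It is not the chapter's implementation: the function names, the method-of-moments fit on log coverage, and the `p_value` cutoff are all assumptions made for illustration, and reads are assumed to be given as 0-based, half-open `(start, end)` intervals.

```python
import math
from statistics import NormalDist, mean, stdev


def coverage_profile(reads, genome_length):
    """Count how many reads overlap each position.

    reads: iterable of (start, end) intervals, 0-based and half-open (assumed).
    """
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        diff[max(start, 0)] += 1
        diff[min(end, genome_length)] -= 1
    profile, running = [], 0
    for delta in diff[:genome_length]:
        running += delta
        profile.append(running)
    return profile


def fit_lognormal(profile):
    """Estimate (mu, sigma) of log coverage; positions with zero coverage are skipped."""
    logs = [math.log(c) for c in profile if c > 0]
    return mean(logs), stdev(logs)


def enriched_positions(profile, mu, sigma, p_value=0.001):
    """Return positions whose coverage exceeds the upper-tail background quantile.

    p_value is a hypothetical cutoff, not a value taken from the chapter.
    """
    threshold = math.exp(NormalDist(mu, sigma).inv_cdf(1 - p_value))
    return [i for i, c in enumerate(profile) if c > threshold], threshold
```

In this sketch the background is fitted on the whole profile, peaks included, which inflates sigma slightly; a more careful fit would exclude or down-weight the enriched tail.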
Blind deconvolution analysis of ChIP-Seq data. This schematic representation illustrates the steps of BRACIL (12) in the analysis of ChIP-Seq data. BRACIL uses a blind deconvolution algorithm that takes advantage of ChIP-Seq coverage and genome sequence to refine ChIP-Seq binding regions to single-nucleotide resolution. The top panel illustrates the steps of the blind deconvolution algorithm. The algorithm starts with a guess about the shape of the impulse response as well as the location of binding sites. Both the shape of the impulse response and the binding site locations are updated iteratively until convergence. The predicted binding site locations are used in motif discovery to predict a binding motif. This motif is used to constrain the search space for deconvolution and refine prediction of binding site locations. A set of high-resolution binding site locations is obtained as a final prediction.
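The alternating update described in this caption can be sketched as a toy example. This is not BRACIL's code: the greedy site search, the kernel re-estimation from the strongest site, the fixed number of sites, and every function name are assumptions chosen to keep the example short, and the motif-constrained refinement step is omitted entirely.

```python
def find_sites(coverage, kernel, n_sites):
    """Greedily place n_sites copies of the kernel (impulse response) on the coverage."""
    r = len(kernel) // 2
    energy = sum(h * h for h in kernel)
    residual = list(coverage)

    def score(p):
        # Least-squares amplitude of a kernel copy centred at position p.
        return sum(residual[p - r + j] * h for j, h in enumerate(kernel)) / energy

    sites = []
    for _ in range(n_sites):
        best = max(range(r, len(residual) - r), key=score)
        amplitude = score(best)
        for j, h in enumerate(kernel):
            residual[best - r + j] -= amplitude * h
        sites.append((best, amplitude))
    return sorted(sites)


def update_kernel(coverage, kernel, sites):
    """Re-estimate the kernel shape from the window around the strongest site."""
    r = len(kernel) // 2
    anchor, _ = max(sites, key=lambda s: s[1])
    window = []
    for j in range(len(kernel)):
        x = anchor - r + j
        # Remove the modelled contribution of the other sites at position x.
        others = sum(a * kernel[x - p + r]
                     for p, a in sites if p != anchor and abs(x - p) <= r)
        window.append(max(coverage[x] - others, 0.0))
    total = sum(window)
    return [w / total for w in window] if total > 0 else list(kernel)


def blind_deconvolve(coverage, initial_kernel, n_sites, max_iter=20):
    """Alternate site and kernel updates until the site locations stop changing."""
    kernel, previous = list(initial_kernel), None
    sites = []
    for _ in range(max_iter):
        sites = find_sites(coverage, kernel, n_sites)
        locations = [p for p, _ in sites]
        if locations == previous:
            break
        kernel = update_kernel(coverage, kernel, sites)
        previous = locations
    return sites, kernel
```

Starting from a flat initial kernel of the right width, this toy version recovers both the site positions and a peaked kernel shape on well-separated synthetic peaks; closely spaced sites, the case that motivates motif constraints, are beyond what it handles reliably.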
Example ChIP-Seq results from M. tuberculosis. (Upper) The top panel displays the fold read coverage for a single binding region with two known binding sites for the TF KstR. ChIP-Seq coverage visually resolves both binding sites and confirms the experimental observation that the site closest to Rv3571 has weaker affinity. Total coverage is shown in blue, and the forward and reverse coverage are shown in red and green, respectively. The binding event also displays the expected shift in position between the forward and reverse reads. The bottom panel displays the genome-wide fold coverage for the same experiment. Peaks above a coverage threshold are shown in blue. The peak shown in the top panel is marked with a star in the bottom panel. The horizontal gray lines are multiples of the standard deviation of background coverage. (Lower) Binding site identification is highly reproducible. Bar plots show the distance between corresponding sites in two replicates for two TFs. The blue line indicates the length of known motifs. Insets show the relationship of peak height between corresponding peaks in two replicates (R² > 0.83 for all TFs). Figures from reference 1.
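The shift between forward- and reverse-strand coverage mentioned in this caption can be estimated by cross-correlating the two strand profiles. The sketch below uses a plain, unnormalized cross-correlation over non-negative lags; the function name and the `max_shift` parameter are assumptions, and the chapter does not state how its cross-correlation filter is computed.

```python
def strand_shift(forward, reverse, max_shift):
    """Return the lag in [0, max_shift] that best aligns reverse coverage onto forward.

    The score for a lag d is sum_i forward[i] * reverse[i + d].
    """
    best_lag, best_score = 0, float("-inf")
    n = len(forward)
    for lag in range(max_shift + 1):
        score = sum(forward[i] * reverse[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

A region whose best score occurs at a clearly positive lag shows the expected forward/reverse offset; a region with no such offset is a candidate for filtering.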
Binding sites replicate between normoxia and hypoxia. Each panel compares the results of ChIP-Seq under normoxic (x axis, top traces) and hypoxic (y axis, bottom traces) conditions. Strong concordance was seen in both peak heights (scatter plots) and coverage profiles (sequencing traces) between the two conditions. While no binding sites were identified in normoxia that were not identified in hypoxia, a small number of sites exhibited greatly increased binding under hypoxic conditions. The three sites showing the largest increases are shown in red on the scatter plots.
Independent replication of EspR binding sites. The ChIP-Seq experiment performed with the native EspR antibody (31) compares well to the ChIP-Seq with the inducible promoter. (A) Binding sites are categorized by their locations relative to target genes. Motifs and binding site categories detected by independent protocols are very similar. (B) Coverage tracks between two experiments are in concordance.
Binding site affinity corresponds to sequence and occupancy at different levels of expression. For transcription factor KstR, the heat map shows experimentally detected binding sites at the bottom (bound sites) and binding sites found by sequence similarity but not experimentally at the top (unbound strong motifs). Each row is a binding site, and each column is a particular position of the binding motif. The KstR binding motif describing all sites in the heat map is shown at the bottom of the figure (full motif). Binding site coverage is shown in four bar plots, corresponding to four induction levels as indicated by Atc concentration. As shown by the arrows, high-coverage binding sites correspond to a wide high-affinity motif, while low-coverage sites correspond to a degraded version of the same motif.
Diversity of binding site locations. Binding sites are assigned to five categories depending on their location relative to the target. The top figure shows a few binding sites located in divergent and convergent areas of the genome. The bottom figure shows the percentage of binding sites in each category for 49 TFs.
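Assigning a site to a location category can be sketched by comparing its position with the strands of the flanking genes. The caption names only divergent and convergent regions, so the other labels below ("genic", "tandem", "unflanked") are hypothetical and do not reproduce the chapter's five categories; the `Gene` record and its inclusive coordinates are likewise assumptions.

```python
from typing import List, NamedTuple


class Gene(NamedTuple):
    name: str
    start: int   # inclusive coordinate (assumed convention)
    end: int     # inclusive coordinate (assumed convention)
    strand: str  # "+" or "-"


def classify_site(position: int, genes: List[Gene]) -> str:
    """Label a binding site by the orientation of the genes around it."""
    if any(g.start <= position <= g.end for g in genes):
        return "genic"
    left = [g for g in genes if g.end < position]
    right = [g for g in genes if g.start > position]
    if not left or not right:
        return "unflanked"
    upstream = max(left, key=lambda g: g.end)
    downstream = min(right, key=lambda g: g.start)
    if upstream.strand == "-" and downstream.strand == "+":
        return "divergent"   # both genes are transcribed away from the site
    if upstream.strand == "+" and downstream.strand == "-":
        return "convergent"  # both genes are transcribed toward the site
    return "tandem"          # both genes lie on the same strand
```

For example, a site between a minus-strand gene on its left and a plus-strand gene on its right is labeled divergent, the classic arrangement in which one site can sit upstream of two promoters.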
M. tuberculosis TF binding data are available at TBDatabase (TBDB). Binding data for 50 TFs generated by the NIAID-funded TB Systems Biology Project have been integrated with the genome sequence and annotation of M. tuberculosis and released at TBDB.org. Selected screen shots show online tools available for searching, browsing, and downloading these data.
Buffer 1 and buffer 1 + PI a
IPP150 buffer
“Elution from beads” buffer