<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art><ui>gb-2011-12-5-r47</ui><ji>1465-6906</ji><fm>
<dochead>Research</dochead>
<bibl>
<title>
<p>A high resolution map of a cyanobacterial transcriptome</p>
</title>
<aug>
<au id="A1"><snm>Vijayan</snm><fnm>Vikram</fnm><insr iid="I1"/><insr iid="I2"/><email>vvijayan@fas.harvard.edu</email></au>
<au id="A2"><snm>Jain</snm><mi>H</mi><fnm>Isha</fnm><insr iid="I2"/><email>ijain@fas.harvard.edu</email></au>
<au ca="yes" id="A3"><snm>O&apos;Shea</snm><mi>K</mi><fnm>Erin</fnm><insr iid="I1"/><insr iid="I2"/><email>erin_oshea@harvard.edu</email></au>
</aug>
<insg>
<ins id="I1"><p>Graduate Program in Systems Biology, Harvard University, 52 Oxford Street, Northwest 445.40, Cambridge, MA 02138, USA</p></ins>
<ins id="I2"><p>Howard Hughes Medical Institute, Harvard Faculty of Arts and Sciences Center for Systems Biology, Departments of Molecular and Cellular Biology and Chemistry and Chemical Biology, Harvard University, 52 Oxford Street, Northwest 445.40, Cambridge, MA 02138, USA</p></ins>
</insg>
<source>Genome Biology</source>
<issn>1465-6906</issn>
<pubdate>2011</pubdate>
<volume>12</volume>
<issue>5</issue>
<fpage>R47</fpage>
<url>http://genomebiology.com/2011/12/5/R47</url>
<xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2011-12-5-r47</pubid><pubid idtype="pmpid">21612627</pubid></pubidlist></xrefbib>
</bibl>
<history><rec><date><day>3</day><month>2</month><year>2011</year></date></rec><revrec><date><day>23</day><month>4</month><year>2011</year></date></revrec><acc><date><day>25</day><month>5</month><year>2011</year></date></acc><pub><date><day>25</day><month>5</month><year>2011</year></date></pub></history>
<cpyrt><year>2011</year><collab>Vijayan et al.; licensee BioMed Central Ltd.</collab><note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
<abs>
<sec>
<st>
<p>Abstract</p>
</st>
<sec>
<st>
<p>Background</p>
</st>
<p>Previous molecular and mechanistic studies have identified several principles of prokaryotic transcription, but less is known about the global transcriptional architecture of bacterial genomes. Here we perform a comprehensive study of a cyanobacterial transcriptome, that of <it>Synechococcus elongatus </it>PCC 7942, generated by combining three high-resolution data sets: RNA sequencing, tiling expression microarrays, and RNA polymerase chromatin immunoprecipitation sequencing.</p>
</sec>
<sec>
<st>
<p>Results</p>
</st>
<p>We report absolute transcript levels, operon identification, and high-resolution mapping of 5' and 3' ends of transcripts. We identify several interesting features at promoters, within transcripts and in terminators relating to transcription initiation, elongation, and termination. Furthermore, we identify many putative non-coding transcripts.</p>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>We provide a global analysis of a cyanobacterial transcriptome. Our results uncover insights that reinforce and extend the current views of bacterial transcription.</p>
</sec>
</sec>
</abs>
</fm><bdy>
<sec>
<st>
<p>Background</p>
</st>
<p>Over the past few decades considerable progress has been made in understanding the mechanisms and regulation of bacterial transcription. However, relatively few studies have attempted to identify the prevalent features of bacterial transcription <it>de novo </it>using an unbiased genome-wide approach. This approach to analyzing the bacterial transcriptome may not only help reinforce the progress made from traditional molecular and mechanistic studies, but may also identify new global features in transcription that have previously been underappreciated.</p>
<p>The advent of next-generation sequencing allows for a complete characterization of bacterial genomes that was previously not possible. RNA sequencing gives unprecedented insights into transcription unit architecture, while RNA polymerase chromatin immunoprecipitation (ChIP) sequencing reveals the flow of information into the transcriptome. We provide a comprehensive analysis of a cyanobacterial transcriptome - that of <it>Synechococcus elongatus </it>PCC 7942 - integrating data from RNA sequencing, tiling expression microarrays, and RNA polymerase (RNA pol) ChIP sequencing.</p>
<p>The unicellular cyanobacterium <it>S. elongatus </it>PCC 7942 is a genetically tractable model organism for prokaryotic photosynthesis <abbrgrp>
<abbr bid="B1">1</abbr>
</abbrgrp>, bioenergy production, and circadian rhythms <abbrgrp>
<abbr bid="B2">2</abbr>
</abbrgrp>. The circadian clock of <it>S. elongatus </it>is built on a three-protein central oscillator that controls the global rhythmic expression of the majority of the genome <abbrgrp>
<abbr bid="B3">3</abbr>
<abbr bid="B4">4</abbr>
</abbrgrp>. Our transcriptome characterization will facilitate the further use of <it>S. elongatus </it>as a model organism.</p>
</sec>
<sec>
<st>
<p>Results and discussion</p>
</st>
<sec>
<st>
<p>The transcriptome</p>
</st>
<p>We used RNA sequencing, tiling expression microarrays, and RNA pol ChIP sequencing to interrogate transcription in the cyanobacterium <it>S. elongatus</it>. RNA was isolated at 4-hour intervals from circadian free-running cells grown in constant light conditions and RNA from a pool of circadian timepoints was sequenced (Materials and methods). Strand-specific RNA sequencing was performed on the Illumina platform yielding over 22 million uniquely mappable non-rRNA reads and over 620 million nucleotides of coverage, strand-specifically covering each nucleotide of the approximately 2.7 Mb genome an average of approximately 115 times <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp> (Materials and methods). Agilent two-color microarrays with a total of approximately 488,000 strand-specific 60-nucleotide probes spaced every 12 nucleotides were hybridized with cDNA from individual circadian timepoints to supplement RNA sequencing analysis (Materials and methods). RNA pol ChIP sequencing of subjective dawn and subjective dusk circadian timepoints was performed on the Illumina platform, yielding a total of over 19 million uniquely mappable reads, covering each nucleotide approximately 1,055 times after extension of reads by 150 bp to cover the average length of sequenced DNA fragments (Materials and methods). All analysis of RNA pol ChIP was performed on the combination of the two circadian timepoints unless otherwise specified.</p>
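<p>The coverage figures quoted above can be checked with simple arithmetic. The following is a minimal sketch, not the authors' pipeline: it uses the rounded values from the text (620 million nucleotides, 19 million reads, 150-bp extension, 2.7 Mb genome) and assumes that strand-specific coverage means dividing by both strands. The function names are hypothetical.</p>

```python
# Back-of-the-envelope check of the coverage figures quoted in the text.
# All inputs are the rounded values from the text, so results are approximate.
GENOME_SIZE_NT = 2.7e6  # approximate genome size


def rna_seq_fold_coverage(total_nt, genome_size, strands=2):
    """Mean coverage per nucleotide per strand (assumes two strands)."""
    return total_nt / (genome_size * strands)


def chip_fold_coverage(n_reads, extended_len, genome_size):
    """Mean coverage per nucleotide after extending each read."""
    return n_reads * extended_len / genome_size


rna_cov = rna_seq_fold_coverage(620e6, GENOME_SIZE_NT)
chip_cov = chip_fold_coverage(19e6, 150, GENOME_SIZE_NT)
print(round(rna_cov), round(chip_cov))
```

<p>Because the text gives lower bounds ("over 620 million", "over 19 million"), these values should be read as approximate.</p>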
<p>The RNA sequencing and RNA pol ChIP sequencing profiles demonstrate that the transcription landscape in <it>S. elongatus </it>is rather dense with very small inter-transcript regions (Figure <figr fid="F1">1a</figr>). Assuming a relatively strict cutoff of at least two reads per nucleotide for transcription, approximately 88% of the genome is transcribed on either the plus or minus strand, and approximately 55% of each strand is transcribed (Materials and methods). Approximately 82% of all non-coding sequence is transcribed on either the plus or minus strand, highlighting the density of transcription in <it>S. elongatus</it>. Fewer than 10% of the 2,612 chromosomally encoded Joint Genome Institute (JGI) predicted ORFs have negligible transcription (less than a mean of two reads per nucleotide across the ORF), and the remaining ORFs have absolute expression distributed over a dynamic range of nearly 10,000. In this study we only sample standard exponential growth conditions during circadian free-run in constant light conditions; both transcription density and the number of expressed ORFs are likely to be higher if multiple growth conditions are sampled.</p>
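<p>The transcribed fractions described above can be sketched as follows. This is an illustrative example on toy data, assuming per-nucleotide read coverage is available as one list per strand; the function name and the toy coverage values are hypothetical, and only the cutoff of at least two reads per nucleotide is taken from the text.</p>

```python
def transcribed_fractions(plus_cov, minus_cov, cutoff=2):
    """Return (fraction transcribed on either strand, mean per-strand fraction).

    A position counts as transcribed on a strand when its coverage on that
    strand is at least `cutoff` reads.
    """
    n = len(plus_cov)
    plus_on = [c >= cutoff for c in plus_cov]
    minus_on = [c >= cutoff for c in minus_cov]
    either = sum(1 for p, m in zip(plus_on, minus_on) if p or m) / n
    per_strand = (sum(plus_on) + sum(minus_on)) / (2 * n)
    return either, per_strand


# Toy coverage for an 8-nucleotide "genome" (hypothetical values).
plus = [0, 3, 5, 2, 0, 0, 1, 4]
minus = [2, 0, 0, 0, 0, 6, 0, 0]
either, per_strand = transcribed_fractions(plus, minus)
print(either, per_strand)
```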
<fig id="F1"><title><p>Figure 1</p></title><caption><p>RNA sequencing and RNA pol ChIP in <it>S. elongatus</it></p></caption><text>
   <p><b>RNA sequencing and RNA pol ChIP in <it>S. elongatus</it></b>. <b>(a) </b>Strand-specific RNA sequencing over a representative 40-kb region in the <it>S. elongatus </it>chromosome. Positive strand transcription is shown in blue (positive y-axis), and negative strand transcription in red (negative y-axis). For visualization over full dynamic range, the y-axis shows log<sub>2 </sub>transformed reads per nucleotide of RNA sequencing coverage. The positions of Joint Genome Institute predicted ORFs for each strand are shown below in black. High RNA sequencing signal is present at nearly all ORFs and anti-sense transcription is extensive. <b>(b) </b>RNA sequencing and RNA pol ChIP sequencing for representative highly expressed transcripts. Top panel: zoomed-in view of RNA sequencing coverage of particular mRNA transcripts. Transcripts are color coded by strand as in (a). Transcription units with precise 5' and 3' ends are defined from RNA sequencing data for all mRNAs (black arrow) (Figure S1 in Additional file <supplr sid="S2">2</supplr>; Materials and methods). Bottom panel: RNA pol ChIP sequencing associated with the transcripts from the top panel. The y-axis is normalized such that the genome average is 200 units per nucleotide. Peaks in RNA pol occupancy are often found near the 5' end of the transcript and occasionally smaller peaks in RNA pol occupancy are located near the 3' ends or inside the transcript. 5' peaks tend to be located within the transcript as opposed to within the promoters.</p>
</text><graphic file="gb-2011-12-5-r47-1" hint_layout="double"/></fig>
<p>RNA sequencing affords high-resolution determination of the 5' and 3' ends of each transcription unit. Transcription units were defined using <it>a priori </it>knowledge of JGI ORF, tRNA, and rRNA annotations (Materials and methods). A total of 1,473 transcription units were identified, 1,415 of which were designated as mRNA transcripts as they are devoid of tRNA or rRNA and contain at least one JGI annotated ORF. 5' and 3' ends were determined for all transcripts and all subsequent analysis is performed on the subset defined as mRNA transcripts (Table S1 in Additional file <supplr sid="S1">1</supplr>, Figure S1 in Additional file <supplr sid="S2">2</supplr>; Materials and methods). Highly expressed transcripts show particularly clear 5' and 3' boundaries of transcription, each with an associated peak in RNA pol occupancy as measured by RNA pol ChIP (Figure <figr fid="F1">1b</figr>). The RNA pol ChIP data are characterized by the presence of several large peaks that tend to be located near the 5' end of transcripts, and many smaller peaks that tend to be located either at the 3' end of highly expressed transcripts or within transcripts (Figure S2 in Additional file <supplr sid="S2">2</supplr>). Surprisingly, most 5' RNA pol peaks are situated within the transcript rather than at the promoter. Sequence analysis of RNA pol peak positions reveals enrichment for the central AT nucleotides of the highly iterated palindrome 1 (HIP1) site, 5' GCGATCGC 3', at the RNA pol peak maximum (<it>P </it>&lt; 1e-10, binomial cumulative distribution). The HIP1 palindrome is highly over-represented in many cyanobacteria, including <it>S. elongatus</it>, where it appears 185 times more frequently in the chromosome than expected for a random 8-mer sequence, but its function is unknown <abbrgrp>
<abbr bid="B6">6</abbr>
</abbrgrp>. It is known that the HIP1 motif is a target of methylation in some cyanobacteria <abbrgrp>
<abbr bid="B7">7</abbr>
</abbrgrp>, raising the possibility of an intriguing link between DNA methylation and transcription. Although RNA pol peaks are enriched at the HIP1 site, fewer than 1% of HIP1 sites (41 of 7,402) are situated at an RNA pol peak, and fewer than 2% of RNA pol peaks (41 of 2,159) are situated at HIP1 sites. Despite the fact that only 41 HIP1 sites are occupied by RNA pol, the probability of having at least this many sites occupied by chance is less than 1e-10 (binomial cumulative distribution).</p>
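<p>A binomial tail probability of this kind can be sketched as follows. The null model here is an assumption made for illustration and is not necessarily the one used in the analysis: each of the 2,159 peak maxima is treated as landing on a uniformly random position of an approximately 2.7 Mb genome, and a hit is one of the two central AT nucleotides of any of the 7,402 HIP1 sites. The function names are hypothetical.</p>

```python
import math


def log_binom_pmf(k, n, p):
    """Natural log of the Binomial(n, p) probability mass at k."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def binom_upper_tail(k_min, n, p):
    """P(X >= k_min) for X ~ Binomial(n, p), summed in log space per term."""
    return sum(math.exp(log_binom_pmf(k, n, p)) for k in range(k_min, n + 1))


GENOME_SIZE_NT = 2.7e6   # approximate genome size from the text
N_HIP1 = 7402            # HIP1 sites in the chromosome
N_PEAKS = 2159           # RNA pol peaks
N_OVERLAP = 41           # peaks whose maximum sits at a HIP1 site

# Assumed null model: a peak maximum hits one of the two central AT
# nucleotides of a HIP1 site purely by chance.
p_hit = 2 * N_HIP1 / GENOME_SIZE_NT
expected_hits = N_PEAKS * p_hit
p_value = binom_upper_tail(N_OVERLAP, N_PEAKS, p_hit)
print(expected_hits, p_value)
```

<p>Under these assumed numbers the null expectation is roughly 12 overlapping peaks, against the 41 observed.</p>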
<suppl id="S1">
<title>
<p>Additional file 1</p>
</title>
<text>
<p>
<b>Tables S1 to S5</b>. All genome positions and strands are relative to GenBank <ext-link ext-link-id="CP000100" ext-link-type="gen">CP000100</ext-link>. Table S1 - all annotated transcripts: column A, transcript ID number; column B, strand (1 is plus strand, 0 is minus strand); column C, first ORF on transcript; column D, last ORF on transcript; column E, predicted 5' transcription start site; column F, predicted 3' end; column G, length of transcript; column H, is the transcript an mRNA? (all transcripts that included any rRNA or tRNA were not considered as mRNA; Materials and methods); column I, the number of ORFs per transcript; column J, length of 5' UTR; column K, length of 3' UTR; column L, mean of the raw RNA sequencing reads over the full transcript; column M, number of transcripts per cell assuming a total of 1,500 mRNAs per cell. Table S2 - all non-coding transcripts: column A, non-coding transcript ID number; column B, predicted 5' transcription start site; column C, predicted 3' end; column D, strand (1 is plus strand, 0 is minus strand); column E, mean of the raw RNA sequencing reads over the full non-coding transcript; column F, length of non-coding transcript; column G, percent overlap that a non-coding transcript has with an ORF that was designated as not transcribed (designated when mean RNA sequencing coverage of ORF is less than two reads per nucleotide); column H, percentage of non-coding transcript that is antisense to an annotated transcript; column I, does the non-coding transcript pass the high confidence criteria? (Materials and methods); column J, does the non-coding transcript pass the circadian criteria? (Materials and methods); column K, the difference in gene expression of the non-coding transcript in the dusk versus dawn circadian timepoints calculated by tiling microarray (all probes internal to the non-coding transcript were used to make this calculation); column L, RFAM homology. 
Table S3 - all RNA polymerase peaks: column A, peak ID number; column B, start of peak; column C, end of peak; column D, position of peak maximum; column E, total ChIP reads at peak maximum (sum of circadian timepoints, dawn and dusk, after normalization for total number of reads); column F, <it>P</it>-value for enrichment of reads in ChIP sample versus mock immunoprecipitation. Table S4 - comparison of literature 5' versus RNA sequencing 5': column A, JGI ID for ORF; column B, common name for ORF; column C, strand (1 is plus strand, 0 is minus strand); column D, translation start position of ORF; column E, literature-based 5' transcription start site; column F, alternative 5' transcription start site from literature; column G, 5' transcription start site estimate from our RNA sequencing; column H, difference between our 5' transcription start site estimate and the closest literature estimate; column I, method of 5' transcription start site determination used in the literature reference; column J, literature reference. Table S5 - expression of all JGI predicted ORFs: column A, JGI ID for ORF; column B, <it>Synpcc7942 </it>ORF ID; column C, start of ORF (in the case when the ORF is on the plus strand, this is where the start codon is located); column D, end of ORF (in the case when the ORF is on the minus strand, this is where the start codon is located); column E, strand (1 is plus strand, 0 is minus strand); column F, mean of the raw RNA sequencing reads over the full ORF.</p>
</text>
<file name="gb-2011-12-5-r47-S1.XLSX">
   <p>Click here for file</p>
</file>
</suppl>
<suppl id="S2">
<title>
<p>Additional file 2</p>
</title>
<text>
<p>
<b>Supplementary Figures S1 to S8</b>. Figure S1: examples of 5' determination from RNA sequencing. <b>(a) </b>5' Determination of the <it>ntcA </it>transcript. A sharp drop in RNA sequencing reads is observed at the 5' end of the mRNA. 5' end determination by RNA sequencing and traditional methods <abbrgrp>
<abbr bid="B61">61</abbr>
</abbrgrp> differ only by a single nucleotide. <b>(b) </b>5' determination of the <it>purF </it>transcript. The RNA sequencing estimate is over 80 nucleotides different from that derived by traditional methods <abbrgrp>
<abbr bid="B62">62</abbr>
</abbrgrp>. Subsequent experiments <abbrgrp>
<abbr bid="B46">46</abbr>
</abbrgrp> have shown that the minimal promoter for the <it>purF </it>transcript contains the RNA sequencing 5' end but not the literature 5' end. A more complete comparison of RNA sequencing and traditional transcription start determination is provided in Table S4 in Additional file <supplr sid="S1">1</supplr>. Figure S2: representative RNA pol ChIP over a 40-kb region. <b>(a) </b>RNA sequencing data. Positive strand transcription is shown in blue (positive y-axis), and negative strand transcription in red (negative y-axis). ORFs on the positive and negative strands are indicated by horizontal black lines. RNA pol peaks significantly enriched over the mock immunoprecipitation (<it>P </it>&lt; 0.1) are indicated with vertical green lines and those that are not (<it>P </it>&#8805; 0.1) are indicated with vertical pink lines. Large RNA pol peaks tend to be located near the 5' end of transcripts, although there are many peaks in the middle of transcripts potentially caused by RNA pol pausing. <b>(b) </b>RNA pol ChIP and mock. RNA pol ChIP (black) and mock immunoprecipitation (green) are normalized such that the genome average is 200 reads per nucleotide. Almost all RNA pol peaks are enriched over the mock immunoprecipitation. A complete listing of RNA pol peaks and their enrichment is provided in Table S3 in Additional file <supplr sid="S1">1</supplr>. <b>(c) </b>RNA pol ChIP normalized by input. Normalization of RNA pol ChIP by input does not qualitatively change the data (compare Figure S2b and Figure S2c in Additional file <supplr sid="S2">2</supplr>). Figure S3: comparison of changes in gene expression and RNA pol ChIP at two points in the circadian cycle. <b>(a) </b>Changes in RNA pol occupancy at two separate times during the circadian cycle (dusk and dawn). Changes in RNA pol are reflective of changes in transcript level by microarray (Pearson correlation, r = 0.6860). 
The probability of getting a correlation as large by random chance (<it>P</it>-value) is 2.2286e-197. Figure S4: characteristics of transcription start. <b>(a) </b>Melting temperature at transcription start. The melting temperature of 10-nucleotide fragments from -200 to +200 of all mRNAs was averaged (Materials and methods). A drop in the melting temperature is observed at the promoter. <b>(b) </b>Nucleotide content at transcription start sites. Nucleotide content of all mRNAs aligned by transcription start. <b>(c) </b>Zoomed in nucleotide content at transcription start. Nucleotide content of all mRNAs aligned by transcription start. Preference for adenine at the +1 position and a -10 element can be observed. Figure S5: comparison of minimum free energy changes with that of dinucleotide-shuffled sequences. <b>(a) </b>Minimum free energy change at RNA pol peaks. The minimum free energy of 60-nucleotide RNA fragments with 10-nucleotide spacing was calculated and averaged for all mRNAs (Materials and methods). A drop in minimum free energy slightly prior to the position of the RNA pol peak is observed. To prevent sequence features of the transcription terminus or promoters from interfering with this analysis, a subset of 183 RNA pol peaks satisfying the following criteria were used: (1) RNA pol peak must be closer to a 5' end than a 3' end; and (2) RNA pol peak must be +100 to +300 relative to the 5' end. Since RNA pol ChIP does not specify the strand being transcribed, the strand of transcription was inferred from RNA sequencing data. Dinucleotide shuffled sequences show a qualitatively similar trend to native sequences, suggesting that there is no specific secondary structure at this transition (Materials and methods). <b>(b) </b>Sequence changes near RNA pol peaks. A sequence content change from low to high GC content can be observed near the position of the RNA pol peaks. 
The same subset of RNA pol peaks are used here as in Figure S5a in Additional file <supplr sid="S2">2</supplr>. A smoothing window of five nucleotides has been applied to smooth nucleotide contents. These sequence changes may be responsible for the free energy changes we observe. It is also possible that these changes in sequence content may contribute to RNA pol pausing by an unknown mechanism. <b>(c) </b>Minimum free energy change at transcription terminus. Minimum free energy was calculated as above after aligning all transcripts by transcription terminus. Dinucleotide-shuffled sequences do not resemble native sequences, suggesting that a discrete hairpin-like structure exists at the terminus of transcripts (Materials and methods). <b>(d) </b>Minimum free energy change at transcription start. Minimum free energy was calculated as above after aligning all transcripts by 5' transcription start. A drop in minimum free energy occurs globally within transcripts and may be related to our observation of global RNA pol pausing. Dinucleotide-shuffled sequences show a qualitatively similar trend to native sequences (Materials and methods). Figure S6: enrichment in RNA sequencing at 5'. <b>(a) </b>Increased RNA sequencing signal at 5' ends. An increase in RNA sequencing signal can be observed at the 5' end of mRNAs. Several biological phenomena may account for this enrichment, but one intriguing possibility is the existence of many partial or nascent transcripts caused by pausing of RNA pol near the 5' end of the transcript. <b>(b) </b>RNA pol pausing at 5' ends may contribute to RNA sequencing enrichment at 5' ends. A slight but significant correlation exists between the retention ratio of RNA pol and the enrichment of RNA sequencing prior to the RNA pol peak. The same subset of RNA pol peaks was used as in Figure S5a in Additional file <supplr sid="S2">2</supplr>. 
Pearson correlation is r = 0.4591, and the probability of getting a correlation as large by random chance (<it>P</it>-value) is 6.2879e-11. Figure S7: the phycocyanin operon - a functional case of partial transcription termination. <b>(a) </b>
Partial transcription termination controls the stoichiometry of <it>cpc&#946; </it>and <it>cpc&#945; </it>to rod linker mRNA at approximately 6:1. This stoichiometry reflects the organization of the phycobilisome - a hexameric &#945;-&#946; double disc with an associated linker <abbrgrp>
<abbr bid="B31">31</abbr>
</abbrgrp>. RNA sequencing data cannot be mapped to the <it>cpc&#946; </it>and <it>cpc&#945; </it>coding region because it is not unique in the genome (another copy of <it>cpc&#946; </it>and <it>cpc&#945;</it>, corresponding to the core proximal phycobilisomes, exists in the genome). The position of predicted terminators (from TransTermHP) is indicated in green, and the position of JGI predicted ORFs is indicated in black. Figure S8: circadian gene expression of putative non-coding RNAs. <b>(a) </b>Gene expression by tiling microarray of high-confidence circadian non-coding RNAs. Gene expression of non-coding RNAs with potential for circadian gene expression is plotted by non-coding transcript ID (Table S2 in Additional file <supplr sid="S1">1</supplr>). Gene expression ratios for non-coding RNAs are computed by averaging the gene expression ratios for all tiling probes internal to the non-coding transcript.</p>
</text>
<file name="gb-2011-12-5-r47-S2.PDF">
   <p>Click here for file</p>
</file>
</suppl>
<p>One of the benefits of RNA sequencing is the ability to infer absolute mRNA transcript levels (Figure <figr fid="F2">2a</figr>). We calculated the absolute expression of each mRNA per cell, assuming a total of 1,500 mRNAs per cell <abbrgrp>
<abbr bid="B8">8</abbr>
<abbr bid="B9">9</abbr>
</abbrgrp> (Materials and methods; Table S1 in Additional file <supplr sid="S1">1</supplr>). We find that using this estimate, over 80% of mRNA transcripts are present at fewer than one copy per cell, suggesting an enormous diversity in single-cell transcriptome profiles and the potential for stochastic effects to play a substantial role in bacterial gene expression. Even if the estimated number of mRNAs per cell is four times larger (6,000 per cell), still nearly half (46%) of mRNAs are present at less than one copy per cell. Although an enormous amount of diversity in mRNA exists in each cell at any given time, the relatively rapid mRNA decay rates in cyanobacteria <abbrgrp>
<abbr bid="B10">10</abbr>
</abbrgrp> - median 2.4 minutes in <it>Prochlorococcus </it>MED4 - allow for rapid transcriptome turnover. The distribution of mRNAs per cell appears approximately log-normal with a dynamic range of almost 10,000. Most mRNAs fall within a smaller dynamic range of approximately 100, with a tail of higher expressed transcripts. The bottom part of the distribution was cut at 2<sup>-4 </sup>because transcripts below this level are almost undetectable at our sequencing coverage (Materials and methods). The highest expressed KEGG (Kyoto Encyclopedia of Genes and Genomes) <abbrgrp>
<abbr bid="B11">11</abbr>
</abbrgrp> categories include photosynthesis, ribosome, and RNA polymerase, with <it>P</it>-values of 2.6e-20, 1.3e-20, and 0.001, respectively (two-sided Wilcoxon rank sum test). The lowest expressed KEGG categories include mismatch repair, homologous recombination, and nucleotide excision repair - ORFs that may not be expressed in standard growth conditions (all <it>P </it>&lt; 0.002, two-sided Wilcoxon rank sum test). Absolute transcript levels are generally correlated (Pearson correlation, r = 0.65) with RNA pol occupancy (Figure <figr fid="F2">2b</figr>), suggesting that transcription and not decay is the primary determinant for setting absolute transcript abundance. The variation (approximately one order of magnitude scatter) observed is roughly proportional to the expected distribution of mRNA decay rates in cyanobacteria <abbrgrp>
<abbr bid="B10">10</abbr>
</abbrgrp>. However, this variation may also arise from: (1) different RNA pol elongation rates for different transcripts; (2) variable amounts of RNA pol pausing for different transcripts; and/or (3) lack of strand-specific information in the RNA pol ChIP data.</p>
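<p>The conversion from sequencing coverage to copies per cell can be sketched as follows. This is a minimal illustration, assuming that the number of molecules of each mRNA is proportional to its mean per-nucleotide coverage and that the per-cell total (1,500 in the text) is apportioned accordingly; the exact procedure is described in Materials and methods and may differ. The function names and toy values are hypothetical.</p>

```python
def copies_per_cell(mean_coverage, total_mrna_per_cell=1500):
    """Scale mean per-nucleotide coverage to absolute copies per cell.

    Assumes molecule number is proportional to mean coverage, so that the
    copies of all transcripts sum to `total_mrna_per_cell`.
    """
    total = sum(mean_coverage.values())
    return {name: total_mrna_per_cell * cov / total
            for name, cov in mean_coverage.items()}


def fraction_below_one_copy(copies):
    """Fraction of transcripts present at fewer than one copy per cell."""
    return sum(1 for c in copies.values() if 1.0 > c) / len(copies)


# Toy mean coverages (hypothetical values, reads per nucleotide).
toy = {"tx_a": 600.0, "tx_b": 300.0, "tx_c": 90.0, "tx_d": 10.0}
copies_100 = copies_per_cell(toy, total_mrna_per_cell=100)
copies_50 = copies_per_cell(toy, total_mrna_per_cell=50)
print(copies_100, fraction_below_one_copy(copies_50))
```

<p>Changing the assumed per-cell total rescales every transcript by the same factor, which is why the fraction of transcripts below one copy per cell depends on that assumption.</p>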
<fig id="F2"><title><p>Figure 2</p></title><caption><p>Basic features of the <it>S. elongatus </it>transcriptome</p></caption><text>
   <p><b>Basic features of the <it>S. elongatus </it>transcriptome</b>. <b>(a) </b>Distribution of absolute transcript abundance per cell. Only transcripts with mean coverage of over two reads per nucleotide (corresponding to approximately 1 mRNA per 15 cells) are shown, and a total of 1,500 mRNAs per cell is assumed <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr></abbrgrp> (Materials and methods). <b>(b) </b>RNA sequencing versus RNA pol ChIP. Absolute transcription (RNA sequencing averaged over transcript) and absolute RNA pol occupancy (RNA pol ChIP averaged over transcript) are generally correlated (Pearson correlation, r = 0.65). The probability of getting a correlation as large by random chance (<it>P</it>-value) is 7.41e-169. <b>(c) </b>Distribution of ORFs per mRNA. Most mRNAs contain one to two ORFs. The extreme case is that of an operon composed primarily of ribosomal proteins that includes 31 ORFs and is 17,158 nucleotides in length. <b>(d) </b>Operon estimations based on RNA sequencing versus bioinformatic predictions. Comparison of RNA sequencing based operon determination and bioinformatic predictions from MicrobesOnline <abbrgrp><abbr bid="B12">12</abbr><abbr bid="B13">13</abbr></abbrgrp>. <b>(e) </b>Distribution of mRNA lengths. The median mRNA length is 1,320 nucleotides, approximately twice the median size of an ORF (776 nucleotides) in <it>S. elongatus</it>.</p>
</text><graphic file="gb-2011-12-5-r47-2" hint_layout="double"/></fig>
<p>Of the 1,415 mRNA transcripts identified, many (approximately 38%) have more than one ORF per transcript (Figure <figr fid="F2">2c</figr>). Most mRNAs contain only one or two ORFs, but the ribosomal protein operon presents an extreme case of 31 ORFs on a transcript spanning over 17,000 nucleotides. Our operon identification via RNA sequencing shows good correlation with bioinformatic operon predictions from MicrobesOnline <abbrgrp>
<abbr bid="B12">12</abbr>
<abbr bid="B13">13</abbr>
</abbrgrp> (Figure <figr fid="F2">2d</figr>), which are based on: (1) distance between ORFs; (2) conservation of synteny in other genomes; and (3) commonality of Gene Ontology or COG category. The relatively high correspondence between RNA sequencing and bioinformatic predictions suggests that the operon structure in <it>S. elongatus </it>may be used to infer the operon structure in other cyanobacterial genomes. The median mRNA length is 1,320 nucleotides (Figure <figr fid="F2">2e</figr>), approximately twice the median size of an ORF (776 nucleotides) in <it>S. elongatus</it>.</p>
</sec>
<sec>
<st>
<p>Transcription start</p>
</st>
<p>Identification of the 5' ends of all mRNAs allows for more detailed characterization of the promoter and initial steps in transcription. When we align all mRNAs by their 5' transcription start and average their AT content, we observe an increase at the -10 element, also known as the Pribnow box <abbrgrp>
<abbr bid="B14">14</abbr>
<abbr bid="B15">15</abbr>
</abbrgrp> (Figure <figr fid="F3">3a</figr>). At this same location we observe a large drop in DNA melting temperature, a signature of bacterial promoters (Figure S4a in Additional file <supplr sid="S2">2</supplr>). Downstream of the -10 element, we detect a peak in AT content at the first nucleotide of the transcript, indicative of a preference for incorporating adenine (Figure S4b,c in Additional file <supplr sid="S2">2</supplr>). We computed the sequence alignment of the 30 nucleotides prior to the transcription start and find a -10 element similar to that found in a genome-wide map of transcription start sites in <it>Synechocystis </it>PCC 6803 and 25 experimentally determined promoters in <it>Prochlorococcus </it>MED4 <abbrgrp>
<abbr bid="B16">16</abbr>
<abbr bid="B17">17</abbr>
<abbr bid="B18">18</abbr>
</abbrgrp> (Figure <figr fid="F3">3b</figr>; Materials and methods). Sequence alignment or motif analysis at the expected location of the -35 element or spacer does not reveal a strong consensus or motif. The absence of a strong -35 element signature has been observed in <it>Prochlorococcus </it>MED4 and in the <it>psbA </it>transcripts of many cyanobacteria <abbrgrp>
<abbr bid="B16">16</abbr>
<abbr bid="B19">19</abbr>
</abbrgrp>, suggesting that the -35 elements in cyanobacteria may be very diverse in sequence. This diversity in -35 element may be related to the extensive control of gene expression by sigma factors in cyanobacteria <abbrgrp>
<abbr bid="B20">20</abbr>
</abbrgrp>.</p>
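<p>The alignment-and-averaging step used for the AT content profile can be sketched as follows. This is an illustrative example only: it handles plus-strand transcripts (minus-strand transcripts would first need reverse complementing), uses a toy sequence in place of the genome, and the function name and coordinates are hypothetical.</p>

```python
def at_profile(genome, starts, upstream, downstream):
    """Position-wise AT fraction over windows aligned at transcription starts.

    `starts` holds 0-based indices of the +1 nucleotide on the plus strand.
    Each window covers `upstream` nucleotides before +1 and `downstream`
    nucleotides from +1 onward; windows running off the sequence are skipped.
    """
    width = upstream + downstream
    counts = [0] * width
    used = 0
    for tss in starts:
        lo, hi = tss - upstream, tss + downstream
        if lo >= 0 and len(genome) >= hi:
            for i, base in enumerate(genome[lo:hi]):
                if base in "AT":
                    counts[i] += 1
            used += 1
    if used == 0:
        raise ValueError("no window fits inside the sequence")
    return [c / used for c in counts]


# Toy sequence with two identical promoter-like stretches (hypothetical).
toy_genome = "CCCTATAATGGCACCCTATAATGGCACC"
toy_starts = [12, 25]  # both +1 positions are adenines in this toy example
profile = at_profile(toy_genome, toy_starts, upstream=9, downstream=1)
print(profile)
```

<p>In this toy example the AT-rich hexamer and the adenine at +1 show up as positions with an AT fraction of 1, separated by a GC-only stretch.</p>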
<fig id="F3"><title><p>Figure 3</p></title><caption><p>Transcription initiation in <it>S. elongatus</it></p></caption><text>
   <p><b>Transcription initiation in <it>S. elongatus</it></b>. <b>(a) </b>AT content of the transcription start. The AT content from -200 to +200 from the start site of transcription was averaged for all mRNAs. A strong enrichment in AT content is observed at the -10 element as well as a strong preference for adenine at the first nucleotide of a transcript. <b>(b) </b>-10 element consensus logo. A consensus -10 element similar in sequence to that determined for <it>E. coli </it>was identified through sequence alignment (Materials and methods). <b>(c) </b>Normalized RNA pol occupancy at promoter. For each of the top 500 expressed mRNAs, the RNA pol occupancy was normalized to a mean occupancy of 0.5 per nucleotide, and then averaged across mRNAs from -500 to +500. A peak in RNA pol occupancy is observed, on average, 63 nucleotides within the RNA transcript, suggesting potential stalling of RNA pol after initiation of transcription rather than at the promoter. <b>(d) </b>RNA pol retention ratio at the promoter is variable. The relative amount of RNA pol at the 5' end versus RNA pol in the ORF varies from transcript to transcript. RNA pol at the 5' end was calculated as the mean occupancy in a 200-nucleotide window centered at the +63 nucleotide. RNA pol transcribing was calculated as the mean occupancy in a 200-nucleotide window centered in the middle of the first ORF.</p>
</text><graphic file="gb-2011-12-5-r47-3" hint_layout="double"/></fig>
<p>To investigate the presence of RNA pol peaks near the transcription start site, we aligned the top 500 expressed transcripts by their 5' ends, and averaged the normalized RNA pol occupancy profiles (Figure <figr fid="F3">3c</figr>). On average, the maximum of the RNA pol peak is situated 63 nucleotides downstream of the transcription start site. The exact peak position varies from transcript to transcript; this peak can be located either in the 5' UTR or within the first ORF, with the majority of peaks occurring at the beginning of the ORF. This is in stark contrast to previous bacterial RNA pol ChIP-chip studies in which the RNA pol peaks are observed at the promoter <abbrgrp>
<abbr bid="B21">21</abbr>
<abbr bid="B22">22</abbr>
<abbr bid="B23">23</abbr>
<abbr bid="B24">24</abbr>
</abbrgrp>, possibly due to a lack of resolution. A more recent high-resolution RNA pol ChIP-chip study in <it>Escherichia coli </it>was able to localize these RNA pol peaks to within the transcript <abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp>.</p>
<p>To assess a potential functional role for these RNA pol peaks, we calculated the RNA pol retention as the ratio of RNA pol occupancy at the 5' end to the RNA pol occupancy in the middle of the first ORF for all mRNAs. We find that over 80% of transcripts have a retention ratio greater than one and that this retention ratio is variable from transcript to transcript (Figure <figr fid="F3">3d</figr>), allowing for the possibility that bacteria can tune the amount of retained RNA pol to affect gene expression.</p>
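<p>The retention ratio described above can be sketched in a few lines. The following is a minimal illustration, not the analysis code used in this study: it assumes a plus-strand transcript, 0-based coordinates, and a per-nucleotide occupancy profile held in a plain list, and the function names (<it>mean_occupancy</it>, <it>retention_ratio</it>) are hypothetical. The 200-nucleotide window and the +63 offset are taken from the legend of Figure 3d.</p>

```python
def mean_occupancy(occupancy, center, window=200):
    """Mean per-nucleotide occupancy in a window centered on `center`.

    The window is clipped at the ends of the profile (an assumption).
    """
    half = window // 2
    start = max(0, center - half)
    end = min(len(occupancy), center + half)
    values = occupancy[start:end]
    if not values:
        raise ValueError("window lies outside the occupancy profile")
    return sum(values) / len(values)


def retention_ratio(occupancy, tss, orf_start, orf_end,
                    peak_offset=63, window=200):
    """Ratio of RNA pol occupancy at the 5' end to occupancy mid-ORF.

    Assumes a plus-strand transcript and 0-based coordinates.
    """
    five_prime = mean_occupancy(occupancy, tss + peak_offset, window)
    orf_middle = (orf_start + orf_end) // 2
    transcribing = mean_occupancy(occupancy, orf_middle, window)
    if transcribing == 0:
        raise ValueError("no RNA pol signal in the ORF window")
    return five_prime / transcribing


# Toy profile: twice as much RNA pol near the 5' end as in the ORF body.
profile = [2.0] * 300 + [1.0] * 700
print(retention_ratio(profile, tss=37, orf_start=400, orf_end=800))  # 2.0
```

<p>A ratio greater than one, as in this toy profile, corresponds to a transcript that retains more RNA pol near its 5' end than in the body of its first ORF.</p>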
<p>One possible explanation for these RNA pol peaks is secondary structure in the nascent RNA, which may cause RNA pol to pause, or to pause and subsequently terminate <abbrgrp>
<abbr bid="B26">26</abbr>
</abbrgrp>. To determine if RNA secondary structure may be involved in pausing RNA pol, we selected a subset of 183 RNA pol peaks that were located 100 to 300 nucleotides within the transcript and were closer to a 5' end than a 3' end. This subset was chosen to specifically isolate the RNA pol peaks from features at the promoter or terminus, which may bias the analysis. The minimum free energy of 60-nucleotide RNA fragments from the transcribed strand, in 10-nucleotide increments, was calculated around each RNA pol peak and averaged, revealing a steep drop in the minimum free energy slightly prior to the RNA pol peak (Figure S5a in Additional file <supplr sid="S2">2</supplr>; Materials and methods). However, this decrease in free energy is still observed in dinucleotide shuffled sequences, suggesting that a specific stem loop structure is not formed in this region. Instead, we observe a shift in sequence bias from low to high GC content at the RNA pol peak (Figure S5b in Additional file <supplr sid="S2">2</supplr>), which may be influencing the RNA minimum free energy calculation. Thus, the mechanism underlying the global accumulation of RNA pol at the 5' end of transcripts remains unclear.</p>
<p>According to this hypothesis of 5' proximal RNA pol pausing, we should also observe enrichment of RNA sequencing reads at the 5' end of transcripts. Indeed, over 80% of transcripts have more RNA sequencing reads recovered at their 5' ends than in the middle of their first ORF (Figure S6a in Additional file <supplr sid="S2">2</supplr>), and a small but significant correlation exists between enrichment in RNA sequencing at 5' ends and RNA pol retention ratio (Figure S6b in Additional file <supplr sid="S2">2</supplr>).</p>
<p>Our genome-wide observations of 5' RNA pol peaks suggest that this may be a more important and widespread phenomenon in bacterial gene expression than previously appreciated. Our observations of RNA pol pausing may be different from the canonical examples of transcriptional attenuation observed in amino acid biosynthetic operons of <it>E. coli </it>where specific terminator structures attenuate transcription <abbrgrp>
<abbr bid="B26">26</abbr>
</abbrgrp>, although the peaks in RNA pol we observe are qualitatively similar to the peaks at the <it>trp </it>and <it>pyrBI </it>operons observed by tiling microarray in <it>E. coli </it>
<abbrgrp>
<abbr bid="B25">25</abbr>
</abbrgrp>.</p>
</sec>
<sec>
<st>
<p>Transcription termination</p>
</st>
<p>In addition to analysis of the transcription start, our catalog of 3' ends allows analysis of transcription termination. Two signals for transcription termination have been previously identified in bacteria: intrinsic Rho-independent terminators, typically low energy RNA hairpins; and Rho-dependent terminators, whose activity relies on the binding of the Rho protein to particular sites on the nascent transcript <abbrgrp>
<abbr bid="B27">27</abbr>
</abbrgrp>. The majority of bacteria have a homolog of the <it>E. coli </it>Rho protein, but notable exceptions include the cyanobacteria <it>S. elongatus </it>and <it>Synechocystis </it>PCC 6803 <abbrgrp>
<abbr bid="B27">27</abbr>
</abbrgrp>. A previous study analyzing the 3' ends of ORFs in <it>Synechocystis </it>PCC 6803 found no noticeable drop in RNA minimum free energy, suggesting the potential for a previously uncharacterized mechanism for transcription termination in this organism <abbrgrp>
<abbr bid="B28">28</abbr>
</abbrgrp>. With knowledge of the actual 3' positions of transcripts, a more accurate analysis of transcription termination in <it>S. elongatus </it>is possible.</p>
<p>To analyze the secondary structure at the 3' end of transcripts, we averaged the minimum free energy of all transcripts aligned by the 3' end (Figure <figr fid="F4">4a</figr>). We observe a dip in minimum free energy slightly prior to the transcript terminus, indicative of a stem-loop structure involved in Rho-independent transcription termination. This dip in free energy is not present in dinucleotide shuffled sequences, suggesting that a discrete stem-loop structure exists at the end of transcripts (Materials and methods; Figure S5c in Additional file <supplr sid="S2">2</supplr>). To further assess the role of Rho-independent transcription termination in <it>S. elongatus</it>, we assembled all Rho-independent intrinsic terminators predicted in <it>S. elongatus </it>from TransTermHP <abbrgrp>
<abbr bid="B29">29</abbr>
</abbrgrp>. These predicted Rho-independent intrinsic terminators typically consist of short, often GC-rich hairpins followed by sequence enriched in thymine nucleotides. We find these terminators tend to be significantly closer to 3' ends than to random locations distributed at the same frequency (Figure <figr fid="F4">4b</figr>). Together, these analyses suggest that the classical Rho-independent termination plays a large role in cyanobacterial transcription termination.</p>
<fig id="F4"><title><p>Figure 4</p></title><caption><p>Transcription termination in <it>S. elongatus</it></p></caption><text>
   <p><b>Transcription termination in <it>S. elongatus</it></b>. <b>(a) </b>Minimum RNA free energy at the end of transcripts. The minimum free energy of 60-nucleotide RNA fragments with 10-nucleotide spacing was calculated and averaged for all mRNAs (Materials and methods). A drop in minimum free energy at the 3' end is indicative of Rho-independent transcription termination. <b>(b) </b>Distance between TransTermHP bioinformatically predicted terminators and 3' ends. Predicted intrinsic terminators (from TransTermHP <abbrgrp><abbr bid="B29">29</abbr></abbrgrp>) tend to be much closer to the 3' end of transcripts than to random positions occurring at the same frequency as 3' ends. Blue bars show distance from a predicted terminator to the closest 3' end. As a control, we randomized the location of 3' ends in the genome. Grey bars show distance from a predicted terminator to the closest randomized 3' end. <b>(c) </b>Energy distributions of TransTermHP terminators. Not all predicted TransTermHP terminators cause transcription termination. Several terminator-like structures are located in non-transcribed regions or in the middle of transcripts. The free energy of terminators that cause transcription termination tends to be lower than the free energy of those that do not. The <it>P</it>-value is 3.00e-20 by two-sided Wilcoxon rank sum test. <b>(d) </b>Partial transcription termination creates complex transcriptional structures. Positive strand transcription is shown in blue and negative strand transcription in red. The positions of predicted terminators (from TransTermHP) are shown in green, and the position of JGI predicted ORFs are shown in black. Terminators located within transcripts often result in a decrease in the transcription of downstream ORFs.</p>
</text><graphic file="gb-2011-12-5-r47-4" hint_layout="double"/></fig>
<p>Not all of the predicted intrinsic terminators cause complete transcription termination. The hairpin energy score (as calculated by TransTermHP <abbrgrp>
<abbr bid="B29">29</abbr>
</abbrgrp>) of those terminators that are within 100 nucleotides of a transcription terminus tends to be lower (more negative) than that of terminators located elsewhere (Figure <figr fid="F4">4c</figr>). These more stable hairpins may be more competent to cause transcription termination because they are more likely to fold, more likely to cause termination after folding, or both <abbrgrp>
<abbr bid="B30">30</abbr>
</abbrgrp>. In some cases, terminators that do not cause complete termination are involved in creating complex transcription structures. In several of these cases, terminators are found between ORFs in the same operon, leading to lower transcription of the ORFs proximal to the 3' end (Figure <figr fid="F4">4d</figr>). This strategy could potentially be used to regulate the stoichiometry of transcript abundance of ORFs, and subsequently proteins, regardless of the state of the promoter. A potential physiological example is that of the phycocyanin operon, where a terminator that causes incomplete termination sets the stoichiometry of mRNA for <it>cpc&#946; </it>and <it>cpc&#945; </it>to phycobilisome rod linkers at 6:1 - the same stoichiometry as in the organized phycobilisome <abbrgrp>
<abbr bid="B31">31</abbr>
</abbrgrp> (Figure S7 in Additional file <supplr sid="S2">2</supplr>).</p>
</sec>
<sec>
<st>
<p>Putative non-coding transcripts and 5' UTRs</p>
</st>
<p>One particularly interesting feature of the <it>S. elongatus </it>transcriptome is the presence of widespread non-coding transcription. We identify 1,579 putative non-coding transcripts from RNA sequencing, 983 of which are considered high-confidence after verification by tiling microarray, and annotate their 5' and 3' ends (Table S2 in Additional file <supplr sid="S1">1</supplr>; Materials and methods). The number of non-coding transcripts is comparable to the number of annotated protein-coding transcripts (1,415). It is possible that some of the transcripts designated as non-coding may have a protein coding region that was not identified in the JGI annotation. Those putative non-coding transcripts that have any overlap with annotated transcripts on the opposite strand were considered anti-sense and the remaining were considered not anti-sense.</p>
<p>Several hundred non-coding RNAs have previously been identified in <it>E. coli </it>and <it>Bacillus subtilis </it>
<abbrgrp>
<abbr bid="B32">32</abbr>
</abbrgrp> and recently 276 novel transcriptional units were identified in <it>Prochlorococcus </it>MED4 by tiling microarray <abbrgrp>
<abbr bid="B33">33</abbr>
</abbrgrp>, 117 in <it>Mycoplasma pneumoniae </it>by tiling microarray and transcriptome sequencing <abbrgrp>
<abbr bid="B34">34</abbr>
</abbrgrp>, 390 in <it>Sulfolobus solfataricus </it>P2 by transcriptome sequencing <abbrgrp>
<abbr bid="B35">35</abbr>
</abbrgrp>, and 137 in <it>Salmonella </it>Typhi by transcriptome sequencing <abbrgrp>
<abbr bid="B36">36</abbr>
</abbrgrp>. As RNA from these and other genomes is sequenced at greater depth, we may find that non-coding transcription is more prevalent in bacteria than previously thought <abbrgrp>
<abbr bid="B37">37</abbr>
<abbr bid="B38">38</abbr>
<abbr bid="B39">39</abbr>
</abbrgrp>. A recent RNA sequencing-based map of transcription start sites in another unicellular cyanobacterium, <it>Synechocystis </it>PCC 6803, identified 1,541 potential non-coding transcription start sites, making up 64% of all transcription start sites in the organism <abbrgrp>
<abbr bid="B17">17</abbr>
</abbrgrp>.</p>
<p>We find that some of the non-coding transcripts in <it>S. elongatus </it>display differential expression in the subjective dawn and subjective dusk timepoints, indicative of circadian expression, as assayed by tiling microarray (Table S2 in Additional file <supplr sid="S1">1</supplr>, Figure S8 in Additional file <supplr sid="S2">2</supplr>; Materials and methods). Although several non-coding RNAs appear to exhibit circadian oscillations in expression, the physiological role for circadian gene expression remains unclear and no expression correlation exists between anti-sense circadian non-coding RNAs and the transcripts on the opposite strand.</p>
<p>Very few well-described examples of non-coding RNAs have been noted in cyanobacteria. One previously identified functional non-coding RNA, Yfr1, is required for growth under several stress conditions <abbrgrp>
<abbr bid="B40">40</abbr>
</abbrgrp> (Figure <figr fid="F5">5a</figr>). In <it>S. elongatus</it>, there appears to be occasional co-transcription of Yfr1 with the neighboring ORF <it>guaB</it>, but the extent of co-transcription is negligible compared to the expression of Yfr1. In the same genomic region as Yfr1, we observe several previously unidentified transcripts anti-sense to the <it>trxA </it>and <it>guaB </it>coding regions. The Yfr1 non-coding transcript is approximately 60 nucleotides in length, and the median size of all identified non-coding transcripts is approximately 200 nucleotides, roughly 15% of the size of mRNA transcripts (Figure <figr fid="F5">5b</figr>).</p>
<fig id="F5"><title><p>Figure 5</p></title><caption><p>Non-coding transcripts</p></caption><text>
   <p><b>Non-coding transcripts</b>. <b>(a) </b>Extensive non-coding transcription. Positive strand transcription is shown in blue (positive y-axis), and negative strand transcription in red (negative y-axis). The position of JGI predicted ORFs on the plus and minus strand are shown in black, and the position of the Yfr1 non-coding RNA is shown in green. In this same region, there is anti-sense transcription opposite to the <it>trxA </it>and <it>guaB </it>ORFs. <b>(b) </b>Length distribution of non-coding transcripts. Transcripts that have any overlap with an annotated transcript on the opposite strand are designated anti-sense. Most non-coding transcripts are anti-sense by this designation. The median size for a non-coding transcript is approximately 200 nucleotides.</p>
</text><graphic file="gb-2011-12-5-r47-5" hint_layout="single"/></fig>
<p>We find that most non-coding transcripts are at least partially anti-sense to an mRNA transcript (Figure <figr fid="F5">5b</figr>). These transcripts have the potential for base pairing with the transcript on the opposite strand. One such functional RNA, IsrR, has been identified in the cyanobacterium <it>Synechocystis </it>PCC 6803 <abbrgrp>
<abbr bid="B41">41</abbr>
</abbrgrp>. This 177-nucleotide RNA is down-regulated in iron stress and base-pairs with the iron stress-induced <it>isiA </it>transcript, subsequently decreasing its levels. IsiA enhances photosynthesis by forming a ring around photosystem I, and IsrR is currently the only RNA known to regulate a photosynthesis component <abbrgrp>
<abbr bid="B41">41</abbr>
</abbrgrp>. We find a transcript anti-sense to <it>isiA </it>in <it>S. elongatus </it>that shows significant similarity to IsrR in <it>Synechocystis </it>PCC 6803 (RNA Families (RFAM) bit score 97.96) <abbrgrp>
<abbr bid="B42">42</abbr>
</abbrgrp>. This transcript may have a similar role in modulating photosynthesis in <it>S. elongatus</it>.</p>
<p>To identify if any other known RNA families are present within our set of non-coding RNAs, we queried the RFAM database <abbrgrp>
<abbr bid="B42">42</abbr>
</abbrgrp>. In addition to Yfr1, IsrR, and RNase P, we identify a non-coding RNA containing a putative group I intron <abbrgrp>
<abbr bid="B43">43</abbr>
</abbrgrp>. Group I introns are ribozymes capable of catalyzing their own excision from an RNA, and ligating the upstream and downstream exons.</p>
<p>To extend our analysis of potential RNA-based regulators in <it>S. elongatus</it>, we queried our set of 5' UTRs against RFAM and identified metabolite-binding riboswitches for thiamine (vitamin B<sub>1</sub>) and coenzyme B<sub>12 </sub>(vitamin B<sub>12</sub>). The 5' leader of the <it>thiC </it>mRNA in <it>S. elongatus </it>contains a 'thi box' riboswitch domain that undergoes a structural change that has been shown to cause a reduction in both translation and transcription when bound to thiamine or its pyrophosphate derivative <abbrgrp>
<abbr bid="B44">44</abbr>
</abbrgrp>. Similarly, the 5' leader of a putative cobalt transporter (JGI 637799805, <it>Synpcc7942_1373</it>) contains the cobalamin riboswitch domain, which represses expression in the presence of coenzyme B<sub>12 </sub>
<abbrgrp>
<abbr bid="B45">45</abbr>
</abbrgrp>. Both of these mRNA transcripts have unusually large 5' UTRs of 210 and 153 nucleotides, respectively, compared to a median 5' UTR size in <it>S. elongatus </it>of 30 nucleotides. Although most 5' UTRs are small, 12% are longer than 100 nucleotides and 6% are longer than 150 nucleotides. Transcripts with long 5' UTRs may be good candidates for riboswitches or RNA-based regulators. Interestingly, both riboswitch-containing mRNAs show large RNA pol occupancy peaks near the riboswitch domain in the 5' UTR, suggesting that these riboswitches - likely when in their bound configuration - can cause RNA pol pausing or termination. These peaks in RNA pol are qualitatively similar to the peaks we observe globally, although mechanisms likely differ, as most RNA pol peaks are situated within the beginning of the ORF.</p>
</sec>
</sec>
<sec>
<st>
<p>Conclusions</p>
</st>
<p>Here we combine three high-resolution data sets - RNA sequencing, tiling expression microarray, and RNA pol ChIP sequencing - to present a characterization and analysis of the <it>S. elongatus </it>transcriptome. We report absolute transcript levels, operon identification, and high-resolution mapping of 5' and 3' transcript ends. At the 5' end of transcripts, we characterize promoter sequence and find widespread peaks in RNA pol occupancy. At 3' ends we observe significant Rho-independent transcription termination and occasional incomplete termination resulting in interesting transcriptional structures. In addition, we find extensive non-coding transcription, suggesting a larger role for these non-coding RNAs in bacteria, and cyanobacteria in particular, than previously anticipated. The presence of numerous non-coding RNAs and 5' proximal pausing of RNA pol suggest that post-transcriptional regulation - regulation after binding of RNA pol at the promoter - may be more widespread in bacteria than expected. We hope this work will serve as a catalog and primer for further studies of bacterial and cyanobacterial transcription.</p>
</sec>
<sec>
<st>
<p>Materials and methods</p>
</st>
<sec>
<st>
<p>Continuous culture of cyanobacteria</p>
</st>
<p>Cyanobacteria were cultured as previously described <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp>. A continuous culture apparatus kept cells in constant light and growth conditions and provided real-time bioluminescence readings. <it>S. elongatus </it>(strain AMC 408 <abbrgrp>
<abbr bid="B46">46</abbr>
</abbrgrp>): <it>psbAI::luxCDE </it>fusion in NSI <abbrgrp>
<abbr bid="B47">47</abbr>
</abbrgrp> (spectinomycin and streptomycin) and <it>purF::luxAB </it>fusion in NSII <abbrgrp>
<abbr bid="B47">47</abbr>
</abbrgrp> (chloramphenicol) was grown in a 6-L cylindrical spinner flask (Corning, Corning, NY, USA) at a volume of 4.5 L. Cells were grown in BG-11 medium <abbrgrp>
<abbr bid="B48">48</abbr>
</abbrgrp> with the following modifications: 0.0010 g/L FeNH<sub>4 </sub>citrate was used instead of 0.0012 g/L FeNH<sub>4 </sub>citrate and citric acid was supplemented at 0.00066 g/L. Cells were initially inoculated in the presence of antibiotics (5 &#956;g/ml spectinomycin and 5 &#956;g/ml chloramphenicol), and subsequently diluted with modified BG-11 lacking antibiotics. Cells were exposed to a surface flux of approximately 25 &#956;mol photons m<sup>-2 </sup>s<sup>-1 </sup>cool white fluorescent light, bubbled with 500 ml/minute 1% CO<sub>2 </sub>in air, maintained at 30&#176;C, and stirred at one rotation per second. Constant optical density (OD<sub>750 </sub>0.15) and volume were maintained via a two-state controller. OD did not fluctuate by more than 8% during an experiment. Cells were exposed to two 12-hour light-dark cycles for entrainment before release into continuous light.</p>
</sec>
<sec>
<st>
<p>RNA preparation</p>
</st>
<p>Total RNA was prepared as previously described <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp>. Cells (120 ml) from continuous culture were collected by vacuum filtration, snap frozen in liquid nitrogen, and stored at -80&#176;C for no more than 1 week prior to RNA extraction. RNA was extracted from frozen cells in two steps. First, cells were lysed in 65&#176;C phenol/SDS by vortexing and total RNA was purified by phenol/chloroform extraction. Second, total RNA was subjected to DNase I (Promega, Madison, WI, USA) treatment followed by a second phenol/chloroform extraction. Total RNA was analyzed on agarose gel and an Agilent Bioanalyzer to assess integrity.</p>
</sec>
<sec>
<st>
<p>Strand-specific RNA sequencing</p>
</st>
<p>Total RNA was prepared for timepoints collected at 4-hour intervals from 76 to 96 hours after release into continuous light and mixed in equal proportions. Mixed total RNA was supplemented with RNase Out (Invitrogen, Carlsbad, CA, USA) to a final concentration of 2 units/&#956;l and depleted of 23S and 16S ribosomal RNAs using the MICROBExpress Bacterial mRNA Enrichment Kit (Ambion, Austin, TX, USA) according to the manufacturer's instructions.</p>
<p>RNA sequencing libraries were prepared from total RNA depleted of 16S and 23S rRNA with modifications to a previously described procedure <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp>. RNA (8 &#956;g) was fragmented for 40 minutes at 95&#176;C in fresh 2 mM EDTA, 100 mM Na<sub>2</sub>CO<sub>3</sub>, pH 9.2. Fragmentation reactions were immediately precipitated in 300 mM NaOAc, pH 5.2, glycogen, and isopropanol. Fragmented RNA was resuspended in RNA loading buffer (Fisher, Pittsburgh, PA, USA), briefly denatured, and loaded in a 15% TBE-Urea polyacrylamide gel (BioRad, Hercules, CA, USA) for size selection. Gels were stained with Sybr Gold (Invitrogen) and a 25- to 30-nucleotide band was excised using a synthesized 28-nucleotide RNA and denatured 10-bp DNA ladder (Invitrogen) as standards. The gel slice was physically disrupted and RNA was recovered in 300 mM NaOAc, 1 mM EDTA, 0.1 units/&#956;l SUPERase&#183;In (Ambion) overnight at room temperature. The solution was transferred to a Spin-X cellulose acetate filter (Corning) to remove gel debris and precipitated with glycogen and isopropanol. Size-selected fragmented RNA was denatured briefly and dephosphorylated in a 30 &#956;l reaction with 1 &#215; T4 polynucleotide kinase buffer without ATP (NEB, Ipswich, MA, USA), 20 units SUPERase&#183;In, and 15 units T4 polynucleotide kinase (NEB) at 37&#176;C for 1 hour. The reaction was precipitated, resuspended, briefly denatured, and poly-(A) tailed in a 25 &#956;l reaction with 1 &#215; poly-(A) polymerase buffer (NEB), 5 units SUPERase&#183;In, 1 mM ATP, and 1.25 units <it>E. coli </it>poly-(A) polymerase (NEB) at 37&#176;C for 10 minutes. Reactions were quenched with 80 &#956;l of 5 mM EDTA and precipitated.</p>
<p>Reverse transcription was carried out from the introduced poly-(A) tail anchor of denatured RNA using primer oNTI255 <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp> with the SuperScript III reverse-transcriptase system (Invitrogen) supplemented with 2 units/&#956;l of SUPERase&#183;In at 48&#176;C for 30 minutes. RNA was subsequently hydrolyzed in 0.1 M NaOH at 98&#176;C for 15 minutes and the sample was loaded in a 10% TBE-Urea polyacrylamide gel (BioRad); the extended first-strand product was excised and recovered as above in 300 mM NaCl, 10 mM Tris, pH 7.9, 1 mM EDTA. First-strand cDNA was circularized in a 20 &#956;l reaction with 1 &#215; CircLigase buffer (Epicentre, Madison, WI, USA), 50 &#956;M ATP, 2.5 mM MnCl<sub>2</sub>, and 1 &#956;l CircLigase (Epicentre) for 1 hour at 60&#176;C, and then heat-inactivated for 10 minutes at 80&#176;C.</p>
<p>Circularized cDNA template (1 &#956;l) was amplified using Phusion Hot Start High-Fidelity enzyme (NEB) and primers oNTI230 and oNTI231 <abbrgrp>
<abbr bid="B5">5</abbr>
</abbrgrp> to create DNA with Illumina cluster generation sequences on each end along with the Illumina small RNA sequencing primer binding site. PCR was carried out with an initial 30-second denaturation at 98&#176;C, followed by 8 cycles of 10-second denaturation at 98&#176;C, 10-second annealing at 60&#176;C, and 5-second extension at 72&#176;C. PCR product was loaded in a non-denaturing 10% TBE polyacrylamide gel (BioRad) and a 113- to 125-nucleotide band was excised using a 10-bp ladder as standard. DNA was recovered as previously described. Libraries were quantified using an Agilent Bioanalyzer, and 4 to 6 pM of template was used for cluster generation and sequenced on an Illumina Genome Analyzer II with the Illumina small RNA sequencing primer. Sequence tags were stripped of the terminal poly-(A) sequence and aligned to the <it>S. elongatus </it>genome with Bowtie <abbrgrp>
<abbr bid="B49">49</abbr>
</abbrgrp>. Stripping of terminal poly-(A) sequence at the end of each read will remove the introduced poly-(A) tail but will also remove any trailing adenines at the 3' end of the reverse-transcribed RNA fragment, biasing the 3' end determination of RNAs that end in trailing adenines. GenBank <ext-link ext-link-id="CP000100" ext-link-type="gen">CP000100</ext-link>, <ext-link ext-link-id="CP000101" ext-link-type="gen">CP000101</ext-link>, and <ext-link ext-link-id="S89470" ext-link-type="gen">S89470</ext-link> were used to align reads to the chromosome and endogenous plasmids. Uniquely mappable reads with a maximum of three mismatches were mapped to the genome and extended by the length of the individual read.</p>
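<p>The 3' end bias introduced by poly-(A) stripping can be illustrated with a short sketch. This is a toy example under stated assumptions rather than the processing code used here: reads are assumed to be written 5' to 3' in the sense of the original RNA, the sequences are invented, and the helper name <it>strip_poly_a</it> is hypothetical.</p>

```python
def strip_poly_a(read):
    """Remove every trailing 'A' from a read.

    This removes the introduced poly-(A) tail, but it cannot tell the tail
    apart from adenines that were already at the 3' end of the fragment.
    """
    return read.rstrip("A")


introduced_tail = "A" * 12

# Fragment ending in a non-A nucleotide: the 3' end is recovered exactly.
fragment_plain = "GGCTTCAGC"
print(strip_poly_a(fragment_plain + introduced_tail))  # GGCTTCAGC

# Fragment ending in AAA: the genuine adenines are lost with the tail,
# so the inferred 3' end is three nucleotides too short.
fragment_a_end = "GGCTTCAAA"
print(strip_poly_a(fragment_a_end + introduced_tail))  # GGCTTC
```

<p>The second case shows why the 3' ends of RNAs that terminate in adenines are systematically called upstream of their true position.</p>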
<p>A total of 22,375,035 uniquely mappable reads were mapped to the genome, yielding approximately 624 million bases of sequence that cover each nucleotide, strand-specifically, an average of approximately 115 times. These uniquely mappable reads exclude any reads from rRNA since multiple copies of each rRNA exist in the genome. Technical replicates showed very high Pearson correlation coefficients (r &gt; 0.99). RNA sequencing data are displayed and analyzed as coverage per nucleotide - defined as the number of times a given nucleotide position was observed in all the sequencing reads. Absolute transcript levels are assumed to be equal to the average coverage per nucleotide across the length of the transcript. All analysis was performed on the chromosome, although raw data for both endogenous plasmids are available.</p>
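<p>Coverage per nucleotide and the resulting transcript level can be sketched as follows. This is a minimal illustration that assumes reads on a single strand given as (start, length) pairs in 0-based coordinates with every start inside the genome; the function names are hypothetical and the toy reads are invented.</p>

```python
def coverage_per_nucleotide(reads, genome_length):
    """Count how many reads cover each position on one strand.

    `reads` is an iterable of (start, length) pairs, 0-based, with each
    start inside the genome (an assumption). Reads running past the end
    of the sequence are clipped.
    """
    diff = [0] * (genome_length + 1)
    for start, length in reads:
        end = min(start + length, genome_length)
        diff[start] += 1   # coverage rises where the read begins
        diff[end] -= 1     # and falls just past where it ends
    coverage = []
    running = 0
    for delta in diff[:genome_length]:
        running += delta
        coverage.append(running)
    return coverage


def transcript_level(coverage, start, end):
    """Average coverage per nucleotide over the half-open span [start, end)."""
    span = coverage[start:end]
    return sum(span) / len(span)


toy_reads = [(0, 4), (2, 4), (8, 4)]
toy_coverage = coverage_per_nucleotide(toy_reads, genome_length=10)
print(toy_coverage)  # [1, 1, 2, 2, 1, 1, 0, 0, 1, 1]
print(round(transcript_level(toy_coverage, 0, 6), 3))  # 1.333
```

<p>As a consistency check on the totals reported above, 624 million bases divided by 22,375,035 reads gives a mean read length of approximately 28 nucleotides, which matches the 25- to 30-nucleotide size selection.</p>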
</sec>
<sec>
<st>
<p>Strand-specific expression tiling microarray</p>
</st>
<p>Expression was measured using two separate custom designed two-color 244 k microarrays - one for the forward strand and another for the reverse strand (forward strand tiling array, Agilent Array ID 022715; reverse strand tiling array, Agilent Array ID 022716). Arrays were designed using eArray software (Agilent). Forward and reverse strand sequence is as defined by GenBank <ext-link ext-link-id="CP000100" ext-link-type="gen">CP000100</ext-link>, <ext-link ext-link-id="CP000101" ext-link-type="gen">CP000101</ext-link>, and <ext-link ext-link-id="S89470" ext-link-type="gen">S89470</ext-link> - which define the chromosome and two plasmid sequences, respectively.</p>
<p>All tiling probes were 60 nucleotides in length with 12-nucleotide spacing between probe starts such that probe<sub>i </sub>and probe<sub>i+1 </sub>overlapped by 48 nucleotides. A 6-nucleotide offset of the tile between strands allows for 6-nucleotide resolution of double stranded targets and 12-nucleotide resolution for strand-specific targets. In addition, each array included four temperature matched probes (80&#176;C) against each JGI predicted ORF, <it>luxA </it>through <it>luxE</it>, and <it>Arabidopsis </it>spike-in controls (Ambion). These additional probes are identical to those in Agilent Array ID 020846, as previously described <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp>.</p>
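<p>The tiling geometry described above can be checked with a short sketch. It is illustrative only: the probe length, spacing, and strand offset come from the text, whereas the 0-based coordinates, the choice of which strand carries the offset, and the function name <it>probe_starts</it> are assumptions.</p>

```python
PROBE_LENGTH = 60   # nucleotides per probe
SPACING = 12        # distance between consecutive probe starts on a strand
STRAND_OFFSET = 6   # shift of one strand's tile relative to the other


def probe_starts(region_length, offset=0):
    """Start positions of all probes that fit entirely within the region."""
    last_start = region_length - PROBE_LENGTH
    return list(range(offset, last_start + 1, SPACING))


forward = probe_starts(120)
reverse = probe_starts(120, offset=STRAND_OFFSET)

print(forward)                 # [0, 12, 24, 36, 48, 60]
print(reverse)                 # [6, 18, 30, 42, 54]
print(PROBE_LENGTH - SPACING)  # 48 nucleotides shared by adjacent probes

# Interleaving both strands gives a probe start every 6 nucleotides,
# which is the resolution available for double-stranded targets.
merged = sorted(forward + reverse)
print(merged[1] - merged[0])   # 6
```

<p>Adjacent probes on one strand share 60 - 12 = 48 nucleotides, and combining the two strands halves the effective spacing from 12 to 6 nucleotides.</p>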
<p>cDNA was prepared for each individual timepoint (foreground channel) as well as for a pool of all timepoints (background channel). The background channel consisted of a pool of samples collected at 4-hour intervals from 24 to 84 hours after release into continuous light. The foreground channel consisted of individual timepoints 60, 68, 72, and 80 hours after release into continuous light. The same samples were analyzed by non-tiling microarray in <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp>. Spike-in RNA was introduced at different concentrations and ratios to the foreground and background channels before reverse transcription to ensure proper ratio detection over a wide dynamic range. Total RNA (5 &#956;g; plus spike-ins) was reverse-transcribed with random 15-mer primers (Operon, Huntsville, AL, USA) and a 2:3 ratio of amino allyl-UTP:dTTP (Sigma, St. Louis, MO, USA) using the SuperScript III reverse-transcriptase system without amplification. RNA was hydrolyzed and cDNA was purified using Microcon 30 spin column (Millipore, Billerica, MA, USA).</p>
<p>First-strand cDNA was labeled with <it>N</it>-hydroxysuccinimide-ester cyanine 3 (Cy3, foreground) or cyanine 5 (Cy5, background) (GE Biosciences, Uppsala, Sweden) in 0.1 M sodium bicarbonate pH 9.0 for 6 hours. Labeled cDNA was purified (Microcon 30) in preparation for hybridization. Each array was hybridized with approximately 750 ng Cy3 and approximately 750 ng Cy5 labeled cDNA and rotated (five rotations per minute) at 60&#176;C for 17 hours in SureHyb chambers (Agilent). Arrays were subsequently washed in 6.7 &#215; SSPE and 0.005% N-lauryl sarcosine buffer for at least 1 minute, 0.67 &#215; SSPE and 0.005% N-lauryl sarcosine buffer for 1 minute, and then Agilent drying and ozone protection wash for 30 seconds at room temperature (1 &#215; SSPE = 0.15 M NaCl, 10 mM sodium phosphate, 1 mM EDTA, pH 7.4). The arrays were immediately scanned using an Axon 4000B scanner at 5-&#956;m resolution. The median intensity of the Cy3 and Cy5 fluorescence at each spot was extracted using GenePix software (Molecular Devices, Sunnyvale, CA, USA). For calculation of logarithmic ratios, Loess and quantile normalization were performed in succession using the MATLAB (MathWorks, Natick, MA, USA) bioinformatics toolbox.</p>
</sec>
<sec>
<st>
<p>ChIP sequencing of RNA polymerase</p>
</st>
<p>We crosslinked 250 ml of cells from continuous culture (OD<sub>750 </sub>0.15) with 1% formaldehyde for 15 minutes and then quenched them with 125 mM glycine for 5 minutes at room temperature. Cells were collected by centrifugation and washed twice with cold phosphate-buffered saline buffer, pH 7.4. The cell pellet was snap frozen in liquid nitrogen and stored at -80&#176;C. Samples were collected 32 to 52 hours after release into continuous light at 4-hour intervals. At the same time, samples were collected and processed for non-tiling microarray as described in <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp>.</p>
<p>ChIP was performed in a manner similar to that previously described <abbrgrp>
<abbr bid="B50">50</abbr>
<abbr bid="B51">51</abbr>
</abbrgrp>. Cells were mechanically lysed by beating with 0.1 mm glass beads in cold lysis buffer A (50 mM HEPES, pH 7.5, 140 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% Na-deoxycholate) with protease inhibitors (Roche, Basel, Switzerland). Chromatin was fragmented by sonication of the lysate to a median of approximately 300 bp and the protein concentration of the supernatant was measured by BCA (bicinchoninic acid) (Thermo, Rockford, IL, USA) using bovine serum albumin as standard. Lysate (750 &#956;g) was incubated overnight at 4&#176;C with 30 &#956;g of antibody - either RNA polymerase &#946; subunit antibody WP023 (Neoclone, Madison, WI, USA) or mouse whole IgG mock (Jackson ImmunoResearch, West Grove, PA, USA). We verified that the monoclonal RNA polymerase &#946; subunit antibody WP023 reacts with <it>S. elongatus </it>RNA polymerase &#946; by western blot analysis of whole cell extract, where it produces a single band of the expected size. Lysate was supplemented with Protein G Sepharose Fast-Flow beads (Invitrogen) and incubated for an additional 2 hours at 4&#176;C. After incubation, sepharose beads were washed in cold buffer at room temperature: 2 &#215; 5 minutes lysis buffer A; 1 &#215; 5 minutes lysis buffer B (50 mM HEPES, pH 7.5, 500 mM NaCl, 1 mM EDTA, 1% Triton X-100, 0.1% Na-deoxycholate); 1 &#215; 5 minutes wash buffer (10 mM Tris-HCl, pH 8.0, 250 mM LiCl, 1 mM EDTA, 0.5% NP-40, 0.1% Na-deoxycholate); 1 &#215; 5 minutes TE buffer (10 mM Tris-HCl, pH 8.0, 1 mM EDTA). Protein-DNA was eluted from beads by incubation of samples at 65&#176;C for 1 hour in elution buffer (50 mM Tris-HCl, pH 8.0, 10 mM EDTA, 1.0% SDS). Crosslinks were reversed in supernatant by incubation of samples at 65&#176;C overnight in elution buffer.
Western blotting of the supernatant from mock versus RNA polymerase immunoprecipitation showed 45% pull-down efficiency for the &#946; subunit and 25% co-immunoprecipitation of the &#946;' subunit, as detected with Neoclone antibodies WP023 and WP001, respectively. Proteins were digested with 0.2 mg/ml proteinase K for 2 hours at 37&#176;C. Nucleic acid was then purified with phenol/chloroform extraction and precipitated with ethanol and LiCl. Nucleic acid was re-suspended in TE buffer and RNA was digested in 20 &#956;g/ml RNase and subsequently phenol/chloroform purified. For input control, 5% of the volume of cell lysate was removed after sonication and used to prepare the input DNA. The ChIP DNA concentration was estimated with the PicoGreen DNA detection kit (Invitrogen).</p>
<p>ChIP sequencing libraries were prepared for samples from zeitgeber time (ZT) 32 (subjective dusk) and ZT 44 (subjective dawn), as these timepoints showed maximal/minimal gene expression for canonical circadian mRNAs <it>kaiC </it>and <it>purF </it>by microarray. Mock ChIP sequencing libraries were prepared for an equal mix of lysate from ZT 32 through ZT 52 (collected at 4-hour intervals). A total of six sequencing libraries were prepared (Table <tblr tid="T1">1</tblr>).</p>
<tbl hint_layout="single" id="T1"><title><p>Table 1</p></title><caption><p>RNA polymerase ChIP samples</p></caption><tblbdy cols="2">
      <r>
         <c ca="left">
            <p>
               <b>Sample</b>
            </p>
         </c>
         <c ca="center">
            <p>
               <b>Total aligned reads</b>
            </p>
         </c>
      </r>
      <r>
         <c cspan="2">
            <hr/>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>ZT 32 RNA pol ChIP</p>
         </c>
         <c ca="center">
            <p>8,815,678</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>ZT 32 input</p>
         </c>
         <c ca="center">
            <p>20,203,310</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>ZT 44 RNA pol ChIP</p>
         </c>
         <c ca="center">
            <p>11,201,620</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>ZT 44 input</p>
         </c>
         <c ca="center">
            <p>19,864,425</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>ZT 32 through 52 mock</p>
         </c>
         <c ca="center">
            <p>10,595,684</p>
         </c>
      </r>
      <r>
         <c ca="left">
            <p>ZT 32 through 52 input</p>
         </c>
         <c ca="center">
            <p>16,712,868</p>
         </c>
      </r>
   </tblbdy><tblfn>
      <p>ZT, zeitgeber time.</p>
   </tblfn></tbl>
<p>Sequencing libraries were prepared from 10 ng DNA following the Illumina ChIP protocol (revision A) and libraries sized between 200 and 300 bp were selected for amplification. Libraries were assayed with the Agilent Bioanalyzer and 8 pM of template was used for cluster generation. Libraries were sequenced using Illumina primers on an Illumina Genome Analyzer II, and each sequence tag was aligned to the <it>S. elongatus </it>genome with Bowtie <abbrgrp>
<abbr bid="B49">49</abbr>
</abbrgrp>. GenBank <ext-link ext-link-id="CP000100" ext-link-type="gen">CP000100</ext-link>, <ext-link ext-link-id="CP000101" ext-link-type="gen">CP000101</ext-link>, and <ext-link ext-link-id="S89470" ext-link-type="gen">S89470</ext-link> were used to align reads to the chromosome and endogenous plasmids. Uniquely mappable reads with a maximum of three mismatches were mapped to the genome. Reads were then extended 150 bp to cover the average length of insert DNA between sequencing adaptors as determined by the Agilent Bioanalyzer.</p>
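<p>The read-extension step described above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name <it>extended_coverage</it>, the 0-based coordinates, the convention that each read is given by the position of its 5' end on its own strand, and the treatment of the chromosome as linear are all assumptions made for illustration.</p>

```python
def extended_coverage(reads, genome_length, fragment_length=150):
    """Per-nucleotide coverage after extending each read to fragment_length.

    reads: iterable of (position, strand) pairs, where position is the
    0-based 5' end of the read on its own strand (assumed convention).
    The genome is treated as linear, which ignores chromosome circularity.
    """
    diff = [0] * (genome_length + 1)  # difference array for range updates
    for pos, strand in reads:
        if strand == "+":
            start, end = pos, pos + fragment_length
        else:
            start, end = pos - fragment_length + 1, pos + 1
        start = max(start, 0)
        end = min(end, genome_length)
        if end > start:
            diff[start] += 1
            diff[end] -= 1
    coverage, running = [], 0
    for delta in diff[:genome_length]:
        running += delta
        coverage.append(running)
    return coverage
```

<p>Plus-strand reads are extended toward higher coordinates and minus-strand reads toward lower coordinates, so that each extended read approximates the full insert from which it was sequenced.</p>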
<p>A comparison of change in RNA pol ChIP versus change in gene expression (measured by non-tiling microarray) at timepoints ZT 32 and ZT 44 is shown in Figure S3 in Additional file <supplr sid="S2">2</supplr>. All other analysis was performed on the sum of the normalized libraries from ZT 32 and ZT 44, which was normalized to a mean coverage of 200 reads per nucleotide. Additional normalization by input does not change conclusions (Figure S2 in Additional file <supplr sid="S2">2</supplr>). A representative region of the genome is presented in Figure S2 in Additional file <supplr sid="S2">2</supplr>. All analysis was performed on the chromosome, although raw data for both endogenous plasmids is available.</p>
</sec>
<sec>
<st>
<p>Calculation of percent of genome transcribed</p>
</st>
<p>The percent of transcription along the <it>S. elongatus </it>chromosome was calculated by imposing a coverage cutoff for transcription of two reads per nucleotide. If a nucleotide is expressed at or over this cutoff it is regarded as transcribed.</p>
<p>This conservative cutoff indicates that only approximately 84.7% of the nucleotides within annotated JGI chromosomal ORFs are transcribed. Of the approximately 15.3% of nucleotides within annotated ORFs that do not pass this cutoff, approximately 41.4% are within an ORF that has an average number of reads per nucleotide of less than 2, which corresponds to approximately 1 RNA per 15 cells when we assume a total of 1,500 mRNAs per cell (245 of 2,665 chromosomal ORFs have &lt;2 reads per nucleotide).</p>
<p>Using this cutoff we find 54.7% of each strand is transcribed and 88.0% of the chromosome is transcribed on either the plus or minus strand. That is, on any given strand and at any given chromosomal position, there is a 54.7% chance that the nucleotide is transcribed. Similarly, at any given chromosomal position, there is an 88% chance that the nucleotide is transcribed on either the plus or minus strand. Eighty-two percent of non-coding regions are transcribed on either the plus or minus strand.</p>
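<p>As a hedged illustration of how these percentages can be computed from strand-specific coverage, the sketch below applies the two-reads-per-nucleotide cutoff to each strand. The function name <it>fraction_transcribed</it> is hypothetical, and averaging the two strands to obtain the per-strand figure is an assumption about how that number was derived.</p>

```python
def fraction_transcribed(plus, minus, cutoff=2):
    """Return (per-strand fraction, either-strand fraction) at or above cutoff.

    plus, minus: per-nucleotide read coverage of the two strands.
    The per-strand value averages both strands (an assumption of this sketch).
    """
    if len(plus) != len(minus):
        raise ValueError("strand coverage vectors must have equal length")
    n = len(plus)
    plus_on = [c >= cutoff for c in plus]
    minus_on = [c >= cutoff for c in minus]
    per_strand = (sum(plus_on) + sum(minus_on)) / (2 * n)
    either = sum(p or m for p, m in zip(plus_on, minus_on)) / n
    return per_strand, either
```

<p>For example, <it>fraction_transcribed([0, 2, 5, 1], [3, 0, 2, 0])</it> returns (0.5, 0.75): half of all strand positions pass the cutoff, and three of the four positions are transcribed on at least one strand.</p>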
</sec>
<sec>
<st>
<p>Identification of 5' and 3' ends of Joint Genome Institute predicted ORFs and definition of operons</p>
</st>
<p>The 5' and 3' ends of all JGI predicted ORFs (and rRNA and tRNA) with an average coverage of at least two reads per nucleotide were identified using a probability-based approach that uses <it>a priori </it>knowledge of translation start and stop positions. Of 2,665 chromosomal ORFs (and rRNA and tRNA), 2,420 had an average number of reads per nucleotide of &#8805;2. For every predicted translation start, we searched for the first upstream nucleotide <it>i </it>(<it>i </it>- 1 is upstream of <it>i</it>) on the same strand that was not within a JGI predicted ORF and that satisfied one of the following three criteria: (1) binomial<sub>cdf </sub>(reads<sub><it>i-1</it></sub>, reads<sub><it>i</it></sub> + reads<sub><it>i-1</it></sub>, 0.5) &lt; 0.01 and reads<sub><it>i</it></sub>/reads<sub><it>i-1</it></sub> &#8805; 2; (2) binomial<sub>cdf </sub>(reads<sub><it>i-2</it></sub>, reads<sub><it>i</it></sub> + reads<sub><it>i-2</it></sub>, 0.5) &lt; 0.01 and reads<sub><it>i</it></sub>/reads<sub><it>i-2</it></sub> &#8805; 2; or (3) reads<sub><it>i-1</it></sub> &lt; 2.</p>
<p>Here, binomial<sub>cdf </sub>(k, n, p) is the probability of getting at most k successes in n trials when p is the success probability of each trial. This <it>i </it>was designated the 5' transcription start site. The distance of predicted 5' ends to those published in previous studies is reported in Table S4 in Additional file <supplr sid="S1">1</supplr> and examples are shown in Figure S1 in Additional file <supplr sid="S2">2</supplr>. Similarly, for every predicted translation stop codon, we searched for the first downstream nucleotide <it>i </it>that was not within a JGI predicted ORF and that satisfied one of the same criteria. This <it>i </it>was designated the 3' transcription end. The 5' ends tend to be better defined than the 3' ends, possibly related to the biology of transcription termination. ORFs that shared the same 5' transcription start site were defined as being on the same operon. We observed 43 cases of multiple transcription start sites - the presence of a 5' transcription start within another transcript. All identified transcripts are reported in Table S1 in Additional file <supplr sid="S1">1</supplr>. A total of 1,473 transcripts were identified. All analysis was performed on the subset of 1,415 transcripts defined as mRNA transcripts as they do not contain any tRNA or rRNA. Note that in some cases a tRNA was predicted to be on the same transcript as an ORF because the high expression of the tRNA obscures the transcription boundary.</p>
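<p>The upstream search can be sketched as follows. This is an illustration under stated assumptions rather than the authors' MATLAB code: coverage values are taken to be non-negative integers, the binomial term is evaluated as the standard cumulative distribution function with success probability 0.5, the plus strand is assumed so that upstream means lower index, and the names <it>binom_cdf_half</it>, <it>is_boundary</it> and <it>find_five_prime_end</it> are hypothetical.</p>

```python
from math import comb


def binom_cdf_half(k, n):
    """P(X at most k) for X ~ Binomial(n, 0.5), using exact integer sums."""
    if k >= n:
        return 1.0
    return sum(comb(n, j) for j in range(k + 1)) / 2 ** n


def is_boundary(reads, i):
    """True if position i passes any of the three criteria in the text."""
    for lag in (1, 2):  # criteria (1) and (2) compare i with i-1 and i-2
        up, here = reads[i - lag], reads[i]
        if 0.01 > binom_cdf_half(up, up + here) and here >= 2 * up:
            return True
    return 2 > reads[i - 1]  # criterion (3): upstream coverage below cutoff


def find_five_prime_end(reads, in_orf, translation_start):
    """Walk upstream from translation_start; return the first qualifying i.

    reads: integer coverage per nucleotide (plus strand assumed).
    in_orf: booleans marking positions inside any predicted ORF.
    Returns None if no position qualifies before the start of the array.
    """
    for i in range(translation_start, 1, -1):
        if not in_orf[i] and is_boundary(reads, i):
            return i
    return None
```

<p>The ratio test is written as a multiplication (reads at <it>i </it>at least twice the upstream reads) to avoid division by zero; positions with zero upstream coverage are in any case accepted by criterion (3).</p>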
</sec>
<sec>
<st>
<p>Identification of non-coding transcripts</p>
</st>
<p>Non-coding transcripts were identified using a multi-tiered approach that first identifies transcribed regions and then estimates their 3' and 5' positions.</p>
<p>First, 15,000 nucleotide intervals of the chromosome (with overlap of 5,000 nucleotides) were optimally segmented into 30 segments of approximately constant signal, yielding a total of 8,070 segments per strand. Segmentation was performed in MATLAB to minimize the cost function:</p>
<p>
<display-formula>
<graphic file="gb-2011-12-5-r47-i1.gif"/>
</display-formula>
</p>
<p>where <it>y</it>
<sub>
<it>i </it>
</sub>is the log<sub>2</sub>(1 + reads<sub>i</sub>) at nucleotide <it>i</it>, <inline-formula>
<graphic file="gb-2011-12-5-r47-i2.gif"/>
</inline-formula> is the arithmetic mean of log<sub>2</sub>(1 + reads) along segment <it>s</it>, and <it>t</it><sub>1</sub>, ..., <it>t</it><sub><it>s</it></sub> are the segment boundaries <abbrgrp>
<abbr bid="B52">52</abbr>
<abbr bid="B53">53</abbr>
<abbr bid="B54">54</abbr>
</abbrgrp>. This change-point approach more accurately discriminates transcribed and non-transcribed segments than the running window approach and requires only one user-defined parameter - the total number of transcribed segments - which we set at 1 per 500 nucleotides strand-specifically.</p>
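<p>A change-point segmentation of this kind can be sketched with dynamic programming. The cost function is given above only as an image, so the sketch assumes the usual least-squares form (the sum over segments of squared deviations of <it>y </it>from the segment mean); the function name <it>segment_least_squares</it> is hypothetical, and this pure-Python version runs in time proportional to the number of segments times the square of the interval length, so it is meant for small examples rather than 15,000-nucleotide intervals.</p>

```python
def segment_least_squares(y, n_segments):
    """Split y into n_segments contiguous pieces minimising the summed squared
    deviation from each segment mean (an assumed cost; see the lead-in).
    Returns the start index of every segment after the first."""
    n = len(y)
    if n_segments > n or 1 > n_segments:
        raise ValueError("need between 1 and len(y) segments")
    s1, s2 = [0.0], [0.0]  # prefix sums of y and y squared
    for v in y:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def sse(a, b):
        """Squared error of y[a:b] around its own mean."""
        total = s1[b] - s1[a]
        return (s2[b] - s2[a]) - total * total / (b - a)

    inf = float("inf")
    # cost[k][j]: best cost of splitting y[:j] into k segments
    cost = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for a in range(k - 1, j):
                c = cost[k - 1][a] + sse(a, j)
                if cost[k][j] > c:
                    cost[k][j] = c
                    back[k][j] = a
    bounds, j = [], n
    for k in range(n_segments, 1, -1):  # trace boundaries back from the end
        j = back[k][j]
        bounds.append(j)
    return bounds[::-1]
```

<p>Here <it>y </it>would hold the log<sub>2</sub>(1 + reads) values of one strand-specific interval, and the returned indices play the role of the segment boundaries.</p>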
<p>Next, all segments that correspond to non-transcribed regions - mean coverage less than two reads per nucleotide - were removed. Segments that overlapped with an annotated transcript (see previous section) were removed and the remaining segments were consolidated. The exact 5' and 3' end of each segment was determined using the same algorithm described in the previous section except 5' and 3' ends were not allowed to overlap with an annotated operon. A total of 1,579 non-coding transcripts were detected using this method. All non-coding transcripts are reported in Table S2 in Additional file <supplr sid="S1">1</supplr>.</p>
</sec>
<sec>
<st>
<p>Identification of high-confidence non-coding transcripts</p>
</st>
<p>Tiling microarray ratios were utilized to identify a set of high-confidence non-coding transcripts. We took advantage of the fact that transcripts have high Pearson cross-correlation among internal probes (probes that are fully internal to the transcript) across all circadian timepoints <abbrgrp>
<abbr bid="B55">55</abbr>
</abbrgrp>. That is, when the ratio of one probe changes at a particular circadian time, the ratio of the other probes within the transcript is similarly affected. First, we assembled the distribution of mean cross-correlation values among internal probes for all predicted JGI ORFs. This formed the expected cumulative distribution for mean cross-correlation of transcribed regions. All non-coding transcripts whose mean cross-correlation was above the 5th percentile of this expected distribution were considered high-confidence; that is, a non-coding transcript was accepted if its mean cross-correlation exceeded that of the bottom 5% of ORFs. Table S2 in Additional file <supplr sid="S1">1</supplr> indicates whether a non-coding transcript was designated as high-confidence. Of the 1,579 non-coding transcripts, 157 could not be assayed because they were smaller than the probe width of 60 nucleotides. Of the remaining 1,422 non-coding transcripts, 983 (approximately 70%) passed this cutoff.</p>
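<p>The scoring and cutoff can be illustrated with a short sketch. It assumes that the mean cross-correlation is the average pairwise Pearson correlation between the time courses of probes internal to one transcript, and that the cutoff is read off the sorted ORF scores; the function names are hypothetical, each transcript needs at least two probes, and probes with constant signal (zero variance) are not handled.</p>

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equally long numeric sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mean_cross_correlation(probes):
    """Mean pairwise Pearson correlation among probe time courses."""
    pairs = list(combinations(probes, 2))
    return mean(pearson(a, b) for a, b in pairs)


def percentile_cutoff(values, fraction=0.05):
    """Value found at the given lower fraction of the sorted reference values."""
    ordered = sorted(values)
    index = int(fraction * len(ordered))
    return ordered[min(index, len(ordered) - 1)]
```

<p>A non-coding transcript would then be called high-confidence when its <it>mean_cross_correlation </it>exceeds <it>percentile_cutoff </it>applied to the scores of all annotated ORFs.</p>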
</sec>
<sec>
<st>
<p>Identification of high-confidence circadian non-coding transcripts</p>
</st>
<p>Circadian transcripts corresponding to annotated JGI ORFs have been previously described <abbrgrp>
<abbr bid="B3">3</abbr>
<abbr bid="B4">4</abbr>
</abbrgrp>. To identify potential non-coding circadian transcripts, we first calculated the relative gene expression of each non-coding transcript at each timepoint by taking the arithmetic mean of gene expression ratios across all microarray probes internal to the transcript. This gives us the relative expression of each non-coding transcript at each timepoint relative to the background. Then we calculated the gene expression ratio between the two most extreme (in gene expression) circadian timepoints (circadian time (CT) 12 (subjective dusk) and CT 20 (subjective dawn), corresponding to ZT 60 and ZT 72, respectively) <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp>. Large negative ratios are indicative of dawn-peaking transcripts and large positive ratios are indicative of dusk-peaking transcripts. To assign a designation of circadian behavior to each non-coding transcript, we calculated the same ratios for all annotated ORFs - where the circadian behavior is already known from <abbrgrp>
<abbr bid="B3">3</abbr>
</abbrgrp>. We found the ratio for annotated ORFs at which a cumulative 10% false positive rate existed for dawn or dusk genes, and used these cutoffs to identify potential circadian non-coding transcripts. Expression ratios and indication of potential circadian behavior are shown in Table S2 in Additional file <supplr sid="S1">1</supplr>. The timecourse expression of all high-confidence circadian non-coding RNAs is shown in Figure S8 in Additional file <supplr sid="S2">2</supplr>. Although only 106 of 1,579 non-coding transcripts pass this strict cutoff (10% false-positive rate), by comparing the distribution of ratios for annotated and non-coding transcripts, we estimate that a total of 817 non-coding transcripts are circadian.</p>
</sec>
<sec>
<st>
<p>Identification of RNA polymerase peaks</p>
</st>
<p>RNA pol ChIP peaks were identified in the sum of timepoints ZT 32 and ZT 44 hours using a maxgap/minrun approach similar to the first pass of PeakSeq <abbrgrp>
<abbr bid="B56">56</abbr>
</abbrgrp>. All peaks larger than 100 nucleotides and separated by at least 20 nucleotides in the ChIP sample were assembled for thresholds starting from the mean coverage to ten times the mean coverage with increments of one-twentieth mean coverage. The unique peaks were selected and consolidated such that no peak maxima are within 150 nucleotides of each other. This method accurately captures the wide dynamic range of peaks present in the data. All RNA pol peaks and their enrichment over mock are reported in Table S3 in Additional file <supplr sid="S1">1</supplr>; 87% of RNA pol peaks are enriched over the mock (<it>P </it>&lt; 0.1). Those peaks that are not enriched over mock appear to be actual peaks in RNA pol ChIP, but these RNA pol ChIP peaks are smaller than the mock background, which is elevated with respect to the ChIP background after both data sets are normalized for the number of reads (Figure S2 in Additional file <supplr sid="S2">2</supplr>). All RNA pol peaks were used in analysis and results do not change when only peaks enriched over the mock are used. Figure S2 in Additional file <supplr sid="S2">2</supplr> shows peak identification over a representative genomic region.</p>
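<p>A minimal sketch of a maxgap/minrun pass at a single threshold, together with the threshold series described above, is given below. The function names are hypothetical, the handling of ties at the threshold is an assumption, and the final consolidation of peak maxima lying within 150 nucleotides of each other is not implemented.</p>

```python
def call_regions(coverage, threshold, max_gap=20, min_run=100):
    """Return (start, end) half-open regions whose coverage exceeds threshold.

    Runs separated by fewer than max_gap nucleotides are merged, and merged
    regions must be longer than min_run nucleotides to be kept.
    """
    regions, start = [], None
    for i, value in enumerate(coverage):
        above = value > threshold
        if above and start is None:
            start = i
        elif not above and start is not None:
            regions.append([start, i])
            start = None
    if start is not None:
        regions.append([start, len(coverage)])
    merged = []
    for region in regions:
        if merged and max_gap > region[0] - merged[-1][1]:
            merged[-1][1] = region[1]  # close a short gap
        else:
            merged.append(region)
    return [(s, e) for s, e in merged if e - s > min_run]


def threshold_series(mean_coverage, steps_per_mean=20, max_multiple=10):
    """Thresholds from the mean to max_multiple times the mean, in steps of
    one-twentieth of the mean by default."""
    n = (max_multiple - 1) * steps_per_mean
    return [mean_coverage * (1 + k / steps_per_mean) for k in range(n + 1)]
```

<p>Calling <it>call_regions </it>once per value of <it>threshold_series </it>and pooling the unique regions mirrors the multi-threshold assembly described in the text; low thresholds recover broad, weak peaks and high thresholds resolve strong, narrow ones.</p>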
</sec>
<sec>
<st>
<p>Distribution of mRNA per cell</p>
</st>
<p>The distribution of mRNA per cell was calculated by assuming a total of 1,500 mRNAs per cell <abbrgrp>
<abbr bid="B8">8</abbr>
<abbr bid="B9">9</abbr>
</abbrgrp>. For each mRNA species <it>m</it>
<sub>
<it>1</it>
</sub>,..., <it>m</it>
<sub>
<it>1415</it>
</sub>, the abundance of the species <it>m</it>
<sub>
<it>i </it>
</sub>per cell was given by:</p>
<p>
<display-formula>
<graphic file="gb-2011-12-5-r47-i3.gif"/>
</display-formula>
</p>
<p>where <it>&#947;</it>
<sub>
<it>i </it>
</sub>is the mean number of reads per nucleotide within the mRNA species <it>i</it>. All mRNA-per-cell estimates are reported in Table S1 in Additional file <supplr sid="S1">1</supplr>. Only mRNAs with <it>&#947;</it>
<sub>
<it>i </it>
</sub>greater than 2 are shown in Figure <figr fid="F2">2a</figr>.</p>
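<p>Because the equation above is available only as an image, the following sketch rests on an assumed form: the 1,500 mRNAs per cell are apportioned among species in proportion to their mean reads per nucleotide, so that the abundance of species <it>i </it>is 1,500 multiplied by its mean coverage and divided by the sum of mean coverages over all species. Both this form and the function name <it>mrna_per_cell </it>are assumptions.</p>

```python
def mrna_per_cell(mean_reads, total_mrna=1500):
    """Apportion total_mrna among species in proportion to mean coverage.

    mean_reads: mean reads per nucleotide of each mRNA species.
    The proportional form is an assumption of this sketch, not a quotation
    of the published equation.
    """
    total = sum(mean_reads)
    if total == 0:
        raise ValueError("total coverage must be positive")
    return [total_mrna * g / total for g in mean_reads]
```

<p>For example, <it>mrna_per_cell([1, 2, 3], total_mrna=60) </it>returns [10.0, 20.0, 30.0].</p>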
</sec>
<sec>
<st>
<p>Calculation of minimum free energy of secondary structure of RNA</p>
</st>
<p>Minimum free energy of secondary structure of RNA was calculated with MATLAB Bioinformatics Toolbox command <it>rnafold </it>- minimum free energy is calculated using a thermodynamic nearest-neighbor approach <abbrgrp>
<abbr bid="B57">57</abbr>
<abbr bid="B58">58</abbr>
</abbrgrp> and is reported in kcal/mol. All free energies are calculated on 60-nucleotide RNA fragments using a sliding window of 10 nucleotides.</p>
<p>To test whether minimum free energy changes were dependent on dinucleotide frequency of the RNA, dinucleotide shuffled sequences with the same overall dinucleotide content distribution were generated using a first order Markov model. That is, for each position in the sliding window, the dinucleotide content of all sequences was assembled. Then an equal number of dinucleotide shuffled sequences were randomly generated maintaining the same overall dinucleotide content distribution.</p>
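<p>Generating control sequences from a first-order Markov model can be sketched as below. This simplified version pools transitions over all supplied sequences rather than tracking each window position separately, and the function names, the fallback used when a nucleotide has no observed successor, and the use of Python's <it>random.Random </it>generator are assumptions made for illustration.</p>

```python
import random
from collections import defaultdict


def fit_first_order(sequences):
    """Count first nucleotides and nucleotide-to-nucleotide transitions."""
    starts = defaultdict(int)
    transitions = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        if seq:
            starts[seq[0]] += 1
        for a, b in zip(seq, seq[1:]):
            transitions[a][b] += 1
    return starts, transitions


def sample_sequence(starts, transitions, length, rng):
    """Draw one sequence of the given length (at least 1) from the chain."""
    letters, weights = zip(*sorted(starts.items()))
    seq = [rng.choices(letters, weights)[0]]
    while length > len(seq):
        nxt = transitions.get(seq[-1])
        if not nxt:  # nucleotide never seen with a successor: restart
            nxt = starts
        letters, weights = zip(*sorted(nxt.items()))
        seq.append(rng.choices(letters, weights)[0])
    return "".join(seq)
```

<p>Because each nucleotide is drawn conditional on its predecessor, the expected dinucleotide frequencies of the sampled sequences match those of the input set, while any longer-range structure such as a specific stem-loop is lost.</p>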
<p>At the 3' end of transcripts, a dip in minimum free energy was not observed in the dinucleotide shuffled sequences, but was observed in native sequences (Figure S5c in Additional file <supplr sid="S2">2</supplr>). In addition, the minimum free energy at the dip in native sequences (mean = -16.11 kcal/mol) was significantly lower than that in dinucleotide shuffled sequences at the same position (mean = -13.95 kcal/mol; Z = -0.52, <it>P </it>= 1.66e-31). Z-scores were calculated as the difference in mean of native and dinucleotide shuffled sequences divided by the standard deviation of dinucleotide shuffled sequences and <it>P</it>-value was calculated using the two-sided Wilcoxon rank sum test. This suggests that a particular stem-loop feature, likely associated with transcription termination, is present at the end of transcripts.</p>
<p>At the RNA pol peaks at the 5' ends of genes (Figure S5a in Additional file <supplr sid="S2">2</supplr>), the change in minimum free energy in native and dinucleotide shuffled sequences was nearly identical, suggesting that changes in dinucleotide (or nucleotide) frequency and not a discrete stem loop structure are responsible for the transition in free energy. A change in nucleotide content does occur at the position of the RNA pol peaks (Figure S5b in Additional file <supplr sid="S2">2</supplr>), and may play a role in RNA pol pausing by an unknown mechanism. A drop in minimum free energy in native and dinucleotide shuffled sequences is also observed globally when all transcripts are aligned by their 5' end (Figure S5d in Additional file <supplr sid="S2">2</supplr>). A similar change in nucleotide content occurs approximately 100 nucleotides from the 5' end of transcripts (Figure S4b in Additional file <supplr sid="S2">2</supplr>). These global sequence changes proximal to the 5' end of transcripts may coincide with our observation of global RNA pol pausing internal to the 5' ends of transcripts.</p>
</sec>
<sec>
<st>
<p>Calculation of DNA melting temperature</p>
</st>
<p>Melting temperature was calculated with MATLAB Bioinformatics Toolbox command <it>oligoprop </it>- melting temperatures are calculated using a nearest-neighbor approach with default parameters <abbrgrp>
<abbr bid="B59">59</abbr>
</abbrgrp>.</p>
</sec>
<sec>
<st>
<p>Identification of -10 element in promoters</p>
</st>
<p>All unique mRNA transcription start sites were aligned and the +1 to -30 sequences were input into CONSENSUS-V6C <abbrgrp>
<abbr bid="B60">60</abbr>
</abbrgrp>, which finds a consensus pattern of defined width (width = 8 nucleotides) in unaligned sequences. This procedure identified the 5' --Ta-aaT 3' motif, corresponding to the -10 element (Pribnow box), with ln(p) = -4092.23, where p is the probability of identifying a motif with the same or higher information content in an arbitrary alignment. This motif was found at slightly different positions in each of the sequences. To identify the true -10 element while removing any potential false positives, the motif from the subset of alignments that identified the initial nucleotide of the motif at -8 (285 of 1,416 transcripts) is shown in Figure <figr fid="F3">3b</figr>. In subsequent searches using CONSENSUS-V6C or other motif algorithms, no motif was found upstream of the -10 motif, where a -35 motif may be expected.</p>
</sec>
</sec>
<sec>
<st>
<p>Abbreviations</p>
</st>
<p>bp: base pair; ChIP: chromatin immunoprecipitation; CT: circadian time; HIP1: highly iterated palindrome 1; JGI: Joint Genome Institute; OD: optical density; ORF: open reading frame; RFAM: RNA Families; RNA pol: RNA polymerase; UTR: untranslated region; ZT: zeitgeber time.</p>
</sec>
<sec>
<st>
<p>Competing interests</p>
</st>
<p>The authors declare that they have no competing interests.</p>
</sec>
<sec>
<st>
<p>Authors' contributions</p>
</st>
<p>VV and EKO designed experiments; VV and IHJ performed experiments; VV and IHJ analyzed data; VV and EKO wrote the paper. All authors have read and approved the manuscript.</p>
</sec>
<sec>
<st>
<p>Data availability</p>
</st>
<p>All data sets have been uploaded to the Gene Expression Omnibus under accession [GEO:GSE29264].</p>
</sec>
</bdy><bm>
<ack>
<sec>
<st>
<p>Acknowledgements</p>
</st>
<p>We thank members of the O'Shea laboratory for discussion and commentary. We thank Dr Susan Golden for the <it>S. elongatus </it>strain AMC 408. This work was supported by the Howard Hughes Medical Institute, National Defense Science and Engineering Fellowship (VV), and National Science Foundation Graduate Research Fellowship (VV).</p>
</sec>
</ack>
<refgrp><bibl id="B1"><title><p>Photosynthesis and photoreduction by the blue green alga, <it>Synechococcus elongatus</it>, Nag.</p></title><aug><au><snm>Frenkel</snm><fnm>A</fnm></au><au><snm>Gaffron</snm><fnm>H</fnm></au><au><snm>Battley</snm><fnm>EH</fnm></au></aug><source>Biol Bull</source><pubdate>1950</pubdate><volume>99</volume><fpage>157</fpage><lpage>162</lpage><xrefbib><pubidlist><pubid idtype="doi">10.2307/1538735</pubid><pubid idtype="pmpid" link="fulltext">14791416</pubid></pubidlist></xrefbib></bibl><bibl id="B2"><title><p>Circadian rhythms in prokaryotes: luciferase as a reporter of circadian gene expression.</p></title><aug><au><snm>Kondo</snm><fnm>T</fnm></au><au><snm>Strayer</snm><fnm>CA</fnm></au><au><snm>Kulkarni</snm><fnm>RD</fnm></au><au><snm>Taylor</snm><fnm>W</fnm></au><au><snm>Ishiura</snm><fnm>M</fnm></au><au><snm>Golden</snm><fnm>SS</fnm></au><au><snm>Johnson</snm><fnm>CH</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>1993</pubdate><volume>90</volume><fpage>5672</fpage><lpage>5676</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.90.12.5672</pubid><pubid idtype="pmcid">46783</pubid><pubid idtype="pmpid" link="fulltext">8516317</pubid></pubidlist></xrefbib></bibl><bibl id="B3"><title><p>Supercoiling drives circadian gene expression in cyanobacteria.</p></title><aug><au><snm>Vijayan</snm><fnm>V</fnm></au><au><snm>Zuzow</snm><fnm>R</fnm></au><au><snm>O&apos;Shea</snm><fnm>EK</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2009</pubdate><volume>106</volume><fpage>22564</fpage><lpage>22568</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0912673106</pubid><pubid idtype="pmcid">2799730</pubid><pubid idtype="pmpid" link="fulltext">20018699</pubid></pubidlist></xrefbib></bibl><bibl id="B4"><title><p>Cyanobacterial daily life with Kai-based circadian and diurnal genome-wide transcriptional control in <it>Synechococcus 
elongatus</it>.</p></title><aug><au><snm>Ito</snm><fnm>H</fnm></au><au><snm>Mutsuda</snm><fnm>M</fnm></au><au><snm>Murayama</snm><fnm>Y</fnm></au><au><snm>Tomita</snm><fnm>J</fnm></au><au><snm>Hosokawa</snm><fnm>N</fnm></au><au><snm>Terauchi</snm><fnm>K</fnm></au><au><snm>Sugita</snm><fnm>C</fnm></au><au><snm>Sugita</snm><fnm>M</fnm></au><au><snm>Kondo</snm><fnm>T</fnm></au><au><snm>Iwasaki</snm><fnm>H</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2009</pubdate><volume>106</volume><fpage>14168</fpage><lpage>14173</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0902587106</pubid><pubid idtype="pmcid">2729038</pubid><pubid idtype="pmpid" link="fulltext">19666549</pubid></pubidlist></xrefbib></bibl><bibl id="B5"><title><p>Genome-wide analysis <it>in vivo </it>of translation with nucleotide resolution using ribosome profiling.</p></title><aug><au><snm>Ingolia</snm><fnm>NT</fnm></au><au><snm>Ghaemmaghami</snm><fnm>S</fnm></au><au><snm>Newman</snm><fnm>JRS</fnm></au><au><snm>Weissman</snm><fnm>JS</fnm></au></aug><source>Science</source><pubdate>2009</pubdate><volume>324</volume><fpage>218</fpage><lpage>223</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1168978</pubid><pubid idtype="pmcid">2746483</pubid><pubid idtype="pmpid" link="fulltext">19213877</pubid></pubidlist></xrefbib></bibl><bibl id="B6"><title><p>Singular over-representation of an octameric palindrome, HIP1, in DNA from many cyanobacteria.</p></title><aug><au><snm>Robinson</snm><fnm>NJ</fnm></au><au><snm>Robinson</snm><fnm>PJ</fnm></au><au><snm>Gupta</snm><fnm>A</fnm></au><au><snm>Bleasby</snm><fnm>AJ</fnm></au><au><snm>Whitton</snm><fnm>BA</fnm></au><au><snm>Morby</snm><fnm>AP</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>1995</pubdate><volume>23</volume><fpage>729</fpage><lpage>735</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/23.5.729</pubid><pubid idtype="pmcid">306751</pubid><pubid idtype="pmpid" 
link="fulltext">7708486</pubid></pubidlist></xrefbib></bibl><bibl id="B7"><title><p>The Cyanobacterium <it>Synechocystis </it>sp. strain PCC 6803 expresses a DNA methyltransferase specific for the recognition sequence of the restriction endonuclease <it>PvuI</it>.</p></title><aug><au><snm>Scharnagl</snm><fnm>M</fnm></au><au><snm>Richter</snm><fnm>S</fnm></au><au><snm>Hagemann</snm><fnm>M</fnm></au></aug><source>J Bacteriol</source><pubdate>1998</pubdate><volume>180</volume><fpage>4116</fpage><lpage>4122</lpage><xrefbib><pubidlist><pubid idtype="pmcid">107406</pubid><pubid idtype="pmpid" link="fulltext">9696758</pubid></pubidlist></xrefbib></bibl><bibl id="B8"><aug><au><snm>Ingraham</snm><fnm>JL</fnm></au><au><snm>Maaloe</snm><fnm>O</fnm></au><au><snm>Neidhardt</snm><fnm>FC</fnm></au></aug><source>Growth of the Bacterial Cell</source><publisher>Sunderland, MA: Sinauer Associates</publisher><pubdate>1983</pubdate></bibl><bibl id="B9"><title><p>Quantifying <it>E. coli </it>proteome and transcriptome with single-molecule sensitivity in single cells.</p></title><aug><au><snm>Taniguchi</snm><fnm>Y</fnm></au><au><snm>Choi</snm><fnm>PJ</fnm></au><au><snm>Li</snm><fnm>GW</fnm></au><au><snm>Babu</snm><fnm>M</fnm></au><au><snm>Hearn</snm><fnm>J</fnm></au><au><snm>Emili</snm><fnm>A</fnm></au><au><snm>Xie</snm><fnm>XS</fnm></au></aug><source>Science</source><pubdate>2010</pubdate><volume>329</volume><fpage>533</fpage><lpage>538</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1188308</pubid><pubid idtype="pmcid">2922915</pubid><pubid idtype="pmpid" link="fulltext">20671182</pubid></pubidlist></xrefbib></bibl><bibl id="B10"><title><p>Short RNA half-lives in the slow-growing marine cyanobacterium <it>Prochlorococcus</it>.</p></title><aug><au><snm>Steglich</snm><fnm>C</fnm></au><au><snm>Lindell</snm><fnm>D</fnm></au><au><snm>Futschik</snm><fnm>M</fnm></au><au><snm>Rector</snm><fnm>T</fnm></au><au><snm>Chisholm</snm><fnm>SW</fnm></au></aug><source>Genome 
Biol</source><pubdate>2010</pubdate><volume>11</volume><fpage>R54</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2010-11-5-r54</pubid><pubid idtype="pmcid">2897979</pubid><pubid idtype="pmpid" link="fulltext">20482874</pubid></pubidlist></xrefbib></bibl><bibl id="B11"><title><p>The KEGG databases at GenomeNet.</p></title><aug><au><snm>Kanehisa</snm><fnm>M</fnm></au><au><snm>Goto</snm><fnm>S</fnm></au><au><snm>Kawashima</snm><fnm>S</fnm></au><au><snm>Nakaya</snm><fnm>A</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2002</pubdate><volume>30</volume><fpage>42</fpage><lpage>44</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/30.1.42</pubid><pubid idtype="pmcid">99091</pubid><pubid idtype="pmpid" link="fulltext">11752249</pubid></pubidlist></xrefbib></bibl><bibl id="B12"><title><p>MicrobesOnline: an integrated portal for comparative and functional genomics.</p></title><aug><au><snm>Dehal</snm><fnm>PS</fnm></au><au><snm>Joachimiak</snm><fnm>MP</fnm></au><au><snm>Price</snm><fnm>MN</fnm></au><au><snm>Bates</snm><fnm>JT</fnm></au><au><snm>Baumohl</snm><fnm>JK</fnm></au><au><snm>Chivian</snm><fnm>D</fnm></au><au><snm>Friedland</snm><fnm>GD</fnm></au><au><snm>Huang</snm><fnm>KH</fnm></au><au><snm>Keller</snm><fnm>K</fnm></au><au><snm>Novichkov</snm><fnm>PS</fnm></au><au><snm>Dubchak</snm><fnm>IL</fnm></au><au><snm>Alm</snm><fnm>EJ</fnm></au><au><snm>Arkin</snm><fnm>AP</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2010</pubdate><volume>38</volume><fpage>D396</fpage><lpage>D400</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkp919</pubid><pubid idtype="pmcid">2808868</pubid><pubid idtype="pmpid" link="fulltext">19906701</pubid></pubidlist></xrefbib></bibl><bibl id="B13"><title><p>A novel method for accurate operon predictions in all sequenced 
prokaryotes.</p></title><aug><au><snm>Price</snm><fnm>MN</fnm></au><au><snm>Huang</snm><fnm>KH</fnm></au><au><snm>Alm</snm><fnm>EJ</fnm></au><au><snm>Arkin</snm><fnm>AP</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2005</pubdate><volume>33</volume><fpage>880</fpage><lpage>892</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gki232</pubid><pubid idtype="pmcid">549399</pubid><pubid idtype="pmpid" link="fulltext">15701760</pubid></pubidlist></xrefbib></bibl><bibl id="B14"><title><p>Nucleotide sequence of an RNA polymerase binding site at an early T7 promoter.</p></title><aug><au><snm>Pribnow</snm><fnm>D</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>1975</pubdate><volume>72</volume><fpage>784</fpage><lpage>788</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.72.3.784</pubid><pubid idtype="pmcid">432404</pubid><pubid idtype="pmpid">1093168</pubid></pubidlist></xrefbib></bibl><bibl id="B15"><title><p>Nucleotide Sequence of an RNA polymerase binding site from the DNA of bacteriophage.</p></title><aug><au><snm>Schaller</snm><fnm>H</fnm></au><au><snm>Gray</snm><fnm>C</fnm></au><au><snm>Herrman</snm><fnm>K</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>1975</pubdate><volume>72</volume><fpage>737</fpage><lpage>741</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.72.2.737</pubid><pubid idtype="pmcid">432391</pubid><pubid idtype="pmpid">1054851</pubid></pubidlist></xrefbib></bibl><bibl id="B16"><title><p>Experimental and computational analysis of transcriptional start sites in the cyanobacterium <it>Prochlorococcus </it>MED4.</p></title><aug><au><snm>Vogel</snm><fnm>J</fnm></au><au><snm>Axmann</snm><fnm>IM</fnm></au><au><snm>Herzel</snm><fnm>H</fnm></au><au><snm>Hess</snm><fnm>WR</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2003</pubdate><volume>31</volume><fpage>2890</fpage><lpage>2899</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkg398</pubid><pubid 
idtype="pmcid">156731</pubid><pubid idtype="pmpid" link="fulltext">12771216</pubid></pubidlist></xrefbib></bibl><bibl id="B17"><title><p>An experimentally anchored map of transcriptional start sites in the model cyanobacterium <it>Synechocystis </it>sp. PCC6803.</p></title><aug><au><snm>Mitschke</snm><fnm>J</fnm></au><au><snm>Georg</snm><fnm>J</fnm></au><au><snm>Scholz</snm><fnm>I</fnm></au><au><snm>Sharma</snm><fnm>CM</fnm></au><au><snm>Dienst</snm><fnm>D</fnm></au><au><snm>Bantscheff</snm><fnm>J</fnm></au><au><snm>Voss</snm><fnm>B</fnm></au><au><snm>Steglich</snm><fnm>C</fnm></au><au><snm>Wilde</snm><fnm>A</fnm></au><au><snm>Vogel</snm><fnm>J</fnm></au><au><snm>Hess</snm><fnm>WR</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2011</pubdate><volume>108</volume><fpage>2124</fpage><lpage>2129</lpage></bibl><bibl id="B18"><title><p>Analysis of <it>E. coli </it>promoter sequences.</p></title><aug><au><snm>Harley</snm><fnm>CB</fnm></au><au><snm>Reynolds</snm><fnm>RP</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>1987</pubdate><volume>15</volume><fpage>2343</fpage><lpage>2361</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/15.5.2343</pubid><pubid idtype="pmcid">340638</pubid><pubid idtype="pmpid" link="fulltext">3550697</pubid></pubidlist></xrefbib></bibl><bibl id="B19"><title><p>Specific recognition of the cyanobacterial psbA promoter by RNA polymerases containing principal sigma factors.</p></title><aug><au><snm>Shibato</snm><fnm>J</fnm></au><au><snm>Asayama</snm><fnm>M</fnm></au><au><snm>Shirai</snm><fnm>M</fnm></au></aug><source>Biochim Biophys Acta</source><pubdate>1998</pubdate><volume>1442</volume><fpage>296</fpage><lpage>303</lpage><xrefbib><pubid idtype="pmpid" link="fulltext">9804976</pubid></xrefbib></bibl><bibl id="B20"><title><p>Sigma factors for cyanobacterial transcription.</p></title><aug><au><snm>Imamura</snm><fnm>S</fnm></au><au><snm>Asayama</snm><fnm>M</fnm></au></aug><source>Gene Regul Syst 
Bio</source><pubdate>2009</pubdate><volume>3</volume><fpage>65</fpage><lpage>87</lpage><xrefbib><pubidlist><pubid idtype="pmcid">2758279</pubid><pubid idtype="pmpid" link="fulltext">19838335</pubid></pubidlist></xrefbib></bibl><bibl id="B21"><title><p>Studies of the distribution of <it>Escherichia coli </it>cAMP-receptor protein and RNA polymerase along the <it>E. coli </it>chromosome.</p></title><aug><au><snm>Grainger</snm><fnm>DC</fnm></au><au><snm>Hurd</snm><fnm>D</fnm></au><au><snm>Harrison</snm><fnm>M</fnm></au><au><snm>Holdstock</snm><fnm>J</fnm></au><au><snm>Busby</snm><fnm>SJW</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2005</pubdate><volume>102</volume><fpage>17693</fpage><lpage>17698</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0506687102</pubid><pubid idtype="pmcid">1308901</pubid><pubid idtype="pmpid" link="fulltext">16301522</pubid></pubidlist></xrefbib></bibl><bibl id="B22"><title><p>The transition from transcriptional initiation to elongation.</p></title><aug><au><snm>Wade</snm><fnm>JT</fnm></au><au><snm>Struhl</snm><fnm>K</fnm></au></aug><source>Curr Opin Genet Dev</source><pubdate>2008</pubdate><volume>18</volume><fpage>130</fpage><lpage>136</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.gde.2007.12.008</pubid><pubid idtype="pmcid">2563432</pubid><pubid idtype="pmpid" link="fulltext">18282700</pubid></pubidlist></xrefbib></bibl><bibl id="B23"><title><p>Association of RNA polymerase with transcribed regions in <it>Escherichia coli</it>.</p></title><aug><au><snm>Wade</snm><fnm>JT</fnm></au><au><snm>Struhl</snm><fnm>K</fnm></au></aug><source>Proc Natl Acad Sci U S A</source><pubdate>2004</pubdate><volume>101</volume><fpage>17777</fpage><lpage>17782</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0404305101</pubid><pubid idtype="pmcid">539717</pubid><pubid idtype="pmpid" link="fulltext">15596728</pubid></pubidlist></xrefbib></bibl><bibl 
id="B24"><title><p>The transition between transcriptional initiation and elongation in <it>E. coli </it>is highly variable and often rate limiting.</p></title><aug><au><snm>Reppas</snm><fnm>NB</fnm></au><au><snm>Wade</snm><fnm>JT</fnm></au><au><snm>Church</snm><fnm>GM</fnm></au><au><snm>Struhl</snm><fnm>K</fnm></au></aug><source>Mol Cell</source><pubdate>2006</pubdate><volume>24</volume><fpage>747</fpage><lpage>757</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.molcel.2006.10.030</pubid><pubid idtype="pmpid" link="fulltext">17157257</pubid></pubidlist></xrefbib></bibl><bibl id="B25"><title><p>Regulator trafficking on bacterial transcription units.</p></title><aug><au><snm>Mooney</snm><fnm>R</fnm></au><au><snm>Davis</snm><fnm>S</fnm></au><au><snm>Peters</snm><fnm>J</fnm></au><au><snm>Rowland</snm><fnm>J</fnm></au><au><snm>Ansari</snm><fnm>A</fnm></au><au><snm>Landick</snm><fnm>R</fnm></au></aug><source>Mol Cell</source><pubdate>2009</pubdate><volume>33</volume><fpage>97</fpage><lpage>108</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.molcel.2008.12.021</pubid><pubid idtype="pmcid">2747249</pubid><pubid idtype="pmpid" link="fulltext">19150431</pubid></pubidlist></xrefbib></bibl><bibl id="B26"><title><p>Attenuation in the control of expression of bacterial operons.</p></title><aug><au><snm>Yanofsky</snm><fnm>C</fnm></au></aug><source>Nature</source><pubdate>1981</pubdate><volume>289</volume><fpage>751</fpage><lpage>758</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/289751a0</pubid><pubid idtype="pmpid">7007895</pubid></pubidlist></xrefbib></bibl><bibl id="B27"><title><p>Prediction of transcriptional terminators in <it>Bacillus subtilis </it>and related species.</p></title><aug><au><snm>Hoon</snm><fnm>MJL</fnm></au><au><snm>Makita</snm><fnm>Y</fnm></au><au><snm>Nakai</snm><fnm>K</fnm></au><au><snm>Miyano</snm><fnm>S</fnm></au></aug><source>PLoS Comput 
Biol</source><pubdate>2005</pubdate><volume>1</volume><fpage>e25</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pcbi.0010025</pubid><pubid idtype="pmcid">1187862</pubid><pubid idtype="pmpid" link="fulltext">16110342</pubid></pubidlist></xrefbib></bibl><bibl id="B28"><title><p>Analysis of complete genomes suggests that many prokaryotes do not rely on hairpin formation in transcription termination.</p></title><aug><au><snm>Washio</snm><fnm>T</fnm></au><au><snm>Sasayama</snm><fnm>J</fnm></au><au><snm>Tomita</snm><fnm>M</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>1998</pubdate><volume>26</volume><fpage>5456</fpage><lpage>5463</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/26.23.5456</pubid><pubid idtype="pmcid">148011</pubid><pubid idtype="pmpid" link="fulltext">9826772</pubid></pubidlist></xrefbib></bibl><bibl id="B29"><title><p>Rapid, accurate, computational discovery of Rho-independent transcription terminators illuminates their relationship to DNA uptake.</p></title><aug><au><snm>Kingsford</snm><fnm>CL</fnm></au><au><snm>Ayanbule</snm><fnm>K</fnm></au><au><snm>Salzberg</snm><fnm>SL</fnm></au></aug><source>Genome Biol</source><pubdate>2007</pubdate><volume>8</volume><fpage>R22</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2007-8-2-r22</pubid><pubid idtype="pmcid">1852404</pubid><pubid idtype="pmpid" link="fulltext">17313685</pubid></pubidlist></xrefbib></bibl><bibl id="B30"><title><p>Applied force reveals mechanistic and energetic details of transcription termination.</p></title><aug><au><snm>Larson</snm><fnm>MH</fnm></au><au><snm>Greenleaf</snm><fnm>WJ</fnm></au><au><snm>Landick</snm><fnm>R</fnm></au><au><snm>Block</snm><fnm>SM</fnm></au></aug><source>Cell</source><pubdate>2008</pubdate><volume>132</volume><fpage>971</fpage><lpage>982</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.cell.2008.01.027</pubid><pubid idtype="pmcid">2295211</pubid><pubid idtype="pmpid" 
link="fulltext">18358810</pubid></pubidlist></xrefbib></bibl><bibl id="B31"><title><p>Cloning and light regulation of expression of the phycocyanin operon of the cyanobacterium <it>Anabaena</it>.</p></title><aug><au><snm>Belknap</snm><fnm>WR</fnm></au><au><snm>Haselkorn</snm><fnm>R</fnm></au></aug><source>EMBO J</source><pubdate>1987</pubdate><volume>6</volume><fpage>871</fpage><lpage>884</lpage><xrefbib><pubidlist><pubid idtype="pmcid">553477</pubid><pubid idtype="pmpid">3109890</pubid></pubidlist></xrefbib></bibl><bibl id="B32"><title><p>Identification of bacterial small non-coding RNAs: experimental approaches.</p></title><aug><au><snm>Altuvia</snm><fnm>S</fnm></au></aug><source>Curr Opin Microbiol</source><pubdate>2007</pubdate><volume>10</volume><fpage>257</fpage><lpage>261</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/j.mib.2007.05.003</pubid><pubid idtype="pmpid" link="fulltext">17553733</pubid></pubidlist></xrefbib></bibl><bibl id="B33"><title><p>Evidence for a major role of antisense RNAs in cyanobacterial gene regulation.</p></title><aug><au><snm>Georg</snm><fnm>J</fnm></au><au><snm>Voss</snm><fnm>B</fnm></au><au><snm>Scholz</snm><fnm>I</fnm></au><au><snm>Mitschke</snm><fnm>J</fnm></au><au><snm>Wilde</snm><fnm>A</fnm></au><au><snm>Hess</snm><fnm>WR</fnm></au></aug><source>Mol Syst Biol</source><pubdate>2009</pubdate><volume>5</volume><fpage>305</fpage><xrefbib><pubidlist><pubid idtype="pmcid">2758717</pubid><pubid idtype="pmpid" link="fulltext">19756044</pubid></pubidlist></xrefbib></bibl><bibl id="B34"><title><p>Transcriptome complexity in a genome-reduced bacterium.</p></title><aug><au><snm>Guell</snm><fnm>M</fnm></au><au><snm>van 
Noort</snm><fnm>V</fnm></au><au><snm>Yus</snm><fnm>E</fnm></au><au><snm>Chen</snm><fnm>WH</fnm></au><au><snm>Leigh-Bell</snm><fnm>J</fnm></au><au><snm>Michalodimitrakis</snm><fnm>K</fnm></au><au><snm>Yamada</snm><fnm>T</fnm></au><au><snm>Arumugam</snm><fnm>M</fnm></au><au><snm>Doerks</snm><fnm>T</fnm></au><au><snm>Kuhner</snm><fnm>S</fnm></au><au><snm>Rode</snm><fnm>M</fnm></au><au><snm>Suyama</snm><fnm>M</fnm></au><au><snm>Schmidt</snm><fnm>S</fnm></au><au><snm>Gavin</snm><fnm>AC</fnm></au><au><snm>Bork</snm><fnm>P</fnm></au><au><snm>Serrano</snm><fnm>L</fnm></au></aug><source>Science</source><pubdate>2009</pubdate><volume>326</volume><fpage>1268</fpage><lpage>1271</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1126/science.1176951</pubid><pubid idtype="pmpid" link="fulltext">19965477</pubid></pubidlist></xrefbib></bibl><bibl id="B35"><title><p>A single-base resolution map of an archaeal transcriptome.</p></title><aug><au><snm>Wurtzel</snm><fnm>O</fnm></au><au><snm>Sapra</snm><fnm>R</fnm></au><au><snm>Chen</snm><fnm>F</fnm></au><au><snm>Zhu</snm><fnm>Y</fnm></au><au><snm>Simmons</snm><fnm>BA</fnm></au><au><snm>Sorek</snm><fnm>R</fnm></au></aug><source>Genome Res</source><pubdate>2010</pubdate><volume>20</volume><fpage>133</fpage><lpage>141</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1101/gr.100396.109</pubid><pubid idtype="pmcid">2798825</pubid><pubid idtype="pmpid" link="fulltext">19884261</pubid></pubidlist></xrefbib></bibl><bibl id="B36"><title><p>A strand-specific RNA-Seq analysis of the transcriptome of the typhoid bacillus Salmonella 
typhi.</p></title><aug><au><snm>Perkins</snm><fnm>TT</fnm></au><au><snm>Kingsley</snm><fnm>RA</fnm></au><au><snm>Fookes</snm><fnm>MC</fnm></au><au><snm>Gardner</snm><fnm>PP</fnm></au><au><snm>James</snm><fnm>KD</fnm></au><au><snm>Yu</snm><fnm>L</fnm></au><au><snm>Assefa</snm><fnm>SA</fnm></au><au><snm>He</snm><fnm>M</fnm></au><au><snm>Croucher</snm><fnm>NJ</fnm></au><au><snm>Pickard</snm><fnm>DJ</fnm></au><au><snm>Maskell</snm><fnm>DJ</fnm></au><au><snm>Parkhill</snm><fnm>J</fnm></au><au><snm>Choudhary</snm><fnm>J</fnm></au><au><snm>Thomson</snm><fnm>NR</fnm></au><au><snm>Dougan</snm><fnm>G</fnm></au></aug><source>PLoS Genet</source><pubdate>2009</pubdate><volume>5</volume><fpage>e1000569</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1371/journal.pgen.1000569</pubid><pubid idtype="pmcid">2704369</pubid><pubid idtype="pmpid" link="fulltext">19609351</pubid></pubidlist></xrefbib></bibl><bibl id="B37"><title><p>The primary transcriptome of the major human pathogen Helicobacter pylori.</p></title><aug><au><snm>Sharma</snm><fnm>CM</fnm></au><au><snm>Hoffmann</snm><fnm>S</fnm></au><au><snm>Darfeuille</snm><fnm>F</fnm></au><au><snm>Reignier</snm><fnm>F</fnm></au><au><snm>Findeiss</snm><fnm>S</fnm></au><au><snm>Sittka</snm><fnm>A</fnm></au><au><snm>Chabas</snm><fnm>S</fnm></au><au><snm>Reiche</snm><fnm>K</fnm></au><au><snm>Hackermuller</snm><fnm>J</fnm></au><au><snm>Reinhardt</snm><fnm>R</fnm></au><au><snm>Stadler</snm><fnm>PF</fnm></au><au><snm>Vogel</snm><fnm>J</fnm></au></aug><source>Nature</source><pubdate>2010</pubdate><volume>464</volume><fpage>250</fpage><lpage>255</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature08756</pubid><pubid idtype="pmpid" link="fulltext">20164839</pubid></pubidlist></xrefbib></bibl><bibl id="B38"><title><p>Structure and complexity of a bacterial 
transcriptome.</p></title><aug><au><snm>Passalacqua</snm><fnm>KD</fnm></au><au><snm>Varadarajan</snm><fnm>A</fnm></au><au><snm>Ondov</snm><fnm>BD</fnm></au><au><snm>Okou</snm><fnm>DT</fnm></au><au><snm>Zwick</snm><fnm>ME</fnm></au><au><snm>Bergman</snm><fnm>NH</fnm></au></aug><source>J Bacteriol</source><pubdate>2009</pubdate><volume>191</volume><fpage>3203</fpage><lpage>3211</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1128/JB.00122-09</pubid><pubid idtype="pmcid">2687165</pubid><pubid idtype="pmpid" link="fulltext">19304856</pubid></pubidlist></xrefbib></bibl><bibl id="B39"><title><p>The transcription unit architecture of the <it>Escherichia coli </it>genome.</p></title><aug><au><snm>Cho</snm><fnm>BK</fnm></au><au><snm>Zengler</snm><fnm>K</fnm></au><au><snm>Qiu</snm><fnm>Y</fnm></au><au><snm>Park</snm><fnm>YS</fnm></au><au><snm>Knight</snm><fnm>EM</fnm></au><au><snm>Barrett</snm><fnm>CL</fnm></au><au><snm>Gao</snm><fnm>Y</fnm></au><au><snm>Palsson</snm><fnm>BO</fnm></au></aug><source>Nat Biotechnol</source><pubdate>2009</pubdate><volume>27</volume><fpage>1043</fpage><lpage>1049</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt.1582</pubid><pubid idtype="pmpid" link="fulltext">19881496</pubid></pubidlist></xrefbib></bibl><bibl id="B40"><title><p>A cyanobacterial non-coding RNA, Yfr1, is required for growth under multiple stress conditions.</p></title><aug><au><snm>Nakamura</snm><fnm>T</fnm></au><au><snm>Naito</snm><fnm>K</fnm></au><au><snm>Yokota</snm><fnm>N</fnm></au><au><snm>Sugita</snm><fnm>C</fnm></au><au><snm>Sugita</snm><fnm>M</fnm></au></aug><source>Plant Cell Physiol</source><pubdate>2007</pubdate><volume>48</volume><fpage>1309</fpage><lpage>1318</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/pcp/pcm098</pubid><pubid idtype="pmpid" link="fulltext">17664182</pubid></pubidlist></xrefbib></bibl><bibl id="B41"><title><p>An internal antisense RNA regulates expression of the photosynthesis gene 
isiA.</p></title><aug><au><snm>Duhring</snm><fnm>U</fnm></au><au><snm>Axmann</snm><fnm>IM</fnm></au><au><snm>Hess</snm><fnm>WR</fnm></au><au><snm>Wilde</snm><fnm>A</fnm></au></aug><source>Proc Natl Acad Sci USA</source><pubdate>2006</pubdate><volume>103</volume><fpage>7054</fpage><lpage>7058</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0600927103</pubid><pubid idtype="pmcid">1459017</pubid><pubid idtype="pmpid" link="fulltext">16636284</pubid></pubidlist></xrefbib></bibl><bibl id="B42"><title><p>Rfam: an RNA family database.</p></title><aug><au><snm>Griffiths-Jones</snm><fnm>S</fnm></au><au><snm>Bateman</snm><fnm>A</fnm></au><au><snm>Marshall</snm><fnm>M</fnm></au><au><snm>Khanna</snm><fnm>A</fnm></au><au><snm>Eddy</snm><fnm>SR</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2003</pubdate><volume>31</volume><fpage>439</fpage><lpage>441</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/gkg006</pubid><pubid idtype="pmcid">165453</pubid><pubid idtype="pmpid" link="fulltext">12520045</pubid></pubidlist></xrefbib></bibl><bibl id="B43"><title><p>Group I introns: moving in new directions.</p></title><aug><au><snm>Nielsen</snm><fnm>H</fnm></au><au><snm>Johansen</snm><fnm>SD</fnm></au></aug><source>RNA Biol</source><pubdate>2009</pubdate><volume>6</volume><fpage>375</fpage><lpage>383</lpage><xrefbib><pubidlist><pubid idtype="doi">10.4161/rna.6.4.9334</pubid><pubid idtype="pmpid" link="fulltext">19667762</pubid></pubidlist></xrefbib></bibl><bibl id="B44"><title><p>Thiamine derivatives bind messenger RNAs directly to regulate bacterial gene expression.</p></title><aug><au><snm>Winkler</snm><fnm>W</fnm></au><au><snm>Nahvi</snm><fnm>A</fnm></au><au><snm>Breaker</snm><fnm>RR</fnm></au></aug><source>Nature</source><pubdate>2002</pubdate><volume>419</volume><fpage>952</fpage><lpage>956</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature01145</pubid><pubid idtype="pmpid" link="fulltext">12410317</pubid></pubidlist></xrefbib></bibl><bibl 
id="B45"><title><p>Genetic control by a metabolite binding mRNA.</p></title><aug><au><snm>Nahvi</snm><fnm>A</fnm></au><au><snm>Sudarsan</snm><fnm>N</fnm></au><au><snm>Ebert</snm><fnm>MS</fnm></au><au><snm>Zou</snm><fnm>X</fnm></au><au><snm>Brown</snm><fnm>KL</fnm></au><au><snm>Breaker</snm><fnm>RR</fnm></au></aug><source>Chem Biol</source><pubdate>2002</pubdate><volume>9</volume><fpage>1043</fpage><lpage>1049</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1016/S1074-5521(02)00224-7</pubid><pubid idtype="pmpid" link="fulltext">12323379</pubid></pubidlist></xrefbib></bibl><bibl id="B46"><title><p>Phase determination of circadian gene expression in <it>Synechococcus elongatus </it>PCC 7942.</p></title><aug><au><snm>Min</snm><fnm>H</fnm></au><au><snm>Liu</snm><fnm>Y</fnm></au><au><snm>Johnson</snm><fnm>CH</fnm></au><au><snm>Golden</snm><fnm>SS</fnm></au></aug><source>J Biol Rhythms</source><pubdate>2004</pubdate><volume>19</volume><fpage>103</fpage><lpage>112</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1177/0748730403262056</pubid><pubid idtype="pmpid" link="fulltext">15038850</pubid></pubidlist></xrefbib></bibl><bibl id="B47"><title><p>Application of bioluminescence to the study of circadian rhythms in cyanobacteria.</p></title><aug><au><snm>Andersson</snm><fnm>CR</fnm></au><au><snm>Tsinoremas</snm><fnm>NF</fnm></au><au><snm>Shelton</snm><fnm>J</fnm></au><au><snm>Lebedeva</snm><fnm>NV</fnm></au><au><snm>Yarrow</snm><fnm>J</fnm></au><au><snm>Min</snm><fnm>H</fnm></au><au><snm>Golden</snm><fnm>SS</fnm></au></aug><source>Methods Enzymol</source><pubdate>2000</pubdate><volume>305</volume><fpage>527</fpage><lpage>542</lpage><xrefbib><pubid idtype="pmpid">10812624</pubid></xrefbib></bibl><bibl id="B48"><title><p>Expression of the psbDII gene in <it>Synechococcus </it>sp. 
strain PCC 7942 requires sequences downstream of the transcription start site.</p></title><aug><au><snm>Bustos</snm><fnm>SA</fnm></au><au><snm>Golden</snm><fnm>SS</fnm></au></aug><source>J Bacteriol</source><pubdate>1991</pubdate><volume>173</volume><fpage>7525</fpage><lpage>7533</lpage><xrefbib><pubidlist><pubid idtype="pmcid">212519</pubid><pubid idtype="pmpid" link="fulltext">1938947</pubid></pubidlist></xrefbib></bibl><bibl id="B49"><title><p>Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.</p></title><aug><au><snm>Langmead</snm><fnm>B</fnm></au><au><snm>Trapnell</snm><fnm>C</fnm></au><au><snm>Pop</snm><fnm>M</fnm></au><au><snm>Salzberg</snm><fnm>SL</fnm></au></aug><source>Genome Biol</source><pubdate>2009</pubdate><volume>10</volume><fpage>R25</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/gb-2009-10-3-r25</pubid><pubid idtype="pmcid">2690996</pubid><pubid idtype="pmpid" link="fulltext">19261174</pubid></pubidlist></xrefbib></bibl><bibl id="B50"><title><p>Mapping DNA interaction sites of chromosomal proteins using immunoprecipitation and polymerase chain reaction.</p></title><aug><au><snm>Hecht</snm><fnm>A</fnm></au><au><snm>Grunstein</snm><fnm>M</fnm></au></aug><source>Methods Enzymol</source><pubdate>1999</pubdate><volume>304</volume><fpage>399</fpage><lpage>414</lpage><xrefbib><pubid idtype="pmpid">10372373</pubid></xrefbib></bibl><bibl id="B51"><title><p>Chromatin decouples promoter threshold from dynamic range.</p></title><aug><au><snm>Lam</snm><fnm>FH</fnm></au><au><snm>Steger</snm><fnm>DJ</fnm></au><au><snm>O&apos;Shea</snm><fnm>EK</fnm></au></aug><source>Nature</source><pubdate>2008</pubdate><volume>453</volume><fpage>246</fpage><lpage>250</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nature06867</pubid><pubid idtype="pmcid">2435410</pubid><pubid idtype="pmpid" link="fulltext">18418379</pubid></pubidlist></xrefbib></bibl><bibl id="B52"><title><p>A statistical approach for array CGH data 
analysis.</p></title><aug><au><snm>Picard</snm><fnm>F</fnm></au><au><snm>Robin</snm><fnm>S</fnm></au><au><snm>Lavielle</snm><fnm>M</fnm></au><au><snm>Vaisse</snm><fnm>C</fnm></au><au><snm>Daudin</snm><fnm>JJ</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2005</pubdate><volume>6</volume><fpage>27</fpage><xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-6-27</pubid><pubid idtype="pmcid">549559</pubid><pubid idtype="pmpid" link="fulltext">15705208</pubid></pubidlist></xrefbib></bibl><bibl id="B53"><title><p>A high-resolution map of transcription in the yeast genome.</p></title><aug><au><snm>David</snm><fnm>L</fnm></au><au><snm>Huber</snm><fnm>W</fnm></au><au><snm>Granovskaia</snm><fnm>M</fnm></au><au><snm>Toedling</snm><fnm>J</fnm></au><au><snm>Palm</snm><fnm>CJ</fnm></au><au><snm>Bofkin</snm><fnm>L</fnm></au><au><snm>Jones</snm><fnm>T</fnm></au><au><snm>Davis</snm><fnm>RW</fnm></au><au><snm>Steinmetz</snm><fnm>LM</fnm></au></aug><source>Proc Natl Acad Sci U S A</source><pubdate>2006</pubdate><volume>103</volume><fpage>5320</fpage><lpage>5325</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1073/pnas.0601091103</pubid><pubid idtype="pmcid">1414796</pubid><pubid idtype="pmpid" link="fulltext">16569694</pubid></pubidlist></xrefbib></bibl><bibl id="B54"><title><p>Transcript mapping with high-density oligonucleotide tiling arrays.</p></title><aug><au><snm>Huber</snm><fnm>W</fnm></au><au><snm>Toedling</snm><fnm>J</fnm></au><au><snm>Steinmetz</snm><fnm>LM</fnm></au></aug><source>Bioinformatics</source><pubdate>2006</pubdate><volume>22</volume><fpage>1963</fpage><lpage>1970</lpage><xrefbib><pubid idtype="pubmed">16787969</pubid></xrefbib></bibl><bibl id="B55"><title><p>High-throughput identification of transcription start sites, conserved promoter motifs and predicted 
regulons.</p></title><aug><au><snm>McGrath</snm><fnm>PT</fnm></au><au><snm>Lee</snm><fnm>H</fnm></au><au><snm>Zhang</snm><fnm>L</fnm></au><au><snm>Iniesta</snm><fnm>AA</fnm></au><au><snm>Hottes</snm><fnm>AK</fnm></au><au><snm>Tan</snm><fnm>MH</fnm></au><au><snm>Hillson</snm><fnm>NJ</fnm></au><au><snm>Shapiro</snm><fnm>L</fnm></au><au><snm>McAdams</snm><fnm>HH</fnm></au></aug><source>Nat Biotechnol</source><pubdate>2007</pubdate><volume>25</volume><fpage>584</fpage><lpage>592</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt1294</pubid><pubid idtype="pmpid" link="fulltext">17401361</pubid></pubidlist></xrefbib></bibl><bibl id="B56"><title><p>PeakSeq enables systematic scoring of ChIP-seq experiments relative to controls.</p></title><aug><au><snm>Rozowsky</snm><fnm>J</fnm></au><au><snm>Euskirchen</snm><fnm>G</fnm></au><au><snm>Auerbach</snm><fnm>RK</fnm></au><au><snm>Zhang</snm><fnm>ZD</fnm></au><au><snm>Gibson</snm><fnm>T</fnm></au><au><snm>Bjornson</snm><fnm>R</fnm></au><au><snm>Carriero</snm><fnm>N</fnm></au><au><snm>Snyder</snm><fnm>M</fnm></au><au><snm>Gerstein</snm><fnm>MB</fnm></au></aug><source>Nat Biotechnol</source><pubdate>2009</pubdate><volume>27</volume><fpage>66</fpage><lpage>75</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1038/nbt.1518</pubid><pubid idtype="pmcid">2924752</pubid><pubid idtype="pmpid" link="fulltext">19122651</pubid></pubidlist></xrefbib></bibl><bibl id="B57"><title><p>Complete suboptimal folding of RNA and the stability of secondary structures.</p></title><aug><au><snm>Wuchty</snm><fnm>S</fnm></au><au><snm>Fontana</snm><fnm>W</fnm></au><au><snm>Hofacker</snm><fnm>I</fnm></au><au><snm>Schuster</snm><fnm>P</fnm></au></aug><source>Biopolymers</source><pubdate>1999</pubdate><volume>49</volume><fpage>145</fpage><lpage>165</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1002/(SICI)1097-0282(199902)49:2&lt;145::AID-BIP4&gt;3.0.CO;2-G</pubid><pubid idtype="pmpid" 
link="fulltext">10070264</pubid></pubidlist></xrefbib></bibl><bibl id="B58"><title><p>Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure.</p></title><aug><au><snm>Mathews</snm><fnm>D</fnm></au><au><snm>Sabina</snm><fnm>J</fnm></au><au><snm>Zuker</snm><fnm>M</fnm></au><au><snm>Turner</snm><fnm>D</fnm></au></aug><source>J Mol Biol</source><pubdate>1999</pubdate><volume>288</volume><fpage>911</fpage><lpage>940</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1006/jmbi.1999.2700</pubid><pubid idtype="pmpid" link="fulltext">10329189</pubid></pubidlist></xrefbib></bibl><bibl id="B59"><title><p>Improved thermodynamic parameters and helix initiation factor to predict stability of DNA duplexes.</p></title><aug><au><snm>Sugimoto</snm><fnm>N</fnm></au><au><snm>Nakano</snm><fnm>S</fnm></au><au><snm>Yoneyama</snm><fnm>M</fnm></au><au><snm>Honda</snm><fnm>K</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>1996</pubdate><volume>24</volume><fpage>4501</fpage><lpage>4505</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/nar/24.22.4501</pubid><pubid idtype="pmcid">146261</pubid><pubid idtype="pmpid" link="fulltext">8948641</pubid></pubidlist></xrefbib></bibl><bibl id="B60"><title><p>Identifying DNA and protein patterns with statistically significant alignments of multiple sequences.</p></title><aug><au><snm>Hertz</snm><fnm>GZ</fnm></au><au><snm>Stormo</snm><fnm>GD</fnm></au></aug><source>Bioinformatics</source><pubdate>1999</pubdate><volume>15</volume><fpage>563</fpage><lpage>577</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1093/bioinformatics/15.7.563</pubid><pubid idtype="pmpid" link="fulltext">10487864</pubid></pubidlist></xrefbib></bibl><bibl id="B61"><title><p>Molecular mechanism for the operation of nitrogen control in cyanobacteria.</p></title><aug><au><snm>Luque</snm><fnm>I</fnm></au><au><snm>Flores</snm><fnm>E</fnm></au><au><snm>Herrero</snm><fnm>A</fnm></au></aug><source>EMBO 
J</source><pubdate>1994</pubdate><volume>13</volume><fpage>2862</fpage><lpage>2869</lpage><xrefbib><pubidlist><pubid idtype="pmcid">395167</pubid><pubid idtype="pmpid">8026471</pubid></pubidlist></xrefbib></bibl><bibl id="B62"><title><p>Circadian expression of genes involved in the purine biosynthetic pathway of the cyanobacterium <it>Synechococcus </it>sp. strain PCC 7942.</p></title><aug><au><snm>Liu</snm><fnm>Y</fnm></au><au><snm>Tsinoremas</snm><fnm>NF</fnm></au><au><snm>Golden</snm><fnm>SS</fnm></au><au><snm>Kondo</snm><fnm>T</fnm></au><au><snm>Johnson</snm><fnm>CH</fnm></au></aug><source>Mol Microbiol</source><pubdate>1996</pubdate><volume>20</volume><fpage>1071</fpage><lpage>1081</lpage><xrefbib><pubidlist><pubid idtype="doi">10.1111/j.1365-2958.1996.tb02547.x</pubid><pubid idtype="pmpid">8809759</pubid></pubidlist></xrefbib></bibl></refgrp>
</bm></art>