<?xml version='1.0'?>
<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
   <ui>gb-2007-8-3-r40</ui>
   <ji>GBJ</ji>
   <fm>
      <dochead>Research</dochead>
      <bibl>
         <title>
            <p>Sense-antisense pairs in mammals: functional and evolutionary considerations</p>
         </title>
         <aug>
            <au id="A1">
               <snm>Galante</snm>
               <mi>AF</mi>
               <fnm>Pedro</fnm>
               <insr iid="I1"/>
               <insr iid="I2"/>
               <email>pgalante@compbio.ludwig.org.br</email>
            </au>
            <au id="A2">
               <snm>Vidal</snm>
               <mi>O</mi>
               <fnm>Daniel</fnm>
               <insr iid="I1"/>
               <email>dvidal@ludwig.org.br</email>
            </au>
            <au id="A3">
               <snm>de Souza</snm>
               <mi>E</mi>
               <fnm>Jorge</fnm>
               <insr iid="I1"/>
               <email>jorge@compbio.ludwig.org.br</email>
            </au>
            <au id="A4">
               <snm>Camargo</snm>
               <mi>A</mi>
               <fnm>Anamaria</fnm>
               <insr iid="I1"/>
               <email>anamaria@ludwig.org.br</email>
            </au>
            <au id="A5" ca="yes">
               <snm>de Souza</snm>
               <mi>J</mi>
               <fnm>Sandro</fnm>
               <insr iid="I1"/>
               <email>sandro@compbio.ludwig.org.br</email>
            </au>
         </aug>
         <insg>
            <ins id="I1">
               <p>Ludwig Institute for Cancer Research, S&#227;o Paulo Branch, Hospital Alem&#227;o Oswaldo Cruz, Rua Jo&#227;o Juliao 245, 1 andar, S&#227;o Paulo, SP 01323-903, Brazil</p>
            </ins>
            <ins id="I2">
               <p>Department of Biochemistry, University of S&#227;o Paulo, Av. Prof. Lineu Prestes, 748 - sala 351, S&#227;o Paulo, SP 05508-900, Brazil</p>
            </ins>
         </insg>
         <source>Genome Biology</source>
         <issn>1465-6906</issn>
         <pubdate>2007</pubdate>
         <volume>8</volume>
         <issue>3</issue>
         <fpage>R40</fpage>
         <url>http://genomebiology.com/2007/8/3/R40</url>
         <xrefbib>
            <pubidlist>
               <pubid idtype="pmpid">17371592</pubid>
               <pubid idtype="doi">10.1186/gb-2007-8-3-r40</pubid>
            </pubidlist>
         </xrefbib>
      </bibl>
      <history>
         <rec>
            <date>
               <day>3</day>
               <month>5</month>
               <year>2006</year>
            </date>
         </rec>
         <revrec>
            <date>
               <day>4</day>
               <month>9</month>
               <year>2006</year>
            </date>
         </revrec>
         <acc>
            <date>
               <day>19</day>
               <month>3</month>
               <year>2007</year>
            </date>
         </acc>
         <pub>
            <date>
               <day>19</day>
               <month>03</month>
               <year>2007</year>
            </date>
         </pub>
      </history>
      <cpyrt>
         <year>2007</year>
         <collab>Galante et al.; licensee BioMed Central Ltd.</collab>
         <note>This is an open access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note>
      </cpyrt>
      <shorttitle>
         <p>Sense-antisense pairs in mammals</p>
      </shorttitle>
      <shortabs>
         <p>Analysis of a catalog of S-AS pairs in the human and mouse genomes revealed several putative roles for natural antisense transcripts and showed that some are artifacts of cDNA library construction.</p>
      </shortabs>
      <abs>
         <sec>
            <st>
               <p>Abstract</p>
            </st>
            <sec>
               <st>
                  <p>Background</p>
               </st>
               <p>A significant number of genes in mammalian genomes are being found to have natural antisense transcripts (NATs). These sense-antisense (S-AS) pairs are believed to be involved in several cellular phenomena.</p>
            </sec>
            <sec>
               <st>
                  <p>Results</p>
               </st>
               <p>Here, we generated a catalog of S-AS pairs occurring in the human and mouse genomes by analyzing different sources of publicly available expressed sequences plus 122 massively parallel signature sequencing (MPSS) libraries from a variety of human and mouse tissues. Using this dataset of almost 20,000 S-AS pairs across both genomes, we investigated computationally and experimentally several putative roles that have been assigned to NATs, including regulation of gene expression. These global analyses also allowed us to dissect these roles in more detail and to propose new ones. Surprisingly, we found that a significant fraction of NATs are artifacts produced by genomic priming during cDNA library construction.</p>
            </sec>
            <sec>
               <st>
                  <p>Conclusion</p>
               </st>
               <p>We propose an evolutionary and functional model in which alternative polyadenylation and retroposition account for the origin of a significant number of functional S-AS pairs in mammalian genomes.</p>
            </sec>
         </sec>
      </abs>
   </fm>
   <meta>
      <classifications>
         <classification type="BMC" subtype="man_spc_id" id="30010016">Molecular biology</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010010">Genome studies</classification>
         <classification type="BMC" subtype="man_spc_id" id="30010008">Evolution</classification>
      </classifications>
   </meta>
   <bdy>
      <sec>
         <st>
            <p>Background</p>
         </st>
         <p>Natural antisense RNAs, or natural antisense transcripts (NATs), are endogenous transcripts with sequence complementarity to other transcripts. There are two types of NATs in eukaryotic genomes: <it>cis</it>-encoded antisense NATs, which are transcribed from the opposite strand of the same genomic locus as the sense RNA and have a long (or perfect) overlap with the sense transcripts; and <it>trans</it>-encoded antisense NATs, which are transcribed from a different genomic locus than the sense RNA and have a short (or imperfect) overlap with the sense transcripts. <it>Cis</it>-NATs are usually related in a one-to-one fashion to the sense transcript, whereas a single <it>trans</it>-NAT may target several sense transcripts <abbrgrp><abbr bid="B1">1</abbr><abbr bid="B2">2</abbr><abbr bid="B3">3</abbr></abbrgrp>. In this manuscript, we consider only <it>cis</it>-NATs, and from now on we refer to these loci as sense-antisense (S-AS) pairs.</p>
         <p>When evaluated globally, several features related to the distribution of NATs strongly suggest that they have a prominent role in antisense regulation of gene expression <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr><abbr bid="B7">7</abbr></abbrgrp>. For instance, expression of S-AS transcripts tends to be positively or negatively correlated and is more evolutionarily conserved than expected by chance <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B5">5</abbr><abbr bid="B7">7</abbr></abbrgrp>. Although a putative regulatory role has been experimentally validated for a few models <abbrgrp><abbr bid="B8">8</abbr><abbr bid="B9">9</abbr><abbr bid="B10">10</abbr></abbrgrp>, it is still unknown whether antisense regulation is the rule or the exception in the human genome. NATs have been implicated in RNA and translational interference <abbrgrp><abbr bid="B11">11</abbr></abbrgrp>, genomic imprinting <abbrgrp><abbr bid="B12">12</abbr></abbrgrp>, transcriptional interference <abbrgrp><abbr bid="B13">13</abbr></abbrgrp>, X-inactivation <abbrgrp><abbr bid="B14">14</abbr></abbrgrp>, alternative splicing <abbrgrp><abbr bid="B10">10</abbr><abbr bid="B15">15</abbr></abbrgrp> and RNA editing <abbrgrp><abbr bid="B16">16</abbr></abbrgrp>. Moreover, an accumulating body of evidence suggests that NATs might have a pivotal role in a range of human diseases <abbrgrp><abbr bid="B2">2</abbr></abbrgrp>.</p>
         <p>NATs were initially identified in studies of individual genes. However, with the accumulation of whole-genome and expressed sequences (mRNAs and ESTs) in public databases, a significant number of NATs have been identified through computational analyses <abbrgrp><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>. These studies showed that such transcripts are widespread in mammalian genomes. The first evidence that antisense transcription is a common feature of mammalian genomes came from an analysis of reverse complementarity among all available mRNA sequences <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>. Subsequent studies, using larger collections of mRNA sequences, ESTs and genomic sequences, confirmed and extended these initial observations <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B19">19</abbr><abbr bid="B20">20</abbr><abbr bid="B21">21</abbr><abbr bid="B22">22</abbr></abbrgrp>. More recently, other sources of expression data, such as serial analysis of gene expression (SAGE) tags, were used to expand the catalog of NATs present in mammalian genomes <abbrgrp><abbr bid="B23">23</abbr><abbr bid="B24">24</abbr></abbrgrp>. At present, it is estimated that at least 15% and 20% of mouse and human transcripts, respectively, might form S-AS pairs <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B22">22</abbr></abbrgrp>, although a recent analysis <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> reported that 47% of human transcriptional units are involved in S-AS pairing (24.7% and 22.7% corresponding to S-AS pairs with exonic and non-exonic overlap, respectively).</p>
         <p>The major obstacle in using expressed sequence data for NAT identification is determining the correct orientation of the sequences, especially ESTs. Many ESTs were not directionally cloned, and even well-known mRNA sequences were registered from both strands of cloned cDNAs or are incorrectly annotated. As others have done <abbrgrp><abbr bid="B18">18</abbr><abbr bid="B22">22</abbr><abbr bid="B23">23</abbr></abbrgrp>, we established a set of stringent criteria, including the orientation of splice sites, the presence of a poly-A signal and tail, and sequence annotation, to determine the correct orientation of each transcript relative to the genomic sequence, and we carried out a comprehensive survey of NAT distribution in the human and mouse genomes. Using a set of computational and experimental procedures, we extensively explored expressed sequences and massively parallel signature sequencing (MPSS) data mapped onto the human and mouse genomes. Besides generating a catalog of known and new S-AS pairs, our analyses shed some light on functional and evolutionary aspects of S-AS pairs in mammalian genomes.</p>
      </sec>
      <sec>
         <st>
            <p>Results and discussion</p>
         </st>
         <sec>
            <st>
               <p>Overall distribution of S-AS pairs in human and mouse genomes</p>
            </st>
            <p>To identify transcripts that derive from opposite strands of the same locus, we used a modified version of an in-house knowledgebase previously described for humans <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>. This knowledgebase contains more than 6 million expressed sequences mapped onto the human genome sequence and clustered into approximately 111,000 groups. SAGE <abbrgrp><abbr bid="B29">29</abbr></abbrgrp> and MPSS <abbrgrp><abbr bid="B30">30</abbr></abbrgrp> tags were also annotated with all associated information, such as tag frequency, library source and tag-to-gene assignment (using a strategy we developed for SAGE Genie <abbrgrp><abbr bid="B31">31</abbr></abbrgrp>). An equivalent knowledgebase was built for the mouse genome (for more details see Materials and methods).</p>
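            <p>The core step of such a search, locating transcript clusters that map to opposite strands of the same locus, can be sketched as an interval-overlap test. This is a minimal illustration, not the in-house knowledgebase described above: the Cluster record, its field names and the min_overlap parameter are hypothetical, and the all-pairs comparison is quadratic, so a genome-scale implementation would instead sort or index clusters by chromosome and position.</p>
            <p>
```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Cluster:
    """Hypothetical record for a cluster of transcripts mapped to the genome."""
    name: str
    chrom: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # '+' or '-'


def overlap_bp(a: Cluster, b: Cluster) -> int:
    """Number of genomic bases shared by two clusters (0 if disjoint)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start) + 1)


def find_sas_pairs(clusters, min_overlap=1):
    """Return (name, name) for every pair of clusters that lie on opposite
    strands and overlap by at least min_overlap bases."""
    pairs = []
    for a, b in combinations(clusters, 2):
        if a.strand != b.strand and overlap_bp(a, b) >= min_overlap:
            pairs.append((a.name, b.name))
    return pairs


clusters = [
    Cluster("geneA", "chr1", 100, 500, "+"),
    Cluster("geneB", "chr1", 450, 900, "-"),
    Cluster("geneC", "chr1", 950, 1200, "-"),
]
print(find_sas_pairs(clusters))  # [('geneA', 'geneB')]
```
            </p>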
            <p>We first designed software that searched the human and mouse genomes and extracted gene information from transcripts mapped onto opposite strands of the same locus. The software used several parameters to identify S-AS pairs, such as the sequence orientation given by the respective GenBank entry, the presence and orientation of a splice site consensus, and the presence of a poly-A tail (for more details see Materials and methods). We found 3,113 and 2,599 S-AS pairs in the human and mouse genomes, respectively, containing at least one full-insert cDNA (sequences annotated as 'mRNA' in GenBank and referred to here as such) in each orientation (Table <tblr tid="T1">1</tblr>). We also made use of EST data from both species. A critical issue when using ESTs is the orientation of the sequence, which is not always available in the respective GenBank entries. We overcame this problem by using only those ESTs that either had a poly-A tail or spanned an intron and therefore revealed their strand of origin through the orientation of the splicing consensus sequence (GT...AG rule). Incorporating EST data yielded 6,964 and 5,492 additional S-AS pairs, for totals of 10,077 and 8,091 pairs in the human and mouse genomes, respectively (Table <tblr tid="T1">1</tblr>). All of these pairs contained at least one mRNA, since we did not analyze EST/EST pairs. Note that the present analysis considered neither non-polyadenylated transcripts nor <it>trans</it>-NATs; the total number of NATs is therefore likely to be even higher in both genomes. Table <tblr tid="T1">1</tblr> splits the data into cases where a single S-AS pair is present at a given locus (single bidirectional transcription) and cases where more than one pair is present per locus (multiple bidirectional transcription). Additional data file 1 lists two representative GenBank entries for each S-AS pair, organized by chromosome, in the two species. As previously observed <abbrgrp><abbr bid="B17">17</abbr></abbrgrp>, S-AS pairs are under-represented on the sex chromosomes of both species (Additional data file 2).</p>
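            <p>The two orientation criteria applied to ESTs can be illustrated as follows. This sketch assumes that reads and intron coordinates are already expressed on the genomic plus strand; the function names and the minimum poly-A run length (min_run) are hypothetical choices, and only the canonical GT...AG dinucleotides are recognized, so introns with non-canonical splice sites are left undetermined.</p>
            <p>
```python
from typing import Optional


def strand_from_intron(genome_plus: str, intron_start: int, intron_end: int) -> Optional[str]:
    """Infer transcript strand from the dinucleotides bounding an intron.

    Coordinates are 0-based and end-exclusive on the genomic plus strand.
    GT...AG read on the plus strand implies a '+' transcript; its reverse
    complement, CT...AC, implies a '-' transcript.
    """
    donor = genome_plus[intron_start:intron_start + 2].upper()
    acceptor = genome_plus[intron_end - 2:intron_end].upper()
    if (donor, acceptor) == ("GT", "AG"):
        return "+"
    if (donor, acceptor) == ("CT", "AC"):
        return "-"
    return None  # non-canonical: orientation stays undetermined


def strand_from_polya(read_plus: str, min_run: int = 10) -> Optional[str]:
    """Infer strand from a poly-A tail in a read written on the plus strand.

    A trailing run of A's is a tail on a '+' transcript; a leading run of
    T's is the same tail seen from the '-' strand. min_run is hypothetical.
    """
    read = read_plus.upper()
    if read.endswith("A" * min_run):
        return "+"
    if read.startswith("T" * min_run):
        return "-"
    return None


genome = "AAAA" + "GTAAGTCCCCAG" + "TTTT"   # intron occupies [4, 16)
print(strand_from_intron(genome, 4, 16))     # +
print(strand_from_polya("T" * 12 + "ACGT"))  # -
```
            </p>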
            <tbl id="T1">
               <title>
                  <p>Table 1</p>
               </title>
               <caption>
                  <p>Overall distribution of S-AS pairs in the human and mouse genomes</p>
               </caption>
               <tblbdy cols="5">
                  <r>
                     <c ca="left">
                        <p>cDNA type</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Single bidirectional transcription</p>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Multiple bidirectional transcription</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Human</p>
                     </c>
                     <c ca="center">
                        <p>Mouse</p>
                     </c>
                     <c ca="center">
                        <p>Human</p>
                     </c>
                     <c ca="center">
                        <p>Mouse</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="5">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mRNA-mRNA</p>
                     </c>
                     <c ca="center">
                        <p>2,109</p>
                     </c>
                     <c ca="center">
                        <p>1,879</p>
                     </c>
                     <c ca="center">
                        <p>1,004</p>
                     </c>
                     <c ca="center">
                        <p>720</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>mRNAs-ESTs</p>
                     </c>
                     <c ca="center">
                        <p>3,299</p>
                     </c>
                     <c ca="center">
                        <p>3,265</p>
                     </c>
                     <c ca="center">
                        <p>3,665</p>
                     </c>
                     <c ca="center">
                        <p>2,227</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Total</p>
                     </c>
                     <c ca="center">
                        <p>5,408</p>
                     </c>
                     <c ca="center">
                        <p>5,144</p>
                     </c>
                     <c ca="center">
                        <p>4,669</p>
                     </c>
                     <c ca="center">
                        <p>2,947</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Single bidirectional transcription corresponds to those loci in which only one S-AS pair is present. Multiple bidirectional transcription corresponds to those loci in which more than one S-AS pair is present (at least one gene belongs to more than one S-AS pair).</p>
               </tblfn>
            </tbl>
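            <p>As a quick consistency check, the counts quoted in the text can be recovered from Table <tblr tid="T1">1</tblr> by summing the single and multiple bidirectional columns. The dictionary layout below is just one hypothetical way of transcribing the table.</p>
            <p>
```python
# Table 1 counts as (single, multiple) bidirectional transcription.
TABLE1 = {
    "human": {"mRNA-mRNA": (2109, 1004), "mRNA-EST": (3299, 3665)},
    "mouse": {"mRNA-mRNA": (1879, 720), "mRNA-EST": (3265, 2227)},
}


def summarize(species: str) -> dict:
    """Collapse the single/multiple columns into the totals quoted in the text."""
    rows = TABLE1[species]
    mrna_pairs = sum(rows["mRNA-mRNA"])
    est_pairs = sum(rows["mRNA-EST"])
    return {"mRNA-mRNA": mrna_pairs,
            "additional (ESTs)": est_pairs,
            "total": mrna_pairs + est_pairs}


for species in TABLE1:
    print(species, summarize(species))
# human {'mRNA-mRNA': 3113, 'additional (ESTs)': 6964, 'total': 10077}
# mouse {'mRNA-mRNA': 2599, 'additional (ESTs)': 5492, 'total': 8091}
```
            </p>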
            <p>The above numbers confirm that S-AS pairs are much more frequent in mammalian genomes than originally estimated <abbrgrp><abbr bid="B4">4</abbr><abbr bid="B17">17</abbr><abbr bid="B18">18</abbr></abbrgrp>. Our analyses suggest that at least 21,000 human and 16,000 mouse genes are involved in S-AS pairing. These numbers are closer to those reported in <abbrgrp><abbr bid="B32">32</abbr></abbrgrp>, which used tiling microarrays to evaluate gene expression across a fraction of the human genome. For the mouse genome, our numbers agree with those reported by Katayama <it>et al</it>. <abbrgrp><abbr bid="B8">8</abbr></abbrgrp>. A more recent analysis <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> also gives similar estimates of S-AS pairs in both the human and mouse genomes.</p>
            <p>Could this high number of S-AS pairs be an artifact of the stringency of our clustering strategy? If the same transcriptional unit were fragmented into nearby clusters because of 3' untranslated region (UTR) heterogeneity, the total number of clusters would be inflated, leading to an overcount of S-AS pairs. To evaluate this possibility, we relaxed our clustering parameters, requiring a minimum same-strand overlap of only 1 base-pair (bp) for clustering. Furthermore, we collapsed into a single cluster all pairs of clusters located on the same strand and less than 30 bp apart. Additional data file 3 shows the total number of clusters and S-AS pairs obtained with this relaxed strategy. As expected, both numbers decreased: the total number of clusters fell by 2% and 1% for human and mouse, respectively, while the total number of S-AS pairs fell by 0.3% in both species. This small difference does not affect our conclusions on the genomic organization of S-AS pairs, so for all further analyses we used the original dataset obtained with the more stringent clustering methodology.</p>
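            <p>The relaxed clustering rule can be sketched as a single pass over intervals sorted by chromosome, strand and start. This assumes 1-based inclusive coordinates and reads 'less than 30 bp apart' as fewer than 30 bases lying strictly between two clusters; the function name and tuple layout are hypothetical. Any overlap of at least 1 bp gives a negative gap and therefore also triggers a merge.</p>
            <p>
```python
def merge_same_strand(intervals, max_gap=30):
    """Collapse same-strand intervals that overlap or lie fewer than
    max_gap bases apart.

    intervals: iterable of (chrom, strand, start, end), 1-based inclusive.
    """
    merged = []
    for chrom, strand, start, end in sorted(intervals):
        if merged:
            m_chrom, m_strand, m_start, m_end = merged[-1]
            gap = start - m_end - 1  # bases strictly between; negative if overlapping
            if (chrom, strand) == (m_chrom, m_strand) and max_gap > gap:
                merged[-1] = (m_chrom, m_strand, m_start, max(m_end, end))
                continue
        merged.append((chrom, strand, start, end))
    return merged


clusters = [
    ("chr1", "+", 100, 200),
    ("chr1", "+", 220, 300),  # 19 bp from the previous cluster: merged
    ("chr1", "+", 400, 500),  # 99 bp away: kept separate
    ("chr1", "-", 150, 250),  # opposite strand: never merged
]
print(merge_same_strand(clusters))
# [('chr1', '+', 100, 300), ('chr1', '+', 400, 500), ('chr1', '-', 150, 250)]
```
            </p>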
            <p>We further explored the genomic organization of S-AS pairs using the subset of 3,113 human and 2,599 mouse pairs that contained mRNAs in both sense and antisense orientations. Based on their overlap patterns, S-AS pairs can be divided into three subtypes: head-head (5'5'), tail-tail (3'3') and embedded (one gene contained entirely within the other) pairs (Table <tblr tid="T2">2</tblr>). For a schematic view of the genomic organization of S-AS pairs, see Additional data file 4. Embedded pairs are the most frequent in both species, corresponding to 47.8% and 42.5% of all pairs in human and mouse, respectively. When the intron/exon organization of both genes is taken into account, the most frequent overlap involves at least one exon-intron border. Nevertheless, a substantial number of NATs map entirely within introns of the sense gene in both human and mouse (category 'Fully intronic' in Table <tblr tid="T2">2</tblr>). Interestingly, more than three-quarters of all S-AS pairs categorized as 'Fully intronic' fall within the embedded category in both species. Could this distribution arise by chance? Monte Carlo simulations, in which we randomly repositioned NATs relative to sense genes while keeping their 5'5'/embedded/3'3' orientation, indicate that it does not: all three categories of S-AS pairs deviate from a random distribution (chi-square = 11.5, df (degrees of freedom) = 2, <it>p </it>= 0.003 for embedded pairs; chi-square = 49, df = 2, <it>p </it>= 2.3 &#215; 10<sup>-11 </sup>for 5'5' pairs; chi-square = 132, df = 2, <it>p </it>= 2.1 &#215; 10<sup>-29 </sup>for 3'3' pairs). This peculiar distribution is discussed further in the light of the expression analyses. Since intronic NATs have been shown to be overexpressed in prostate tumors <abbrgrp><abbr bid="B33">33</abbr></abbrgrp>, our dataset merits further exploration regarding differential expression in cancer. Because of their genomic location, any putative regulatory role of these intronic NATs would have to be restricted to the nucleus. Interestingly, Kiyosawa <it>et al</it>. <abbrgrp><abbr bid="B34">34</abbr></abbrgrp> observed that a significant fraction of mouse NATs are poly-A negative and localized to the nucleus.</p>
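            <p>The three overlap categories can be assigned from coordinates alone, as in the sketch below. It assumes that both transcripts lie on the same chromosome, are given as (start, end, strand) with 1-based inclusive coordinates, and overlap; the function name is hypothetical. A '+' transcript has its 5' end at its start coordinate and a '-' transcript at its end coordinate, so a partial overlap in which the leftmost transcript is on the '+' strand joins the two 3' ends (tail-tail), whereas a leftmost '-' transcript joins the two 5' ends (head-head).</p>
            <p>
```python
def classify_pair(a, b):
    """Classify an overlapping S-AS pair as 5'5', 3'3' or embedded.

    a, b: (start, end, strand) tuples on the same chromosome.
    """
    a_start, a_end, a_strand = a
    b_start, b_end, b_strand = b
    if a_strand == b_strand:
        raise ValueError("an S-AS pair must lie on opposite strands")
    if max(a_start, b_start) > min(a_end, b_end):
        raise ValueError("transcripts do not overlap")
    a_inside_b = a_start >= b_start and b_end >= a_end
    b_inside_a = b_start >= a_start and a_end >= b_end
    if a_inside_b or b_inside_a:
        return "embedded"
    # Partial overlap: inspect the transcript that starts further left.
    left = a if b_start >= a_start else b
    return "3'3'" if left[2] == "+" else "5'5'"


print(classify_pair((100, 500, "+"), (450, 900, "-")))  # 3'3'
print(classify_pair((100, 500, "-"), (450, 900, "+")))  # 5'5'
print(classify_pair((100, 900, "+"), (300, 400, "-")))  # embedded
```
            </p>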
            <tbl id="T2">
               <title>
                  <p>Table 2</p>
               </title>
               <caption>
                  <p>Distribution of NATs in relation to the genomic structure of the sense transcript</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Human</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Mouse</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3">
                        <hr/>
                     </c>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>5'5'</p>
                     </c>
                     <c ca="center">
                        <p>Embedded</p>
                     </c>
                     <c ca="center">
                        <p>3'3'</p>
                     </c>
                     <c ca="center">
                        <p>5'5'</p>
                     </c>
                     <c ca="center">
                        <p>Embedded</p>
                     </c>
                     <c ca="center">
                        <p>3'3'</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Fully exonic</p>
                     </c>
                     <c ca="center">
                        <p>112 (20%)</p>
                     </c>
                     <c ca="center">
                        <p>32 (3%)</p>
                     </c>
                     <c ca="center">
                        <p>213 (40%)</p>
                     </c>
                     <c ca="center">
                        <p>156 (27%)</p>
                     </c>
                     <c ca="center">
                        <p>14 (2%)</p>
                     </c>
                     <c ca="center">
                        <p>227 (45%)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Exonic/intronic</p>
                     </c>
                     <c ca="center">
                        <p>362 (64%)</p>
                     </c>
                     <c ca="center">
                        <p>372 (37%)</p>
                     </c>
                     <c ca="center">
                        <p>259 (48%)</p>
                     </c>
                     <c ca="center">
                        <p>360 (62%)</p>
                     </c>
                     <c ca="center">
                        <p>338 (42%)</p>
                     </c>
                     <c ca="center">
                        <p>242 (48%)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Fully intronic</p>
                     </c>
                     <c ca="center">
                        <p>92 (16%)</p>
                     </c>
                     <c ca="center">
                        <p>606 (60%)</p>
                     </c>
                     <c ca="center">
                        <p>61 (12%)</p>
                     </c>
                     <c ca="center">
                        <p>61 (11%)</p>
                     </c>
                     <c ca="center">
                        <p>448 (56%)</p>
                     </c>
                     <c ca="center">
                        <p>33 (7%)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Total</p>
                     </c>
                     <c ca="center">
                        <p>566</p>
                     </c>
                     <c ca="center">
                        <p>1,010</p>
                     </c>
                     <c ca="center">
                        <p>533</p>
                     </c>
                     <c ca="center">
                        <p>577</p>
                     </c>
                     <c ca="center">
                        <p>800</p>
                     </c>
                     <c ca="center">
                        <p>502</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>5'5', head-head orientation; 3'3', tail-tail orientation.</p>
               </tblfn>
            </tbl>
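            <p>For 2 degrees of freedom the chi-square distribution reduces to an exponential distribution with mean 2, so the upper-tail p-value has the closed form <it>p </it>= exp(-x/2). The short check below assumes nothing beyond the df = 2 stated in the text and closely reproduces the p-values reported for the Monte Carlo comparison (0.003, 2.3 &#215; 10<sup>-11</sup> and 2.1 &#215; 10<sup>-29</sup>). Other degrees of freedom would instead require the regularized incomplete gamma function, for example via the survival function of scipy.stats.chi2.</p>
            <p>
```python
import math


def chi2_pvalue_df2(x: float) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df = 2.

    With 2 degrees of freedom the density is exp(-x/2)/2, so the tail
    integral is exactly exp(-x/2).
    """
    return math.exp(-x / 2)


# Chi-square statistics reported in the text for the three S-AS categories.
for category, stat in (("embedded", 11.5), ("5'5'", 49.0), ("3'3'", 132.0)):
    print(f"{category}: chi-square = {stat}, p = {chi2_pvalue_df2(stat):.2g}")
```
            </p>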
            <p>Another interesting observation is the higher frequency of intronless genes within the set of S-AS pairs (Table <tblr tid="T3">3</tblr>). About half (47%) of all mRNA/mRNA S-AS pairs in humans contain at least one intronless gene; this fraction is slightly lower in mouse (44%) (Table <tblr tid="T3">3</tblr>). Interestingly, intronless genes are significantly enriched within the set of embedded pairs (chi-square = 95.9, <it>p </it>&lt; 1.2 &#215; 10<sup>-22 </sup>for human; chi-square = 3.98, <it>p </it>&lt; 0.045 for mouse). For humans, 66% of all S-AS pairs containing at least one intronless gene fall within the 'embedded' category. Our overall distribution also differs from that reported by Sun <it>et al</it>. <abbrgrp><abbr bid="B5">5</abbr></abbrgrp>, who classified 43.4% of their S-AS pairs as 'embedded' (versus 47.8% here) and 35% as 3'3' (versus only 25% here). These differences are probably due to the fact that Sun <it>et al</it>. <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> also included in their analyses pairs supported only by ESTs.</p>
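            <p>The percentages quoted in this paragraph follow directly from the counts in Table <tblr tid="T3">3</tblr>. In the sketch below, a pair counts as containing an intronless gene if it falls in the 'Intron-intronless' or 'Both intronless' rows; the dictionary layout and names are hypothetical.</p>
            <p>
```python
# Table 3 counts per row, ordered by column: (5'5', embedded, 3'3').
TABLE3 = {
    "human": {"both_with_intron": (342, 351, 417),
              "intron_intronless": (206, 645, 103),
              "both_intronless": (18, 14, 13)},
    "mouse": {"both_with_intron": (259, 394, 390),
              "intron_intronless": (285, 398, 96),
              "both_intronless": (33, 8, 16)},
}
EMBEDDED = 1  # column index of the 'embedded' category


def intronless_fractions(species: str):
    """Return (share of pairs with at least one intronless gene,
    share of those pairs that are embedded)."""
    rows = TABLE3[species]
    total = sum(sum(counts) for counts in rows.values())
    with_intronless = sum(rows["intron_intronless"]) + sum(rows["both_intronless"])
    embedded = rows["intron_intronless"][EMBEDDED] + rows["both_intronless"][EMBEDDED]
    return with_intronless / total, embedded / with_intronless


for species in TABLE3:
    share, embedded_share = intronless_fractions(species)
    print(f"{species}: {share:.0%} contain an intronless gene; "
          f"{embedded_share:.0%} of those are embedded")
# human: 47% contain an intronless gene; 66% of those are embedded
# mouse: 44% contain an intronless gene; 49% of those are embedded
```
            </p>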
            <tbl id="T3">
               <title>
                  <p>Table 3</p>
               </title>
               <caption>
                  <p>Classification of S-AS pairs in reference to their orientation and the presence of introns at the genome level for both genes in a pair</p>
               </caption>
               <tblbdy cols="7">
                  <r>
                     <c ca="left">
                        <p>NAT pair</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Human</p>
                     </c>
                     <c cspan="3" ca="center">
                        <p>Mouse</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="3">
                        <hr/>
                     </c>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>5'5'</p>
                     </c>
                     <c ca="center">
                        <p>Embedded</p>
                     </c>
                     <c ca="center">
                        <p>3'3'</p>
                     </c>
                     <c ca="center">
                        <p>5'5'</p>
                     </c>
                     <c ca="center">
                        <p>Embedded</p>
                     </c>
                     <c ca="center">
                        <p>3'3'</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="7">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Both with intron</p>
                     </c>
                     <c ca="center">
                        <p>342 (61%)</p>
                     </c>
                     <c ca="center">
                        <p>351 (35%)</p>
                     </c>
                     <c ca="center">
                        <p>417 (78%)</p>
                     </c>
                     <c ca="center">
                        <p>259 (45%)</p>
                     </c>
                     <c ca="center">
                        <p>394 (49%)</p>
                     </c>
                     <c ca="center">
                        <p>390 (78%)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Intron-intronless</p>
                     </c>
                     <c ca="center">
                        <p>206 (36%)</p>
                     </c>
                     <c ca="center">
                        <p>645 (64%)</p>
                     </c>
                     <c ca="center">
                        <p>103 (19%)</p>
                     </c>
                     <c ca="center">
                        <p>285 (49%)</p>
                     </c>
                     <c ca="center">
                        <p>398 (50%)</p>
                     </c>
                     <c ca="center">
                        <p>96 (19%)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Both intronless</p>
                     </c>
                     <c ca="center">
                        <p>18 (3%)</p>
                     </c>
                     <c ca="center">
                        <p>14 (1%)</p>
                     </c>
                     <c ca="center">
                        <p>13 (3%)</p>
                     </c>
                     <c ca="center">
                        <p>33 (6%)</p>
                     </c>
                     <c ca="center">
                        <p>8 (1%)</p>
                     </c>
                     <c ca="center">
                        <p>16 (3%)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Total</p>
                     </c>
                     <c ca="center">
                        <p>566</p>
                     </c>
                     <c ca="center">
                        <p>1,010</p>
                     </c>
                     <c ca="center">
                        <p>533</p>
                     </c>
                     <c ca="center">
                        <p>577</p>
                     </c>
                     <c ca="center">
                        <p>800</p>
                     </c>
                     <c ca="center">
                        <p>502</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>5'5', head-head orientation; 3'3', tail-tail orientation.</p>
               </tblfn>
            </tbl>
            <p>Taken together, these results show that subsets of S-AS pairs have distinct genomic organizations, suggesting that they may play different biological roles in mammalian genomes. Below, we discuss these data in a functional and evolutionary context.</p>
         </sec>
         <sec>
            <st>
               <p>Conservation of S-AS pairs between human and mouse</p>
            </st>
            <p>Using our sets of human and mouse S-AS pairs, we measured the degree of conservation of S-AS pairs between the two species. Because previously reported numbers vary widely, from a few hundred <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B6">6</abbr></abbrgrp> to almost a thousand <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, we used several strategies. We first used a strategy based on HomoloGene <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>. The number of S-AS pairs with both genes mapped to HomoloGene is 854 for human and 579 for mouse; among these, 190 S-AS pairs are conserved between human and mouse. One limitation of this analysis is its dependence on HomoloGene, which, for example, does not consider genes that do not code for proteins. We therefore implemented a second strategy, in which we identified the pairs that had at least one conserved gene mapped by HomoloGene and tested the NAT of each such gene for sequence-level conservation. This strategy identified an additional 546 cases, giving a total of 736 (190 + 546) S-AS pairs conserved between human and mouse. Finally, we applied to our dataset the strategy used by Engstrom <it>et al</it>. <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, which counts the human and mouse S-AS pairs that have exon overlap at corresponding positions in a BLASTZ alignment of the two genomes, and found 1,136 and 1,144 corresponding S-AS pairs in human and mouse, respectively. As observed by Engstrom <it>et al</it>. <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, the human and mouse numbers differ slightly because a small proportion of mouse pairs corresponded to several human pairs and <it>vice versa</it>. Additional data file 5 lists all S-AS pairs found by the three methods discussed above.</p>
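            <p>The HomoloGene-based comparison can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: each species' S-AS pairs are given as tuples of gene identifiers, and the hypothetical dictionaries human_clusters and mouse_clusters map gene identifiers to homology-cluster identifiers (for example, parsed from a HomoloGene release). A pair counts as conserved when the same unordered pair of clusters forms an S-AS pair in both species.</p>
```python
from typing import Dict, FrozenSet, Iterable, Set, Tuple

GenePair = Tuple[str, str]


def to_cluster_pairs(pairs: Iterable[GenePair],
                     gene_to_cluster: Dict[str, str]) -> Set[FrozenSet[str]]:
    """Translate gene pairs into unordered pairs of homology-cluster IDs.

    Pairs in which either gene lacks a cluster are skipped, mirroring the
    restriction to S-AS pairs with both genes mapped to HomoloGene.
    """
    translated = set()
    for gene_a, gene_b in pairs:
        if gene_a in gene_to_cluster and gene_b in gene_to_cluster:
            translated.add(frozenset((gene_to_cluster[gene_a], gene_to_cluster[gene_b])))
    return translated


def conserved_pairs(human_pairs: Iterable[GenePair], mouse_pairs: Iterable[GenePair],
                    human_clusters: Dict[str, str],
                    mouse_clusters: Dict[str, str]) -> Set[FrozenSet[str]]:
    """Cluster pairs that form an S-AS pair in both species."""
    human = to_cluster_pairs(human_pairs, human_clusters)
    mouse = to_cluster_pairs(mouse_pairs, mouse_clusters)
    return human.intersection(mouse)


# Toy example with hypothetical identifiers (not data from this study).
human_pairs = [("GENE1", "GENE2"), ("GENE3", "GENE4")]
mouse_pairs = [("Gene1", "Gene2"), ("Gene3", "Gene5")]
human_clusters = {"GENE1": "c1", "GENE2": "c2", "GENE3": "c3", "GENE4": "c4"}
mouse_clusters = {"Gene1": "c1", "Gene2": "c2", "Gene3": "c3", "Gene5": "c5"}
print(len(conserved_pairs(human_pairs, mouse_pairs, human_clusters, mouse_clusters)))  # 1
```
            <p>Pairs that fail the HomoloGene filter, such as those involving non-coding genes, silently drop out of to_cluster_pairs; this is the limitation that motivated the second, sequence-based strategy.</p>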
            <p>3'3' pairs predominate in all sets of conserved S-AS pairs. In the set obtained solely from HomoloGene, 67% of all pairs are 3'3', compared with 19% embedded and 14% 5'5'. In the dataset obtained with the strategy of Engstrom <it>et al</it>. <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, 3'3' pairs also prevail (48%) over embedded (14%) and 5'5' (38%) pairs. We also modified the method of Engstrom <it>et al</it>. <abbrgrp><abbr bid="B25">25</abbr></abbrgrp> to take into account all S-AS pairs, not only those with exon-exon overlap; these data are shown in Additional data file 6. S-AS pairs whose overlap is classified as 'Fully intronic' are under-represented in the set of conserved S-AS pairs (18%, compared with 29% in the whole dataset of S-AS pairs). The same is true for S-AS pairs containing at least one intronless gene (26% in the set of conserved S-AS pairs compared with 47% in the whole dataset). These results are consistent with our observation that conserved S-AS pairs are enriched in 3'3' pairs: as seen in Tables <tblr tid="T2">2</tblr> and <tblr tid="T3">3</tblr>, 3'3' pairs are poorly represented in the categories 'Fully intronic' (Table <tblr tid="T2">2</tblr>) and 'Intron-intronless' (Table <tblr tid="T3">3</tblr>).</p>
         </sec>
         <sec>
            <st>
               <p>Discovery of new S-AS pairs in human and mouse genomes using MPSS data</p>
            </st>
            <p>Large-scale expression profiling tools have been used to discover S-AS pairs and to analyze their co-expression <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B23">23</abbr><abbr bid="B34">34</abbr></abbrgrp>. Qu&#233;r&#233; <it>et al</it>. <abbrgrp><abbr bid="B23">23</abbr></abbrgrp>, for instance, recently searched SAGE repositories for NATs: they looked for tags mapping to the reverse complement of known transcripts and analyzed their expression patterns across SAGE libraries. However, they made no attempt to validate these NATs experimentally. Here, we used MPSS data available in public repositories <abbrgrp><abbr bid="B36">36</abbr><abbr bid="B37">37</abbr></abbrgrp> to search for new NATs in both the human and mouse genomes. Because MPSS tags are longer than conventional SAGE tags, they can be mapped directly to the genome sequence. Furthermore, MPSS offers much deeper coverage of the transcriptome, since at least a million tags are generated from each sample.</p>
            <p>We used 122 MPSS libraries derived from a variety of human and mouse tissues (81 mouse and 41 human libraries; see Additional data file 7 for the list). Our strategy was based on generating virtual tags from each genome by searching the genome sequence for <it>Dpn</it>II sites. Because these sites are palindromic, we extracted two virtual tags for each site (13 and 16 nucleotides long for human and mouse, respectively), both immediately downstream of the restriction site but in opposite orientations (see Materials and methods for details). In this way, we could evaluate the expression of transcriptional units on both DNA strands. We obtained 5,580,158 and 8,645,994 virtual tags for the human and mouse genomes, respectively. This set of virtual tags was then compared with the list of tags observed in the MPSS libraries. As is true for any study using genome-mapped tags, our analysis misses cases in which a tag spans an exon-exon junction in the cDNA.</p>
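            <p>The virtual-tag extraction described above can be sketched as follows. This is a minimal sketch under assumptions: the tag is taken as the bases immediately after the GATC site (whether the site itself counts towards the tag length is not specified here), sites too close to a sequence end to yield a full-length tag are skipped, and the function and variable names are hypothetical. Because GATC reads the same on both strands, the downstream tag on the reverse strand is the reverse complement of the bases just upstream of the site on the forward strand.</p>
```python
from typing import List

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
SITE = "GATC"  # DpnII recognition site; palindromic, so it marks both strands at once


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def virtual_tags(genome: str, tag_len: int) -> List[str]:
    """Extract two virtual tags per GATC site, one reading downstream on each strand."""
    genome = genome.upper()
    tags = []
    start = genome.find(SITE)
    while start != -1:
        # Forward strand: the tag_len bases immediately 3' of the site.
        fwd = genome[start + len(SITE):start + len(SITE) + tag_len]
        if len(fwd) == tag_len:
            tags.append(fwd)
        # Reverse strand: its downstream region lies upstream on the forward
        # strand and is read as the reverse complement.
        if start >= tag_len:
            tags.append(reverse_complement(genome[start - tag_len:start]))
        start = genome.find(SITE, start + 1)
    return tags


print(virtual_tags("AAAAACCCCCGATCTTTTTGGGGG", 5))  # ['TTTTT', 'GGGGG']
```
            <p>For real data, tag_len would be 13 for the human and 16 for the mouse libraries, as stated above, and each chromosome would be processed separately.</p>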
            <p>We first evaluated how many of the cDNA-based S-AS pairs (shown in Table <tblr tid="T1">1</tblr>) were further confirmed by the presence of an MPSS tag. Data for this analysis are presented in Additional data file 8. Roughly 84% and 51% of all cDNA-based S-AS pairs were confirmed by MPSS data for human and mouse, respectively.</p>
            <p>Since we were interested in finding new antisense transcripts, we searched for MPSS tags that mapped to the opposite strand of introns and exons of known genes, excluding genes already belonging to the S-AS pairs described above. In human, 4,308 genes have at least one MPSS tag derived from the antisense strand (Table <tblr tid="T4">4</tblr>), and 1,221 of these have two or more distinct antisense MPSS tags. Notably, MPSS tags antisense to exonic regions of the sense genes are more numerous than those antisense to intronic regions. Unexpectedly, we found far fewer antisense tags for mouse (Table <tblr tid="T4">4</tblr>). Although there are more mouse libraries (81 mouse versus 41 human), the number of unique tags is much smaller (56,061 for mouse versus 340,820 for human). Assignment of these unique tags to known genes also shows a smaller representation of known genes in the mouse dataset (51% versus 66% for human). It is unlikely, however, that these differences explain the dramatic difference shown in Table <tblr tid="T4">4</tblr>, and further analyses are needed to resolve this apparent discrepancy.</p>
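            <p>The categories of Table 4 follow from two counts per sense gene: the number of distinct antisense tags in its exons and the number in its introns. The Python sketch below assumes that tags have already been filtered to a frequency of at least 3 tags per million, as stated in the Table 4 footnote; the input dictionary and function names are hypothetical.</p>
```python
from collections import Counter
from typing import Dict, Tuple


def table4_category(n_exonic: int, n_intronic: int) -> str:
    """Assign a sense gene to a Table 4 category from its antisense tag counts."""
    if n_exonic == 0 and n_intronic == 0:
        raise ValueError("gene has no antisense tags")
    if n_exonic and n_intronic:
        return "Exonic and intronic tag"
    if n_exonic:
        return "One exonic tag" if n_exonic == 1 else "Multiple exonic tags"
    return "One intronic tag" if n_intronic == 1 else "Multiple intronic tags"


def summarize(tag_counts: Dict[str, Tuple[int, int]]) -> Dict[str, Tuple[int, float]]:
    """Map each category to (number of genes, percentage of all genes)."""
    categories = Counter(table4_category(ex, intr) for ex, intr in tag_counts.values())
    total = sum(categories.values())
    return {cat: (n, round(100 * n / total, 1)) for cat, n in categories.items()}


# Hypothetical genes: (exonic antisense tags, intronic antisense tags).
example = {"g1": (1, 0), "g2": (0, 1), "g3": (2, 1), "g4": (3, 0)}
print(summarize(example))
```
            <p>Genes in the last three categories are those with two or more distinct antisense tags, which is how the 1,221 human genes mentioned above relate to Table 4 (707 + 318 + 196).</p>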
            <tbl id="T4">
               <title>
                  <p>Table 4</p>
               </title>
               <caption>
                  <p>Distribution of MPSS tags in an antisense orientation in human and mouse genomes</p>
               </caption>
               <tblbdy cols="3">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2" ca="center">
                        <p>Number of clusters</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c cspan="2">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Human</p>
                     </c>
                     <c ca="center">
                        <p>Mouse</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="3">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>One exonic tag</p>
                     </c>
                     <c ca="center">
                        <p>2,212 (51.3%)</p>
                     </c>
                     <c ca="center">
                        <p>124 (57.3%)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>One intronic tag</p>
                     </c>
                     <c ca="center">
                        <p>875 (20.3%)</p>
                     </c>
                     <c ca="center">
                        <p>90 (41.7%)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Exonic and intronic tag</p>
                     </c>
                     <c ca="center">
                        <p>707 (16.4%)</p>
                     </c>
                     <c ca="center">
                        <p>2 (1%)</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Multiple exonic tags</p>
                     </c>
                     <c ca="center">
                        <p>318 (7.4%)</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Multiple intronic tags</p>
                     </c>
                     <c ca="center">
                        <p>196 (4.6%)</p>
                     </c>
                     <c ca="center">
                        <p>0</p>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>Total</p>
                     </c>
                     <c ca="center">
                        <p>4,308</p>
                     </c>
                     <c ca="center">
                        <p>216</p>
                     </c>
                  </r>
               </tblbdy>
               <tblfn>
                  <p>Exonic and intronic refer to the genome organization of the sense gene. For instance, the category 'One exonic tag' corresponds to those genes with only one antisense tag complementary to its exonic region. All identified tags are found at a frequency &#8805;3 tags per million (see Materials and methods).</p>
               </tblfn>
            </tbl>
            <p>To experimentally validate these novel human NAT candidates, we used the GLGI (Generation of Longer cDNA fragments from SAGE for Gene Identification)-MPSS technique <abbrgrp><abbr bid="B38">38</abbr></abbrgrp> to convert 96 antisense MPSS tags into their corresponding 3' cDNA fragments. A sense primer corresponding to each antisense MPSS tag was used for GLGI-MPSS amplification, as described in Materials and methods. A predominant band was obtained for most GLGI-MPSS reactions (Figure <figr fid="F1">1</figr>). Amplified fragments were purified, cloned, sequenced and aligned to the human genome sequence. We generated a specific 3' cDNA fragment for 46 (50.5%) of 91 novel antisense candidates. For 19 of these 46, the poly-A tail aligned with a stretch of As in the human genome sequence (discussed below). Three of these antisense transcripts were tested by orientation-specific RT-PCR, and all three were confirmed (data not shown).</p>
            <fig id="F1">
               <title>
                  <p>Figure 1</p>
               </title>
               <caption>
                  <p>GLGI-MPSS amplification</p>
               </caption>
               <text>
                  <p>GLGI-MPSS amplification. GLGI amplifications for 96 MPSS antisense tags were analyzed on ethidium bromide-stained agarose gels. Note that some lanes show a single amplified band, whereas others show more than one band and sometimes a smear. A 100 bp ladder (M) was used as a molecular weight marker.</p>
               </text>
               <graphic file="gb-2007-8-3-r40-1"/>
            </fig>
            <p>Among the 45 candidates (49.5%; 91 - 46) that were not considered validated, 25 were amplified in the GLGI-MPSS experiment but had an exon-intron organization identical to that of the sense gene. Although antisense sequences like these have been observed before <abbrgrp><abbr bid="B39">39</abbr></abbrgrp>, we did not count them as validated antisense transcripts. Orientation-specific RT-PCR confirmed the existence of one of the two such transcripts tested.</p>
         </sec>
         <sec>
            <st>
               <p>Alternative polyadenylation as a major factor in defining S-AS pairs</p>
            </st>
            <p>Dahary <it>et al</it>. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> observed that S-AS overlap usually involves transcripts generated by alternative polyadenylation, an observation previously reported by us and others <abbrgrp><abbr bid="B40">40</abbr></abbrgrp>. We tested whether these preliminary observations would hold up under a more quantitative analysis and found that S-AS overlap is indeed predominantly due to alternative polyadenylation variants: roughly 51% of all 3'3' S-AS pairs (274 of 533) overlap because of the existence of at least one such variant. This number is probably underestimated, since many variants are still not represented in the sequence databases. This observation raises the exciting possibility that antisense regulation is associated with the regulation of alternative polyadenylation. Overlapping genes are expected to constrain each other's evolution, since natural selection evaluates any mutation in the overlap according to its effect on both genes; in principle, therefore, gene overlap should have a negative effect on the fitness of the organism. Alternative polyadenylation has the potential to relax this negative selection, since the overlap then depends on a post-transcriptional modification.</p>
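            <p>One way to decide whether a 3'3' overlap depends on an alternative polyadenylation variant is to compare the most proximal and the most distal poly-A sites of each gene. The sketch below is an illustration under assumptions, not the procedure used here: each gene is reduced to a list of genomic poly-A site coordinates, the sense gene lies on the plus strand and the antisense gene on the minus strand, and an overlap is counted as variant-dependent when the distal sites overlap but the proximal (shortest-isoform) sites do not. The function names are hypothetical.</p>
```python
from typing import Sequence


def ends_overlap(plus_end: int, minus_end: int) -> bool:
    """Convergent (3'3') genes overlap when the plus-strand 3' end reaches the minus-strand 3' end.

    Coordinates are genomic positions; the 3' end of a minus-strand gene is its lowest coordinate.
    """
    return plus_end >= minus_end


def overlap_needs_variant(plus_polya: Sequence[int], minus_polya: Sequence[int]) -> bool:
    """True if the pair overlaps only when a non-proximal poly-A site is used."""
    plus_proximal, plus_distal = min(plus_polya), max(plus_polya)
    # On the minus strand transcription runs towards lower coordinates,
    # so the proximal site is the highest coordinate.
    minus_proximal, minus_distal = max(minus_polya), min(minus_polya)
    overlap_with_longest = ends_overlap(plus_distal, minus_distal)
    overlap_with_shortest = ends_overlap(plus_proximal, minus_proximal)
    return overlap_with_longest and not overlap_with_shortest


print(overlap_needs_variant([1000, 1500], [2000, 1200]))  # True
print(overlap_needs_variant([2500], [2000]))              # False: overlap without variants
```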
            <p>If alternative polyadenylation is a significant factor in defining S-AS pairs, we would expect a lower rate of alternative polyadenylation on the X chromosome, which has the lowest density of S-AS pairs. Indeed, only 20% of all transcripts from the X chromosome show at least two polyadenylation variants, compared with 27.5%, on average, for the autosomes (chi-square = 34.91, df = 1, <it>p </it>&lt; 0.0001).</p>
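            <p>The comparison above is a chi-square test on a 2 &#215; 2 table (X-linked versus autosomal transcripts, with versus without at least two polyadenylation variants). The underlying counts are not given in this section, so the sketch below only shows how such a statistic and its df = 1 p-value can be computed. It uses the Pearson statistic without continuity correction, which is an assumption, and toy counts that are not data from this study.</p>
```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


def p_value_df1(stat: float) -> float:
    """Upper-tail p-value of a chi-square statistic with one degree of freedom."""
    return math.erfc(math.sqrt(stat / 2))


# Toy table: rows = two groups of transcripts, columns = (several variants, one variant).
stat = chi_square_2x2(10, 20, 20, 10)
print(round(stat, 3), p_value_df1(stat))
```
            <p>Applied to the statistic reported above, p_value_df1(34.91) is of the order of 10<sup>-9</sup>, consistent with the stated <it>p </it>&lt; 0.0001.</p>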
         </sec>
         <sec>
            <st>
               <p>A fraction of S-AS pairs is generated through internal priming and retroposition events</p>
            </st>
            <p>During the validation of new NATs identified from the MPSS data, we noticed that a substantial fraction of GLGI amplicons (19 of 46 validated fragments) had their 3' ends aligned to stretches of As in the human genome. This prompted us to search for similar cases among the cDNA-based S-AS pairs identified in this study. We found that 18% and 26% of all S-AS pairs have at least one gene whose 3' end aligns with a stretch of As in the human and mouse genomes, respectively. These figures are probably inflated by ESTs, since they decrease to 11.7% for human and 12.6% for mouse when only mRNA/mRNA S-AS pairs are considered. Two possibilities could account for this observation. First, a fraction of antisense transcripts may be artifacts of priming on contaminating genomic DNA during cDNA library construction. Alternatively, antisense genes may have arisen during evolution through retroposition events. Both possibilities are consistent with the observation that antisense genes are depleted of introns.</p>
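            <p>Detecting genes whose 3' end aligns with a genomic stretch of As amounts to inspecting the genome immediately downstream of each transcript's 3' end. The Python sketch below assumes 0-based coordinates, an illustrative window of 10 nucleotides and an illustrative threshold of 8 As; these parameters and the function name are hypothetical, not the criteria used in this study. For minus-strand transcripts the downstream window lies at lower coordinates and is read on the reverse complement, so genomic Ts are counted instead.</p>
```python
def genomic_a_stretch(genome: str, three_prime_end: int, strand: str,
                      window: int = 10, min_a: int = 8) -> bool:
    """Return True if the genome just downstream of a transcript's 3' end is A-rich.

    three_prime_end is the 0-based index of the last transcribed base.
    window and min_a are illustrative thresholds, not the study's criteria.
    """
    genome = genome.upper()
    if strand == "+":
        downstream = genome[three_prime_end + 1:three_prime_end + 1 + window]
        return downstream.count("A") >= min_a
    if strand == "-":
        # Downstream of a minus-strand 3' end is to the left; As on the
        # transcript's strand appear as Ts on the forward strand.
        downstream = genome[max(0, three_prime_end - window):three_prime_end]
        return downstream.count("T") >= min_a
    raise ValueError("strand must be '+' or '-'")


print(genomic_a_stretch("CCCCCGGGGG" + "A" * 10 + "CCCC", 9, "+"))  # True
```
            <p>A transcript flagged this way is a candidate for internal priming on genomic DNA or for a retroposed origin; the check alone cannot distinguish the two, which is why the experimental test below was needed.</p>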
            <p>We developed an experimental strategy to evaluate genomic priming as a source of artifactual antisense cDNAs. We selected for RT-PCR validation 11 mRNA candidates derived from fetal liver, colon and lung cDNA libraries that contained a high proportion of sequences whose 3' ends align to stretches of As in the human genome. cDNA samples for these experiments were reverse transcribed from fetal liver, colon and lung total RNA, with or without DNase treatment. As shown in Figure <figr fid="F2">2</figr>, specific amplification failed for 7 (63.6%) of the 11 selected candidates when the cDNA templates were prepared from DNA-free RNA. In contrast, all candidates could be amplified when untreated RNA was used for cDNA synthesis, suggesting that a significant proportion of these internally primed sequences were indeed generated from contaminating genomic DNA.</p>
            <fig id="F2">
               <title>
                  <p>Figure 2</p>
               </title>
               <caption>
                  <p>RT-PCR analysis for the internal priming (IP) candidates in fetal liver, colon and lung total RNA</p>
               </caption>
               <text>
                  <p>RT-PCR analysis for the internal priming (IP) candidates in fetal liver, colon and lung total RNA. For each candidate, RT-PCR was conducted in the corresponding tissue on DNA-free RNA previously treated with DNase (lanes 1 and 2) and on untreated RNA, which was therefore contaminated with genomic DNA (gDNA; lanes 3 and 4). As a control, RT-PCR was conducted in the presence (lanes 1 and 3) and absence (lanes 2 and 4) of reverse transcriptase (RT). gDNA was used as a positive control for the PCR (lane 5) and no template as a negative control (lane 6). For fetal liver, the PCR products of 3 IP candidates (5, 8 and 11; 152 bp, 153 bp and 160 bp, respectively) were observed in treated RNA when RT was added (lane 1) and in untreated RNA regardless of RT (lanes 3 and 4). For colon, the PCR product of 1 IP candidate (9; 158 bp) was observed in treated RNA when RT was added (lane 1) and in untreated RNA regardless of RT (lanes 3 and 4). For the remaining IP candidates (1, 2, 4, 6, 7, 10 and 12), the PCR products (214 bp, 229 bp, 207 bp, 156 bp, 227 bp, 205 bp and 234 bp, respectively) were observed only in untreated RNA, regardless of RT (lanes 3 and 4). The PCR products were analyzed on 8% polyacrylamide gels with silver staining. A 100 bp ladder (M) was used as a molecular weight marker; in each gel, the lowest fragment in lane M corresponds to 100 bp.</p>
               </text>
               <graphic file="gb-2007-8-3-r40-2"/>
            </fig>
            <p>Other features also support an artifactual origin for these antisense transcripts. First, cDNAs with a stretch of As at their 3' genomic end carry polyadenylation signals far less often than genes in general (17% compared with 85%). Second, these genes show a much narrower and rarer expression pattern in SAGE and MPSS data than genes in general (data not shown). These observations suggest that a significant fraction of antisense genes are actually artifacts arising from genomic priming during library construction.</p>
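            <p>The polyadenylation-signal comparison can be illustrated with a simple hexamer scan. This sketch assumes that a signal is called when the canonical AATAAA hexamer or its common ATTAAA variant occurs within the last 30 nucleotides of the sense-oriented transcript, with the poly-A tail already removed; the window size and the function name are illustrative choices, not the study's definitions.</p>
```python
PAS_HEXAMERS = ("AATAAA", "ATTAAA")  # canonical signal and its most common variant


def has_polya_signal(transcript: str, search_window: int = 30) -> bool:
    """Return True if a polyadenylation-signal hexamer occurs near the 3' end.

    transcript: sense-oriented sequence ending at the cleavage site.
    search_window: number of terminal bases to scan (illustrative value).
    """
    tail = transcript.upper()[-search_window:]
    return any(hexamer in tail for hexamer in PAS_HEXAMERS)


print(has_polya_signal("G" * 50 + "AATAAA" + "C" * 15))  # True
print(has_polya_signal("AATAAA" + "G" * 100))            # False: signal too far upstream
```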
            <p>Retroposition generates intronless copies of existing genes through reverse transcription of mature mRNAs followed by integration of the resulting cDNA into the genome (for a review, see Long <it>et al</it>. <abbrgrp><abbr bid="B41">41</abbr></abbrgrp>). The cDNA copy may eventually undergo homologous recombination with its source gene, as has been suggested for yeast <abbrgrp><abbr bid="B42">42</abbr></abbrgrp>. Retroposition was once thought to generate only non-functional copies of functional genes. However, several groups have shown that retroposition has generated a substantial number of new functional genes in several species <abbrgrp><abbr bid="B43">43</abbr><abbr bid="B44">44</abbr><abbr bid="B45">45</abbr></abbrgrp>. Recently, Marques <it>et al</it>. <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> found almost 4,000 retrocopies of functional genes in the human genome, and the same group later reported that more than 1,000 of these retrocopies are transcribed, of which at least 120 have evolved into <it>bona fide </it>genes <abbrgrp><abbr bid="B46">46</abbr></abbrgrp>.</p>
            <p>Retrocopies usually have a poly-A tail at their 3' end, because the tail added post-transcriptionally to the parental mRNA is inserted into the genome together with the rest of the cDNA. Retroposition can therefore explain the high incidence of antisense transcripts with a genome-encoded poly-A stretch at their 3' end. To evaluate the contribution of retrocopies to the formation of S-AS pairs, we compared the loci identified as retrocopies by Marques <it>et al</it>. <abbrgrp><abbr bid="B43">43</abbr></abbrgrp> with the list of S-AS pairs identified in this study. Of 413 retrocopies represented in the cDNA databases, 138 were involved in S-AS pairs (70 mRNA/mRNA and 68 mRNA/EST pairs). Of the 70 mRNA/mRNA pairs, 78% were classified as embedded, in agreement with our previous observation that embedded pairs are enriched in intronless genes. Thus, retroposition seems to contribute significantly to the origin of embedded S-AS pairs.</p>
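            <p>Comparing retrocopy loci with S-AS pairs is essentially an interval-overlap query. The sketch below assumes half-open, 0-based genomic intervals and treats a retrocopy as involved in an S-AS pair when it overlaps any gene belonging to one; the Interval type and the function names are hypothetical, and the quadratic scan is suitable only for small inputs.</p>
```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive


def intervals_overlap(a: Interval, b: Interval) -> bool:
    """Half-open intervals on the same chromosome overlap when each starts before the other ends."""
    return a.chrom == b.chrom and a.end > b.start and b.end > a.start


def retrocopies_in_sas_pairs(retrocopies: List[Interval],
                             sas_genes: List[Interval]) -> List[Interval]:
    """Retrocopies overlapping at least one gene that belongs to an S-AS pair.

    A simple quadratic scan; genome-scale data would call for a sorted sweep
    or an interval tree.
    """
    return [r for r in retrocopies if any(intervals_overlap(r, g) for g in sas_genes)]


retrocopies = [Interval("chr1", 100, 200), Interval("chr1", 200, 300), Interval("chr2", 0, 50)]
sas_genes = [Interval("chr1", 150, 200)]
print(retrocopies_in_sas_pairs(retrocopies, sas_genes))
```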
         </sec>
         <sec>
            <st>
               <p>Expression patterns within S-AS pairs</p>
            </st>
            <p>A critical issue in evaluating the role of antisense transcripts in regulating cellular processes is the expression pattern of the sense and antisense transcripts belonging to the same S-AS pair. Several reports based on large-scale gene-expression analyses have addressed this question <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B19">19</abbr><abbr bid="B23">23</abbr><abbr bid="B47">47</abbr><abbr bid="B48">48</abbr></abbrgrp>. Similar to Wang <it>et al</it>. <abbrgrp><abbr bid="B48">48</abbr></abbrgrp>, we used the MPSS libraries available for human to explore this issue. Tag-to-gene assignment was performed as previously described <abbrgrp><abbr bid="B31">31</abbr><abbr bid="B49">49</abbr></abbrgrp>. To ensure that MPSS sequences were unambiguously matched to the assigned transcripts, we removed tags mapped to more than one locus. Frequencies of all tags assigned to genes in an S-AS pair were collected from all MPSS libraries.</p>
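            <p>The filtering and collection steps described above can be sketched as follows, assuming hypothetical in-memory inputs: tag_loci maps each tag to the set of genomic loci it matches, tag_to_gene holds the tag-to-gene assignment, and library_counts holds per-library tag frequencies in tags per million. Summing the frequencies of several tags assigned to the same gene is an assumption of this sketch, not necessarily the procedure used here.</p>
```python
from collections import defaultdict
from typing import Dict, Set


def collect_sas_expression(tag_loci: Dict[str, Set[str]],
                           tag_to_gene: Dict[str, str],
                           library_counts: Dict[str, Dict[str, float]],
                           sas_genes: Set[str]) -> Dict[str, Dict[str, float]]:
    """Return gene -> library -> summed frequency, using only single-locus tags."""
    expression = defaultdict(lambda: defaultdict(float))
    for library, counts in library_counts.items():
        for tag, tpm in counts.items():
            if len(tag_loci.get(tag, ())) != 1:
                continue  # unmapped or multi-locus tag: gene assignment would be ambiguous
            gene = tag_to_gene.get(tag)
            if gene in sas_genes:
                expression[gene][library] += tpm
    return {gene: dict(per_library) for gene, per_library in expression.items()}


# Hypothetical toy data: t2 maps to two loci and is therefore discarded.
tag_loci = {"t1": {"chr1:100"}, "t2": {"chr1:500", "chr3:900"}, "t3": {"chr2:700"}}
tag_to_gene = {"t1": "GENE_A", "t2": "GENE_A", "t3": "GENE_B"}
library_counts = {"lib1": {"t1": 10.0, "t2": 50.0, "t3": 4.0}, "lib2": {"t1": 2.0}}
print(collect_sas_expression(tag_loci, tag_to_gene, library_counts, {"GENE_A", "GENE_B"}))
```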
            <p>Figure <figr fid="F3">3</figr> shows the expression pattern of S-AS pairs across all human MPSS libraries, with the dataset divided, as before, into 3'3', 5'5' and embedded pairs. Several features are evident. First, the overall rate of co-expression in our dataset was 35.1%, compared with 44.9% observed by Chen <it>et al</it>. <abbrgrp><abbr bid="B4">4</abbr></abbrgrp>; this difference is probably due to differences in experimental design between the two reports (for example, in the datasets and in the way the rate was calculated). Second, the rate of co-expression is significantly higher for 3'3' pairs (50.3%) than for embedded pairs (chi-square = 134, df = 1, <it>p </it>= 5.4 &#215; 10<sup>-31</sup>). This supports the conclusion of Sun <it>et al</it>. <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> that 3'3' S-AS pairs are significantly more co-expressed than other pairs and are therefore more likely to be involved in antisense regulation. Third, 5'5' pairs are also enriched in co-expressed pairs compared with embedded pairs (chi-square = 23.5, df = 1, <it>p </it>= 1.2 &#215; 10<sup>-6</sup>). Finally, we observed no statistically significant difference among the three categories in the differential expression of the two genes in a pair.</p>
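            <p>The categories of Figure 3a can be expressed as a simple decision rule per pair. The sketch below assumes that a gene is called expressed when its frequency reaches 3 tags per million (the threshold quoted in the Table 4 footnote, borrowed here as an assumption) and that each pair is summarized by one sense and one antisense frequency; the actual criteria are given in Materials and methods, and the function names are hypothetical.</p>
```python
from typing import Sequence, Tuple

THRESHOLD_TPM = 3.0  # assumed detection threshold, tags per million


def pair_category(sense_tpm: float, antisense_tpm: float,
                  threshold: float = THRESHOLD_TPM) -> str:
    """Classify one S-AS pair into the Figure 3a categories."""
    sense_on = sense_tpm >= threshold
    antisense_on = antisense_tpm >= threshold
    if sense_on and antisense_on:
        return "co-expression"
    if sense_on or antisense_on:
        return "single-gene expression"
    return "no expression"


def coexpression_rate(pairs: Sequence[Tuple[float, float]],
                      threshold: float = THRESHOLD_TPM) -> float:
    """Fraction of pairs in which both genes are detected."""
    if not pairs:
        raise ValueError("no pairs given")
    labels = [pair_category(s, a, threshold) for s, a in pairs]
    return labels.count("co-expression") / len(labels)


def sense_antisense_ratio(sense_tpm: float, antisense_tpm: float) -> float:
    """Sense/antisense expression ratio for a co-expressed pair (Figure 3b x-axis)."""
    return sense_tpm / antisense_tpm


print(coexpression_rate([(10.0, 5.0), (10.0, 0.0), (0.0, 0.0), (4.0, 4.0)]))  # 0.5
```
            <p>Rates computed this way for 3'3', 5'5' and embedded pairs could then be compared with the 2 &#215; 2 chi-square tests reported above.</p>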
            <fig id="F3">
               <title>
                  <p>Figure 3</p>
               </title>
               <caption>
                  <p>Expression pattern (in a set of 31 tissues covered by MPSS) of genes belonging to all three types of S-AS pairs (3'3', 5'5' and embedded)</p>
               </caption>
               <text>
                  <p>Expression pattern (in a set of 31 tissues covered by MPSS) of genes belonging to all three types of S-AS pairs (3'3', 5'5' and embedded). <b>(a) </b>Categories are as follows: 'no expression', for S-AS pairs whose expression was not detected (see Materials and methods for details); 'single-gene expression', for S-AS pairs in which expression is observed for only one gene in the pair; 'co-expression', for pairs in which expression is seen for both genes in the pair. <b>(b) </b>Rate of differential expression for the set of co-expressed S-AS pairs. The sense/antisense expression ratio of the genes in each pair is shown on the x-axis.</p>
               </text>
               <graphic file="gb-2007-8-3-r40-3"/>
            </fig>
         </sec>
         <sec>
            <st>
               <p>Influence of antisense transcripts in the splicing of sense transcripts</p>
            </st>
            <p>It is now clear that a significant fraction of all human genes undergo regulated alternative splicing, producing more than one mature mRNA from a gene (Galante <it>et al</it>. <abbrgrp><abbr bid="B27">27</abbr></abbrgrp> and references therein). Although several <it>cis</it>- and <it>trans</it>-acting regulatory elements have been identified (for a review, see Pagani and Baralle <abbrgrp><abbr bid="B50">50</abbr></abbrgrp>), we are still far from a complete understanding of how constitutive and alternative splicing are regulated. One possible regulatory mechanism involves antisense sequences. It has been known since the late 1980s that antisense RNA can inhibit splicing of a pre-mRNA <it>in vitro </it><abbrgrp><abbr bid="B15">15</abbr></abbrgrp>. A few years later, Munroe and Lazar <abbrgrp><abbr bid="B51">51</abbr></abbrgrp> observed that NATs could inhibit the splicing of a message derived from the opposite DNA strand, specifically from the <it>ErbA&#945; </it>gene. More recently, Yan <it>et al</it>. <abbrgrp><abbr bid="B52">52</abbr></abbrgrp> characterized a new human gene, <it>SAF</it>, which is transcribed from the opposite strand of the <it>FAS </it>gene. Over-expression of <it>SAF </it>altered the splicing pattern of <it>FAS </it>in a regulated way, suggesting that <it>SAF </it>controls the splicing of <it>FAS</it>. With the growing number of genomic loci presenting both sense and antisense transcripts, a general role for S-AS pairing in splicing regulation has been proposed <abbrgrp><abbr bid="B47">47</abbr></abbrgrp>. However, no systematic large-scale analysis of this issue has been reported so far for mammals. We used the human dataset described in this report to address this question.</p>
            <p>We first tested whether the rate of alternative splicing in the sense gene is affected by the existence of an antisense transcript. Any effect of S-AS pairing on splicing is expected to be restricted to exon-intron borders located in the region involved in pairing, so we restricted the analysis to exon-intron borders lying within a region involved in S-AS pairing. Our strategy was to compare the number of splicing variants at these borders with that at all other exon-intron borders (those without an antisense transcript) in the same genes. To make the analysis more informative, we split the borders into four categories: terminal donor, internal donor, internal acceptor and terminal acceptor. For both internal donor and internal acceptor sites, the presence of an antisense transcript slightly increased the rate of alternative splicing (Table <tblr tid="T5">5</tblr>; increases of 4% and 3%, respectively). For terminal sites, the presence of a NAT had the opposite effect (decreases of 5% and 6% for donor and acceptor sites, respectively). Table <tblr tid="T5">5</tblr> also shows that these differences are predominantly due to intron retention. In contrast, NATs located within introns or exons, but not spanning the border, have no major effect on splicing at the respective borders. The differences between borders with and without NATs are statistically significant (chi-square = 31.2, df = 1, <it>p </it>= 2.3 &#215; 10<sup>-8</sup> for donor sites; chi-square = 23, df = 1, <it>p </it>= 1.6 &#215; 10<sup>-6</sup> for acceptor sites).</p>
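            <p>Splitting exon-intron borders by the presence of an antisense transcript, and then comparing alternative-splicing rates, can be sketched as follows. This is an illustration under assumptions: borders are single genomic coordinates of the sense gene, antisense coverage is given as inclusive (start, end) intervals, and the set of borders observed as alternative is supplied by a separate, hypothetical variant-calling step. The function names are hypothetical, and the resulting counts per category could then be compared with a 2 &#215; 2 chi-square test.</p>
```python
from typing import Iterable, List, Set, Tuple


def split_borders(borders: Iterable[int],
                  antisense_regions: List[Tuple[int, int]]) -> Tuple[List[int], List[int]]:
    """Split border coordinates into those inside an antisense-covered region and the rest."""
    with_antisense, without_antisense = [], []
    for pos in borders:
        covered = any(pos >= start and end >= pos for start, end in antisense_regions)
        (with_antisense if covered else without_antisense).append(pos)
    return with_antisense, without_antisense


def alternative_rate(borders: List[int], alternative: Set[int]) -> float:
    """Fraction of borders observed as alternatively spliced."""
    if not borders:
        return 0.0
    return sum(1 for pos in borders if pos in alternative) / len(borders)


with_as, without_as = split_borders([100, 200, 300, 400], [(150, 250), (390, 500)])
alternative = {200, 300, 400}
print(alternative_rate(with_as, alternative), alternative_rate(without_as, alternative))
```
            <p>In a real analysis each border would also carry its category (terminal or internal, donor or acceptor), and the rates would be computed separately per category, as in Table 5.</p>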
            <tbl id="T5">
               <title>
                  <p>Table 5</p>
               </title>
               <caption>
                  <p>Frequency of different types of alternative splicing in exon-intron borders with or without an antisense transcript</p>
               </caption>
               <tblbdy cols="6">
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c ca="center">
                        <p>Total</p>
                     </c>
                     <c ca="center">
                        <p>Alternative borders</p>
                     </c>
                     <c ca="center">
                        <p>Intron retention</p>
                     </c>
                     <c ca="center">
                        <p>Exon skipping</p>
                     </c>
                     <c ca="center">
                        <p>Alternative 3'/5' site</p>
                     </c>
                  </r>
                  <r>
                     <c cspan="6">
                        <hr/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Borders with antisense</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Terminal donor</p>
                     </c>
                     <c ca="center">
                        <p>2,578</p>
                     </c>
                     <c ca="center">
                        <p>553</p>
                     </c>
                     <c ca="center">
                        <p>130</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>416</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Internal donor</p>
                     </c>
                     <c ca="center">
                        <p>7,632</p>
                     </c>
                     <c ca="center">
                        <p>3,100</p>
                     </c>
                     <c ca="center">
                        <p>535</p>
                     </c>
                     <c ca="center">
                        <p>1,616</p>
                     </c>
                     <c ca="center">
                        <p>949</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Internal acceptor</p>
                     </c>
                     <c ca="center">
                        <p>7,749</p>
                     </c>
                     <c ca="center">
                        <p>3,145</p>
                     </c>
                     <c ca="center">
                        <p>493</p>
                     </c>
                     <c ca="center">
                        <p>1,642</p>
                     </c>
                     <c ca="center">
                        <p>1,010</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Terminal acceptor</p>
                     </c>
                     <c ca="center">
                        <p>2,763</p>
                     </c>
                     <c ca="center">
                        <p>688</p>
                     </c>
                     <c ca="center">
                        <p>208</p>
                     </c>
                     <c ca="center">
                        <p>7</p>
                     </c>
                     <c ca="center">
                        <p>473</p>
                     </c>
                  </r>
                  <r>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c ca="left">
                        <p>
                           <b>Borders without antisense</b>
                        </p>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                     <c>
                        <p/>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Terminal donor</p>
                     </c>
                     <c ca="center">
                        <p>2,200</p>
                     </c>
                     <c ca="center">
                        <p>579</p>
                     </c>
                     <c ca="center">
                        <p>101</p>
                     </c>
                     <c ca="center">
                        <p>32</p>
                     </c>
                     <c ca="center">
                        <p>446</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Internal donor</p>
                     </c>
                     <c ca="center">
                        <p>23,414</p>
                     </c>
                     <c ca="center">
                        <p>8,674</p>
                     </c>
                     <c ca="center">
                        <p>1,080</p>
                     </c>
                     <c ca="center">
                        <p>4,997</p>
                     </c>
                     <c ca="center">
                        <p>2,597</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Internal acceptor</p>
                     </c>
                     <c ca="center">
                        <p>23,447</p>
                     </c>
                     <c ca="center">
                        <p>8,787</p>
                     </c>
                     <c ca="center">
                        <p>1,022</p>
                     </c>
                     <c ca="center">
                        <p>5,007</p>
                     </c>
                     <c ca="center">
                        <p>2,758</p>
                     </c>
                  </r>
                  <r>
                     <c indent="1" ca="left">
                        <p>Terminal acceptor</p>
                     </c>
                     <c ca="center">
                        <p>1,732</p>
                     </c>
                     <c ca="center">
                        <p>545</p>
                     </c>
                     <c ca="center">
                        <p>154</p>
                     </c>
                     <c ca="center">
                        <p>16</p>
                     </c>
                     <c ca="center">
                        <p>375</p>
                     </c>
                  </r>
               </tblbdy>
            </tbl>
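            <p>The rates and test statistics quoted above can be recomputed from the counts in Table 5. The Python sketch below makes several assumptions. It treats the reported tests as Pearson chi-square tests without continuity correction, applied to 2 x 2 tables of alternative versus constitutive borders at sites with versus without a NAT. It also assumes that the 'donor' and 'acceptor' tests refer to the internal sites. The acceptor counts are taken from the rows with 7,749 and 23,447 borders, which the text treats as internal acceptors. Under these assumptions the sketch reproduces the reported donor statistic (31.2) and gives a value of similar magnitude for the acceptor sites. The names WITH_NAT, WITHOUT_NAT and compare are illustrative and are not part of the original analysis pipeline.</p>
            <p>
```python
import math

# (total borders, borders with any alternative splicing event), from Table 5.
# Assumption: the acceptor rows with 7,749 and 23,447 borders are the internal
# acceptors described in the text, and the 2,763 / 1,732 rows are terminal.
WITH_NAT = {
    "terminal donor": (2578, 553),
    "internal donor": (7632, 3100),
    "internal acceptor": (7749, 3145),
    "terminal acceptor": (2763, 688),
}
WITHOUT_NAT = {
    "terminal donor": (2200, 579),
    "internal donor": (23414, 8674),
    "internal acceptor": (23447, 8787),
    "terminal acceptor": (1732, 545),
}


def chi_square_2x2(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def p_value_df1(chi2: float) -> float:
    """Upper-tail probability of a chi-square variable with one degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def compare(category: str) -> tuple[float, float, float, float]:
    """Alternative-splicing rate with and without a NAT, chi-square and p-value."""
    total_nat, alt_nat = WITH_NAT[category]
    total_no, alt_no = WITHOUT_NAT[category]
    # Rows: with / without NAT; columns: alternative / constitutive borders.
    chi2 = chi_square_2x2(alt_nat, total_nat - alt_nat, alt_no, total_no - alt_no)
    return alt_nat / total_nat, alt_no / total_no, chi2, p_value_df1(chi2)


if __name__ == "__main__":
    for category in WITH_NAT:
        rate_nat, rate_no, chi2, p = compare(category)
        print(f"{category:18s} {100 * rate_nat:5.1f}% vs {100 * rate_no:5.1f}%"
              f"  chi2={chi2:5.1f}  p={p:.1e}")
```
            </p>
            <p>The p-value relies on the identity that, for one degree of freedom, the upper tail probability of a chi-square value x equals erfc(sqrt(x/2)), so no statistics library is needed.</p>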
            <p>Recently, Wiemann <it>et al</it>. <abbrgrp><abbr bid="B53">53</abbr></abbrgrp> reported a new variant of <it>IL4I1 </it>that contains the first two exons of an upstream gene, <it>NUP62</it>. This chimeric transcript was expressed in a tissue- and cell-specific manner, and the authors speculated that cell-type-specific alternative splicing was involved in its generation. We speculate that NATs could be involved in generating this type of chimeric transcript. An antisense message pairing with both sense messages would form a double-stranded RNA that could induce the spliceosome to skip the paired region and join the two sense messages, a process very similar to the one proposed for <it>trans</it>-splicing in mammals <abbrgrp><abbr bid="B54">54</abbr></abbrgrp>. Interestingly, our dataset contains five S-AS pairs whose genomic organization suggests such a process. Additional data file 9 illustrates one of these cases. Two transcripts, represented by cDNAs AK095876 and AK000438, join sequences from the <it>SERF2 </it>and <it>HYPK </it>genes, and the antisense transcript is represented by cDNA AK097682. Additional data file 10 lists all other putative cases of chimeric transcripts. The fact that both sense genes share a common antisense transcript raises the possibility that antisense transcripts can mediate <it>trans</it>-splicing of the sense genes, thereby generating the chimeric transcript.</p>
         </sec>
         <sec>
            <st>
               <p>On the evolution of S-AS pairs: functional implications</p>
            </st>
            <p>It is reasonable to assume that a fraction of all S-AS pairs reached this genomic organization solely by chance. However, evidence presented here and elsewhere suggests that this fraction is probably small <abbrgrp><abbr bid="B6">6</abbr><abbr bid="B55">55</abbr><abbr bid="B56">56</abbr></abbrgrp>. For example, Dahary <it>et al</it>. <abbrgrp><abbr bid="B6">6</abbr></abbrgrp> concluded that antisense transcription had a significant effect on vertebrate genome evolution, because the genomic organization of S-AS pairs is much more conserved than that of genes in general. How, then, did this organization arise? In principle, an S-AS genomic organization should carry a fitness cost. The evolution of each gene in the pair is constrained not only by features of its own sequence but also by functional features encoded by the other gene. The substantial number of S-AS pairs observed in mammalian genomes therefore suggests that this organization confers advantages that counterbalance these costs, and the proposed role of NATs in gene regulation would be one such advantage. We propose here two evolutionary scenarios, not mutually exclusive, that would speed up the generation of S-AS pairs. In the first scenario, alternative polyadenylation plays a fundamental role. Sun <it>et al</it>. <abbrgrp><abbr bid="B5">5</abbr></abbrgrp> observed that NATs preferentially target 3' UTRs. Our observation that 51% of 3'3' S-AS pairs overlap because of polyadenylation variants suggests that selection has favored cases in which the overlap occurs only in a temporally and spatially regulated manner.</p>
            <p>In the second scenario, retroposition generates NATs, which lack introns and may even carry a poly-A tail integrated into the genome. We observed here that retroposition contributed significantly to the origin of S-AS pairs, especially those classified as embedded. What would be the selective advantages of retrocopies as NATs? Chen <it>et al</it>. <abbrgrp><abbr bid="B56">56</abbr></abbrgrp> observed that antisense genes have shorter introns than genes in general. They speculated that this feature was advantageous during evolution because NATs need to be "rapid responders" to execute their regulatory activities. Transcription is a slow process in eukaryotes, and splicing is a further bottleneck in gene expression. Furthermore, Nott <it>et al</it>. <abbrgrp><abbr bid="B57">57</abbr></abbrgrp> observed that introns enhance gene expression by increasing mRNA accumulation. The data reported here and by Nott <it>et al</it>. <abbrgrp><abbr bid="B57">57</abbr></abbrgrp> therefore strengthen the argument of Chen <it>et al</it>. <abbrgrp><abbr bid="B56">56</abbr></abbrgrp>. Intronless antisense genes would be expressed even faster, because their transcripts would skip splicing altogether, and the respective messages would have shorter half-lives. Both are key features for genes involved in regulatory activities.</p>
            <p>An important issue is the conservation of S-AS pairs between human and mouse. Although we found more than a thousand conserved pairs, this number is still small compared with the whole set of S-AS pairs in both species. Several factors, however, suggest that the number reported here is an underestimate. First, as discussed by Engstrom <it>et al</it>. <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>, sequence conservation might not be of primary importance for antisense regulation. Second, many truly conserved pairs were probably not detected because some of their transcripts have not yet been discovered. This is particularly relevant given our finding that a significant proportion of 3'3' S-AS pairs depend on alternative polyadenylation to overlap. It is also likely that some S-AS pairs are lineage-specific. For instance, our finding that retroposition contributes to the origin of many S-AS pairs could explain the appearance of lineage-specific S-AS pairs, assuming that the retroposition event occurred after the divergence of human and mouse.</p>
            <p>These two evolutionary scenarios (alternative polyadenylation and retroposition) might produce S-AS pairs with different functional implications. The expression and evolutionary conservation analyses presented here, together with evidence from others <abbrgrp><abbr bid="B5">5</abbr><abbr bid="B19">19</abbr><abbr bid="B23">23</abbr><abbr bid="B47">47</abbr><abbr bid="B48">48</abbr></abbrgrp>, suggest that 3'3' overlap achieved through polyadenylation variants has been used throughout evolution to regulate gene expression. Pairs generated through retroposition may instead be involved in other types of regulation, such as alternative splicing.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Conclusion</p>
         </st>
         <p>This is the most comprehensive survey to date of S-AS pairs in the human and mouse genomes. We used all cDNAs available in the public domain together with 122 MPSS libraries for human and mouse. The major findings of the present report are as follows. As many as 10,077 and 8,091 S-AS pairs were identified for human and mouse, respectively. Using MPSS data, we found 4,308 and 216 new putative S-AS loci in human and mouse, respectively. A small fraction of all S-AS pairs are artifacts caused by genomic priming during cDNA library construction. A significant fraction of S-AS pairs originated from retroposition of one of the genes in the pair. Finally, quantitative analyses suggest that the presence of an antisense gene complementary to an exon-intron border of the sense gene increases the rate of retention of the respective intron. Furthermore, we propose an evolutionary model in which alternative polyadenylation and retroposition are important forces in the generation of S-AS pairs.</p>
         <p>Taken together, these results constitute the most extensive catalog of S-AS pairs in the human and mouse genomes to date.</p>
      </sec>
      <sec>
         <st>
            <p>Materials and methods</p>
         </st>
         <sec>
            <st>
               <p>Mapping cDNAs and MPSS tags onto the human and mouse genomes</p>
            </st>
            <p>We used a modified version of a previously described protocol to identify transcription clusters in the human and mouse genomes <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>. Briefly, genome sequences (NCBI build 35 for human and NCBI build 33 for mouse), EST collections (5,992,459 sequences for human and 4,246,824 for mouse) and mRNA sequences (186,358 for human and 120,058 for mouse) were downloaded from UCSC <abbrgrp><abbr bid="B58">58</abbr></abbrgrp>. All cDNAs were mapped to the respective genome sequence using BLAT (default parameters) <abbrgrp><abbr bid="B59">59</abbr></abbrgrp>. The best genomic hit for each cDNA was identified, and the cDNA was then aligned pairwise against that region using Sim4 <abbrgrp><abbr bid="B60">60</abbr></abbrgrp>. Only alignments with identity &#8805;94%, coverage &#8805;50% and all splice sites in the same orientation were used.</p>
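            <p>The final alignment filter can be written as a simple predicate. In the sketch below, the Sim4Hit record and its fields are hypothetical stand-ins for values parsed from Sim4 output; Sim4 does not provide such an object. Only the three thresholds are taken from the text, and an unspliced alignment is assumed to pass the splice-site check trivially.</p>
            <p>
```python
from dataclasses import dataclass


@dataclass
class Sim4Hit:
    """Hypothetical summary of one cDNA-to-genome alignment (not a Sim4 API)."""
    cdna_id: str
    identity: float             # percent identity over the aligned region
    coverage: float             # percent of the cDNA covered by the alignment
    intron_strands: list[str]   # strand ('+' or '-') implied by each splice site


def passes_filters(hit: Sim4Hit, min_identity: float = 94.0,
                   min_coverage: float = 50.0) -> bool:
    """Identity at least 94%, coverage at least 50%, one splice-site orientation."""
    one_orientation = len(set(hit.intron_strands)) in (0, 1)  # unspliced hits pass
    return (hit.identity >= min_identity
            and hit.coverage >= min_coverage
            and one_orientation)


hits = [
    Sim4Hit("cdna_a", 98.5, 92.0, ["+", "+", "+"]),
    Sim4Hit("cdna_b", 93.0, 99.0, ["+"]),        # identity below 94%
    Sim4Hit("cdna_c", 99.0, 45.0, []),           # coverage below 50%
    Sim4Hit("cdna_d", 97.0, 80.0, ["+", "-"]),   # conflicting splice sites
]
kept = [hit.cdna_id for hit in hits if passes_filters(hit)]
print(kept)  # ['cdna_a']
```
            </p>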
            <p>The orientation of ESTs was determined from the presence of a poly-A tail (a stretch of eight As at the 3' end) and/or canonical splice donor (GT) and acceptor (AG) sites. All mRNAs were considered to be in the 'sense' orientation (oriented from 5' end to 3' end). All mapped and reliably oriented cDNAs were assembled into clusters. Each cluster contains cDNAs that have the same orientation and share at least one exon-intron boundary or, for sequences without a common exon-intron organization, at least 30 nucleotides of overlap.</p>
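            <p>A minimal sketch of the orientation and clustering rules follows. It assumes that each cDNA alignment has already been reduced to half-open genomic exon spans and that intron boundary dinucleotides are read on the plus strand of the genome. The names orient_est, MappedCdna and cluster are hypothetical. Union-find is used here only as a convenient way to merge transitively linked cDNAs; it is not necessarily how the original clustering was implemented.</p>
            <p>
```python
from dataclasses import dataclass
from typing import Optional


def orient_est(seq: str, aligned_strand: str,
               intron_motifs: list[tuple[str, str]], run: int = 8) -> Optional[str]:
    """Infer the transcribed strand of an EST (hypothetical helper).

    aligned_strand: genome strand matched by the EST as submitted.
    intron_motifs: (first two, last two) bases of each intron, read on '+'.
    Returns None when there is no evidence or the evidence conflicts.
    """
    votes = set()
    if seq.upper().endswith("A" * run):   # poly-A tail: EST is in sense orientation
        votes.add(aligned_strand)
    for donor, acceptor in intron_motifs:
        if (donor, acceptor) == ("GT", "AG"):
            votes.add("+")                 # canonical GT-AG intron on '+'
        elif (donor, acceptor) == ("CT", "AC"):
            votes.add("-")                 # GT-AG read on the '-' strand
    return votes.pop() if len(votes) == 1 else None


@dataclass
class MappedCdna:
    """Hypothetical oriented cDNA with sorted, half-open genomic exon spans."""
    name: str
    strand: str
    exons: list[tuple[int, int]]

    def junctions(self) -> set[int]:
        # Exon ends and starts that border an intron (transcript ends excluded).
        return {e for _, e in self.exons[:-1]} | {s for s, _ in self.exons[1:]}


def exonic_overlap(a: MappedCdna, b: MappedCdna) -> int:
    return sum(max(0, min(e1, e2) - max(s1, s2))
               for s1, e1 in a.exons for s2, e2 in b.exons)


def same_cluster(a: MappedCdna, b: MappedCdna, min_overlap: int = 30) -> bool:
    if a.strand != b.strand:
        return False
    if not a.junctions().isdisjoint(b.junctions()):
        return True                        # share an exon-intron boundary
    return exonic_overlap(a, b) >= min_overlap


def cluster(cdnas: list[MappedCdna]) -> list[list[str]]:
    parent = list(range(len(cdnas)))       # union-find forest

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(cdnas)):
        for j in range(i + 1, len(cdnas)):
            if same_cluster(cdnas[i], cdnas[j]):
                parent[find(i)] = find(j)
    groups: dict[int, list[str]] = {}
    for i, cdna in enumerate(cdnas):
        groups.setdefault(find(i), []).append(cdna.name)
    return sorted(groups.values())
```
            </p>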
            <p>To map MPSS data, we first extracted 'virtual' tags from the human and mouse genomes. This was done by finding all <it>Dpn</it>II sites and extracting the 13-nucleotide (human) or 16-nucleotide (mouse) sequence immediately downstream of each restriction site, in both orientations. Virtual tags occurring only once in the respective genome were retained and matched against the 'real' tags found in 41 human and 81 mouse MPSS libraries. Only MPSS tags classified as 'reliable' (present in more than one sequencing run) and 'significant' (tags per million >3) were considered trusted signatures.</p>
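            <p>Virtual tag extraction can be sketched as follows. The functions below rest on three assumptions. Tags are taken as the bases immediately downstream of each GATC site on either strand, excluding the site itself. Library entries are assumed to be keyed by the same tag sequence. The 'reliable' and 'significant' filters are applied as described in the text. All function names are hypothetical.</p>
            <p>
```python
import re
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def virtual_tags(genome: str, tag_len: int = 13) -> list[tuple[str, int, str]]:
    """Return (tag, site position, strand) for every DpnII site (GATC).

    The tag is the tag_len bases immediately downstream of the site on each
    strand (13 for human, 16 for mouse according to the text). GATC is
    palindromic, so the same site is read on both strands.
    """
    genome = genome.upper()
    tags = []
    for match in re.finditer("GATC", genome):
        pos = match.start()
        plus = genome[pos + 4:pos + 4 + tag_len]
        if len(plus) == tag_len:
            tags.append((plus, pos, "+"))
        if pos >= tag_len:   # downstream on '-' lies to the left on '+'
            tags.append((revcomp(genome[pos - tag_len:pos]), pos, "-"))
    return tags


def unique_tags(tags: list[tuple[str, int, str]]) -> dict[str, tuple[int, str]]:
    """Keep only tags that occur exactly once in the genome."""
    copies = Counter(tag for tag, _, _ in tags)
    return {tag: (pos, strand) for tag, pos, strand in tags if copies[tag] == 1}


def trusted_signatures(library: dict[str, tuple[float, int]]) -> set[str]:
    """library maps tag -> (tags per million, number of sequencing runs)."""
    return {tag for tag, (tpm, runs) in library.items() if tpm > 3 and runs > 1}


def map_library(genome_tags: dict[str, tuple[int, str]],
                library: dict[str, tuple[float, int]]) -> dict[str, tuple[int, str]]:
    """Genomic location of every trusted library tag with a unique virtual match."""
    return {tag: genome_tags[tag] for tag in trusted_signatures(library)
            if tag in genome_tags}
```
            </p>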
         </sec>
         <sec>
            <st>
               <p>Identification of S-AS pairs</p>
            </st>
            <p>S-AS pairs were identified as cases in which two clusters in opposite orientations overlap at the genome level. To orient all mapped cDNAs, we considered three criteria: the sequence annotation in the respective GenBank entry; splice junctions; and poly-A tails and poly-T heads. We excluded from our analyses all cDNAs with conflicting orientations according to these three criteria. When exactly two clusters overlapped in opposite orientations, they were classified as a single bidirectional S-AS pair. When a cluster overlapped more than one antisense cluster, the resulting pairs were classified as multiple bidirectional S-AS pairs. S-AS pairs were also classified according to their genomic pattern. The parameters evaluated included: the pattern of S-AS overlap (exonic, intronic and exonic/intronic); the spanning of introns by the components of a pair, as defined by their alignment to the genome; and chromosomal localization and relative orientation within the S-AS pair (tail-tail, head-head and embedded).</p>
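            <p>The pairing and classification rules can be illustrated with a short sketch that works on cluster spans only. Cluster, classify_pairs and the half-open coordinates are assumptions made for illustration. The all-against-all scan keeps the logic obvious but scales quadratically, so a genome-wide implementation would sort clusters by position first. The exonic/intronic overlap pattern is not modeled here.</p>
            <p>
```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    """Hypothetical transcription cluster with a half-open genomic span."""
    name: str
    chrom: str
    strand: str   # '+' or '-'
    start: int
    end: int


def overlaps(a: Cluster, b: Cluster) -> bool:
    return a.chrom == b.chrom and min(a.end, b.end) > max(a.start, b.start)


def organization(a: Cluster, b: Cluster) -> str:
    """Relative organization of two overlapping clusters on opposite strands."""
    a_in_b = a.start >= b.start and b.end >= a.end
    b_in_a = b.start >= a.start and a.end >= b.end
    if a_in_b or b_in_a:
        return "embedded"
    plus, minus = (a, b) if a.strand == "+" else (b, a)
    # The 3' end of a '+' cluster is its right end; of a '-' cluster, its left end.
    return "tail-tail" if minus.end > plus.end else "head-head"


def classify_pairs(clusters: list[Cluster]) -> list[tuple[str, str, str, str]]:
    """Return (cluster, cluster, single/multiple, organization) for each S-AS pair."""
    pairs = [(a, b) for i, a in enumerate(clusters) for b in clusters[i + 1:]
             if a.strand != b.strand and overlaps(a, b)]
    partners = defaultdict(set)
    for a, b in pairs:
        partners[a.name].add(b.name)
        partners[b.name].add(a.name)
    result = []
    for a, b in pairs:
        multiple = len(partners[a.name]) > 1 or len(partners[b.name]) > 1
        kind = "multiple" if multiple else "single"
        result.append((a.name, b.name, kind, organization(a, b)))
    return result


if __name__ == "__main__":
    example = [
        Cluster("g1", "chr1", "+", 100, 500),
        Cluster("g2", "chr1", "-", 400, 900),    # 3' ends overlap g1
        Cluster("g3", "chr1", "-", 50, 150),     # 5' ends overlap g1
        Cluster("g4", "chr2", "+", 1000, 5000),
        Cluster("g5", "chr2", "-", 2000, 2500),  # inside g4
    ]
    for row in classify_pairs(example):
        print(row)
```
            </p>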
         </sec>
         <sec>
            <st>
               <p>Conservation between human and mouse S-AS pairs</p>
            </st>
            <p>We used three strategies to evaluate the degree of conservation between human and mouse S-AS pairs. First, all pairs were searched against the HomoloGene dataset <abbrgrp><abbr bid="B35">35</abbr></abbrgrp>, and pairs conserved in both species were counted. Second, we selected S-AS pairs in which at least one gene was conserved according to HomoloGene. We then used Needle, a global alignment program <abbrgrp><abbr bid="B61">61</abbr></abbrgrp>, to test sequence conservation between the respective antisense genes; pairs whose antisense genes aligned with identity >30% were classified as conserved. Finally, we applied the strategy of Engstrom <it>et al</it>. <abbrgrp><abbr bid="B25">25</abbr></abbrgrp>. We used the net alignment between the human and mouse genomes (retrieved from the UCSC Genome Browser database) to define the corresponding (syntenic) regions. A human S-AS pair was considered conserved in mouse if one of its exonic regions aligned over more than 20 bp to an exonic region of a mouse pair.</p>
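            <p>For the second strategy, the identity criterion can be illustrated with a plain Needleman-Wunsch global alignment. This is a simplified stand-in, not Needle itself. EMBOSS Needle uses affine gap penalties (separate gap-opening and gap-extension costs), whereas the sketch below uses a linear gap penalty with arbitrary scores, so its identities can differ from Needle's. Identity is computed as identical columns divided by alignment length, gaps included. The function names are hypothetical.</p>
            <p>
```python
def global_identity(a: str, b: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2) -> float:
    """Percent identity of a Needleman-Wunsch global alignment of a and b.

    Simplified stand-in for EMBOSS Needle: linear gap penalty, arbitrary scores.
    Identity = identical columns / alignment length (gap columns included).
    """
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back one optimal path from the bottom-right cell.
    i, j, identical, length = n, m, 0, 0
    while i > 0 or j > 0:
        diagonal = False
        if i > 0 and j > 0:
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diagonal = score[i][j] == score[i - 1][j - 1] + pair
        if diagonal:
            identical += int(a[i - 1] == b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            i -= 1              # gap in b
        else:
            j -= 1              # gap in a
        length += 1
    return 100.0 * identical / length if length else 0.0


def is_conserved(human_seq: str, mouse_seq: str, threshold: float = 30.0) -> bool:
    """Apply the identity >30% criterion from the text."""
    return global_identity(human_seq.upper(), mouse_seq.upper()) > threshold
```
            </p>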
         </sec>
         <sec>
            <st>
               <p>Investigation of the expression pattern of S-AS transcripts</p>
            </st>
            <p>We evaluated the expression pattern of S-AS pairs at the whole-genome level using expression profiles obtained from MPSS libraries (available at <abbrgrp><abbr bid="B36">36</abbr></abbrgrp>). We previously described this procedure for SAGE and MPSS <abbrgrp><abbr bid="B27">27</abbr><abbr bid="B31">31</abbr><abbr bid="B49">49</abbr></abbrgrp>. Tags were assigned to genes by extracting a virtual tag from each mRNA sequence: the 13-nucleotide sequence immediately downstream of its 3'-most <it>Dpn</it>II restriction site. To accurately represent the 3' end of a transcript, only mRNA sequences containing a poly-A tail were used. All tags mapped to two or more different genes were excluded, and the frequencies of different tags for the same gene (mainly from alternative polyadenylation variants) were summed. MPSS tag counts were normalized to counts per million, and the expression data were linked to genomic positions by extracting virtual tags from the human and mouse genomes. Only tags showing 100% identity with a genomic locus were used in the analyses.</p>
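            <p>The tag-to-gene assignment and normalization steps can be sketched as below. The input layout (an accession mapped to a gene identifier and a sequence) is an assumption. So is the poly-A definition of at least eight terminal As, borrowed from the EST orientation step. The function names are hypothetical.</p>
            <p>
```python
from collections import defaultdict
from typing import Optional


def mrna_tag(seq: str, tag_len: int = 13, min_poly_a: int = 8) -> Optional[str]:
    """Virtual tag: the tag_len bases after the 3'-most GATC (DpnII) site.

    Only mRNAs ending in a poly-A tail are used; min_poly_a = 8 is an assumed
    definition borrowed from the EST orientation step.
    """
    seq = seq.upper()
    if not seq.endswith("A" * min_poly_a):
        return None
    pos = seq.rfind("GATC")
    tag = seq[pos + 4:pos + 4 + tag_len] if pos != -1 else ""
    return tag if len(tag) == tag_len else None


def build_tag_map(mrnas: dict[str, tuple[str, str]]) -> dict[str, str]:
    """mrnas maps accession -> (gene id, sequence); returns tag -> gene.

    Tags assigned to two or more genes are discarded.
    """
    genes_per_tag = defaultdict(set)
    for gene, seq in mrnas.values():
        tag = mrna_tag(seq)
        if tag is not None:
            genes_per_tag[tag].add(gene)
    return {tag: next(iter(genes)) for tag, genes in genes_per_tag.items()
            if len(genes) == 1}


def gene_tpm(library_counts: dict[str, int], tag_map: dict[str, str]) -> dict[str, float]:
    """Sum the tags of each gene and normalize to counts per million."""
    total = sum(library_counts.values())
    expression = defaultdict(float)
    for tag, count in library_counts.items():
        gene = tag_map.get(tag)
        if gene is not None:
            expression[gene] += count * 1e6 / total
    return dict(expression)
```
            </p>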
            <p>Expression patterns of S-AS pairs were classified using tags with &#8805;3 tags per million across all MPSS libraries. A pair was classified as co-expressed when both genes were expressed in at least four of the same libraries. If both genes were co-expressed in fewer than four libraries, or were expressed independently in different libraries, the pair was classified as 'single-gene expression'. The remaining S-AS pairs were classified as 'no-expression'.</p>
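            <p>The classification rule can be expressed as a small function. It is a sketch under one stated assumption: a pair in which only one of the two genes is ever detected at 3 or more tags per million is counted as 'single-gene expression', a case the text does not spell out explicitly. The name classify_pair is hypothetical.</p>
            <p>
```python
def classify_pair(expr_a: dict[str, float], expr_b: dict[str, float],
                  min_tpm: float = 3.0, min_shared: int = 4) -> str:
    """Classify an S-AS pair from per-library expression (tags per million).

    expr_a / expr_b map library name -> tags per million for each gene.
    Assumption: a pair in which only one gene is ever detected is counted as
    'single-gene expression'.
    """
    libs_a = {lib for lib, tpm in expr_a.items() if tpm >= min_tpm}
    libs_b = {lib for lib, tpm in expr_b.items() if tpm >= min_tpm}
    if len(libs_a.intersection(libs_b)) >= min_shared:
        return "co-expression"
    if libs_a or libs_b:
        return "single-gene expression"
    return "no-expression"
```
            </p>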
         </sec>
         <sec>
            <st>
               <p>Identification of antisense MPSS tags</p>
            </st>
            <p>All <it>Dpn</it>II sites in the human and mouse genomes were identified, and for each site two 'virtual' MPSS tags were extracted, one from each DNA strand, in the correct orientation. All virtual MPSS tags mapping to the opposite strand of known mRNAs in both genomes were identified, excluding mRNAs belonging to a previously identified S-AS pair. Antisense MPSS tags that mapped to a single location in the respective genome and were present in at least one MPSS library were selected for experimental validation.</p>
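            <p>The selection of candidate antisense tags combines three filters. A tag must lie on the strand opposite to a known mRNA that is not already in an S-AS pair, occur only once in the genome and be detected in at least one library. Below is a minimal sketch with hypothetical record types (VirtualTag, Mrna). It assumes that a tag lies within an mRNA when its DpnII site falls inside the mRNA's genomic span.</p>
            <p>
```python
from collections import Counter
from typing import NamedTuple


class VirtualTag(NamedTuple):
    """Hypothetical genomic virtual tag."""
    seq: str
    chrom: str
    pos: int
    strand: str


class Mrna(NamedTuple):
    """Hypothetical mapped mRNA with a half-open genomic span."""
    acc: str
    chrom: str
    start: int
    end: int
    strand: str


def antisense_tags(tags: list[VirtualTag], mrnas: list[Mrna],
                   sas_mrnas: set[str], observed: set[str]) -> list[tuple[str, str]]:
    """Return (tag, mRNA accession) for candidate antisense tags.

    A tag qualifies if it occurs once in the genome, was seen in at least one
    MPSS library (observed), and lies on the strand opposite to an mRNA that
    is not already part of a known S-AS pair.
    """
    copies = Counter(t.seq for t in tags)
    hits = []
    for t in tags:
        if copies[t.seq] != 1 or t.seq not in observed:
            continue
        for m in mrnas:
            if (m.acc not in sas_mrnas and m.chrom == t.chrom
                    and m.strand != t.strand and m.end > t.pos >= m.start):
                hits.append((t.seq, m.acc))
    return hits
```
            </p>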
         </sec>
         <sec>
            <st>
               <p>Simulations on the genomic organization of S-AS pairs</p>
            </st>
            <p>A random distribution of S-AS pairs was obtained by re-indexing the coordinates of one gene in every pair 1,000 times. In each round, a genomic coordinate was randomly selected as the new start of the gene, and all its remaining exon-intron borders were re-indexed relative to this initial coordinate. The relative organization of both genes in all random S-AS pairs was stored, and the frequency of each category was calculated. These frequencies were used as the expected values in chi-square tests of the null hypothesis.</p>
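            <p>The simulation can be sketched as follows, under assumptions that the text leaves open. The new start of the shuffled gene is drawn uniformly within a fixed window around its partner's start. The gene keeps its length, so its internal exon-intron borders would shift by the same offset. Random placements that no longer overlap are discarded before computing frequencies. The window size and function names are hypothetical.</p>
            <p>
```python
import random
from collections import Counter

Span = tuple[int, int]   # half-open genomic interval


def organization(plus: Span, minus: Span) -> str:
    """Organization of a '+' gene and a '-' gene given their spans."""
    (ps, pe), (ms, me) = plus, minus
    if max(ps, ms) >= min(pe, me):
        return "no overlap"
    if (ps >= ms and me >= pe) or (ms >= ps and pe >= me):
        return "embedded"
    return "tail-tail" if me > pe else "head-head"


def expected_frequencies(pairs: list[tuple[Span, Span]], window: int,
                         rounds: int = 1000, seed: int = 0) -> dict[str, float]:
    """Shuffle the '-' gene of every pair `rounds` times and tally organizations.

    Hypothetical choices: the new start is drawn uniformly within +/- window of
    the partner's start, the gene keeps its length, and placements that no
    longer overlap are discarded.
    """
    rng = random.Random(seed)
    counts: Counter = Counter()
    for _ in range(rounds):
        for plus, (ms, me) in pairs:
            start = rng.randint(plus[0] - window, plus[0] + window)
            kind = organization(plus, (start, start + me - ms))
            if kind != "no overlap":
                counts[kind] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}


def chi_square(observed: dict[str, int], expected: dict[str, float]) -> float:
    """Pearson chi-square of observed category counts against expected frequencies."""
    n = sum(observed.values())
    return sum((observed.get(k, 0) - n * f) ** 2 / (n * f)
               for k, f in expected.items() if f > 0)
```
            </p>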
         </sec>
         <sec>
            <st>
               <p>Identification of splicing variants</p>
            </st>
            <p>Using the database mentioned earlier and described elsewhere <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>, we identified all exon-intron borders complementary to a NAT. We then compared the rate of alternative splicing at these borders with that at borders of the same genes without a NAT. Alternative borders were identified using a set of stringent criteria detailed elsewhere <abbrgrp><abbr bid="B26">26</abbr><abbr bid="B27">27</abbr><abbr bid="B28">28</abbr></abbrgrp>.</p>
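            <p>Partitioning borders into those with and without a complementary NAT can be sketched as an interval test. Here a border is assumed to be spanned by a NAT when one of the NAT's exons covers at least one base on each side of the exon-intron junction (the flank parameter). This threshold and the function names are hypothetical.</p>
            <p>
```python
def spans_junction(junction: int, exon: tuple[int, int], flank: int = 1) -> bool:
    """True if a half-open NAT exon covers `flank` bases on each side of the junction.

    `junction` is the coordinate of the first base after the exon-intron boundary.
    """
    start, end = exon
    return junction - flank >= start and end >= junction + flank


def split_borders(borders: list[tuple[str, int]], nat_exons: list[tuple[int, int]],
                  flank: int = 1) -> tuple[list[str], list[str]]:
    """Partition (label, junction) borders into those spanned by a NAT exon and the rest."""
    with_nat, without_nat = [], []
    for label, junction in borders:
        covered = any(spans_junction(junction, exon, flank) for exon in nat_exons)
        (with_nat if covered else without_nat).append(label)
    return with_nat, without_nat
```
            </p>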
         </sec>
         <sec>
            <st>
               <p>Experimental validation of MPSS antisense tags</p>
            </st>
            <p>MPSS tags corresponding to antisense transcripts were converted into their corresponding 3' cDNA fragments using GLGI-MPSS <abbrgrp><abbr bid="B37">37</abbr></abbrgrp>. Antisense tags were selected from an MPSS library derived from the normal breast luminal epithelial cell line HB4a, and the same RNA source was used for GLGI amplification. For GLGI-MPSS amplification, we used a sense primer consisting of 6 additional bases (CAGGGA) followed by 17 bases of the MPSS tag sequence, giving a total of 23 bases for each primer (5'-CAGGGAGATCXXXXXXXXXXXXX-3'). The antisense primer (ACTATCTAGAGCGGCCGCTT) corresponds to a sequence present at the 3' end of all cDNA molecules, incorporated from the reverse transcription primer during cDNA synthesis. The reaction mixture was prepared in a final volume of 30 &#956;l, including 1&#215; PCR buffer, 2.0 mM MgCl<sub>2</sub>, 83 &#956;M dNTPs, 2.3 ng/&#956;l antisense primer, 2.3 ng/&#956;l sense primer, 1.5 U of Taq Platinum DNA polymerase (Invitrogen, San Diego, CA, USA) and 0.5-0.8 &#956;l of the same cDNA source used for MPSS library construction. PCR conditions were 94&#176;C for 2 minutes, followed by 30 cycles of 94&#176;C for 30 s, 64&#176;C for 30 s and 72&#176;C for 35 s, with a final incubation at 72&#176;C for 5 minutes after the last cycle. The amplified products were ethanol precipitated and cloned into the pGEM<sup>&#174;</sup>-T Easy vector (Promega, Madison, WI, USA). Twelve colonies for each GLGI-MPSS fragment were screened by PCR using pGEM universal primers, and positive colonies were sequenced using Big-Dye Terminator (Applied Biosystems, Foster City, CA, USA) and an ABI3100 sequencer (Applied Biosystems).</p>
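            <p>The sense primer layout given above can be generated programmatically. It consists of 6 adapter bases, the 4-base DpnII site and 13 tag bases. The minimal sketch below assumes that the input is the 13 bases that follow GATC in the MPSS signature; glgi_sense_primer is a hypothetical helper name.</p>
            <p>
```python
ADAPTER = "CAGGGA"          # 6 extra bases given in the text
DPNII_SITE = "GATC"         # first 4 of the 17 MPSS signature bases
ANTISENSE_PRIMER = "ACTATCTAGAGCGGCCGCTT"   # common 3' primer from the text


def glgi_sense_primer(tag: str) -> str:
    """Build the 23-nt GLGI-MPSS sense primer 5'-CAGGGA GATC NNNNNNNNNNNNN-3'.

    `tag` is the 13 bases that follow GATC in the MPSS signature.
    """
    tag = tag.upper()
    if len(tag) != 13 or not set(tag).issubset("ACGT"):
        raise ValueError("expected a 13-nt tag made of A, C, G and T")
    return ADAPTER + DPNII_SITE + tag
```
            </p>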
         </sec>
         <sec>
            <st>
               <p>Experimental validation of genomic primed sequences</p>
            </st>
            <p>Total RNA derived from fetal liver, colon and lung was purchased from Clontech Laboratories (Palo Alto, CA, USA). For cDNA synthesis, 2 &#956;g of total RNA was either treated or not treated with 100 units of <it>DNase </it>I (FPLC-pure, Amersham, Piscataway, NJ, USA) and reverse transcribed using oligo(dT)12-18, random primers and <it>SuperScript </it>II (Invitrogen), following the manufacturers' instructions. After synthesis, the resulting cDNA was treated with <it>RNase </it>H. Each preparation was checked for genomic DNA contamination. DNA-free total RNA was subjected to PCR amplification using primers within the intronic sequences flanking exon 12 of the hMLH-1 gene (forward, 5'-TGGTGTCTCTAGTTCTGG-3'; reverse, 5'-CATTGTTGTAGTAGCTCTGC-3'). All PCR amplifications were carried out using 2 &#956;l of cDNA as template in a final volume of 25 &#956;l containing 1&#215; buffer, 1.5 mM MgCl<sub>2</sub>, 0.2 mM dNTP, 0.2 &#956;M of each specific primer and 0.025 U/&#956;l of Taq DNA polymerase (Life Technologies, San Diego, CA, USA). The cycling protocol was an initial denaturation at 94&#176;C for 4 minutes; 35 cycles of 94&#176;C for 30 s, 55&#176;C for 45 s and 72&#176;C for 1 minute; and a final extension at 72&#176;C for 7 minutes. All PCR products were resolved on 8% polyacrylamide gels and sequenced as described above to verify amplification specificity.</p>
         </sec>
         <sec>
            <st>
               <p>Strand-specific RT-PCR</p>
            </st>
            <p>In strand-specific RT-PCR, the orientation of the transcript is assessed by restricting which gene-specific primer is present during first-strand cDNA synthesis. For each candidate, 1 &#956;g of total RNA was treated with Promega RQ1 RNase-free DNase and tested for residual DNA contamination as described above. First-strand cDNA synthesis was carried out at 50&#176;C for 2 h using 200 U of <it>SuperScript </it>II (Invitrogen) and 0.9 &#956;M of a primer complementary to the antisense transcript. PCR amplifications were performed using 1 &#956;l of the first-strand cDNA as template in a final volume of 25 &#956;l containing 1&#215; buffer, 1.5 mM MgCl<sub>2</sub>, 0.1 mM dNTP, 0.4 &#956;M of gene-specific primers and 1 U of Platinum Taq DNA polymerase (Invitrogen). The cycling conditions were an initial denaturation at 95&#176;C for 2 minutes; 35 cycles of 94&#176;C for 40 s, a reaction-specific annealing temperature for 40 s and 72&#176;C for 1 minute; and a final extension at 72&#176;C for 7 minutes. All PCR products were resolved on 8% polyacrylamide gels. Controls for self-priming during cDNA synthesis contained reverse transcriptase but no primer, and controls for DNA contamination contained primers but no reverse transcriptase.</p>
         </sec>
         <sec>
            <st>
               <p>Availability</p>
            </st>
            <p>To make our dataset fully accessible to the community, we have set up a web portal <abbrgrp><abbr bid="B62">62</abbr></abbrgrp> containing all raw data generated in this study and a series of tools for exploring the data.</p>
         </sec>
      </sec>
      <sec>
         <st>
            <p>Additional data files</p>
         </st>
         <p>The following additional data are available with the online version of this paper. Additional data file <supplr sid="S1">1</supplr> is a list of representative GenBank entries for all S-AS pairs in both human and mouse. Additional data file <supplr sid="S2">2</supplr> is a table showing the total number of S-AS pairs by chromosome for both human and mouse. Additional data file <supplr sid="S3">3</supplr> shows the number of clusters and S-AS pairs when a less stringent clustering methodology is applied. Additional data file <supplr sid="S4">4</supplr> shows a schematic view of all possible genomic organizations of S-AS pairs. Additional data file <supplr sid="S5">5</supplr> lists all S-AS pairs conserved between human and mouse using the three strategies described in the text. Additional data file <supplr sid="S6">6</supplr> shows the fraction of S-AS pairs conserved between human and mouse that are classified as 'Fully intronic' and the fraction of conserved S-AS pairs that contain at least one intronless gene. Additional data file <supplr sid="S7">7</supplr> is a list of all MPSS libraries used in this study. Additional data file <supplr sid="S8">8</supplr> presents the number of cDNA-based pairs that were further confirmed by the MPSS data. Additional data file <supplr sid="S9">9</supplr> is a figure illustrating chimeric transcripts joining two adjacent genes (<it>SERF2 </it>and <it>HYPK</it>) with a NAT located between them. Additional data file <supplr sid="S10">10</supplr> lists all cases of chimeric transcripts identified in our dataset.</p>
         <suppl id="S1">
            <title>
               <p>Additional data file 1</p>
            </title>
            <caption>
               <p>Representative GenBank entries for all S-AS pairs in both human and mouse</p>
            </caption>
            <text>
               <p>Representative GenBank entries for all S-AS pairs in both human and mouse.</p>
            </text>
            <file name="gb-2007-8-3-r40-S1.xls">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S2">
            <title>
               <p>Additional data file 2</p>
            </title>
            <caption>
               <p>Total number of S-AS pairs by chromosome for both human and mouse</p>
            </caption>
            <text>
               <p>Total number of S-AS pairs by chromosome for both human and mouse.</p>
            </text>
            <file name="gb-2007-8-3-r40-S2.doc">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S3">
            <title>
               <p>Additional data file 3</p>
            </title>
            <caption>
               <p>Number of clusters and S-AS pairs when a less stringent clustering methodology is applied</p>
            </caption>
            <text>
               <p>Number of clusters and S-AS pairs when a less stringent clustering methodology is applied.</p>
            </text>
            <file name="gb-2007-8-3-r40-S3.rtf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S4">
            <title>
               <p>Additional data file 4</p>
            </title>
            <caption>
               <p>All possible genomic organizations of S-AS pairs</p>
            </caption>
            <text>
               <p>All possible genomic organizations of S-AS pairs.</p>
            </text>
            <file name="gb-2007-8-3-r40-S4.tiff">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S5">
            <title>
               <p>Additional data file 5</p>
            </title>
            <caption>
               <p>All S-AS pairs conserved between human and mouse using the three strategies described in the text</p>
            </caption>
            <text>
               <p>All S-AS pairs conserved between human and mouse using the three strategies described in the text.</p>
            </text>
            <file name="gb-2007-8-3-r40-S5.zip">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S6">
            <title>
               <p>Additional data file 6</p>
            </title>
            <caption>
               <p>The fraction of S-AS pairs conserved between human and mouse that are classified as 'Fully intronic' and the fraction of conserved S-AS pairs that contain at least one intronless gene</p>
            </caption>
            <text>
               <p>The fraction of S-AS pairs conserved between human and mouse that are classified as 'Fully intronic' and the fraction of conserved S-AS pairs that contain at least one intronless gene.</p>
            </text>
            <file name="gb-2007-8-3-r40-S6.doc">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S7">
            <title>
               <p>Additional data file 7</p>
            </title>
            <caption>
               <p>All MPSS libraries used in this study</p>
            </caption>
            <text>
               <p>All MPSS libraries used in this study.</p>
            </text>
            <file name="gb-2007-8-3-r40-S7.pdf">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S8">
            <title>
               <p>Additional data file 8</p>
            </title>
            <caption>
               <p>The number of cDNA-based pairs that were further confirmed by the MPSS data</p>
            </caption>
            <text>
               <p>The number of cDNA-based pairs that were further confirmed by the MPSS data.</p>
            </text>
            <file name="gb-2007-8-3-r40-S8.doc">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S9">
            <title>
               <p>Additional data file 9</p>
            </title>
            <caption>
               <p>Chimeric transcripts joining two adjacent genes (<it>SERF2 </it>and <it>HYPK</it>) with a NAT located between them</p>
            </caption>
            <text>
               <p>Chimeric transcripts joining two adjacent genes (<it>SERF2 </it>and <it>HYPK</it>) with a NAT located between them.</p>
            </text>
            <file name="gb-2007-8-3-r40-S9.tiff">
               <p>Click here for file</p>
            </file>
         </suppl>
         <suppl id="S10">
            <title>
               <p>Additional data file 10</p>
            </title>
            <caption>
               <p>All cases of chimeric transcripts identified in our dataset</p>
            </caption>
            <text>
               <p>All cases of chimeric transcripts identified in our dataset.</p>
            </text>
            <file name="gb-2007-8-3-r40-S10.doc">
               <p>Click here for file</p>
            </file>
         </suppl>
      </sec>
   </bdy>
   <bm>
      <ack>
         <sec>
            <st>
               <p>Acknowledgements</p>
            </st>
            <p>We would like to thank Artur Ramos de Souza for the design and maintenance of the web portal. We also thank Henrik Kaessmann for making available the data on human retrocopies. We are also indebted to Andrew Simpson for a critical review of the manuscript and to three anonymous reviewers for their critical and constructive comments and suggestions.</p>
         </sec>
      </ack>
      <refgrp>
         <bibl id="B1">
            <title>
               <p>In search of antisense.</p>
            </title>
            <aug>
               <au>
                  <snm>Lavorgna</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Dahary</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Lehner</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Sorek</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Sanderson</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Casari</snm>
                  <fnm>G</fnm>
               </au>
            </aug>
            <source>Trends Biochem Sci</source>
            <pubdate>2004</pubdate>
            <volume>29</volume>
            <fpage>88</fpage>
            <lpage>94</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tibs.2003.12.002</pubid>
                  <pubid idtype="pmpid" link="fulltext">15102435</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B2">
            <title>
               <p>Antisense RNA: function and fate of duplex RNA in cells of higher eukaryotes.</p>
            </title>
            <aug>
               <au>
                  <snm>Kumar</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Carmichael</snm>
                  <fnm>GG</fnm>
               </au>
            </aug>
            <source>Microbiol Mol Biol Rev</source>
            <pubdate>1998</pubdate>
            <volume>62</volume>
            <fpage>1415</fpage>
            <lpage>1434</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">98951</pubid>
                  <pubid idtype="pmpid" link="fulltext">9841677</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B3">
            <title>
               <p>Do natural antisense transcripts make sense in eukaryotes?</p>
            </title>
            <aug>
               <au>
                  <snm>Vanhee-Brossollet</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Vaquero</snm>
                  <fnm>C</fnm>
               </au>
            </aug>
            <source>Gene</source>
            <pubdate>1998</pubdate>
            <volume>211</volume>
            <fpage>1</fpage>
            <lpage>9</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0378-1119(98)00093-6</pubid>
                  <pubid idtype="pmpid" link="fulltext">9573333</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B4">
            <title>
               <p>Genome-wide analysis of coordinate expression and evolution of human cis-encoded sense-antisense transcripts.</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Carmichael</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Rowley</snm>
                  <fnm>JD</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>326</fpage>
            <lpage>329</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2005.04.006</pubid>
                  <pubid idtype="pmpid" link="fulltext">15922830</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B5">
            <title>
               <p>Evidence for a preferential targeting of 3'-UTRs by cis-encoded natural antisense transcripts.</p>
            </title>
            <aug>
               <au>
                  <snm>Sun</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Carmichael</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2005</pubdate>
            <volume>33</volume>
            <fpage>5533</fpage>
            <lpage>5543</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1243798</pubid>
                  <pubid idtype="pmpid" link="fulltext">16204454</pubid>
                  <pubid idtype="doi">10.1093/nar/gki852</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B6">
            <title>
               <p>Naturally occurring antisense: transcriptional leakage or real overlap?</p>
            </title>
            <aug>
               <au>
                  <snm>Dahary</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Elroy-Stein</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Sorek</snm>
                  <fnm>R</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>364</fpage>
            <lpage>368</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">551562</pubid>
                  <pubid idtype="pmpid" link="fulltext">15710751</pubid>
                  <pubid idtype="doi">10.1101/gr.3308405</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B7">
            <title>
               <p>Genome-wide <it>in silico </it>identification and analysis of cis natural antisense transcripts (cis-NATs) in ten species.</p>
            </title>
            <aug>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>XS</fnm>
               </au>
               <au>
                  <snm>Liu</snm>
                  <fnm>QR</fnm>
               </au>
               <au>
                  <snm>Wei</snm>
                  <fnm>L</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2006</pubdate>
            <volume>34</volume>
            <fpage>3465</fpage>
            <lpage>3475</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1524920</pubid>
                  <pubid idtype="pmpid" link="fulltext">16849434</pubid>
                  <pubid idtype="doi">10.1093/nar/gkl473</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B8">
            <title>
               <p>Antisense transcription in the mammalian transcriptome.</p>
            </title>
            <aug>
               <au>
                  <snm>Katayama</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tomaru</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Kasukawa</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kaki</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Nakanishi</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nakamura</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Nishida</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Yap</snm>
                  <fnm>CC</fnm>
               </au>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kawai</snm>
                  <fnm>J</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>309</volume>
            <fpage>1564</fpage>
            <lpage>1566</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1112009</pubid>
                  <pubid idtype="pmpid" link="fulltext">16141073</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B9">
            <title>
               <p>Expression of alternatively spliced FGF-2 antisense RNA transcript in the central nervous system: regulation of FGF-2 mRNA translation.</p>
            </title>
            <aug>
               <au>
                  <snm>Li</snm>
                  <fnm>AW</fnm>
               </au>
               <au>
                  <snm>Murphy</snm>
                  <fnm>PR</fnm>
               </au>
            </aug>
            <source>Mol Cell Endocrinol</source>
            <pubdate>2000</pubdate>
            <volume>162</volume>
            <fpage>69</fpage>
            <lpage>78</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0303-7207(00)00209-4</pubid>
                  <pubid idtype="pmpid" link="fulltext">10854699</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B10">
            <title>
               <p>Post-transcriptional regulation of thyroid hormone receptor expression by cis-acting sequences and a naturally-occurring antisense RNA.</p>
            </title>
            <aug>
               <au>
                  <snm>Hastings</snm>
                  <fnm>ML</fnm>
               </au>
               <au>
                  <snm>Ingle</snm>
                  <fnm>HA</fnm>
               </au>
               <au>
                  <snm>Lazar</snm>
                  <fnm>MA</fnm>
               </au>
               <au>
                  <snm>Munroe</snm>
                  <fnm>SH</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2000</pubdate>
            <volume>275</volume>
            <fpage>11507</fpage>
            <lpage>11513</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.275.15.11507</pubid>
                  <pubid idtype="pmpid" link="fulltext">10753970</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B11">
            <title>
               <p>Antisense-RNA regulation and RNA interference.</p>
            </title>
            <aug>
               <au>
                  <snm>Brantl</snm>
                  <fnm>S</fnm>
               </au>
            </aug>
            <source>Biochim Biophys Acta</source>
            <pubdate>2002</pubdate>
            <volume>1575</volume>
            <fpage>15</fpage>
            <lpage>25</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">12020814</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B12">
            <title>
               <p>Antisense RNA in imprinting: spreading silence through Air.</p>
            </title>
            <aug>
               <au>
                  <snm>Rougeulle</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Heard</snm>
                  <fnm>E</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>434</fpage>
            <lpage>437</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(02)02749-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">12175797</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B13">
            <title>
               <p>Transcriptional collision between convergent genes in budding yeast.</p>
            </title>
            <aug>
               <au>
                  <snm>Prescott</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Proudfoot</snm>
                  <fnm>NJ</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <fpage>8796</fpage>
            <lpage>8801</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">124378</pubid>
                  <pubid idtype="pmpid" link="fulltext">12077310</pubid>
                  <pubid idtype="doi">10.1073/pnas.132270899</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B14">
            <title>
               <p>Antisense regulation in X inactivation and autosomal imprinting.</p>
            </title>
            <aug>
               <au>
                  <snm>Ogawa</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Lee</snm>
                  <fnm>JT</fnm>
               </au>
            </aug>
            <source>Cytogenet Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <fpage>59</fpage>
            <lpage>65</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1159/000071575</pubid>
                  <pubid idtype="pmpid" link="fulltext">12900546</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B15">
            <title>
               <p>Antisense RNA inhibits splicing of pre-mRNA <it>in vitro</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Munroe</snm>
                  <fnm>SH</fnm>
               </au>
            </aug>
            <source>EMBO J</source>
            <pubdate>1988</pubdate>
            <volume>7</volume>
            <fpage>2523</fpage>
            <lpage>2532</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">457123</pubid>
                  <pubid idtype="pmpid">2461296</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B16">
            <title>
               <p>RNA editing and regulation of <it>Drosophila </it>4f-rnp expression by sas-10 antisense readthrough mRNA transcripts.</p>
            </title>
            <aug>
               <au>
                  <snm>Peters</snm>
                  <fnm>NT</fnm>
               </au>
               <au>
                  <snm>Rohrbach</snm>
                  <fnm>JA</fnm>
               </au>
               <au>
                  <snm>Zalewski</snm>
                  <fnm>BA</fnm>
               </au>
               <au>
                  <snm>Byrkett</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Vaughn</snm>
                  <fnm>JC</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>2003</pubdate>
            <volume>9</volume>
            <fpage>698</fpage>
            <lpage>710</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1370437</pubid>
                  <pubid idtype="pmpid" link="fulltext">12756328</pubid>
                  <pubid idtype="doi">10.1261/rna.2120703</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B17">
            <title>
               <p>Antisense transcripts in the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Lehner</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Williams</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Campbell</snm>
                  <fnm>RD</fnm>
               </au>
               <au>
                  <snm>Sanderson</snm>
                  <fnm>CM</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2002</pubdate>
            <volume>18</volume>
            <fpage>63</fpage>
            <lpage>65</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(02)02598-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">11818131</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B18">
            <title>
               <p>Antisense transcripts with FANTOM2 clone set and their implications for gene regulation.</p>
            </title>
            <aug>
               <au>
                  <snm>Kiyosawa</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Yamanaka</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Osato</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Kondo</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <cnm>RIKEN GER Group, GSL Members</cnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2003</pubdate>
            <volume>13</volume>
            <fpage>1324</fpage>
            <lpage>1334</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">403655</pubid>
                  <pubid idtype="pmpid" link="fulltext">12819130</pubid>
                  <pubid idtype="doi">10.1101/gr.982903</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B19">
            <title>
               <p>Widespread occurrence of antisense transcription in the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Yelin</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Dahary</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Sorek</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Levanon</snm>
                  <fnm>EY</fnm>
               </au>
               <au>
                  <snm>Goldstein</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Shoshan</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Diber</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Biton</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Tamir</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Khosravi</snm>
                  <fnm>R</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2003</pubdate>
            <volume>21</volume>
            <fpage>379</fpage>
            <lpage>386</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nbt808</pubid>
                  <pubid idtype="pmpid" link="fulltext">12640466</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B20">
            <title>
               <p>Overlapping antisense transcription in the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Fahey</snm>
                  <fnm>ME</fnm>
               </au>
               <au>
                  <snm>Moore</snm>
                  <fnm>TF</fnm>
               </au>
               <au>
                  <snm>Higgins</snm>
                  <fnm>DG</fnm>
               </au>
            </aug>
            <source>Comp Funct Genomics</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>244</fpage>
            <lpage>253</lpage>
            <xrefbib>
               <pubid idtype="doi">10.1002/cfg.173</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B21">
            <title>
               <p>Computational discovery of sense-antisense transcription in the human and mouse genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Shendure</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Church</snm>
                  <fnm>GM</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2002</pubdate>
            <volume>3</volume>
            <fpage>R44</fpage>
            <xrefbib>
               <pubid idtype="doi">10.1186/gb-2002-3-9-research0044</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B22">
            <title>
               <p>Over 20% of human transcripts might form sense-antisense pairs.</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
               <au>
                  <snm>Huang</snm>
                  <fnm>X</fnm>
               </au>
               <au>
                  <snm>Xie</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>W</fnm>
               </au>
               <au>
                  <snm>Zhou</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Shi</snm>
                  <fnm>RZ</fnm>
               </au>
               <au>
                  <snm>Rowley</snm>
                  <fnm>JD</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>4812</fpage>
            <lpage>4820</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">519112</pubid>
                  <pubid idtype="pmpid" link="fulltext">15356298</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh818</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B23">
            <title>
               <p>Mining SAGE data allows large-scale, sensitive screening of antisense transcript expression.</p>
            </title>
            <aug>
               <au>
                  <snm>Quere</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Manchon</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Lejeune</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Clement</snm>
                  <fnm>O</fnm>
               </au>
               <au>
                  <snm>Pierrat</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Bonafoux</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Commes</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Piquemal</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Marti</snm>
                  <fnm>J</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>e163</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">534641</pubid>
                  <pubid idtype="pmpid" link="fulltext">15561998</pubid>
                  <pubid idtype="doi">10.1093/nar/gnh161</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B24">
            <title>
               <p>LongSAGE analysis revealed the presence of a large number of novel antisense genes in the mouse genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Wahl</snm>
                  <fnm>MB</fnm>
               </au>
               <au>
                  <snm>Heinzmann</snm>
                  <fnm>U</fnm>
               </au>
               <au>
                  <snm>Imai</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Bioinformatics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>1389</fpage>
            <lpage>1392</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/bioinformatics/bti205</pubid>
                  <pubid idtype="pmpid" link="fulltext">15585522</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B25">
            <title>
               <p>Complex loci in human and mouse genomes.</p>
            </title>
            <aug>
               <au>
                  <snm>Engstrom</snm>
                  <fnm>PG</fnm>
               </au>
               <au>
                  <snm>Suzuki</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Ninomiya</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Akalin</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Sessa</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Lavorgna</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Brozzi</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Luzi</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Tan</snm>
                  <fnm>SL</fnm>
               </au>
               <au>
                  <snm>Yang</snm>
                  <fnm>L</fnm>
               </au>
               <etal/>
            </aug>
            <source>PLoS Genet</source>
            <pubdate>2006</pubdate>
            <volume>2</volume>
            <fpage>e47</fpage>
            <lpage/>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1371/journal.pgen.0020047</pubid>
                  <pubid idtype="pmpid" link="fulltext">16683030</pubid>
                  <pubid idtype="pmcid">1449890</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B26">
            <title>
               <p>ORESTES are enriched in rare exon usage variants affecting the encoded proteins.</p>
            </title>
            <aug>
               <au>
                  <snm>Sakabe</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>de Souza</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Galante</snm>
                  <fnm>PAF</fnm>
               </au>
               <au>
                  <snm>de Oliveira</snm>
                  <fnm>PS</fnm>
               </au>
               <au>
                  <snm>Passetti</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Brentani</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Osorio</snm>
                  <fnm>EC</fnm>
               </au>
               <au>
                  <snm>Zaiats</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Leerkes</snm>
                  <fnm>MR</fnm>
               </au>
               <au>
                  <snm>Kitajima</snm>
                  <fnm>JP</fnm>
               </au>
               <etal/>
            </aug>
            <source>C R Biol</source>
            <pubdate>2003</pubdate>
            <volume>326</volume>
            <fpage>979</fpage>
            <lpage>985</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.crvi.2003.09.027</pubid>
                  <pubid idtype="pmpid">14744104</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B27">
            <title>
               <p>Detection and evaluation of intron retention in the human transcriptome.</p>
            </title>
            <aug>
               <au>
                  <snm>Galante</snm>
                  <fnm>PAF</fnm>
               </au>
               <au>
                  <snm>Sakabe</snm>
                  <fnm>NJ</fnm>
               </au>
               <au>
                  <snm>Kirschbaum-Slager</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>de Souza</snm>
                  <fnm>SJ</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>2004</pubdate>
            <volume>10</volume>
            <fpage>757</fpage>
            <lpage>765</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1370565</pubid>
                  <pubid idtype="pmpid" link="fulltext">15100430</pubid>
                  <pubid idtype="doi">10.1261/rna.5123504</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B28">
            <title>
               <p>Identification of human exons over-expressed in tumors through the use of genome and expressed sequence data.</p>
            </title>
            <aug>
               <au>
                  <snm>Kirschbaum-Slager</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Parmiggiani</snm>
                  <fnm>RB</fnm>
               </au>
               <au>
                  <snm>Camargo</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>de Souza</snm>
                  <fnm>SJ</fnm>
               </au>
            </aug>
            <source>Physiol Genomics</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>423</fpage>
            <lpage>432</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1152/physiolgenomics.00237.2004</pubid>
                  <pubid idtype="pmpid" link="fulltext">15784694</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B29">
            <title>
               <p>Serial analysis of gene expression.</p>
            </title>
            <aug>
               <au>
                  <snm>Velculescu</snm>
                  <fnm>VE</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Vogelstein</snm>
                  <fnm>B</fnm>
               </au>
               <au>
                  <snm>Kinzler</snm>
                  <fnm>KW</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>1995</pubdate>
            <volume>270</volume>
            <fpage>484</fpage>
            <lpage>487</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.270.5235.484</pubid>
                  <pubid idtype="pmpid" link="fulltext">7570003</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B30">
            <title>
               <p>Gene expression analysis by massively parallel signature sequencing (MPSS) on microbead arrays.</p>
            </title>
            <aug>
               <au>
                  <snm>Brenner</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Bridgham</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Golda</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Lloyd</snm>
                  <fnm>DH</fnm>
               </au>
               <au>
                  <snm>Johnson</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Luo</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>McCurdy</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Foy</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Ewan</snm>
                  <fnm>M</fnm>
               </au>
               <etal/>
            </aug>
            <source>Nat Biotechnol</source>
            <pubdate>2000</pubdate>
            <volume>18</volume>
            <fpage>630</fpage>
            <lpage>634</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/76469</pubid>
                  <pubid idtype="pmpid" link="fulltext">10835600</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B31">
            <title>
               <p>An anatomy of normal and malignant gene expression.</p>
            </title>
            <aug>
               <au>
                  <snm>Boon</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Osorio</snm>
                  <fnm>EC</fnm>
               </au>
               <au>
                  <snm>Greenhut</snm>
                  <fnm>SF</fnm>
               </au>
               <au>
                  <snm>Schaefer</snm>
                  <fnm>CF</fnm>
               </au>
               <au>
                  <snm>Shoemaker</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Polyak</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Morin</snm>
                  <fnm>PJ</fnm>
               </au>
               <au>
                  <snm>Buetow</snm>
                  <fnm>KH</fnm>
               </au>
               <au>
                  <snm>Strausberg</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>de Souza</snm>
                  <fnm>SJ</fnm>
               </au>
               <etal/>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2002</pubdate>
            <volume>99</volume>
            <fpage>11287</fpage>
            <lpage>11292</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">123249</pubid>
                  <pubid idtype="pmpid" link="fulltext">12119410</pubid>
                  <pubid idtype="doi">10.1073/pnas.152324199</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B32">
            <title>
               <p>Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution.</p>
            </title>
            <aug>
               <au>
                  <snm>Cheng</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Kapranov</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Drenkow</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Dike</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Brubaker</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Patel</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Stern</snm>
                  <fnm>D</fnm>
               </au>
               <au>
                  <snm>Tammana</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Helt</snm>
                  <fnm>G</fnm>
               </au>
               <etal/>
            </aug>
            <source>Science</source>
            <pubdate>2005</pubdate>
            <volume>308</volume>
            <fpage>1149</fpage>
            <lpage>1154</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1108625</pubid>
                  <pubid idtype="pmpid" link="fulltext">15790807</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B33">
            <title>
               <p>Anti-sense intronic non-coding RNA levels correlate to the degree of tumor differentiation in prostate cancer.</p>
            </title>
            <aug>
               <au>
                  <snm>Reis</snm>
                  <fnm>EM</fnm>
               </au>
               <au>
                  <snm>Nakaya</snm>
                  <fnm>HI</fnm>
               </au>
               <au>
                  <snm>Louro</snm>
                  <fnm>R</fnm>
               </au>
               <au>
                  <snm>Canavez</snm>
                  <fnm>FC</fnm>
               </au>
               <au>
                  <snm>Flatschart</snm>
                  <fnm>AV</fnm>
               </au>
               <au>
                  <snm>Almeida</snm>
                  <fnm>GT</fnm>
               </au>
               <au>
                  <snm>Egidio</snm>
                  <fnm>CM</fnm>
               </au>
               <au>
                  <snm>Paquola</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Machado</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Festa</snm>
                  <fnm>F</fnm>
               </au>
               <etal/>
            </aug>
            <source>Oncogene</source>
            <pubdate>2004</pubdate>
            <volume>23</volume>
            <fpage>6684</fpage>
            <lpage>6692</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/sj.onc.1207880</pubid>
                  <pubid idtype="pmpid" link="fulltext">15221013</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B34">
            <title>
               <p>Disclosing hidden transcripts: mouse natural sense-antisense transcripts tend to be poly(A) negative and nuclear localized.</p>
            </title>
            <aug>
               <au>
                  <snm>Kiyosawa</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Mise</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Iwase</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Hayashizaki</snm>
                  <fnm>Y</fnm>
               </au>
               <au>
                  <snm>Abe</snm>
                  <fnm>K</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>463</fpage>
            <lpage>474</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1074361</pubid>
                  <pubid idtype="pmpid" link="fulltext">15781571</pubid>
                  <pubid idtype="doi">10.1101/gr.3155905</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B35">
            <title>
               <p>HomoloGene</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/HomoloGene/</url>
         </bibl>
         <bibl id="B36">
            <title>
               <p>LICR MPSS Repository</p>
            </title>
            <url>http://mpss.licr.org/</url>
         </bibl>
         <bibl id="B37">
            <title>
               <p>NCBI: Mouse Transcriptome Project</p>
            </title>
            <url>http://www.ncbi.nlm.nih.gov/genome/guide/mouse/MouseTranscriptome.html</url>
         </bibl>
         <bibl id="B38">
            <title>
               <p>Generation of longer 3' cDNA fragments from massively parallel signature sequencing tags.</p>
            </title>
            <aug>
               <au>
                  <snm>Silva</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>Chen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Carraro</snm>
                  <fnm>DM</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>SM</fnm>
               </au>
               <au>
                  <snm>Camargo</snm>
                  <fnm>AA</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>e94</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">484194</pubid>
                  <pubid idtype="pmpid" link="fulltext">15247327</pubid>
                  <pubid idtype="doi">10.1093/nar/gnh095</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B39">
            <title>
               <p>Examples of the complex architecture of the human transcriptome revealed by RACE and high-density tiling arrays.</p>
            </title>
            <aug>
               <au>
                  <snm>Kapranov</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Drenkow</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Cheng</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Helt</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Dike</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Gingeras</snm>
                  <fnm>TR</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2005</pubdate>
            <volume>15</volume>
            <fpage>987</fpage>
            <lpage>997</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1172043</pubid>
                  <pubid idtype="pmpid" link="fulltext">15998911</pubid>
                  <pubid idtype="doi">10.1101/gr.3455305</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B40">
            <title>
               <p>Long-range heterogeneity at the 3' ends of human mRNAs.</p>
            </title>
            <aug>
               <au>
                  <snm>Iseli</snm>
                  <fnm>C</fnm>
               </au>
               <au>
                  <snm>Stevenson</snm>
                  <fnm>BJ</fnm>
               </au>
               <au>
                  <snm>de Souza</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Samaia</snm>
                  <fnm>HB</fnm>
               </au>
               <au>
                  <snm>Camargo</snm>
                  <fnm>AA</fnm>
               </au>
               <au>
                  <snm>Buetow</snm>
                  <fnm>KH</fnm>
               </au>
               <au>
                  <snm>Strausberg</snm>
                  <fnm>RL</fnm>
               </au>
               <au>
                  <snm>Simpson</snm>
                  <fnm>AJ</fnm>
               </au>
               <au>
                  <snm>Bucher</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Jongeneel</snm>
                  <fnm>CV</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>1068</fpage>
            <lpage>1074</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">186619</pubid>
                  <pubid idtype="pmpid" link="fulltext">12097343</pubid>
                  <pubid idtype="doi">10.1101/gr.62002</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B41">
            <title>
               <p>The origin of new genes: glimpses from the young and old.</p>
            </title>
            <aug>
               <au>
                  <snm>Long</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Betran</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Thornton</snm>
                  <fnm>K</fnm>
               </au>
               <au>
                  <snm>Wang</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2003</pubdate>
            <volume>4</volume>
            <fpage>865</fpage>
            <lpage>875</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg1204</pubid>
                  <pubid idtype="pmpid" link="fulltext">14634634</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B42">
            <title>
               <p>Pseudogenes in yeast?</p>
            </title>
            <aug>
               <au>
                  <snm>Fink</snm>
                  <fnm>GR</fnm>
               </au>
            </aug>
            <source>Cell</source>
            <pubdate>1987</pubdate>
            <volume>49</volume>
            <fpage>5</fpage>
            <lpage>6</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/0092-8674(87)90746-X</pubid>
                  <pubid idtype="pmpid" link="fulltext">3549000</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B43">
            <title>
               <p>Emergence of young human genes after a burst of retroposition in primates.</p>
            </title>
            <aug>
               <au>
                  <snm>Marques</snm>
                  <fnm>AC</fnm>
               </au>
               <au>
                  <snm>Dupanloup</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Vinckenbosch</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Reymond</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Kaessmann</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>PLoS Biol</source>
            <pubdate>2005</pubdate>
            <volume>3</volume>
            <fpage>e357</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1251493</pubid>
                  <pubid idtype="pmpid" link="fulltext">16201836</pubid>
                  <pubid idtype="doi">10.1371/journal.pbio.0030357</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B44">
            <title>
               <p>Birth and adaptive evolution of a hominoid gene that supports high neurotransmitter flux.</p>
            </title>
            <aug>
               <au>
                  <snm>Burki</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Kaessmann</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Nat Genet</source>
            <pubdate>2004</pubdate>
            <volume>36</volume>
            <fpage>1061</fpage>
            <lpage>1063</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/ng1431</pubid>
                  <pubid idtype="pmpid" link="fulltext">15378063</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B45">
            <title>
               <p>Extensive gene traffic on the mammalian X chromosome.</p>
            </title>
            <aug>
               <au>
                  <snm>Emerson</snm>
                  <fnm>JJ</fnm>
               </au>
               <au>
                  <snm>Kaessmann</snm>
                  <fnm>H</fnm>
               </au>
               <au>
                  <snm>Betran</snm>
                  <fnm>E</fnm>
               </au>
               <au>
                  <snm>Long</snm>
                  <fnm>M</fnm>
               </au>
            </aug>
            <source>Science</source>
            <pubdate>2004</pubdate>
            <volume>303</volume>
            <fpage>537</fpage>
            <lpage>540</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1126/science.1090042</pubid>
                  <pubid idtype="pmpid" link="fulltext">14739461</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B46">
            <title>
               <p>Evolutionary fate of retroposed gene copies in the human genome.</p>
            </title>
            <aug>
               <au>
                  <snm>Vinckenbosch</snm>
                  <fnm>N</fnm>
               </au>
               <au>
                  <snm>Dupanloup</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Kaessmann</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>Proc Natl Acad Sci USA</source>
            <pubdate>2006</pubdate>
            <volume>103</volume>
            <fpage>3220</fpage>
            <lpage>3225</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1413932</pubid>
                  <pubid idtype="pmpid" link="fulltext">16492757</pubid>
                  <pubid idtype="doi">10.1073/pnas.0511307103</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B47">
            <title>
               <p>Natural antisense transcripts with coding capacity in <it>Arabidopsis</it> may have a regulatory role that is not linked to double-stranded RNA degradation.</p>
            </title>
            <aug>
               <au>
                  <snm>Jen</snm>
                  <fnm>C-H</fnm>
               </au>
               <au>
                  <snm>Michalopoulos</snm>
                  <fnm>I</fnm>
               </au>
               <au>
                  <snm>Westhead</snm>
                  <fnm>DR</fnm>
               </au>
               <au>
                  <snm>Meyer</snm>
                  <fnm>P</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>R51</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1175971</pubid>
                  <pubid idtype="pmpid" link="fulltext">15960803</pubid>
                  <pubid idtype="doi">10.1186/gb-2005-6-6-r51</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B48">
            <title>
               <p>Genome-wide prediction and identification of cis-natural antisense transcripts in <it>Arabidopsis thaliana</it>.</p>
            </title>
            <aug>
               <au>
                  <snm>Wang</snm>
                  <fnm>X-J</fnm>
               </au>
               <au>
                  <snm>Gaasterland</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Chua</snm>
                  <fnm>N-H</fnm>
               </au>
            </aug>
            <source>Genome Biol</source>
            <pubdate>2005</pubdate>
            <volume>6</volume>
            <fpage>R30</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1088958</pubid>
                  <pubid idtype="pmpid" link="fulltext">15833117</pubid>
                  <pubid idtype="doi">10.1186/gb-2005-6-4-r30</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B49">
            <title>
               <p>The impact of SNPs on the interpretation of SAGE and MPSS experiments.</p>
            </title>
            <aug>
               <au>
                  <snm>Silva</snm>
                  <fnm>AP</fnm>
               </au>
               <au>
                  <snm>de Souza</snm>
                  <fnm>JE</fnm>
               </au>
               <au>
                  <snm>Galante</snm>
                  <fnm>PA</fnm>
               </au>
               <au>
                  <snm>Riggins</snm>
                  <fnm>GJ</fnm>
               </au>
               <au>
                  <snm>de Souza</snm>
                  <fnm>SJ</fnm>
               </au>
               <au>
                  <snm>Camargo</snm>
                  <fnm>AA</fnm>
               </au>
            </aug>
            <source>Nucleic Acids Res</source>
            <pubdate>2004</pubdate>
            <volume>32</volume>
            <fpage>6104</fpage>
            <lpage>6110</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">534621</pubid>
                  <pubid idtype="pmpid" link="fulltext">15562001</pubid>
                  <pubid idtype="doi">10.1093/nar/gkh937</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B50">
            <title>
               <p>Genomic variants in exons and introns: identifying the splicing spoilers.</p>
            </title>
            <aug>
               <au>
                  <snm>Pagani</snm>
                  <fnm>F</fnm>
               </au>
               <au>
                  <snm>Baralle</snm>
                  <fnm>FE</fnm>
               </au>
            </aug>
            <source>Nat Rev Genet</source>
            <pubdate>2004</pubdate>
            <volume>5</volume>
            <fpage>389</fpage>
            <lpage>396</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1038/nrg1327</pubid>
                  <pubid idtype="pmpid">15168696</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B51">
            <title>
               <p>Inhibition of c-erbA mRNA splicing by a naturally occurring antisense RNA.</p>
            </title>
            <aug>
               <au>
                  <snm>Munroe</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Lazar</snm>
                  <fnm>MA</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>1991</pubdate>
            <volume>266</volume>
            <fpage>22083</fpage>
            <lpage>22086</lpage>
            <xrefbib>
               <pubid idtype="pmpid" link="fulltext">1657988</pubid>
            </xrefbib>
         </bibl>
         <bibl id="B52">
            <title>
               <p>Identification and characterization of a novel gene SAF transcribed from the opposite strand of FAS.</p>
            </title>
            <aug>
               <au>
                  <snm>Yan</snm>
                  <fnm>M-D</fnm>
               </au>
               <au>
                  <snm>Hong</snm>
                  <fnm>C-C</fnm>
               </au>
               <au>
                  <snm>Lai</snm>
                  <fnm>G-M</fnm>
               </au>
               <au>
                  <snm>Cheng</snm>
                  <fnm>A-L</fnm>
               </au>
               <au>
                  <snm>Lin</snm>
                  <fnm>Y-W</fnm>
               </au>
               <au>
                  <snm>Chuang</snm>
                  <fnm>SE</fnm>
               </au>
            </aug>
            <source>Hum Mol Genet</source>
            <pubdate>2005</pubdate>
            <volume>14</volume>
            <fpage>1465</fpage>
            <lpage>1474</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1093/hmg/ddi156</pubid>
                  <pubid idtype="pmpid" link="fulltext">15829500</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B53">
            <title>
               <p>Alternative pre-mRNA processing regulates cell-type specific expression of the IL4l1 and NUP62 genes.</p>
            </title>
            <aug>
               <au>
                  <snm>Wiemann</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Kolb-Kokocinski</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Poustka</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>BMC Biol</source>
            <pubdate>2005</pubdate>
            <volume>3</volume>
            <fpage>16</fpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1198218</pubid>
                  <pubid idtype="pmpid" link="fulltext">16029492</pubid>
                  <pubid idtype="doi">10.1186/1741-7007-3-16</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B54">
            <title>
               <p>Heterogeneous Sp1 mRNAs in human HepG2 cells include a product of homotypic <it>trans</it>-splicing.</p>
            </title>
            <aug>
               <au>
                  <snm>Takahara</snm>
                  <fnm>T</fnm>
               </au>
               <au>
                  <snm>Kanazu</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Yanagisawa</snm>
                  <fnm>S</fnm>
               </au>
               <au>
                  <snm>Akanuma</snm>
                  <fnm>H</fnm>
               </au>
            </aug>
            <source>J Biol Chem</source>
            <pubdate>2000</pubdate>
            <volume>275</volume>
            <fpage>38067</fpage>
            <lpage>38072</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1074/jbc.M002010200</pubid>
                  <pubid idtype="pmpid" link="fulltext">10973950</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B55">
            <title>
               <p>Genome-wide analysis of coordinate expression and evolution of human cis-encoded sense-antisense transcripts.</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Carmichael</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Rowley</snm>
                  <fnm>JD</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>326</fpage>
            <lpage>329</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2005.04.006</pubid>
                  <pubid idtype="pmpid" link="fulltext">15922830</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B56">
            <title>
               <p>Human antisense genes have unusually short introns: evidence for selection for rapid transcription.</p>
            </title>
            <aug>
               <au>
                  <snm>Chen</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Sun</snm>
                  <fnm>M</fnm>
               </au>
               <au>
                  <snm>Hurst</snm>
                  <fnm>LD</fnm>
               </au>
               <au>
                  <snm>Carmichael</snm>
                  <fnm>GG</fnm>
               </au>
               <au>
                  <snm>Rowley</snm>
                  <fnm>JD</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2005</pubdate>
            <volume>21</volume>
            <fpage>203</fpage>
            <lpage>207</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/j.tig.2005.02.003</pubid>
                  <pubid idtype="pmpid" link="fulltext">15797613</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B57">
            <title>
               <p>A quantitative analysis of intron effects on mammalian gene expression.</p>
            </title>
            <aug>
               <au>
                  <snm>Nott</snm>
                  <fnm>A</fnm>
               </au>
               <au>
                  <snm>Meislin</snm>
                  <fnm>SH</fnm>
               </au>
               <au>
                  <snm>Moore</snm>
                  <fnm>MJ</fnm>
               </au>
            </aug>
            <source>RNA</source>
            <pubdate>2003</pubdate>
            <volume>9</volume>
            <fpage>607</fpage>
            <lpage>617</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">1370426</pubid>
                  <pubid idtype="pmpid" link="fulltext">12702819</pubid>
                  <pubid idtype="doi">10.1261/rna.5250403</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B58">
            <title>
               <p>UCSC Genome Browser: Download Page</p>
            </title>
            <url>http://hgdownload.cse.ucsc.edu/</url>
         </bibl>
         <bibl id="B59">
            <title>
               <p>BLAT - the BLAST-like alignment tool.</p>
            </title>
            <aug>
               <au>
                  <snm>Kent</snm>
                  <fnm>WJ</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>2002</pubdate>
            <volume>12</volume>
            <fpage>656</fpage>
            <lpage>664</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1101/gr.229202</pubid>
                  <pubid idtype="pmpid" link="fulltext">11932250</pubid>
                  <pubid idtype="pmcid">187518</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B60">
            <title>
               <p>A computer program for aligning a cDNA sequence with a genomic DNA sequence.</p>
            </title>
            <aug>
               <au>
                  <snm>Florea</snm>
                  <fnm>L</fnm>
               </au>
               <au>
                  <snm>Hartzell</snm>
                  <fnm>G</fnm>
               </au>
               <au>
                  <snm>Zhang</snm>
                  <fnm>Z</fnm>
               </au>
               <au>
                  <snm>Rubin</snm>
                  <fnm>GM</fnm>
               </au>
               <au>
                  <snm>Miller</snm>
                  <fnm>W</fnm>
               </au>
            </aug>
            <source>Genome Res</source>
            <pubdate>1998</pubdate>
            <volume>8</volume>
            <fpage>967</fpage>
            <lpage>974</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="pmcid">310774</pubid>
                  <pubid idtype="pmpid" link="fulltext">9750195</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B61">
            <title>
               <p>EMBOSS: The European Molecular Biology open software suite.</p>
            </title>
            <aug>
               <au>
                  <snm>Rice</snm>
                  <fnm>P</fnm>
               </au>
               <au>
                  <snm>Longden</snm>
                  <fnm>J</fnm>
               </au>
               <au>
                  <snm>Bleasby</snm>
                  <fnm>A</fnm>
               </au>
            </aug>
            <source>Trends Genet</source>
            <pubdate>2000</pubdate>
            <volume>16</volume>
            <fpage>276</fpage>
            <lpage>277</lpage>
            <xrefbib>
               <pubidlist>
                  <pubid idtype="doi">10.1016/S0168-9525(00)02024-2</pubid>
                  <pubid idtype="pmpid" link="fulltext">10827456</pubid>
               </pubidlist>
            </xrefbib>
         </bibl>
         <bibl id="B62">
            <title>
               <p>LICR Sense/Antisense Portal</p>
            </title>
            <url>http://www.compbio.ludwig.org.br/sense-antisense</url>
         </bibl>
      </refgrp>
   </bm>
</art>
