Human subtelomeric duplicon structure and organization
1 The Wistar Institute, Spruce St, Philadelphia, PA 19104, USA
2 Department of Molecular Biology, Princeton University, Princeton, NJ 08544, USA
Genome Biology 2007, 8:R151 doi:10.1186/gb-2007-8-7-r151
Subject areas: Molecular biology, Genome studies
Additional files
Additional data file 1: The p-arm sequence as given was attached at the p-arm coordinate, and the reverse complements of the q-arm sequences were attached at the indicated q-arm coordinates. Format: PDF Size: 12KB
Additional data file 2: Duplicon modules were defined by processing the results of BLAST searches of in-house curated subtelomere query sequences (see text and Materials and methods). Colinear, properly oriented pairs of BLAST matches to the query sequence were joined into a chain if they were separated by no more than 25 kb and were not interrupted by other hits from the same query sequence. Groups of chained BLAST hits spanning ≥1 kb of the subject sequence were defined as duplicons. This method tolerated insertions and deletions <25 kb in size (for example, of retrotransposons) but not rearrangements. Format: PDF Size: 61KB
Additional data file 3: Each module is defined by a set of pairwise alignments, and each reference sequence in these sets is represented as a single row in this table. The first column (module) contains an identifier for the particular copy of the module (duplicon) indicated in the next three columns. These columns (query sequence) list the subtelomeric location of the query sequence defining the module (see Materials and methods). The 'aligned sequences' column shows the locations of the other duplicons in this module that were matched by the query. The coordinates in this column refer either to our published subtelomeric assemblies (designated by chromosome and arm, p or q) or to human genome build 35 (all other designations). %ID_each is the percent nucleotide sequence identity across the chained pairwise alignment, excluding masked sequence. %ID_avg is the average percent identity of all pairwise alignments in the module; this is the value used for %ID in the charts and analyses in this paper. The final column shows 1 if the module contains intrachromosomal non-subtelomeric sequence matches and 0 if it does not. Format: PDF Size: 772KB
Additional data file 4: This table shows the number of duplicon modules defined per subtelomere. The complete list of these modules is included in Additional data file 3. The 'subtelomeric' column shows the total number of modules for each subtelomere region (since each module is defined by a set of subtelomeric coordinates). The 'non-subtelomeric' column lists the subset of these modules with homology to duplicated regions that lie outside the subtelomeres. A comparison of these non-subtelomeric duplicons with the subtelomeric copies is included in Figure 3 and in Additional data file 5. The 'intra-chromosomal' column indicates the subset of modules with homology to a different region on the same chromosome. Format: PDF Size: 27KB
Additional data file 5: Subtelomeric regions correspond to the set of query sequences enumerated in Additional data file 1, together with the average percent identity across the sequences to which each is aligned. The non-subtelomeric regions correspond to the aligned sequences that fall outside the subtelomere regions (the subset listed in Additional data file 2). Format: PDF Size: 42KB
Additional data file 6: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows.
Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 846KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 7: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 543KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 8: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. 
The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 531KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 9: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 602KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 10: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. 
The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 474KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 11: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 758KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 12: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. 
The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 552KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 13: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). 
Format: PDF Size: 630KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 14: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 564KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 15: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. 
Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 793KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 16: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 657KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 17: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. 
Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 654KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 18: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 693KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 19: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. 
The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 472KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 20: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 718KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 21: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. 
The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 474KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 22: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 532KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 23: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. 
The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 641KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 24: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). 
Format: PDF Size: 544KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 25: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 607KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 26: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. 
Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 834KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 27: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 513KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 28: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. 
Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 520KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 29: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 501KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 30: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. 
The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 538KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 31: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 542KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 32: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. 
The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 659KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 33: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 551KB Download file This file can be viewed with: Adobe Acrobat Reader Additional data file 34: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. 
The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 978KB

Additional data file 35: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key).
Format: PDF Size: 531KB

Additional data file 36: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 594KB

Additional data file 37: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon.
Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 690KB

Additional data file 38: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 504KB

Additional data file 39: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows.
Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 790KB

Additional data file 40: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 556KB

Additional data file 41: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence.
The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 503KB

Additional data file 42: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 661KB

Additional data file 43: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left.
The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 596KB

Additional data file 44: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 627KB

Additional data file 45: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47].
The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 508KB

Additional data file 46: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key).
Format: PDF Size: 548KB

Additional data file 47: The subtelomere sequences shown are the assemblies published previously [6] and are available at the Riethman Lab website [47]. The telomeric end of each sequence assembly is located at the left. The distance from the end of the sequence to the start of the terminal repeat array is indicated by the vertical arrow at the telomeric end of the sequence. The position and orientation of (TTAGGG)n tracts are shown as black arrows. Top panels: duplicated genomic segments are identified by chromosome (color) and whether they are subtelomeric (bounded rectangles), non-subtelomeric (unbounded rectangles), or intra-chromosomal (located above the subtelomere coordinates). Each rectangle represents a separate duplicon. Bottom panels: duplicated genomic segments are the same as in the top panels, but identified by nucleotide sequence similarity with the query subtelomere sequence (color scheme as indicated in the key). Format: PDF Size: 562KB

Additional data file 48: This table shows blocks of modules that occur exclusively in subtelomere regions. The first column gives an identifier for each block. The next three columns (query sequence) give the subtelomeric location that defines the block (each block consists of one or more adjacent modules). For completeness, some aligned sequences have been included in these blocks even though they fell below the thresholds for module definition. The percent identity of the chained alignments between the sequences is indicated (excluding masked sequence). Named genes and gene families with transcripts matching part or all of the respective duplicon blocks are listed in the last column.
Block 7 is the D4Z4 tandem repeat on the 4q and 10q subtelomeres; no percent identity is given for it because the BLAST alignments among the tandem D4Z4 repeats are very numerous and vary widely in percent identity. Format: PDF Size: 22KB

Additional data file 49: This table shows blocks of modules that are adjacent to the ends of finished telomeres (see Materials and methods). The columns describe the same categories of information as in Additional data file 48. A limited set of non-subtelomeric copies of subterminal duplicons exists (Additional data file 49). Their genomic locations suggest sites of ancestral telomere-associated chromosome rearrangements, including a well-documented telomere fusion at 2q13-q14 [37] that contains representatives of subterminal duplicon families A, B, C, and D (Additional data file 49). The non-subtelomeric site of a family D duplicon at 3p12.3 lies at the tip of an extended duplication region; the DNA on the centromeric flank of this site shares homology with the 4q and 10q subtelomeres, including a beta satellite repeat structure resembling part of the D4Z4 repeat. Subterminal family F has duplicons at several non-subtelomeric sites; those on chromosomes 22q, 14q, and 12p lie very close to the respective centromeres (Additional data file 49). This suggests that the non-subterminal copies of this subterminal sequence family may have arisen by ancestral inversion of a chromosome arm followed by duplication of pericentromeric sequences. For subterminal blocks A, B, and D, sequence similarity between duplicon copies within a family is mainly in the 90-96% range (Table 2; see Additional data file 49 for the rare exceptions). As with the subtel-only blocks, some of these duplicons correspond to only part of the subterminal block sequence.
Subterminal duplicon blocks A, B, and D also overlap partly in sequence; accordingly, they contain parts of the same transcript families, RPL23A7 and FAM41C (Table 2). Cross-family similarity between subterminal blocks A, B, and D is likewise in the 90-96% identity range, but the duplicons occupy different positions within the blocks and lie at different distances from the (TTAGGG)n tract. These subterminal blocks also contain several alternative arrangements of high-copy repetitive elements (masked, and not examined in detail in this study). Thus, subterminal sequences may be shuffled more frequently than sequences located more centromerically, at least within a subset of subtelomere alleles; this idea is broadly consistent with an earlier model of subtelomere structure featuring compartments with distinct functional properties [9]. Further refinement of the classification of these subterminal families appears feasible and will benefit from more extensive sampling of (TTAGGG)n-adjacent sequences from additional alleles. Subterminal block F contains one duplicon on 10p with very high similarity to the 18p query sequence, suggesting a very recent duplication event; the remaining duplicons are all in the 91-94% identity range. Block C has the highest sequence similarity of all the subterminal duplicon families and has a copy at the 2q fusion locus. Block E (96-97% identity) is unusual: it corresponds to a portion of subtelomere-only duplicon family 6 (Table 1) and is the only subterminal duplicon family with subtel-only properties. This particular sequenced allele of 17p may have arisen by truncation of a chromosome end within this large subtelomere-only duplicon, since there is mapping evidence for several longer alleles of the 17p telomere (H Riethman, unpublished).
Notably, (TTAGGG)n tracts at 17p, and specifically on this particular 17p allele, tend to be among the shortest in the human genome [19,51]. Format: PDF Size: 44KB

Additional data file 50: Comparison of the subtel-only and subterminal duplicon blocks defined in this work with the subtelomeric homology blocks reported in Linardopoulou et al. [12]. Format: PDF Size: 17KB

Additional data file 51: Candidate transcripts were identified by BLAST searches of the representative subtelomere-only query sequences (Additional data file 48) against the NCBI RefSeq mRNA database (downloaded 24 July 2006) [52]. Human mRNAs with 90% or greater identity were aligned with Spidey [53] against the set of subtelomere-only duplicon block representatives. This table is restricted to hits with greater than 95% identity in the Spidey alignments. The first and second columns give the subtelomere-only block and the RefSeq accession that align to each other. The third is the description line from the RefSeq database. The fourth and fifth columns are the percent identity and percent coverage of the aligned mRNA as reported by Spidey. Format: PDF Size: 29KB

Additional data file 52: Candidate transcripts were identified by BLAST searches of the representative subterminal query sequences (Additional data file 49) against the NCBI RefSeq mRNA database (downloaded 24 July 2006) [52]. Human mRNAs with 90% or greater identity were aligned with Spidey [53] against the set of subterminal duplicon block representatives. The first and second columns give the subterminal block and the RefSeq accession that align to each other. The third is the description line from the RefSeq database.
The fourth and fifth columns are the percent identity and percent coverage of the aligned mRNA as reported by Spidey. Format: PDF Size: 72KB
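The two-stage transcript screen described for Additional data files 51 and 52 (BLAST hits with 90% or greater identity passed to Spidey, then, for the subtelomere-only table, Spidey alignments kept only above 95% identity) can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' pipeline: the record classes, field names, function names, and example accessions are all hypothetical, and the BLAST and Spidey runs are assumed to have already produced per-hit percent identities.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class BlastHit:
    """Hypothetical record: one BLAST match of a block query against a RefSeq mRNA."""
    block: str            # duplicon block identifier used as the query
    accession: str        # RefSeq mRNA accession matched by the query
    pct_identity: float   # percent identity reported by BLAST


@dataclass(frozen=True)
class SpideyRow:
    """Hypothetical record mirroring the five table columns."""
    block: str
    accession: str
    description: str      # RefSeq description line
    pct_identity: float   # percent identity reported by Spidey
    pct_coverage: float   # percent of the mRNA covered by the alignment


def blast_candidates(hits: Iterable[BlastHit], min_identity: float = 90.0) -> List[BlastHit]:
    """Stage 1: keep hits with 90% or greater identity (inclusive threshold)."""
    return [h for h in hits if h.pct_identity >= min_identity]


def candidate_accessions(hits: Iterable[BlastHit], min_identity: float = 90.0) -> List[str]:
    """Unique, sorted accessions that would be passed on to Spidey."""
    return sorted({h.accession for h in blast_candidates(hits, min_identity)})


def spidey_filter(rows: Iterable[SpideyRow], above_identity: float = 95.0) -> List[SpideyRow]:
    """Stage 2 (subtelomere-only table): keep alignments strictly above 95% identity."""
    return [r for r in rows if r.pct_identity > above_identity]


if __name__ == "__main__":
    # Hypothetical example values, not data from the study.
    hits = [
        BlastHit("block_X", "NM_EXAMPLE_1", 97.2),
        BlastHit("block_X", "NM_EXAMPLE_2", 88.0),
        BlastHit("block_Y", "NM_EXAMPLE_3", 90.0),
    ]
    print(candidate_accessions(hits))  # ['NM_EXAMPLE_1', 'NM_EXAMPLE_3']
```

The thresholds follow the wording of the legends: "90% or greater" is inclusive, while "above 95%" is strict. Only the subtelomere-only table (Additional data file 51) is described as filtered at 95%; for the subterminal table (Additional data file 52), this sketch would simply leave the Spidey output unfiltered.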

