Global analysis of patterns of gene expression during Drosophila embryogenesis

1 Department of Molecular and Cell Biology, University of California, Berkeley, CA 94720, USA
2 Howard Hughes Medical Institute, Cyclotron Road, Berkeley, CA 94720, USA
3 Max Planck Institute of Molecular Cell Biology and Genetics, Pfotenhauerstr., Dresden, D-01307, Germany
4 Department of Preventive Medicine, Keck School of Medicine of USC, Eastlake Ave, Los Angeles, CA 90033, USA
5 Lawrence Berkeley National Laboratory, Cyclotron Road, Berkeley, CA 94720
6 Department of Molecular Cell and Developmental Biology, University of California Los Angeles, Los Angeles, CA 90095, USA
7 Janelia Farm Research Campus, HHMI, Helix Drive, Ashburn, VA 20147, USA

Genome Biology 2007, 8:R145 doi:10.1186/gb-2007-8-7-r145
Subject areas: Development, Genome studies

Additional files

Additional data file 1: The schematic of our image-driven curation strategy is shown on the left. The curator generated lists of genes (either all genes at a stage range or a subset based on an annotation query) and the computer produced all the images associated with those genes. Images were presented in batches that were navigated in a manner similar to Google searches, with progress tracking. Pages that had been viewed were marked. Gene names were shown in the corner of the first image for each gene. The curator reviewed the annotations of all genes expressed in a tissue, for example the CNS, and, by clicking on an image showing CNS staining, moved that image and the associated gene name to a child gene list. The images for the selected genes were removed from the parent pages. Both the parent and child gene lists were stored in a specialized database called the 'list manager' [21]. Lists became inputs for further image-based sub-selection or were used as a starting point for the annotation-driven curation approach.

The schematic of our annotation-driven curation strategy is shown on the right. The curator input a gene list that was based on an annotation query or on the previously described image-driven curation. The curator then selected a subset of annotation terms called the 'block'. The list was ordered according to the content of block terms in the annotations of each gene. The curator was presented with images and block-limited annotations for each gene in the list in succession and changed any annotations within the block that needed correction. The curator was then presented with the set of genes that were not annotated with block terms and corrected any omissions. Modifications to the annotations were immediately stored in the database and a history of all changes was tracked.
Format: EPS. Size: 14 MB.

Additional data file 2: Comparison of the distribution of selected GO terms (GO slim general [20]) in the 6,003 genes in our study (red bars) and the distribution of the same terms in the 14,586 genes (purple bars) in the genome (Release 4.3). Each bar represents the percentage of genes with a given GO slim category. The similarity of these two profiles suggests that we have annotated a representative sample of all genes.

Format: EPS. Size: 318 KB.

Additional data file 3: We collapsed the 16 well-defined embryonic stages [9] into six developmental stage ranges. Annotation terms were grouped into the six developmental stage ranges, which are separated by solid horizontal lines. We reduced the 314 terms in the full anatomical CV to 145 terms representing the structures that were most frequently seen and most readily distinguishable in our dataset; the 131 of these that are annotated in 10 or more genes are shown here (Additional data files 4 and 5 list all terms). Genes annotated using more specific annotation terms were collapsed to this level. For example, an annotation of 'dorsal ridge' was collapsed to 'dorsal ectoderm', because the 'dorsal ridge' is 'part of' the 'dorsal ectoderm' in our formal CV hierarchy. Relative annotation counts are shown as colored bars. The length of each bar is proportional to the number of genes annotated with that term. Each bar is in one of 16 colors indicating the particular cell fate of the lineage; for instance, endoderm structures are in red across all developmental stages. This color code follows the 'develops from' relationship among terms in our formal annotation hierarchy. Additional data files 4 and 5 contain the raw gene counts using the same organization.
Format: EPS. Size: 3.1 MB.

Additional data file 4: For each organ system, we listed: the 'total' number of genes annotated with any raw annotation term that belonged to that organ system; the number of genes restricted ('restr.') to that organ system, meaning that at the stage range indicated the gene was annotated ONLY with raw terms of that organ system; the number of genes expressed in the organ system and at the same time not expressed in an equivalent organ system at another stage of development (restr. 4-6 and restr. 7-10); the total number of genes expressed in the equivalent organ systems connected by a 'develops from' relationship at stage range 4-6 OR stage range 7-10 (st 4-10 uni. = union); and the number of genes expressed in equivalent organ systems connected by a 'develops from' relationship at stage range 4-6 AND stage range 7-10 (stages 4-10 intrs. = intersection). For each collapsed annotation term (block) we listed: the total number of genes annotated with any raw annotation term that belongs to that block; and the number of genes restricted ('restr.') to the block, meaning that at the given stage range the gene is annotated ONLY with raw terms from the block and with no terms from another block. The organ systems and blocks are color-coded. The mapping of organ systems to blocks of terms and of blocks of terms to raw terms is available as supplementary on-line material [21].

Format: EPS. Size: 2.1 MB.

Additional data file 5: Similar to Additional data file 4, we listed for each organ system and each block of terms: the total number of genes that were annotated with ANY raw annotation term that belonged to that organ system or block; the number of genes expressed ONLY in the organ system or block at the given stage (restr.); the number of genes expressed in the organ system or block and at the same time not expressed in an equivalent organ system or block at another stage of development (restr. 11-12 and restr. 13-16); the number of genes expressed in equivalent organ systems or blocks at stage range 11-12 OR stage range 13-16 (st 11-16 uni. = union); and the number of genes expressed in equivalent organ systems at stage range 11-12 AND stage range 13-16 (stages 11-16 intrs. = intersection). The organ systems and blocks are color-coded. The mapping of organ systems to blocks of terms and of blocks of terms to raw terms is available as supplementary on-line material [21].

Format: EPS. Size: 1 MB.

Additional data file 6: Genes that share between 50% and 100% of their annotation terms (x-axis; uniformity) were grouped and the resulting number of groups enumerated (y-axis). The solid line in each graph shows the total number of groups thus formed for a given level of uniformity. The dashed line shows the total number of groups with a single gene member (singletons) for a given level of uniformity. For uniformity levels of 100% (all annotation terms matched within the group) and 75% (3 out of 4 terms matched within the group) we highlight the total number of groups, the number of singletons and the difference between the two. (A) Data for all genes expressed in the embryo; (B) for genes belonging to broad clusters; and (C) for genes in restricted clusters.

Format: EPS. Size: 396 KB.

Additional data file 7: Given an annotation (spatial) similarity score 0 ≤ s_s ≤ 1 and an array (level) similarity score 0 ≤ s_l ≤ 1, the function s_c = s_l + (1 - s_l) s_l s_s gives a similarity score where microarray similarity has a significant effect when annotation similarity is medium to high, but very little effect when annotation similarity is low.

Format: PDF. Size: 52 KB.

Additional data file 8:
* Terms with adjusted p value smaller than 0.001.
† B = broad, R = restricted.
‡ Total number of unique genes in cluster.
§ GO or Uniprot id.
¶ GO or Uniprot name.
¥ Ratio between observed and expected number of genes in the intersection of the two gene lists.
# Number of genes in cluster annotated with given GO or Uniprot term.
** Number of genes in the genome annotated with given GO or Uniprot term.
†† Adjusted p value of binomial z test testing the null hypothesis that the overlap between the two gene lists is random.

Format: EPS. Size: 331 KB.

Additional data file 9: Similar to Figure 6, for the unique genes in each cluster we summarized the array profiles, the diversity of annotation terms (as an anatogram) and the number of total and core genes, and show two to four embryo images. Whenever possible, genes with previously uncharacterized expression patterns were selected. Array plots show the distribution of scaled intensity scores: the blue line indicates the median value, while the gray box gives the inter-quartile range. The most relevant annotation terms in each anatogram are labeled.

Format: EPS. Size: 7.9 MB.
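
The 'total', 'restr.', union and intersection counts described for Additional data files 4 and 5 amount to simple set operations on gene lists. The following is a minimal sketch of that bookkeeping, not the authors' implementation. The gene names, the dictionary layout and the function name `organ_system_counts` are hypothetical, and for simplicity it assumes that equivalent organ systems at the two stage ranges carry the same label rather than being linked through a 'develops from' relationship.

```python
def organ_system_counts(early, late, system):
    """Summarize gene counts for one organ system across two stage ranges.

    `early` and `late` map a gene name to the set of organ systems the gene
    is annotated with at that stage range (a toy structure, assumed here
    for illustration only).
    """
    in_early = {g for g, systems in early.items() if system in systems}
    in_late = {g for g, systems in late.items() if system in systems}
    return {
        # genes annotated with this organ system at the early range
        "total_early": len(in_early),
        # genes annotated ONLY with this organ system at the early range
        "restr_early": sum(1 for g in in_early if early[g] == {system}),
        # expressed at one range but not in the equivalent system at the other
        "restr_to_early_range": len(in_early - in_late),
        "restr_to_late_range": len(in_late - in_early),
        # expressed at the early OR the late range
        "union": len(in_early | in_late),
        # expressed at the early AND the late range
        "intersection": len(in_early & in_late),
    }


# Hypothetical annotations for four made-up genes.
early = {
    "geneA": {"endoderm"},
    "geneB": {"endoderm", "mesoderm"},
    "geneC": {"mesoderm"},
}
late = {
    "geneB": {"endoderm"},
    "geneD": {"endoderm", "ectoderm"},
}

counts = organ_system_counts(early, late, "endoderm")
for name, value in counts.items():
    print(name, value)
```

Keeping each stage range as a mapping from gene to a set of labels lets the restricted, union and intersection columns fall out of set equality, difference, `|` and `&` directly; a real analysis would first map raw annotation terms to blocks and organ systems using the published mapping [21].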

