Figure S6. Principal components analysis (PCA) of the virulence characteristics combined with all host gene triples. Top panel: host intestinal biology genes. Middle panel: immunity and defense genes. Bottom panel: random genes. Each plot shows the proportion of variation explained by the first and second principal components together versus the proportion explained by the second principal component alone. The analyses characterize the lower-dimensional structure underlying the data. When combined with the virulence characteristics, the immunity and defense genes (middle panel) generally exhibit a simpler latent structure than the other gene sets (top and bottom panels), as judged by the slight northeast shift in the point cloud. The latent structure identified by PCA need not reflect a relationship between the virulence characteristics and the host genes. If it does, however, the immunity and defense genes are slightly more promising as a set for future canonical correlation analysis (CCA) aimed at uncovering simple and strong relationships between the metagenomic and host transcriptome data. In this way, PCA may be used as a screening device to identify promising gene triples for CCA.
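
A screen of this kind could be computed roughly as in the following Python sketch. It is an illustration under stated assumptions, not the procedure used to produce the figure: the function names, the column standardization, the per-triple PCA, and the simulated data dimensions (20 samples, 4 virulence characteristics, 5 candidate genes) are all hypothetical.

```python
from itertools import combinations

import numpy as np


def explained_variance_ratios(X):
    """Return the proportion of variance explained by each principal component."""
    # Standardize columns (an assumption; the caption does not state the scaling).
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Squared singular values of the standardized matrix are proportional
    # to the variances of the principal components, in descending order.
    s = np.linalg.svd(Z, compute_uv=False)
    var = s ** 2
    return var / var.sum()


def screen_triples(virulence, genes):
    """Map each gene triple to (proportion for PC1 + PC2, proportion for PC2)."""
    scores = {}
    for triple in combinations(range(genes.shape[1]), 3):
        # Combine the virulence characteristics with the three gene columns.
        X = np.hstack([virulence, genes[:, list(triple)]])
        r = explained_variance_ratios(X)
        scores[triple] = (r[0] + r[1], r[1])
    return scores


# Simulated data, for illustration only.
rng = np.random.default_rng(0)
virulence = rng.normal(size=(20, 4))  # hypothetical virulence characteristics
genes = rng.normal(size=(20, 5))      # hypothetical host gene expression values
scores = screen_triples(virulence, genes)
```

Each entry of `scores` corresponds to one point in a panel. Under this sketch, triples whose points lie further to the northeast would be the candidates carried forward to CCA.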

