Pipeline (exact commands + preprocessing as needed) 0) Clean up the Excel files (change the sheet names to "Reactions", "Metabolites", and "Biomass" if needed [Model 2]; remove unneeded rows; clean up metabolite names if needed [Model 1]: Additional File 9; add IUPAC names, CAS numbers) 1) Run the MetaMerge script (to set everything up) and then parse the two Excel files (note: lines where a choice is made by the user are preceded and followed by an empty line; any relevant error messages are in Additional File 9): MT1=parseExcel('Mycobacterium Tuberculosis 1.xls') Warning: multiple candidates for reaction, equation found in the list 0) reaction 1) reaction name 2) Reference (reactions without specific references were from Borodina et al. 2005 or from KEGG/BioCYC databases or from standard texts) Please select the correct string from the list above and enter its number, or -1 if none 0 Warning: no candidate for opening coefficients found in the list Warning: no candidate for closing coefficients found in the list Warning: no candidate for irreversible reactions found in the list Warning: no candidate for opening compartments found in the list Warning: no candidate for closing compartments found in the list Warning: no candidate for separating compartments found in the list Warning: multiple candidates for OR symbol found in the list 0) OR 1) Please select the correct string from the list above and enter its number, or -1 if none 0 Warning: multiple candidates for EC, E.C. found in the list 0) E.C. number 1) Reference (reactions without specific references were from Borodina et al. 2005 or from KEGG/BioCYC databases or from standard texts) Please select the correct string from the list above and enter its number, or -1 if none 0 Warning: no candidate for biomass reaction found in the list Warning: multiple candidates for biomass reaction found in the list 0) 47 ATP[c] + 7/1000 CL[c] + 1/1000 DIACYLTREHALOSE[c] + 1/500 DIM[c] + 11/500 DNA[c] + 1/10 GLUCAN[c] + 93/500 LAM[c] + 27/500 LM[c] + 26/125 MAPC[c] + 1/10000 MPD[c] + 7/200 P-L-GLX[c] + 3/500 PE[c] + 29/1000 PGL-TB[c] + 1/25 PIMS[c] + 1/1000 POLYACYLTREHALOSE[c] + 107/500 PROTEIN[c] + 9/250 RNA[c] + 1/200 SL-1[c] + 1/20 SMALLMOLECULES[c] + 2/125 TAGbio[c] + 1/1000 TREHALOSEDIMYCOLATE[c] + 1/1000 TREHALOSEMONOMYCOLATE[c] <-> 47 ADP[c] + 1 BIOMASS[c] + 47 PI[c] 1) 47 ATP[c] + 11/500 DNA[c] + 93/500 LAM[c] + 26/125 MAPC[c] + 7/200 P-L-GLX[c] + 3/500 PE[c] + 1/25 PIMS[c] + 107/500 PROTEIN[c] + 9/250 RNA[c] + 1/20 SMALLMOLECULES[c] + 2/125 TAGbio[c] <-> 47 ADP[c] + 1 BIOMASS[c] + 47 PI[c] Please select the correct string from the list above and enter its number, or -1 if none 0 Warning: multiple candidates for name found in the list 0) Name 1) IUPAC name Please select the correct string from the list above and enter its number, or -1 if none 0 MT2=parseExcel('Mycobacterium Tuberculosis 2.xls') Warning: multiple candidates for reversible reactions found in the list 0) <==> 1) h[c] 2) --> 3) (2) 4) [c] 5) : 6) (3) Please select the correct string from the list above and enter its number, or -1 if none 0 Warning: multiple candidates for irreversible reactions found in the list 0) h[c] 1) --> 2) (2) 3) [c] 4) : 5) (3) Please select the correct string from the list above and enter its number, or -1 if none 1 Warning: multiple candidates for separating compartments found in the list 0) 1) : Please select the correct string from the list above and enter its number, or -1 if none 1 Warning: multiple candidates for OR symbol found in the list 0) , 1) Please select the correct string from the list above and enter its number, or -1 if none 0 Warning: no candidate for open group symbol found in the list Warning: no candidate for close group symbol found in the list Warning: no candidate for EC, E.C. found in the list Warning: no candidate for biomass reaction found in the list Warning: no candidate for biomass found in the list Warning: biomass reaction not found; you may need to edit biomassCoefficients manually! Warning: multiple candidates for name found in the list 0) OfficialName 1) IUPAC name Please select the correct string from the list above and enter its number, or -1 if none 0 1.5) Adjust the reversibility in Model 1 and the reaction names in both models MT1.adjustReversibility() MT1.adjustReactionNames() MT2.adjustReactionNames() 2) Extract additional metabolite features from an existing table (note: these have been prepared by Jeremy Zucker for the M. tuberculosis models and would need to be extracted separately for other metabolic network reconstructions) t = shelve.open('Mappings') Table1 = t['McFadden-BioCyc'] Dict1 = convertTableToDictionary(Table1, keyColumn = 0, valueColumn = 2) Table2 = t['Palsson-BioCyc'] Dict2 = convertTableToDictionary(Table2, keyColumn = 1, valueColumn = 3) Table3 = t['MetaCyc-Kegg'] Dict3 = convertTableToDictionary(Table3, keyColumn = 0, valueColumn = 1) t.close() Dict1=augmentDictionaryByValue(Dict1, Dict3) Dict2=augmentDictionaryByValue(Dict2, Dict3) mergeSpecies(MT1.species, Dict1) mergeSpecies(MT2.species, Dict2) 3) Using an Internet server, extract additional features as needed and convert non-standardized features to a uniform format (this is helpful for the matching process) [note: findAllEnzymeNames may get interrupted by the ExPaSy server!] findAllEnzymeNames(MT1.reactions) getAllSpeciesInfo(MT1.species) getAllSpeciesInfo(MT2.species) convertAllFormulas(MT1.species) convertAllFormulas(MT2.species) findAllProteinNames(MT1.reactions) splitAllProteinNames(MT2.reactions) fixGeneCombinations(MT1.reactions) fixGeneCombinations(MT2.reactions) collectAllGeneNames(MT1.reactions) collectAllGeneNames(MT2.reactions) collectReactionNames(MT1.reactions) collectReactionNames(MT2.reactions) collectSpeciesNames(MT1.species) collectSpeciesNames(MT2.species) flattenFeature(MT1.reactions, 'Enzyme names') 4) Prepare reaction features and species features for the matching process and create the initial matching matrices featuresR1 = [x.description for x in MT1.reactions] featuresR2 = [x.description for x in MT2.reactions] featureNamesR = getTitles(option = "react") featureNamesR1 = [x.split('/')[0] for x in featureNamesR] featuresR1 = supplementFeatures(featuresR1, featureNamesR1, default = [[], [], '', '', '']) featureNamesR2 = [x.split('/')[-1] for x in featureNamesR] featuresR2 = supplementFeatures(featuresR2, featureNamesR2, default = [[], [], '', '', '']) MatchesR = compareFeatures(featuresR1, featuresR2, featureNamesR1, featureNamesR2, [1, 1, 2, 1, 2]) featuresS1 = [x.description for x in MT1.species] featuresS2 = [x.description for x in MT2.species] featureNamesS = getTitles(option = "metab") featureNamesS1 = [x for x in featureNamesS] featuresS1 = supplementFeatures(featuresS1, featureNamesS1, default = ['', '', '', '', {}, '', '']) featureNamesS2 = [x for x in featureNamesS] featuresS2 = supplementFeatures(featuresS2, featureNamesS2, default = ['', '', '', '', {}, '', '']) MatchesS = compareFeatures(featuresS1, featuresS2, featureNamesS1[1:], featureNamesS2[1:]) allNames1 = [x.description['name'] for x in MT1.species] allNames2 = [x.description['name'] for x in MT2.species] MatchesS0 = matchStrings(allNames1, allNames2) MatchesS = addMatrices(MatchesS, MatchesS0) 5) Decide on which elements of the network to combine in a user-supervised manner [we are omitting the user choices since these can be individual] external1 = [x for x, metab in enumerate(MT1.metabolites) if metab.external] external2 = [x for x, metab in enumerate(MT2.metabolites) if metab.external] fullNumbering1 = [MT1.species.index(x.species) for x in MT1.metabolites] fullNumbering2 = [MT2.species.index(x.species) for x in MT2.metabolites] externalNumbering1 = [fullNumbering1.index(fullNumbering1[x]) for x in external1] externalNumbering2 = [fullNumbering2.index(fullNumbering2[x]) for x in external2] ordering1, permutation1 = createOrdering(external1, len(MT1.metabolites)) ordering2, permutation2 = createOrdering(external2, len(MT2.metabolites)) fullNumbering1 = recodeList(fullNumbering1, ordering1) fullNumbering2 = recodeList(fullNumbering2, ordering2) external1 = recodeList(permutation1, external1) external2 = recodeList(permutation2, external2) externalNumbering1 = recodeList(permutation1, externalNumbering1) externalNumbering2 = recodeList(permutation2, externalNumbering2) featuresM1 = recodeList(featuresS1, fullNumbering1) featuresM2 = recodeList(featuresS2, fullNumbering2) allNamesM1 = recodeList(allNames1, fullNumbering1) allNamesM2 = recodeList(allNames2, fullNumbering2) FeaturesM1 = convertFeaturesToMatrix(featuresM1, featureNamesS1) FeaturesM2 = convertFeaturesToMatrix(featuresM2, featureNamesS2) FeaturesR1 = convertFeaturesToMatrix(featuresR1, featureNamesR1) FeaturesR2 = convertFeaturesToMatrix(featuresR2, featureNamesR2) MatchesM = recodeMatrix(MatchesS, fullNumbering1, fullNumbering2) Reacts1 = [recodeAndSortPairs(x.pairs, permutation1) for x in MT1.reactions] Reacts2 = [recodeAndSortPairs(x.pairs, permutation2) for x in MT2.reactions] Q = MatchingMaster(Reacts1, Reacts2, allNamesM1, allNamesM2, FeaturesR1, FeaturesR2, FeaturesM1, FeaturesM2, MatchesR, MatchesM, externalNumbering1, externalNumbering2, cutoff = 3) 6) Check for transitivity and non-transformability and create the combined network [note: if one of the assertions fails, the matching must be corrected manually until it succeeds] MetabMatch = Q[1] (transformable, Heights, Widths) = checkTransformable(MetabMatch, Reacts1, Reacts2) assert(transformable == []) assert(findRepeats(Heights) == []) assert(findRepeats(Widths) == []) irrev1 = [x for x, react in enumerate(MT1.reactions) if not react.reversible] irrev2 = [x for x, react in enumerate(MT2.reactions) if not react.reversible] MetabExcept1 = [permutation1[x] for x, metab in enumerate(MT1.metabolites) if metab.species.name == 'H'] MetabExcept2 = [permutation2[x] for x, metab in enumerate(MT2.metabolites) if metab.species.name in ['h', 'h2o']] mergedNetwork = MetabMergeExcept(MetabMatch, external1, external2, MetabExcept1, MetabExcept2, Reacts1, Reacts2, irrev1, irrev2) 7) Convert the combined network into SBML format for future records Growth = [MT1.findBiomassReaction(), MT2.findBiomassReaction()] Genes = [[x.geneCombination.clauses for x in MT1.reactions if x.geneCombination else [[]]], [x.geneCombination.clauses for x in MT2.reactions]] MetabNames = [allNamesM1, allNamesM2] Externals = [externalNumbering1, externalNumbering2] ReactFeatures1 = [filterOut(x, [0,1]) for x in [featureNamesR1] + FeaturesR1] ReactFeatures2 = [filterOut(x, [0,1]) for x in [featureNamesR2] + FeaturesR2] ReactFeatures = [ReactFeatures1, ReactFeatures2] MetabFeatures1 = [filterOut(x, [4]) for x in [featureNamesS1] + FeaturesM1] MetabFeatures2 = [filterOut(x, [4]) for x in [featureNamesS2] + FeaturesM2] MetabFeatures = [MetabFeatures1, MetabFeatures2] (mReacts, mGrowth, mIrrev, mGenes, mMetabNames, mExternal, mReactFeatures, mMetabFeatures, mGeneFeatures) = prepareMergedModel(mergedNetwork, Growth, Genes, MetabNames, Externals, ReactFeatures, MetabFeatures, GeneFeatures = []) allGenes = ConvertToSBML(mReacts, mGrowth, mIrrev, mGenes, mMetabNames, mExternal, mReactFeatures, mMetabFeatures, mGeneFeatures, sep = "\t", br = '\n', ModelName = 'CombinedTBModel', FileName = 'CombinedTBModel.xml'