Structural domains sharing no more than 40% sequence identity with each other were obtained from ASTRAL 2.06 (corresponding to SCOPe 2.06) and organized into superfamilies.
The superfamilies were then classified, on the basis of the number of domains present in each superfamily, into single-member superfamilies (SMS) and multi-member superfamilies (MMS). An initial alignment of the structural domains within each superfamily was obtained using MATT (Multiple Alignment with Translations and Twists) (5), which also provided a structural-distance-based tree. Using JOY-5.0v (6), the initial alignment was annotated with secondary structure, hydrophobicity, solvent accessibility and other structural features. JOY-4.0v (6) was used to identify equivalences in the initial alignment, that is, non-gapped aligned regions in each member of the superfamily.
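The SMS/MMS classification described above can be illustrated with a minimal Python sketch. The function name, the input mapping and the identifiers in the example are hypothetical and serve only as an illustration; this is not the pipeline's actual code.

```python
from collections import defaultdict


def classify_superfamilies(domain_to_superfamily):
    """Group domains by superfamily, then split into SMS and MMS.

    domain_to_superfamily: dict mapping a domain identifier to the
    identifier of its superfamily (hypothetical input format).
    Returns two dicts (sms, mms), each mapping superfamily -> domains.
    """
    members = defaultdict(list)
    for domain, superfamily in domain_to_superfamily.items():
        members[superfamily].append(domain)

    # Exactly one domain: single-member superfamily (SMS).
    sms = {sf: doms for sf, doms in members.items() if len(doms) == 1}
    # Two or more domains: multi-member superfamily (MMS).
    mms = {sf: doms for sf, doms in members.items() if len(doms) > 1}
    return sms, mms


# Made-up identifiers, for illustration only.
example = {"dom_A": "sf_1", "dom_B": "sf_1", "dom_C": "sf_2"}
sms, mms = classify_superfamilies(example)
```

Only the MMS groups contain more than one structure, so only they can proceed to the multiple structural alignment step.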
The structure-guided tree and equivalences were provided as inputs for COMPARER (7), which uses variable gap penalties and local structural features such as backbone conformation, solvent accessibility and hydrogen bonding patterns to create the final structure-based sequence alignment. In general, the variable gap penalties ensure that there are no unreasonable gaps in between secondary structures and conserved regions within the alignment. JOY-3.2v or MNYFIT (6) is used for rigid-body superposition of the structures; it requires equivalences as input, which are extracted from the final alignment using JOY-4.0v.
Although members of a superfamily are expected to be structurally similar or to share a common fold, we came across cases where one or more domains in a superfamily were structurally deviant; these either had more than 5.5 Å RMSD with other members (structural outliers) or failed to align with other members (extreme structural outliers) (8,9). We also encountered situations where the extreme structural outliers aligned among themselves, forming subgroups of superfamilies (split superfamilies). There were cases where, even on removal of several extreme structural outliers, the remaining members of the superfamily failed to align. In such cases, using the structural phylogeny of all the members as reference, the superfamily was split and each subgroup aligned separately, thus giving rise to ‘split superfamilies’.
Hidden Markov Models (HMMs) of alignments of superfamily members, along with conserved secondary structural motifs, have been created using the hmmbuild module of the HMMER suite and in-house SMotif programmes, respectively (10–12). Absolutely conserved residues were extracted from the alignments for all superfamilies using a Python script that reads the alignments and looks for 100% amino acid conservation at particular positions.
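The search for absolutely conserved residues can be sketched as follows. This is a minimal illustration that assumes the alignment is already loaded as equal-length strings with `-` or `.` as gap characters; the function name and the toy alignment are hypothetical, and the actual script may differ.

```python
def absolutely_conserved_columns(aligned_seqs, gap_chars="-."):
    """Return (column_index, residue) for columns with 100% conservation.

    aligned_seqs: list of equal-length aligned sequences (assumed input).
    Column indices are 0-based positions in the alignment.
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    conserved = []
    for i in range(length):
        # Collect the distinct characters seen in this column.
        column = {seq[i].upper() for seq in aligned_seqs}
        if len(column) == 1:
            residue = next(iter(column))
            # A column made up only of gaps is not a conserved residue.
            if residue not in gap_chars:
                conserved.append((i, residue))
    return conserved


# Toy alignment, for illustration only.
toy_alignment = ["MK-LVG", "MKALIG", "MR-LVG"]
conserved = absolutely_conserved_columns(toy_alignment)
# conserved == [(0, 'M'), (3, 'L'), (5, 'G')]
```

Any column that contains a gap in at least one member is excluded, because the gap character and the residue make the column non-identical.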
The alignments were annotated with JOY-5.0v to produce accessory files such as PSA, HBD, SST etc. Principal component analysis (PCA) plots have been constructed on the basis of the sequence similarity distribution of members of a superfamily and are available for download. In addition, alignment statistics (ALISTAT) and indel information (CUSP) have been provided for each superfamily (13). C-alpha RMSDs at structurally equivalent positions of members of each superfamily were used to construct structure-guided trees, which are available for download. Gene ontology (GO) represents properties of gene products under three major terms, namely cellular component, molecular function and biological process (14).
GO term(s) corresponding to each member within superfamilies were retrieved dynamically from www.rcsb.org using RESTful API clients written in Python. MySQL5.2 was employed as the database engine for this version, along with Python2.
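The GO term retrieval mentioned above might look like the following sketch. It is written in Python 3 and assumes the current RCSB Data API endpoint (`/rest/v1/core/polymer_entity/{entry_id}/{entity_id}`) and its `rcsb_polymer_entity_annotation` field; the REST service and response format used for this work may have been different. The function names are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumption: current RCSB Data API endpoint, not necessarily the
# service that was queried for this database.
BASE_URL = "https://data.rcsb.org/rest/v1/core/polymer_entity"


def extract_go_terms(entity_record):
    """Return (GO id, GO name) pairs from a polymer-entity JSON record.

    Assumes annotations are listed under 'rcsb_polymer_entity_annotation'
    and that GO entries carry type == 'GO'.
    """
    annotations = entity_record.get("rcsb_polymer_entity_annotation") or []
    return [
        (a.get("annotation_id"), a.get("name"))
        for a in annotations
        if a.get("type") == "GO"
    ]


def fetch_go_terms(entry_id, entity_id, timeout=30):
    """Download one polymer-entity record and extract its GO terms."""
    url = f"{BASE_URL}/{entry_id}/{entity_id}"
    with urlopen(url, timeout=timeout) as response:
        return extract_go_terms(json.load(response))


# Offline illustration with a hand-written record in the assumed shape.
mock_record = {
    "rcsb_polymer_entity_annotation": [
        {"annotation_id": "GO:0005344", "name": "oxygen carrier activity", "type": "GO"},
        {"annotation_id": "PF00042", "name": "Globin", "type": "Pfam"},
    ]
}
go_terms = extract_go_terms(mock_record)
```

Keeping the parsing step separate from the network call lets the extraction logic be checked without access to the server.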
The visualization of the alignment and mapping of conserved residues have been implemented using an in-house plug-in.