We have created tcDoms, a set of hidden Markov models (HMMs) derived from families in TCDB. tcDoms are intended as a resource to (a) sort query proteins into their corresponding TCDB families, (b) infer relationships between families and thus characterize superfamilies, and, when applicable, (c) infer the substrates and functions of putative transporters. Although many transport-related domains are already available in Pfam, CDD and other databases, tcDoms are specialized for cellular transport and are designed to assist manual database curation and to increase the robustness of family definitions in TCDB.
In the initial phase, we have focused on families composed exclusively of single-component systems. We first performed an "all vs all" comparison of all proteins in TCDB using BLAST and selected those families in which every member retrieved every other member. We then produced multiple alignments for these proteins using MUSCLE. Next, the program hmmbuild, from the HMMER software suite, was used to produce HMM profiles from the multiple alignments. Finally, the resulting HMMs were benchmarked by leave-one-out cross-validation and checked for cross-contamination with unrelated families (manuscript in preparation). Of the 364 tcDoms currently built for 277 families, 295 overlap with 382 known Pfam models. Note that a one-to-one relationship between Pfam models and tcDoms cannot be expected, because different regions of the proteins in a TCDB family can match different, nonoverlapping Pfam domains, and different tcDoms can match the same Pfam domain. Our tcDoms are meant to help distinguish members of one family from those of another, even when both families belong to the same superfamily. We therefore expect to produce more than one tcDom for some families. The number of tcDoms per family depends on how the family relates to other domain collections (e.g., Pfam and CDD) and on the number of distinctive, characteristic domains that we can identify for that family in TCDB. The initial set of tcDom HMMs is available for downloading here.
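The family-selection step described above (keeping only families where all members found each other in the all-vs-all BLAST comparison) can be illustrated with a minimal Python sketch. It is not the pipeline actually used for tcDoms. The function name `fully_connected_families`, the input layout, and the family labels are hypothetical, and the sketch assumes the BLAST hits have already been filtered by whatever significance cutoff is appropriate, since no threshold is stated here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def fully_connected_families(
    hits: Iterable[Tuple[str, str]],
    family_of: Dict[str, str],
) -> List[str]:
    """Return families in which every member hit every other member.

    hits      -- (query, subject) pairs from an all-vs-all search, assumed
                 to be pre-filtered by a significance cutoff (assumption).
    family_of -- maps each protein identifier to its family label.
    """
    # Record which subjects each query protein retrieved.
    found: Dict[str, Set[str]] = defaultdict(set)
    for query, subject in hits:
        found[query].add(subject)

    # Group proteins by family.
    members: Dict[str, Set[str]] = defaultdict(set)
    for protein, family in family_of.items():
        members[family].add(protein)

    selected = []
    for family, proteins in sorted(members.items()):
        # Each protein must have retrieved all other members of its family.
        # A family with a single member passes trivially.
        if all((proteins - {p}) <= found[p] for p in proteins):
            selected.append(family)
    return selected


if __name__ == "__main__":
    # Hypothetical toy data: FAM_B fails because B2 never retrieves B1.
    example_hits = [("A1", "A2"), ("A2", "A1"), ("B1", "B2")]
    example_families = {"A1": "FAM_A", "A2": "FAM_A",
                        "B1": "FAM_B", "B2": "FAM_B"}
    print(fully_connected_families(example_hits, example_families))
```

In practice the `hits` pairs could be read from tabular BLAST output, and the proteins of each selected family would then be aligned and passed to hmmbuild as described in the text.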