Tensor Algebra-based Geometrical (3D) Biomacro-Molecular Descriptors for Protein Research: Theory, Applications and Comparison with other Methods
Artículo académico
In this report, a new type of tridimensional (3D) biomacro-molecular descriptors for proteins are proposed. These descriptors make use of multi-linear algebra concepts based on the application of 3-linear forms (i.e., Canonical Trilinear (Tr), Trilinear Cubic (TrC), Trilinear-Quadratic-Bilinear (TrQB) and so on) as a specific case of the N-linear algebraic forms. The definition of the kth 3-tuple similarity-dissimilarity spatial matrices (Tensor’s Form) are used for the transformation and for the representation of the existing chemical information available in the relationships between three amino acids of a protein. Several metrics (Minkowski-type, wave-edge, etc) and multi-metrics (Triangle area, Bond-angle, etc) are proposed for the interaction information extraction, as well as probabilistic transformations (e.g., simple stochastic and mutual probability) to achieve matrix normalization. A generalized procedure considering amino acid level-based indices that can be fused together by using aggregator operators for descriptors calculations is proposed. The obtained results demonstrated that the new proposed 3D biomacro-molecular indices perform better than other approaches in the SCOP-based discrimination and the prediction of folding rate of proteins by using simple linear parametrical models. It can be concluded that the proposed method allows the definition of 3D biomacro-molecular descriptors that contain orthogonal information capable of providing better models for applications in protein science.