Proteins are among the most important subjects of biological study, so it is worthwhile to understand how they are classified. As techniques such as X-ray crystallography and NMR spectroscopy improve, the number of proteins whose structures have been elucidated grows day by day. There is therefore a need for a platform on which this vast array of proteins can be classified and studied, and from which researchers and other interested readers can easily access information about them.
Bioinformatics offers solutions to this problem in the form of structural databases, which hold these proteins together with their structural classifications. The Protein Data Bank contains over 35,000 records documenting experimentally solved protein structures. Each classification database is based on its own specific algorithms. The main databases that classify proteins on the basis of their structure are SCOP, CATH, and DALI.
SCOP- Structural Classification of Proteins
CATH- Class, Architecture, Topology and Homologous Superfamily
DALI- Distance-matrix Alignment
These three are structural databases, meaning that they provide data about the structures of proteins. Each has its own strengths as well as certain limitations. Like most things, these databases have followed a path of evolution, which has run as follows:
SCOP (manual) → CATH (mixed: manual + automated) → DALI (fully automated)
We will therefore examine each of the three databases in turn against various criteria.
SCOP:
Many proteins share a common structure, and many of them have a common origin. The main purpose of the SCOP database is to describe in detail the structural and evolutionary relationships between the proteins whose structures are known to date.
Knowing the evolutionary relationships of these proteins makes further research on them easier. SCOP classifies proteins according to the following hierarchy:
Family- Proteins that share a clear common evolutionary relationship are placed in the same family. In general this means that their sequences have residue identities of 30% or greater. In some cases, however, similar structure and function are enough to place proteins in the same family even though their sequence identity is as low as 15%.
Superfamily- Proteins whose shared structure and function indicate that a common evolutionary origin is probable are placed in the same superfamily.
Example- Actin, the ATPase domain of the heat shock protein, and hexokinase all belong to one superfamily.
Fold- Proteins that share a common structure are placed in the same fold. Proteins have the same fold if they have the same major secondary structures in the same arrangement and with the same topological connections. Proteins in the same fold may or may not share an evolutionary history; the similarity in their structures may instead arise from the physics and chemistry of proteins, which favor certain packing arrangements and chain topologies.
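The 30% figure used for families can be illustrated with a small sketch. It assumes the two sequences have already been aligned to equal length, with "-" marking gaps. The function names and the simple threshold rule are illustrative assumptions only; SCOP itself relies on expert inspection rather than a single cutoff.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of aligned, non-gap columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # columns containing a gap are not counted
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def likely_same_family(aligned_a: str, aligned_b: str,
                       threshold: float = 30.0) -> bool:
    """Hypothetical rule of thumb: identity at or above the threshold."""
    return percent_identity(aligned_a, aligned_b) >= threshold
```

A pair scoring below the threshold is not necessarily unrelated: as noted above, proteins with identities near 15% can still be placed in one family when structure and function agree.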
SCOP covers many structural classes: alpha-helical domains; beta-sheet domains; alpha/beta domains, built from parallel beta sheets or beta-alpha-beta units; alpha + beta domains, in which alpha helices and antiparallel beta sheets form independently of each other; multidomain proteins; membrane proteins; peptides; coiled-coil proteins; and low-resolution structures. SCOP classification is essentially a manual process based on visual inspection and comparison of structures. Some automation is used for the most routine tasks, such as clustering protein chains on the basis of sequence similarity.
There are two ways of searching this database. The first is homology searching, in which the user enters a sequence and the database finds the structures with the greatest sequence similarity to it.
The second is keyword searching, in which the user enters a keyword and the database searches both SCOP and the Brookhaven protein structure database.
In short, SCOP classifies proteins by visual inspection, supported by some automated tools.
In addition to structural data about a protein, SCOP provides links to images of that protein, its atomic coordinates, sequence data, homologues, and MEDLINE abstracts.
In conclusion, SCOP is a very important and useful structural database.
CATH:
CATH is a hierarchical domain classification of the protein structures held in the Protein Data Bank (PDB). Only crystal structures solved to a resolution of 4 Å or better are considered. Non-protein molecules, and structures in which more than 30% of the residues are "C-alpha only", are excluded from the database.
This filtering of the PDB follows the SIFT protocol, developed by Michie et al. in 1996.
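The inclusion criteria above can be sketched as a simple filter. This is not the SIFT protocol itself; the record fields, the function name, and the decision to let non-crystal entries skip the resolution check are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StructureEntry:
    """Hypothetical summary of one structure record (not a real PDB schema)."""
    entry_id: str
    is_protein: bool
    is_crystal: bool
    resolution: Optional[float]   # in angstroms; None if not reported
    ca_only_fraction: float       # fraction of residues with only a C-alpha atom


def passes_filter(entry: StructureEntry,
                  max_resolution: float = 4.0,
                  max_ca_only: float = 0.30) -> bool:
    """Apply the criteria from the text: protein only, at most 30% C-alpha-only
    residues, and crystal structures at 4 angstroms or better."""
    if not entry.is_protein:
        return False
    if entry.ca_only_fraction > max_ca_only:
        return False
    if entry.is_crystal:
        # A smaller resolution value means a better-resolved structure.
        return entry.resolution is not None and entry.resolution <= max_resolution
    # Assumption: the text gives no resolution rule for non-crystal entries.
    return True


entries = [
    StructureEntry("e1", True, True, 2.1, 0.00),
    StructureEntry("e2", True, True, 4.5, 0.00),    # resolution too poor
    StructureEntry("e3", False, True, 1.8, 0.00),   # not a protein
    StructureEntry("e4", True, True, 3.0, 0.45),    # too many C-alpha-only residues
]
accepted = [e.entry_id for e in entries if passes_filter(e)]
```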
The strategies CATH follows to classify proteins are:
Closely related proteins are classified by comparing their sequences. More distantly related homologues are identified by sequence profiles and structural comparison. Structures that cannot be classified in this way pass to the next stage, where manual and automatic methods are used to determine their domain boundaries. The structural domains that remain unclassified are then put through the first two methods again, and any that are still unclassified are entered as new structures in CATH.
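The staged strategy described above can be sketched as a small pipeline. The three callables passed in stand for the sequence comparison, the profile and structure comparison, and the domain-boundary step; their names and signatures are hypothetical and do not correspond to actual CATH software.

```python
from typing import Callable, List, Optional, Tuple

Classifier = Callable[[str], Optional[str]]   # returns a family label or None
Splitter = Callable[[str], List[str]]         # returns the domains of a chain


def classify_staged(chain: str,
                    by_sequence: Classifier,
                    by_profile_or_structure: Classifier,
                    split_into_domains: Splitter) -> List[Tuple[str, str, str]]:
    """Return (unit, label, stage) tuples for one chain."""
    # Stages 1 and 2: try the whole chain with each method in turn.
    for stage, method in (("sequence", by_sequence),
                          ("profile/structure", by_profile_or_structure)):
        label = method(chain)
        if label is not None:
            return [(chain, label, stage)]

    # Stage 3: find domain boundaries, then retry both methods per domain.
    results = []
    for domain in split_into_domains(chain):
        label = by_sequence(domain)
        if label is None:
            label = by_profile_or_structure(domain)
        # Domains that are still unclassified are recorded as new structures.
        results.append((domain, label if label is not None else "NEW",
                        "domain re-check"))
    return results
```

In this sketch a chain is only split into domains when both whole-chain methods fail, which mirrors the order given in the text.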
DALI:
The Dali structural protein database is based on an all-against-all 3D comparison of the protein structures in the Protein Data Bank. As proteins evolve, their structures change, but if two residues are in contact in one protein, the residues aligned with them in a related protein are also likely to be in contact. This holds even for very distant homologues, and even if the residues involved change in size. Mutations that change the sizes of packed, buried residues produce adjustments in the packing of the helices and sheets against one another. L. Holm and C. Sander used these observations to solve the problem of structural alignment of proteins. If the inter-residue contact pattern is preserved in distantly related proteins, then it should be possible to identify distantly related proteins by detecting conserved contact patterns.
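The idea of a conserved contact pattern can be made concrete with a minimal sketch. It is not the Dali algorithm, which also has to find the alignment; here the alignment is supplied by the caller. The 8 Å C-alpha cutoff and the function names are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence, Set, Tuple

Coord = Tuple[float, float, float]


def contact_map(ca_coords: Sequence[Coord], cutoff: float = 8.0) -> Set[Tuple[int, int]]:
    """Residue pairs (i, j), i < j, whose C-alpha atoms lie within the cutoff."""
    contacts = set()
    for i in range(len(ca_coords)):
        for j in range(i + 1, len(ca_coords)):
            if math.dist(ca_coords[i], ca_coords[j]) <= cutoff:
                contacts.add((i, j))
    return contacts


def conserved_contact_fraction(coords_a: Sequence[Coord],
                               coords_b: Sequence[Coord],
                               alignment: List[Tuple[int, int]],
                               cutoff: float = 8.0) -> float:
    """Fraction of contacts between aligned residues of A that are also
    contacts between the corresponding residues of B."""
    a_to_b = dict(alignment)              # residue index in A -> index in B
    contacts_a = contact_map(coords_a, cutoff)
    contacts_b = contact_map(coords_b, cutoff)
    mapped = [(a_to_b[i], a_to_b[j]) for i, j in contacts_a
              if i in a_to_b and j in a_to_b]
    if not mapped:
        return 0.0
    kept = sum(1 for i, j in mapped if (min(i, j), max(i, j)) in contacts_b)
    return kept / len(mapped)
```

A fraction close to 1 for a given alignment means the contact pattern is largely preserved, which is the signal the text describes for distantly related proteins.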
Holm and Sander developed an efficient program called DALI that is now in common use for identifying proteins with folding patterns similar to that of a query structure. The Dali program carries out automatic comparisons of protein structures determined by X-ray crystallography or NMR. The most familiar version is the Dali server, which compares a query structure supplied by the user against the database of known structures and returns the list of structural neighbors by e-mail. The Dali program runs fast enough to carry out routine screens of the entire database of known structures.
The Dali Domain Classification is an automatically created classification of protein structures into domain families. First, each protein chain is decomposed into domains using the DomainParser2 program. Then the similarity between each pair of domains is calculated using the Dali structural comparison algorithm. The taxonomy is derived fully automatically from measurements of structural, functional and sequence similarities.
All Dali resources use an identical algorithm for structure comparison. Users may run Dali over the Web, or download the program to run locally on Linux computers. The more recently introduced DaliLite server compares two structures against each other and visualizes the result interactively.
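The all-against-all step of the Dali Domain Classification can be sketched as pairwise scoring followed by grouping. The sketch below uses simple single-linkage grouping with one threshold, which is an assumption for illustration and not the procedure Dali uses to derive its taxonomy. The similarity function is supplied by the caller and stands in for a structural comparison score.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence


def cluster_domains(domains: Sequence[str],
                    similarity: Callable[[str, str], float],
                    threshold: float) -> List[List[str]]:
    """Group domains so that any pair scoring at or above the threshold
    ends up in the same group (single linkage, via union-find)."""
    parent: Dict[str, str] = {d: d for d in domains}

    def find(d: str) -> str:
        # Follow parent links to the group representative, shortening the path.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    # All-against-all: every pair of domains is compared once.
    for a, b in combinations(domains, 2):
        if similarity(a, b) >= threshold:
            parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for d in domains:
        groups.setdefault(find(d), []).append(d)
    return sorted(sorted(g) for g in groups.values())
```

Because every pair is compared, the number of comparisons grows quadratically with the number of domains, which is why the speed of the comparison program matters for a fully automated classification.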