... the linearity of the resulting plot. The power law distribution is represented by the function:

$$S(x) = a x^{-b}$$

The logarithmic transformation of the function is:

$$\log(S(x)) = \log(a) - b \log(x)$$

Therefore, the goodness of fit of the power law function to the indel/loop length distribution was evaluated based on the squared Pearson correlation coefficient ($r^2$) of the linear plot.

Location analysis of protein domain and indel

To further study the functional aspects of indels present in Indel PDB, we investigated the presence of protein domains in the proximity of indel sites. First, 9,318 protein domain profiles characterized by Hidden Markov Models (HMMs) were obtained from the Pfam database (version 22.0, [24]). Second, the HMMER program (ver 2.3.2, [25]) was used to scan each of the 22,103 PDB protein sequences against each of the 9,318 Pfam domain profiles. The scans were run on a cluster of 50 CPUs, and the outputs contained the exact starting and ending amino acid residues of each protein domain located in each protein sequence. Third, the locations of the protein domains were overlaid with the locations of 117,266 indel sites in 11,294 indel-containing proteins. From an indel perspective, we calculated the distance between any given indel site and all domains on a given protein. The distance was measured as the number of amino acid residues between the boundary of the indel site and a domain site. If the residues of the indel and the domain overlapped, the distance was assigned a value of "0".

Utility and Discussion

Overview of Indel PDB

Indel PDB contains sequence and structural data associated with 488,039 (or 117,266 non-redundant) indel sites, extracted from 11,294 indel-containing proteins in PDB.
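Two of the calculations described in the methods above, the log-log power law fit and the indel-to-domain distance, can be illustrated with a minimal Python sketch. It is not the code used to build Indel PDB. The function names, the inclusive (start, end) residue numbering, and the use of base-10 logarithms are assumptions made for illustration. The text does not state whether adjacent, non-overlapping sites count as 0 or 1; this sketch uses the difference between the nearest boundary positions, so adjacent sites get a distance of 1 and only true overlaps get 0.

```python
import math


def fit_power_law(lengths, counts):
    """Fit S(x) = a * x**(-b) by least squares on the log-log plot.

    Returns (a, b, r_squared), where r_squared is the squared Pearson
    correlation between log10(x) and log10(S(x)).
    Assumes all lengths and counts are positive and at least two
    distinct lengths and two distinct counts are supplied.
    """
    xs = [math.log10(x) for x in lengths]
    ys = [math.log10(s) for s in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # equals -b in log(S) = log(a) - b*log(x)
    intercept = mean_y - slope * mean_x    # equals log10(a)
    r_squared = (sxy * sxy) / (sxx * syy)
    return 10 ** intercept, -slope, r_squared


def indel_domain_distance(indel, domain):
    """Distance in residues between an indel and a domain.

    Both arguments are inclusive (start, end) residue positions.
    Overlapping intervals return 0. Otherwise the result is the
    difference between the two nearest boundary positions (an
    assumption; see the note above the code).
    """
    indel_start, indel_end = indel
    domain_start, domain_end = domain
    if indel_start <= domain_end and domain_start <= indel_end:
        return 0                           # the two sites share at least one residue
    if indel_end < domain_start:
        return domain_start - indel_end    # indel lies before the domain
    return indel_start - domain_end        # indel lies after the domain


def distances_to_all_domains(indel, domains):
    """Distances from one indel to every domain on the same protein."""
    return [indel_domain_distance(indel, d) for d in domains]
```

A value of `r_squared` close to 1 indicates that the log-log plot is nearly linear, which is the criterion the text uses to judge the power law fit.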
Indel PDB and the indel analysis results are freely accessible to the public on the World Wide Web [19].

An easy way for users to interact with indel data is through a comprehensive indel search engine. Users can search indels using one or more of the following criteria: PDB ID, indel length, secondary structure composition, solvent accessibility score, and proximity to protein domains. In addition, users can specify the sources (species) of query and subject proteins. For example, the various search criteria can be used to identify indels of interest between pathogens and humans as possible drug target binding sites. Furthermore, users can set a specific range on indel length, secondary structure, or solvent accessibility to find indel sites that are, for instance, long, mainly alpha-helical, and surface exposed. Moreover, users can search for indels that overlap with certain protein domains by turning on the domain search option, setting the proximity domain distance to '0', and giving a specific domain name or ID (e.g. Peroxidase or PF00141). Such results are useful for studying the functional roles of indels among similar proteins. Alternatively, a query protein sequence can be submitted and searched against all the indel sequences in Indel PDB by BLASTp. Successful indel hits are displayed to users. As shown in Figure 3, the following information is displayed for each indel site: Query PDB ID (protein that contains the insertion site), Query name, Query source, Subject PDB ID (protein that contains the corresponding deletion site), Subject name, Subject source, BLAST alignment scores, the complete sequence alignment, indel location (start and end po.