Alternative protein

From EverybodyWiki Bios & Wiki


Alternative proteins are proteins that are not yet annotated in current protein sequence databases, including UniProt, RefSeq, and Ensembl. They are encoded by unannotated, or alternative, open reading frames (ORFs). Alternative proteins are not isoforms of annotated proteins produced by alternative splicing of messenger RNA (mRNA) precursors; they are novel proteins. They are small, with a median length of 50 amino acids, and were discovered recently[1]. Their discovery challenges the concepts of monocistronic genes, non-coding genes, and non-coding RNAs (ncRNAs) in eukaryotes: the coding sequences for alternative proteins are present both in mRNAs that already code for conventional proteins and in RNAs annotated as ncRNAs.

Annotated (reference or canonical) proteins versus unannotated (alternative) proteins

A fraction of any genome consists of protein-coding genes[2]. In eukaryotes, each protein-coding gene is typically assumed to contain a single functional open reading frame (ORF), the longest one; this annotated ORF is also termed the coding region or coding sequence (CDS) and encodes one of the thousands of proteins produced by a cell. After transcription, the mRNA associates with ribosomes, which translate the CDS into a protein. Such proteins are annotated in protein sequence databases, including UniProt and RefSeq.
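
The "longest ORF" convention described above can be illustrated with a minimal Python sketch. It assumes a simplified model that is not taken from the cited sources: ORFs start at an ATG codon, end at the first in-frame stop codon, and are searched only on the forward strand of a single transcript. The function name `find_orfs` and the toy sequence are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(mrna, min_codons=2):
    """Return (start, end, frame) for each ATG-initiated ORF on the forward strand.

    start is the 0-based index of the ATG; end is the index just past the
    stop codon. Simplified assumption: the first ATG after a stop opens the ORF.
    """
    seq = mrna.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs

# Toy transcript (hypothetical): a short upstream ORF followed by a longer ORF.
toy_mrna = "GG" + "ATG" + "CCC" + "TGA" + "AC" + "ATG" + "GCT" * 4 + "TAA" + "GG"

orfs = find_orfs(toy_mrna)
# Annotation convention: the longest ORF is taken as the CDS.
cds = max(orfs, key=lambda o: o[1] - o[0])
# Every other ORF is unannotated, i.e. an alternative ORF candidate.
alternatives = [o for o in orfs if o != cds]

print("CDS:", cds)                    # CDS: (13, 31, 1)
print("Alternative:", alternatives)   # Alternative: [(2, 11, 2)]
```

In this toy case the shorter upstream ORF would be left out of the annotation even though it is a complete ORF, which is the situation in which alternative ORFs arise.
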
Yet protein-coding genes may contain several unannotated, or alternative, ORFs in addition to the annotated ORF; with the sequencing of numerous genomes, it has become possible to detect these alternative ORFs. Not all alternative ORFs are functional: an unknown fraction are random ORFs present in the genome by chance. Proteogenomics is a new field that aims to detect functional alternative ORFs and the unannotated (or alternative) proteins they code for. It thus helps improve genome annotations and capture the true coding potential of genomes. The proteogenomic resource OpenProt functionally annotates alternative ORFs and proteins in several species.
In addition to protein-coding genes, a large fraction of genes are annotated as non-coding. These include pseudogenes and genes producing ncRNAs whose ORFs are shorter than 100 codons. Non-coding genes may also contain alternative ORFs and thus code for alternative proteins.
Within an mRNA, an alternative protein may be encoded by an alternative ORF located in the 5'UTR, in the 3'UTR, or overlapping the CDS in a different reading frame.
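
The three locations listed above can be told apart from transcript coordinates alone. The following Python sketch is illustrative only: the function name `classify_alt_orf`, the half-open 0-based coordinate convention, and the example coordinates are assumptions, not part of any annotation pipeline named in this article.

```python
def classify_alt_orf(orf_start, orf_end, cds_start, cds_end):
    """Classify an ORF relative to the annotated CDS of the same mRNA.

    All coordinates are 0-based transcript positions; end is exclusive
    (assumed convention for this sketch).
    """
    if orf_end <= cds_start:
        return "5'UTR"   # ORF ends before the CDS begins
    if orf_start >= cds_end:
        return "3'UTR"   # ORF begins after the CDS ends
    if (orf_start - cds_start) % 3 != 0:
        return "overlaps CDS, different frame"
    # Same frame as the CDS: not an alternative ORF in the sense used here.
    return "in frame with CDS"

# Hypothetical annotated CDS spanning positions 13..31 of a transcript.
cds_start, cds_end = 13, 31

print(classify_alt_orf(2, 11, cds_start, cds_end))    # 5'UTR
print(classify_alt_orf(40, 52, cds_start, cds_end))   # 3'UTR
print(classify_alt_orf(17, 29, cds_start, cds_end))   # overlaps CDS, different frame
```

An ORF that begins in a UTR and runs into the CDS in another frame is reported here as overlapping; a real annotation resource may subdivide these cases further.
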
When evidence for the expression of an alternative protein accumulates in the scientific literature, the protein is annotated in conventional protein sequence databases. If the corresponding gene was previously annotated as coding, it is re-annotated as a bicistronic gene; if it was previously annotated as non-coding, it is re-annotated as a coding gene.

Alternative proteins are small

Since only the longest ORF was annotated as the CDS in each protein-coding gene, alternative ORFs and their corresponding proteins are necessarily shorter than the annotated ones. However, some alternative proteins are longer than 100 amino acids.



References

  1. Samandi, Sondos. "Deep transcriptome annotation enables the discovery and functional characterization of cryptic small proteins". eLife. doi:10.7554/eLife.27860. PMID 29083303.
  2. Zerbino, Daniel (18 May 2020). "Progress, Challenges, and Surprises in Annotating the Human Genome". Annu Rev Genomics Hum Genet. doi:10.1146/annurev-genom-121119-083418. PMID 32421357.


This article "Alternative protein" is from Wikipedia. The list of its authors can be seen in its history and/or on the page Edithistory:Alternative protein. Articles copied from the Draft namespace on Wikipedia can be seen in Wikipedia's Draft namespace rather than the main one.