Chromosome 9 Open Reading Frame 43
Introduction
Chromosome 9 open reading frame 43 (C9orf43) is a protein-coding gene located on 9q32 with 14 exons[1]. C9orf43 contains DUF4647, a domain of unknown function spanning amino acids 1-454. RNA sequencing of 95 human tissue samples, covering 27 different tissue types, allowed tissue specificity to be observed[2]. The testis samples showed the highest expression, at 20.215 ± 5.996 reads per kilobase of transcript per million mapped reads (RPKM)[3]. cDNA clones also showed expression in the brain, leiomyosarcoma, medulla, normal testis, uterus, and a cervical carcinoma cell line[4].
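RPKM normalizes a read count by transcript length and by sequencing depth. The following is a minimal sketch of that calculation; the function name and the example numbers are hypothetical and are not taken from the cited dataset.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("transcript length and total mapped reads must be positive")
    # Equivalent to: reads / (length in kb) / (total mapped reads in millions)
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical sample: 1,000 reads on a 2,000 bp transcript, 10 million mapped reads.
example_rpkm = rpkm(1000, 2000, 10_000_000)
print(example_rpkm)
```

Because the value is divided by transcript length, RPKM figures can be compared between genes of different sizes within the same sample.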
Gene
The function of C9orf43 is largely unknown, although the protein is expected to be found in the nucleus. No phenotype has been reported, nor is the in vivo function known[4]. Mitochondrial and cytoplasmic localization of C9orf43 was predicted with BioPlex 2.0, which had not been possible with BioPlex 1.0, and these findings are consistent with the Human Protein Atlas. BioPlex 2.0 also allowed uncharacterized proteins to be associated with better-studied proteins. C9orf43 was found to be associated with amyloid beta (A4) precursor protein (APP) in a high-throughput interaction detected between the two purified proteins in vitro. Another association, between RAB GTPase activating protein 1-like (RABGAP1L) and C9orf43, was detected by affinity capture-luminescence, in which luciferase-tagged C9orf43 was used as bait and detected enzymatically in immunoprecipitates of the prey protein (RABGAP1L). C9orf43 was again used as bait in affinity capture-MS, in which the bait is captured from cell extracts by a polyclonal antibody or an epitope tag; this detected an association with ACTA2 (actin, alpha 2, smooth muscle, aorta) with a score of 0.9067. A high-throughput association (0.8992) determined by affinity capture-MS was also seen with ACTBL2 (actin, beta-like 2). Affinity capture-MS also detected an association with golgi-associated, gamma adaptin ear containing, ARF binding protein 1 (GGA1).
A PubMed search for “C9orf43 protein function” showed that the gene is located on chromosome 9q32, within a region (9q22.3-34.1) that has a 45% probability of containing a susceptibility gene for cleft lip/palate. Fifty SNPs were mapped to test for association with cleft lip/palate in families from the United States, Spain, Turkey, Guatemala, and China. The P value for C9orf43 was 0.16 in the pooled population, so the association was not significant. A Google Scholar search for “C9orf43” indicated that CAG (polyQ) repeats were identified in C9orf43, with a G6 A1G1 repeat sequence found in the reference genome. Searches for “C9orf43” in Biocarta, OMIM, and AIDAN did not return any results. The other articles found that mention C9orf43 focus on the whole of chromosome 9 and do not study or analyze C9orf43 specifically.
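The CAG (polyQ) repeat can be located in a nucleotide sequence with a simple pattern search. The sketch below is illustrative only: the function name, the choice to accept both glutamine codons (CAG and CAA), and the minimum run length of three are assumptions. The test fragment is copied from the C9orf43 mRNA in this article.

```python
import re


def find_glutamine_codon_runs(seq: str, min_repeats: int = 3) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for each run of CAG/CAA codons.

    The search is not reading-frame aware; it only looks for consecutive
    'cag' or 'caa' triplets in the given string.
    """
    pattern = re.compile(r"(?:ca[ag]){%d,}" % min_repeats)
    return [(m.start(), m.group()) for m in pattern.finditer(seq.lower())]


# Fragment of the C9orf43 mRNA around the glutamine-rich region.
fragment = "aaa" + "cag" * 3 + "cgg" + "cag" * 6 + "caa" + "cag" + "aag"
for start, run in find_glutamine_codon_runs(fragment):
    print(start, len(run) // 3, run)
```

In this fragment the second run consists of six CAG codons, one CAA codon and one CAG codon, which may be what the "G6 A1G1" notation refers to; that reading is an assumption.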
Transcript
The mRNA sequence is shown below, with bases arranged in groups of 10.
gtccgtttcc catggtgccc tgcgccgctg ctgggtgacg gcgcggccgg gcgcagcgcg tgggagacga aggcgtggct gactcggcgg ggcggatccc tttaggaccc gaggcaggct ctgggcccgc cgggtccgtt aatctcaccg cgccgcaagg ggccacgttt taccacttga tttagcaacc ctaagcggtt tggaatctgc tttgctctca caggacctca gcccgtcgtg atcagattct cccactttct tttttctttc ctggaatgga gtgggcagtc tttttccatc tctaccgaag ttgatgttca tttttaatct tttcgcccct cacgcttttg taataatgag cactagaatg ctgcagggtt ggcctttggc ctaaaccatt tctagctatg gacttgccag atgagagcca gtgggacgaa accacctgtg gcttggctgt ttgtcagcac ccacaatgct gggcaactat ccgccgcatt gagaggggcc atcctcgaat cctcggctca tcctgcaaaa ctcccctgga tgctgaagat aaactcccag tgctcaccgt ggtagacatc ttagattccg gctttgcagc tcatcattta ccagaatgta cctttactaa ggcccattct ttattgtctc agagttcaaa gttttactcc aaatttcatg gcaggcctcc gaagggttta cctgacaaaa gtttgatcaa ctgtactaac agacttccca aatttccagt gttgaatttg aatgagacgc aacttccctg ccctgaagat gttagaaata tggttgtatt gtggatccca gaagaaacag agatacatgt gagccagcat gggaagaaga aaagaaagaa ctcggcagtg aaaagcaagt catttctggg tctctctgga aatcagtccg caggaacacg agtaggaaca ccagggatga tcgtgcctcc cccaacccca gtgcaattgt ctgaacaatt cagttcagat ttcctacctc tctgggctca atccgaagcg ttacctcagg atctactgaa ggaacttttg ccaggtggaa agcaaaccat gctctgtcca gagatgaaga taaaattggc catgatgaaa aagaatcttc ccttggaaaa gaaccgacct gacagtgtga tttcttctaa gatgtttcta tctatacacc gcctcaccct ggaaagacca gcactgcgat atcctgaacg tttgaagaaa ttacataacc tgaagacaga aggttacagg aaacagcagc agcggcagca gcagcagcag cagcaacaga agaaggtgaa aacacctatt aagaaacagg aggctaaaaa gaaagccaag agtgatccag ggatccagag cacttcacat aaacatccag ttaccaccgt tcatgaccgt ctctatggtt acagaactct gccaggtcag aacagtgaca tgaagcagca gcagcagatg gaaaaaggaa ccacttcgaa acaggattcc acggagagac caaagatgaa ctactatgac catgcggatt tccaccacag tgtaaaaagt cctgaattgt atgaaacaga acccactaac aaggacatta gtgctccagt ggacgctgtg ccagaagccc aggctgccag gcaaaagaag atctccttta acttttcaga aattatggct agcacaggct ggaactctga gctcaaacta cttaggattc ttcaggacac tgatgatgag gatgaggagg accagtcctc tggggcagag tgagaagcct ctggaggaat 
agactgaagg catcccctgg ggcagccgtg ttccaaagcg ggatggctgg tatcctgagg gcagcaacgt ttcacataag ggcaagagga gaggggcttc tgctctctgg agcctttacc agggcctgag ctctgagctt agggattcca ttttctttgt tcacctctac ttgcctctaa aataaatgta ggagaaaaat ccccagcctt tttaaattta gattatttcc tttccattag ggtcagaata attttggtga ttaaacacaa ctgcttttca a
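The grouping used for the sequence above can be reproduced with a few lines of code. This is a minimal sketch; the function name is hypothetical.

```python
def group_sequence(seq: str, size: int = 10) -> str:
    """Strip whitespace from a sequence and re-split it into space-separated groups."""
    cleaned = "".join(seq.split()).lower()
    return " ".join(cleaned[i:i + size] for i in range(0, len(cleaned), size))


# First 25 bases of the C9orf43 mRNA shown above.
print(group_sequence("gtccgtttcccatggtgccctgcgc"))
```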
Protein
Conceptually Annotated Translation of C9orf43
1 gtccgtttcccatggtgccctgcgccgctgctgggtgacggcgcggccgggcgcagcgcg 60
Upstream ORF
61 tgggagacgaaggcgtggctgactcggcggggcggatccctttaggacccgaggcaggct 120
121 ctgggcccgccgggtccgttaatctcaccgcgccgcaaggggccacgttttaccacttga 180
181 tttagcaaccctaagcggtttggaatctgctttgctctcacaggacctcagcccgtcgtg 240
241 atcagattctcccactttcttttttctttcctggaatggagtgggcagtctttttccatc 300
Upstream ORF
301 tctaccgaagttgatgttcatttttaatcttttcgcccctcacgcttttgtaataatgag 360
361 cactagaatgctgcagggttggcctttggcctaaaccatttctagctatggacttgccag 420
Kozak sequence RBS
M D L P D
421 atgagagccagtgggacgaaaccacctgtggcttggctgtttgtcagcacccacaatgct 480
E S Q W D E T T C G L A V C Q H P Q C W
481 gggcaactatccgccgcattgagaggggccatcctcgaatcctcggctcatcctgcaaaa 540
A T I R R I E R G H P R I L G S S C K T
541 ctcccctggatgctgaagataaactcccagtgctcaccgtggtagacatcttagattccg 600
P L D A E D K L P V L T V V D I L D S G
601 gctttgcagctcatcatttaccagaatgtacctttactaaggcccattctttattgtctc 660
F A A H H L P E C T F T K A H S L L S Q
661 agagttcaaagttttactccaaatttcatggcaggcctccgaagggtttacctgacaaaa 720
S S K F Y S K F H G R P P K G L P D K S
721 gtttgatcaactgtactaacagacttcccaaatttccagtgttgaatttgaatgagacgc 780
L I N C T N R L P K F P V L N L N E T Q
781 aacttccctgccctgaagatgttagaaatatggttgtattgtggatcccagaagaaacag 840
L P C P E D V R N M V V L W I P E E T E
841 agatacatgtgagccagcatgggaagaagaaaagaaagaactcggcagtgaaaagcaagt 900
I H V S Q H G K K K R K N S A V K S K S
901 catttctgggtctctctggaaatcagtccgcaggaacacgagtaggaacaccagggatga 960
F L G L S G N Q S A G T R V G T P G M I
961 tcgtgcctcccccaaccccagtgcaattgtctgaacaattcagttcagatttcctacctc 1020
V P P P T P V Q L S E Q F S S D F L P L
1021 tctgggctcaatccgaagcgttacctcaggatctactgaaggaacttttgccaggtggaa 1080
W A Q S E A L P Q D L L K E L L P G G K
1081 agcaaaccatgctctgtccagagatgaagataaaattggccatgatgaaaaagaatcttc 1140
Q T M L C P E M K I K L A M M K K N L P
1141 ccttggaaaagaaccgacctgacagtgtgatttcttctaagatgtttctatctatacacc 1200
L E K N R P D S V I S S K M F L S I H R
1201 gcctcaccctggaaagaccagcactgcgatatcctgaacgtttgaagaaattacataacc 1260
L T L E R P A L R Y P E R L K K L H N L
1261 tgaagacagaaggttacaggaaacagcagcagcggcagcagcagcagcagcagcaacaga 1320
K T E G Y R K Q Q Q R Q Q Q Q Q Q Q Q K
1321 agaaggtgaaaacacctattaagaaacaggaggctaaaaagaaagccaagagtgatccag 1380
K V K T P I K K Q E A K K K A K S D P G
1381 ggatccagagcacttcacataaacatccagttaccaccgttcatgaccgtctctatggtt 1440
I Q S T S H K H P V T T V H D R L Y G Y
1441 acagaactctgccaggtcagaacagtgacatgaagcagcagcagcagatggaaaaaggaa 1500
R T L P G Q N S D M K Q Q Q Q M E K G T
1501 ccacttcgaaacaggattccacggagagaccaaagatgaactactatgaccatgcggatt 1560
T S K Q D S T E R P K M N Y Y D H A D F
1561 tccaccacagtgtaaaaagtcctgaattgtatgaaacagaacccactaacaaggacatta 1620
H H S V K S P E L Y E T E P T N K D I S
1621 gtgctccagtggacgctgtgccagaagcccaggctgccaggcaaaagaagatctccttta 1680
A P V D A V P E A Q A A R Q K K I S F N
1681 acttttcagaaattatggctagcacaggctggaactctgagctcaaactacttaggattc 1740
F S E I M A S T G W N S E L K L L R I L
1741 ttcaggacactgatgatgaggatgaggaggaccagtcctctggggcagagtgagaagcct 1800
Q D T D D E D E E D Q S S G A E *
1801 ctggaggaatagactgaaggcatcccctggggcagccgtgttccaaagcgggatggctgg 1860
1861 tatcctgagggcagcaacgtttcacataagggcaagaggagaggggcttctgctctctgg 1920
1921 agcctttaccagggcctgagctctgagcttagggattccattttctttgttcacctctac 1980
1981 ttgcctctaaaataaatgtaggagaaaaatccccagcctttttaaatttagattatttcc 2040
Polyadenylation signal
2041 tttccattagggtcagaataattttggtgattaaacacaactgcttttcaa 2091
Key:
- Q Q Q R Q Q Q Q Q Q Q Q = Glutamine-rich region
- S or T = Predicted site of N-acetylglucosamine addition (YinOYang @ ExPASy)
- S or T = Predicted O-linked glycosylation of serine or threonine (NetOGlyc @ ExPASy)
- W = C-mannosylation site (NetCGlyc @ ExPASy)
- XKX = SUMOplot prediction of type I sumoylation
- XXX = Asn-Xaa-Ser/Thr sequence (NetNGlyc @ ExPASy)
- Asn = Asparagines predicted to be N-glycosylated (NetNGlyc @ ExPASy)
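A conceptual translation reads the mRNA three bases at a time from the start codon and stops at the first stop codon. The sketch below assumes the standard genetic code and a caller-supplied start position; the function and variable names are hypothetical. The example fragment is the first 30 bases of the annotated open reading frame shown above.

```python
from itertools import product

# Standard genetic code, with codons ordered by base T, C, A, G at each position.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(seq: str, start: int = 0) -> str:
    """Translate a DNA-alphabet sequence from a 0-based start until a stop codon."""
    cleaned = "".join(seq.split()).lower().replace("u", "t")
    protein = []
    for i in range(start, len(cleaned) - 2, 3):
        residue = CODON_TABLE.get(cleaned[i:i + 3], "X")  # X marks an unknown codon
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


# First 30 bases of the annotated C9orf43 open reading frame.
print(translate("atggacttgccagatgagagccagtgggac"))
```

The result for this fragment agrees with the first ten residues of the annotated translation (M D L P D E S Q W D).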
Homolog
A global pairwise alignment of the mouse and human C9orf43 proteins was performed with EMBOSS Needle on 2/3/2017. The Mus musculus protein is 495 amino acids long, while the Homo sapiens protein is 461 amino acids long. The alignment indicates 55.6% similarity between the two species. This sequence similarity and conservation allow homology to be inferred, although the common ancestor is not known. Homology often suggests a common function. Gaps make up 13.6% of the alignment and are indicated by positions with no dots or dashes. These gaps are present in both the Homo sapiens and Mus musculus amino acid chains. The Mus musculus protein sequence is:
mrlrssiwsg svlkaqesfd smnvadesqw deavctlsgc qhpqcwaslr rierghpril dpspkspret edklptltiv nitdtclwtq krvaqqqpse ftfpkdrpsl skpaskrqsr spkalrdkdv tsrsprplkl svlnlneakl plsenvsnmv vtwvpeetek dvspvqktdv sswpgkkrrk klrkkskpsl yypgrqysrs paaivpppsp ehhleqlspe aiplwaqvgm lpqdlleeci laheksiigp evkielskmr kslplerrrp esaisskmyl tiqrltlqrp slryparlrk lcpnlkqgeg laghgssdsl mqqgkaktfp pkqepkkkak rnvkgqygee ttsghffhds vglrisgqed qqtpweeedi ektsaethvs leevyefdky yteyyatpes avlyetvyqn ldddeetmvg ikasskdrnl knlsammdgi gwnpelkllr ilqateeede eghnsraqsk tslea
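A global pairwise alignment of this kind is based on the Needleman-Wunsch algorithm. The following is a simplified sketch, not a reproduction of EMBOSS Needle: it assumes a flat match/mismatch score and a linear gap penalty rather than a substitution matrix with separate gap-opening and gap-extension penalties, and all names are hypothetical. It reports percent identity and percent gaps only, because a similarity figure requires a substitution matrix.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Globally align a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            step = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + step:
                out_a.append(a[i - 1]); out_b.append(b[j - 1])
                i -= 1; j -= 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-")
            i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


def alignment_stats(aligned_a: str, aligned_b: str) -> dict:
    """Percent identity and percent gaps over all alignment columns."""
    columns = list(zip(aligned_a, aligned_b))
    identical = sum(1 for p, q in columns if p == q and p != "-")
    gaps = sum(1 for p, q in columns if p == "-" or q == "-")
    return {"identity": 100.0 * identical / len(columns),
            "gaps": 100.0 * gaps / len(columns)}


# Toy example with short hypothetical sequences.
top, bottom, total = needleman_wunsch("ACGT", "ACT")
print(top, bottom, total, alignment_stats(top, bottom))
```

The table-filling step takes time proportional to the product of the two sequence lengths, which is small for proteins of 461 and 495 residues.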
Chromosome 9 Open Reading Frame 43 (C9orf43)
This article "Chromosome 9 Open Reading Frame 43" is from Wikipedia. The list of its authors can be seen in its historical and/or the page Edithistory:Chromosome 9 Open Reading Frame 43. Articles copied from Draft Namespace on Wikipedia could be seen on the Draft Namespace of Wikipedia and not main one.
1. "Transcript: C9orf43-201 (ENST00000288462.4) - Exons - Homo sapiens - Ensembl genome browser 91". uswest.ensembl.org. Retrieved 2018-02-18.
2. "C9orf43 - Uncharacterized protein C9orf43 - Homo sapiens (Human) - C9orf43 gene & protein". www.uniprot.org. Retrieved 2018-02-18.
3. "C9orf43 chromosome 9 open reading frame 43 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2018-02-18.
4. Thierry-Mieg, Danielle; Thierry-Mieg, Jean (NCBI/NLM/NIH). "AceView: Gene:C9orf43, a comprehensive annotation of human, mouse and worm genes with mRNAs or ESTs". www.ncbi.nlm.nih.gov. Retrieved 2018-02-18.