Chromosome 9 Open Reading Frame 43

Introduction

Chromosome 9 open reading frame 43 (C9orf43) is a protein-coding gene located on 9q32 with 14 exons^[1]. C9orf43 contains DUF4647, a domain of unknown function spanning amino acids 1-454. RNA sequencing of 95 human tissue samples covering 27 different tissue types allowed tissue specificity to be observed^[2]. The testis samples showed the highest expression, at 20.215 ± 5.996 reads per kilobase of transcript per million mapped reads (RPKM)^[3]. cDNA clones also showed expression in the brain, leiomyosarcoma, medulla, normal testis, uterus, and a cervical carcinoma cell line^[4].
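For readers unfamiliar with the expression unit quoted above, the standard RPKM normalization can be written as a short function. This is a minimal sketch of the general definition, not the pipeline behind the cited dataset; the function name and the example counts are hypothetical and chosen only for illustration.

```python
def rpkm(read_count: int, transcript_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases -> kilobases) * 1e6 (reads -> millions of reads).
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical numbers for illustration only (not taken from the cited dataset).
example = rpkm(read_count=100, transcript_length_bp=2000, total_mapped_reads=10_000_000)
```

Dividing by transcript length and by sequencing depth is what makes values comparable between genes and between samples.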

Gene

The function of C9orf43 is largely unknown, although the protein is expected to localize to the nucleus. No phenotype has been reported, nor is the in vivo function known^[4]. Mitochondrial and cytoplasmic localization of C9orf43 was predicted with BioPlex 2.0, a prediction that BioPlex 1.0 could not make, and these findings are consistent with the Human Protein Atlas. BioPlex 2.0 also allowed uncharacterized proteins to be associated with better-studied proteins. C9orf43 was found to associate with amyloid beta (A4) precursor protein (APP) in a high-throughput interaction detected between the two purified proteins in vitro. Another association, between RAB GTPase activating protein 1-like (RABGAP1L) and C9orf43, was detected by affinity capture-luminescence, in which C9orf43 served as the luciferase-tagged bait and was detected enzymatically in association with the prey protein (RABGAP1L). C9orf43 was again used as bait in affinity capture-MS, in which the bait is captured from cell extracts by polyclonal antibody or epitope tag; this identified ACTA2 (actin, alpha 2, smooth muscle, aorta) with an association of 0.9067. A high-throughput association (0.8992) with ACTBL2 (actin, beta-like 2) was also determined by affinity capture-MS, as was an association with golgi-associated, gamma adaptin ear containing, ARF binding protein 1 (GGA1).

A search of “C9orf43 protein function” on PubMed revealed that the gene is located on chromosome 9q32, which lies within the region (9q22.3-34.1) reported to have a 45% probability of containing a susceptibility gene for cleft lip/palate. Fifty SNPs were mapped to test for association with cleft lip/palate in families from the United States, Spain, Turkey, Guatemala, and China. The P value for C9orf43 was 0.16 in the pooled population, and the association was not significant. A search of “C9orf43” in Google Scholar indicated that CAG (polyQ) repeats were identified in C9orf43, with a G6 A1G1 repeat sequence found in the reference genome. Searches for “C9orf43” in Biocarta, OMIM, and AIDAN did not return any results. The other articles found that mention C9orf43 focused on chromosome 9 as a whole and did not specifically study or analyze C9orf43.

Transcript

The mRNA sequence is shown below with bases arranged in groups of 10.

   gtccgtttcc catggtgccc tgcgccgctg ctgggtgacg gcgcggccgg gcgcagcgcg
   tgggagacga aggcgtggct gactcggcgg ggcggatccc tttaggaccc gaggcaggct
   ctgggcccgc cgggtccgtt aatctcaccg cgccgcaagg ggccacgttt taccacttga
   tttagcaacc ctaagcggtt tggaatctgc tttgctctca caggacctca gcccgtcgtg
   atcagattct cccactttct tttttctttc ctggaatgga gtgggcagtc tttttccatc
   tctaccgaag ttgatgttca tttttaatct tttcgcccct cacgcttttg taataatgag
   cactagaatg ctgcagggtt ggcctttggc ctaaaccatt tctagctatg gacttgccag
   atgagagcca gtgggacgaa accacctgtg gcttggctgt ttgtcagcac ccacaatgct
   gggcaactat ccgccgcatt gagaggggcc atcctcgaat cctcggctca tcctgcaaaa
   ctcccctgga tgctgaagat aaactcccag tgctcaccgt ggtagacatc ttagattccg
   gctttgcagc tcatcattta ccagaatgta cctttactaa ggcccattct ttattgtctc
   agagttcaaa gttttactcc aaatttcatg gcaggcctcc gaagggttta cctgacaaaa
   gtttgatcaa ctgtactaac agacttccca aatttccagt gttgaatttg aatgagacgc
   aacttccctg ccctgaagat gttagaaata tggttgtatt gtggatccca gaagaaacag
   agatacatgt gagccagcat gggaagaaga aaagaaagaa ctcggcagtg aaaagcaagt
   catttctggg tctctctgga aatcagtccg caggaacacg agtaggaaca ccagggatga
   tcgtgcctcc cccaacccca gtgcaattgt ctgaacaatt cagttcagat ttcctacctc
   tctgggctca atccgaagcg ttacctcagg atctactgaa ggaacttttg ccaggtggaa
   agcaaaccat gctctgtcca gagatgaaga taaaattggc catgatgaaa aagaatcttc
   ccttggaaaa gaaccgacct gacagtgtga tttcttctaa gatgtttcta tctatacacc
   gcctcaccct ggaaagacca gcactgcgat atcctgaacg tttgaagaaa ttacataacc
   tgaagacaga aggttacagg aaacagcagc agcggcagca gcagcagcag cagcaacaga
   agaaggtgaa aacacctatt aagaaacagg aggctaaaaa gaaagccaag agtgatccag
   ggatccagag cacttcacat aaacatccag ttaccaccgt tcatgaccgt ctctatggtt
   acagaactct gccaggtcag aacagtgaca tgaagcagca gcagcagatg gaaaaaggaa
   ccacttcgaa acaggattcc acggagagac caaagatgaa ctactatgac catgcggatt
   tccaccacag tgtaaaaagt cctgaattgt atgaaacaga acccactaac aaggacatta
   gtgctccagt ggacgctgtg ccagaagccc aggctgccag gcaaaagaag atctccttta
   acttttcaga aattatggct agcacaggct ggaactctga gctcaaacta cttaggattc
   ttcaggacac tgatgatgag gatgaggagg accagtcctc tggggcagag tgagaagcct
   ctggaggaat agactgaagg catcccctgg ggcagccgtg ttccaaagcg ggatggctgg
   tatcctgagg gcagcaacgt ttcacataag ggcaagagga gaggggcttc tgctctctgg
   agcctttacc agggcctgag ctctgagctt agggattcca ttttctttgt tcacctctac
   ttgcctctaa aataaatgta ggagaaaaat ccccagcctt tttaaattta gattatttcc
   tttccattag ggtcagaata attttggtga ttaaacacaa ctgcttttca a
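The layout used above (groups of 10 bases, six groups per line) can be reproduced from a raw sequence string. The following is a minimal sketch; the function name and its parameters are hypothetical, and the input shown is only the first 60 bases of the transcript listed above.

```python
def format_sequence(seq: str, group: int = 10, per_line: int = 6) -> str:
    """Lay out a nucleotide string in space-separated groups, several per line."""
    seq = "".join(seq.split()).lower()  # drop any existing whitespace
    line_len = group * per_line
    lines = []
    for start in range(0, len(seq), line_len):
        chunk = seq[start:start + line_len]
        groups = [chunk[i:i + group] for i in range(0, len(chunk), group)]
        lines.append(" ".join(groups))
    return "\n".join(lines)


# First 60 bases of the transcript shown above.
first_60 = "gtccgtttcccatggtgccctgcgccgctgctgggtgacggcgcggccgggcgcagcgcg"
formatted = format_sequence(first_60)
```

Because the function strips whitespace first, it can also be used to re-flow a sequence that is already grouped.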

Protein

Conceptually Annotated Translation of C9orf43

       1 gtccgtttcccatggtgccctgcgccgctgctgggtgacggcgcggccgggcgcagcgcg 60     Upstream ORF
      61 tgggagacgaaggcgtggctgactcggcggggcggatccctttaggacccgaggcaggct 120
     121 ctgggcccgccgggtccgttaatctcaccgcgccgcaaggggccacgttttaccacttga 180
     181 tttagcaaccctaagcggtttggaatctgctttgctctcacaggacctcagcccgtcgtg 240
     241 atcagattctcccactttcttttttctttcctggaatggagtgggcagtctttttccatc 300    Upstream ORF
     301 tctaccgaagttgatgttcatttttaatcttttcgcccctcacgcttttgtaataatgag 360
                                                        
     361 cactagaatgctgcagggttggcctttggcctaaaccatttctagctatggacttgccag 420    Kozak sequence RBS
                                                        M  D  L  P  D   
 
     421 atgagagccagtgggacgaaaccacctgtggcttggctgtttgtcagcacccacaatgct 480
           E  S  Q  W  D  E  T  T  C  G  L  A  V  C  Q  H  P  Q  C  W
           
     481 gggcaactatccgccgcattgagaggggccatcctcgaatcctcggctcatcctgcaaaa 540
           A  T  I  R  R  I  E  R  G  H  P  R  I  L  G  S  S  C  K  T   
           
     541 ctcccctggatgctgaagataaactcccagtgctcaccgtggtagacatcttagattccg 600
           P  L  D  A  E  D  K  L  P  V  L  T  V  V  D  I  L  D  S  G   
           
     601 gctttgcagctcatcatttaccagaatgtacctttactaaggcccattctttattgtctc 660
           F  A  A  H  H  L  P  E  C  T  F  T  K  A  H  S  L  L  S  Q   
           
     661 agagttcaaagttttactccaaatttcatggcaggcctccgaagggtttacctgacaaaa 720
           S  S  K  F  Y  S  K  F  H  G  R  P  P  K  G  L  P  D  K  S  
  
     721 gtttgatcaactgtactaacagacttcccaaatttccagtgttgaatttgaatgagacgc 780
           L  I  N  C  T  N  R  L  P  K  F  P  V  L  N  L  N  E  T  Q
           
     781 aacttccctgccctgaagatgttagaaatatggttgtattgtggatcccagaagaaacag 840
           L  P  C  P  E  D  V  R  N  M  V  V  L  W  I  P  E  E  T  E   
           
     841 agatacatgtgagccagcatgggaagaagaaaagaaagaactcggcagtgaaaagcaagt 900
           I  H  V  S  Q  H  G  K  K  K  R  K  N  S  A  V  K  S  K  S   
           
     901 catttctgggtctctctggaaatcagtccgcaggaacacgagtaggaacaccagggatga 960
           F  L  G  L  S  G  N  Q  S  A  G  T  R  V  G  T  P  G  M  I   
           
     961 tcgtgcctcccccaaccccagtgcaattgtctgaacaattcagttcagatttcctacctc 1020
           V  P  P  P  T  P  V  Q  L  S  E  Q  F  S  S  D  F  L  P  L   
              
    1021 tctgggctcaatccgaagcgttacctcaggatctactgaaggaacttttgccaggtggaa 1080
           W  A  Q  S  E  A  L  P  Q  D  L  L  K  E  L  L  P  G  G  K
           
    1081 agcaaaccatgctctgtccagagatgaagataaaattggccatgatgaaaaagaatcttc 1140
           Q  T  M  L  C  P  E  M  K  I  K  L  A  M  M  K  K  N  L  P   
           
    1141 ccttggaaaagaaccgacctgacagtgtgatttcttctaagatgtttctatctatacacc 1200
           L  E  K  N  R  P  D  S  V  I  S  S  K  M  F  L  S  I  H  R   
           
    1201 gcctcaccctggaaagaccagcactgcgatatcctgaacgtttgaagaaattacataacc 1260
           L  T  L  E  R  P  A  L  R  Y  P  E  R  L  K  K  L  H  N  L   
           
    1261 tgaagacagaaggttacaggaaacagcagcagcggcagcagcagcagcagcagcaacaga 1320
           K  T  E  G  Y  R  K  Q  Q  Q  R  Q  Q  Q  Q  Q  Q  Q  Q  K   
  
    1321 agaaggtgaaaacacctattaagaaacaggaggctaaaaagaaagccaagagtgatccag 1380
           K  V  K  T  P  I  K  K  Q  E  A  K  K  K  A  K  S  D  P  G
  
    1381 ggatccagagcacttcacataaacatccagttaccaccgttcatgaccgtctctatggtt 1440
           I  Q  S  T  S  H  K  H  P  V  T  T  V  H  D  R  L  Y  G  Y
           
    1441 acagaactctgccaggtcagaacagtgacatgaagcagcagcagcagatggaaaaaggaa 1500
           R  T  L  P  G  Q  N  S  D  M  K  Q  Q  Q  Q  M  E  K  G  T   
           
    1501 ccacttcgaaacaggattccacggagagaccaaagatgaactactatgaccatgcggatt 1560
           T  S  K  Q  D  S  T  E  R  P  K  M  N  Y  Y  D  H  A  D  F   
             
    1561 tccaccacagtgtaaaaagtcctgaattgtatgaaacagaacccactaacaaggacatta 1620
           H  H  S  V  K  S  P  E  L  Y  E  T  E  P  T  N  K  D  I  S
              
    1621 gtgctccagtggacgctgtgccagaagcccaggctgccaggcaaaagaagatctccttta 1680
           A  P  V  D  A  V  P  E  A  Q  A  A  R  Q  K  K  I  S  F  N
           
    1681 acttttcagaaattatggctagcacaggctggaactctgagctcaaactacttaggattc 1740
           F  S  E  I  M  A  S  T  G  W  N  S  E  L  K  L  L  R  I  L   
           
    
    1741 ttcaggacactgatgatgaggatgaggaggaccagtcctctggggcagagtgagaagcct 1800
           Q  D  T  D  D  E  D  E  E  D  Q  S  S  G  A  E  *

    1801 ctggaggaatagactgaaggcatcccctggggcagccgtgttccaaagcgggatggctgg 1860
    1861 tatcctgagggcagcaacgtttcacataagggcaagaggagaggggcttctgctctctgg 1920
    1921 agcctttaccagggcctgagctctgagcttagggattccattttctttgttcacctctac 1980
    1981 ttgcctctaaaataaatgtaggagaaaaatccccagcctttttaaatttagattatttcc 2040    Polyadenylation signal
    2041 tttccattagggtcagaataattttggtgattaaacacaactgcttttcaa          2091

Key:

Q Q Q R Q Q Q Q Q Q Q Q = Glutamine-rich region
S or T = Predicted site of O-linked N-acetylglucosamine addition (YinOYang @ ExPASy)
S or T = Predicted O-linked glycosylation of serine or threonine (NetOGlyc @ ExPASy)
W = C-mannosylation site (NetCGlyc @ ExPASy)
XKX = SUMOplot prediction of type I sumoylation
XXX = Asn-Xaa-Ser/Thr sequence (NetNGlyc @ ExPASy)
Asn = Asparagines predicted to be N-glycosylated (NetNGlyc @ ExPASy)
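The conceptual translation and the Asn-Xaa-Ser/Thr entries in the key can be illustrated with a short script. This is a sketch under stated assumptions: it uses the standard genetic code, the function names are hypothetical, and the motif scan only locates candidate sequons (with the commonly applied rule that Xaa is not proline). It does not reproduce the NetNGlyc prediction, which scores each site with a trained model.

```python
import re
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds: str) -> str:
    """Translate a coding sequence from its first base, stopping at a stop codon."""
    cds = "".join(cds.split()).upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # X marks an unrecognized codon
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def find_sequons(protein: str) -> list:
    """Return (1-based position, motif) for each Asn-Xaa-Ser/Thr with Xaa != Pro."""
    # The lookahead lets overlapping motifs be reported.
    return [(m.start() + 1, m.group(1)) for m in re.finditer(r"(?=(N[^P][ST]))", protein)]


# Bases at the annotated start codon (position 408 onward in the listing above).
n_terminus = translate("atggacttgccagatgagagccagtgggacgaa")
# Last codons before the annotated stop.
c_terminus = translate("ggggcagagtgagaagcct")
# A fragment of the translation above; positions are relative to the fragment.
sequons = find_sequons("LINCTNRLPKFPVLNLNETQ")
```

Here `n_terminus` and `c_terminus` correspond to the first and last residues shown in the annotated translation, and `sequons` lists only motif matches, which are necessary but not sufficient for N-glycosylation.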

Homolog

A global pairwise alignment of the mouse and human C9orf43 proteins was performed with EMBOSS Needle on 2/3/2017. The Mus musculus protein is 495 amino acids long, while the Homo sapiens protein is 461 amino acids long. The alignment shows 55.6% similarity between the two species. This degree of sequence similarity and conservation supports an inference of homology, although the common ancestor is not known, and homology often suggests a common function. Gaps make up 13.6% of the alignment and are indicated by positions with no dots or dashes. These gaps are present in both the Homo sapiens and Mus musculus amino acid chains.

     mrlrssiwsg svlkaqesfd smnvadesqw deavctlsgc qhpqcwaslr rierghpril
     dpspkspret edklptltiv nitdtclwtq krvaqqqpse ftfpkdrpsl skpaskrqsr
     spkalrdkdv tsrsprplkl svlnlneakl plsenvsnmv vtwvpeetek dvspvqktdv
     sswpgkkrrk klrkkskpsl yypgrqysrs paaivpppsp ehhleqlspe aiplwaqvgm
     lpqdlleeci laheksiigp evkielskmr kslplerrrp esaisskmyl tiqrltlqrp
     slryparlrk lcpnlkqgeg laghgssdsl mqqgkaktfp pkqepkkkak rnvkgqygee
     ttsghffhds vglrisgqed qqtpweeedi ektsaethvs leevyefdky yteyyatpes
     avlyetvyqn ldddeetmvg ikasskdrnl knlsammdgi gwnpelkllr ilqateeede
     eghnsraqsk tslea
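The idea behind the global alignment described in this section can be sketched with a basic Needleman-Wunsch implementation. This is an illustration only: it uses a simple match/mismatch score and a linear gap penalty, whereas EMBOSS Needle uses a substitution matrix and separate gap-open and gap-extend penalties, so it will not reproduce the percentages quoted above. The function names and scoring values are assumptions made for the example.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -2):
    """Global alignment of two sequences with a linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner, preferring the diagonal move.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def alignment_stats(x: str, y: str) -> dict:
    """Percent identity and percent gap columns over the alignment length."""
    length = len(x)
    identical = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    gaps = sum(1 for p, q in zip(x, y) if p == "-" or q == "-")
    return {"identity_pct": 100 * identical / length, "gap_pct": 100 * gaps / length}


# Short fragments of the human and mouse proteins shown in this article.
human_aln, mouse_aln = needleman_wunsch("ESQWDETTCG", "ESQWDEAVCT")
stats = alignment_stats(human_aln, mouse_aln)
```

Note that this sketch reports identity rather than similarity; computing similarity as EMBOSS Needle does would additionally require a substitution matrix to decide which mismatched residue pairs count as similar.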

Chromosome 9 Open Reading Frame 43 (C9orf43)

This article "Chromosome 9 Open Reading Frame 43" is from Wikipedia. The list of its authors can be seen in its history and/or on the page Edithistory:Chromosome 9 Open Reading Frame 43. Articles copied from the Draft namespace on Wikipedia can be found in Wikipedia's Draft namespace rather than the main one.

[1] "Transcript: C9orf43-201 (ENST00000288462.4) - Exons - Homo sapiens - Ensembl genome browser 91". uswest.ensembl.org. Retrieved 2018-02-18.
[2] "C9orf43 - Uncharacterized protein C9orf43 - Homo sapiens (Human) - C9orf43 gene & protein". www.uniprot.org. Retrieved 2018-02-18.
[3] "C9orf43 chromosome 9 open reading frame 43 [Homo sapiens (human)] - Gene - NCBI". www.ncbi.nlm.nih.gov. Retrieved 2018-02-18.
[4] Thierry-Mieg, Danielle; Thierry-Mieg, Jean (NCBI/NLM/NIH). "AceView: Gene:C9orf43, a comprehensive annotation of human, mouse and worm genes with mRNAs or ESTs". www.ncbi.nlm.nih.gov. Retrieved 2018-02-18.