DatumKB
Content | |
---|---|
Description | DatumKB is a collection of experimental results involving the function and regulation of human proteins in cultured cells |
Organisms | human |
Contact | |
Laboratory | Computer Sciences Lab, SRI International http://www.csl.sri.com |
Access | |
Data format | Flat file knowledgebase, text and JSON |
Website | http://datum.csl.sri.com |
Download URL | http://pl.csl.sri.com/datum-json.html |
Miscellaneous | |
Curation policy | manual |
DatumKB is a freely accessible database of experimental results involving the function and regulation of human proteins in cultured cells. The results are manually curated from the biological research literature and stored using a "shorthand" language made up of independent units that can be understood by biologists, traced back to their source, and have enough structure to be interrogated computationally. These units are called "datums" to distinguish them from "data", which is generally used for collections of results.
The information in a datum is expressed using controlled vocabularies with links to well-known databases such as HGNC, UniProt, PubChem, and Cellosaurus. DatumKB can be searched using the interface at http://datum.csl.sri.com. The results can be downloaded in the original datum format, a simplified text format, or as a JSON file.
Background
Datums were developed to be used as evidence in a conceptual model of intracellular signal transduction. Collecting experimental evidence about the behavior of cellular proteins is difficult because the primary sources present the results of two commonly used detection methods as images. The first method is immunocytochemistry, which uses images to depict changes in the intensity and location of a protein within a cell. The second is the western blot, in which changes in the intensity and location of a protein on a gel provide information about post-translational modifications. Computers cannot interpret images, so the authors' statements are often used as a substitute. Indexing author statements involves some form of natural language processing, which can result in low precision and insufficient recall. In addition, much of the information in the images that would be useful in a model is not mentioned by the authors because it does not further their hypothesis.
Data Structure
A datum is a structured, computer-readable summary of an experimental finding representing a single biological assay. The information in a datum includes:
- the Subject (a protein, gene, or cellular phenotype)
- the Assay (what was measured and how)
- the Treatment (the addition of a drug, peptide, or stress, and its duration)
- the Change (increase, decrease, no change)
- the Environment (cells and culture conditions used at the start of a treatment)
- the Source (a PubMed ID and the number of the figure or table containing the experimental result)
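The six parts listed above can be pictured as a simple record. The following Python sketch is illustrative only: it is not the DatumKB datum format or its JSON schema, and the class, field names, helper functions, and example values (including the placeholder PubMed IDs) are all assumptions made for the illustration.

```python
import json
from dataclasses import dataclass, asdict
from enum import Enum


class Change(Enum):
    """The three outcomes named in the list above."""
    INCREASE = "increase"
    DECREASE = "decrease"
    NO_CHANGE = "no change"


@dataclass(frozen=True)
class Datum:
    """Hypothetical record type; field names are assumptions, not DatumKB's."""
    subject: str       # protein, gene, or cellular phenotype
    assay: str         # what was measured and how
    treatment: str     # drug, peptide, or stress, with its duration
    change: Change     # increase, decrease, or no change
    environment: str   # cells and culture conditions at the start of treatment
    pmid: str          # PubMed ID of the source paper
    figure: str        # figure or table holding the result


def find(datums, subject=None, change=None):
    """Return the datums matching every criterion that is not None."""
    return [
        d for d in datums
        if (subject is None or d.subject == subject)
        and (change is None or d.change is change)
    ]


def to_json(datum):
    """Serialize one datum to JSON, storing the change as plain text."""
    record = asdict(datum)
    record["change"] = datum.change.value
    return json.dumps(record, sort_keys=True)


# Invented example values; they are not taken from DatumKB or any paper.
EXAMPLES = [
    Datum("ProteinA", "phosphorylation by western blot",
          "GrowthFactorX for 5 min", Change.INCREASE,
          "ExampleCell line, serum-starved", "00000000", "Fig. 1A"),
    Datum("ProteinB", "location by immunocytochemistry",
          "GrowthFactorX for 5 min", Change.NO_CHANGE,
          "ExampleCell line, serum-starved", "00000000", "Fig. 2C"),
]

if __name__ == "__main__":
    for hit in find(EXAMPLES, change=Change.INCREASE):
        print(to_json(hit))
```

Keeping each datum as an independent, immutable record mirrors the idea that every unit can be traced back to its source figure on its own, while a plain filter such as `find` shows how such records might be interrogated computationally.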
Uses
Datums were originally designed to be used as evidence for rules in a collection of Pathway Logic maps of intracellular signal transduction (STM8, http://pl.csl.sri.com/online.html). They have since been used:
- to automate the curation of executable models.[1]
- to organize experimental results for curation of models of signal transduction processes and provide supporting evidence for signaling rules.[2]
- as a training set and target information structure in developing natural language processing tools for reading biological papers.[3][4]
References
- Nigam V, Donaldson R, Knapp M, McCarthy T, Talcott C (2015). "Inferring Executable Models from Formalized Experimental Evidence". In Roux O, Bourdon J. Computational Methods in Systems Biology. International Conference on Computational Methods in Systems Biology, CMSB 2015. Springer, Cham. doi:10.1007/978-3-319-23401-4_9. ISBN 978-3-319-23400-7.
- Talcott C (2016). "The Pathway Logic Formal Modeling System: Diverse views of a formal representation of signal transduction". Workshop on Formal Methods in Bioinformatics and Biomedicine. 2016 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). 1. Shenzhen, China. pp. 1468–1476. doi:10.1109/BIBM.2016.7822740.
- Freitag D, Niekrasz J (2016). "Feature Derivation for Exploitation of Distant Annotation via Pattern Induction against Dependency Parses". Proceedings of the 15th Workshop on Biomedical Natural Language Processing. Association for Computational Linguistics. pp. 36–45. doi:10.18653/v1/W16-29.
- Freitag D, Kalmar P, Yeh E (2017). "Discourse-Wide Extraction of Assay Frames from the Biological Literature". Proceedings of the Biomedical NLP Workshop associated with RANLP 2017. Varna, Bulgaria: INCOMA Ltd. pp. 15–23. doi:10.26615/978-954-452-044-1_003.