ExaMode project

ExaMode (Extreme-scale Analytics via Multimodal Ontology Discovery & Enhancement) is a research project supported by the European Commission through the Horizon 2020 framework, with ID number 825292^[1]. It runs from January 2019 to December 2022^[2]. The main objective of the project is to develop automatic methods to extract pathological concepts from medical reports. These concepts are used for the weak annotation of the Whole Slide Images (WSI)^[3] associated with each report.

Project Motivation and Overview

In recent years, the volume and complexity of data produced in the life sciences have increased significantly^[4]. Virtual microscopy is an emerging area of computational pathology in which specialized hardware is used to generate Whole Slide Images (WSI), high-resolution images of histological sections. These images, viewed on screen and annotated by expert pathologists, are then processed by analysis tools^[5] and used to train Machine Learning (ML) models, such as Convolutional Neural Networks (CNN), the state-of-the-art method for medical image analysis^[6].

However, CNNs and other Deep Learning (DL) algorithms for image analysis require large volumes of data, in this case annotated images, to be employed successfully^[7]. Creating and annotating these images demands considerable effort from expert pathologists, often up to one hour per image^[8].

Currently, Natural Language Processing (NLP) techniques are used to extract labels (i.e., words describing concepts) from the medical documents (clinical trials) that accompany the WSI and are written by histopathologists^[9]. By automatically extracting labels from the clinical text, it is possible to weakly annotate the corresponding clinical images, for example by signaling whether or not an image contains a form of cancer.
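The idea of weak annotation can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: the keyword lists, the negation cues, and the function names `weak_label` and `annotate_slides` are assumptions made only for illustration, and a real system would rely on curated vocabularies or trained NLP models.

```python
import re
from typing import Dict

# Hypothetical keyword lists; a real system would rely on curated
# vocabularies or trained NLP models rather than this toy dictionary.
CANCER_TERMS = ("adenocarcinoma", "carcinoma", "malignant")
NEGATION_CUES = ("no evidence of", "negative for", "without")


def weak_label(report: str) -> int:
    """Return 1 if the report mentions a cancer term that is not negated, else 0."""
    for sentence in re.split(r"[.;\n]+", report.lower()):
        for term in CANCER_TERMS:
            pos = sentence.find(term)
            if pos == -1:
                continue
            # Only the text before the term is checked for negation cues.
            preceding = sentence[:pos]
            if not any(cue in preceding for cue in NEGATION_CUES):
                return 1
    return 0


def annotate_slides(reports: Dict[str, str]) -> Dict[str, int]:
    """Map each slide identifier to the weak label of its report."""
    return {slide_id: weak_label(text) for slide_id, text in reports.items()}
```

Each slide inherits a single image-level label from its report, with no information about where in the image the finding is located; this is what makes the annotation weak.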

At the same time, NLP methods are shifting from expertly curated rules to ML approaches. While ML approaches have the advantage of learning from data and generalizing to previously unseen patterns^[10]^[11], they also require extensive training datasets to be employed successfully. For these reasons, the main idea of the ExaMode project is to develop automatic methods, based on both machine learning and NLP techniques, for extracting labels from the medical documents written by histopathologists and accompanying WSI. In particular, ExaMode aims to develop a knowledge discovery system to extract, link, and retrieve multimodal information from highly heterogeneous and unstructured data, such as WSI and their accompanying clinical trials^[1].

Project Results

The consortium has been developing a whole set of new methods and concepts for extreme scale analysis^[12]. The results, a series of deliverables, software, databases and other resources, are being released to the public.

Public Deliverables and Publications

The consortium released a series of public deliverables describing the results of the project^[13]. Some of the institutions participating in the project also published scientific papers in peer-reviewed venues concerning aspects of the project^[14].

Software

Open-source software is being created, improved, released, and tested to perform knowledge discovery in exascale digital pathology data^[15]. The potential impact of the project includes new methods to train deep neural networks on heterogeneous data faster, and systems that handle extreme scales of multimodal data with less effort from medical experts while increasing the speed of data throughput and access. The software released so far comprises:

MedTAG^[16]^[17]. An open-source biomedical annotation tool for diagnostic reports. It provides tools to tag biomedical concepts contained in clinical reports. An alternative version of MedTAG, called DocTAG, has also been developed for annotating documents in the style of a typical information retrieval evaluation campaign^[18]^[19].
A Python implementation of the software proposed in the paper Processing Megapixel Images with Deep Attention-Sampling Models^[20], which was originally implemented in the C and C++ languages.
CompFigSep. The implementation of a complete pipeline for compound figure separation. A compound figure is a figure containing multiple subfigures. In medical scientific publications, compound figures account for a significant amount of visual data. To exploit the information they contain, it is necessary to segment them into subfigures that are as independent as possible^[21].
NanoWeb. A Web-oriented search system that allows users to search, access, and explore nanopublications on the Web^[22]^[23].
ExaMode CERT. Software that allows the user to extract both the entities and the concepts from user-provided medical reports related to colon cancer^[24].
HookNet. A multi-resolution convolutional neural network for semantic segmentation in histopathology WSI^[25].
Few-Shot Detection. A deep learning system for weakly supervised object detection in digital pathology WSI^[26].
SURF-Deeplab. A repository containing the implementation of ML models such as EfficientDet^[27] and DeeplabV3+^[28].
COLOR INFORMATION^[29]. An application based on the paper^[30] that concerns the invertible mapping of color information in WSI patches.
ExaNet (ExaMode Semantic NETworks). A visual tool that displays the medical reports in tabular form. It provides searching and filtering capabilities, together with the visualization of the underlying RDF graphs.
CERT (Colon cancer Entity Recognition Tool)^[31]. A visual tool for the automatic extraction of concepts from cancer medical reports, which automatically generates the RDF graph corresponding to the user-provided medical report.

VIRTUM ALBUM^[32], by Microscopeit, is cloud-based software for managing, storing, and annotating histopathological slide collections, targeting individual and institutional users. The prototype was tested by pathologists and proved a useful tool for advancing the project.

Ontology

Ontologies are used today in many applications characterized by domain-specific terms, including NLP^[33] and Deep Learning^[34]. Among other things, ontologies allow text to be annotated, establishing links between the concepts expressed in the text, the defined classes, and the related information. They are used to create a shared model of the domain of interest in order to resolve problems of data heterogeneity and integration^[33].

The ExaMode project released its own ontology^[35] written in the OWL language. This ontology contains the classes and relationships characterizing the domain concerning the clinical trials considered by the project, and in particular the four diseases studied in the project: colon cancer, uterine cervix cancer, lung cancer, and celiac disease.
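As a rough illustration of how ontology classes can link a report to the diseases it mentions, the following Python sketch emits N-Triples statements for a report. The namespace, the `mentionsDisease` property, the class names, and the function `report_to_ntriples` are hypothetical; they are not taken from the published ExaMode ontology, and plain substring matching stands in for real concept extraction.

```python
from typing import List

# Hypothetical namespace and property names, used only for illustration;
# they are not taken from the published ExaMode ontology.
NS = "http://example.org/examode-sketch#"
DISEASE_CLASSES = {
    "colon cancer": "ColonCancer",
    "uterine cervix cancer": "UterineCervixCancer",
    "lung cancer": "LungCancer",
    "celiac disease": "CeliacDisease",
}


def report_to_ntriples(report_id: str, text: str) -> List[str]:
    """Emit one N-Triples line per disease class mentioned in the report text."""
    subject = f"<{NS}report/{report_id}>"
    lowered = text.lower()
    triples = []
    for mention, class_name in DISEASE_CLASSES.items():
        # Naive substring match; a real tool would use concept recognition.
        if mention in lowered:
            triples.append(f"{subject} <{NS}mentionsDisease> <{NS}{class_name}> .")
    return triples
```

Each returned line is a subject, predicate, object statement, so the output for a set of reports can be loaded into any RDF store and queried together with the ontology.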

Partners

The ExaMode consortium is composed of the following partners^[36]:

Information Systems Institute, University of Applied Sciences Western Switzerland (HES-SO Valais), Sierre, Switzerland, Coordinator of the project
Department of Information Engineering, University of Padua, Padua, Italy
SIRMA AI (previously Ontotext), Bulgaria
Radboud University, Nijmegen, Netherlands
Microscopeit, Poland
Cannizzaro Hospital, Catania, Italy
SurfSara, Netherlands^[37]

References

[1] "The EXtreme-scale Analytics via Multimodal Ontology Discovery & Enhancement Project". 2022. Retrieved 14 January 2022.
[2] "The ExaMode Project". Retrieved 14 January 2022.
[3] "Whole slide imaging in pathology: advantages, limitations, and emerging perspectives". Retrieved 18 January 2022.
[4] Hoehndorf, Robert; Schofield, Paul N.; Gkoutos, Georgios V. (2015). "The role of ontologies in biological and biomedical research: a functional perspective". Briefings Bioinform. 16 (6): 1069–1080. doi:10.1093/bib/bbv011.
[5] Evans, Andrew J.; Salama, Mohamed E.; Henricks, Walter H.; Pantanowitz, Liron (2017). "Implementation of whole slide imaging for clinical purposes: issues to consider from the perspective of early adopters". Archives of Pathology & Laboratory Medicine. College of American Pathologists. 141 (7): 944–959.
[6] Madabhushi, Anant; Lee, George (2016). "Image analysis and machine learning in digital pathology: Challenges and opportunities". Medical Image Analysis. Elsevier. 33: 170–175.
[7] Lindman, Karin; Rose, Jeronimo F.; Lindvall, Martin; Lundstrom, Claes; Treanor, Darren (2019). "Annotations, ontologies, and whole slide images – Development of an annotated ontology-driven whole slide image library of normal and abnormal human tissue". Journal of Pathology Informatics. Wolters Kluwer – Medknow Publications. 10.
[8] Krupinski, Elizabeth A.; Graham, Anna R.; Weinstein, Ronald S. (2013). "Characterizing the development of visual search expertise in pathology residents viewing whole slide images". Human Pathology. Elsevier. 44 (3): 357–364.
[9] Campanella, Gabriele; Hanna, Matthew G.; Geneslaw, Luke; Miraflor, Allen; Silva, Vitor Werneck Krauss; Busam, Klaus J.; Brogi, Edi; Reuter, Victor E.; Klimstra, David S.; Fuchs, Thomas J. (2019). "Clinical-grade computational pathology using weakly supervised deep learning on whole slide images". Nature Medicine. Nature Publishing Group. 25 (8): 1301–1309.
[10] Chiticariu, Laura; Li, Yunyao; Reiss, Frederick (2013). "Rule-based information extraction is dead! Long live rule-based information extraction systems!". Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing: 827–832.
[11] Cheplygina, Veronika; de Bruijne, Marleen; Pluim, Josien P. W. (2019). "Not-so-supervised: A survey of semi-supervised, multi-instance, and transfer learning in medical image analysis". Medical Image Anal. 54: 280–296. doi:10.1016/j.media.2019.03.009.
[12] "ExaMode Objectives". Retrieved 24 January 2022.
[13] "ExaMode list of deliverables". Deliverables.
[14] "ExaMode list of publications". Publications. Retrieved 24 January 2022.
[15] "ExaMode Software". Software.
[16] "MedTag". Retrieved 24 January 2022.
[17] Giachelle, Fabio; Irrera, Ornella; Silvello, Gianmaria (2021). "MedTAG: a portable and customizable annotation tool for biomedical documents". BMC Medical Informatics Decis. Mak. 21 (1): 352. doi:10.1186/s12911-021-01706-4.
[18] "DocTAG". Retrieved 24 January 2022.
[19] Giachelle, Fabio; Silvello, Gianmaria (2021). "DocTAG: A Customizable Annotation Tool for Ground Truth Creation". BMC Medical Informatics and Decision Making. 21 (352).
[20] Katharopoulos, Angelo; Fleuret, Francois. "Processing Megapixel Images with Deep Attention-Sampling Models". Proceedings of the 36th International Conference on Machine Learning, ICML 2019. 97: 3282–3291.
[21] Zou, Jie; Thoma, George R.; Antani, Sameer K. "Unified deep neural network for segmentation and labeling of multipanel biomedical figures". J. Assoc. Inf. Sci. Technol. 71 (11): 1327–1340. doi:10.1002/asi.24334. Retrieved 24 January 2022.
[22] "NanoWeb". Retrieved 24 January 2022.
[23] Giachelle, Fabio; Dosso, Dennis; Silvello, Gianmaria. "NanoWeb: Search, Access and Explore Life Science Nanopublications on the Web" (PDF). Proceedings of the 29th Italian Symposium on Advanced Database Systems, SEBD, CEUR Workshop Proceedings. 2994: 506–513.
[24] "ExaMode CERT". Retrieved 24 January 2022.
[25] "ExaMode HookNet". Retrieved 24 January 2022.
[26] "Few Shot Detection". Retrieved 24 January 2022.
[27] Tan, Mingxing; Pang, Ruoming; Le, Quoc V. "EfficientDet: Scalable and Efficient Object Detection". CoRR. abs/1911.09070.
[28] Chen, Liang-Chieh; Zhu, Yukun; Papandreou, George; Schroff, Florian; Adam, Hartwig. "Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation". CoRR. abs/1802.02611.
[29] "Color Information". Retrieved 24 January 2022.
[30] Chen, Ricky T. Q.; Behrmann, Jens; Duvenaud, David; Jacobsen, Jorn-Henrik. "Residual Flows for Invertible Generative Modeling". Advances in Neural Information Processing Systems.
[31] "ExaMode CERT". Retrieved 24 January 2022.
[32] "Virtum prototipe published by MicroscopeIT".
[33] Freitas, Fred; Schulz, Stefan; Moraes, Eduardo. "Survey of current terminologies and ontologies in biology and medicine". RECIIS-Electronic Journal in Communication, Information and Innovation in Health. 3 (1): 7–18.
[34] Litjens, Geert; Kooi, Thijs; Bejnordi, Babak Ehteshami; Setio, Arnaud Arindra Adiyoso; Ciompi, Francesco; Ghafoorian, Mohsen; Van Der Laak, Jeroen; Van Ginneken, Bram; Sanchez, Clara I. "A survey on deep learning in medical image analysis". Medical Image Analysis. 42: 60–88.
[35] "The ExaMode Ontology". The ExaMode Ontology. Retrieved 24 January 2022.
[36] "the ExaMode partners". Partners. Retrieved 24 January 2022.
[37] "Surfsara website". Retrieved 14 January 2022.

This article "ExaMode project" is from Wikipedia. The list of its authors can be seen in its historical and/or the page Edithistory:ExaMode project. Articles copied from Draft Namespace on Wikipedia could be seen on the Draft Namespace of Wikipedia and not main one.
