An ontology is an explicit specification of a conceptualization, intended to capture and formalize domain knowledge. Ontologies that formalize knowledge about the life sciences are called biomedical ontologies. They consist of concepts, the central units of description, which are used to name biomedical data and events unambiguously and consistently. An event describes an action involving one or more entities in the biomedical domain.
The purpose of this study is to understand how terms are composed in biomedical event ontologies. Which term patterns occur among biomedical terms? How can these patterns be represented? How can their tokens be categorized by part of speech? The most important goal is to identify the primary and secondary events in biomedical event ontology terms and their positions within the term patterns.
The proposed method produces Finite State Machines (FSMs), described in the results in Chapter 4. The FSMs identify the primary event, e.g. as the last token of a biomedical term, and the secondary event as the token immediately followed by a preposition. An FSM is a model of computation used in the design of computer programs. Because it is an abstract machine that is easy to program, it is a step towards automatic event extraction, which in turn supports the development of domain-specific applications such as semantic search, event database curation support, event extraction and event annotation.
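The identification rule stated above can be illustrated with a small two-state scanner. This is only a minimal sketch under stated assumptions, not the FSM developed in the thesis: the function name `find_events`, the state names and the preposition list are hypothetical, and the rule is applied literally (primary event = last token, secondary event = any token immediately followed by a preposition).

```python
# Hypothetical, deliberately small preposition list (an assumption for illustration).
PREPOSITIONS = {"of", "in", "by", "to", "via", "from", "into", "with", "during"}


def find_events(term):
    """Scan a term token by token with two states, WORD and PREP.

    A token is recorded as a secondary event when the next token is a
    preposition; the primary event is the last token of the term.
    """
    tokens = term.split()
    state = "START"
    previous = None
    secondary = []
    for token in tokens:
        if token.lower() in PREPOSITIONS:
            # Transition WORD -> PREP: the preceding word is a secondary event.
            if state == "WORD":
                secondary.append(previous)
            state = "PREP"
        else:
            state = "WORD"
        previous = token
    # The term is only well formed for this rule if it ends on a word.
    primary = tokens[-1] if tokens and state == "WORD" else None
    return {"primary": primary, "secondary": secondary}


if __name__ == "__main__":
    print(find_events("regulation of cell proliferation"))
```

For the term "regulation of cell proliferation", this sketch reports "proliferation" as the primary event and "regulation" as the secondary event. A term that ends in a preposition yields no primary event, which is a design choice of the sketch rather than a claim about the thesis.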
For our study, we selected the Gene Ontology (GO). GO provides ontologies of gene products in three separate domains: (i) cellular component, (ii) molecular function and (iii) biological process. For this thesis we selected the biological process domain. For the compositional and structural analysis, each biomedical term is broken down into words, each of which is called a “token”. This process is called tokenization. The tokens are then categorized by their grammatical category in natural language, such as noun, adjective, preposition or relational adjective.
After the tokens have been categorized by part-of-speech (POS) tagging and each tag has been converted to its abbreviated form, the resulting sequence is called a biomedical term pattern. After collecting a sample of biomedical term patterns, we generalized them and built Finite State Machines.
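The steps from term to term pattern can be sketched as follows. This is an illustrative sketch only: the tag abbreviations (`N`, `ADJ`, `RADJ`, `PREP`), the tiny hand-written lexicon and the function names are assumptions, and a real analysis would use a proper POS tagger rather than a lookup table. Unknown tokens default to noun here purely for simplicity.

```python
from collections import Counter

# Hypothetical lexicon mapping tokens to abbreviated POS tags (an assumption).
POS_LEXICON = {
    "of": "PREP", "by": "PREP", "in": "PREP",
    "positive": "ADJ", "negative": "ADJ",
    "mitotic": "RADJ", "apoptotic": "RADJ",  # relational adjectives
}


def tokenize(term):
    """Break a term into tokens on whitespace."""
    return term.split()


def term_pattern(term):
    """Replace each token by its abbreviated POS tag; unknown tokens count as nouns."""
    return " ".join(POS_LEXICON.get(token.lower(), "N") for token in tokenize(term))


def collect_patterns(terms):
    """Count how often each term pattern occurs in a sample of terms."""
    return Counter(term_pattern(term) for term in terms)


if __name__ == "__main__":
    sample = [
        "regulation of cell proliferation",
        "positive regulation of cell proliferation",
        "regulation of cell migration",
    ]
    for pattern, count in collect_patterns(sample).most_common():
        print(count, pattern)
```

Counting patterns in this way gives a simple view of which patterns recur in a sample, which is the kind of input from which generalized patterns could then be derived.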
The research proposes Finite State Machines that provide an abstract and concise view of the structure and composition of terms in biomedical event ontologies. The FSMs help to identify biomedical events (primary and secondary) in biomedical terms and their possible positions. They can be used to build computer programs, which in turn help to build domain-specific applications, e.g. a biomedical event corpus or annotations for biomedical event terms.
The results and their evaluation indicate that FSMs are a generalized solution for explaining the structure of biomedical terms and can identify events and their positions in biomedical event ontologies.