In recent years many attempts have been made to design suitable sets of rules aimed at extracting the semantic meaning from plain text, and to achieve annotation, but very few approaches make extensive use of grammars. Current systems are mainly focused on extracting the semantic role of the entities described in the text. This approach has limitations: in such applications the semantic role is conceived merely as the meaning of the involved entities without considering their context. As an example, current semantic annotators often specify a date entity without any annotation regarding the kind of the date itself i.e. a birth date, a book publication date, and so on. Moreover, these systems use ontologies that have been developed specifically for the system’s purposes and have reduced portability. Extensive use of both linguistic resources and semantic representations of the domain are needed in this scenario; the semantic representation of the domain addresses the semantic interpretation of the con- text, while NLP tools can help to solve some linguistic problems related to the semantic annotation, as synonymy, ambiguities, and co-references.A novel framework inspired to Cognitive Linguistics theories is proposed in this work that is aimed at facing the problem outlined above. In particular, our work is based on Construction Grammar (CxG). CxG defines a ”construction” as a form- meaning couple. We use RDF triples in the domain ontology as the ”semantic seeds” to build constructions. A suitable set of rules based on linguistic typology have been designed to infer semantics and syntax from the semantic seed, while combining them as the poles of constructions. A hierarchy of rules to infer syntactic patterns for either single words or sentences using WordNet and FrameNet has been designed to overcome the limitations when expressing the syntactic poles using solely the terms stated in the ontology. As a consequence, semantic annotation of plain text is achieved by computing all possible syntactic forms for the same meaning during the analysis of document corpora.The proposed framework has been finalized to semantic annota- tion of Wikipedia pages; the result is a system for automatic gen- eration of SemanticWeb wiki contents from standard Wikipedia pages, leading to a possible solution of the big challenge to make existing wiki sources semantic wikis.
|Numero di pagine||8|
|Stato di pubblicazione||Published - 2012|
All Science Journal Classification (ASJC) codes