An approach of information extraction from web documents for automatic ontology generation
- Authors
- Yeom, KW; Park, JH
- Issue Date
- 2005-12
- Publisher
- SPRINGER-VERLAG BERLIN
- Citation
- COMPUTATIONAL INTELLIGENCE AND SECURITY, PT 1, PROCEEDINGS, v.3801, pp.450 - 457
- Abstract
- We examine an automated mechanism that gives users structured access to the information in web documents by segmenting unformatted text records into structured elements, annotating the documents with XML tags, and applying specific query processing techniques. This research is the first step toward an automatic ontology generation system. We therefore focus on explaining how structure can be extracted automatically when the system is seeded with a small number of training examples. We propose an approach based on Hidden Markov Models (HMMs) that builds a probabilistic model combining multiple sources of information, including the sequence of elements, their length distribution, distinguishing words from the vocabulary, and an optional external data dictionary. We introduce two HMMs for information extraction, trained on different sources such as bibliographies and Call for Papers documents. The proposed HMMs learn to distinguish the fields and then extract the title, authors, conference/journal names, etc. from the text.
- Keywords
- Ontology Generation; Web Document; Information; Extraction Approach
- ISSN
- 0302-9743
- URI
- https://pubs.kist.re.kr/handle/201004/135940
- Appears in Collections:
- KIST Article > 2005
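The abstract describes HMMs that learn to tell fields apart and then extract title, authors, and venue names from unformatted text. As a rough illustration of that idea only, the sketch below runs Viterbi decoding over a toy three-state HMM and groups the labeled tokens into fields. The state names, the hand-set probabilities, the tiny vocabularies, and the helper names `emission`, `viterbi`, and `group_fields` are all assumptions made for this example; they are not taken from the paper, whose models are trained from examples and also use length distributions and an optional dictionary.

```python
import math
from itertools import groupby

STATES = ["author", "title", "venue"]

# Hand-set, illustrative parameters (assumptions, not values from the paper).
START = {"author": 0.8, "title": 0.1, "venue": 0.1}
TRANS = {
    "author": {"author": 0.60, "title": 0.35, "venue": 0.05},
    "title":  {"author": 0.05, "title": 0.70, "venue": 0.25},
    "venue":  {"author": 0.05, "title": 0.05, "venue": 0.90},
}

# Tiny per-state vocabularies standing in for learned emission distributions.
VOCAB = {
    "author": {"yeom", "park", "kw", "jh"},
    "title": {"information", "extraction", "ontology", "generation"},
    "venue": {"proceedings", "computational", "intelligence", "security"},
}


def emission(state, token):
    """Crude emission score: high for in-vocabulary tokens, small floor otherwise.

    This is an unnormalized stand-in for a trained emission distribution.
    """
    return 0.9 if token.lower() in VOCAB[state] else 0.05


def viterbi(tokens):
    """Return the most likely state label for each token (log-space Viterbi)."""
    if not tokens:
        return []
    # best[-1][s] holds the log-score of the best path ending in state s.
    best = [{s: math.log(START[s]) + math.log(emission(s, tokens[0]))
             for s in STATES}]
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for tok in tokens[1:]:
        scores, pointers = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: best[-1][p] + math.log(TRANS[p][s]))
            scores[s] = (best[-1][prev] + math.log(TRANS[prev][s])
                         + math.log(emission(s, tok)))
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: best[-1][s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def group_fields(tokens, path):
    """Merge consecutive tokens that share a label into (label, text) fields."""
    pairs = zip(path, tokens)
    return [(label, " ".join(tok for _, tok in run))
            for label, run in groupby(pairs, key=lambda pair: pair[0])]


if __name__ == "__main__":
    tokens = "Yeom Park information extraction proceedings".split()
    for label, text in group_fields(tokens, viterbi(tokens)):
        # Wrap each extracted field in an XML-style tag, echoing the abstract.
        print(f"<{label}>{text}</{label}>")
```

In a trained system the start, transition, and emission parameters would be estimated from the seed examples rather than written by hand; the decoding step would stay the same.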