Title: A hybrid extraction model for Chinese noun/verb synonym bi-gram
Authors: Li, Wanyin
Lu, Qin
Subjects: Collocation extraction
Statistical model
Syntactic rules
Semantic relationship
Similarity calculation
Issue Date: 16-Dec-2011
Publisher: Institute for Digital Enhancement of Cognitive Development, Waseda University
Source: Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation (PACLIC 25), 16-18 Dec, Nanyang Technological University, Singapore, p. 430-439.
Abstract: Statistics-based collocation extraction approaches suffer from (1) low precision, because bi-grams with high co-occurrence may be syntactically unrelated and are thus not true collocations; and (2) low recall, because some true collocations with low occurrence counts cannot be identified by purely statistical models. Integrating syntactic rules and semantic knowledge into a statistical model is one way to achieve high precision while keeping reasonable recall. This paper designs a cascade system that employs such a hybrid model to extract Chinese synonymous noun/verb collocations. Grammatically bound noun/verb collocations are first extracted by a syntactic-rule-based module; its output is then passed to a semantic-based module that retrieves further low-frequency bi-gram collocations.
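The cascade described in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's syntactic rules, similarity measure, or thresholds, so everything below is an assumption made for illustration: the toy POS-tagged data (English stand-ins for Chinese words), the `ALLOWED_PATTERNS` set, the `SYNONYMS` table, the `min_freq` cutoff, and all function names are hypothetical. It is not the authors' implementation, only one plausible reading of "syntactic filter first, semantic retrieval of low-frequency bi-grams second".

```python
from collections import Counter

# Hypothetical toy corpus: each sentence is a list of (word, POS) pairs.
SENTENCES = [
    [("raise", "v"), ("question", "n")],
    [("raise", "v"), ("question", "n")],
    [("raise", "v"), ("question", "n")],
    [("pose", "v"), ("question", "n")],
    [("the", "d"), ("question", "n")],
    [("the", "d"), ("question", "n")],
    [("the", "d"), ("question", "n")],
    [("eat", "v"), ("question", "n")],
]
# Hypothetical synonym table standing in for a semantic resource.
SYNONYMS = {"raise": {"pose"}, "pose": {"raise"}}
# Assumed syntactic rule: keep only verb-noun or noun-verb bi-grams.
ALLOWED_PATTERNS = {("v", "n"), ("n", "v")}


def count_bigrams(sentences):
    """Count adjacent (word, POS) pairs over all sentences."""
    counts = Counter()
    for sent in sentences:
        for left, right in zip(sent, sent[1:]):
            counts[(left, right)] += 1
    return counts


def syntactic_stage(counts, min_freq):
    """Drop bi-grams that violate the POS patterns, then split the
    survivors into frequent candidates and rare candidates."""
    frequent, rare = set(), set()
    for ((w1, p1), (w2, p2)), n in counts.items():
        if (p1, p2) not in ALLOWED_PATTERNS:
            continue
        (frequent if n >= min_freq else rare).add((w1, w2))
    return frequent, rare


def semantic_stage(frequent, rare, synonyms):
    """Recover a rare bi-gram if replacing one of its words with a
    synonym yields an already accepted frequent bi-gram."""
    recovered = set()
    for w1, w2 in rare:
        variants = {(s, w2) for s in synonyms.get(w1, ())}
        variants |= {(w1, s) for s in synonyms.get(w2, ())}
        if variants & frequent:
            recovered.add((w1, w2))
    return recovered


def extract(sentences, synonyms, min_freq=3):
    """Run the two stages in cascade and return accepted bi-grams."""
    counts = count_bigrams(sentences)
    frequent, rare = syntactic_stage(counts, min_freq)
    return frequent | semantic_stage(frequent, rare, synonyms)
```

In this toy setting the frequent but syntactically unrelated pair ("the", "question") is rejected by the first stage, which corresponds to the precision problem, while the rare pair ("pose", "question") is recovered by the second stage through its synonym "raise", which corresponds to the recall problem.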
Rights: © 2011 The PACLIC 25 Organizing Committee and PACLIC Steering Committee
Copyright of contributed papers reserved by respective authors
Copyright 2011 by Wanyin Li, Qin Lu
Type: Conference Paper
ISBN: 978-4-905166-02-3
Appears in Collections: COMP Conference Papers & Presentations

Files in This Item:
File: Li_Hybrid_Extraction_Bi-gram.pdf
Size: 135.34 kB
Format: Adobe PDF

All items in the PolyU Institutional Repository are protected by copyright, with all rights reserved, unless otherwise indicated. No item in the PolyU IR may be reproduced for commercial or resale purposes.