Computational Linguistics and Chinese Language Processing vol. 3, no. 1, February 1998, pp. 27-44 Computational Linguistics Society of R. O. C.
Unknown Word Detection for Chinese by a Corpus-based Learning Method
Keh-Jiann Chen*, Ming-Hong Bai*
Abstract
One of the most prominent problems in computer processing of the Chinese language is the identification of the words in a sentence. Since there are no blanks to mark word boundaries, identifying words is difficult because of segmentation ambiguities and occurrences of out-of-vocabulary words (i.e., unknown words). In this paper, a corpus-based learning method is proposed that derives sets of syntactic rules for distinguishing monosyllabic words from monosyllabic morphemes, which may be parts of unknown words or typographical errors. The corpus-based learning approach has three advantages: 1. automatic rule learning, 2. automatic evaluation of the performance of each rule, and 3. balancing of recall and precision rates through dynamic rule set selection. The experimental results show that the rule set derived using the proposed method outperformed hand-crafted rules produced by human experts in detecting unknown words.
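The idea of scoring each learned rule on a corpus and then choosing a rule set to trade recall against precision can be illustrated with a small sketch. Everything in it is an assumption for illustration and is not taken from the paper: the `Instance` format (a monosyllable with its left and right neighbours and a gold label), the two toy rules, the toy corpus, and the use of per-rule precision as the performance measure are all hypothetical. Here a rule "fires" when it accepts a monosyllable as a genuine word, and any monosyllable that no selected rule accepts is flagged as a possible part of an unknown word.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Instance:
    """A monosyllabic token in context (hypothetical format, not the paper's)."""
    left: str      # preceding segment
    token: str     # the monosyllable under inspection
    right: str     # following segment
    is_word: bool  # gold label: True = genuine monosyllabic word,
                   # False = morpheme of an unknown word (or a typo)


# A rule is a name plus a predicate; the predicate returns True when the
# rule accepts the token as a genuine monosyllabic word.
Rule = Tuple[str, Callable[[Instance], bool]]


def evaluate_rules(rules: List[Rule], corpus: List[Instance]) -> Dict[str, float]:
    """Score each rule by its precision on an annotated corpus."""
    scores: Dict[str, float] = {}
    for name, accepts in rules:
        fired = [inst for inst in corpus if accepts(inst)]
        # Precision = correct acceptances / all acceptances (0.0 if it never fires).
        scores[name] = sum(inst.is_word for inst in fired) / len(fired) if fired else 0.0
    return scores


def select_rules(rules: List[Rule], scores: Dict[str, float], threshold: float) -> List[Rule]:
    """Keep only the rules whose measured precision reaches the threshold."""
    return [rule for rule in rules if scores[rule[0]] >= threshold]


def flag_unknown(inst: Instance, selected: List[Rule]) -> bool:
    """Flag a monosyllable as a possible unknown-word morpheme if no selected rule accepts it."""
    return not any(accepts(inst) for _, accepts in selected)


# Toy rules and corpus, invented purely for illustration.
RULES: List[Rule] = [
    ("function_char", lambda inst: inst.token in {"的", "在", "了"}),
    ("any_single", lambda inst: True),  # deliberately weak rule
]
CORPUS = [
    Instance("我", "的", "書", True),
    Instance("住", "在", "台灣", True),
    Instance("", "王", "英雄", False),  # surname inside the unknown name 王英雄
    Instance("", "陳", "英雄", False),  # another hypothetical unknown name
]

if __name__ == "__main__":
    scores = evaluate_rules(RULES, CORPUS)
    for threshold in (0.9, 0.4):
        selected = select_rules(RULES, scores, threshold)
        flagged = [inst.token for inst in CORPUS if flag_unknown(inst, selected)]
        print(threshold, [name for name, _ in selected], flagged)
    # 0.9 ['function_char'] ['王', '陳']
    # 0.4 ['function_char', 'any_single'] []
```

Raising the threshold keeps only high-precision rules, so more monosyllables are left unaccepted and flagged: recall of unknown-word candidates rises, at the cost of more false alarms on genuine words that no strict rule covers. Lowering the threshold admits weaker rules such as `any_single`, which in this toy run accepts 王 and 陳 as words and so misses both unknown names.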
1. Introduction
One of the most prominent problems in computer processing of the Chinese language is the identification of the words in a sentence. There are no blanks to mark word boundaries in Chinese text. As a result, identifying words is difficult because of segmentation ambiguities and occurrences of out-of-vocabulary words (i.e., unknown words). For instance, the proper name 'Wang, Ying-Xiong' in (1) is a typical unknown word, and it can be ambiguously segmented as 'king' + 'hero'. Another phrase in (1), 'university student in Taiwan', also has ambiguous segmentations, such as 'Taiwan' + 'university student', 'National Taiwan University' + 'give birth to', and 'Taiwan' + 'university' + 'give birth to':
(1) [Chinese example sentence not recoverable from the extracted text]
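The segmentation ambiguity in (1) can be made concrete by enumerating every way a string splits into lexicon entries. This sketch assumes that 'university student in Taiwan' is written 台灣大學生 and the name 'Wang, Ying-Xiong' is written 王英雄 (the original characters were lost in extraction), and it uses a tiny hypothetical lexicon. It only illustrates the ambiguity; it is not the authors' segmenter.

```python
from functools import lru_cache
from typing import FrozenSet, List, Tuple

# Tiny hypothetical lexicon; the name 王英雄 is deliberately absent.
LEXICON: FrozenSet[str] = frozenset({"台灣", "大學", "大學生", "台灣大學", "生", "王", "英雄"})


def all_segmentations(text: str, lexicon: FrozenSet[str]) -> List[List[str]]:
    """Return every way to split `text` into a sequence of lexicon entries."""

    @lru_cache(maxsize=None)
    def segment_from(start: int) -> Tuple[Tuple[str, ...], ...]:
        # The empty suffix has exactly one (empty) segmentation.
        if start == len(text):
            return ((),)
        results = []
        for end in range(start + 1, len(text) + 1):
            word = text[start:end]
            if word in lexicon:
                # Prefix this word to every segmentation of the remaining suffix.
                results.extend((word,) + rest for rest in segment_from(end))
        return tuple(results)

    return [list(seg) for seg in segment_from(0)]


if __name__ == "__main__":
    print(all_segmentations("台灣大學生", LEXICON))
    # [['台灣', '大學', '生'], ['台灣', '大學生'], ['台灣大學', '生']]
    print(all_segmentations("王英雄", LEXICON))
    # [['王', '英雄']]
```

Because 王英雄 is not in the lexicon, its only segmentation is 王 'king' + 英雄 'hero': the unknown name surfaces as a monosyllable followed by a known word, which is the kind of monosyllabic morpheme the proposed rules must distinguish from genuine monosyllabic words.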