    Computational Linguistics and Chinese Language Processing vol. 3, no. 1, February 1998, pp. 27-44 Computational Linguistics Society of R. O. C.
    Unknown Word Detection for Chinese by a Corpus-based Learning Method
    Keh-Jiann Chen*, Ming-Hong Bai*
    Abstract
    One of the most prominent problems in computer processing of the Chinese language is identification of the words in a sentence. Since there are no blanks to mark word boundaries, identifying words is difficult because of segmentation ambiguities and occurrences of out-of-vocabulary words (i.e., unknown words). In this paper, a corpus-based learning method is proposed which derives sets of syntactic rules that are applied to distinguish monosyllabic words from monosyllabic morphemes which may be parts of unknown words or typographical errors. The corpus-based learning approach has the advantages of: 1. automatic rule learning, 2. automatic evaluation of the performance of each rule, and 3. balancing of recall and precision rates through dynamic rule set selection. The experimental results show that the rule set derived using the proposed method outperformed hand-crafted rules produced by human experts in detecting unknown words.
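The idea of scoring each learned rule on a corpus and then choosing a rule set by threshold can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Rule` class, the rule names, and the counts are hypothetical and are not taken from the paper, and the scoring used is simply the fraction of correct applications.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    name: str      # hypothetical rule identifier
    applied: int   # times the rule fired on the training corpus
    correct: int   # times the firing turned out to be correct

    @property
    def accuracy(self) -> float:
        # Fraction of correct applications; 0.0 if the rule never fired.
        return self.correct / self.applied if self.applied else 0.0


def select_rules(rules, min_accuracy):
    """Keep only rules whose corpus accuracy reaches the threshold."""
    return [r for r in rules if r.accuracy >= min_accuracy]


# Made-up counts, for illustration only.
RULES = [
    Rule("rule_a", applied=200, correct=198),
    Rule("rule_b", applied=150, correct=135),
    Rule("rule_c", applied=400, correct=300),
]

strict = select_rules(RULES, 0.95)  # fewer, more reliable rules
loose = select_rules(RULES, 0.70)   # more rules, less reliable on average

print([r.name for r in strict])
print([r.name for r in loose])
```

Moving the threshold changes which rules are active, which is one plausible way to trade one error rate against the other.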
    1. Introduction
    One of the most prominent problems in computer processing of the Chinese language is the identification of the words in a sentence. There are no blanks to mark word boundaries in Chinese text. As a result, identifying words is difficult because of segmentation ambiguities and occurrences of out-of-vocabulary words (i.e., unknown words). For instance, the proper name 'Wang, Ying-Xiong' in (1) is a typical unknown word, and it can also be segmented ambiguously as 'king' 'hero'. Another string in (1), 'university student in Taiwan', also has ambiguous segmentations, such as 'Taiwan' 'university student'; 'National Taiwan University' 'give birth to'; and 'Taiwan' 'university' 'give birth to':
    (1) [Chinese example sentence; characters lost in text extraction]
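The segmentation ambiguity described above can be made concrete with a small sketch that enumerates every way to split a string into lexicon words. The Chinese string and the toy lexicon below are our own illustrative choices that match the English glosses in the text ('Taiwan', 'university', 'university student', 'National Taiwan University', 'give birth to'); they are assumptions, not characters recovered from the original example, and the exhaustive search is not the method the paper proposes.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into words found in `lexicon`."""
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            # Extend each segmentation of the remainder with this head word.
            for tail in segmentations(text[end:], lexicon):
                results.append([head] + tail)
    return results


# Toy lexicon (assumed for illustration).
LEXICON = {"台灣", "大學", "大學生", "台灣大學", "生"}

found = segmentations("台灣大學生", LEXICON)
for seg in found:
    print(" / ".join(seg))
```

With this lexicon the string has three complete segmentations, corresponding to the three readings listed in the text. A string containing an out-of-vocabulary word would instead yield no complete segmentation, or only splits into single characters if those happen to be lexicon entries.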
