Journal of Jishou University (Natural Sciences Edition), 2021, Vol. 42, Issue (4): 38-43. DOI: 10.13438/j.cnki.jdzk.2021.04.008

• Computer and communication •

Method of Obtaining Part-of-Speech Tagging Rules Utilizing FP-Growth Algorithm

MO Liping, HUANG Yongkun   

  1. (College of Information Science & Engineering, Jishou University, Jishou 416000, Hunan, China)
  • Online: 2021-07-25 Published: 2021-11-17

Abstract: To improve the quality of the training corpus required by part-of-speech (POS) tagging models, a method based on the FP-Growth algorithm is proposed for automatically extracting POS tagging rules from the training corpus. The proposed method was compared experimentally with an existing method that obtains POS tagging rules using the Apriori algorithm. The results show that, for small-scale training corpora of 1 000, 2 000, and 10 000 words, both methods obtain the same number of POS tagging rules, but the proposed method takes only 0.013 866%, 0.010 399%, and 0.003 132% of the time required by the Apriori-based method, respectively. For training corpora of 100 000 words and 1 million words, the Apriori-based method fails to obtain any rules, whereas the proposed method still obtains effective rules within a reasonable period of time. The proposed method is therefore feasible and efficient, and can meet the practical need to automatically obtain POS tagging rules from corpora of different sizes when optimizing the training corpus.

Key words: part-of-speech tagging rule, corpora, association rule mining, Apriori algorithm, FP-Growth algorithm
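
As a rough illustration of the idea summarized in the abstract, the following Python sketch mines frequent itemsets with a minimal FP-Growth implementation and then derives candidate rules of the form "context → tag". It is not the authors' implementation. The abstract does not say how transactions or rules are encoded, so everything specific here is an assumption made for illustration: each token is treated as one transaction holding hypothetical items `w=<word>`, `prev=<previous tag>`, and `tag=<tag>`, and the function names, toy corpus, and thresholds are invented.

```python
from collections import defaultdict


class Node:
    """One node of the FP-tree: an item, its count, and links to parent/children."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-tree from (items, count) pairs; return (header table, item supports)."""
    freq = defaultdict(int)
    for items, count in transactions:
        for item in set(items):
            freq[item] += count
    freq = {i: c for i, c in freq.items() if c >= min_support}
    root = Node(None, None)
    header = defaultdict(list)  # item -> all tree nodes carrying that item
    for items, count in transactions:
        # Keep frequent items only, most frequent first (ties broken by name).
        kept = sorted((i for i in set(items) if i in freq),
                      key=lambda i: (-freq[i], i))
        node = root
        for item in kept:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += count
            node = child
    return header, freq


def fp_growth(transactions, min_support, suffix=frozenset()):
    """Yield (frequent itemset, support) pairs without candidate generation."""
    header, freq = build_tree(transactions, min_support)
    for item, nodes in header.items():
        itemset = suffix | {item}
        yield itemset, freq[item]
        # Conditional pattern base: the prefix path above each node of `item`.
        cond = []
        for node in nodes:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                cond.append((path, node.count))
        yield from fp_growth(cond, min_support, itemset)


def tag_rules(patterns, min_conf):
    """Turn itemsets containing exactly one tag= item into (context, tag) -> confidence."""
    rules = {}
    for itemset, support in patterns.items():
        tags = [i for i in itemset if i.startswith("tag=")]
        if len(tags) == 1 and len(itemset) > 1:
            context = itemset - {tags[0]}
            conf = support / patterns[context]
            if conf >= min_conf:
                rules[(context, tags[0])] = conf
    return rules


def to_transactions(tagged_sentences):
    """Hypothetical encoding: one transaction per token (word, previous tag, own tag)."""
    out = []
    for sent in tagged_sentences:
        prev = "<s>"
        for word, tag in sent:
            out.append(([f"w={word}", f"prev={prev}", f"tag={tag}"], 1))
            prev = tag
    return out


# Toy corpus (invented for illustration only).
corpus = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VB")],
    [("the", "DT"), ("cat", "NN"), ("runs", "VB")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VB")],
]
patterns = dict(fp_growth(to_transactions(corpus), min_support=2))
rules = tag_rules(patterns, min_conf=0.9)
```

In this sketch the corpus is scanned once to count items and once more to build the tree, after which mining runs on conditional pattern bases rather than on repeatedly generated candidate itemsets. That is the general property that distinguishes FP-Growth from Apriori. The timing figures reported in the abstract come from the authors' own experiments, not from this toy code.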
