Journal of Jishou University (Natural Sciences Edition), 2022, Vol. 43, Issue (2): 45-52. DOI: 10.13438/j.cnki.jdzk.2022.02.008

• Wireless Electronics and Automation •

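Changsha Dialect Recognition Based on CTC-GRU Model

LIANG Xiaolin, SHEN Xiangfei, LIANG Zhao, QIU Hailin

  1. (School of Mathematics and Statistics, Changsha University of Science and Technology, Changsha 410114, Hunan, China)
  • Online: 2022-03-25 Published: 2022-07-14
  • About the authors: LIANG Xiaolin (b. 1965), male, from Changsha, Hunan, associate professor at the School of Mathematics and Statistics, Changsha University of Science and Technology, mainly engaged in research on image recognition and text mining; SHEN Xiangfei (b. 1997), female, from Changsha, Hunan, master's student, mainly engaged in research on natural language processing.
  • Funding:
    National Natural Science Foundation of China, General Program (61972055); Key Projects of the Education Department of Hunan Province (17A003, 18A145)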

Abstract: To recognize large-vocabulary continuous speech in the Changsha dialect, a gated recurrent unit (GRU) neural network model trained with the Connectionist Temporal Classification (CTC) algorithm is proposed. First, the feature parameters of the speech are extracted as Mel-frequency cepstral coefficients (MFCC). The extracted features are then fed into the GRU network, which is trained and optimized with the CTC algorithm to produce the predicted label sequence for the entire input sequence. Finally, the CTC model, the GRU model, and the CTC-GRU model are compared on a self-built Changsha dialect corpus, with word error rate (WER) as the evaluation metric. The results show that the CTC-GRU model converges faster and is more accurate than the other two models.

Key words: CTC-GRU model, MFCC, Changsha dialect recognition, WER
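
The last two steps named in the abstract, turning CTC output into a label sequence and scoring it with WER, can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the network's per-frame argmax label ids are already available, that the CTC blank symbol has index 0, and that simple best-path (greedy) decoding is used; the paper's actual decoding strategy and label inventory are not stated in the abstract. The function names `ctc_greedy_decode`, `edit_distance`, and `word_error_rate` are hypothetical.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumption: index 0 is the CTC blank symbol


def ctc_greedy_decode(frame_label_ids: Sequence[int], blank: int = BLANK) -> List[int]:
    """Best-path CTC decoding: merge consecutive repeats, then drop blanks."""
    collapsed = [label for label, _ in groupby(frame_label_ids)]
    return [label for label in collapsed if label != blank]


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance (substitutions, deletions, insertions all cost 1)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(ref: Sequence, hyp: Sequence) -> float:
    """WER = edit distance between the sequences / number of reference tokens."""
    if len(ref) == 0:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical per-frame argmax output; 0 marks a blank frame.
frames = [0, 3, 3, 0, 0, 5, 5, 5, 0, 5, 2, 2]
hypothesis = ctc_greedy_decode(frames)
reference = [3, 5, 5, 7, 2]  # hypothetical ground-truth label ids

print(hypothesis)                              # [3, 5, 5, 2]
print(word_error_rate(reference, hypothesis))  # 0.2
```

The blank between the two runs of label 5 is what lets CTC emit the same label twice in a row; without it the repeats would merge into a single 5. Because insertions are counted, WER can exceed 1 when the hypothesis is much longer than the reference.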
