Journal of Jishou University (Natural Sciences Edition)

• Computer Science

Missing Data Imputation Method Based on Fuzzy C-Means Clustering

HUANG Zicheng, LI Ying

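  1. (College of Engineering Technology, Yang-En University, Quanzhou 362014, Fujian, China)
  • Publication date: 2020-03-25   Release date: 2020-09-08
  • About the author: HUANG Zicheng (born 1984), male, from Quanzhou, Fujian; lecturer at the College of Engineering Technology, Yang-En University; master's degree; research interests: machine learning and rough sets.
  • Funding:

    Fujian Provincial Education and Research Project for Young and Middle-aged Teachers (JT180671); Soft Science Project of the Fujian Provincial Department of Science and Technology (2018R0097)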

Missing Value Filling Method Based on Fuzzy C-Means Algorithm

HUANG Zicheng, LI Ying

  1. (College of Engineering Technology, Yang-En University, Quanzhou 362014, Fujian, China)
  • Online: 2020-03-25   Published: 2020-09-08

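Abstract (translated from the Chinese):

To address the problem of effectively imputing missing data, this paper proposes using the membership matrix of the fuzzy C-means (FCM) clustering algorithm as the weights for the values to be filled. First, each missing value is pre-filled with the mean of its attribute. FCM is then run to obtain the membership matrix over the clusters. Finally, this matrix is used as the weights that multiply the attribute means of each cluster, which yields the final imputed values. In experiments on UCI data, the FCM imputation algorithm is compared with k-nearest-neighbor (KNN) imputation; the results show that the root-mean-square error of FCM imputation is overall smaller than that of KNN imputation.

Key words: missing data, fuzzy C-means clustering, membership matrix, k-nearest neighbor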

Abstract:

To fill in missing data effectively, this paper proposes using the membership matrix of the fuzzy C-means (FCM) algorithm as the weights for the values to be filled in. First, each missing value is pre-filled with the mean of its attribute; FCM is then run on the pre-filled data to obtain the membership matrix over the categories. Finally, the memberships are used as weights on the attribute means of each category, and the weighted result is taken as the final filled value. In experiments on UCI data, the root-mean-square error of FCM-based filling is overall smaller than that of k-nearest-neighbor (KNN) filling.

Key words: missing value, fuzzy C-means algorithm, membership matrix, k-nearest neighbor
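
The three steps described in the abstract (mean pre-fill, FCM, membership-weighted category means) can be sketched as follows. This is a minimal illustration and not the authors' implementation. The function names `fcm` and `fcm_impute`, the fuzzifier `m = 2`, the number of clusters, the use of NaN to mark missing entries, and the choice of the FCM cluster centers as the "attribute mean of each category" are all assumptions made for this example.

```python
import numpy as np


def _centers(X, U, m):
    """Fuzzy-weighted mean of every attribute for each cluster."""
    W = U ** m
    return (W.T @ X) / W.sum(axis=0)[:, None]


def fcm(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain fuzzy C-means (hypothetical helper).

    Returns (centers, U), where U has shape (n_samples, n_clusters)
    and every row of U sums to 1.
    """
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        centers = _centers(X, U, m)
        # Distance from every sample to every center; the floor avoids 1/0.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        dist = np.maximum(dist, 1e-12)
        inv = dist ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return _centers(X, U, m), U


def fcm_impute(X, n_clusters=3, m=2.0):
    """Impute NaN entries of X with membership-weighted cluster means."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Step 1: pre-fill each missing entry with the mean of its attribute.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)
    # Step 2: run FCM on the pre-filled data to get the membership matrix.
    centers, U = fcm(filled, n_clusters, m=m)
    # Step 3: weight the per-cluster attribute means by the memberships
    # (assumption: the FCM centers play the role of the cluster means).
    estimate = U @ centers
    filled[missing] = estimate[missing]
    return filled


def rmse(a, b):
    """Root-mean-square error between two arrays of equal shape."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.mean((a - b) ** 2)))
```

With a complete data set in which some entries have been masked, `rmse` evaluated on the masked positions of the imputed and original arrays gives the kind of error measure the abstract reports; the KNN baseline is not reproduced here.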
