K-means Algorithm Implementation in C

Author: 洱冬橙66_156 | Source: Internet | 2022-03-18 04:16

This article walks through a C implementation of the K-means algorithm. It may serve as a useful reference for interested readers.

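K-means is a classic distance-based clustering algorithm. It uses distance as the measure of similarity: the closer two objects are, the more similar they are considered to be. The algorithm treats a cluster as a set of objects that lie close together, so its goal is to obtain clusters that are compact and well separated.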
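
The text does not spell out the distance or the objective. Assuming Euclidean distance (which is what the code in this article computes), they can be written as follows. The symbols are introduced here for illustration: $x$ is a sample with $D$ components, $\mu_j$ is the centroid of cluster $C_j$, and $K$ is the number of clusters.

```latex
d(x, \mu_j) = \sqrt{\sum_{m=1}^{D} \left(x_m - \mu_{j,m}\right)^2},
\qquad
J = \sum_{j=1}^{K} \sum_{x \in C_j} d(x, \mu_j)^2
```

A small $J$ corresponds to compact clusters. Assigning each sample to its nearest centroid and then moving each centroid to the mean of its members can only lower $J$ or leave it unchanged.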

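The algorithm proceeds as follows:

1) Randomly select K of the N samples as the initial centroids.
2) For each remaining sample, measure its distance to every centroid and assign it to the cluster of the nearest centroid.
3) Recompute the centroid of each cluster obtained so far.
4) Repeat steps 2 and 3 until the new centroids equal the old ones, or the change falls below a specified threshold; then the algorithm ends.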
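
The four steps can be condensed into a single function. The following is a minimal sketch, not the article's own program: the `Point` type, the `kmeans` function, and its parameters `eps` and `max_iter` are names assumed here for illustration. It expects the caller to have already chosen the initial centroids (step 1), and for simplicity it assigns every sample in step 2, including the samples that served as initial centroids.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical 2-D point type used only in this sketch. */
typedef struct { double x, y; } Point;

/* Squared Euclidean distance; the square root is not needed to compare distances. */
static double dist2(Point a, Point b) {
    double dx = a.x - b.x, dy = a.y - b.y;
    return dx * dx + dy * dy;
}

/*
 * Runs K-means on pts[0..n-1] with k centroids.
 * centers[0..k-1] must already hold the initial centroids (step 1);
 * label[i] receives the cluster index of pts[i].
 * Stops when no centroid moves more than eps, or after max_iter rounds.
 * Returns the number of rounds performed.
 */
int kmeans(const Point *pts, size_t n, Point *centers, size_t k,
           int *label, double eps, int max_iter) {
    assert(n > 0 && k > 0 && k <= n);
    int round = 0;
    while (round < max_iter) {
        round++;
        /* Step 2: assign every point to its nearest centroid. */
        for (size_t i = 0; i < n; i++) {
            size_t best = 0;
            for (size_t c = 1; c < k; c++)
                if (dist2(pts[i], centers[c]) < dist2(pts[i], centers[best]))
                    best = c;
            label[i] = (int)best;
        }
        /* Step 3: recompute each centroid as the mean of its members. */
        double max_shift2 = 0.0;
        for (size_t c = 0; c < k; c++) {
            double sx = 0.0, sy = 0.0;
            size_t cnt = 0;
            for (size_t i = 0; i < n; i++)
                if ((size_t)label[i] == c) { sx += pts[i].x; sy += pts[i].y; cnt++; }
            if (cnt == 0) continue;  /* empty cluster: keep the old centroid */
            Point updated = { sx / cnt, sy / cnt };
            double shift2 = dist2(updated, centers[c]);
            if (shift2 > max_shift2) max_shift2 = shift2;
            centers[c] = updated;
        }
        /* Step 4: stop once every centroid has moved by at most eps. */
        if (max_shift2 <= eps * eps) break;
    }
    return round;
}
```

Comparing squared distances avoids calling `sqrt` in the inner loop without changing which centroid is nearest. Keeping the old centroid for an empty cluster is one possible policy; re-seeding it from a random sample is another.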

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <time.h>
 
#define DIMENSIOM  2    //only 2-dimensional data is handled for now
#define MAX_ROUND_TIME 100   //maximum number of clustering rounds
 
typedef struct Item{ 
  int dimension_1;    //value in the first dimension
  int dimension_2;    //value in the second dimension
  int clusterID;     //ID of the cluster center this item belongs to
}Item; 
Item* data; 
 
typedef struct ClusterCenter{ 
  double dimension_1; 
  double dimension_2; 
  int clusterID; 
}ClusterCenter; 
ClusterCenter* cluster_center_new; 
 
int isContinue; 
 
int* cluster_center;    //indices of the items chosen as initial centers
double* distanceFromCenter; //distances from one point to every center
int data_size; 
char filename[200]; 
int cluster_count; 
 
void initial(); 
void readDataFromFile(); 
void initial_cluster(); 
void calculateDistance_ToOneCenter(int itemID, int centerID);
void calculateDistance_ToAllCenter(int itemID); 
void partition_forOneItem(int itemID); 
void partition_forAllItem_OneCluster(int round); 
void calculate_clusterCenter(int round); 
void K_means(); 
void writeClusterDataToFile(int round); 
void writeClusterCenterToFile(int round); 
void compareNew_OldClusterCenter(double* new_X_Y); 
void test_1(); 
 
int main(int argc, char* argv[]){ 
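  if( argc != 4 )
  {
    printf("This application needs three parameters to run:"
        "\n\t\tthe first is the size of the data set,"
        "\n\t\tthe second is the name of the file that contains the data,"
        "\n\t\tthe third is the cluster count"
        "\n");
    exit(EXIT_FAILURE);
  }
  srand((unsigned)time(NULL));
  data_size = atoi(argv[1]);
  //bounded copy: strcat could overflow the 200-byte buffer
  snprintf(filename, sizeof(filename), "%s", argv[2]);
  cluster_count = atoi(argv[3]);
  if( data_size <= 0 || cluster_count <= 0 || cluster_count > data_size )
  {
    printf("invalid data size or cluster count!\n");
    exit(EXIT_FAILURE);
  }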
 
  initial(); 
  readDataFromFile(); 
  initial_cluster(); 
  //test_1(); 
  //partition_forAllItem_OneCluster(); 
  //calculate_clusterCenter(); 
  K_means(); 
  return 0; 
} 
 
/*
 * Allocate the dynamic arrays used by the program, sized from the arguments passed to main
 * */
void initial(){ 
  data = (Item*)malloc(sizeof(struct Item) * (data_size + 1)); 
  if( !data ) 
  { 
    printf("malloc error:data!\n");
    exit(0); 
  } 
  cluster_center = (int*)malloc(sizeof(int) * (cluster_count + 1)); 
  if( !cluster_center ) 
  { 
    printf("malloc error:cluster_center!\n"); 
    exit(0); 
  } 
  distanceFromCenter = (double*)malloc(sizeof(double) * (cluster_count + 1)); 
  if( !distanceFromCenter ) 
  { 
    printf("malloc error: distanceFromCenter!\n"); 
    exit(0); 
  } 
  cluster_center_new = (ClusterCenter*)malloc(sizeof(struct ClusterCenter) * (cluster_count + 1)); 
  if( !cluster_center_new ) 
  { 
    printf("malloc cluster center new error!\n"); 
    exit(0); 
  } 
} 
 
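/*
 * Read the x and y values from the file
 * */
void readDataFromFile(){
  FILE* fp;   //renamed from fread to avoid shadowing the standard fread()
  if( NULL == (fp = fopen(filename, "r")))
  {
    printf("open file(%s) error!\n", filename);
    exit(EXIT_FAILURE);
  }
  int row;
  for( row = 1; row <= data_size; row++ )
  {
    if( 2 != fscanf(fp, "%d %d ", &data[row].dimension_1, &data[row].dimension_2))
    {
      //stop here: continuing would leave this item uninitialized
      printf("fscanf error: %d\n", row);
      fclose(fp);
      exit(EXIT_FAILURE);
    }
    data[row].clusterID = 0;
  }
  fclose(fp);
}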
 
/*
 * Randomly choose @cluster_count initial cluster centers, where @cluster_count
 * (the number of clusters) is passed in from main
 * */
 
void initial_cluster(){ 
  //helper array used to draw indices without repetition
  int* auxiliary; 
  int i; 
  auxiliary = (int*)malloc(sizeof(int) * (data_size + 1)); 
  if( !auxiliary ) 
  { 
    printf("malloc error: auxiliary\n");
    exit(0); 
  } 
  for( i = 1; i <= data_size; i++ ) 
  { 
    auxiliary[i] = i; 
  } 
   
  //pick the initial cluster_count cluster centers
  int length = data_size; 
  int random; 
  for( i = 1; i <= cluster_count; i++ ) 
  { 
    random = rand()%length + 1; 
    //printf("%d \n", auxiliary[random]); 
    //data[auxiliary[random]].clusterID = auxiliary[random]; 
    cluster_center[i] = auxiliary[random]; 
    auxiliary[random] = auxiliary[length--]; 
  } 
   
  for( i = 1; i <= cluster_count; i++ ) 
  { 
    cluster_center_new[i].dimension_1 = data[cluster_center[i]].dimension_1; 
    cluster_center_new[i].dimension_2 = data[cluster_center[i]].dimension_2; 
    cluster_center_new[i].clusterID = i; 
    data[cluster_center[i]].clusterID = i; 
  }
  free(auxiliary);   //the helper array is no longer needed
}
 
/*
 * Compute the distance from one point (not yet assigned to a cluster center) to one cluster center
 *   @itemID:   a point that does not yet belong to any cluster
 *   @centerID: ID of the center; the result is stored in distanceFromCenter[centerID]
 * */
void calculateDistance_ToOneCenter(int itemID,int centerID){
  double d1 = data[itemID].dimension_1 - cluster_center_new[centerID].dimension_1;
  double d2 = data[itemID].dimension_2 - cluster_center_new[centerID].dimension_2;
  distanceFromCenter[centerID] = sqrt( d1*d1 + d2*d2 );
}
 
/*
 * Compute the distance from one point (not yet assigned to a cluster center) to every cluster center
 * */
void calculateDistance_ToAllCenter(int itemID){ 
  int i; 
  for( i = 1; i <= cluster_count; i++ ) 
  { 
    calculateDistance_ToOneCenter(itemID, i); 
  } 
} 
 
void test_1() 
{ 
  calculateDistance_ToAllCenter(3); 
  int i; 
  for( i = 1; i <= cluster_count; i++ ) 
  { 
    printf("%f ", distanceFromCenter[i]); 
  } 
} 
 
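/*
 * Once the distances from a point (not yet in any cluster) to every cluster center are known,
 * decide which cluster center it belongs to, i.e. the one with the smallest distance
 *   Purpose: determine the cluster center an item belongs to
 * */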
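void partition_forOneItem(int itemID){
  //operates on distanceFromCenter and cluster_center
  int i;
  int min_index = 1;
  double min_value = distanceFromCenter[1];
  for( i = 2; i <= cluster_count; i++ )
  {
    if( distanceFromCenter[i] < min_value )
    {
      min_value = distanceFromCenter[i];
      min_index = i;
    }
  }
  //the source listing is cut off inside the comparison above; the rest of this function is an assumed reconstruction
  data[itemID].clusterID = min_index;
}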
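
The listing declares `calculate_clusterCenter` and `compareNew_OldClusterCenter`, but their bodies do not appear in the text. The sketch below shows one way the center update and the movement check might look, using the same struct layout and 1-based indexing as the listing. The function `update_centers`, its parameters, and its return convention are assumptions made here; they are not taken from the original program, which keeps its arrays in globals.

```c
#include <assert.h>
#include <stdlib.h>

/* Same layout as the Item and ClusterCenter structs in the listing. */
typedef struct Item { int dimension_1; int dimension_2; int clusterID; } Item;
typedef struct ClusterCenter { double dimension_1; double dimension_2; int clusterID; } ClusterCenter;

/*
 * Hypothetical helper (not from the original article).
 * Recomputes centers[1..cluster_count] as the mean of the items assigned to
 * each cluster. Arrays are 1-based, as in the listing; index 0 is unused.
 * Returns the largest squared distance any center moved, or -1.0 if
 * memory allocation fails.
 */
double update_centers(const Item *data, int data_size,
                      ClusterCenter *centers, int cluster_count) {
    assert(data_size > 0 && cluster_count > 0);
    size_t slots = (size_t)cluster_count + 1;
    double *sum_1 = calloc(slots, sizeof *sum_1);
    double *sum_2 = calloc(slots, sizeof *sum_2);
    int *count = calloc(slots, sizeof *count);
    if (!sum_1 || !sum_2 || !count) {
        free(sum_1); free(sum_2); free(count);
        return -1.0;
    }
    /* Accumulate per-cluster coordinate sums and member counts. */
    for (int i = 1; i <= data_size; i++) {
        int c = data[i].clusterID;
        if (c < 1 || c > cluster_count) continue;  /* item not assigned yet */
        sum_1[c] += data[i].dimension_1;
        sum_2[c] += data[i].dimension_2;
        count[c]++;
    }
    double max_shift2 = 0.0;
    for (int c = 1; c <= cluster_count; c++) {
        if (count[c] == 0) continue;  /* empty cluster: keep the old center */
        double new_1 = sum_1[c] / count[c];
        double new_2 = sum_2[c] / count[c];
        double d1 = new_1 - centers[c].dimension_1;
        double d2 = new_2 - centers[c].dimension_2;
        if (d1 * d1 + d2 * d2 > max_shift2) max_shift2 = d1 * d1 + d2 * d2;
        centers[c].dimension_1 = new_1;
        centers[c].dimension_2 = new_2;
    }
    free(sum_1); free(sum_2); free(count);
    return max_shift2;
}
```

A driver loop could call a function like this after each assignment pass and stop when the returned value drops below the square of a chosen threshold, or when `MAX_ROUND_TIME` rounds have been performed.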

That concludes this article. I hope it is helpful for your studies, and thank you for your support.
