The k-NN algorithm from Machine Learning in Action, implemented with pandas. Shared here for reference; the details follow.
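I have started working through the book Machine Learning in Action and plan to go back to Zhou Zhihua's Machine Learning once I finish it. The code in Machine Learning in Action is all written with NumPy, which is somewhat cumbersome, so I decided to implement it with pandas instead. That also lets me review what I learned earlier from Python for Data Analysis. The testing approach in the current chapter feels too crude, so I am leaving it until I have learned more and can come back to write it properly.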
import numpy as np
import pandas as pd


def getdata(path):
    # Tab-separated file without a header row; the last column is the label.
    data = pd.read_csv(path, header=None, sep='\t')
    character = data.iloc[:, :-1]
    label = data.iloc[:, -1]
    chara_max = character.max()
    chara_min = character.min()
    chara_range = chara_max - chara_min
    # Min-max scaling maps every feature column to [0, 1].
    normal_chara = (character - chara_min) / chara_range
    return normal_chara, label  # normalized features and labels


def knn(inx, normal_chara, label, k):
    # inx must already be scaled the same way as normal_chara.
    data_sub = normal_chara - inx
    data_square = data_sub ** 2
    data_sum = data_square.sum(axis=1)
    data_sqrt = np.sqrt(data_sum)  # Euclidean distance to every sample
    dis_sort = data_sqrt.argsort().to_numpy()  # positions, nearest first
    k_label = label.iloc[dis_sort[:k]]
    label_sort = k_label.value_counts()  # sorted by count, descending
    res_label = label_sort.index[0]
    return res_label  # k-NN classification result
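One detail the pandas version leaves implicit is that the query point has to be scaled with the same minimum and range as the training features before distances are computed. The following is a minimal sketch of that step, not the author's code: the toy DataFrame, the column names `f1`/`f2`, and the helper `knn_predict` are all hypothetical and exist only for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: two numeric features and a label column.
raw = pd.DataFrame({
    "f1": [1.0, 2.0, 9.0, 10.0],
    "f2": [100.0, 200.0, 900.0, 1000.0],
    "label": ["a", "a", "b", "b"],
})

features = raw.iloc[:, :-1]
labels = raw.iloc[:, -1]

# Min-max scaling: every feature column ends up in [0, 1].
f_min = features.min()
f_range = features.max() - f_min
scaled = (features - f_min) / f_range


def knn_predict(query, scaled, labels, k):
    """Return the majority label among the k nearest rows (Euclidean distance)."""
    diff = scaled - query                      # aligns the query with the columns
    dist = np.sqrt((diff ** 2).sum(axis=1))    # one distance per row
    nearest = dist.argsort().to_numpy()[:k]    # positions of the k smallest distances
    return labels.iloc[nearest].value_counts().index[0]


# The query is scaled with the training min/range before it is classified.
query_raw = pd.Series({"f1": 8.0, "f2": 850.0})
query = (query_raw - f_min) / f_range
prediction = knn_predict(query, scaled, labels, k=3)
print(prediction)
```

Without the scaling step, the second feature (hundreds) would dominate the first (single digits) in the distance, which is the reason for normalizing in the first place.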
Here is one more snippet to share: machine learning, a basic k-NN implementation
# -*- coding: utf-8 -*-
import operator

import numpy as np


def get_data(dataset):
    # Every column except the last is a numeric feature; the last is the label.
    x = dataset[:, :-1].astype(float)
    y = dataset[:, -1]
    return x, y


def knnclassifer(dataset, predict, k=3):
    x, y = get_data(dataset)
    predict = np.asarray(predict, dtype=float)
    # Euclidean distance from the query point to every sample.
    distince = np.sum((predict - x) ** 2, axis=1) ** 0.5
    sorted_dict = np.argsort(distince)  # sample indices, nearest first, e.g. [2 1 0 3 4]
    countlabel = {}
    for i in range(k):
        label = y[sorted_dict[i]]
        # print(i, sorted_dict[i], label)
        countlabel[label] = countlabel.get(label, 0) + 1
    # Sort the (label, count) pairs by count so the most frequent label comes first.
    new_dic = sorted(countlabel.items(), key=operator.itemgetter(1), reverse=True)
    return new_dic[0][0]


if __name__ == '__main__':
    # dataset.txt: comma-separated rows, features first and the label last.
    dataset = np.loadtxt("dataset.txt", dtype=str, delimiter=",")
    predict = [2, 2]
    label = knnclassifer(dataset, predict, 3)
    print(label)
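Since the contents of dataset.txt are not shown, the snippet is hard to try out as given. The sketch below is an illustration only: the in-memory array is a hypothetical stand-in for the file, and `knn_vote` is a hypothetical helper that uses `collections.Counter` for the majority vote instead of a hand-rolled dictionary.

```python
from collections import Counter

import numpy as np

# Hypothetical stand-in for dataset.txt: each row is "feature1,feature2,label",
# held as strings the way np.loadtxt(..., dtype=str) would return them.
dataset = np.array([
    ["1.0", "1.1", "A"],
    ["1.2", "0.9", "A"],
    ["3.0", "3.2", "B"],
    ["3.1", "2.9", "B"],
    ["0.9", "1.0", "A"],
])


def knn_vote(dataset, point, k=3):
    """Classify `point` by majority vote among its k nearest rows."""
    x = dataset[:, :-1].astype(float)   # numeric feature columns
    y = dataset[:, -1]                  # label column
    dist = np.sqrt(((x - np.asarray(point, dtype=float)) ** 2).sum(axis=1))
    nearest = np.argsort(dist)[:k]      # indices of the k closest rows
    votes = Counter(y[nearest].tolist())
    return votes.most_common(1)[0][0]


label = knn_vote(dataset, [2, 2], k=3)
print(label)
```

Counting votes over (label, count) pairs matters here: sorting only the dictionary keys would rank labels alphabetically rather than by how often they occur among the neighbors.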
Original link: https://blog.csdn.net/weixin_38204423/article/details/74640625