1. Cosine similarity
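Cosine similarity measures the angle between two vectors and reports it as the cosine of that angle, so the cosine similarity of two vectors x and y is:

$$\cos\theta = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert} = \frac{\sum_i x_i y_i}{\sqrt{\sum_i x_i^2}\,\sqrt{\sum_i y_i^2}}$$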
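Cosine similarity takes values in [-1, 1]; the larger the value, the more similar the two vectors are.
The formula for the cosine of the angle between two vectors is simple, so it is not derived here. Straight to the code: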
    def cosVector(x, y):
        # Both vectors must have the same number of components.
        if len(x) != len(y):
            print('error input,x and y is not in the same space')
            return
        result1 = 0.0
        result2 = 0.0
        result3 = 0.0
        for i in range(len(x)):
            result1 += x[i] * y[i]  # sum(X*Y)
            result2 += x[i] ** 2    # sum(X*X)
            result3 += y[i] ** 2    # sum(Y*Y)
        # Display the result
        print("result is " + str(result1 / ((result2 * result3) ** 0.5)))

    cosVector([2, 1], [1, 1])
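The function above prints its result, which makes the [-1, 1] range stated earlier hard to check programmatically. The following is a minimal sketch of a variant that returns the value instead. The name `cosine_similarity` and the decision to raise `ValueError` on bad input are assumptions made for illustration, not part of the original code.

```python
import math

def cosine_similarity(x, y):
    """Return the cosine of the angle between x and y (hypothetical helper)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    dot = 0.0
    norm_x = 0.0
    norm_y = 0.0
    for a, b in zip(x, y):
        dot += a * b        # sum(X*Y)
        norm_x += a ** 2    # sum(X*X)
        norm_y += b ** 2    # sum(Y*Y)
    if norm_x == 0.0 or norm_y == 0.0:
        # The angle is undefined for a zero vector (assumption: treat as an error).
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / math.sqrt(norm_x * norm_y)

print(cosine_similarity([1, 2], [2, 4]))    # same direction: 1.0
print(cosine_similarity([1, 0], [0, 1]))    # orthogonal: 0.0
print(cosine_similarity([1, 2], [-1, -2]))  # opposite direction: -1.0
print(cosine_similarity([2, 1], [1, 1]))    # approximately 0.9487
```

The three boundary cases correspond to the two ends and the midpoint of the [-1, 1] range.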
An example that computes the cosine row by row for two two-dimensional arrays:
    # Cosine function
    def cosVector(x, y):
        if len(x) != len(y):
            print('error input,x and y is not in the same space')
            return
        result1 = 0.0
        result2 = 0.0
        result3 = 0.0
        for i in range(len(x)):
            result1 += x[i] * y[i]  # sum(X*Y)
            result2 += x[i] ** 2    # sum(X*X)
            result3 += y[i] ** 2    # sum(Y*Y)
        return result1 / ((result2 * result3) ** 0.5)

    # print("result is ", cosVector([2, 1], [1, 1]))

    # Compute the cosine of query_output (60, 20) and db_output (60, 20),
    # storing the results in a 60*1 list
    cosResult = [[0] * 1 for i in range(60)]
    for i in range(60):
        cosResult[i][0] = cosVector(query_output[i], db_output[i])
    print(cosResult)

    # ------------------------------------------------------------
    # Same computation, but reading the size from the array's shape
    # (query_output and db_output must be arrays with a .shape attribute)
    rows = query_output.shape[0]  # number of rows
    cols = query_output.shape[1]  # number of columns
    cosResult = [[0] * 1 for i in range(rows)]
    for i in range(rows):
        cosResult[i][0] = cosVector(query_output[i], db_output[i])
    # print(cosResult)

    # Write the results to a file, one number per line
    with open('cosResult.txt', 'w') as file:
        for i in cosResult:
            file.write(str(i).replace('[', '').replace(']', '') + '\n')  # '\n' ends the line
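The row-by-row loop above can also be expressed with NumPy array operations. This is a sketch under stated assumptions: `query_output` and `db_output` are taken to be NumPy arrays of the same shape (the original code reads `.shape`, which suggests this), the random data below is only a stand-in for the real arrays, and `rowwise_cosine` is a hypothetical helper name.

```python
import numpy as np

def rowwise_cosine(a, b):
    """Cosine similarity between matching rows of a and b (hypothetical helper)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("a and b must have the same shape")
    dots = np.sum(a * b, axis=1)  # sum(X*Y) for each row
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return dots / norms

# Stand-in data with the (60, 20) shape used in the text (assumption).
rng = np.random.default_rng(0)
query_output = rng.normal(size=(60, 20))
db_output = rng.normal(size=(60, 20))

cos_result = rowwise_cosine(query_output, db_output)
print(cos_result.shape)  # one value per row
```

The result is a one-dimensional array with one cosine per row, so `np.savetxt('cosResult.txt', cos_result)` would write it with one number per line. A row of all zeros yields `nan` here rather than an exception, so callers may want to check for that case.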
Addendum: implementing cosine similarity in Python
Method 1:
    def cos(vector1, vector2):
        dot_product = 0.0
        normA = 0.0
        normB = 0.0
        for a, b in zip(vector1, vector2):
            dot_product += a * b
            normA += a ** 2
            normB += b ** 2
        # The cosine is undefined if either vector is the zero vector.
        if normA == 0.0 or normB == 0.0:
            return None
        else:
            # Normalize from [-1, 1] to [0, 1]
            return 0.5 + 0.5 * dot_product / ((normA * normB) ** 0.5)
Method 2:
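    from numpy import linalg

    # A and B are assumed to be numpy.matrix column vectors, so A.T * B is a 1x1 matrix.
    num = float(A.T * B)  # for row vectors, use A * B.T
    denom = linalg.norm(A) * linalg.norm(B)
    cos = num / denom  # cosine value
    sim = 0.5 + 0.5 * cos  # normalize from [-1, 1] to [0, 1]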
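Method 2 relies on `A` and `B` being matrix objects for which `*` means matrix multiplication. As a hedged alternative, the same computation can be sketched for plain one-dimensional NumPy arrays using `np.dot`. The function name `normalized_cosine` and the choice to return `None` for a zero vector (mirroring method 1) are assumptions for illustration.

```python
import numpy as np

def normalized_cosine(a, b):
    """Cosine similarity rescaled from [-1, 1] to [0, 1] (hypothetical helper)."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        return None  # undefined for a zero vector, as in method 1
    cos = float(np.dot(a, b)) / denom  # cosine value
    return 0.5 + 0.5 * cos             # normalize from [-1, 1] to [0, 1]

print(normalized_cosine([1, 0], [0, 1]))  # orthogonal vectors map to 0.5
```

After rescaling, opposite vectors map to roughly 0, orthogonal vectors to 0.5, and vectors pointing the same way to roughly 1.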
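The above is based on personal experience. I hope it serves as a useful reference, and I hope you will continue to support 服务器之家. If anything is wrong or incomplete, corrections are welcome.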
Original link: https://blog.csdn.net/zhuiqiuzhuoyue583/article/details/80145026