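1. Cosine similarity
Cosine similarity measures the size of the angle between two vectors and expresses the result as the cosine of that angle. The cosine similarity of two vectors A and B is therefore: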
$$\cos\theta = \frac{A \cdot B}{\|A\| \, \|B\|} \tag{1}$$
Cosine similarity takes values in [-1, 1]; the larger the value, the more similar the two vectors are.
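As a quick check of the range stated above, the following minimal sketch evaluates the formula for three pairs of vectors: one pointing in the same direction, one perpendicular, and one pointing in opposite directions. The helper name `cosine_similarity` and the sample vectors are chosen here for illustration only, and the helper assumes non-zero vectors of equal length.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(p * q for p, q in zip(a, b))       # A . B
    norm_a = math.sqrt(sum(p * p for p in a))    # ||A||
    norm_b = math.sqrt(sum(q * q for q in b))    # ||B||
    return dot / (norm_a * norm_b)

same = cosine_similarity([1, 2], [2, 4])         # same direction: close to 1
orthogonal = cosine_similarity([1, 0], [0, 1])   # perpendicular: 0
opposite = cosine_similarity([1, 2], [-1, -2])   # opposite direction: close to -1
print(same, orthogonal, opposite)
```

Because of floating-point rounding, the first and last values may differ from 1 and -1 in the last digits.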
The cosine formula for the angle between two vectors is simple and is not derived here. The code is as follows:
def cosVector(x, y):
    # Both vectors must have the same number of dimensions
    if len(x) != len(y):
        print('error input, x and y are not in the same space')
        return
    result1 = 0.0
    result2 = 0.0
    result3 = 0.0
    for i in range(len(x)):
        result1 += x[i] * y[i]  # sum(X*Y)
        result2 += x[i] ** 2    # sum(X*X)
        result3 += y[i] ** 2    # sum(Y*Y)
    # Display the result
    print("result is " + str(result1 / ((result2 * result3) ** 0.5)))

cosVector([2, 1], [1, 1])

An example that computes cosine values for two-dimensional arrays:
# Cosine function
def cosVector(x, y):
    # Both vectors must have the same number of dimensions
    if len(x) != len(y):
        print('error input, x and y are not in the same space')
        return
    result1 = 0.0
    result2 = 0.0
    result3 = 0.0
    for i in range(len(x)):
        result1 += x[i] * y[i]  # sum(X*Y)
        result2 += x[i] ** 2    # sum(X*X)
        result3 += y[i] ** 2    # sum(Y*Y)
    return result1 / ((result2 * result3) ** 0.5)

# print("result is ", cosVector([2, 1], [1, 1]))

# Compute the cosine of query_output (60, 20) and db_output (60, 20) row by row,
# storing the results in a 60*1 list.
# query_output and db_output are not defined in this snippet; they are expected to exist already.
cosResult = [[0] * 1 for i in range(60)]
for i in range(60):
    cosResult[i][0] = cosVector(query_output[i], db_output[i])
print(cosResult)
--------------------------------------------------------------------------------------------
# Compute the cosine of query_output and db_output row by row,
# storing the results in a rows*1 list (60*1 here)
rows = query_output.shape[0]  # number of rows
cols = query_output.shape[1]  # number of columns
cosResult = [[0] * 1 for i in range(rows)]
for i in range(rows):
    cosResult[i][0] = cosVector(query_output[i], db_output[i])
# print(cosResult)

# Write the results to a file, one number per line
with open('cosResult.txt', 'w') as file:
    for i in cosResult:
        file.write(str(i).replace('[', '').replace(']', '') + '\n')  # \n is the newline character
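The use of `query_output.shape` suggests the inputs are NumPy arrays. If so, the row-by-row loop can also be written in vectorized form. The following is a minimal sketch under that assumption, not the code used above: `row_cosine` is a hypothetical name, the small demo arrays are made-up data, and rows consisting only of zeros are not handled.

```python
import numpy as np

def row_cosine(a, b):
    """Cosine similarity between corresponding rows of two 2-D arrays of equal shape."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape:
        raise ValueError("a and b must have the same shape")
    dot = np.sum(a * b, axis=1)                                    # row-wise A . B
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)  # row-wise ||A|| * ||B||
    return dot / norms  # assumes no row is all zeros

# Made-up demo data: two rows of two-dimensional vectors
demo_query = np.array([[2.0, 1.0], [1.0, 0.0]])
demo_db = np.array([[1.0, 1.0], [0.0, 1.0]])
demo_result = row_cosine(demo_query, demo_db)
print(demo_result)
```

The result is a one-dimensional array with one value per row, so passing it to `np.savetxt('cosResult.txt', demo_result)` would write one number per line, similar to the file-writing loop above.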