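Looking at user reviews, we observe that attribute words usually appear together with sentiment words. The reason is that users tend to express a sentiment while describing an attribute, so the attribute is the target of that sentiment. We also observe that attribute words and domain-specific sentiment words are almost always nouns or adjectives (including predicative adjectives, tagged 形谓词).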
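To make this observation concrete, here is a minimal sketch that groups the nouns and adjectives of a tagged review by clause. The `word/tag` token format and the tag names 名词, 形容词, 形谓词 and 标点 are taken from the extraction script in this article. The helper name `candidates_by_clause`, the sample sentence and its adverb tag 副词 are made up for illustration.

```python
# Hypothetical helper, not part of the original script.
NOUN = '名词'
ADJECTIVES = ('形容词', '形谓词')
PUNCT = '标点'


def candidates_by_clause(tagged_line):
    """Return one (nouns, adjectives) pair per punctuation-delimited clause."""
    clauses = []
    nouns, adjectives = [], []
    for token in tagged_line.split():
        # Split on the last slash so the tag is always the final part.
        word, _, tag = token.rpartition('/')
        if tag == PUNCT:
            # A punctuation mark closes the current clause.
            if nouns or adjectives:
                clauses.append((nouns, adjectives))
            nouns, adjectives = [], []
        elif tag == NOUN:
            nouns.append(word)
        elif tag in ADJECTIVES:
            adjectives.append(word)
    if nouns or adjectives:
        clauses.append((nouns, adjectives))
    return clauses


# Made-up sample, roughly "screen very clear, battery not durable".
sample = '屏幕/名词 很/副词 清晰/形容词 ,/标点 电池/名词 不/副词 耐用/形容词'
for nouns, adjectives in candidates_by_clause(sample):
    print(nouns, adjectives)
```

In this sample each clause contains exactly one noun and one adjective, which is the co-occurrence pattern the extraction relies on.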
Algorithm flowchart: (image not available)
Review data: (image not available)
The code is as follows:
```python
# -*- coding: utf-8 -*-
# Purpose: given some Chinese product reviews, find the evaluation
# targets (attribute words) and the opinion words that describe them.
# Author: licl
#
# Input: one review per line, tokens written as word/tag and separated
# by spaces, with the tags 名词 (noun), 形容词 (adjective),
# 形谓词 (predicative adjective) and 标点 (punctuation).
# Assumption: both files are UTF-8 encoded.

NEGATIONS = ('不', '不怎么样', '不怎么', '不太')

try:
    with open('jd_dfb_comments_out.txt', 'r', encoding='utf-8') as fdata, \
         open('pattern_result.txt', 'a', encoding='utf-8') as output:
        for line in fdata:
            # Turn "word/tag word/tag ..." into [word, tag, word, tag, ...].
            # strip() keeps the trailing newline out of the last tag.
            tokens = line.strip().replace(' ', '/').split('/')
            i = 1  # odd indices hold tags; the word sits at i - 1
            while i < len(tokens):
                if tokens[i] != '名词':
                    i += 2
                    continue
                # Found a noun: treat it as a candidate attribute word.
                attribute = tokens[i - 1]
                a = i - 1
                negation = ''
                i += 2
                # Scan forward for an adjective inside the same clause.
                while i < len(tokens):
                    if tokens[i] == '标点':
                        i += 2  # clause ended without an opinion word
                        break
                    if tokens[i - 1] in NEGATIONS:
                        negation = tokens[i - 1]
                    if tokens[i] in ('形容词', '形谓词'):
                        opinion = negation + tokens[i - 1]
                        b = i - 1
                        distance = (b - a) // 2  # tokens between the two words
                        output.write(' '.join([attribute, opinion, str(distance)]) + '\n')
                        break
                    i += 2
except OSError:
    print('The file does not exist or cannot be opened')
```
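For experimenting without input or output files, the same pairing logic can be wrapped in a function. This is a sketch under stated assumptions rather than the author's code: the name `extract_pairs`, the tuple return value, and the sample line (including its adverb tag 副词) are all hypothetical.

```python
# Hypothetical refactoring of the pairing loop for in-memory use.
NEGATIONS = ('不', '不怎么样', '不怎么', '不太')


def extract_pairs(tagged_line):
    """Return (attribute, opinion, distance) tuples found in one tagged line."""
    tokens = tagged_line.strip().replace(' ', '/').split('/')
    pairs = []
    i = 1  # odd indices hold tags; the word sits at i - 1
    while i < len(tokens):
        if tokens[i] != '名词':
            i += 2
            continue
        attribute, start = tokens[i - 1], i - 1
        negation = ''
        i += 2
        while i < len(tokens):
            if tokens[i] == '标点':
                i += 2  # clause ended without an opinion word
                break
            if tokens[i - 1] in NEGATIONS:
                negation = tokens[i - 1]
            if tokens[i] in ('形容词', '形谓词'):
                distance = (i - 1 - start) // 2
                pairs.append((attribute, negation + tokens[i - 1], distance))
                break
            i += 2
    return pairs


# Made-up sample, roughly "screen very clear, battery not durable".
sample = '屏幕/名词 很/副词 清晰/形容词 ,/标点 电池/名词 不/副词 耐用/形容词'
print(extract_pairs(sample))
# [('屏幕', '清晰', 2), ('电池', '不耐用', 2)]
```

Each tuple holds the attribute word, the opinion word with any negation prefixed, and the number of tokens separating the two.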
Original link: https://blog.csdn.net/m53931422/article/details/41042791