背景:
为了满足各个平台间数据的传输,以及能确保历史性和实时性。先选用kafka作为不同平台数据传输的中转站,来满足我们对跨平台数据发送与接收的需要。
kafka简介:
kafka is a distributed,partitioned,replicated commit logservice。它提供了类似于jms的特性,但是在设计实现上完全不同,此外它并不是jms规范的实现。kafka对消息保存时根据topic进行归类,发送消息者成为producer,消息接受者成为consumer,此外kafka集群有多个kafka实例组成,每个实例(server)成为broker。无论是kafka集群,还是producer和consumer都依赖于zookeeper来保证系统可用性集群保存一些meta信息。
总之:kafka做为中转站有以下功能:
1.生产者(产生数据或者说是从外部接收数据)
2.消费着(将接收到的数据转花为自己所需用的格式)
环境:
1.python3.5.x
2.kafka1.4.3
3.pandas
准备开始:
1.kafka的安装
1
|
pip install kafka - python |
2.检验kafka是否安装成功
3.pandas的安装
1
|
pip install pandas |
4.kafka数据的传输
直接撸代码:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
|
# -*- coding: utf-8 -*- ''' @author: 真梦行路 @file: kafka.py @time: 2018/9/3 10:20 ''' import sys import json import pandas as pd import os from kafka import kafkaproducer from kafka import kafkaconsumer from kafka.errors import kafkaerror kafaka_host = "xxx.xxx.x.xxx" #服务器端口地址 kafaka_port = 9092 #端口号 kafaka_topic = "topic0" #topic data = pd.read_csv(os.getcwd() + '\\data\\1.csv' ) key_value = data.to_json() class kafka_producer(): ''' 生产模块:根据不同的key,区分消息 ''' def __init__( self , kafkahost, kafkaport, kafkatopic, key): self .kafkahost = kafkahost self .kafkaport = kafkaport self .kafkatopic = kafkatopic self .key = key self .producer = kafkaproducer(bootstrap_servers = '{kafka_host}:{kafka_port}' . format ( kafka_host = self .kafkahost, kafka_port = self .kafkaport) ) def sendjsondata( self , params): try : parmas_message = params #注意dumps producer = self .producer producer.send( self .kafkatopic, key = self .key, value = parmas_message.encode( 'utf-8' )) producer.flush() except kafkaerror as e: print (e) class kafka_consumer(): def __init__( self , kafkahost, kafkaport, kafkatopic, groupid,key): self .kafkahost = kafkahost self .kafkaport = kafkaport self .kafkatopic = kafkatopic self .groupid = groupid self .key = key self .consumer = kafkaconsumer( self .kafkatopic, group_id = self .groupid, bootstrap_servers = '{kafka_host}:{kafka_port}' . format ( kafka_host = self .kafkahost, kafka_port = self .kafkaport) ) def consume_data( self ): try : for message in self .consumer: yield message except keyboardinterrupt as e: print (e) def sorteddictvalues(adict): items = adict.items() items = sorted (items,reverse = false) return [value for key, value in items] def main(xtype, group, key): ''' 测试consumer和producer ''' if xtype = = "p" : # 生产模块 producer = kafka_producer(kafaka_host, kafaka_port, kafaka_topic, key) print ( "===========> producer:" , producer) params = key_value producer.sendjsondata(params) if xtype = = 'c' : # 消费模块 consumer = kafka_consumer(kafaka_host, kafaka_port, kafaka_topic, group,key) print ( "===========> consumer:" , consumer) message = consumer.consume_data() for msg in message: msg = msg.value.decode( 'utf-8' ) python_data = json.loads(msg) ##这是一个字典 key_list = list (python_data) test_data = pd.dataframe() for index in key_list: print (index) if index = = 'month' : a1 = python_data[index] data1 = sorteddictvalues(a1) test_data[index] = data1 else : a2 = python_data[index] data2 = sorteddictvalues(a2) test_data[index] = data2 print (test_data) # print('value---------------->', python_data) # print('msg---------------->', msg) # print('key---------------->', msg.kry) # print('offset---------------->', msg.offset) if __name__ = = '__main__' : main(xtype = 'p' ,group = 'py_test' ,key = none) main(xtype = 'c' ,group = 'py_test' ,key = none) |
数据1.csv如下所示:
几点注意:
1、一定要有一个服务器的端口地址,不要用本机的ip或者乱写一个ip不然程序会报错。(我开始就是拿本机ip怼了半天,总是报错)
2、注意数据的传输格式以及编码问题(二进制传输),数据先转成json数据格式传输,然后将json格式转为需要格式。(不是json格式的注意dumps)
例中,dataframe->json->dataframe
3、例中dict转dataframe,也可以用简单方法直接转。
eg: type(data) ==>dict,data=pd.dateframe(data)
以上这篇在python环境下运用kafka对数据进行实时传输的方法就是小编分享给大家的全部内容了,希望能给大家一个参考,也希望大家多多支持服务器之家。
原文链接:https://blog.csdn.net/qq_27280237/article/details/82256752