Create the test data:
    import pandas as pd

    # Create two DataFrames with the same columns
    data1 = {'subject': ['semester1', 'semester2', 'semester3', 'semester4',
                         'semester1', 'semester2', 'semester3'],
             'score': [62, 47, 55, 74, 31, 77, 85]}
    data2 = {'subject': ['semester1', 'semester2', 'semester3', 'semester4'],
             'score': [90, 47, 85, 74]}
    df1 = pd.DataFrame(data1, columns=['subject', 'score'])
    df2 = pd.DataFrame(data2, columns=['subject', 'score'])
    print(df1)
    print(df2)
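Intersection of two DataFrames
When no key is given, pd.merge joins on every column the two frames share (here subject and score), so an inner join keeps only the rows that appear in both.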
    intersected_df = pd.merge(df1, df2, how='inner')
    print(intersected_df)
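To check what the inner merge keeps, the sketch below rebuilds the test data and collects the result rows into a set of (subject, score) tuples, so the check does not depend on row order. It assumes a recent pandas; the name rows is illustrative only. The expected result is three rows: (semester2, 47), (semester3, 85) and (semester4, 74).

```python
import pandas as pd

# Rebuild the test data from above
df1 = pd.DataFrame({
    'subject': ['semester1', 'semester2', 'semester3', 'semester4',
                'semester1', 'semester2', 'semester3'],
    'score': [62, 47, 55, 74, 31, 77, 85],
})
df2 = pd.DataFrame({
    'subject': ['semester1', 'semester2', 'semester3', 'semester4'],
    'score': [90, 47, 85, 74],
})

# No `on` argument: merge uses all shared columns ('subject', 'score') as keys
intersected_df = pd.merge(df1, df2, how='inner')

# Collect rows as plain tuples so the comparison ignores row order
rows = set(intersected_df.itertuples(index=False, name=None))
print(sorted(rows))
```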
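You can also name the key columns explicitly with on. Keep in mind that this joins on those columns rather than intersecting whole rows: with on=['subject'], every df1 row is paired with every df2 row of the same subject, and the two score columns are kept side by side as score_x and score_y.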
    intersected_df = pd.merge(df1, df2, on=['subject'], how='inner')
    print(intersected_df)
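A short sketch of what changes when the key is restricted to subject, assuming pandas' default merge suffixes ('_x', '_y'). The names joined and same_score are illustrative. Filtering on equal scores shows how to get back to a row-level intersection from the keyed join.

```python
import pandas as pd

df1 = pd.DataFrame({
    'subject': ['semester1', 'semester2', 'semester3', 'semester4',
                'semester1', 'semester2', 'semester3'],
    'score': [62, 47, 55, 74, 31, 77, 85],
})
df2 = pd.DataFrame({
    'subject': ['semester1', 'semester2', 'semester3', 'semester4'],
    'score': [90, 47, 85, 74],
})

# Keying on 'subject' alone pairs each df1 row with every df2 row of that
# subject; the overlapping 'score' column becomes score_x (df1) and score_y (df2)
joined = pd.merge(df1, df2, on=['subject'], how='inner')
print(joined.columns.tolist())  # ['subject', 'score_x', 'score_y']
print(len(joined))              # 7: semester1-3 match twice each, semester4 once

# Keeping only rows whose scores agree recovers the rows shared by both frames
same_score = joined[joined['score_x'] == joined['score_y']]
print(len(same_score))          # 3
```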
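Set difference
The trick is to stack the frame to keep with two copies of the frame to subtract, then call drop_duplicates(keep=False), which discards every row that occurs more than once. Each row of the subtracted frame then appears at least twice, so it is dropped together with any identical row of the first frame, and only rows unique to the first frame remain. One side effect: a row that is duplicated within the first frame is dropped as well, even if the other frame does not contain it.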
df2-df1:
    set_diff_df = pd.concat([df2, df1, df1]).drop_duplicates(keep=False)
    print(set_diff_df)
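The concat/drop_duplicates trick can be wrapped in a small helper. concat_difference below is a hypothetical name, not a pandas function. The second half of the sketch shows the side effect of keep=False: a row repeated inside the left frame disappears even when the right frame does not contain it (dup_left is made-up illustrative data).

```python
import pandas as pd

def concat_difference(left, right):
    """Rows of `left` that are not in `right` (hypothetical helper).

    `right` is stacked twice so each of its rows occurs at least twice and is
    removed by drop_duplicates(keep=False), together with matching `left` rows.
    """
    return pd.concat([left, right, right]).drop_duplicates(keep=False)

df1 = pd.DataFrame({
    'subject': ['semester1', 'semester2', 'semester3', 'semester4',
                'semester1', 'semester2', 'semester3'],
    'score': [62, 47, 55, 74, 31, 77, 85],
})
df2 = pd.DataFrame({
    'subject': ['semester1', 'semester2', 'semester3', 'semester4'],
    'score': [90, 47, 85, 74],
})

# df2 - df1: only the (semester1, 90) row is unique to df2
diff_21 = concat_difference(df2, df1)
print(diff_21)

# Caveat: a row repeated inside `left` is dropped even though `right` lacks it
dup_left = pd.DataFrame({'subject': ['semester5', 'semester5'], 'score': [60, 60]})
print(concat_difference(dup_left, df2).empty)  # True
```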
df1-df2:
    set_diff_df = pd.concat([df1, df2, df2]).drop_duplicates(keep=False)
    print(set_diff_df)
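Because drop_duplicates(keep=False) also discards rows that are repeated within df1 itself, a different technique may be preferable when such duplicates must survive: a left anti-join. Merging with how='left' and indicator=True adds a '_merge' column marking each df1 row as 'both' or 'left_only', and keeping the 'left_only' rows gives df1 - df2. This is a swapped-in alternative, not the original author's method; merged and diff_12 are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({
    'subject': ['semester1', 'semester2', 'semester3', 'semester4',
                'semester1', 'semester2', 'semester3'],
    'score': [62, 47, 55, 74, 31, 77, 85],
})
df2 = pd.DataFrame({
    'subject': ['semester1', 'semester2', 'semester3', 'semester4'],
    'score': [90, 47, 85, 74],
})

# Left join on all shared columns; indicator=True adds a '_merge' column
# that is 'left_only' for df1 rows with no matching row in df2
merged = df1.merge(df2, how='left', indicator=True)
diff_12 = merged[merged['_merge'] == 'left_only'].drop(columns='_merge')

# Expected rows, in df1 order: semester1/62, semester3/55, semester1/31, semester2/77
print(diff_12)
```

A left join keeps df1's row order, and unmatched rows are never multiplied, so a row repeated in df1 would appear as many times in the result as it does in df1.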
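Another way to compute the difference, again taking df1 - df2 as the example, is to build the stacked frame first and then name the columns to compare explicitly with subset: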
    # Stack df1 on top of two copies of df2
    # (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
    stacked = pd.concat([df1, df2, df2])
    set_diff_df = stacked.drop_duplicates(subset=['subject', 'score'], keep=False)
    print(set_diff_df)
This gives the same df1 - df2 result as the first method.
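The claim that both df1 - df2 approaches agree can be checked with DataFrame.equals, which compares shape, index, dtypes and values. The sketch assumes a recent pandas; first and second are illustrative names.

```python
import pandas as pd

df1 = pd.DataFrame({
    'subject': ['semester1', 'semester2', 'semester3', 'semester4',
                'semester1', 'semester2', 'semester3'],
    'score': [62, 47, 55, 74, 31, 77, 85],
})
df2 = pd.DataFrame({
    'subject': ['semester1', 'semester2', 'semester3', 'semester4'],
    'score': [90, 47, 85, 74],
})

# First approach: duplicates judged on all columns
first = pd.concat([df1, df2, df2]).drop_duplicates(keep=False)
# Second approach: same stack, comparison columns named via subset
second = pd.concat([df1, df2, df2]).drop_duplicates(
    subset=['subject', 'score'], keep=False)

print(first.equals(second))  # True
```

Since subset lists every column here, the two calls are equivalent; subset only makes a difference when the frames carry extra columns that should be ignored when deciding what counts as a duplicate.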
Original article: https://blog.csdn.net/ljp1919/article/details/107165778/