pandas
The code is as follows:

    import pandas as pd
    import numpy as np

    salaries = pd.DataFrame({
        'name': ['BOSS', 'Lilei', 'Lilei', 'Han', 'BOSS', 'BOSS', 'Han', 'BOSS'],
        'Year': [2016, 2016, 2016, 2016, 2017, 2017, 2017, 2017],
        'Salary': [1, 2, 3, 4, 5, 6, 7, 8],
        'Bonus': [2, 2, 2, 2, 3, 4, 5, 6]
    })

    print(salaries)

    # keep='first': the first occurrence is not flagged, later repeats are
    print(salaries['Bonus'].duplicated(keep='first'))
    print(salaries[salaries['Bonus'].duplicated(keep='first')].index)
    print(salaries[salaries['Bonus'].duplicated(keep='first')])

    # keep='last': the last occurrence is not flagged, earlier repeats are
    print(salaries['Bonus'].duplicated(keep='last'))
    print(salaries[salaries['Bonus'].duplicated(keep='last')].index)
    print(salaries[salaries['Bonus'].duplicated(keep='last')])
The output is as follows (recorded on an older pandas release; newer versions keep the columns in insertion order and print `Index` rather than `Int64Index`):

       Bonus  Salary  Year   name
    0      2       1  2016   BOSS
    1      2       2  2016  Lilei
    2      2       3  2016  Lilei
    3      2       4  2016    Han
    4      3       5  2017   BOSS
    5      4       6  2017   BOSS
    6      5       7  2017    Han
    7      6       8  2017   BOSS
    0    False
    1     True
    2     True
    3     True
    4    False
    5    False
    6    False
    7    False
    Name: Bonus, dtype: bool
    Int64Index([1, 2, 3], dtype='int64')
       Bonus  Salary  Year   name
    1      2       2  2016  Lilei
    2      2       3  2016  Lilei
    3      2       4  2016    Han
    0     True
    1     True
    2     True
    3    False
    4    False
    5    False
    6    False
    7    False
    Name: Bonus, dtype: bool
    Int64Index([0, 1, 2], dtype='int64')
       Bonus  Salary  Year   name
    0      2       1  2016   BOSS
    1      2       2  2016  Lilei
    2      2       3  2016  Lilei
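The two runs above differ only in the `keep` argument. As a small supplementary sketch (not part of the original post), `keep=False` flags every member of a repeated group, and `value_counts()` reports how often each value occurs. The names `bonus`, `all_dupes`, `counts`, and `repeated` are illustrative only.

```python
import pandas as pd

# Same 'Bonus' column as in the example above.
bonus = pd.Series([2, 2, 2, 2, 3, 4, 5, 6], name='Bonus')

# keep=False flags every occurrence of a repeated value, including the first.
all_dupes = bonus.duplicated(keep=False)
print(all_dupes.tolist())

# value_counts() counts occurrences; keep only values seen more than once.
counts = bonus.value_counts()
repeated = counts[counts > 1]
print(repeated.to_dict())
```

Here only the value 2 is repeated, so the first four rows are flagged and `repeated` holds a single entry.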
Without pandas
For NumPy, the main ways to do this are as follows.
Suppose we have the array
a = np.array([1, 2, 1, 3, 3, 3, 0])
and we want to find the duplicated values, [1 3].
The options are:
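Method 1

    # Mark the first occurrence of each value, then select everything else.
    m = np.zeros_like(a, dtype=bool)
    m[np.unique(a, return_index=True)[1]] = True
    a[~m]  # array([1, 3, 3]): every occurrence after the first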
Method 2 (the same idea as a single expression; on recent NumPy versions `np.isin` is the preferred spelling of `np.in1d`)

    a[~np.in1d(np.arange(len(a)), np.unique(a, return_index=True)[1], assume_unique=True)]
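Method 3 (note that `a` is not actually unique, so passing `assume_unique=True` breaks the documented precondition and the result may differ between NumPy versions)

    np.setxor1d(a, np.unique(a), assume_unique=True)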
Method 4

    # Count how often each unique value occurs and keep those seen more than once.
    u, i = np.unique(a, return_inverse=True)
    u[np.bincount(i) > 1]  # array([1, 3])
Method 5

    # After sorting, a value equal to its right-hand neighbour is a repeat.
    s = np.sort(a, axis=None)
    s[:-1][s[1:] == s[:-1]]  # array([1, 3, 3])
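When only the distinct duplicated values are wanted ([1 3] for this array), one further option, offered here as a hedged sketch rather than something taken from the original post, is `np.unique` with `return_counts=True`. The names `values`, `counts`, and `dupes` are illustrative.

```python
import numpy as np

a = np.array([1, 2, 1, 3, 3, 3, 0])

# return_counts=True yields, for each sorted unique value, how often it occurs.
values, counts = np.unique(a, return_counts=True)

# Keep only the values that occur more than once.
dupes = values[counts > 1]
print(dupes)  # [1 3]
```

This returns each duplicated value once, whereas approaches based on first-occurrence indices or neighbour comparison return one entry per extra occurrence.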
Reference: https://stackoverflow.com/questions/11528078/determining-duplicate-values-in-an-array
Original article: https://blog.csdn.net/hguo11/article/details/82556171