LabelEncoder assigns each label an integer code between 0 and n_classes-1.
Assigning consecutive integer codes to a set of labels:
    >>> from sklearn import preprocessing
    >>> le = preprocessing.LabelEncoder()
    >>> le.fit([1, 2, 2, 6])
    LabelEncoder()
    >>> le.classes_
    array([1, 2, 6])
    >>> le.transform([1, 1, 2, 6])  # Transform categories into integers
    array([0, 0, 1, 2], dtype=int64)
    >>> le.inverse_transform([0, 0, 1, 2])  # Transform integers into categories
    array([1, 1, 2, 6])
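As a rough illustration of what the encoder computes, each code is the position of the label in the sorted array of unique labels. The sketch below reproduces the same result with NumPy's `np.unique(..., return_inverse=True)`. It is an equivalent computation shown for illustration only, not necessarily how scikit-learn implements it internally.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

labels = [1, 2, 2, 6]

# Sorted unique labels, plus the index of each input label in that sorted array
classes, codes = np.unique(labels, return_inverse=True)

le = LabelEncoder()
encoded = le.fit_transform(labels)

print(classes)   # sorted unique labels
print(codes)     # position of each label within `classes`
print(encoded)   # codes produced by LabelEncoder for the same input
```

Because the codes follow sort order, they carry no meaning beyond identity: label 6 gets code 2 only because it is the third-smallest distinct value.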
The same works for string labels:

    >>> le = preprocessing.LabelEncoder()
    >>> le.fit(["paris", "paris", "tokyo", "amsterdam"])
    LabelEncoder()
    >>> list(le.classes_)
    ['amsterdam', 'paris', 'tokyo']
    >>> le.transform(["tokyo", "tokyo", "paris"])  # Transform categories into integers
    array([2, 2, 1], dtype=int64)
    >>> list(le.inverse_transform([2, 2, 1]))  # Transform integers into categories
    ['tokyo', 'tokyo', 'paris']
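One consequence worth keeping in mind: a fitted encoder only knows the labels it saw during `fit`. The sketch below shows that `transform` raises a `ValueError` for a label outside that set. The label `"berlin"` is an assumption added for illustration and does not appear in the original example.

```python
from sklearn.preprocessing import LabelEncoder

le = LabelEncoder()
le.fit(["paris", "paris", "tokyo", "amsterdam"])

try:
    # "berlin" was not present when the encoder was fitted
    le.transform(["tokyo", "berlin"])
    unseen_error = None
except ValueError as exc:
    unseen_error = exc

print(type(unseen_error).__name__)
print(unseen_error)
```

In practice this means the encoder should be fitted on the full set of labels that may later be transformed.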
Converting all ID labels in a DataFrame to consecutive codes, using one encoder shared by every column:
    from sklearn.preprocessing import LabelEncoder
    import numpy as np
    import pandas as pd

    df = pd.read_csv('testdata.csv', sep='|', header=None)
        0   1   2   3   4   5
    0  37  52  55  50  38  54
    1  17  32  20   9   6  48
    2  28  10  56  51  45  16
    3  27  49  41  30  53  19
    4  44  29   8   1  46  13
    5  11  26  21  14   7  33
    6   0  39  22  33  35  43
    7  18  15  47   5  25  34
    8  23   2   4   9   3  31
    9  12  57  36  40  42  24
    le = LabelEncoder()
    le.fit(np.unique(df.values))
    df.apply(le.transform)
        0   1   2   3   4   5
    0  37  52  55  50  38  54
    1  17  32  20   9   6  48
    2  28  10  56  51  45  16
    3  27  49  41  30  53  19
    4  44  29   8   1  46  13
    5  11  26  21  14   7  33
    6   0  39  22  33  35  43
    7  18  15  47   5  25  34
    8  23   2   4   9   3  31
    9  12  57  36  40  42  24
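The shared-encoder idea is easier to see when the IDs are strings. The following is a minimal sketch, assuming a small hypothetical DataFrame `ids` with columns `src` and `dst` (neither comes from the original data). Fitting once on all values gives the same ID the same code in every column.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical data: the same ID may appear in more than one column
ids = pd.DataFrame({
    "src": ["u7", "u3", "u9"],
    "dst": ["u3", "u9", "u1"],
})

le = LabelEncoder()
le.fit(np.unique(ids.values))  # one vocabulary shared by all columns

# Wrap each result in a Series so the original row index is kept
encoded = ids.apply(lambda col: pd.Series(le.transform(col), index=col.index))

print(list(le.classes_))
print(encoded)
```

Here `"u3"` and `"u9"` receive the same code whether they appear in `src` or `dst`, which is the point of fitting a single encoder.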
Converting the ID labels of each DataFrame column separately to consecutive codes (a fresh encoder per column):
    import pandas as pd
    from sklearn.preprocessing import LabelEncoder


    class MultiColumnLabelEncoder:
        def __init__(self, columns=None):
            self.columns = columns  # list of column names to encode

        def fit(self, X, y=None):
            return self  # nothing to learn here; encoders are fitted in transform

        def transform(self, X):
            '''
            Transforms columns of X specified in self.columns using
            LabelEncoder(). If no columns specified, transforms all
            columns in X.
            '''
            output = X.copy()
            if self.columns is not None:
                for col in self.columns:
                    output[col] = LabelEncoder().fit_transform(output[col])
            else:
                # items() yields (column name, Series) pairs
                for colname, col in output.items():
                    output[colname] = LabelEncoder().fit_transform(col)
            return output

        def fit_transform(self, X, y=None):
            return self.fit(X, y).transform(X)
    MultiColumnLabelEncoder(columns=[0, 1, 2, 3, 4, 5]).fit_transform(df)
Or:
    df.apply(LabelEncoder().fit_transform)
       0  1  2  3  4  5
    0  8  8  8  7  5  9
    1  3  5  2  2  1  8
    2  7  1  9  8  7  1
    3  6  7  6  4  9  2
    4  9  4  1  0  8  0
    5  1  3  3  3  2  5
    6  0  6  4  5  4  7
    7  4  2  7  1  3  6
    8  5  0  0  2  0  4
    9  2  9  5  6  6  3
    # Create some toy data in a Pandas dataframe
    fruit_data = pd.DataFrame({
        'fruit': ['apple', 'orange', 'pear', 'orange'],
        'color': ['red', 'orange', 'green', 'green'],
        'weight': [5, 6, 3, 4]
    })
        color   fruit  weight
    0     red   apple       5
    1  orange  orange       6
    2   green    pear       3
    3   green  orange       4
    MultiColumnLabelEncoder(columns=['fruit', 'color']).fit_transform(fruit_data)
Or:
    fruit_data[['fruit', 'color']] = fruit_data[['fruit', 'color']].apply(LabelEncoder().fit_transform)
       color  fruit  weight
    0      2      0       5
    1      1      1       6
    2      0      2       3
    3      0      1       4
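Both per-column approaches above create throwaway encoders, so the mapping from codes back to labels is not kept. If the original labels are needed later, one option is to hold on to a fitted encoder per column. The sketch below assumes a plain dictionary named `encoders` for this purpose; that name and structure are illustrative choices, not part of the original code.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

fruit_data = pd.DataFrame({
    "fruit": ["apple", "orange", "pear", "orange"],
    "color": ["red", "orange", "green", "green"],
    "weight": [5, 6, 3, 4],
})

encoders = {}  # column name -> fitted LabelEncoder (hypothetical helper structure)
encoded = fruit_data.copy()
for col in ["fruit", "color"]:
    encoders[col] = LabelEncoder()
    encoded[col] = encoders[col].fit_transform(fruit_data[col])

# Map the integer codes back to the original labels
restored = encoded.copy()
for col, le in encoders.items():
    restored[col] = le.inverse_transform(encoded[col])

print(encoded)
print(restored)
```

Keeping the encoders also allows the same mapping to be applied to new data with `encoders[col].transform(...)`, provided the new data contains no labels that were absent during fitting.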
Original link: https://blog.csdn.net/u010412858/article/details/78386407