前言
本文主要给大家介绍了关于python3中全角和半角字符转换的相关内容,分享出来供大家参考学习,下面话不多说了,来一起看看详细的介绍吧。
一、背景介绍
解决什么问题:快速方便的对文本进行全角半角自动转换
适用什么场景:学生答题数据中全角字符替换为半角字符
二、全角半角原理
全角即:Double Byte Character,简称DBC
半角即:Single Byte Character,简称SBC
在 windows 中,中文和全角字符都占两个字节,并且使用了 ascii chart 2 (codes 128–255);
全角字符的第一个字节总是被置为 163,而第二个字节则是相同半角字符码加上128(不包括空格,全角空格和半角空格也要考虑进去);
对于中文来说,它的第一个字节被置为大于163,如'阿'为:176 162,检测到中文时不进行转换。
例如:半角 a 为 65,则全角 a 是 163(第一个字节)、193(第二个字节,128+65)。
全角半角示例:(文本 test.txt 包含全角和半角字符)
1
2
3
4
5
6
|
F:\test>type test.txt 123456 123456 abcdefg abcdefg 中国你好 |
三、使用 Python3 实现全角半角转换
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
|
# -*- coding:utf-8 -*- # i@mail.chenpeng.info ”' 全角即:Double Byte Character,简称:DBC 半角即:Single Byte Character,简称:SBC ”' def DBC2SBC(ustring): ” ' 全角转半角 ”' rstring = “” for uchar in ustring: inside_code = ord (uchar) if inside_code = = 0x3000 : inside_code = 0x0020 else : inside_code - = 0xfee0 if not ( 0x0021 < = inside_code and inside_code < = 0x7e ): rstring + = uchar continue rstring + = chr (inside_code) return rstring def SBC2DBC(ustring): ” ' 半角转全角 ”' rstring = “” for uchar in ustring: inside_code = ord (uchar) if inside_code = = 0x0020 : inside_code = 0x3000 else : if not ( 0x0021 < = inside_code and inside_code < = 0x7e ): rstring + = uchar continue inside_code + = 0xfee0 rstring + = chr (inside_code) return rstring s = ”' array(‘0 ' => ‘0' , ‘1 ' => ‘1' , ‘2 ' => ‘2' , ‘3 ' => ‘3' , ‘4 ' => ‘4' , ‘5 ' => ‘5' , ‘6 ' => ‘6' , ‘7 ' => ‘7' , ‘8 ' => ‘8' , ‘9 ' => ‘9' , ‘A ' => ‘A' , ‘B ' => ‘B' , ‘C ' => ‘C' , ‘D ' => ‘D' , ‘E ' => ‘E' , ‘F ' => ‘F' , ‘G ' => ‘G' , ‘H ' => ‘H' , ‘I ' => ‘I' , ‘J ' => ‘J' , ‘K ' => ‘K' , ‘L ' => ‘L' , ‘M ' => ‘M' , ‘N ' => ‘N' , ‘O ' => ‘O' , ‘P ' => ‘P' , ‘Q ' => ‘Q' , ‘R ' => ‘R' , ‘S ' => ‘S' , ‘T ' => ‘T' , ‘U ' => ‘U' , ‘V ' => ‘V' , ‘W ' => ‘W' , ‘X ' => ‘X' , ‘Y ' => ‘Y' , ‘Z ' => ‘Z' , ‘a ' => ‘a' , ‘b ' => ‘b' , ‘c ' => ‘c' , ‘d ' => ‘d' , ‘e ' => ‘e' , ‘f ' => ‘f' , ‘g ' => ‘g' , ‘h ' => ‘h' , ‘i ' => ‘i' , ‘j ' => ‘j' , ‘k ' => ‘k' , ‘l ' => ‘l' , ‘m ' => ‘m' , ‘n ' => ‘n' , ‘o ' => ‘o' , ‘p ' => ‘p' , ‘q ' => ‘q' , ‘r ' => ‘r' , ‘s ' => ‘s' , ‘t ' => ‘t' , ‘u ' => ‘u' , ‘v ' => ‘v' , ‘w ' => ‘w' , ‘x ' => ‘x' , ‘y ' => ‘y' , ‘z ' => ‘z' , ‘( ' => ‘(‘, ‘)' = > ‘) ', ‘〔' = > ‘[‘, ‘〕 ' => ‘]' , ‘【' = > ‘[‘, ‘】 ' => ‘]' , ‘〖 ' => ‘[‘, ‘〗' = > ‘] ', ‘”‘ => ‘[‘, ‘”‘ => ‘]' , ‘\” = > ‘[‘, ‘\” = > ‘] ', ‘{' = > ‘{‘, ‘} ' => ‘}' , ‘《' = > ‘<‘, ‘》 ' => ‘>' , ‘% ' => ‘%' , ‘+ ' => ‘+' , ‘— ' => ‘-‘, ‘-' = > ‘ - ‘, ‘~' = > ‘ - ‘, ‘: ' => ‘:' , ‘。 ' => ‘.' , ‘、 ' => ‘,' , ‘, ' => ‘.' , ‘、 ' => ‘.' , ‘; ' => ‘,' , ‘? ' => ‘?' , ‘! ' => ‘!' , ‘… ' => ‘-‘, ‘‖' = > ‘|', ‘”‘ = > ‘”‘, ‘\” = > ‘` ', ‘\” => ‘`' , ‘| ' => ‘|' , ‘〃' = > ‘”‘, ‘ ' = > ‘ ‘); ”' # 全角转半角 print (DBC2SBC(s)) # 半角转全角 print (SBC2DBC(s)) s = ” '中文测试”' # 全角转半角 print (DBC2SBC(s)) # 半角转全角 print (SBC2DBC(s)) |
四、总结
以上就是这篇文章的全部内容了,希望本文的内容对大家的学习或者工作能带来一定的帮助,如果有疑问大家可以留言交流,谢谢大家对服务器之家的支持。
五、参考资料
http://thinkerou.com/2015-06/covert-dbc-sbc/
原文链接:http://chenpeng.info/html/3795