Advanced Python: Generators, the Lazy Version of Iterators, Explained

Starting with containers and iterables

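All containers are iterable. Calling iter() on an iterable returns an iterator, an object that implements a __next__ method; the built-in next() function calls that method to fetch one element at a time, which is how traversal works.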

```python
def is_iterable(param):
    try:
        iter(param)  # raises TypeError if param is not iterable
        return True
    except TypeError:
        return False

params = [
    1234,
    '1234',
    [1, 2, 3, 4],
    set([1, 2, 3, 4]),
    {1: 1, 2: 2, 3: 3, 4: 4},
    (1, 2, 3, 4)
]

for param in params:
    print('{} is iterable? {}'.format(param, is_iterable(param)))

########## Output ##########
# 1234 is iterable? False
# 1234 is iterable? True
# [1, 2, 3, 4] is iterable? True
# {1, 2, 3, 4} is iterable? True
# {1: 1, 2: 2, 3: 3, 4: 4} is iterable? True
# (1, 2, 3, 4) is iterable? True
```

Apart from the number 1234, every data structure in this list (str, list, set, dict, tuple) is iterable.
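To make the iter()/next() relationship concrete, here is a minimal sketch of what a for loop does behind the scenes: call iter() once, then call next() until the iterator raises StopIteration. The helper name walk is hypothetical and only for illustration.

```python
def walk(iterable):
    """Collect elements the way a for loop does: iter() once, then next() until StopIteration."""
    it = iter(iterable)             # raises TypeError for non-iterables such as 1234
    items = []
    while True:
        try:
            items.append(next(it))  # fetch one element per call
        except StopIteration:       # the iterator signals that it is exhausted
            return items

print(walk('1234'))              # ['1', '2', '3', '4']
print(walk({1: 1, 2: 2, 3: 3}))  # [1, 2, 3]  (iterating a dict yields its keys)

it = iter([1, 2])
print(next(it), next(it))        # 1 2
print(next(it, 'done'))          # done  (the optional default replaces StopIteration)
```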

What is a generator

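A generator is a lazy version of an iterator: it does not produce a value until that value is requested. For example, compare the memory used by a list and by a generator over the same 100 million numbers: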

```python
import os
import psutil  # third-party package: pip install psutil

# Print how much memory the current Python process is using
def show_memory_info(hint):
    pid = os.getpid()
    p = psutil.Process(pid)
    info = p.memory_full_info()
    memory = info.uss / 1024. / 1024
    print('{} memory used: {} MB'.format(hint, memory))

def test_iterator():
    show_memory_info('initing iterator')
    # Despite the function name, this builds a full list of 100 million integers
    list_1 = [i for i in range(100000000)]
    show_memory_info('after iterator initiated')
    print(sum(list_1))
    show_memory_info('after sum called')

def test_generator():
    show_memory_info('initing generator')
    # A generator expression: values are produced one at a time, on demand
    list_2 = (i for i in range(100000000))
    show_memory_info('after generator initiated')
    print(sum(list_2))
    show_memory_info('after sum called')

# %time is an IPython/Jupyter magic; run these two lines in IPython or a notebook
%time test_iterator()
%time test_generator()

######### Output ##########
# initing iterator memory used: 48.9765625 MB
# after iterator initiated memory used: 3920.30078125 MB
# 4999999950000000
# after sum called memory used: 3920.3046875 MB
# Wall time: 17 s
# initing generator memory used: 50.359375 MB
# after generator initiated memory used: 50.359375 MB
# 4999999950000000
# after sum called memory used: 50.109375 MB
# Wall time: 12.5 s
```

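[i for i in range(100000000)] is a list comprehension, not an iterator: it builds a complete list and keeps all 100 million elements in memory at once, which takes an enormous amount of memory (about 3.9 GB in this run). (i for i in range(100000000)) instead creates a generator. As the output shows, the generator does not take up that memory: compared with test_iterator(), test_generator() skips materializing all 100 million elements up front, and in this run it also finished faster (12.5 s vs 17 s). Each value is computed only when next() is called (here, implicitly by sum()).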
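The same effect can be observed at a much smaller scale without psutil. The sketch below uses sys.getsizeof, which measures only the container object itself and not the integers it refers to, so the numbers are illustrative rather than a full memory profile; N = 100_000 is an arbitrary choice.

```python
import sys

N = 100_000  # arbitrary size for the demo

squares_list = [i * i for i in range(N)]  # all N values are computed and stored now
squares_gen = (i * i for i in range(N))   # nothing is computed yet

# The list's size grows with N; the generator object's size does not
print(sys.getsizeof(squares_list))
print(sys.getsizeof(squares_gen))

print(next(squares_gen))  # 0: first value computed on demand
print(next(squares_gen))  # 1
print(sum(squares_gen) == sum(squares_list[2:]))  # True: sum() pulls the remaining values
print(sum(squares_gen))   # 0: an exhausted generator yields nothing more
```

The last line shows a consequence of laziness worth remembering: a generator can be consumed only once.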

What can you do with generators?

There is an identity in mathematics, (1 + 2 + 3 + ... + n)^2 = 1^3 + 2^3 + 3^3 + ... + n^3, which the following code checks for n = 8 using two generators:

```python
def generator(k):
    i = 1
    while True:
        yield i ** k  # pause here and hand i ** k to the caller of next()
        i += 1        # resumes here on the following next() call

gen_1 = generator(1)
gen_3 = generator(3)
print(gen_1)
print(gen_3)

def get_sum(n):
    sum_1, sum_3 = 0, 0
    for i in range(n):
        next_1 = next(gen_1)
        next_3 = next(gen_3)
        print('next_1 = {}, next_3 = {}'.format(next_1, next_3))
        sum_1 += next_1
        sum_3 += next_3
    print(sum_1 * sum_1, sum_3)

get_sum(8)

########## Output ##########
# <generator object generator at 0x000001E70651C4F8>
# <generator object generator at 0x000001E70651C390>
# next_1 = 1, next_3 = 1
# next_1 = 2, next_3 = 8
# next_1 = 3, next_3 = 27
# next_1 = 4, next_3 = 64
# next_1 = 5, next_3 = 125
# next_1 = 6, next_3 = 216
# next_1 = 7, next_3 = 343
# next_1 = 8, next_3 = 512
# 1296 1296
```

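Calling generator() does not run its body; it returns a generator object. On the first next(gen) call the body runs until it reaches yield i ** k, where it pauses and hands i ** k back as the return value of next(). Each later call to next(gen) resumes the paused function right after the yield, and local state such as i is remembered, so i keeps incrementing. After eight calls, next_1 is 8 and next_3 is 512.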
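Working the n = 8 case by hand confirms the final line of output, and the closed form for the sum of cubes shows why the two sums agree for every n:

```latex
\begin{aligned}
\Big(\sum_{i=1}^{8} i\Big)^2 &= 36^2 = 1296,\\
\sum_{i=1}^{8} i^3 &= 1 + 8 + 27 + 64 + 125 + 216 + 343 + 512 = 1296,\\
\text{in general:}\quad \sum_{i=1}^{n} i^3 &= \Big(\frac{n(n+1)}{2}\Big)^2 = \Big(\sum_{i=1}^{n} i\Big)^2.
\end{aligned}
```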

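Looking closely at the generator example, note the difference from a container: a list is always a finite collection, whereas a generator can represent an infinite sequence. Each call to next() makes the generator compute a new element on demand and return it to you, which is very convenient.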
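Because generator(k) never stops on its own, calling list() on it would run forever; itertools.islice is a safe way to take a finite prefix. The sketch below re-implements the identity check with fresh generators on every call (powers and check_identity are hypothetical names). This also sidesteps a quirk of get_sum: it uses the module-level gen_1 and gen_3, so a second call would continue from where the first one stopped.

```python
from itertools import islice

def powers(k):
    """Infinite generator of 1**k, 2**k, 3**k, ... (same idea as generator(k))."""
    i = 1
    while True:
        yield i ** k
        i += 1

def check_identity(n):
    """Return ((1 + ... + n)^2, 1^3 + ... + n^3) using fresh generators each time."""
    total_1 = sum(islice(powers(1), n))  # take only the first n values
    total_3 = sum(islice(powers(3), n))
    return total_1 ** 2, total_3

print(check_identity(8))  # (1296, 1296)
print(check_identity(8))  # (1296, 1296) again: no state leaks between calls
```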

Next problem: given a list and a target number, find every position at which the number appears in the list:

```python
# Conventional version: collect all matches in a list
def index_normal(L, target):
    result = []
    for i, num in enumerate(L):
        if num == target:
            result.append(i)
    return result

print(index_normal([1, 6, 2, 4, 5, 2, 8, 6, 3, 2], 2))

########## Output ##########
# [2, 5, 9]

# Generator version: yield each match as soon as it is found
def index_generator(L, target):
    for i, num in enumerate(L):
        if num == target:
            yield i

print(list(index_generator([1, 6, 2, 4, 5, 2, 8, 6, 3, 2], 2)))

######### Output ##########
# [2, 5, 9]
```
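One practical advantage of the generator version is that the caller decides how much work is done. In this sketch, asking for only the first match stops the scan as soon as it is found, asking again resumes from there, and the default argument of next() handles the no-match case without an exception:

```python
def index_generator(L, target):
    for i, num in enumerate(L):
        if num == target:
            yield i

data = [1, 6, 2, 4, 5, 2, 8, 6, 3, 2]

positions = index_generator(data, 2)
print(next(positions))                       # 2: scanning stops at the first match
print(next(positions))                       # 5: resumes right after the previous match
print(next(index_generator(data, 7), None))  # None: no 7 in the list
```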

One more example:

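Checking for a subsequence: given two sequences a and b (for example, two strings), determine whether a is a subsequence of b. A subsequence is a sequence whose elements all appear in the other sequence in the same order, though not necessarily next to each other.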

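Algorithm: place one pointer at the head of each sequence. Advance the pointer in b one step at a time; whenever the element it points to equals the current element of a, also advance the pointer in a. If the pointer in a reaches the end, a is a subsequence of b; if b runs out first, it is not.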
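A direct transcription of this two-pointer description might look like the sketch below (is_subsequence_two_pointers is a hypothetical name, not from the original article). It works on lists and strings alike:

```python
def is_subsequence_two_pointers(a, b):
    i, j = 0, 0                       # i walks a, j walks b
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            i += 1                    # matched a[i]; look for the next element of a
        j += 1                        # the pointer in b always advances
    return i == len(a)                # True if every element of a was matched in order

print(is_subsequence_two_pointers([1, 3, 5], [1, 2, 3, 4, 5]))  # True
print(is_subsequence_two_pointers([1, 4, 3], [1, 2, 3, 4, 5]))  # False
```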

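With an iterator, the same check fits in two lines:

```python
def is_subsequence(a, b):
    b = iter(b)                     # a single iterator shared by every membership test
    return all(i in b for i in a)   # each `in` resumes where the previous one stopped

print(is_subsequence([1, 3, 5], [1, 2, 3, 4, 5]))
print(is_subsequence([1, 4, 3], [1, 2, 3, 4, 5]))

######### Output ##########
# True
# False
```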

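The code below is an expanded version of the one above that prints the intermediate objects to show what happens step by step: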

```python
def is_subsequence(a, b):
    b_iter = iter(b)
    print(b_iter)

    gen = (i for i in a)
    print(gen)
    for i in gen:
        print(i)

    gen = ((i in b_iter) for i in a)
    print(gen)
    for i in gen:
        print(i)

    # The loop above has already consumed b_iter; reusing it would make the
    # result wrong, so start again from a fresh iterator over b
    b_iter = iter(b)
    return all(((i in b_iter) for i in a))

print(is_subsequence([1, 3, 5], [1, 2, 3, 4, 5]))
print(is_subsequence([1, 4, 3], [1, 2, 3, 4, 5]))

########## Output ##########
# (memory addresses differ from run to run)
# <list_iterator object at 0x000001E7063D0E80>
# <generator object is_subsequence.<locals>.<genexpr> at 0x000001E70651C570>
# 1
# 3
# 5
# <generator object is_subsequence.<locals>.<genexpr> at 0x000001E70651C5E8>
# True
# True
# True
# True    <- result of the first call
# <list_iterator object at 0x000001E7063D0D30>
# <generator object is_subsequence.<locals>.<genexpr> at 0x000001E70651C5E8>
# 1
# 4
# 3
# <generator object is_subsequence.<locals>.<genexpr> at 0x000001E70651C570>
# True
# True
# False
# False   <- result of the second call
```

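First, iter(b) turns b into an iterator, so that it supports next() and each membership test continues from where the previous one stopped instead of rescanning b from the beginning. (i for i in a) creates a generator, and so does ((i in b) for i in a). When b is an iterator, the test (i in b) is roughly equivalent to: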

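```python
def contains(b, i):  # hypothetical name: mimics `i in b` when b is an iterator
    while True:
        try:
            val = next(b)    # advances the shared iterator
        except StopIteration:
            return False     # b exhausted: i not found
        if val == i:
            return True      # found; b now points just past val
```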

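This makes clever use of how iterators (generators included) behave: next() remembers the current position, so each membership test picks up where the previous one left off and no element of b is examined twice. For example: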

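```python
b = (i for i in range(5))
print(2 in b)  # consumes 0, 1, 2
print(4 in b)  # resumes at 3, consumes 3, 4
print(3 in b)  # b is already exhausted, so 3 is not found

########## Output ##########
# True
# True
# False
```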
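Since iter() accepts any iterable, the two-line is_subsequence also works on strings, which matches the original problem statement. The sketch below repeats that function and shows the saved position directly: after 'c' in it succeeds, next(it) continues with 'd'.

```python
def is_subsequence(a, b):
    b = iter(b)
    return all(i in b for i in a)

print(is_subsequence('ace', 'abcde'))  # True
print(is_subsequence('aec', 'abcde'))  # False: 'e' consumes b through 'e', so 'c' is gone

it = iter('abcde')
print('c' in it)  # True, consumes 'a', 'b', 'c'
print(next(it))   # d
print('a' in it)  # False: 'a' was already consumed; this also exhausts the iterator
```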