Fixing GPU memory that keeps growing in PyTorch

Adding the following two lines to your code can solve the problem:

    torch.backends.cudnn.enabled = True
    torch.backends.cudnn.benchmark = True

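Addendum: GPU memory that keeps increasing during PyTorch training

I previously ran into an out-of-memory problem that had me stuck for a long time. After trying many approaches, I finally solved it.

Here is a summary of the methods I tried: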

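1. Use torch.cuda.empty_cache()

Adding this line after every training epoch lets the next epoch start from a lower memory footprint, but it does not cure an out-of-memory problem: as the epochs go by, peak memory usage keeps climbing and the run still ends with an out of memory error.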
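The limitation described above follows from what empty_cache() actually does: it returns cached blocks that no tensor is using to the driver, but it cannot free memory that live tensors still hold. A minimal sketch of that distinction is shown here; the helper name report and the tensor sizes are made up for illustration, and without a CUDA device the script simply reports zeros.

```python
import torch

def report(tag):
    """Return (allocated, reserved) bytes on the current device; (0, 0) without CUDA."""
    if not torch.cuda.is_available():
        return 0, 0
    allocated = torch.cuda.memory_allocated()
    reserved = torch.cuda.memory_reserved()
    print(f"{tag}: allocated={allocated} reserved={reserved}")
    return allocated, reserved

device = "cuda" if torch.cuda.is_available() else "cpu"
kept = torch.ones(1024, 1024, device=device)     # still referenced, so it cannot be freed
scratch = torch.ones(1024, 1024, device=device)  # temporary tensor
del scratch                                      # its block goes back to PyTorch's cache

before = report("before empty_cache")
if torch.cuda.is_available():
    torch.cuda.empty_cache()                     # releases cached, unused blocks only
after = report("after empty_cache")
```

If the allocated figure itself grows from epoch to epoch, something is still holding references to tensors, and clearing the cache will not help.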

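2. Use torch.backends.cudnn.enabled = True and torch.backends.cudnn.benchmark = True

I am not sure how this is supposed to help. It is applied the same way as method 1, but it had almost no effect, so I dropped it.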

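3. Most important: check whether your forward function leaks memory.

forward often needs to call other helper functions, and that is where you need to be especially careful:

Try not to handle the input inside a for loop!

If a helper function uses append() or similar calls, use them sparingly; avoid them if you can!

Use lists sparingly in helper functions; avoid them if you can!

In short, helper functions are usually not complicated, so write them out directly and avoid piling up for loops, nesting, and extra variables!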
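One common way that append() and lists lead to growing memory is that the appended tensors still reference their autograd graph, so every stored item keeps a whole iteration's activations alive. The sketch below illustrates that pattern on a toy model; the model, the loss, and the list names are assumptions made for the example and are not taken from the original code.

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 1)
criterion = nn.MSELoss()

leaky_history = []  # holds tensors that still reference their autograd graph
safe_history = []   # holds plain Python floats

for step in range(3):
    x = torch.randn(8, 4)
    y = torch.randn(8, 1)
    loss = criterion(model(x), y)

    leaky_history.append(loss)        # keeps the graph and its saved tensors alive
    safe_history.append(loss.item())  # copies the value out, so the graph can be freed

print(all(t.grad_fn is not None for t in leaky_history))
print(all(isinstance(v, float) for v in safe_history))
```

When a list of tensors is really needed, storing t.detach() instead of t avoids holding on to the graph.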

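Addendum: an approach to tracking down ever-growing GPU memory in PyTorch

I have run into this problem twice, and each time it was a real struggle to solve.

Online you can find all sorts of seemingly different fixes for it, but none of them solved my problem. So I had to work it out myself, and along the way I arrived at a procedure for locating the source of the problem.

Below I walk through that procedure with an example.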

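General approach

The idea is simple: print GPU memory usage while the code runs, and watch for the place where usage jumps sharply or changes abnormally.

The key is to narrow down the problem level by level. Suppose there are three files: main.py, train.py, and model.py.

Start by locating the problem in main.py. Then step from main.py into train.py, print memory usage again, and pin down where the problem is.

Next, from the problem spot in train.py, step into model.py and confirm again.

If the call chain goes deeper, keep tracing in the same way.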
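The level-by-level measurement described above can be wrapped in a small helper so that each suspect block reports its own memory delta. This is only a sketch: allocated_bytes and track_memory are hypothetical names introduced for illustration, and the "leak" is simulated by keeping a tensor in a list.

```python
import contextlib
import torch

def allocated_bytes(device=0):
    """Bytes currently held by tensors on `device`, or 0 when CUDA is unavailable."""
    if torch.cuda.is_available():
        return torch.cuda.memory_allocated(device)
    return 0

@contextlib.contextmanager
def track_memory(tag, log, device=0):
    """Record how much allocated memory changed while the wrapped block ran."""
    start = allocated_bytes(device)
    try:
        yield
    finally:
        delta = allocated_bytes(device) - start
        log.append((tag, delta))
        print(f"{tag}: {delta:+d} bytes")

log = []
device = "cuda" if torch.cuda.is_available() else "cpu"
held = []
with track_memory("main.train_epoch", log):
    held.append(torch.zeros(256, 256, device=device))  # simulated leak: tensor stays referenced
with track_memory("main.evaluate", log):
    tmp = torch.zeros(256, 256, device=device)
    del tmp                                             # freed again, so the delta is zero
```

Wrapping the calls at one level first, then moving the wrappers one level down into whichever block shows a positive delta, mirrors the main.py, train.py, model.py order.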

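A concrete example

main.py

    import torch
    from train import train_epoch  # defined in train.py

    def train(model, epochs, data):
        for e in range(epochs):
            print("1:{}".format(torch.cuda.memory_allocated(0)))
            train_epoch(model, data)
            print("2:{}".format(torch.cuda.memory_allocated(0)))
            evaluate(model, data)  # project-specific evaluation routine, assumed defined elsewhere
            print("3:{}".format(torch.cuda.memory_allocated(0)))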

Suppose memory grows sharply between prints 1 and 2. That means the problem is in train_epoch, so we step into train.py.

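train.py

    import torch
    import utils  # project module, see utils.py

    def train_epoch(model, data):
        model.train()
        # The optimizer and loss are not named in this example;
        # SGD and cross-entropy are placeholders.
        optim = torch.optim.SGD(model.parameters(), lr=0.01)
        criterion = torch.nn.CrossEntropyLoss()
        for batch_data in data:
            print("1:{}".format(torch.cuda.memory_allocated(0)))
            output = model(batch_data)
            print("2:{}".format(torch.cuda.memory_allocated(0)))
            loss = criterion(output, batch_data.target)
            print("3:{}".format(torch.cuda.memory_allocated(0)))
            optim.zero_grad()
            print("4:{}".format(torch.cuda.memory_allocated(0)))
            loss.backward()
            print("5:{}".format(torch.cuda.memory_allocated(0)))
            utils.func(model)
            print("6:{}".format(torch.cuda.memory_allocated(0)))
            optim.step()  # apply the update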

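Suppose memory grows abnormally both between prints 1 and 2 and between prints 5 and 6. Now isolate one variable at a time. For example, first disable the code between 5 and 6, run again, and check whether memory still blows up. If it no longer does, the problem lies in the code called between 5 and 6. Step into that code and debug it: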

utils.py

    def func(model):
        print("1:{}".format(torch.cuda.memory_allocated(0)))
        a = f1(model)
        print("2:{}".format(torch.cuda.memory_allocated(0)))
        b = f2(a)
        print("3:{}".format(torch.cuda.memory_allocated(0)))
        c = f3(b)
        print("4:{}".format(torch.cuda.memory_allocated(0)))
        d = f4(c)
        print("5:{}".format(torch.cuda.memory_allocated(0)))

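Here is another way to debug. First comment out everything after line 5 (b = f2(a)) and check whether memory still blows up. If it does not, comment out everything after line 7 (c = f3(b)) instead, and so on, until you find the line that causes the blow-up. If memory blows up once line 9 (d = f4(c)) is active, the problem is on line 9, and the source of the blow-up has been located.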

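Some situations that cause GPU memory to blow up

PyTorch's hook mechanism can cause GPU memory to blow up. After a hook function pulls out a layer's inputs, outputs, and weights, you must not store or modify them. Doing so prevents the hook from being released, so the captured inputs, outputs, and weights may never be reclaimed by PyTorch either. The model's footprint keeps growing until memory finally runs out.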
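If an activation really does have to be kept, one way to limit the damage is to store a detached copy, overwrite rather than accumulate, and remove the hook as soon as it is no longer needed. The sketch below shows this on a toy model; the model, the captured dictionary, and the hook name save_output are illustrative assumptions, not part of the original code.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
captured = {}

def save_output(module, inputs, output):
    # detach() drops the autograd graph; overwriting the same key means
    # only the most recent activation is retained.
    captured["relu"] = output.detach()

handle = model[1].register_forward_hook(save_output)
model(torch.randn(3, 4))
handle.remove()            # unregister once the activation is no longer needed

first = captured["relu"]
model(torch.randn(3, 4))   # the hook no longer fires, so `captured` is unchanged
```

The detached tensor still occupies memory for its own values, so moving it off the GPU with .cpu() may be worthwhile when many activations are collected.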

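This case is what I tracked down the second time I hit a memory blow-up, and it was truly baffling. In the code below, the line p.sub_(torch.mm(k, torch.t(k)) / (alpha + torch.mm(r, k))) caused the blow-up; I located it with the method described above.

P is a matrix, and updating it in place with p.sub_ is what caused the memory blow-up.

After changing that line to p=p-(torch.mm(k, torch.t(k)) / (alpha + torch.mm(r, k))), the problem went away.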

    import torch

    def pro_weight(p, x, w, alpha=1.0, cnn=True, stride=1):
        if cnn:
            _, _, H, W = x.shape
            F, _, HH, WW = w.shape
            S = stride  # stride
            Ho = int(1 + (H - HH) / S)
            Wo = int(1 + (W - WW) / S)
            for i in range(Ho):
                for j in range(Wo):
                    # N*C*HH*WW, C*HH*WW = N*C*HH*WW, sum -> N*1
                    r = x[:, :, i * S: i * S + HH, j * S: j * S + WW].contiguous().view(1, -1)
                    # r = r[:, range(r.shape[1] - 1, -1, -1)]
                    k = torch.mm(p, torch.t(r))
                    # In-place update of p: the line that caused the memory blow-up
                    p.sub_(torch.mm(k, torch.t(k)) / (alpha + torch.mm(r, k)))
            w.grad.data = torch.mm(w.grad.data.view(F, -1), torch.t(p.data)).view_as(w)
        else:
            r = x
            k = torch.mm(p, torch.t(r))
            # In-place update of p: the line that caused the memory blow-up
            p.sub_(torch.mm(k, torch.t(k)) / (alpha + torch.mm(r, k)))
            w.grad.data = torch.mm(w.grad.data, torch.t(p.data))
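One plausible explanation for the behaviour of p.sub_, offered as an assumption rather than something confirmed above: if x carries autograd history, the in-place update makes the long-lived matrix p part of the autograd graph, and every later call extends that graph. The sketch below reproduces the effect on CPU with a toy weight w and a hypothetical helper rank_one_update, then shows the same arithmetic with autograd disabled.

```python
import torch

def rank_one_update(p, r, alpha=1.0):
    """The update used in pro_weight, applied in place: p -= k k^T / (alpha + r k)."""
    k = torch.mm(p, torch.t(r))
    p.sub_(torch.mm(k, torch.t(k)) / (alpha + torch.mm(r, k)))

w = torch.randn(4, 4, requires_grad=True)  # stands in for model weights

# Variant A: r carries autograd history, so the in-place update records p in the graph.
p_tracked = torch.eye(4)
for _ in range(3):
    r = torch.randn(1, 4) @ w              # r depends on w, so it requires grad
    rank_one_update(p_tracked, r)

# Variant B: same arithmetic, but with autograd disabled and r detached.
p_plain = torch.eye(4)
for _ in range(3):
    r = torch.randn(1, 4) @ w
    with torch.no_grad():
        rank_one_update(p_plain, r.detach())

print(p_tracked.grad_fn is not None)  # p is now attached to a growing graph
print(p_plain.grad_fn is None)        # p stays a plain tensor
```

Note that rebinding with p = p - ... inside a function leaves the caller's tensor untouched, so it is worth checking that the matrix is still being updated where it needs to be; wrapping the in-place update in torch.no_grad() keeps the update while avoiding the graph.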