<> Problem

When training on the MS1M face dataset, I originally used PyTorch's ImageFolder to read the raw images. The face crops are small but there are a huge number of them, so the GPU finishes its computation almost immediately while the disk IO cannot keep up, and IO ends up dominating the total training time.

<> Solution

The root cause is that PyTorch has no packed data format of its own, whereas TensorFlow has TFRecord, MXNet has rec, and Caffe uses LMDB; each of those frameworks ships its own format. We can therefore pack and read the data with another framework's format and still do the training in PyTorch.

* I personally dislike TensorFlow's TFRecord (I never warmed to TF back when I first learned it); MXNet's API is very similar to PyTorch's, and although LMDB can be used independently of any framework it takes real effort to master. So this post only covers the MXNet rec format, which is what I use myself.
<> Steps

First, packing the data: download the MXNet source code from its GitHub repository; tools/im2rec.py is the official conversion script.

The image folder is laid out as follows:

imgs

* id1 ----> images
* id2 ----> images

First generate the .lst file, which records the paths of all images. Run:
python im2rec.py train_data imgs --list --recursive --num-thread=10
* train_data is the name (prefix) of the .lst file
* imgs is the path to the folder containing the images
* --list tells the script to generate the .lst file
* --recursive makes it walk all subfolders under the path
* --num-thread sets the number of worker threads; be sure to set it, otherwise the default of 1 is painfully slow
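If you want to make sure the list came out right, a minimal check is sketched below; it assumes the list was written to train_data.lst in the working directory and that each image has a single label (the im2rec.py default).

```python
# Quick sanity check of the generated .lst file.
# Each line produced by im2rec.py has the form:
#   <index>\t<label>\t<relative image path>
with open('train_data.lst') as f:
    for i, line in enumerate(f):
        idx, label, path = line.rstrip('\n').split('\t')
        print(idx, label, path)
        if i == 4:  # only show the first few entries
            break
```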
Then generate the rec file from the .lst file just produced.

Run:
python im2rec.py train_data imgs --num-thread=10
* train_data is still the prefix of the .lst file
* imgs is again the folder containing the images
* Two files are generated, train_data.rec and train_data.idx; these are the files we need
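Before wiring this into training, it is worth a quick check that the rec file decodes correctly. A minimal sketch, assuming the files were placed at F:/MXnet/ as in the loading code further down (the .idx file must sit next to the .rec file):

```python
import mxnet as mx
from mxnet.gluon.data.vision import ImageRecordDataset

# Open the packed dataset and decode one record to confirm it is readable.
data = ImageRecordDataset('F:/MXnet/train_data.rec')
print(len(data))            # number of images packed into the rec file
img, label = data[0]        # img is an HWC uint8 NDArray, label a scalar
print(img.shape, img.dtype, label)
```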
Now the actual loading code.

* I recommend MXNet's gluon interface, which wraps all of this nicely.

```python
import mxnet as mx
from mxnet.gluon.data.vision import ImageRecordDataset
from mxnet.gluon.data import DataLoader
import torch
import numpy as np
from PIL import Image
from torchvision import transforms  # needed for the transform pipeline below

def load_mx_rec():
    data = ImageRecordDataset('F:/MXnet/train_data.rec')
    train_loader = DataLoader(data, batch_size=4, shuffle=False)
    train_transform = transforms.Compose([
        transforms.Resize([int(128 * 128 / 112), int(128 * 128 / 112)]),
        transforms.RandomCrop([128, 128]),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor()])
    for input, label in iter(train_loader):
        inputs = input.asnumpy()             # NHWC uint8 batch from MXNet
        nB = torch.rand(4, 3, 128, 128)      # container for the torch batch
        for i in range(4):
            image = Image.fromarray(inputs[i, :, :, :])  # to PIL so transforms apply
            image = train_transform(image)
            nB[i, :, :, :] = image
        labels = label.asnumpy()
        labels = torch.from_numpy(labels).long()

# load_mx_rec()
```
The second version does the same job but replaces the torchvision transforms with equivalent cv2 operations:

```python
import mxnet as mx
from mxnet.gluon.data.vision import ImageRecordDataset
from mxnet.gluon.data import DataLoader
from torchvision import datasets  # only needed for the ImageFolder line below
import torch
import numpy as np
import cv2
import random

def load_mx_rec_2():
    data = ImageRecordDataset('F:/MXnet/train_data.rec')
    data1 = datasets.ImageFolder('F:/MXnet/images')  # original-image dataset, not used below
    train_loader = DataLoader(data, batch_size=4, shuffle=False)
    # train_transform = transforms.Compose([
    #     transforms.Resize([int(128 * 128 / 112), int(128 * 128 / 112)]),
    #     transforms.RandomCrop([128, 128]),
    #     transforms.RandomHorizontalFlip(),
    #     transforms.ToTensor()])
    for input, label in iter(train_loader):
        inputs = input.asnumpy()
        nB = torch.rand(4, 3, 128, 128)
        for i in range(4):
            image = cv2.cvtColor(inputs[i, :, :, :], cv2.COLOR_RGB2BGR)
            # Resize, then take a random 128x128 crop (RandomCrop equivalent)
            size = (int(128 * 128 / 112), int(128 * 128 / 112))
            image = cv2.resize(image, size)
            x = np.random.randint(0, int(128 * 128 / 112) - 128)
            y = np.random.randint(0, int(128 * 128 / 112) - 128)
            image = image[x:x + 128, y:y + 128]
            # Random horizontal flip
            if random.choice([0, 1]) > 0:
                image = cv2.flip(image, 1)
            image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
            # HWC -> CHW, scale to [0, 1], then normalize each channel to [-1, 1]
            image = image.transpose(2, 0, 1).astype(np.float32) / 255
            image[0, :, :] = (image[0, :, :] - 0.5) / 0.5
            image[1, :, :] = (image[1, :, :] - 0.5) / 0.5
            image[2, :, :] = (image[2, :, :] - 0.5) / 0.5
            image = torch.from_numpy(image)
            nB[i, :, :, :] = image
        labels = label.asnumpy()
        labels = torch.from_numpy(labels).long()

load_mx_rec_2()
```
* Both snippets above achieve what we want.
* Snippet 1 goes through PIL format in the middle so that PyTorch's transforms can be used; if your transforms are complicated and awkward to reimplement with cv2 yourself, this is the way to go.
* Snippet 2 is the one I recommend: swapping the transforms for cv2 is more efficient. (A sketch of feeding the resulting batches into a PyTorch training step follows below.)
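For completeness, here is a minimal sketch of what the training step could look like once a batch tensor nB and its labels have been built as above. The model, loss, and optimizer here are placeholders of my own, not part of the original setup:

```python
import torch
import torch.nn as nn

num_classes = 1000  # placeholder; set to the number of identities in your dataset

# Placeholder model: any PyTorch face-recognition network can be substituted here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 128 * 128, num_classes))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(nB, labels):
    # nB: float tensor of shape (4, 3, 128, 128); labels: LongTensor of shape (4,)
    optimizer.zero_grad()
    logits = model(nB)
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```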
