I only recently started studying pruning. To learn about it comprehensively and systematically, I have summarized the relevant papers, from Song Han's 2016 Deep Compression to the recent lottery ticket hypothesis. I divide pruning methods into three broad categories: hard pruning, in which no data participates; soft pruning, which brings data into the training loop; and direct architecture search (NAS).
One: Hard pruning
Algorithms in this category usually start from the model's own parameters: they find or design an appropriate statistic to measure the importance of each connection, sort connections by that importance, and delete the unimportant ones. Whatever remains is the pruned model.
The core problems of this class of algorithms are two-fold. First, the choice of statistic, which many papers and experiments address. Second, how to keep training sparse during the training process.
《Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding》
Song Han's classic paper, which combines pruning, trained quantization, and Huffman coding for model compression. For the pruning part, Han et al. judge importance by the absolute value of each weight. For why this statistic is a reasonable choice, see Song Han's other paper, 《DSD》.
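As an illustrative sketch (not the paper's actual code), magnitude-based pruning in NumPy could look like the following, where the fraction of weights with the smallest absolute value is zeroed out:

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with the smallest |w|."""
    flat = np.abs(weights).ravel()
    k = int(sparsity * flat.size)
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    # The k-th smallest absolute value becomes the pruning threshold.
    threshold = np.partition(flat, k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

w = np.array([[0.5, -0.01], [0.002, -1.2]])
pruned, mask = magnitude_prune(w, 0.5)
```

In a real pipeline the surviving weights would then be fine-tuned, and the mask reapplied after each update so pruned connections stay at zero.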
《Filter Pruning via Geometric Median for Deep Convolutional Neural Networks》
This paper proposes the Geometric Median (GM) as the statistic: filters close to the geometric median of a layer are considered redundant (replaceable by the others) and are removed. The authors' starting point for choosing GM is a critique of the assumptions underlying norm-based pruning.
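A minimal sketch of the idea (my own simplification, not the paper's implementation): a filter whose total distance to all other filters in the layer is smallest is the most "central", hence the most replaceable and the first candidate for pruning.

```python
import numpy as np

def gm_prune_indices(filters, n_prune):
    """Return indices of the `n_prune` filters with the smallest summed
    Euclidean distance to all other filters (a proxy for closeness to
    the geometric median, per the FPGM criterion)."""
    f = filters.reshape(len(filters), -1)
    # Pairwise Euclidean distance matrix between filters.
    dists = np.linalg.norm(f[:, None, :] - f[None, :, :], axis=-1)
    score = dists.sum(axis=1)  # total distance to every other filter
    return np.argsort(score)[:n_prune]

# Three clustered filters and one outlier: the cluster center (index 0)
# is the most replaceable and should be selected first.
filters = np.array([[0.0, 0.0], [0.1, 0.0], [-0.1, 0.0], [5.0, 5.0]])
idx = gm_prune_indices(filters, 1)
```

Note the contrast with norm-based pruning: the outlier filter here has the largest norm but also the largest distance score, so GM-based selection keeps it.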
《Learning Efficient Convolutional Networks through Network Slimming》
This paper uses the scale parameter of the BN layer that follows each conv layer to measure channel importance. It also adds a regularization term on the BN scale parameters to the loss, constraining them toward zero to achieve better sparsity and thus a more accurate importance evaluation.
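A small sketch of both ingredients, under my own simplified assumptions (NumPy stand-ins rather than the paper's code): channel selection by a global threshold on |gamma|, and the L1 penalty whose subgradient is added during training to push the scales toward zero.

```python
import numpy as np

def slimming_channel_mask(gamma, prune_ratio):
    """Keep channels whose BN scale |gamma| exceeds the global threshold
    determined by the target prune ratio."""
    k = int(prune_ratio * len(gamma))
    if k == 0:
        return np.ones(len(gamma), dtype=bool)
    threshold = np.sort(np.abs(gamma))[k - 1]
    return np.abs(gamma) > threshold

def l1_subgradient(gamma, lam=1e-4):
    """Extra gradient term from the L1 sparsity penalty lam * |gamma|,
    added to the BN-scale gradients during training."""
    return lam * np.sign(gamma)

gamma = np.array([0.9, 0.01, 0.5, 0.002])
mask = slimming_channel_mask(gamma, 0.5)
```

After training with the penalty, most scales sit near zero, so the threshold cleanly separates important channels from prunable ones.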
Two: Soft pruning
In contrast to hard pruning's direct use of statistics, soft pruning mainly aims to reduce the damage hard pruning causes, by introducing parameter training into the pruning process. Within this category, some methods prune synchronously while training on the full dataset; others prune based on only a small number of samples.
《Channel Pruning for Accelerating Very Deep Neural Networks》
This paper iterates between two steps, channel selection and feature-map reconstruction, to reduce the accumulated error caused by pruning. It constrains the process with a reconstruction loss and performs channel selection via LASSO regression. Strictly speaking, however, this method is inference-stage pruning: only a small number of samples participate in the computation.
《Soft Filter Pruning for Accelerating Deep Convolutional Neural Networks》
This paper prunes during training. Its new idea is to set pruned filters to 0 instead of deleting them outright; the zeroed filters still participate in subsequent training, and are only actually removed once training is finished.
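One soft-pruning step can be sketched as follows (my own minimal NumPy illustration, under the assumption that filters are ranked by L2 norm as in the paper): the lowest-norm filters are zeroed in place rather than removed, so later gradient updates can revive them.

```python
import numpy as np

def soft_prune_step(filters, prune_ratio):
    """One SFP-style step: zero the lowest-norm filters instead of
    removing them. The zeroed filters remain in the model and may be
    updated (and thus 'revived') by subsequent training epochs."""
    norms = np.linalg.norm(filters.reshape(len(filters), -1), axis=1)
    n_prune = int(prune_ratio * len(filters))
    idx = np.argsort(norms)[:n_prune]
    out = filters.copy()
    out[idx] = 0.0
    return out, idx

filters = np.array([[1.0, 0.0], [0.1, 0.0], [3.0, 0.0], [0.01, 0.0]])
out, idx = soft_prune_step(filters, 0.5)
```

Because the mask is recomputed each epoch, the set of pruned filters can change over training, which is exactly what distinguishes this from hard pruning.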
《Pruning Convolutional Neural Networks for Resource Efficient Inference》
This paper approaches the problem from the perspective of back-propagation: based on a first-order Taylor expansion of the loss, it proposes an importance criterion, together with a greedy algorithm for selecting which channels to prune.
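The Taylor criterion estimates the loss change from removing a channel using quantities already available in back-propagation. A rough sketch (my own simplified per-channel form, not the paper's code): the importance of a channel is the absolute value of the averaged product of its activation and the loss gradient with respect to that activation.

```python
import numpy as np

def taylor_importance(activations, gradients):
    """First-order Taylor criterion: the loss change from zeroing a
    channel is approximated by |mean(a * dL/da)| over the batch and
    spatial positions.
    activations, gradients: arrays of shape (batch, channels, h, w)."""
    contrib = activations * gradients
    return np.abs(contrib.mean(axis=(0, 2, 3)))  # one score per channel

# Toy example: only channel 2 receives nonzero gradient, so only it
# contributes to the loss and should score highest.
a = np.ones((2, 3, 2, 2))
g = np.zeros((2, 3, 2, 2))
g[:, 2] = 1.0
score = taylor_importance(a, g)
```

The lowest-scoring channels are then pruned greedily, one (or a few) at a time, with fine-tuning in between.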
Three: Architecture search
The first paper to mention in this part is 《RETHINKING THE VALUE OF NETWORK PRUNING》. Its central question is whether the pruned structure or the inherited weights matter more, and its answer is that the structure is what matters; some researchers, however, argue that the weights are more important. I think this question should be considered from different angles. The view that weights matter fits settings where training data is unavailable or training time is costly and one wants good results quickly. The view that structure matters fits complex tasks, where the model's capability is regarded as residing in the structure itself.
Based on the idea that structure is what matters, many corresponding AutoML-style approaches have appeared.
《MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning》
This paper builds on meta-learning: a meta network generates the weights of candidate pruned networks, and a search over the number of channels per layer finds a good structure. Because the meta network generates both the structure and the parameters of the pruned network, a pruned network with weights can be obtained at once.
《The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks》
This ICLR best paper proposes the lottery ticket hypothesis: among a family of subnetworks (lottery tickets), there exists a best structure (the winning ticket) that, trained from its original random initialization, reaches the same accuracy as the full network. The authors generate these small networks by applying masks, and verify the hypothesis experimentally.
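One round of the paper's iterative magnitude-pruning procedure can be sketched as follows (a minimal NumPy illustration under my own simplifications): extend the mask by removing the smallest surviving trained weights, then rewind the survivors to their values at initialization before retraining.

```python
import numpy as np

def lottery_round(w_trained, w_init, mask, prune_frac):
    """One iterative-magnitude-pruning round: prune the fraction
    `prune_frac` of the smallest surviving trained weights, then reset
    the survivors to their original initialization (the 'rewind')."""
    alive = np.abs(w_trained[mask])
    k = int(prune_frac * alive.size)
    if k > 0:
        threshold = np.sort(alive)[k - 1]
        mask = mask & (np.abs(w_trained) > threshold)
    return w_init * mask, mask

w_init = np.array([0.1, 0.2, 0.3, 0.4])
w_trained = np.array([0.9, 0.01, -0.5, 0.02])
mask = np.ones(4, dtype=bool)
w_next, mask = lottery_round(w_trained, w_init, mask, 0.5)
```

Repeating this round several times yields highly sparse "winning tickets" whose key property is that they train well from the rewound initialization, not from a fresh random one.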
Of course, there are many more papers on pruning, both on pruning itself and on training methods derived from pruning, among other directions. Whichever direction one takes, it must be validated by experiment; in the end, application is what matters most.