Python: Retrying After Exceptions in a Crawler


When scraping data, we often run into program failures caused by network problems. Previously, I only logged the error and dealt with the failed URLs afterwards.

Original approach:

    def crawl_page(url):
        pass


    def log_error(url):
        pass


    url = ""
    try:
        crawl_page(url)
    except Exception:
        # Record the failed URL so it can be processed later.
        log_error(url)

Improved approach:

    attempts = 0
    success = False
    while attempts < 3 and not success:
        try:
            crawl_page(url)
            success = True
        except Exception:
            attempts += 1
            # Give up after the third failed attempt.
            if attempts == 3:
                break
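
The loop above can be wrapped in a small reusable helper. The following is only a sketch using the standard library: the names call_with_retries and flaky_crawl, and the delay_seconds parameter, are hypothetical and do not come from the original post.

```python
import time


def call_with_retries(func, max_attempts=3, delay_seconds=0.0):
    """Call func() until it succeeds or max_attempts is reached.

    Returns func()'s result; re-raises the last exception on final failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller log the error
            time.sleep(delay_seconds)  # optional pause before the next try


# Hypothetical flaky function: fails twice, then succeeds.
calls = {"count": 0}


def flaky_crawl():
    calls["count"] += 1
    if calls["count"] < 3:
        raise IOError("simulated network error")
    return "page content"


result = call_with_retries(flaky_crawl, max_attempts=3)
print(result, calls["count"])  # page content 3
```

Re-raising on the last attempt keeps the original behavior available: the caller can still catch the exception and call log_error(url).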

Recently I found a new solution: retrying.

retrying is a Python package for retrying. It can automatically re-run pieces of code that may fail. It provides a decorator function, retry: a decorated function is executed again whenever it fails. By default, it keeps retrying for as long as the function raises an exception.

    import random
    from retrying import retry


    @retry
    def do_something_unreliable():
        # Fails unless randint returns 0 or 1.
        if random.randint(0, 10) > 1:
            raise IOError("Broken sauce, everything is hosed!!!111one")
        else:
            return "Awesome sauce!"


    print(do_something_unreliable())

When we run do_something_unreliable, it is re-executed again and again until random.randint returns 0 or 1; only then does execution end.

retry also accepts parameters. The optional parameters can be seen in the initializer of the Retrying class in the source code:

* stop_max_attempt_number: the maximum number of attempts; retrying stops once this number is reached.
* stop_max_delay: the maximum total time, in milliseconds. For example, with a value of 10000, the time is measured from the moment the decorated function first starts executing until an attempt finishes, whether it succeeds or fails. Once that span exceeds 10 seconds, the function is not executed again.
* wait_fixed: a fixed wait time, in milliseconds, between two retries.
* wait_random_min and wait_random_max: the bounds, in milliseconds, of a randomly chosen wait time between two retries.
* wait_exponential_multiplier and wait_exponential_max: an exponentially growing wait time between two retries. The value is 2^previous_attempt_number * wait_exponential_multiplier, where previous_attempt_number is the number of attempts already made. If the value exceeds wait_exponential_max, the wait between the two retries is wait_exponential_max instead. This design follows the exponential backoff algorithm and helps reduce congestion.

We can specify which exceptions should trigger a retry by passing a function object to retry_on_exception:

    def retry_if_io_error(exception):
        return isinstance(exception, IOError)


    @retry(retry_on_exception=retry_if_io_error)
    def read_a_file():
        with open("file", "r") as f:
            return f.read()

If read_a_file raises an exception while executing, that exception is passed to retry_if_io_error as the argument exception. If it is an IOError, the call is retried; if not, execution stops and the exception is raised.
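
To make the wait parameters described earlier more concrete, here is a minimal sketch of the three wait strategies in plain Python. The helper names fixed_wait, random_wait and exponential_wait are hypothetical and are not part of the retrying package. The exponential formula follows the description above, and all values are in milliseconds.

```python
import random


def fixed_wait(wait_fixed):
    """Always wait the same number of milliseconds."""
    return wait_fixed


def random_wait(wait_random_min, wait_random_max):
    """Wait a random number of milliseconds within the given bounds."""
    return random.randint(wait_random_min, wait_random_max)


def exponential_wait(previous_attempt_number, multiplier, max_wait):
    """Wait 2^previous_attempt_number * multiplier, capped at max_wait."""
    return min(2 ** previous_attempt_number * multiplier, max_wait)


# Waits after attempts 1 to 5, with multiplier 1000 ms and cap 10000 ms.
schedule = [exponential_wait(n, 1000, 10000) for n in range(1, 6)]
print(schedule)  # [2000, 4000, 8000, 10000, 10000]
```

The cap matters because the wait doubles with every attempt: without wait_exponential_max, the fifth retry in this example would already wait 32 seconds.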

We can also specify which return values should trigger a retry by passing a function object to retry_on_result:

    def retry_if_result_none(result):
        return result is None


    @retry(retry_on_result=retry_if_result_none)
    def get_result():
        return None

After get_result executes successfully, its return value is passed to retry_if_result_none as the argument result. If the return value is None, the call is retried; otherwise, execution ends and the value is returned.
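
To illustrate how the two predicates interact, the following is a simplified stand-in written with the standard library only. It is not the retrying package's implementation: simple_retry is a hypothetical name, and unlike the package it defaults to three attempts and simply returns the last value when attempts run out.

```python
import functools


def simple_retry(retry_on_exception=None, retry_on_result=None,
                 stop_max_attempt_number=3):
    """Simplified retry decorator (a sketch, not the retrying package)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attempt = 0
            while True:
                attempt += 1
                last_attempt = attempt >= stop_max_attempt_number
                try:
                    result = func(*args, **kwargs)
                except Exception as exc:
                    # Retry only if the predicate accepts this exception.
                    retryable = (retry_on_exception is None
                                 or retry_on_exception(exc))
                    if not retryable or last_attempt:
                        raise
                    continue
                # Retry if the predicate flags this return value.
                if (retry_on_result is not None and retry_on_result(result)
                        and not last_attempt):
                    continue
                return result
        return wrapper
    return decorator


def retry_if_result_none(result):
    return result is None


# Hypothetical data source: returns None twice, then a real value.
results = iter([None, None, "data"])


@simple_retry(retry_on_result=retry_if_result_none, stop_max_attempt_number=5)
def get_result():
    return next(results)


value = get_result()
print(value)  # data
```

An exception that the retry_on_exception predicate rejects is raised immediately, without any further attempts.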
