A collection of problems encountered when writing Python crawlers:

1. NotImplementedError: Only the following pseudo-classes are implemented: nth-of-type

Answer: Change nth-child to nth-of-type in the selector.

3. expected string or bytes-like object:

Answer: Decode the page content to a string before matching.

e.g. re.findall(r'', html.decode('utf-8'))
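A minimal sketch of the decode-then-match step, assuming the page arrives as UTF-8 bytes. The `raw_html` value and the `<title>` pattern are made up for illustration and are not from the original crawler.

```python
import re

# Hypothetical page body as raw bytes, as an HTTP client might return it.
raw_html = "<html><title>Example page</title></html>".encode("utf-8")

# Decode the bytes to str first, then apply a str pattern.
html = raw_html.decode("utf-8")
titles = re.findall(r"<title>(.*?)</title>", html)
print(titles)
```

If the page uses another charset, the argument to decode() has to match it.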

4. 'gbk' codec can't encode character '\xa5' in position 184823: illegal multibyte sequence

Answer: Open the output file with an explicit UTF-8 encoding: f = open(html, 'w', encoding='utf-8')
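A small sketch of writing text that contains '\xa5' with an explicit encoding. The sample text and the temporary file path are assumptions for illustration only.

```python
import os
import tempfile

# Sample text containing '\xa5', the character named in the error message.
text = "Price: \xa5100"

# Hypothetical output path inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "page.html")

# An explicit encoding avoids falling back to the platform default codec.
with open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8") as f:
    restored = f.read()
print(restored == text)
```

Using a with block also closes the file automatically, which the one-line open() call does not do.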

5. Garbled web page: Chinese text appears as random characters

Answer: response = requests.get(url, headers=header).content

# .content returns the raw bytes, which avoids the garbled text
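The effect can be reproduced without a network request. In this sketch `body` stands in for the raw bytes that `.content` would return, and it assumes the server encoded the page as GBK; both the sample text and the charset are illustrative assumptions.

```python
# Hypothetical response body: Chinese text encoded as GBK by the server.
body = "中文网页".encode("gbk")

# Decoding with the wrong charset produces garbled characters.
garbled = body.decode("iso-8859-1")

# Decoding the raw bytes with the page's actual charset restores the text.
text = body.decode("gbk")
print(text)
```

Working from the raw bytes leaves the choice of charset to the caller rather than to the client library's guess.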

6. expected string or bytes-like object

Answer: Convert the value to a string with str(urls) before matching.

urls = soup.select('div > div.nei_left.fl > ul > li:nth-of-type(2) > a')[0]

Href = re.findall(r'<a href="(.*?)" target="" title=".*?">(.*?)</a>', str(urls))

7. raise URLError(err): <urlopen error [Errno 11001] getaddrinfo failed>

URLError usually occurs when there is no network connection (no route to the specified server) or when the server does not exist.
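One way to keep a crawler running when this happens is to catch the exception around the request. The sketch below uses only the standard library; the helper name `fetch` and its `timeout` default are hypothetical and not part of the original code.

```python
from urllib.error import URLError
from urllib.request import urlopen


def fetch(url, timeout=5):
    """Return the response body as bytes, or None if the request fails."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.read()
    except URLError as err:
        # err.reason carries the underlying cause, e.g. a getaddrinfo failure.
        print(f"Request to {url} failed: {err.reason}")
        return None
```

For example, fetch("http://nonexistent.invalid/") should print the failure reason and return None instead of stopping the crawl.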
