urllib2.HTTPError: HTTP Error 403: Forbidden (solution)

When crawling web pages with Python, it is common to get a 403 Forbidden response from the target site.

Q: Why does the 403 Forbidden error occur?
A: The urllib2.HTTPError: HTTP Error 403: Forbidden error occurs mainly because the target website blocks requests that look like crawlers. The fix is to add headers to the request.
Q: So how do you solve it?
A: Just add a headers parameter (in particular, a User-Agent).

import urllib.request
from urllib.request import urlopen

req = urllib.request.Request(url="http://en.wikipedia.org"+pageUrl)
html = urlopen(req)

After adding headers, it becomes:

import urllib.request
from urllib.request import urlopen

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
req = urllib.request.Request(url="http://en.wikipedia.org"+pageUrl, headers=headers)
# req = urllib.request.Request(url="http://en.wikipedia.org"+pageUrl)  # old version, without headers
html = urlopen(req)
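The fix works because urllib attaches the User-Agent string to the outgoing request. As a quick offline check, the sketch below (using a hypothetical Wikipedia page URL) builds the Request and confirms the header was attached, without making a network call:

```python
from urllib.request import Request

# Hypothetical target URL, for illustration only.
url = "http://en.wikipedia.org/wiki/Python_(programming_language)"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}

req = Request(url=url, headers=headers)

# urllib normalizes header names via str.capitalize(), so the
# stored key is 'User-agent' rather than 'User-Agent'.
print(req.get_header('User-agent'))
```

Note the capitalization quirk: `Request` stores header names in `Capitalized-lowercase` form, so lookups through `get_header` must use `'User-agent'`.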

Q: What do the headers look like?
A: You can find them in the Network tab of the browser's developer tools, for example in Firefox.
Q: Are there any other problems with pretending to be a browser?
A: Yes. For example, the target site may still block an IP address that sends too many requests.
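One common mitigation is to slow down the request rate so the site does not see a burst of traffic from one IP. The helper below is a minimal sketch; the generator name `throttled` and the delay value are my own choices, not from the original post:

```python
import time

def throttled(iterable, delay=1.0):
    """Yield items with at least `delay` seconds between successive yields."""
    for i, item in enumerate(iterable):
        if i:
            time.sleep(delay)  # pause before every item after the first
        yield item

# Usage sketch (page paths are hypothetical):
# for page in throttled(["/wiki/Python_(programming_language)", "/wiki/Web_scraping"], delay=2.0):
#     html = urlopen(urllib.request.Request("http://en.wikipedia.org" + page, headers=headers))
```

Throttling does not guarantee you will not be blocked, but a modest, steady request rate is far less likely to trip rate limits than a tight loop.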

Reproduced from: https://www.cnblogs.com/fonttian/p/7294845.html

