urllib.error.HTTPError: HTTP Error 403: Forbidden [How to Solve]

Problem:

urllib.request.urlopen() is often used to fetch the source of a web page so that it can be parsed. For some websites, however, the call raises an "HTTP Error 403: Forbidden" exception.
For example, executing the following statement
[python] 
urllib.request.urlopen("http://blog.csdn.net/eric_sunah/article/details/11099295")
The following exception will occur:
[python]  
  File "D:\Python32\lib\urllib\request.py", line 475, in open
    response = meth(req, response)
  File "D:\Python32\lib\urllib\request.py", line 587, in http_response
    'http', request, response, code, msg, hdrs)
  File "D:\Python32\lib\urllib\request.py", line 513, in error
    return self._call_chain(*args)
  File "D:\Python32\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "D:\Python32\lib\urllib\request.py", line 595, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Analysis:
When you open a URL with urllib.request.urlopen, the server receives only a bare request for the page. The request carries none of the details a browser normally sends, such as the browser, operating system, and hardware platform; by default urllib identifies itself with a User-Agent of the form Python-urllib/3.x. Requests that lack this information are often treated as abnormal access, for example by crawlers.
To block such access, some websites check the User-Agent header of the request (it describes the hardware platform, system software, application software, and user preferences). If the User-Agent is missing or looks abnormal, the request is rejected, as in the error message above.
So you can try sending a browser-like User-Agent with the request.
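To see what urllib sends when no header is set, a small sketch like the following may help. It only inspects the default opener and makes no network request; the variable names are illustrative and not from the original post.

```python
import urllib.request

# build_opener() returns an OpenerDirector preloaded with urllib's default headers.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# The default header is stored under the key "User-agent";
# its value has the form "Python-urllib/<major>.<minor>".
default_user_agent = default_headers["User-agent"]
print(default_user_agent)
```

A site that filters on the User-Agent can recognize this value as a script rather than a browser, which is one plausible reason for the 403 response.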
Solution:
In Python 3.x it is easy to add a User-Agent header to the request. The code is as follows:
[python]  
import urllib.request

# Assumption: chapter_url is the page that returned 403 in the example above.
chapter_url = "http://blog.csdn.net/eric_sunah/article/details/11099295"
# Without this header the site raises urllib.error.HTTPError: HTTP Error 403: Forbidden, because it blocks crawlers.
# Sending a browser's User-Agent makes the request look like a browser; the original post read the value with the Firebug plug-in for Firefox.
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0"}
req = urllib.request.Request(url=chapter_url, headers=headers)
html = urllib.request.urlopen(req).read()
After replacing the plain urllib.request.urlopen(url).read() call with the code above, the page that previously failed can be accessed normally.
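If the same fix is needed in several places, it could be wrapped in small helpers, as in this rough sketch. The names build_request, fetch, and BROWSER_UA are hypothetical and not part of urllib, and the 10-second timeout is an arbitrary choice. A site may still answer 403 for other reasons, so the sketch also catches urllib.error.HTTPError instead of letting it propagate.

```python
import urllib.error
import urllib.request

# Assumption: reuse the Firefox User-Agent string from the article.
BROWSER_UA = "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0"


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request that carries a browser-like User-Agent (hypothetical helper)."""
    return urllib.request.Request(url=url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Return the page body as bytes, or None if the server replies with an HTTP error."""
    try:
        with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # err.code is the status code (for example 403), err.reason the status text.
        print("Request failed:", err.code, err.reason)
        return None
```

Calling fetch() with the URL from the example would then return the page bytes on success, or print the status and return None if the server still refuses the request.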