Problem:
urllib.request.urlopen() is often used to fetch the source of a web page so it can be parsed, but for some websites the call raises an "HTTP Error 403: Forbidden" exception.
For example, executing the following statement:
[python]
import urllib.request
urllib.request.urlopen("http://blog.csdn.net/eric_sunah/article/details/11099295")
The following exception will occur:
[python]
  File "D:\Python32\lib\urllib\request.py", line 475, in open
    response = meth(req, response)
  File "D:\Python32\lib\urllib\request.py", line 587, in http_response
    'http', request, response, code, msg, hdrs)
  File "D:\Python32\lib\urllib\request.py", line 513, in error
    return self._call_chain(*args)
  File "D:\Python32\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "D:\Python32\lib\urllib\request.py", line 595, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Analysis:
The exception occurs because a URL opened with urllib.request.urlopen is requested with only minimal headers. The server receives a plain request for the page, with none of the browser, operating system, or hardware platform details that a normal browser sends; urllib identifies itself with a default User-Agent of the form Python-urllib/3.x instead. Requests that lack this information usually come from automated clients such as crawlers.
To block that kind of access, some websites check the User-Agent header of each request (a string that identifies the client application, its operating system, and its platform). If the User-Agent is missing or does not look like a normal browser, the request is rejected, as shown in the error message above.
So one fix to try is sending a browser-like User-Agent header with the request.
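To see what a server receives when no header is set, the following minimal sketch prints the User-Agent that urllib attaches by default. It makes no network request; the variable names are illustrative only.

```python
import urllib.request

# build_opener() returns an OpenerDirector carrying urllib's default headers.
opener = urllib.request.build_opener()

# addheaders is a list of (name, value) tuples; by default it holds a single
# User-agent entry of the form "Python-urllib/<version>".
default_headers = dict(opener.addheaders)
default_user_agent = default_headers["User-agent"]
print(default_user_agent)
```

A site that filters on User-Agent can easily tell this value apart from a real browser string.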
Solution:
In Python 3.x, adding a User-Agent header to the request is simple. The code is as follows:
[python]
import urllib.request

# Without the User-Agent header below, the request fails with
# urllib.error.HTTPError: HTTP Error 403: Forbidden, because this site blocks crawlers.
# Sending a browser's User-Agent makes the request look like normal browser access.
# The exact string can be looked up in the browser, for example with the FireBug plug-in of Firefox.
# chapter_url is assumed here to be the URL from the failing example above.
chapter_url = "http://blog.csdn.net/eric_sunah/article/details/11099295"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
req = urllib.request.Request(url=chapter_url, headers=headers)
html = urllib.request.urlopen(req).read()
After replacing the bare urllib.request.urlopen(url).read() call with the code above, the page that raised the error can be accessed normally.
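If the same fix is needed in several places, it could be wrapped in a small helper. The sketch below is one possible arrangement, not part of the original code: the names `BROWSER_USER_AGENT`, `build_request`, and `fetch_page`, and the 10-second timeout, are assumptions chosen for illustration.

```python
import urllib.error
import urllib.request

# Browser-like User-Agent string taken from the example above.
BROWSER_USER_AGENT = (
    "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) "
    "Gecko/20100101 Firefox/23.0"
)


def build_request(url, user_agent=BROWSER_USER_AGENT):
    """Return a Request for url that carries a browser-like User-Agent."""
    return urllib.request.Request(url=url, headers={"User-Agent": user_agent})


def fetch_page(url, timeout=10):
    """Fetch url and return the body as bytes, or None if the server answers 403."""
    req = build_request(url)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code == 403:
            # The site may check more than the User-Agent (cookies, Referer, rate limits).
            print("Still forbidden:", url)
            return None
        raise  # Other HTTP errors are left for the caller to handle.
```

Calling `fetch_page("http://blog.csdn.net/eric_sunah/article/details/11099295")` would then return the page bytes when the server accepts the header. A 403 that persists suggests the site applies checks beyond the User-Agent.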