The urllib.request.urlopen() function is often used to fetch the source code of a web page so that it can be analyzed, but for some websites it raises an "HTTP Error 403: Forbidden" exception.
For example, when the following statement is executed,
urllib.request.urlopen("http://blog.csdn.net/eric_sunah/article/details/11099295")
the following exception is raised:
  File "D:\Python32\lib\urllib\request.py", line 475, in open
    response = meth(req, response)
  File "D:\Python32\lib\urllib\request.py", line 587, in http_response
    'http', request, response, code, msg, hdrs)
  File "D:\Python32\lib\urllib\request.py", line 513, in error
    return self._call_chain(*args)
  File "D:\Python32\lib\urllib\request.py", line 447, in _call_chain
    result = func(*args)
  File "D:\Python32\lib\urllib\request.py", line 595, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
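While investigating, it can help to catch the exception instead of letting the script stop. The following is a minimal sketch, not part of the original post: the function name fetch_or_report and its opener parameter are hypothetical, and the opener parameter exists only so the function can be exercised without a network connection.

```python
import urllib.error
import urllib.request


def fetch_or_report(url, opener=urllib.request.urlopen):
    """Return the page bytes, or None if the server answers with an HTTP error.

    `opener` is a hypothetical hook: any callable that behaves like
    urllib.request.urlopen can be passed in.
    """
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        # err.code is the numeric status (403 here); err.reason is the text.
        print(f"Request failed with HTTP {err.code}: {err.reason}")
        return None
```

Calling fetch_or_report on a URL that rejects the request returns None and prints the status code rather than a traceback.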
Analysis:
The exception occurs because a request sent with urllib.request.urlopen is a bare request for the page. It tells the server nothing about the browser, operating system, or hardware platform behind it (by default urllib identifies itself only as Python-urllib/3.x), and requests that lack this information are often abnormal access, such as crawlers.
To block this kind of abnormal access, some websites check the User-Agent header of each request (it describes the hardware platform, system software, application software, and user preferences). If the User-Agent is abnormal or missing, the request is rejected, as in the error message above.
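To see what urllib sends when no header is set explicitly, the default headers of a freshly built opener can be inspected. This is a small sketch that makes no network request; the variable names are only illustrative.

```python
import urllib.request

# A new opener carries the headers urllib adds to every request by default.
opener = urllib.request.build_opener()
default_headers = dict(opener.addheaders)

# The only default header is the User-agent, which names Python-urllib
# and its version rather than a browser.
print(default_headers)
```

A site that filters on the User-Agent can easily recognize this value as coming from a script.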
So you can try adding User-Agent information to the request.
Solution:
For Python 3.x, adding the User-Agent information to the request is simple:
import urllib.request

# The page that raised the 403 error above
chapter_url = "http://blog.csdn.net/eric_sunah/article/details/11099295"
# Without this header, urlopen raises HTTPError: HTTP Error 403: Forbidden, because the site blocks crawlers.
# Sending a browser's User-Agent makes the request look like it comes from a browser; a real value can be found with the Firefox Firebug plugin.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) Gecko/20100101 Firefox/23.0'}
req = urllib.request.Request(url=chapter_url, headers=headers)
html = urllib.request.urlopen(req).read()
Replacing the plain urllib.request.urlopen(url).read() call with the code above allows the problem page to be accessed normally.
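If several pages need to be fetched this way, the same idea can be wrapped in a small helper. The sketch below is one possible arrangement, not the original author's code: the names BROWSER_UA, build_browser_request and fetch_page, as well as the 10-second timeout, are assumptions made for illustration.

```python
import urllib.request

# User-Agent string taken from the solution above.
BROWSER_UA = ("Mozilla/5.0 (Windows NT 6.1; WOW64; rv:23.0) "
              "Gecko/20100101 Firefox/23.0")


def build_browser_request(url, user_agent=BROWSER_UA):
    """Build a Request that carries a browser-like User-Agent header."""
    return urllib.request.Request(url=url, headers={"User-Agent": user_agent})


def fetch_page(url, user_agent=BROWSER_UA, timeout=10):
    """Fetch the page and return its raw bytes (performs a network request)."""
    req = build_browser_request(url, user_agent)
    with urllib.request.urlopen(req, timeout=timeout) as response:
        return response.read()


# Building the request does not touch the network, so the header can be checked safely.
req = build_browser_request("http://blog.csdn.net/eric_sunah/article/details/11099295")
print(req.get_header("User-agent"))
```

Request stores header names in capitalized form, which is why the lookup uses "User-agent". Whether a given site accepts the request still depends on that site's own checks.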