1. Problem description
The following error occurred while a crawler was batch-downloading files:
```
raise ContentTooShortError(
urllib.error.ContentTooShortError: <urlopen error retrieval incomplete: got only 0 out of 290758 bytes>
```
2. Cause of problem
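Problem cause: the download performed by urlretrieve is incomplete. urlretrieve compares the number of bytes it actually received with the Content-Length announced by the server, and raises ContentTooShortError when it received fewer bytes, for example when poor network conditions cut the transfer off partway. In the error above, none of the 290758 expected bytes arrived.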
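The check that triggers this error can be sketched as follows. This is a simplified re-implementation for illustration, not urllib's actual source, and `copy_with_length_check` and its parameters are hypothetical names. It copies a response body block by block and raises ContentTooShortError when fewer bytes arrive than the expected size (the Content-Length), which is the situation in the traceback above.

```python
import io
import urllib.error


def copy_with_length_check(src, dst, expected_size, block_size=8192):
    """Hypothetical helper: copy src to dst block by block and raise
    ContentTooShortError if fewer than expected_size bytes arrive.
    An expected_size < 0 means the length is unknown (no Content-Length)."""
    read = 0
    while True:
        block = src.read(block_size)
        if not block:  # end of stream (or the connection was cut off)
            break
        dst.write(block)
        read += len(block)
    # Same kind of comparison that makes urlretrieve raise the error
    if expected_size >= 0 and read < expected_size:
        raise urllib.error.ContentTooShortError(
            "retrieval incomplete: got only %i out of %i bytes" % (read, expected_size),
            None,  # urllib attaches the partial result here; None in this sketch
        )
    return read
```

Simulating an empty body against an expected size of 290758 bytes reproduces the same message: `<urlopen error retrieval incomplete: got only 0 out of 290758 bytes>`.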
3. Solution
1. Solution I
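Catch the exception and call the download function again recursively until urlretrieve completes the download. The code is as follows (updated from the Python 2 API to Python 3):
```python
import urllib.error
import urllib.request

def auto_down(url, filename):
    try:
        urllib.request.urlretrieve(url, filename)
    except urllib.error.ContentTooShortError:
        print('Network conditions are not good. Reloading.')
        # Retry recursively; note that there is no limit on the number of retries
        auto_down(url, filename)
```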
However, in testing, ContentTooShortError still occurred frequently. Each re-download took too long, a single file often needed several or even more than a dozen attempts, and occasionally the function fell into an endless retry loop. This is far from satisfactory.
2. Solution II
Therefore, the socket module is used to set a default timeout, which caps how long each download attempt can hang, and the number of retries is limited so the code cannot fall into an endless loop. This improves overall efficiency. The code is as follows:
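```python
import socket
import urllib.error
import urllib.request

# Set the default timeout for new sockets to 30 s, so a stalled
# download raises socket.timeout instead of hanging indefinitely
socket.setdefaulttimeout(30)

# download_picture is a wrapper name added here so the snippet is self-contained
def download_picture(url, image_name):
    # Solve the problem of incomplete downloads and avoid falling into an endless loop
    try:
        urllib.request.urlretrieve(url, image_name)
    except (socket.timeout, urllib.error.ContentTooShortError):
        count = 1
        # Retry at most 5 times
        while count <= 5:
            try:
                urllib.request.urlretrieve(url, image_name)
                break
            except (socket.timeout, urllib.error.ContentTooShortError):
                err_info = 'Reloading for %d time' % count if count == 1 else 'Reloading for %d times' % count
                print(err_info)
                count += 1
        if count > 5:
            print("downloading picture failed!")
```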
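The same bounded-retry idea can be factored into a reusable helper. The sketch below is an assumption-laden variation, not the author's code: `retry_download`, `TRANSIENT_ERRORS`, `max_retries`, `base_delay` and the injectable `sleep` are hypothetical names, and it swaps in exponential backoff (waiting 1 s, 2 s, 4 s, ... between attempts) in place of immediate retries, which gives a flaky connection time to recover.

```python
import socket
import time
import urllib.error

# Assumption: only these errors are treated as transient; others (e.g. HTTP 404) propagate
TRANSIENT_ERRORS = (socket.timeout, urllib.error.ContentTooShortError)


def retry_download(download, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Call download() once, then retry up to max_retries times on transient errors.
    Before the n-th retry, wait base_delay * 2**(n-1) seconds (exponential backoff).
    Returns True on success, False if every attempt failed."""
    for attempt in range(max_retries + 1):
        try:
            download()
            return True
        except TRANSIENT_ERRORS as exc:
            if attempt == max_retries:
                print("downloading picture failed! (%s)" % exc)
                return False
            delay = base_delay * 2 ** attempt
            print("Reloading for %d time(s) after %.1f s" % (attempt + 1, delay))
            sleep(delay)
    return False  # only reached when max_retries < 0
```

With `socket.setdefaulttimeout(30)` set as in the code above, it could be called as `retry_download(lambda: urllib.request.urlretrieve(url, image_name))`. Passing the download as a callable keeps the retry policy separate from the network call, and the `sleep` parameter lets the backoff be tested without actually waiting.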