[Solved] raise ContentTooShortError(urllib.error.ContentTooShortError: ＜urlopen error retrieval incomplete:

1. Problem description

The following error occurred during crawler batch download

 raise ContentTooShortError(
urllib.error.ContentTooShortError: <urlopen error retrieval incomplete: got only 0 out of 290758 bytes>

2. Cause of problem

Problem cause: urlretrieve download is incomplete

3. Solution

1. Solution I

Use the recursive method to solve the incomplete method of urlretrieve to download the file. The code is as follows:

def auto_down(url,filename):
    try:
        urllib.urlretrieve(url,filename)
    except urllib.ContentTooShortError:
        print 'Network conditions is not good.Reloading.'
        auto_down(url,filename)

However, after testing, urllib.ContentTooShortError appears in the downloaded file, and it will take too long to download the file again, and it will often try several times, or even more than a dozen times, and occasionally fall into a dead cycle. This situation is very unsatisfactory.

2. Solution II

Therefore, the socket module is used to shorten the time of each re-download and avoid falling into a dead cycle, so as to improve the operation efficiency
the following is the code:

import socket
import urllib.request
#Set the timeout period to 30s
socket.setdefaulttimeout(30)
#Solve the problem of incomplete download and avoid falling into an endless loop
try:
    urllib.request.urlretrieve(url,image_name)
except socket.timeout:
    count = 1
    while count <= 5:
        try:
            urllib.request.urlretrieve(url,image_name)                                                
            break
        except socket.timeout:
            err_info = 'Reloading for %d time'%count if count == 1 else 'Reloading for %d times'%count
            print(err_info)
            count += 1
    if count > 5:
        print("downloading picture fialed!")

ProgrammerAH

Programmer Guide, Tips and Tutorial

[Solved] raise ContentTooShortError(urllib.error.ContentTooShortError: ＜urlopen error retrieval incomplete: