[Solved] Error: java.io.EOFException: Premature EOF from inputStream

Solving the error: java.io.EOFException: Premature EOF from inputStream

1. Problem

1. Background

During a routine log-parsing task, an error was suddenly reported. The task had always been very stable, so why would it fail now? My heart sank.

2. Detailed error

Checking the log revealed the following error:

21/11/18 14:36:29 INFO mapreduce.Job: Task Id : attempt_1628497295151_1290365_m_000002_2, Status : FAILED
Error: java.io.EOFException: Premature EOF from inputStream
	at com.hadoop.compression.lzo.LzopInputStream.readFully(LzopInputStream.java:75)
	at com.hadoop.compression.lzo.LzopInputStream.readHeader(LzopInputStream.java:114)
	at com.hadoop.compression.lzo.LzopInputStream.<init>(LzopInputStream.java:54)
	at com.hadoop.compression.lzo.LzopCodec.createInputStream(LzopCodec.java:83)
	at com.hadoop.mapreduce.LzoSplitRecordReader.initialize(LzoSplitRecordReader.java:58)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:786)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1907)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Searching for the error, most results point to the dfs.datanode.max.transfer.threads parameter hitting its upper limit, for example:
https://blog.csdn.net/zhoujj303030/article/details/44422415

Checking the cluster configuration showed that this parameter had already been raised to 8192, so the cause had to be elsewhere.
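As an aside, the effective value of that parameter can be confirmed with hdfs getconf (available on Hadoop 2.x and later):

# Print the effective value of the parameter from the client configuration.
hdfs getconf -confKey dfs.datanode.max.transfer.threads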

Later, an empty LZO file was found among the log files. An empty .lzo file has no LZO header, so LzopInputStream.readHeader hits end-of-stream immediately, which matches the stack trace above. After deleting the file and rerunning, the task completed successfully.
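In hindsight, such files are easy to spot: the size is the fifth field of hdfs dfs -ls output, so zero-byte files can be listed in one pass (a minimal sketch; the path is a placeholder):

# Skip the "Found N items" header line, print paths (field 8) whose size (field 5) is 0.
hdfs dfs -ls /xxx/xxx/2037-11-05/pageview | awk 'NR > 1 && $5 == 0 {print $8}'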

2. Solution

To prevent this problem from recurring, I wrote a script that deletes empty LZO files before the parsing task runs.

1. Traverse the files under the specified path

# List the directory; sed '1d' drops the "Found N items" header line,
# then repeated spaces are squeezed so cut can take field 8: the file path.
for file in `hdfs dfs -ls /xxx/xxx/2037-11-05/pageview | sed '1d;s/  */ /g' | cut -d\  -f8`;
do  
	echo $file; 
done

Result output:

/xxx/xxx/2037-11-05/pageview/log.1631668209557.lzo
/xxx/xxx/2037-11-05/pageview/log.1631668211445.lzo
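On recent Hadoop releases, the sed/cut parsing can be skipped entirely, since ls accepts a paths-only flag (check your version's usage before relying on it):

# -C prints only the paths, one per line, without the header or metadata columns.
hdfs dfs -ls -C /xxx/xxx/2037-11-05/pageview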

2. Check whether each file is empty

for file in `hdfs dfs -ls /xxx/xxx/2037-11-05/pageview | sed '1d;s/  */ /g' | cut -d\  -f8`;
do  
	echo $file; 
	# Field 3 of hdfs dfs -count is CONTENT_SIZE, the file's size in bytes.
	lzoIsEmpty=$(hdfs dfs -count $file | awk '{print $3}')
	echo $lzoIsEmpty;
	if [[ $lzoIsEmpty -eq 0 ]];then 
		# Size is 0: delete the empty file.
		hdfs dfs -rm $file;
	else
		echo "Loading data"
	fi
done
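For reference, hdfs dfs -count prints four columns: DIR_COUNT, FILE_COUNT, CONTENT_SIZE, and PATHNAME. For an empty file the output looks like this (illustrative; only the column layout matters):

$ hdfs dfs -count /xxx/xxx/2037-11-05/pageview/log.1631668209557.lzo
           0            1                  0 /xxx/xxx/2037-11-05/pageview/log.1631668209557.lzo

The final script below also uses field 2 (FILE_COUNT) on each directory, to skip directories that contain no files at all.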

3. Final script

for type in webclick error pageview exposure login
do
	# Field 2 of hdfs dfs -count is FILE_COUNT: skip directories with no files.
	isEmpty=$(hdfs dfs -count /xxx/xxx/$do_date/$type | awk '{print $2}')
	if [[ $isEmpty -eq 0 ]];then 
		echo "------ Given Path:/xxx/xxx/$do_date/$type is empty" 
	else 
		# Remove any zero-byte .lzo files before parsing, as above.
		for file in `hdfs dfs -ls /xxx/xxx/$do_date/$type | sed '1d;s/  */ /g' | cut -d\  -f8`;
		do  
			echo $file; 
			lzoIsEmpty=$(hdfs dfs -count $file | awk '{print $3}')
			echo $lzoIsEmpty;
			if [[ $lzoIsEmpty -eq 0 ]];then 
				echo Delete Files: $file
				hdfs dfs -rm $file;
			fi
		done

		echo ================== Import log data of type $do_date $type into ods layer ==================
		# ... log parsing logic goes here
	fi
done
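The script assumes do_date is set by its caller. A minimal, hypothetical wrapper that takes the date as an argument and defaults to yesterday:

#!/bin/bash
# Hypothetical entry point: take the partition date as $1, default to yesterday (GNU date).
if [ -n "$1" ]; then
	do_date=$1
else
	do_date=$(date -d '-1 day' +%F)
fi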
