When running a MapReduce job, the error "GC overhead limit exceeded" appears. Checking the log shows the following exception:
2015-12-11 11:48:44,716 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.io.DataInputStream.readUTF(DataInputStream.java:661)
at java.io.DataInputStream.readUTF(DataInputStream.java:564)
at xxxx.readFields(DateDimension.java:186)
at xxxx.readFields(StatsUserDimension.java:67)
at xxxx.readFields(StatsBrowserDimension.java:68)
at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:158)
at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:158)
at org.apache.hadoop.mapreduce.task.ReduceContextImpl$ValueIterator.next(ReduceContextImpl.java:239)
at xxx.reduce(BrowserReducer.java:37)
at xxx.reduce(BrowserReducer.java:16)
at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
The stack trace shows that the reducer runs out of memory while reading the next key/value pair. Looking at the code, the reduce side accumulates the values it reads into an in-memory map, which is what exhausts the heap. In Hadoop 2.x, the default heap size of the container's YarnChild JVM is 200 MB, set by the parameter mapred.child.java.opts. This is a client-side parameter, so it can be supplied when the job is submitted or configured in mapred-site.xml. Changing it to -Xms200m -Xmx1000m enlarges the JVM heap and resolves the exception.
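For reference, a minimal sketch of applying the same fix at job-submission time instead of in mapred-site.xml (the class and job names below are placeholders; only the conf.set call matters):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SubmitWithLargerHeap {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Raise the task JVM heap for this job only; being a client-side parameter,
        // it could also be passed as -Dmapred.child.java.opts=... on the command line.
        conf.set("mapred.child.java.opts", "-Xms200m -Xmx1000m");
        Job job = Job.getInstance(conf, "browser-stats"); // job name is a placeholder
        // ... set mapper/reducer/input/output as usual, then:
        // job.waitForCompletion(true);
    }
}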
| Parameter name | Default | Description |
| --- | --- | --- |
| mapred.child.java.opts | -Xmx200m | JVM options for the container JVM that executes the MapReduce task |
| mapred.map.child.java.opts | (unset) | JVM options for the map phase only |
| mapred.reduce.child.java.opts | (unset) | JVM options for the reduce phase only |
| mapreduce.admin.map.child.java.opts | -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN | Administrator-specified JVM options for the map phase |
| mapreduce.admin.reduce.child.java.opts | -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN | Administrator-specified JVM options for the reduce phase |
The precedence of these five parameters is:
Map phase: mapreduce.admin.map.child.java.opts < mapred.child.java.opts < mapred.map.child.java.opts, i.e. if they conflict, the JVM options defined by mapred.map.child.java.opts are the ones ultimately used.
Reduce phase: mapreduce.admin.reduce.child.java.opts < mapred.child.java.opts < mapred.reduce.child.java.opts
Hadoop source code reference: org.apache.hadoop.mapred.MapReduceChildJVM.getChildJavaOpts method.
private static String getChildJavaOpts(JobConf jobConf, boolean isMapTask) {
    String userClasspath = "";
    String adminClasspath = "";
    if (isMapTask) {
        // User opts: mapred.map.child.java.opts, falling back to mapred.child.java.opts,
        // then to the built-in default (-Xmx200m).
        userClasspath = jobConf.get(JobConf.MAPRED_MAP_TASK_JAVA_OPTS,
                jobConf.get(JobConf.MAPRED_TASK_JAVA_OPTS,
                        JobConf.DEFAULT_MAPRED_TASK_JAVA_OPTS));
        adminClasspath = jobConf.get(
                MRJobConfig.MAPRED_MAP_ADMIN_JAVA_OPTS,
                MRJobConfig.DEFAULT_MAPRED_ADMIN_JAVA_OPTS);
    } else {
        // Same lookup chain for the reduce side.
        userClasspath = jobConf.get(JobConf.MAPRED_REDUCE_TASK_JAVA_OPTS,
                jobConf.get(JobConf.MAPRED_TASK_JAVA_OPTS,
                        JobConf.DEFAULT_MAPRED_TASK_JAVA_OPTS));
        adminClasspath = jobConf.get(
                MRJobConfig.MAPRED_REDUCE_ADMIN_JAVA_OPTS,
                MRJobConfig.DEFAULT_MAPRED_ADMIN_JAVA_OPTS);
    }
    // Add admin classpath first so it can be overridden by user.
    return adminClasspath + " " + userClasspath;
}
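As the comment indicates, the admin options are concatenated first and the user options last, which is why the user setting wins when both define a flag such as -Xmx: the child JVM honors the last occurrence it sees. A small sketch of that behavior (not part of the Hadoop source; the option values here are invented):

import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapreduce.MRJobConfig;

public class ChildOptsPrecedenceDemo {
    public static void main(String[] args) {
        JobConf jobConf = new JobConf();
        jobConf.set(MRJobConfig.MAPRED_MAP_ADMIN_JAVA_OPTS, "-Xmx512m");  // admin value (invented)
        jobConf.set(JobConf.MAPRED_MAP_TASK_JAVA_OPTS, "-Xmx1000m");      // user value (invented)

        // Mirrors getChildJavaOpts() for the map case: admin opts first, user opts last.
        String userOpts = jobConf.get(JobConf.MAPRED_MAP_TASK_JAVA_OPTS,
                jobConf.get(JobConf.MAPRED_TASK_JAVA_OPTS, JobConf.DEFAULT_MAPRED_TASK_JAVA_OPTS));
        String adminOpts = jobConf.get(MRJobConfig.MAPRED_MAP_ADMIN_JAVA_OPTS,
                MRJobConfig.DEFAULT_MAPRED_ADMIN_JAVA_OPTS);

        // Prints "-Xmx512m -Xmx1000m"; the task JVM takes the last -Xmx, so 1000m wins.
        System.out.println(adminOpts + " " + userOpts);
    }
}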
}