When running a MapReduce job, the error "Error: GC overhead limit exceeded" appears. Checking the log, the exception is:
2015-12-11 11:48:44,716 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.io.DataInputStream.readUTF(DataInputStream.java:661)
    at java.io.DataInputStream.readUTF(DataInputStream.java:564)
    at xxxx.readFields(DateDimension.java:186)
    at xxxx.readFields(StatsUserDimension.java:67)
    at xxxx.readFields(StatsBrowserDimension.java:68)
    at org.apache.hadoop.io.WritableComparator.compare(WritableComparator.java:158)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl.nextKeyValue(ReduceContextImpl.java:158)
    at org.apache.hadoop.mapreduce.task.ReduceContextImpl$ValueIterator.next(ReduceContextImpl.java:239)
    at xxx.reduce(BrowserReducer.java:37)
    at xxx.reduce(BrowserReducer.java:16)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:171)
    at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:627)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:389)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:168)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1614)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:163)
The exception shows that the reducer ran out of memory while reading the next key/value pair. Looking at the code, the reduce side reads the values into an in-memory Map collection, which is what exhausts the heap. In Hadoop 2.x the default YARN child JVM heap size for a container is 200 MB, set by the parameter mapred.child.java.opts. This is a client-side parameter: it can be configured in mapred-site.xml or supplied when the job is submitted. Changing it to -Xms200m -Xmx1000m enlarges the JVM heap and resolves the exception, for example as in the sketch below.
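Because mapred.child.java.opts takes effect on the client side, it can also be set programmatically before submitting the job. The following is a minimal sketch of such a driver; the class name BrowserJobDriver and the job name are hypothetical, only the parameter name and heap values come from the fix above.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    public class BrowserJobDriver {  // hypothetical driver class
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // Raise the child JVM heap from the -Xmx200m default so the reduce side
            // no longer hits "GC overhead limit exceeded".
            conf.set("mapred.child.java.opts", "-Xms200m -Xmx1000m");

            Job job = Job.getInstance(conf, "browser-stats");  // hypothetical job name
            // ... set mapper, reducer, input and output paths as usual ...
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

Equivalently, the same value can be passed on the command line with -Dmapred.child.java.opts="-Xms200m -Xmx1000m" when the driver uses ToolRunner, or written into mapred-site.xml.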
| Parameter | Default | Description |
| --- | --- | --- |
| mapred.child.java.opts | -Xmx200m | JVM options for the container JVM in which MapReduce tasks run |
| mapred.map.child.java.opts | | JVM options for the map phase only |
| mapred.reduce.child.java.opts | | JVM options for the reduce phase only |
| mapreduce.admin.map.child.java.opts | -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN | Administrator-specified JVM options for the map phase |
| mapreduce.admin.reduce.child.java.opts | -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN | Administrator-specified JVM options for the reduce phase |
The precedence among these five parameters is:
Map phase: mapreduce.admin.map.child.java.opts < mapred.child.java.opts < mapred.map.child.java.opts, i.e. when the values conflict, the JVM options defined by mapred.map.child.java.opts win.
Reduce phase: mapreduce.admin.reduce.child.java.opts < mapred.child.java.opts < mapred.reduce.child.java.opts
Hadoop source code reference: org.apache.hadoop.mapred.MapReduceChildJVM.getChildJavaOpts method.
private static String getChildJavaOpts(JobConf jobConf, boolean isMapTask) {
    String userClasspath = "";
    String adminClasspath = "";
    if (isMapTask) {
        userClasspath = jobConf.get(
                JobConf.MAPRED_MAP_TASK_JAVA_OPTS,
                jobConf.get(
                        JobConf.MAPRED_TASK_JAVA_OPTS,
                        JobConf.DEFAULT_MAPRED_TASK_JAVA_OPTS));
        adminClasspath = jobConf.get(
                MRJobConfig.MAPRED_MAP_ADMIN_JAVA_OPTS,
                MRJobConfig.DEFAULT_MAPRED_ADMIN_JAVA_OPTS);
    } else {
        userClasspath = jobConf.get(
                JobConf.MAPRED_REDUCE_TASK_JAVA_OPTS,
                jobConf.get(
                        JobConf.MAPRED_TASK_JAVA_OPTS,
                        JobConf.DEFAULT_MAPRED_TASK_JAVA_OPTS));
        adminClasspath = jobConf.get(
                MRJobConfig.MAPRED_REDUCE_ADMIN_JAVA_OPTS,
                MRJobConfig.DEFAULT_MAPRED_ADMIN_JAVA_OPTS);
    }

    // Add admin classpath first so it can be overridden by user.
    return adminClasspath + " " + userClasspath;
}
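To see how these keys interact, here is a small, self-contained sketch using the same configuration constants that getChildJavaOpts reads; the class name ChildOptsPrecedenceDemo and the example -Xmx values are hypothetical.

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapreduce.MRJobConfig;

    public class ChildOptsPrecedenceDemo {  // hypothetical demo class
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            conf.set(JobConf.MAPRED_TASK_JAVA_OPTS, "-Xmx1000m");      // mapred.child.java.opts
            conf.set(JobConf.MAPRED_MAP_TASK_JAVA_OPTS, "-Xmx1500m");  // mapred.map.child.java.opts

            // Same nested lookup as getChildJavaOpts(): the map-specific key, if set,
            // shadows the generic one; otherwise the -Xmx200m default applies.
            String userOpts = conf.get(JobConf.MAPRED_MAP_TASK_JAVA_OPTS,
                    conf.get(JobConf.MAPRED_TASK_JAVA_OPTS, JobConf.DEFAULT_MAPRED_TASK_JAVA_OPTS));
            String adminOpts = conf.get(MRJobConfig.MAPRED_MAP_ADMIN_JAVA_OPTS,
                    MRJobConfig.DEFAULT_MAPRED_ADMIN_JAVA_OPTS);

            // Admin options come first on the child command line, so on conflict
            // the user's -Xmx1500m wins.
            System.out.println(adminOpts + " " + userOpts);
        }
    }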