Tag Archives: Hive

Hive: Hive partition sorting error [How to Solve]

First, the error information is as follows:

Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
java.io.IOException: org.apache.hadoop.yarn.exceptions.InvalidResourceRequestException: Invalid resource request, requested resource type=[memory-mb] < 0 or greater than maximum allowed allocation. Requested resource=<memory:1536, vCores:1>, maximum allowed allocation=<memory:256, vCores:4>, please note that maximum allowed allocation is calculated by scheduler based on maximum resource of registered NodeManagers, which might be less than configured maximum allocation=<memory:256, vCores:4>

From the error message, the root cause is that the maximum container allocation YARN allows (memory: 256 MB) is smaller than the memory the MapReduce task requests (1536 MB). The fix is to adjust the ResourceManager/NodeManager resource settings in yarn-site.xml:

   <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>2548</value>
    <description>Memory available to containers on this node, in MB</description>
  </property>
  
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>128</value>
    <description>Minimum memory that can be requested for a single container, default 1024 MB</description>
  </property>
  
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
    <description>Maximum memory that can be requested for a single container, default 8192 MB</description>
  </property>
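
After changing yarn-site.xml, the new limits only take effect once YARN has been restarted (and the file has been copied to every node if the configuration is managed manually). A minimal sketch using the standard Hadoop scripts:

stop-yarn.sh
start-yarn.sh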

After restarting the Hadoop cluster and running the partitioned insert again, a new error is reported:

Diagnostic Messages for this Task:
[2021-12-19 10:04:27.042]Container [pid=5821,containerID=container_1639879236798_0001_01_000005] is running 253159936B beyond the 'VIRTUAL' memory limit. Current usage: 92.0 MB of 1 GB physical memory used; 2.3 GB of 2.1 GB virtual memory used. Killing container.
Dump of the process-tree for container_1639879236798_0001_01_000005 :
        |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
        |- 5821 5820 5821 5821 (bash) 0 0 9797632 286 /bin/bash -c /opt/module/jdk1.8.0_161/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN   -Xmx820m -Djava.io.tmpdir=/opt/module/hadoop-3.1.3/data/nm-local-dir/usercache/atguigu/appcache/application_1639879236798_0001/container_1639879236798_0001_01_000005/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/opt/module/hadoop-3.1.3/logs/userlogs/application_1639879236798_0001/container_1639879236798_0001_01_000005 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 192.168.17.42 42894 attempt_1639879236798_0001_m_000000_3 5 1>/opt/module/hadoop-3.1.3/logs/userlogs/application_1639879236798_0001/container_1639879236798_0001_01_000005/stdout 2>/opt/module/hadoop-3.1.3/logs/userlogs/application_1639879236798_0001/container_1639879236798_0001_01_000005/stderr  
        |- 5833 5821 5821 5821 (java) 338 16 2498220032 23276 /opt/module/jdk1.8.0_161/bin/java -Djava.net.preferIPv4Stack=true -Dhadoop.metrics.log.level=WARN -Xmx820m -Djava.io.tmpdir=/opt/module/hadoop-3.1.3/data/nm-local-dir/usercache/atguigu/appcache/application_1639879236798_0001/container_1639879236798_0001_01_000005/tmp -Dlog4j.configuration=container-log4j.properties -Dyarn.app.container.log.dir=/opt/module/hadoop-3.1.3/logs/userlogs/application_1639879236798_0001/container_1639879236798_0001_01_000005 -Dyarn.app.container.log.filesize=0 -Dhadoop.root.logger=INFO,CLA -Dhadoop.root.logfile=syslog org.apache.hadoop.mapred.YarnChild 192.168.17.42 42894 attempt_1639879236798_0001_m_000000_3 5 

From the diagnostics above, the container was killed for exceeding the virtual memory limit (2.3 GB used of the 2.1 GB allowed), while physical memory usage was fine (92 MB of 1 GB). Since real memory is sufficient, we can simply tell the NodeManager not to enforce the virtual memory check:

<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
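
Note that the container was killed for exceeding the virtual memory limit, not the physical one. As an alternative to disabling the check entirely (not part of the original fix, just a commonly used option), the virtual-to-physical ratio can be raised instead; it defaults to 2.1, which is exactly the 2.1 GB limit seen in the log for a 1 GB container:

<property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>4</value>
</property>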

Then restart the cluster and run the SQL again.

The results are now output normally.

[Solved] bin/hive Startup Error: Operation category READ is not supported in state standby

The specific error information is as follows:

[sonkwo@sonkwo-bj-data001 hive-3.1.2]$ bin/hive

which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/ha/hadoop-3.1.3/bin:/opt/module/ha/hadoop-3.1.3/sbin:/opt/module/zookeeper-3.5.7/bin:/opt/module/hive-3.1.2/bin:/home/sonkwo/.local/bin:/home/sonkwo/bin)
Hive Session ID = cb685500-b6ba-42be-b652-1aa7bdf0e134

Logging initialized using configuration in jar:file:/opt/module/hive-3.1.2/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Exception in thread "main" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
        at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2017)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1441)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3125)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1173)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:973)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1000)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2916)

        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:651)
        at org.apache.hadoop.hive.ql.session.SessionState.beginStart(SessionState.java:591)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:747)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:683)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:318)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:232)
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
        at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:98)
        at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2017)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1441)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:3125)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1173)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:973)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:527)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1036)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1000)
        at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:928)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2916)

        at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1545)
        at org.apache.hadoop.ipc.Client.call(Client.java:1491)
        at org.apache.hadoop.ipc.Client.call(Client.java:1388)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
        at com.sun.proxy.$Proxy28.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:904)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
        at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
        at com.sun.proxy.$Proxy29.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1661)
        at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1577)
        at org.apache.hadoop.hdfs.DistributedFileSystem$29.doCall(DistributedFileSystem.java:1574)
        at org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1589)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1683)
        at org.apache.hadoop.hive.ql.exec.Utilities.ensurePathIsWritable(Utilities.java:4486)
        at org.apache.hadoop.hive.ql.session.SessionState.createRootHDFSDir(SessionState.java:760)
        at org.apache.hadoop.hive.ql.session.SessionState.createSessionDirs(SessionState.java:701)
        at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:627)
        ... 9 more

reason:

All three NameNodes are in standby state:

Overview ‘sonkwo-bj-data001:8020’ (standby)

Overview ‘sonkwo-bj-data002:8020’ (standby)

Overview ‘sonkwo-bj-data003:8020’ (standby)

Solution:

1) Stop the HDFS cluster
stop-dfs.sh

2) Start the ZooKeeper cluster
zk.sh start   # or run zkServer.sh start on every ZooKeeper node

3) Re-initialize the HA state in ZooKeeper
hdfs zkfc -formatZK

4) Start the HDFS service
start-dfs.sh
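
To verify that an active NameNode has been elected after the restart, query the HA state of each node (a quick check; nn1/nn2/nn3 stand for whatever NameNode IDs are defined by dfs.ha.namenodes in hdfs-site.xml):

hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
hdfs haadmin -getServiceState nn3

One of them should now report active instead of standby.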

Re-execute bin/hive and the connection now succeeds:

hive>
[1]+  Stopped                 hive
[sonkwo@sonkwo-bj-data001 hive-3.1.2]$ hive
which: no hbase in (/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/opt/module/jdk1.8.0_212/bin:/opt/module/ha/hadoop-3.1.3/bin:/opt/module/ha/hadoop-3.1.3/sbin:/opt/module/zookeeper-3.5.7/bin:/opt/module/hive-3.1.2/bin:/home/sonkwo/.local/bin:/home/sonkwo/bin)
Hive Session ID = 698d0919-f46c-42c4-b92e-860f501a7711

Logging initialized using configuration in jar:file:/opt/module/hive-3.1.2/lib/hive-common-3.1.2.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
hive>

[Solved] hive sql Error: ParseException in subquery source

The full error message:

org.apache.hadoop.hive.ql.parse.ParseException: line 368:18 cannot recognize input near 'group' 'by' 'order_phone_num' in subquery source

The SQL in question:

     customer_Flag as (

                  select order_phone_num,
                         concat_ws(';', collect_list(c)) as a,
                         sum(customer_flag)              as b
                  from (
                           select order_phone_num,
                                  customer_flag,
                                  concat_ws(',', collect_list(cast(r_diff as string))) as c
                            from add_payment_period
                           group by order_phone_num,
                                    customer_flag
                       )
                  group by order_phone_num

     ),

Solution: Hive requires every subquery in the FROM clause to have an alias, so give the inner subquery (the one that ends just before the outer group by order_phone_num) an alias, here c1:

     customer_Flag as (

                  select order_phone_num,
                         concat_ws(';', collect_list(c)) as a,
                         sum(customer_flag)              as b
                  from (
                           select order_phone_num,
                                  customer_flag,
                                  concat_ws(',', collect_list(cast(r_diff as string))) as c
                            from add_payment_period
                           group by order_phone_num,
                                    customer_flag
                       ) c1
                  group by order_phone_num

     ),

How to Solve HiveServer2 & Beeline Error

1. HiveServer2 starts and the process is running, but nothing is listening on port 10000?

Add the following configuration to hive-site.xml and restart the HiveServer2 service:

<property>
    <name>hive.metastore.event.db.notification.api.auth</name>
    <value>false</value>
</property>
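
After restarting HiveServer2, a quick way to confirm that something is now listening on port 10000 is a standard Linux port check:

ss -lntp | grep 10000
# or, on older systems:
netstat -nltp | grep 10000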

2. Beeline cannot connect to HiveServer2 and reports org.apache.hadoop.security.authorize.AuthorizationException?

This is a proxy-user permission problem: the connecting user (for example root) is not allowed to impersonate other users.

Add the following configuration to core-site.xml, distribute it to every node (do not forget the distribution step), restart Hadoop, and connect again:

<property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
</property>
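
After distributing core-site.xml and restarting Hadoop, reconnect with Beeline (a minimal example; replace the host and user with your own):

beeline -u jdbc:hive2://<hiveserver2-host>:10000 -n <your-user>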

[Solved] Error: Could not open client transport with JDBC Uri

Error Messages:

[root@cpucode100 bin]# beeline -u jdbc:hive2://cpucode100:10000 -n root
Connecting to jdbc:hive2://cpucode100:10000
21/12/15 21:41:51 [main]: WARN jdbc.HiveConnection: Failed to connect to cpucode100:10000
Error: Could not open client transport with JDBC Uri: jdbc:hive2://cpucode100:10000: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: root is not allowed to impersonate root (state=08S01,code=0)
Beeline version 3.1.2 by Apache Hive

Connecting to jdbc:hive2://cpucode100:10000
21/12/16 21:15:53 [main]: WARN jdbc.HiveConnection: Failed to connect to cpucode100:10000
Could not open connection to the HS2 server. Please check the server URI and if the URI is correct, then ask the administrator to check the server status.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://cpucode100:10000: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
Beeline version 3.1.2 by Apache Hive

Solution:
Add the following configuration to Hadoop's core-site.xml, restart Hadoop, then start hiveserver2 and beeline again.
Replace root below with your own username:

	<property>
		<name>hadoop.proxyuser.root.hosts</name>
		<value>*</value>
	</property>
	<property>
		<name>hadoop.proxyuser.root.groups</name>
		<value>*</value>
	</property>

Distribute the configuration to the other nodes:

xsync /etc

Restart the cluster:

myhadoop.sh stop

myhadoop.sh start

hive --service hiveserver2
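
hive --service hiveserver2 keeps the current terminal occupied, so it is common to start it in the background instead (a sketch; the log path is arbitrary):

nohup hive --service hiveserver2 > /tmp/hiveserver2.log 2>&1 &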

Wait here; HiveServer2 can take quite a while to come up, on the order of ten minutes.

beeline -u jdbc:hive2://cpucode100:10000 -n root

[Solved] Hive Query Error: java.net.ConnectException: Call to ResourceManager Port 8032 Failed on Connection Exception

The following errors occur when executing query statements in Hive:

ERROR : Job Submission failed with exception 'java.net.ConnectException(Call From ************ failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused)'
java.net.ConnectException: Call From *************** to ****************:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
	at sun.reflect.GeneratedConstructorAccessor34.newInstance(Unknown Source)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:831)
	at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:755)
	at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1549)
	at org.apache.hadoop.ipc.Client.call(Client.java:1491)
	at org.apache.hadoop.ipc.Client.call(Client.java:1388)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
	at com.sun.proxy.$Proxy85.getNewApplication(Unknown Source)
	at org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getNewApplication(ApplicationClientProtocolPBClientImpl.java:274)
	at sun.reflect.GeneratedMethodAccessor32.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:422)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:165)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:157)
	at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
	at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:359)
	at com.sun.proxy.$Proxy86.getNewApplication(Unknown Source)
	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNewApplication(YarnClientImpl.java:270)
	at org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.createApplication(YarnClientImpl.java:278)
	at org.apache.hadoop.mapred.ResourceMgrDelegate.getNewJobID(ResourceMgrDelegate.java:196)
	at org.apache.hadoop.mapred.YARNRunner.getNewJobID(YARNRunner.java:271)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:157)
	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1570)
	at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1567)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1567)
	at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:576)
	at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:571)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:571)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:562)
	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:423)
	at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:149)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2664)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2335)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:2011)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1709)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1703)
	at org.apache.hadoop.hive.ql.reexec.ReExecDriver.run(ReExecDriver.java:157)
	at org.apache.hive.service.cli.operation.SQLOperation.runQuery(SQLOperation.java:224)
	at org.apache.hive.service.cli.operation.SQLOperation.access$700(SQLOperation.java:87)
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork$1.run(SQLOperation.java:316)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
	at org.apache.hive.service.cli.operation.SQLOperation$BackgroundWork.run(SQLOperation.java:329)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
	at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
	at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:700)
	at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:804)
	at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:421)
	at org.apache.hadoop.ipc.Client.getConnection(Client.java:1606)
	at org.apache.hadoop.ipc.Client.call(Client.java:1435)
	... 54 more

ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. Call From ****** to hadoop102:8032 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused

Looking at the stack trace, the connection to port 8032 keeps being refused. Port 8032 is the YARN ResourceManager port configured on this cluster, and it turned out that the YARN service simply was not running. So just start it again:

start-yarn.sh
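
Before rerunning the query, you can confirm with jps (shipped with the JDK) that the YARN daemons are actually up; the ResourceManager process is what listens on port 8032:

jps | grep -E 'ResourceManager|NodeManager'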

With YARN back up, execute the query again and it completes successfully.

[Solved] Hive execute insert overwrite error: could not be cleared up

Problem description

1. User Zhangsan runs an insert overwrite:

INSERT OVERWRITE table temp.push_temp PARTITION(d_layer='app_video_uid_d_1')
SELECT ...

and it fails with "could not be cleaned up":

Failed with exception Directory hdfs://Ucluster/user/hive/warehouse/temp.db/push_temp/d_layer=app_video_uid_d_1 could not be cleaned up.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. Directory hdfs://Ucluster/user/hive/warehouse/temp.db/push_temp/d_layer=app_video_uid_d_1 could not be cleaned up.

2. Checking the HDFS directory permissions shows the directory is writable by everyone, and its owner is Lisi:

drwxrwxrwt   - lisi supergroup          0 2021-11-29 15:04 /user/hive/warehouse/temp.db/push_temp/d_layer=app_video_uid_d_1

3. When user Lisi runs the same SQL from step 1, it executes successfully.

Cause of problem

In a word: the sticky bit.
Look carefully at the directory permissions above: the last permission bit is "t", which means the sticky bit is set on the directory, so only a file's owner (or the directory owner / superuser) may delete files under it.

# A non-owner tries to delete a file from the sticky-bit directory
$ hadoop fs -rm /user/hive/warehouse/temp.db/push_temp/d_layer=app_video_uid_d_1/000000_0
21/11/29 16:32:59 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 7320 minutes, Emptier interval = 0 minutes.
rm: Failed to move to trash: hdfs://Ucluster/user/hive/warehouse/temp.db/push_temp/d_layer=app_video_uid_d_1/000000_0: Permission denied by sticky bit setting: user=admin, inode=000000_0

Because insert overwrite has to delete the existing files in the target directory, and the sticky bit prevents a non-owner from deleting them, the HQL fails.

Solution

Remove the sticky bit from the directory:

# Remove the sticky bit
hadoop fs -chmod -R o-t /user/hive/warehouse/temp.db/push_temp/d_layer=app_video_uid_d_1

# Restore the sticky bit (if it is needed again later)
hadoop fs -chmod -R o+t /user/hive/warehouse/temp.db/push_temp/d_layer=app_video_uid_d_1
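
To confirm the change, list the directory itself and check its permission string; after o-t the trailing "t" disappears:

hadoop fs -ls -d /user/hive/warehouse/temp.db/push_temp/d_layer=app_video_uid_d_1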

[Solved] HiveSQL Error: "Error while processing statement: FAILED: Execution Error, return code 2"

Project scenario:

While writing HiveSQL today, several statements failed during execution, all reporting the same error.


Problem Description:


Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive

Solution:

It turned out to be a MapReduce execution error: this query has to run as an MR job, but local mode had been turned on manually. Turning local mode off again fixes it:

set hive.exec.mode.local.auto=false;
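
You can check the current value of the switch from the Hive CLI before and after changing it; set with no value just prints the setting:

set hive.exec.mode.local.auto;
-- should now print: hive.exec.mode.local.auto=false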

[Solved] Error: java.io.EOFException: Premature EOF from inputStream

Solving the error: java.io.EOFException: Premature EOF from inputStream

1. The problem

1. How it happened

A log-parsing job suddenly started failing, even though it had always run very stably. How could it suddenly error out? My heart skipped a beat.

2. The exact error

Checking the log shows the following error:

21/11/18 14:36:29 INFO mapreduce.Job: Task Id : attempt_1628497295151_1290365_m_000002_2, Status : FAILED
Error: java.io.EOFException: Premature EOF from inputStream
	at com.hadoop.compression.lzo.LzopInputStream.readFully(LzopInputStream.java:75)
	at com.hadoop.compression.lzo.LzopInputStream.readHeader(LzopInputStream.java:114)
	at com.hadoop.compression.lzo.LzopInputStream.<init>(LzopInputStream.java:54)
	at com.hadoop.compression.lzo.LzopCodec.createInputStream(LzopCodec.java:83)
	at com.hadoop.mapreduce.LzoSplitRecordReader.initialize(LzoSplitRecordReader.java:58)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:786)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1907)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Searching for this error mostly points to the dfs.datanode.max.transfer.threads parameter being set too low, e.g.
https://blog.csdn.net/zhoujj303030/article/details/44422415

Checking the cluster configuration showed the parameter had already been raised to 8192, so the cause had to be elsewhere.

It eventually turned out that one of the log files was an empty LZO file; after deleting it and rerunning, the task succeeded.

2. Solution

To keep this from happening again, write a script that deletes empty LZO files before the parsing task runs.

1. Traverse the files under the specified path

for file in `hdfs dfs -ls /xxx/xxx/2037-11-05/pageview | sed '1d;s/  */ /g' | cut -d\  -f8`;
do  
	echo $file; 
done

Result output:

/xxx/xxx/2037-11-05/pageview/log.1631668209557.lzo
/xxx/xxx/2037-11-05/pageview/log.1631668211445.lzo

2. Check whether each file is empty

for file in `hdfs dfs -ls /xxx/xxx/2037-11-05/pageview | sed '1d;s/  */ /g' | cut -d\  -f8`;
do  
	echo $file; 
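	# hdfs dfs -count prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME, so $3 below is the file size in bytes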
	lzoIsEmpty=$(hdfs dfs -count $file | awk '{print $3}')
	echo $lzoIsEmpty;
	if [[ $lzoIsEmpty -eq 0 ]];then 
		# is empty, delete the file
		hdfs dfs -rm $file;
	else
		echo "Loading data"
	fi
done

3. Final script

for type in webclick error pageview exposure login
do
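    # hdfs dfs -count prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATHNAME, so $2 is the number of files under the path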
    isEmpty=$(hdfs dfs -count /xxx/xxx/$do_date/$type | awk '{print $2}')
    if [[ $isEmpty -eq 0 ]];then 
        echo "------ Given Path:/xxx/xxx/$do_date/$type is empty" 
    else 
		for file in `hdfs dfs -ls /xxx/xxx/$do_date/$type | sed '1d;s/  */ /g' | cut -d\  -f8`;
		do  
			echo $file; 
			lzoIsEmpty=$(hdfs dfs -count $file | awk '{print $3}')
			echo $lzoIsEmpty;
			if [[ $lzoIsEmpty -eq 0 ]];then 
				echo Delete Files: $file
				hdfs dfs -rm $file;
			fi
		done
		
		echo ================== Import log data of type $do_date $type into ods layer ==================
		# ... log-parsing logic goes here
   fi
done

[Solved] Sqoop Mysqltohive error: Error: java.lang.RuntimeException: java.lang.RuntimeException…

Problem Description
Execute the statement:

bin/sqoop import --connect jdbc:mysql://localhost:3306/gdcmxy --username root --password root --table 2019bigdata --fields-terminated-by '\t' --delete-target-dir --num-mappers 1 --hive-import --hive-database gdcmxy --hive-table 2019bigdata

Error: java.lang.RuntimeException: java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

Seeing this error, the preliminary judgment was that it is a database connection problem. The first suspect was the firewall, but since the firewall had definitely been turned off for Hadoop earlier, it should have nothing to do with the firewall.
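
Before rerunning the full import, a quick way to test whether the JDBC connection itself works is Sqoop's list-databases tool (a hedged check, using the machine's LAN IP from the fix below):

bin/sqoop list-databases --connect jdbc:mysql://192.168.112.81:3306/ --username root --password root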

Cluster environment:

Solution:

Change localhost in the statement to the local machine's IP address:

bin/sqoop import --connect jdbc:mysql://192.168.112.81:3306/gdcmxy --username root --password root --table 2019bigdata --fields-terminated-by '\t' --delete-target-dir --num-mappers 1 --hive-import --hive-database gdcmxy --hive-table 2019bigdata

The import now completes successfully.