Tag Archives: Hadoop

How to Solve the null\bin\winutils.exe Error

The null\bin\winutils.exe error ("Could not locate executable null\bin\winutils.exe in the Hadoop binaries") is reported
because Windows is missing the Hadoop common package. Download and unzip the hadoop-common-2.2.0-bin-master package.

Then set the environment variables:
1. In the user variables, create a variable named HADOOP_HOME whose value is the location of the unzipped common package.
2. In the system variable Path, add %HADOOP_HOME%\bin.
Click OK to save, then restart the computer.
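
If you prefer the command line over the GUI dialog, here is a minimal sketch from an administrator Command Prompt (the unzip location below is an assumption, adjust it to your machine):

:: point HADOOP_HOME at the unzipped hadoop-common-2.2.0-bin-master folder (placeholder path)
setx HADOOP_HOME "C:\hadoop-common-2.2.0-bin-master"
:: then add %HADOOP_HOME%\bin to the Path entry in the environment variables dialog and restart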

ERROR server.datanode.DataNode: BlockSender.sendChunks() exception [How to Solve]

View log error messages:

2021-09-13 14:56:08,737 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: BlockSender.sendChunks() exception: 
java.io.IOException: Connection reset by peer
	at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
	at sun.nio.ch.FileChannelImpl.transferToDirectlyInternal(FileChannelImpl.java:428)
	at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:493)
	at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:608)
	at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:223)
	at org.apache.hadoop.hdfs.server.datanode.FileIoProvider.transferToSocketFully(FileIoProvider.java:280)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:619)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.doSendBlock(BlockSender.java:803)
	at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:750)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:606)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:152)
	at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:104)
	at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:289)
	at java.lang.Thread.run(Thread.java:748)

Add the following configuration to yarn-site.xml:

<!-- The minimum memory allocation (in MB) requested per container. -->
<property>
	<name>yarn.scheduler.minimum-allocation-mb</name>
	<value>512</value>
</property>
<!-- The maximum memory allocation (in MB) requested per container. -->
<property>
	<name>yarn.scheduler.maximum-allocation-mb</name>
	<value>2048</value>
</property>
<!-- The ratio between container virtual memory and physical memory -->
<property>
	<name>yarn.nodemanager.vmem-pmem-ratio</name>
	<value>4</value>
</property>

These settings are all about memory resources.

Previously, jobs with small files ran successfully, but switching to a large file produced the error above, so it is most likely memory-related.

Tested and confirmed to work.
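
After changing yarn-site.xml, YARN has to be restarted for the new limits to take effect. A minimal sketch using the stock scripts from Hadoop's sbin directory:

# restart YARN so the new container memory limits are picked up
stop-yarn.sh
start-yarn.sh
# confirm the ResourceManager and NodeManagers are back
jps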

How to Solve Zeppelin page 503 error

Problem description: the page reports a 503 error after Zeppelin is started.

Solution:

You need to fix the permissions and ownership of the webapps directory.

drwxrwxr-x   3 hadoop hadoop     4096 Sep   9 11:23 webapps

chmod 755 webapps

chown -R hadoop:hadoop webapps/

After modification, restart Zeppelin

[hadoop@10 /usr/local/service/zeppelin-0.9.0/bin]$ ./zeppelin-daemon.sh restart
Please specify HADOOP_CONF_DIR if USE_HADOOP is true
Zeppelin stop                                              [  OK  ]
Zeppelin start                                             [  OK  ]
[hadoop@10 /usr/local/service/zeppelin-0.9.0/bin]$

The web UI is now accessible again.
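
To double-check from the shell, you can probe the Zeppelin HTTP port (8080 is the default zeppelin.server.port; yours may differ):

curl -I http://localhost:8080
# an HTTP 200 response here, instead of 503, means the web UI is serving again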

ERROR: JAVA_HOME is not set and could not be found.

ERROR: JAVA_HOME is not set and could not be found.

Background

    attempting to operate on HDFS namenode as root, but there is no HDFS_NAMENODE_USER defined. Aborting

Solution

    Add the location of the JDK (typically in hadoop-env.sh) in the $HADOOP_HOME/etc/hadoop directory. Note that the change must be distributed to the other machines in the cluster:

    export JAVA_HOME=/opt/module/jdk1.8
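
A minimal sketch of the change and its distribution (the file hadoop-env.sh and the hostnames below are assumptions; adjust to your cluster):

# append the JDK location to hadoop-env.sh
echo 'export JAVA_HOME=/opt/module/jdk1.8' >> $HADOOP_HOME/etc/hadoop/hadoop-env.sh
# copy the change to the other nodes (hadoop103 and hadoop104 are placeholder hostnames)
scp $HADOOP_HOME/etc/hadoop/hadoop-env.sh hadoop103:$HADOOP_HOME/etc/hadoop/
scp $HADOOP_HOME/etc/hadoop/hadoop-env.sh hadoop104:$HADOOP_HOME/etc/hadoop/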
    

Error during file transfer between HDFS and the local filesystem

  Exception during file transfer between HDFS and the local filesystem in a Maven project:
  org.apache.hadoop.io.nativeio.NativeIO$Windows.createFileWithMode0(Ljava/lang/String;JJJI)Ljava/io/FileDescriptor;
  org.apache.hadoop.util.NativeCrc32.nativeComputeChunkedSumsByteArray(II[BI[BIILjava/lang/String;JZ)V

  Solution: copy the hadoop.dll and winutils.exe files (from Hadoop's bin directory) into C:\Windows\System32, and
in the system environment variables configure the HADOOP_HOME path and add %HADOOP_HOME%\bin to Path.

Beeline connection to HiveServer2 reports a Permission denied error

Error message:

Error: Could not open client transport with JDBC Uri: jdbc:hive2://localhost:10000:
Failed to open new session: 
java.lang.RuntimeException: 
org.apache.hadoop.security.AccessControlException: 
Permission denied: user=anonymous, access=EXECUTE, inode="/tmp":root:supergroup:drwx------

After trying to add an account and password to Hive, it turns out the problem is in the last line: I am connecting as anonymous, and the permissions on the /tmp directory are drwx------.

The first character: - indicates a regular file, d indicates a directory, l indicates a link.
The remaining characters are read in groups of three:
the first three (rwx): owner permissions
the middle three (---): permissions of users in the same group
the last three (---): permissions of other users

Permission, meaning, value, binary, and effect:

r (read), value 4, binary 00000100: the current user can read the file content and browse the directory.
w (write), value 2, binary 00000010: the current user can add or modify the file contents, and can delete or move directories or files within a directory.
x (execute), value 1, binary 00000001: the current user can execute files and enter the directory.
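
For example, combining the bits: rwx = 4 + 2 + 1 = 7, so chmod 777 grants rwx to owner, group, and others, while the drwx------ on /tmp above corresponds to mode 700 (owner-only).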

In other words, I am accessing HiveServer2 as anonymous while /tmp is owned by root, so the "other users" permissions (---) apply to me, and access is denied.
Solution:

Change the permissions of this file/directory in the HDFS filesystem and relax them; the syntax for changing permissions is similar to Linux. Log in with the HDFS account and run:

hdfs dfs -chmod -R 777 /tmp
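
Alternatively, avoid connecting as anonymous by passing a user name to Beeline. A sketch, where the user name hadoop is an assumption:

# connect to HiveServer2 as a named user instead of anonymous
beeline -u jdbc:hive2://localhost:10000 -n hadoop
# and verify the relaxed /tmp permissions afterwards
hdfs dfs -ls -d /tmp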

[Solved] Error running query: MetaException(message:Got exception: java.net.ConnectException Call From XXXX


Problem description

Error: Error running query: MetaException(message:Got exception: 
java.net.ConnectException Call From hadoop102/192.168.121.102 to hadoop102:9000 
failed on connection exception: 
java.net.ConnectException: Connection refused;
For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused) (state=,code=0)

Cause:

Hadoop cluster is not started

Solution:

Start the Hadoop cluster (a sketch of the usual commands follows).
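
A minimal sketch of bringing the cluster up with the stock start scripts (an HA or customized setup may need different steps):

# start HDFS (NameNode, DataNodes, SecondaryNameNode)
start-dfs.sh
# start YARN
start-yarn.sh
# confirm the expected daemons are running
jps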


HBase hangs up immediately after startup: the web port returns a 500 error and HMaster aborted

[error 1]:

java.lang.RuntimeException: HMaster Aborted
	at org.apache.hadoop.hbase.master.HMasterCommandLine.startMaster(HMasterCommandLine.java:261)
	at org.apache.hadoop.hbase.master.HMasterCommandLine.run(HMasterCommandLine.java:149)
	at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
	at org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:149)
	at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:2971)
2021-08-26 12:25:35,269 INFO  [main-EventThread] zookeeper.ClientCnxn: EventThread shut down for session: 0x37b80a4f6560008

[Attempt 1]: delete the HBase node under ZK
This did not solve my problem.
[Attempt 2]: reinstall HBase
This did not solve my problem either.
[Attempt 3]: turn off HDFS safe mode:

hadoop dfsadmin -safemode leave

Still did not solve my problem.
[Attempt 4]: check ZooKeeper. It can be accessed normally (Spark connects fine) and new nodes can be created, so no problem there.

Scrolling up in the log, another error appears.
[error 2]:

master.HMaster: Failed to become active master
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby. Visit https://s.apache.org/sbnn-error
	at org.apache.hadoop.hdfs.server.namenode.ha.StandbyState.checkOperation(StandbyState.java:108)
	at org.apache.hadoop.hdfs.server.namenode.NameNode$NameNodeHAContext.checkOperation(NameNode.java:2044)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkOperation(FSNamesystem.java:1409)
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:2961)
	at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:1160)
	at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:880)
	at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
	at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:507)
	at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1034)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1003)
	at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:931)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1926)
	at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2854)

Here is the point: the error says my master failed to become the active HMaster because "Operation category READ is not supported in state standby", i.e. the read fails because the NameNode it is talking to is in standby. Check the state of nn1 at this point:

#  hdfs haadmin -getServiceState nn1

Sure enough, standby

[Solution 1]:
Manually activate nn1:

hdfs haadmin -transitionToActive --forcemanual nn1

Kill all HBase processes, restart HBase, check with jps, and access the web port.

Bravo!!!

It took a few hours of fiddling, but it is finally fixed. The root cause is that I forgot the correct startup order:
ZooKeeper -> Hadoop -> HBase
I had been starting Hadoop first every time and wasted a long time. I hope this helps you.
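
A quick sketch of that order using the standard scripts (zkServer.sh must be run on each ZooKeeper node; all scripts are assumed to be on the PATH):

# 1. ZooKeeper first, on every ZooKeeper node
zkServer.sh start
# 2. then Hadoop (HDFS and YARN)
start-dfs.sh
start-yarn.sh
# 3. finally HBase
start-hbase.sh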

ResourceManager reports an error: the port is unavailable

The standby ResourceManager machine reports that the port is unavailable; port 8088 is reachable, the process is running, and the log looks normal.

Solution ideas

1. Log in to the two ResourceManager machines and check whether the /var/log/hadoop-yarn/hadoop-hadoop-resourcemanager-emr-header-1.cluster*.log logs contain error messages.

2. Check whether the log contains an automatic restart entry, STARTUP_MSG: Starting ResourceManager, to judge whether an active/standby switch has occurred:

2021-08-10 17:58:01,750 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Already in standby state

3. Check the standby log and find the problem: it cannot connect to ZK and keeps trying to reconnect. The ZK connection exception lasted a long time; the reconnect may have timed out, leaving it in the TERMINATED state.

This is because the ResourceManager on header-1 is in a bad state: both its HA state and its haZooKeeperConnectionState are TERMINATED.

4. The status can be checked by running the following on header-1: curl http://localhost:8088/ws/v1/cluster/info

The active RM responds normally, so the active RM is OK and the YARN service is OK.

5. Solution: the YARN active RM can still provide normal service, but the standby RM is in a bad state, so a later failover from the active might not switch over in time. Restart the standby RM to solve the problem.
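
For reference, a sketch of checking both ResourceManagers' HA state from the command line (rm1/rm2 are whatever IDs your yarn.resourcemanager.ha.rm-ids defines):

# query the HA state of each ResourceManager
yarn rmadmin -getServiceState rm1
yarn rmadmin -getServiceState rm2
# query the active RM's cluster info over the REST API
curl http://localhost:8088/ws/v1/cluster/info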

 

How to Solve the HMaster Hang-Up Issue Due to NameNode Switching in HA Mode

Solve the problem that hmaster hangs up due to namenode switching in Ha mode

Problem:

When we build our own big data cluster for learning, the virtual machine often gets stuck and the nodes hang up inexplicably because the machine configuration is not high enough.

In a Hadoop high-availability cluster with underpowered machines, the two NameNodes keep switching state automatically, which causes the HMaster node of the HBase cluster to hang up.

Cause of the problem:

Let’s check the master log of HBase:

# Go to the log file directory
[root@hadoop001 ~]# cd /opt/module/hbase-1.3.1/logs/
[root@hadoop001 logs]# vim hbase-root-master-hadoop001.log 

From the log, it is easy to find that the error is caused by the active/standby switching of namenode.

Solution:

1. Modify the hbase-site.xml configuration file

Modify the configuration of hbase.rootdir:

<property>
     <name>hbase.rootdir</name>
     <value>hdfs://hadoop001:9000/hbase</value>
</property>

<!-- change to -->
<property>
     <name>hbase.rootdir</name>
     <value>hdfs://ns/hbase</value>
</property>

<!-- Note: the "ns" here is the value of Hadoop's dfs.nameservices (configured in hdfs-site.xml); fill in according to your own configuration -->

2. Create soft links

[root@hadoop001 ~]# ln -s /opt/module/hadoop-2.7.6/etc/hadoop/hdfs-site.xml /opt/module/hbase-1.3.1/conf/hdfs-site.xml
[root@hadoop001 ~]# ln -s /opt/module/hadoop-2.7.6/etc/hadoop/core-site.xml /opt/module/hbase-1.3.1/conf/core-site.xml 

3. Synchronize the HBase configuration files across the cluster

Use scp to distribute them to the other nodes, for example:
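
A sketch of that distribution step (hadoop002 and hadoop003 are placeholder hostnames; adjust the path to your install):

scp /opt/module/hbase-1.3.1/conf/hbase-site.xml hadoop002:/opt/module/hbase-1.3.1/conf/
scp /opt/module/hbase-1.3.1/conf/hbase-site.xml hadoop003:/opt/module/hbase-1.3.1/conf/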

Then restart the cluster to resolve the HMaster hang-up problem.

DBeaver connects to Hive: solving the problem that custom Hive UDFs cannot be used in SQL queries in DBeaver

1. The problem

Today I connected to Hive with DBeaver and tested several SQL statements that had been executed on the Hive client yesterday. The SQL uses custom UDF, UDTF, and UDAF functions, but when the execute button is pressed in DBeaver, an error is reported saying the function is invalid. Yet it has already been registered as a permanent function in Hive and has been run there. How can it be invalid in DBeaver?

2. The solution

1. Put the create permanent function statement that was executed at the Hive command line into DBeaver and execute it again.

(1) The statement to create a permanent function is as follows:

create function testudf as 'test.CustomUDF' using jar 'hdfs://cls:8020/user/hive/warehouse/testudf/TESTUDF.jar';
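
The same statement can also be replayed through a HiveServer2 session from the shell instead of DBeaver; a minimal sketch (the connection URL and the user name are assumptions):

# re-run the registration against HiveServer2, which is what DBeaver connects through
beeline -u jdbc:hive2://localhost:10000 -n hive -e "create function testudf as 'test.CustomUDF' using jar 'hdfs://cls:8020/user/hive/warehouse/testudf/TESTUDF.jar';"
# confirm the function is now visible in a HiveServer2 session
beeline -u jdbc:hive2://localhost:10000 -n hive -e "show functions;" | grep -i testudf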

3. Cause (not carefully verified)

1. My Hive client registered the function over a hive CLI connection, while DBeaver connects to Hive through the HiveServer2 service, i.e. a Beeline-style connection. It is said that functions registered from the hive client cannot be used through HiveServer2.
2. In practice, when I executed the statement to register the permanent function in DBeaver, the result said the function already exists, and when I ran the SQL again it worked. So I suspect the function information simply got refreshed, because the function was reported as invalid at the start of execution, which shows the SQL was indeed being executed.

The transaction log for database ‘xxxx’ is full due to AVAILABILITY_REPLICA error message in SQL Ser…

Reason:

The log has reached its maximum size on the primary replica, or the disk is full.

Analysis

A log block on the primary replica can only be reused after it has been hardened and redone on the other replicas.

So if:

1. there is a send (transmission) delay, due to network latency or limited bandwidth, or

2. redo on a replica is slow due to delay, blocking, or insufficient resources,

then the log keeps growing and cannot be truncated.

log_send_queue_size: the amount of log that has not yet been received by the replica. A large value indicates a send delay.

redo_queue_size: the amount of log on the replica that has not yet been redone. A large value indicates a redo delay.

SELECT ag.name AS [availability_group_name]
, d.name AS [database_name]
, ar.replica_server_name AS [replica_instance_name]
, drs.truncation_lsn , drs.log_send_queue_size
, drs.redo_queue_size
FROM sys.availability_groups ag
INNER JOIN sys.availability_replicas ar
    ON ar.group_id = ag.group_id
INNER JOIN sys.dm_hadr_database_replica_states drs
    ON drs.replica_id = ar.replica_id
INNER JOIN sys.databases d
    ON d.database_id = drs.database_id
WHERE drs.is_local=0
ORDER BY ag.name ASC, d.name ASC, drs.truncation_lsn ASC, ar.replica_server_name ASC

Solutions:

1. Remove the DB from the most delayed replica and join it later.

2. If the redo thread on the replica is blocked by frequent read operations, set the replica as unreadable and change it back later.

3. If there is still space on the disk, the log file will grow automatically.

4. If the maximum space limit is reached and the disk still has space, increase the maximum space limit.

5. If the log file has reached the 2 TB system maximum and there are spare disks, add another log file.

Reference

https://docs.microsoft.com/en-US/troubleshoot/sql/availability-groups/error-9002-transaction-log-large