Tag Archives: hdfs

[Solved] Kafka Restart Error | Cloudera Manager Access Returns 500 | HDFS Startup Error

Hi~ Long time no update

1. What to watch out for after restarting Kafka:
While Kafka is writing to HDFS, there is always a current write file a in the target storage location. File a stays in the write state for a while (usually an hour; the roll interval depends on each cluster's configuration), after which a new write file b is generated and the previous file a is closed. Here is the problem: if you restart during that window, the file a that was last being written never gets closed, so it stays in the write state indefinitely. Reading or writing file a then reports an error, and so does querying it from Hive (loading it into a Hive table succeeds, but selecting from the table fails), because the file still holds its write lease, commonly called a write lock (I believe everyone has heard of it), and cannot be operated on.
Solution: we need to manually terminate the write state of the file. First determine which files are open for write by executing this on the command line:
hdfs fsck /data/logs/ -openforwrite    (replace /data/logs/ with the directory your files are in)
The files it displays are all in the write state:


After identifying the open files, execute the command below to stop every one of them. Why stop all of them? Logically only the file that was being written before the restart needs to be closed, but stopping all of them also solves the problem and is simpler and more brute-force: manually closing a file that is still being written just makes a new write file be generated automatically, so it is safe to stop them all. Now execute:
hdfs debug recoverLease -path /logs/common_log/2022-09-16/FlumeData.1663292498820.tmp -retries 3    (use the write-file paths displayed by the previous command)
Run it once for each file and the problem is solved. One more note: if the file has already been loaded into Hive, you also need to go to /user/warehouse/hive/ and find the write-state file there.
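If many files are open, you can recover them all in one loop. A minimal sketch, assuming each fsck output line flagged OPENFORWRITE begins with the file path (check your output format first):

# Close every file reported as open-for-write under /data/logs/
hdfs fsck /data/logs/ -openforwrite 2>/dev/null | grep 'OPENFORWRITE' | awk '{print $1}' | while read -r f; do
    hdfs debug recoverLease -path "$f" -retries 3
done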


2. Cloudera Manager on CDH returns a 500 error when accessed from the browser:
① First check the /etc/hosts file. Apart from the cluster's intranet IP mappings, only these two lines should remain:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

② Also check whether the CM-related ports are being blocked by the firewall.

③ Then restart CM. Execute the commands:
On the nameNode: systemctl stop cloudera-scm-server
Then on each node: systemctl stop cloudera-scm-agent

On the nameNode: systemctl start cloudera-scm-server
Then on each node: systemctl start cloudera-scm-agent
Attention!!! The order of these commands must not be reversed, otherwise the cluster may fail to start.
Afterwards you can check how things are running with systemctl status cloudera-scm-server and systemctl status cloudera-scm-agent.
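For reference, here is the whole sequence as one sketch (the hostnames node01..node03 are hypothetical; assumes passwordless ssh from the nameNode):

# Stop: server first, then every agent
systemctl stop cloudera-scm-server
for h in node01 node02 node03; do ssh "$h" systemctl stop cloudera-scm-agent; done
# Start: server first, then every agent
systemctl start cloudera-scm-server
for h in node01 node02 node03; do ssh "$h" systemctl start cloudera-scm-agent; done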

④ If CM starts and can be accessed, but starting HDFS reports error 1 or 2:
1.Unable to retrieve non-local non-loopback IP address. Seeing address: cm/127.0.0.1
2.ERROR ScmActive-0:com.cloudera.server.cmf.components.ScmActive: ScmActive was not able to access CM identity to validate it. 2017-04-18 09:40:29,308 ERROR ScmActive-0

Then congratulations, you have found the solution.
First find CM's backing database. It was configured during installation; if you don't know it, ask whoever installed it. It is almost always on the nameNode (don't ask me for the account and password ~). Then run show databases; and you will see a cm or scm database.


use that database, then run show tables;
You will see a table called HOSTS. View its data: select * from HOSTS;


You will find one row that is different, that is, its NAME and IP_ADDRESS do not match the actual host. Modify them back to the intranet hostname and IP_ADDRESS; I believe everyone can manage the modification! Then restart CM and it's done!
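As a minimal sketch of the fix (assuming a MySQL backend and a database named scm; the HOST_ID and the corrected values below are hypothetical, take the real ones from your own select output):

mysql -u scm -p scm -e "UPDATE HOSTS SET NAME='node01.internal', IP_ADDRESS='192.168.2.101' WHERE HOST_ID=2;"

Then restart cloudera-scm-server as in step ③.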

[Solved] hadoop-2.7.1 Error: Cannot find configuration directory: /etc/hadoop

After setting up hadoop-2.7.1, the following turned up during startup.

The terminal executes ./start-yarn.sh
starting yarn daemons
Error: Cannot find configuration directory: /etc/hadoop
Error: Cannot find configuration directory: /etc/hadoop

The cause is that the configuration directory cannot be found. You can work out the solution by reading the corresponding shell script~

Solution:

Configure the directory that holds the Hadoop configuration files in hadoop-env.sh (as below):

export HADOOP_CONF_DIR=/opt/hadoop-2.7.1/etc/hadoop/

Execute command

source hadoop-env.sh
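A quick check that the variable took effect (assuming the same install path as above):

echo $HADOOP_CONF_DIR    # should print /opt/hadoop-2.7.1/etc/hadoop/
./start-yarn.sh          # should no longer complain about /etc/hadoop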

[Solved] YarnClientSchedulerBackend: Yarn application has already exited with state FAILED

When starting spark-shell --master yarn, spark-shell fails with an error:

YarnClientSchedulerBackend: Yarn application has already exited with state FAILED

At this point, visit the YARN web UI (usually at http://<hostname>:8088 by default) and look at the history for the exception raised at startup time: ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: RECEIVED SIGNAL 15: SIGTERM (as shown).

Solution:

This problem often occurs when the JDK version is 1.8. Modify yarn-site.xml in the Hadoop configuration as below, distribute it to every node in the cluster, and restart the cluster.
 <property>
    <!-- Raise the virtual-to-physical memory ratio for containers (the default is 2.1) -->
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>10</value>
</property>
<property>
    <!-- Disable the NodeManager's virtual memory check altogether -->
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>
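A sketch of the distribute-and-restart step (the hostnames are hypothetical; assumes HADOOP_HOME is set the same way on every node and that the YARN scripts are run on the ResourceManager node):

# push the changed config to the other nodes
for h in node02 node03; do scp $HADOOP_HOME/etc/hadoop/yarn-site.xml $h:$HADOOP_HOME/etc/hadoop/; done
# restart YARN
stop-yarn.sh
start-yarn.sh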

How to Solve Hadoop Missing Hadoop.dll and winutils.exe file error

The problem I ran into today when running MapReduce locally:

Could not locate executable null\bin\winutils.exe in the hadoop binaries
Unable to load native-hadoop library for your platform… using builtin-java classes where applicable

 

Reason:

  1. Missing winutils.exe file: Could not locate executable null\bin\winutils.exe in the hadoop binaries
  2. Missing hadoop.dll file: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable

 

Solution: download these two files and put them into the bin directory under the Hadoop installation directory.

[Solved] java.io.IOException: Got error, status=ERROR, status message, ack with firstBadLink as

Today I got this error when uploading files to Hadoop:

2022-03-17 17:17:11,994 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073741946_1137
java.io.IOException: Got error, status=ERROR, status message , ack with firstBadLink as 120.78.239.136:9866
	at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:134)
	at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:110)
	at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1778)
	at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1679)
	at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:716)
2022-03-17 17:17:11,998 WARN hdfs.DataStreamer: Abandoning BP-1890970308-172.25.12.163-1646541195774:blk_1073741946_1137
2022-03-17 17:17:12,007 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[120.78.239.136:9866,DS-87287b18-21ac-4314-884e-d78b139945b8,DISK]

The upshot is that slave1 holds no replica of the block.

Reason and Solution:

The firewall on slave1 was never turned off, so simply turning off the firewall fixes it.
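For example, on CentOS 7 with firewalld (adjust for your distribution), run on slave1:

systemctl stop firewalld       # stop the firewall immediately
systemctl disable firewalld    # keep it off across reboots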

 

[Solved] ERROR: Attempting to operate on hdfs namenode as root ERROR: but there is no HDFS_NAMENODE_USER defined

Question:

ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation

Solution:

Find the start-dfs.sh and stop-dfs.sh files in the /hadoop/sbin path and add the following to the top of both:

#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

By the way, at the top of the start-yarn.sh and stop-yarn.sh files, add:

#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
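Alternatively, a sketch that avoids patching the scripts (assuming Hadoop 3.x, whose start scripts read these variables from the environment): put the same exports into etc/hadoop/hadoop-env.sh once instead.

# in $HADOOP_HOME/etc/hadoop/hadoop-env.sh
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root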

[Solved] Error: Could not open client transport with JDBC Uri

Error Messages:

[root@cpucode100 bin]# beeline -u jdbc:hive2://cpucode100:10000 -n root
Connecting to jdbc:hive2://cpucode100:10000
21/12/15 21:41:51 [main]: WARN jdbc.HiveConnection: Failed to connect to cpucode100:10000
Error: Could not open client transport with JDBC Uri: jdbc:hive2://cpucode100:10000: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: root is not allowed to impersonate root (state=08S01,code=0)
Beeline version 3.1.2 by Apache Hive

Connecting to jdbc:hive2://cpucode100:10000
21/12/16 21:15:53 [main]: WARN jdbc.HiveConnection: Failed to connect to cpucode100:10000
Could not open connection to the HS2 server. Please check the server URI and if the URI is correct, then ask the administrator to check the server status.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://cpucode100:10000: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
Beeline version 3.1.2 by Apache Hive

Solution:
Add the configuration below to the Hadoop file core-site.xml, restart Hadoop, then start hiveserver2 and beeline again.
Replace the root below with your own username.

	<property>
		<name>hadoop.proxyuser.root.hosts</name>
		<value>*</value>
	</property>
	<property>
		<name>hadoop.proxyuser.root.groups</name>
		<value>*</value>
	</property>

Distribute it:

xsync /etc

Restart:

myhadoop.sh stop

myhadoop.sh start

hive --service hiveserver2

Wait here; it takes quite a long time, about 10 minutes.

beeline -u jdbc:hive2://cpucode100:10000 -n root

Windows Black Window (cmd) Start Hadoop Error [How to Solve]

Solutions for Hadoop startup errors in the Windows black window (cmd), continuously updated: a cmd process startup error and a cmd startup IO error.

cmd process startup error:

Failed to setup local dir /tmp/hadoop-GK/nm-local-dir, which was marked as good. org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Permissions incorrectly set for dir /tmp/hadoop-GK/nm-local-dir/nmPrivate, should be rwx------, actual value = rwxrwx-

Solution: Run cmd as administrator

cmd startup IO error:

IOException: Incompatible clusterIDs in D:\hadoop\3.0.3\data\dfs\datanode: namenode clusterID = CID-45d4d17f-96fd-4644-b0ee- 7835ef5bc790; datanode clusterID = CID-01f27c2a-6229-4a10-b098-89e89d4c62e4

Solution: delete the data directory under the Hadoop directory and restart (the namenode and datanode clusterIDs no longer match; see the datanode post below for the underlying cause).

[Solved] Hive execute insert overwrite error: could not be cleared up

Problem description

1. User Zhangsan executes an insert overwrite:
INSERT OVERWRITE table temp.push_temp PARTITION(d_layer='app_video_uid_d_1')
SELECT ...

and it fails with "could not be cleaned up":

Failed with exception Directory hdfs://Ucluster/user/hive/warehouse/temp.db/push_temp/d_layer=app_video_uid_d_1 could not be cleaned up.
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask. Directory hdfs://Ucluster/user/hive/warehouse/temp.db/push_temp/d_layer=app_video_uid_d_1 could not be cleaned up.

2. Checking the HDFS directory permissions shows the directory is writable by everyone, and its owner is Lisi:
drwxrwxrwt   - lisi supergroup          0 2021-11-29 15:04 /user/hive/warehouse/temp.db/push_temp/d_layer=app_video_uid_d_1

3. When user Lisi executes the SQL from step 1, it succeeds.

Cause of problem

Three words: sticky bit.
Look carefully at the directory permissions above: the last bit is "t", which means the sticky bit is enabled on the directory, i.e., files under it can only be deleted by their owner (or by the directory's owner).

# A non-owner tries to delete a file in the sticky-bit directory
$ hadoop fs -rm /user/hive/warehouse/temp.db/push_temp/d_layer=app_video_uid_d_1/000000_0
21/11/29 16:32:59 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 7320 minutes, Emptier interval = 0 minutes.
rm: Failed to move to trash: hdfs://Ucluster/user/hive/warehouse/temp.db/push_temp/d_layer=app_video_uid_d_1/000000_0: Permission denied by sticky bit setting: user=admin, inode=000000_0

insert overwrite needs to delete the existing files in the directory, but they cannot be deleted because of the sticky bit, so the HQL fails.

Solution

Cancel the sticky bit of the directory

# Remove the sticky bit
hadoop fs -chmod -R o-t /user/hive/warehouse/temp.db/push_temp/d_layer=app_video_uid_d_1

# Re-enable the sticky bit (afterwards, if desired)
hadoop fs -chmod -R o+t /user/hive/warehouse/temp.db/push_temp/d_layer=app_video_uid_d_1
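To confirm the change (same path as above):

hadoop fs -ls -d /user/hive/warehouse/temp.db/push_temp/d_layer=app_video_uid_d_1
# the mode should now end in 'x' (drwxrwxrwx) rather than 't'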

Datanode startup failed with an error: incompatible clusterids


Environment

Hadoop version: 3.3.1; OS: CentOS 7.4; Java version: Java SE 1.8.0_301

Error report summary

java.io.IOException: Incompatible clusterIDs in /opt/module/hadoop-3.3.1/data/dfs/data: namenode clusterID = CID-aa23cfe4-9ad3-4c06-87fc-e862c8f3a722; datanode clusterID = CID-55fa9a51-7777-4ff4-87d6-4df7cf2cb8b9

Problem description

An error is reported when the datanode starts. The error in /opt/module/hadoop-3.3.1/logs/hadoop-bordy-datanode-hadoop102.log reads as follows:

2021-11-29 21:58:51,350 INFO org.apache.hadoop.hdfs.server.common.Storage: Using 1 threads to upgrade data directories (dfs.datanode.parallel.volumes.load.threads.num=1, dataDirs=1)
2021-11-29 21:58:51,354 INFO org.apache.hadoop.hdfs.server.common.Storage: Lock on /opt/module/hadoop-3.3.1/data/dfs/data/in_use.lock acquired by nodename 13694@hadoop102
2021-11-29 21:58:51,356 WARN org.apache.hadoop.hdfs.server.common.Storage: Failed to add storage directory [DISK]file:/opt/module/hadoop-3.3.1/data/dfs/data
java.io.IOException: Incompatible clusterIDs in /opt/module/hadoop-3.3.1/data/dfs/data: namenode clusterID = CID-aa23cfe4-9ad3-4c06-87fc-e862c8f3a722; datanode clusterID = CID-55fa9a51-7777-4ff4-87d6-4df7cf2cb8b9
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.doTransition(DataStorage.java:746)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadStorageDirectory(DataStorage.java:296)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.loadDataStorage(DataStorage.java:409)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.addStorageLocations(DataStorage.java:389)
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:561)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1753)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1689)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:394)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:295)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:854)
        at java.lang.Thread.run(Thread.java:748)
2021-11-29 21:58:51,358 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid a4eeff59-0192-4402-8278-4743158fa405) service to hadoop101/192.168.2.101:8020. Exiting.
java.io.IOException: All specified directories have failed to load.
        at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:562)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:1753)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:1689)
        at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:394)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:295)
        at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:854)
        at java.lang.Thread.run(Thread.java:748)
2021-11-29 21:58:51,359 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Ending block pool service for: Block pool <registering> (Datanode Uuid a4eeff59-0192-4402-8278-4743158fa405) service to hadoop101/192.168.2.101:8020
2021-11-29 21:58:51,363 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Removed Block pool <registering> (Datanode Uuid a4eeff59-0192-4402-8278-4743158fa405)
2021-11-29 21:58:53,364 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2021-11-29 21:58:53,424 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down DataNode at hadoop102/192.168.2.102
************************************************************/

Cause of problem

Hadoop's upgrade functionality requires each datanode to store a permanent clusterID in its VERSION file. When a datanode starts, it checks its clusterID against the one in the namenode's VERSION file; if the two do not match, the "Incompatible clusterIDs" exception is raised. See the official JIRA [HDFS-107].

Analysis steps

1. View the clusterID in the VERSION file under the datanode directory /opt/module/hadoop-3.3.1/data/dfs/data/current.
2. View the clusterID in the VERSION file under the namenode directory /opt/module/hadoop-3.3.1/data/dfs/name/current.
3. The clusterIDs in the two files do not match. For background: in the HDFS architecture, every datanode needs to communicate with the namenode, and the clusterID is the unique identifier of the cluster.

Solution

Change the clusterID value of the failed datanode to the namenode's clusterID, as sketched below.
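A minimal sketch of the fix, reusing the namenode clusterID from the error message above (the VERSION files are plain text; run on the failed datanode):

# read the namenode's clusterID (on the namenode host)
grep clusterID /opt/module/hadoop-3.3.1/data/dfs/name/current/VERSION
# write that value into the datanode's VERSION file
sed -i 's/^clusterID=.*/clusterID=CID-aa23cfe4-9ad3-4c06-87fc-e862c8f3a722/' /opt/module/hadoop-3.3.1/data/dfs/data/current/VERSION
# restart the datanode
hdfs --daemon start datanode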

Reference

Hadoop failed to start datanode: there is a problem with the clusterID - Wang Shen - cnblogs.com

[Solved] Beeline Error: Error: Could not open client transport with JDBC Failed to Connection

 

Error: Could not open client transport with JDBC Uri: jdbc:hive2://node01:10000: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)


Solution

When connecting with the beeline command, the client connection fails:

[root@node03 ~]# beeline -u jdbc:hive2://node01:10000 -n root

Check port 10000 and find that nothing is listening on it:

[root@node01 ~]# netstat -anp | grep 10000

hiveserver2 takes time to start; you need to wait a while. It is not up until it has printed four Hive session IDs (mine started successfully right after the fourth).
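Rather than watching the log, you can poll the port until it opens; a small sketch:

# check every 5 seconds until something listens on 10000
while ! netstat -tln | grep -q ':10000 '; do sleep 5; done
echo "hiveserver2 is up"; beeline -u jdbc:hive2://node01:10000 -n root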

Then I realized why the teacher had mentioned waiting a while before connecting with beeline.

Mine did start successfully in the end, so don't panic when you see this error; deal with it calmly.