Hi~ It's been a while since the last update.
1. A problem to watch out for after restarting Kafka:

While the pipeline runs, there is a write file `a` in the target storage location. File `a` stays in the write state for a while (usually one hour) before a new write file `b` is generated and the previous file `a` is closed (how long that close takes depends on each cluster's configuration). Here is the problem: if you restart during this window, a new write file `b` is created, but the previous file `a` is never closed and stays in the write state. Any read or write against file `a` then reports an error, and Hive is affected too (loading the file into a Hive table succeeds, but a select against it errors out), because a file that is stuck in the write state cannot be operated on. This is the so-called write lock (which I'm sure everyone has heard of).
Solution: we need to manually close the files stuck in the write state. First, find out which files are open for write by running this on the command line:

hdfs fsck /data/logs/ -openforwrite

(Replace /data/logs/ with the directory your write files live in.) Every file listed in the output is in the write state.
Once you can see the open files, run the command below to close every one of them. Why close all of them, when logically only the previous write file needs closing? Because closing them all also solves the problem and is simpler, if a bit brute-force: the file currently being written is automatically recreated after you close it, so it is safe to close everything. Now run:
hdfs debug recoverLease -path /logs/common_log/2022-09-16/FlumeData.1663292498820.tmp -retries 3

(Use the write-file paths printed by the previous command.)
Run it once for each file and you're done. One more note: if a stuck file has already been loaded into Hive, you also need to go look for it under /user/warehouse/hive/.
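The two commands above can be chained into one loop. This is only a sketch under an assumption: that each matching line of `hdfs fsck -openforwrite` output starts with the file path followed by a space (check your Hadoop version's output format before relying on it), and `/data/logs/` is just an example directory.

```shell
# Parse fsck output on stdin and print one open-for-write path per line.
# Assumes the path is the first whitespace-separated field of matching lines.
list_open_files() {
  grep OPENFORWRITE | awk '{print $1}'
}

# Run against the cluster: close the lease on every open file under /data/logs/.
recover_open_files() {
  hdfs fsck /data/logs/ -openforwrite 2>/dev/null | list_open_files |
  while read -r f; do
    hdfs debug recoverLease -path "$f" -retries 3
  done
}
```

Call `recover_open_files` on a node that has the hdfs client configured.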
2. CDH's Cloudera Manager returns a 500 error when accessed from the browser:
① First check the /etc/hosts configuration. Apart from the cluster's intranet IP mappings, only these two lines need to remain:
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
② Also check whether the ports CM needs are being blocked by the firewall.
③ Then restart CM by executing:
On the nameNode: systemctl stop cloudera-scm-server
Then on each node: systemctl stop cloudera-scm-agent
On the nameNode: systemctl start cloudera-scm-server
Then on each node: systemctl start cloudera-scm-agent
Attention!!! These commands must be executed in this exact order; otherwise the cluster may fail to start properly.
Afterwards you can check that everything is running with systemctl status cloudera-scm-server and systemctl status cloudera-scm-agent.
④ If CM starts and can be accessed, but starting HDFS fails with error 1 or 2 below:
1. Unable to retrieve non-local non-loopback IP address. Seeing address: cm/127.0.0.1
2. ERROR ScmActive-0:com.cloudera.server.cmf.components.ScmActive: ScmActive was not able to access CM identity to validate it. 2017-04-18 09:40:29,308 ERROR ScmActive-0
If so, congratulations: there is a fix.
First, find the database backing CM. It was configured at install time; if you don't know where it is, ask whoever installed it (it is almost always on the nameNode; don't ask me for the account and password~). Then run show databases; and you will see a cm or scm database. Switch to that database with use, then run show tables;
You will see a table called HOSTS. Look at its data: select * from HOSTS;
You will find one row that differs from the others: its NAME and IP_ADDRESS don't match the host. Modify them back to the host's intranet name and IP_ADDRESS (I trust everyone can do the modification)! Then restart CM and it's done!
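As a sketch, the inspection and fix might look like this in the MySQL client. The database name `scm`, the `HOST_ID` value, and the hostname/IP below are placeholder assumptions for illustration; check your own mismatched row first.

```sql
-- Inspect the HOSTS table and fix the row whose NAME/IP_ADDRESS is wrong.
-- 'node01' and '192.168.1.101' are placeholder values.
USE scm;
SELECT HOST_ID, NAME, IP_ADDRESS FROM HOSTS;
UPDATE HOSTS
   SET NAME = 'node01', IP_ADDRESS = '192.168.1.101'
 WHERE HOST_ID = 1;
```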
When starting spark-shell with --master yarn, the startup fails with an error:
YarnClientSchedulerBackend: Yarn application has already exited with state FAILED
At this point, visit the YARN web UI (by default at http://<hostname>:8088) and look at the history for the failed application; the exception is: ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: RECEIVED SIGNAL 15: SIGTERM
This problem often occurs when the JDK version is 1.8. You can fix it by modifying the yarn-site.xml configuration in Hadoop, distributing it to every node in the cluster, and restarting the cluster.
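The post doesn't say which yarn-site.xml settings to change; a commonly used fix for containers killed under JDK 1.8 (an assumption on my part, not confirmed by the original) is to disable the NodeManager's virtual-memory check:

```xml
<!-- Assumed yarn-site.xml change: disable the virtual-memory check that
     frequently kills containers when running with JDK 1.8. -->
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
```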
A problem I hit today running MapReduce locally:
- Missing winutils.exe: Could not locate executable null\bin\winutils.exe in the hadoop binaries
- Missing hadoop.dll: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Solution: download these two files and put both into the bin directory under the Hadoop directory.
Today I got this error while uploading files to Hadoop:
2022-03-17 17:17:11,994 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073741946_1137
java.io.IOException: Got error, status=ERROR, status message , ack with firstBadLink as 220.127.116.11:9866
2022-03-17 17:17:11,998 WARN hdfs.DataStreamer: Abandoning BP-1890970308-172.25.12.163-1646541195774:blk_1073741946_1137
2022-03-17 17:17:12,007 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[18.104.22.168:9866,DS-87287b18-21ac-4314-884e-d78b139945b8,DISK]
The result is that slave1 holds no replica.
Cause and solution: the firewall on slave1 was never turned off. Simply turning off the firewall fixes it.
ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation
Find the start-dfs.sh and stop-dfs.sh files under the /hadoop/sbin path and add to the top of both:
Likewise, add to the top of the start-yarn.sh and stop-yarn.sh files:
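The lines themselves didn't survive in the post; the standard Hadoop 3.x definitions for this error (my assumption of what the author meant, when running the daemons as root) are:

```shell
# Assumed standard fix for "Attempting to operate on hdfs namenode as root".
# Top of start-dfs.sh and stop-dfs.sh:
HDFS_DATANODE_USER=root
HDFS_DATANODE_SECURE_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

# Top of start-yarn.sh and stop-yarn.sh:
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root
```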
[root@cpucode100 bin]# beeline -u jdbc:hive2://cpucode100:10000 -n root
Connecting to jdbc:hive2://cpucode100:10000
21/12/15 21:41:51 [main]: WARN jdbc.HiveConnection: Failed to connect to cpucode100:10000
Error: Could not open client transport with JDBC Uri: jdbc:hive2://cpucode100:10000: Failed to open new session: java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.authorize.AuthorizationException): User: root is not allowed to impersonate root (state=08S01,code=0)
Beeline version 3.1.2 by Apache Hive
Connecting to jdbc:hive2://cpucode100:10000
21/12/16 21:15:53 [main]: WARN jdbc.HiveConnection: Failed to connect to cpucode100:10000
Could not open connection to the HS2 server. Please check the server URI and if the URI is correct, then ask the administrator to check the server status.
Error: Could not open client transport with JDBC Uri: jdbc:hive2://cpucode100:10000: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
Beeline version 3.1.2 by Apache Hive
The required configuration in core-site.xml is as follows; after adding it, restart Hadoop and start beeline again. Replace the root below with your own username.
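The core-site.xml snippet itself is missing from the post; the standard fix for "User: root is not allowed to impersonate root" (my assumption of what was shown) is the proxyuser configuration:

```xml
<!-- Assumed core-site.xml addition; swap "root" for your own username. -->
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
```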
hive --service hiveserver2
Then wait here; it takes quite a while, around 10 minutes.
beeline -u jdbc:hive2://cpucode100:10000 -n root
Windows: fixes for errors when starting Hadoop from the cmd window (continuously updated)
Starting the process from cmd reports an error:
Failed to setup local dir /tmp/hadoop-GK/nm-local-dir, which was marked as good. org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Permissions incorrectly set for dir /tmp/hadoop-GK/nm-local-dir/nmPrivate, should be rwx------, actual value = rwxrwx-
Solution: run cmd as administrator.
cmd startup reports an IO error:
IOException: Incompatible clusterIDs in D:\hadoop\3.0.3\data\dfs\datanode: namenode clusterID = CID-45d4d17f-96fd-4644-b0ee-7835ef5bc790; datanode clusterID = CID-01f27c2a-6229-4a10-b098-89e89d4c62e4
Solution: Delete the data directory in hadoop and restart
Error: Could not open client transport with JDBC Uri: jdbc:hive2://node01:10000: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)
When connecting with beeline via the command below, the client connection fails:
[root@node01 ~]# beeline -u jdbc:hive2://node01:10000 -n root
Checking port 10000 shows it has not been started:
[root@node01 ~]# netstat -anp | grep 10000
hiveserver2 takes time to start, so you need to wait a while. It isn't up until it has printed four Hive session IDs (in my case it started successfully right after the fourth appeared).
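Rather than eyeballing the session IDs, you can poll the port. A minimal sketch, assuming the default HiveServer2 port 10000 and the example host node01 from above:

```shell
# Return success once something is listening on HiveServer2's port (10000).
hs2_listening() {
  netstat -an | grep -q ':10000 .*LISTEN'
}

# Poll until HiveServer2 is up, then connect with beeline.
wait_for_hs2() {
  until hs2_listening; do
    echo "hiveserver2 not up yet, retrying in 5s..."
    sleep 5
  done
  beeline -u jdbc:hive2://node01:10000 -n root
}
```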
Then I realized: no wonder the teacher mentioned having to wait a while before connecting with beeline. That's what a successful start looks like, so when you hit an error, don't panic; deal with it calmly.