Tag Archives: Hadoop

[Solved] Hive Error while processing statement: FAILED: Execution Error

SQL Error [1] [08S01]: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:400)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:256)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:194)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1855)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1839)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1798)
    at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:61)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3101)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1123)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:696)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)

 

Solution: switch to the hdfs superuser with su hdfs. If the switch fails because the hdfs account has no login shell, edit /etc/passwd with vi and change the shell at the end of the hdfs line (typically from /sbin/nologin to /bin/bash).

Then, as the hdfs user, execute hadoop fs -chmod 777 /user to open up the /user directory.
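
The same steps as commands, as a minimal sketch; the sudo form is an alternative (not in the original post) that avoids changing the login shell:

su - hdfs
hadoop fs -chmod 777 /user

# alternative: run as root without switching shells
sudo -u hdfs hadoop fs -chmod 777 /user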

Hive install initialization error: Error: Duplicate key name 'PCS_STATS_IDX' (state=42000,code=1061)

Error log

[root@mihaoyu151 conf]# schematool -dbType mysql -initSchema
which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/bin:/opt/soft/jdk180/bin:/opt/soft/zookeeper345/bin:/opt/soft/hadoop260/sbin:/opt/soft/hadoop260/bin:/opt/soft/jdk180/bin:/opt/soft/hadoop260/sbin:/opt/soft/hadoop260/bin:/opt/soft/zookeeper345/bin:/opt/soft/hive110/bin)
21/11/09 14:25:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/11/09 14:25:20 WARN conf.HiveConf: HiveConf of name hive.server2.thrift.client.user does not exist
21/11/09 14:25:20 WARN conf.HiveConf: HiveConf of name hive.server2.thrift.client.password does not exist
Metastore connection URL:	 jdbc:mysql://192.168.133.151:3306/hive151?createDatabaseIfNotExist=true
Metastore Connection Driver :	 com.mysql.jdbc.Driver
Metastore connection User:	 root
Starting metastore schema initialization to 1.1.0-cdh5.14.2
Initialization script hive-schema-1.1.0.mysql.sql
Error: Duplicate key name 'PCS_STATS_IDX' (state=42000,code=1061)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
*** schemaTool failed ***

Cause of the error

The Hive metastore database (hive151) already exists in MySQL, so the initialization script fails when it tries to create objects that are already there.

Solution:

Drop the existing Hive database in MySQL, then rerun schematool.

[root@mihaoyu151 soft]# mysql -uroot -proot
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 135
Server version: 5.7.36 MySQL Community Server (GPL)

Copyright (c) 2000, 2021, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hive151            |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
5 rows in set (0.00 sec)

mysql> drop database hive151;
Query OK, 54 rows affected (0.13 sec)

mysql> exit;
Bye

[root@mihaoyu151 soft]# schematool -dbType mysql -initSchema
which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/bin:/opt/soft/jdk180/bin:/opt/soft/zookeeper345/bin:/opt/soft/hadoop260/sbin:/opt/soft/hadoop260/bin:/opt/soft/jdk180/bin:/opt/soft/hadoop260/sbin:/opt/soft/hadoop260/bin:/opt/soft/zookeeper345/bin:/opt/soft/hive110/bin)
21/11/09 14:30:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/11/09 14:30:24 WARN conf.HiveConf: HiveConf of name hive.server2.thrift.client.user does not exist
21/11/09 14:30:24 WARN conf.HiveConf: HiveConf of name hive.server2.thrift.client.password does not exist
Metastore connection URL:	 jdbc:mysql://192.168.133.151:3306/hive151?createDatabaseIfNotExist=true
Metastore Connection Driver :	 com.mysql.jdbc.Driver
Metastore connection User:	 root
Starting metastore schema initialization to 1.1.0-cdh5.14.2
Initialization script hive-schema-1.1.0.mysql.sql
Initialization script completed
schemaTool completed

[Solved] Hbase Error: ERROR: KeeperErrorCode = NoNode for /hbase/master

Cause: an abrupt shutdown (power failure, the machine going to sleep, etc.) left HMaster unable to connect, and no master node could be found in ZooKeeper.

Solution: delete the hbase node in ZooKeeper; when HBase starts again it recreates this node automatically.
1) Log in to the ZooKeeper client: zkCli.sh
2) Delete the hbase node: deleteall /hbase

The most critical step: restart HBase and ZooKeeper (see the command sketch below).
1) Stop ZooKeeper: my_zk.sh stop (this is my own script for managing ZooKeeper; do not copy it verbatim)
2) Stop HBase: if stop-hbase.sh has no effect, use jps on every machine in the cluster to find the PIDs of the HMaster and HRegionServer processes, then kill -9 each of them one by one, which amounts to shutting HBase down by hand
3) Start ZooKeeper and HBase again: my_zk.sh start, then start-hbase.sh
HDFS does not need to be touched; if this still does not work for you, HDFS can be restarted as well.
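
The whole sequence as commands, as a sketch only: my_zk.sh is specific to this post, so substitute zkServer.sh or your own ZooKeeper management script.

zkCli.sh                  # open the ZooKeeper client
deleteall /hbase          # inside zkCli: remove the stale /hbase znode
quit                      # leave zkCli
my_zk.sh stop             # stop ZooKeeper (e.g. zkServer.sh stop on each node)
jps                       # on every machine: find HMaster / HRegionServer PIDs
kill -9 <pid>             # kill them one by one if stop-hbase.sh did not work
my_zk.sh start            # start ZooKeeper again
start-hbase.sh            # start HBase; it recreates /hbase on startup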

[Solved] Sqoop Error: ERROR tool.ImportTool: Import failed: java.io.IOException

21/11/08 12:13:10 ERROR tool.ImportTool: Import failed: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
        at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:143)
        at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:108)
        at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:101)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1311)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
        at org.apache.hadoop.mapreduce.Job.connect(Job.java:1306)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1335)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1359)
        at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:200)
        at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:173)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:270)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
        at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:127)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:520)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:252)

The above error occurred when running a Sqoop script that imports MySQL data into HDFS; the cause was missing MapReduce dependencies. Copying only the two jars hadoop-mapreduce-client-common-2.8.5.jar and hadoop-mapreduce-client-core-2.8.5.jar into Sqoop's lib directory still produced the error, so I copied all of the jars from Hadoop's hadoop-2.8.5/share/hadoop/mapreduce directory into Sqoop's lib directory instead. Crude but effective: the script then ran successfully.
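
The copy step as a command, a sketch that assumes HADOOP_HOME and SQOOP_HOME point at the hadoop-2.8.5 and Sqoop installation directories used in this post:

cp $HADOOP_HOME/share/hadoop/mapreduce/*.jar $SQOOP_HOME/lib/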

[Solved] Hadoop failed on connection exception: java.net.ConnectException: Connection refused

First check whether the port is actually listening:

sudo netstat -ntlp

If the port you are trying to connect to does not appear in the list, check core-site.xml to see which port is actually configured. In my case 8020 was not configured at all, which is why connections to 8020 kept being refused.

<property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop131:9820</value>
</property>

Change the port in your connection to match the one configured in fs.defaultFS (9820 here), or change fs.defaultFS to the port you expect.
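
Two quick checks to confirm the configured address; hdfs getconf reads the value straight from the client configuration (this assumes the Hadoop binaries are on the PATH):

hdfs getconf -confKey fs.defaultFS      # e.g. hdfs://hadoop131:9820
sudo netstat -ntlp | grep 9820          # the NameNode should be listening on this port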

[Solved] Hive Error: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask

Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask

1. Cluster environment

CDH cluster; Hive's execution engine is MR (MapReduce).

2. Origin of error

Today I ran a Hive task on the test-environment cluster and it failed with: Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask.

3. Error reason

This error is because the map join parameter of hive is on by default:

hive.auto.convert.join=true

When using hive for map join, this type of error will be reported if the node memory is insufficient.

4. Error analysis

Map join means performing the join on the map side; the principle is a broadcast join, in which the small table is used as a complete driving table for the join. Normally the data of each table being joined is processed in different map tasks, so values for the same key may live in different maps and the actual join has to wait until the reduce phase. For a map join to work, one condition must hold: apart from the large table whose data is spread across different maps, every other table in the join must have a complete copy available to each map.

Hive therefore reads the small table entirely into memory and, during the map stage, matches the other table's rows directly against the in-memory table (the distributed cache can be used to ship the small table to every node for the mappers to load). Because the join is done in the map phase, the reduce phase is skipped and the job runs much faster.

When the machine does not have enough memory to hold the small table, the map-side join cannot proceed and the error above is reported.

5. Solution

1. Disable map join for the current session and fall back to a common join. On the Hive command line:

set hive.auto.convert.join=false;

2. Or disable map join permanently by changing the parameter in the configuration file (hive_conf.xml in this post):

<property>
<name>hive.auto.convert.join</name>
<value>false</value> <!-- change true to false -->
<description>Enables the optimization about converting common join into mapjoin</description>
</property>
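
For a one-off run, the session-level switch can also be passed straight to the Hive CLI. A sketch, where the table names a and b are hypothetical:

hive -e 'set hive.auto.convert.join=false; select a.id, b.name from a join b on a.id = b.id;'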

Invalid column reference when using round in hive



A Hive query that mixes aggregates with plain columns must include GROUP BY; otherwise a column in the select list is invalid because it is contained in neither an aggregate function nor the GROUP BY clause.

You might expect both count() and round() to behave like aggregate functions whose arguments do not need to appear in GROUP BY. Hive, however, does not treat round() as an aggregate function, so any column used inside round() must also be added to the GROUP BY clause; otherwise the error "Invalid column reference" is reported.
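
A hedged illustration with a hypothetical table t(dept, price): the column used inside round() has to appear in GROUP BY as well.

hive -e 'select dept, round(price, 2), count(*) from t group by dept;'          # fails: Invalid column reference
hive -e 'select dept, round(price, 2), count(*) from t group by dept, price;'   # works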

[Solved] MySQL for Hadoop: special symbols in the initial password cause a login error

Today I installed a MySQL database on the server. Because the auto-generated initial password contained a ')', every attempt to enter the password was rejected. I tried many of the fixes suggested online, and wrapping the password in quotation marks did not help either. Next, let's talk about my solution:

Step 1:

vi /etc/my.cnf

After opening the file, add the line:

skip-grant-tables

This makes MySQL skip the grant tables, so no password is required to log in.

Step 2:

Restart MySQL

systemctl restart mysqld

Step 3:

Log in directly (no password is required now):

mysql -uroot -p

Step 4:

-- switch to the mysql system database
use mysql;
-- reload the grant tables
flush privileges;

Step 5:

Change password

ALTER USER 'root'@'localhost' IDENTIFIED WITH mysql_native_password BY 'PASSWORD';

Then delete the skip-grant-tables line that was just added from /etc/my.cnf and restart MySQL.

Login

mysql -uroot -p[NEW PASSWORD]

[Solved] Hadoop Error: Exception in thread "main" java.io.IOException: Error opening job jar: /usr/local/hadoop-2.

An exception occurred while running a MapReduce task today.
At first I thought it was a JDK version mismatch: the Linux machine had JDK 1.8 while my Windows machine had JDK 11.0. I changed the JDK environment variable to 1.8, but the problem remained after rerunning.

Then I checked the size of the jar package and found it was 0 KB. The other jars were a normal size, so the jar itself had been corrupted during transfer.
I transferred it from Windows again, and after that the job succeeded.
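
Two quick sanity checks on a transferred jar; the file name wordcount.jar is only an example:

ls -lh wordcount.jar     # a 0-byte file means the transfer truncated it
jar tf wordcount.jar     # lists the archive's entries only if it is intact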

I hope this article is helpful to you~

SparkContext: Error initializing SparkContext workaround


Spark reported an error after a high-availability cluster was configured:
ERROR SparkContext: Error initializing SparkContext. java.net.ConnectException: Call From hadoop102/192.168.10.102 to hadoop102:8020 failed on connection exception: java.net.ConnectException: Connection refused

This happens because Spark was configured to store its event logs in HDFS, but Hadoop was not started after the Spark cluster came up, so submitting a task fails.

Solution:

    Option 1: stop storing the event log. In the spark-defaults.conf file under the Spark installation's conf directory, comment out the event-log settings (typically spark.eventLog.enabled and spark.eventLog.dir; the screenshot from the original post is not reproduced here). A sketch of the file follows this list.
    Option 2: store the event log locally instead of in HDFS. Replace the HDFS directory configured for the event log with a local Linux directory.
    Option 3: address the root cause. The Spark log was configured to go to HDFS but HDFS was not running, so simply start the Hadoop cluster (i.e. the HDFS service) before submitting jobs.
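
A sketch of the relevant spark-defaults.conf lines; the keys are Spark's standard event-log settings, while the host name and paths are assumptions based on this post:

# Option 1: comment both lines out to stop writing an event log
# spark.eventLog.enabled   true
# spark.eventLog.dir       hdfs://hadoop102:8020/spark-logs

# Option 2: keep the log but write it to a local directory
spark.eventLog.enabled     true
spark.eventLog.dir         file:///opt/spark/event-logs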

[Solved] HBase shell error: PleaseHoldException: Master is initializing


Project scenario:

Ubuntu 20.04, Hadoop 3.2.2, HBase 2.2.2


Problem Description:

The main error is as follows: ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing

After starting the HBase shell, when using create, list and other commands, the following error messages appear:

hbase(main):001:0> list
TABLE 
                                                                                                                    
ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
        at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2452)
        at org.apache.hadoop.hbase.master.MasterRpcServices.getTableNames(MasterRpcServices.java:915)
        at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:58517)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2339)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)

For usage try 'help "list"'

Took 10.297 seconds

Cause analysis:

Here my machine only runs HBase on top of a pseudo-distributed Hadoop cluster, so the explanation often given online, that the clocks of the HBase and ZooKeeper servers are out of sync, is unlikely to apply. The more likely cause is that the Hadoop and HBase state had become inconsistent, leaving the master node stuck in initialization.


Solution:

Clear HBase's storage directory in HDFS, restart Hadoop and HBase, and let the two resynchronize. Note that deleting the /hbase directory in HDFS removes any existing HBase tables and data.

Shut down all HBase services first:

cd /usr/local/hbase
bin/stop-hbase.sh

Then close all Hadoop services:

cd /usr/local/hadoop
sbin/stop-all.sh

Run jps to confirm that all Hadoop and HBase processes have stopped:

zq@fzqs-Laptop:~$ jps
4673 Jps

Then start the Hadoop service:

cd /usr/local/hadoop
sbin/start-all.sh

To view files in HDFS:

bin/hdfs dfs -ls /

The output should look like the following (note the /hbase directory):

zq@fzqs-Laptop:/usr/local/hadoop$ bin/hdfs dfs -ls /
Found 1 items
drwxr-xr-x		- root supergroup 		0 2021-10-28 21:49 /hbase

Delete the /hbase directory:

bin/hdfs dfs -rm -r /hbase

Start HBase service:

cd /usr/local/hbase
bin/start-hbase.sh

Then start the shell and you should be able to use it:

bin/hbase shell

Hadoop cluster: Could not obtain block error


This "Could not obtain block" error appeared when accessing HDFS. It is a node problem. Check three things: whether the firewall is turned off, whether the DataNodes are running, and whether any data blocks are damaged.

In my case it was the second one: a DataNode was not running. Restart it on the affected host from the command line with hadoop-daemon.sh start datanode (or hdfs --daemon start datanode on Hadoop 3.x), confirm with jps that it is up, and then re-run the code to see whether the error is gone.

Even so, the DataNodes kept shutting themselves down. Looking at the web UI (host:9870), the other nodes did not actually appear under the live nodes. So: restart and reformat. Find the HDFS data storage path in the configuration file (the value of dfs.datanode.data.dir), delete $HADOOP_HOME/data/dfs/data/current on every node, then restart the Hadoop cluster (if necessary, leave safe mode with $HADOOP_HOME/bin/hdfs dfsadmin -safemode leave). The web UI then confirms the old data is gone.

I also found that the previous data directories still existed but their contents had been lost, so the damaged data blocks have to be deleted as well. Run hdfs fsck to view the missing or corrupt blocks:

hdfs fsck /

The -delete option removes the damaged data blocks:

hdfs fsck / -delete

Then upload the data again and re-run the job. A consolidated command sketch follows.
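
The recovery sequence as commands, a sketch under the assumptions of this post (the 9870 web port and the data directory come from the text above; adjust to your own dfs.datanode.data.dir):

hadoop-daemon.sh start datanode          # restart a stopped DataNode on the affected host
jps                                      # confirm the DataNode process is running
hdfs fsck / -list-corruptfileblocks      # list files with missing or corrupt blocks
hdfs fsck / -delete                      # delete the corrupted files
hdfs dfsadmin -safemode leave            # leave safe mode if the NameNode is stuck in it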