Tag Archives: Hadoop

Spring integrated HBase error [How to Solve]

Problem 1
Replace the jar with the spring-data-hadoop-1.0.0.RELEASE version
Problem 2
Add hadoop-client-3.1.3.jar and hadoop-common-3.1.3.jar
Problem 3
java.lang.NoClassDefFoundError: org/apache/commons/configuration2/Configuration
Add commons-configuration2-2.3.jar
Problem 4
java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName
Add hadoop-auth-3.1.3.jar
Problem 5
java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf
Add hadoop-mapreduce-client-common-3.1.3.jar, hadoop-mapreduce-client-core-3.1.3.jar and
Problem 6
java.lang.NoClassDefFoundError: com/ctc/wstx/io/SystemId
Add woodstox-core-5.0.3.jar
Problem 7
java.lang.NoClassDefFoundError: com/google/common/collect/Interners
Add guava-30.1.1-jre.jar
Problem 8
java.lang.NoSuchMethodError: com.google.common.collect.MapMaker.keyEquivalence(Lcom/google/common/base/Equivalence;)Lcom/google/common/collect/MapMaker;
Remove google-collect-1.0.jar: it conflicts with Guava, because both ship classes in the com.google.common.collect package.
Problem 9
java.lang.NoClassDefFoundError: com/fasterxml/jackson/core/JsonGenerator
Add jackson-annotations-2.12.4.jar, jackson-core-2.12.4.jar and jackson-databind-2.12.4.jar
Problem 10
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
Add hbase-common-2.2.4.jar
Problem 11
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/HTableInterface
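After a long search, I found that the cause was this bean definition in the Spring configuration file:
<bean id="htemplate" class="org.springframework.data.hadoop.hbase.HbaseTemplate">
    <property name="configuration" ref="hbaseConfiguration"/>
</bean>
Commenting it out fixed the error. HbaseTemplate in spring-data-hadoop 1.0.0 appears to depend on HTableInterface, which no longer exists in HBase 2.x.

Summary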
Most of these problems come down to missing jar packages; integrating Spring with HBase needed 15 packages in total.
Several of them are also required when integrating HDFS.
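Since almost every error above is a missing or conflicting jar, it can help to check the lib directory before starting the application. The bash function below is a sketch: its name is hypothetical, it assumes the jar file names are exactly those given in this post, and it checks only the 13 jars named above (the last jar in Problem 5 is cut off, so it is not included). It also flags google-collect-1.0.jar, the Guava conflict from Problem 8.

```bash
#!/usr/bin/env bash
# Hypothetical helper: report jars from this post that are missing in LIB_DIR,
# and flag google-collect-1.0.jar, which clashes with Guava (Problem 8).
check_spring_hbase_jars() {
  local lib=$1 jar status=0
  # File names as given in the post (assumed to match the real jar names).
  local required=(
    spring-data-hadoop-1.0.0.RELEASE.jar
    hadoop-client-3.1.3.jar
    hadoop-common-3.1.3.jar
    commons-configuration2-2.3.jar
    hadoop-auth-3.1.3.jar
    hadoop-mapreduce-client-common-3.1.3.jar
    hadoop-mapreduce-client-core-3.1.3.jar
    woodstox-core-5.0.3.jar
    guava-30.1.1-jre.jar
    jackson-annotations-2.12.4.jar
    jackson-core-2.12.4.jar
    jackson-databind-2.12.4.jar
    hbase-common-2.2.4.jar
  )
  for jar in "${required[@]}"; do
    if [ ! -f "$lib/$jar" ]; then
      echo "missing: $jar"
      status=1
    fi
  done
  if [ -f "$lib/google-collect-1.0.jar" ]; then
    echo "conflict: google-collect-1.0.jar"
    status=1
  fi
  return "$status"
}
```

For example, pointing it at a web application's WEB-INF/lib directory (path assumed) prints one line per missing or conflicting jar and returns a non-zero status if there is anything to fix.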

Hadoop ERROR: Attempting to operate on hdfs namenode as root ERROR: but there is no HDFS_NAMENODE_US

In the sbin directory of the Hadoop installation, modify start-dfs.sh, stop-dfs.sh, start-yarn.sh and stop-yarn.sh, adding the new lines at the top of each file, right below #!/usr/bin/env bash.

Content to add to start-dfs.sh and stop-dfs.sh (for Hadoop versions above 3.2, HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER):
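The original snippet did not survive. A commonly used version, assuming every HDFS daemon runs as root (the user named in the error), sets the per-daemon user variables below. The secure DataNode user variable should only be needed when running a secure DataNode, and its original value is not recoverable, so it is left out of this sketch.

```bash
# Run each HDFS daemon as root (assumption: root is the user starting HDFS).
HDFS_NAMENODE_USER=root
HDFS_DATANODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root
```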


Content to add to start-yarn.sh and stop-yarn.sh:
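This snippet is also missing from the original. A commonly used version for YARN, assuming the ResourceManager and NodeManager run as root:

```bash
# Run each YARN daemon as root (assumption: root is the user starting YARN).
YARN_RESOURCEMANAGER_USER=root
YARN_NODEMANAGER_USER=root
```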


[Solved] Beeline Error: Error: Could not open client transport with JDBC Failed to Connection


Error: Could not open client transport with JDBC Uri: jdbc:hive2://node01:10000: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)



Connecting with beeline fails on the client side:

beeline -u jdbc:hive2://node01:10000 -n root

Checking port 10000 shows that nothing is listening on it yet:

netstat -anp | grep 10000

HiveServer2 takes some time to start, so you need to wait a while. Port 10000 only opened after HiveServer2 had printed four Hive Session IDs (in my case it started successfully after the fourth).

That explains why the teacher said we had to wait a while before connecting with beeline.

Once HiveServer2 is up, beeline connects successfully. So when this error appears, don't panic; just give HiveServer2 time to start.
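Instead of retrying beeline by hand, a script can wait until port 10000 is open. A minimal sketch in bash; wait_until is a hypothetical helper that reruns a check command once per second until it succeeds or a timeout (in seconds) runs out.

```bash
#!/usr/bin/env bash
# Hypothetical helper: rerun a check command once per second until it
# succeeds (return 0) or TIMEOUT seconds have passed (return 1).
wait_until() {
  local timeout=$1 waited=0
  shift
  until "$@"; do
    if [ "$waited" -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

For example, wait_until 300 sh -c 'netstat -anp | grep -q 10000' && beeline -u jdbc:hive2://node01:10000 -n root reuses the port check from above; the 300-second limit is an arbitrary choice.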

Error in configuring Hadoop 3.1.3: attempting to operate on yarn nodemanager as root error

This error can occur both when starting and when stopping the HDFS and YARN services with the scripts.

Solution

Add the following parameters at the top of start-dfs.sh and stop-dfs.sh (in the sbin directory of the Hadoop installation):


Add the following parameters at the top of start-yarn.sh and stop-yarn.sh (in the sbin directory of the Hadoop installation):


This solved the problem.
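Editing four scripts by hand is easy to get wrong, so the additions can also be scripted. A sketch in bash, assuming the first line of each script is its shebang (as in Hadoop's sbin scripts); add_after_shebang is a hypothetical helper that inserts NAME=value lines right after that first line and skips any NAME already assigned at the start of a line, so running it twice is harmless.

```bash
#!/usr/bin/env bash
# Hypothetical helper: insert NAME=value lines after line 1 of FILE,
# skipping names that are already assigned at the start of a line.
add_after_shebang() {
  local file=$1 line name tmp
  shift
  local -a missing=()
  for line in "$@"; do
    name=${line%%=*}
    if ! grep -q "^${name}=" "$file"; then
      missing+=("$line")
    fi
  done
  [ ${#missing[@]} -eq 0 ] && return 0
  tmp=$(mktemp)
  {
    head -n 1 "$file"              # keep the shebang first
    printf '%s\n' "${missing[@]}"  # new variable lines
    tail -n +2 "$file"             # rest of the original script
  } > "$tmp"
  cat "$tmp" > "$file"             # rewrite in place, keeping permissions
  rm -f "$tmp"
}
```

For example, from the sbin directory: for f in start-dfs.sh stop-dfs.sh; do add_after_shebang "$f" HDFS_NAMENODE_USER=root HDFS_DATANODE_USER=root HDFS_SECONDARYNAMENODE_USER=root; done, and likewise for start-yarn.sh and stop-yarn.sh with YARN_RESOURCEMANAGER_USER=root YARN_NODEMANAGER_USER=root. These values assume the daemons run as root.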

[Solved] Sqoop MySQL-to-Hive error: Error: java.lang.RuntimeException: java.lang.RuntimeException…

Problem Description
Command executed:
bin/sqoop import --connect jdbc:mysql://localhost:3306/gdcmxy --username root --password root --table 2019bigdata --fields-terminated-by '\t' --delete-target-dir --num-mappers 1 --hive-import --hive-database gdcmxy --hive-table 2019bigdata

Error: java.lang.RuntimeException: java.lang.RuntimeException: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure

Judging from the error, the problem was the database connection. At first I suspected the firewall, but it had already been turned off when setting up Hadoop, so the firewall was ruled out.

Cluster environment:


Change localhost in the command to the IP address of the local machine. This is likely needed because the import runs as map tasks on cluster nodes, where localhost refers to that node rather than to the MySQL server:
bin/sqoop import --connect jdbc:mysql://<local-machine-IP>:3306/gdcmxy --username root --password root --table 2019bigdata --fields-terminated-by '\t' --delete-target-dir --num-mappers 1 --hive-import --hive-database gdcmxy --hive-table 2019bigdata

The import succeeded.
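Because a localhost URL gets resolved separately on whichever node runs each map task, a small pre-flight check can catch it before the job is submitted. A sketch; jdbc_host and check_not_local are hypothetical helper names, and the parsing assumes URLs of the form jdbc:mysql://host[:port]/db.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the host part of jdbc:mysql://host[:port]/db
jdbc_host() {
  local rest=${1#*://}          # host:port/db?params
  rest=${rest%%/*}              # host:port
  printf '%s\n' "${rest%%:*}"   # host
}

# Hypothetical helper: fail if the URL points at the local machine.
check_not_local() {
  case "$(jdbc_host "$1")" in
    localhost|127.0.0.1)
      echo "warning: $1 is resolved on each mapper's own node" >&2
      return 1 ;;
    *) return 0 ;;
  esac
}
```

A script could run check_not_local on its connect string and only call bin/sqoop import if it returns 0.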

NameNode startup error: OutOfMemoryError: Java heap space

1. Problem

Symptom: after restarting the Hadoop cluster, the NameNode reports an error and fails to start.

Error reported:

2. Analysis

The message "OutOfMemoryError: Java heap space" points to the JVM heap settings, so check the hadoop-env.sh configuration file. The relevant setting was:

export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"

As shown above, these options do not set a heap size.

By default, each HDFS role (NameNode, SecondaryNameNode, DataNode) gets a 1000 MB heap.

3. Solution

Change the settings as follows; after restarting, the cluster starts successfully.

export HADOOP_NAMENODE_OPTS="-Xms4096m -Xmx4096m -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS" 
export HADOOP_SECONDARYNAMENODE_OPTS="-Xms4096m -Xmx4096m -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Xms2048M -Xmx2048M -Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"

Parameter Description:

-Xmx4096m: maximum heap size

-Xms4096m: initial heap size
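If an options string ends up with -Xmx more than once (easy to do when copying these export lines around), the HotSpot JVM generally uses the last value, so it is worth checking which heap size actually applies. A sketch; effective_xmx is a hypothetical helper that prints the last -Xmx value in an options string.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the -Xmx value that takes effect, i.e. the
# last -Xmx option in the string (returns 1 if there is none).
effective_xmx() {
  local opt xmx=""
  local -a opts
  read -ra opts <<< "$1"   # split on whitespace without glob expansion
  for opt in "${opts[@]}"; do
    case "$opt" in
      -Xmx*) xmx=${opt#-Xmx} ;;
    esac
  done
  [ -n "$xmx" ] && printf '%s\n' "$xmx"
}
```

For example, effective_xmx "$HADOOP_DATANODE_OPTS" after sourcing hadoop-env.sh shows the DataNode's effective maximum heap.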

Reference: "HDFS memory configuration", by "flowers not fully open * moon not yet round", Blog Park (cnblogs)

Sqoop error connecting to a GBase database [How to Solve]

1. When Sqoop accesses the GBase database from the command line, the connection fails with the following error:

Adding the --driver parameter to the command fixes it:

sqoop list-tables --connect jdbc:gbase:// --driver com.gbase.jdbc.Driver --username gbase --password gbase2010531
It ran successfully!

[Solved] Hive 2.3.9 Error: Error: Unrecognized column type: UNIONTYPE (state=,code=0)

After importing UNIONTYPE data from a CSV file into the Hive test table test_serializer, running select * from test_serializer fails with:

Error: Unrecognized column type: UNIONTYPE (state=,code=0)

Hive version: 2.3.9

This turned out to be a Hive JDBC bug that was fixed in version 3.0.0: HIVE-17259

I wanted to fix it without upgrading, by applying the 3.0.0 fix to the 2.3.9 Hive JDBC source code. The changes are as follows:

1. Download the Hive JDBC source code for the matching version from the Apache website (Apache Downloads)

2. Find JdbcColumn.java and modify it

Add these two lines:

    } else if ("uniontype".equalsIgnoreCase(type)) {
      return Type.UNION_TYPE;

3. Build a new JDBC jar and copy it to the Hive server

In a command prompt, change to the jdbc directory and run mvn package to build it.

After the build, the target directory under jdbc contains hive-jdbc-2.3.9-standalone.jar and hive-jdbc-2.3.9.jar. Copy them to ${HIVE_HOME}/jdbc and ${HIVE_HOME}/lib respectively, and restart HiveServer2.
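The copy step can be scripted. A sketch in bash, assuming the jar names produced by mvn package match the ones above; install_hive_jdbc and its arguments are hypothetical. It copies the standalone jar into HIVE_HOME/jdbc and the plain jar into HIVE_HOME/lib, and stops without copying anything if either jar is missing. Restarting HiveServer2 is left out.

```bash
#!/usr/bin/env bash
# Hypothetical helper: install_hive_jdbc TARGET_DIR HIVE_HOME VERSION
install_hive_jdbc() {
  local target=$1 hive_home=$2 version=$3 jar
  local standalone="$target/hive-jdbc-${version}-standalone.jar"
  local plain="$target/hive-jdbc-${version}.jar"
  # Check both jars first so a half-finished build copies nothing.
  for jar in "$standalone" "$plain"; do
    if [ ! -f "$jar" ]; then
      echo "missing: $jar" >&2
      return 1
    fi
  done
  cp "$standalone" "$hive_home/jdbc/" && cp "$plain" "$hive_home/lib/"
}
```

For example, run from the jdbc directory: install_hive_jdbc target "$HIVE_HOME" 2.3.9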

Result after the fix:

[Solved] Hive Error while processing statement: FAILED: Execution Error

SQL Error [1] [08S01]: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:400)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:256)
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:194)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1855)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1839)
    at org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1798)
    at org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:61)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3101)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1123)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:696)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)


Solution: switch to the hdfs user with su hdfs. If that does not work, run vi /etc/passwd and change the end of the hdfs line (its login shell)
to the following:

Then, as the hdfs user, run hadoop fs -chmod 777 /user
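Before editing /etc/passwd, you can check which login shell the hdfs account has: if it is /sbin/nologin, su hdfs exits immediately instead of giving you a shell. A sketch; login_shell_of is a hypothetical helper that prints the seventh field of a user's entry in an /etc/passwd-format file.

```bash
#!/usr/bin/env bash
# Hypothetical helper: login_shell_of USER [PASSWD_FILE]
# Prints the login shell (7th colon-separated field) of USER.
login_shell_of() {
  awk -F: -v u="$1" '$1 == u { print $7 }' "${2:-/etc/passwd}"
}
```

For example, login_shell_of hdfs. Running the single command through sudo, e.g. sudo -u hdfs hadoop fs -chmod 777 /user, is an alternative that does not need the hdfs account to have a login shell.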

Hive install initialization error: Error: Duplicate key name 'PCS_STATS_IDX' (state=42000,code=1061)

Error log

schematool -dbType mysql -initSchema
which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/bin:/opt/soft/jdk180/bin:/opt/soft/zookeeper345/bin:/opt/soft/hadoop260/sbin:/opt/soft/hadoop260/bin:/opt/soft/jdk180/bin:/opt/soft/hadoop260/sbin:/opt/soft/hadoop260/bin:/opt/soft/zookeeper345/bin:/opt/soft/hive110/bin)
21/11/09 14:25:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/11/09 14:25:20 WARN conf.HiveConf: HiveConf of name hive.server2.thrift.client.user does not exist
21/11/09 14:25:20 WARN conf.HiveConf: HiveConf of name hive.server2.thrift.client.password does not exist
Metastore connection URL:	 jdbc:mysql://
Metastore Connection Driver :	 com.mysql.jdbc.Driver
Metastore connection User:	 root
Starting metastore schema initialization to 1.1.0-cdh5.14.2
Initialization script hive-schema-1.1.0.mysql.sql
Error: Duplicate key name 'PCS_STATS_IDX' (state=42000,code=1061)
org.apache.hadoop.hive.metastore.HiveMetaException: Schema initialization FAILED! Metastore state would be inconsistent !!
*** schemaTool failed ***

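Cause

A Hive metastore database (here hive151) already exists in MySQL from an earlier initialization, so the schema script fails when it tries to create indexes that are already there.

Solution

Drop the existing Hive database in MySQL, then run the initialization again: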

mysql -uroot -proot
mysql: [Warning] Using a password on the command line interface can be insecure.
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 135
Server version: 5.7.36 MySQL Community Server (GPL)

Copyright (c) 2000, 2021, Oracle and/or its affiliates.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| hive151            |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
5 rows in set (0.00 sec)

mysql> drop database hive151;
Query OK, 54 rows affected (0.13 sec)

mysql> exit;

schematool -dbType mysql -initSchema
which: no hbase in (/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin:/bin:/opt/soft/jdk180/bin:/opt/soft/zookeeper345/bin:/opt/soft/hadoop260/sbin:/opt/soft/hadoop260/bin:/opt/soft/jdk180/bin:/opt/soft/hadoop260/sbin:/opt/soft/hadoop260/bin:/opt/soft/zookeeper345/bin:/opt/soft/hive110/bin)
21/11/09 14:30:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/11/09 14:30:24 WARN conf.HiveConf: HiveConf of name hive.server2.thrift.client.user does not exist
21/11/09 14:30:24 WARN conf.HiveConf: HiveConf of name hive.server2.thrift.client.password does not exist
Metastore connection URL:	 jdbc:mysql://
Metastore Connection Driver :	 com.mysql.jdbc.Driver
Metastore connection User:	 root
Starting metastore schema initialization to 1.1.0-cdh5.14.2
Initialization script hive-schema-1.1.0.mysql.sql
Initialization script completed
schemaTool completed
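Which MySQL database to drop is determined by javax.jdo.option.ConnectionURL in hive-site.xml. A sketch; metastore_db_from_url is a hypothetical helper that extracts the database name from a URL of the form jdbc:mysql://host:port/db?params. Dropping that database deletes all existing metastore metadata, so this only suits a fresh install like this one.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the database name from a metastore URL such as
# jdbc:mysql://host:3306/hive151?createDatabaseIfNotExist=true
metastore_db_from_url() {
  local rest=${1#*://}                     # host:port/db?params
  [ "$rest" = "${rest#*/}" ] && return 1   # no "/db" part in the URL
  rest=${rest#*/}                          # db?params
  printf '%s\n' "${rest%%[?]*}"            # drop "?params"
}
```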

[Solved] Hbase Error: ERROR: KeeperErrorCode = NoNode for /hbase/master

Cause: a power failure (or the computer going to sleep, etc.) left HMaster unable to connect, so no master node could be found in ZooKeeper.

Solution: delete the hbase node in ZooKeeper; HBase recreates it automatically when it starts.
1) Log in to the ZooKeeper client: zkCli.sh
2) Delete the hbase node: deleteall /hbase (ZooKeeper 3.4 and earlier lack deleteall; use rmr /hbase there)

The most critical step: restart HBase and ZooKeeper.
1) Stop ZooKeeper: my_zk.sh stop (my_zk.sh is my own ZooKeeper start/stop script, so don't copy it; use your own method)
2) Stop HBase: stop-hbase.sh does not work here. On each machine in the cluster, run jps to find the process IDs (PIDs) of HMaster and HRegionServer, and kill them one by one with kill -9 <PID>. This shuts HBase down manually.
3) Start ZooKeeper and HBase: my_zk.sh start, then start-hbase.sh
HDFS can be left running; if this does not work for you, restart HDFS as well.
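The jps and kill step can be scripted on each node. A sketch; hbase_pids is a hypothetical helper that reads jps output (one "PID Name" pair per line) on standard input and prints the PIDs of HMaster and HRegionServer.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print PIDs of the HBase daemons found in jps output.
hbase_pids() {
  awk '$2 == "HMaster" || $2 == "HRegionServer" { print $1 }'
}
```

For example, jps | hbase_pids | xargs -r kill -9 kills both daemons on the current node; -r is a GNU xargs option that skips running kill when nothing matched.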

[Solved] Sqoop Error: ERROR tool.ImportTool: Import failed: java.io.IOException

21/11/08 12:13:10 ERROR tool.ImportTool: Import failed: java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
        at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:143)
        at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:108)
        at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:101)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1311)
        at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1307)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1844)
        at org.apache.hadoop.mapreduce.Job.connect(Job.java:1306)
        at org.apache.hadoop.mapreduce.Job.submit(Job.java:1335)
        at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1359)
        at org.apache.sqoop.mapreduce.ImportJobBase.doSubmitJob(ImportJobBase.java:200)
        at org.apache.sqoop.mapreduce.ImportJobBase.runJob(ImportJobBase.java:173)
        at org.apache.sqoop.mapreduce.ImportJobBase.runImport(ImportJobBase.java:270)
        at org.apache.sqoop.manager.SqlManager.importTable(SqlManager.java:692)
        at org.apache.sqoop.manager.MySQLManager.importTable(MySQLManager.java:127)
        at org.apache.sqoop.tool.ImportTool.importTable(ImportTool.java:520)
        at org.apache.sqoop.tool.ImportTool.run(ImportTool.java:628)
        at org.apache.sqoop.Sqoop.run(Sqoop.java:147)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
        at org.apache.sqoop.Sqoop.runSqoop(Sqoop.java:183)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:234)
        at org.apache.sqoop.Sqoop.runTool(Sqoop.java:243)
        at org.apache.sqoop.Sqoop.main(Sqoop.java:252)

This error occurred when running a Sqoop script that imports MySQL data into HDFS, and it pointed to missing dependencies. Copying hadoop-mapreduce-client-common-2.8.5.jar and hadoop-mapreduce-client-core-2.8.5.jar into Sqoop's lib directory still gave the error. Copying all the jar packages from Hadoop's hadoop-2.8.5/share/hadoop/mapreduce directory into it solved the problem, and the script ran successfully. Simple and brute-force, but it worked.
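That bulk copy can be done without overwriting jars Sqoop already ships. A sketch; copy_missing_jars is a hypothetical helper, and the directory variables in the usage line ($HADOOP_HOME, $SQOOP_HOME) are assumptions about your environment.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy every jar from SRC_DIR into DST_DIR unless a jar
# with the same file name is already there; print the names that were copied.
copy_missing_jars() {
  local src=$1 dst=$2 jar name
  for jar in "$src"/*.jar; do
    [ -e "$jar" ] || continue   # the glob matched nothing
    name=${jar##*/}
    if [ ! -e "$dst/$name" ]; then
      cp "$jar" "$dst/" && printf '%s\n' "$name"
    fi
  done
}
```

For example: copy_missing_jars "$HADOOP_HOME/share/hadoop/mapreduce" "$SQOOP_HOME/lib"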