Tag Archives: hdfs

SparkContext: Error initializing SparkContext [Workaround]


Spark reports an error when configured against a highly available cluster:
ERROR SparkContext: Error initializing SparkContext. java.net.ConnectException: Call From hadoop102/192.168.10.102 to hadoop102:8020 failed on connection exception: java.net.ConnectException: Connection refused

This happens because the Spark event log is configured to be stored in HDFS, but Hadoop (HDFS) was not started after the Spark cluster was started, so submitting a task fails.

Solution:

    1. Stop storing the event log: in the spark installation directory/conf/spark-defaults.conf file, comment out the event-log settings (a sketch of the relevant lines follows this list).
    2. Or store the event log locally instead of in HDFS: replace the HDFS directory in the second event-log line with a local Linux directory.
    3. Or fix the root cause: the Spark event log is configured to be stored in HDFS, but HDFS was not started, so start the Hadoop cluster (i.e. the HDFS service).
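For reference, the event-log settings in spark-defaults.conf usually look roughly like the lines below (host name, port and paths follow this post's setup and are only placeholders):

# comment these two lines out to stop writing the event log ...
spark.eventLog.enabled    true
spark.eventLog.dir        hdfs://hadoop102:8020/spark-logs
# ... or point the directory at a local path instead of HDFS
# spark.eventLog.dir      file:///opt/spark/event-logs

If the HDFS directory is kept, start HDFS first (e.g. with start-dfs.sh) before submitting Spark jobs.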

Hadoop cluster: about the "Could not obtain block" error


When accessing HDFS you may encounter the "Could not obtain block" problem above. It is usually a node problem: check whether the firewall is closed, whether the DataNode is started, and whether the data blocks are damaged.
The check showed it was the second problem, so restart the DataNode on the corresponding host with hdfs --daemon start datanode (hadoop-daemon.sh start datanode on older releases), confirm with jps that it is running, then try to run the code again and see whether the error persists.
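A quick way to run those three checks from the shell (assuming a systemd-based Linux and the Hadoop binaries on the PATH):

systemctl status firewalld       # 1. is the firewall stopped?
jps | grep DataNode              # 2. is the DataNode process up on this host?
hdfs dfsadmin -report            #    live/dead DataNodes as the NameNode sees them
hdfs fsck / -files -blocks       # 3. are any blocks corrupt or missing?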
Even so, the DataNodes often hung up again on their own. Opening the web UI (host:9870) showed that the other nodes were not really started: they did not appear under live nodes. OK then: restart and reformat.
Find the HDFS data storage path in the configuration file, then delete %HADOOP_HOME%/data/dfs/data/current on all nodes and restart the Hadoop cluster (turn off safe mode with hdfs dfsadmin -safemode leave). On the web UI you can see that the data has been deleted. However, I found that the previous data directories were still listed while their contents were lost, so these damaged data blocks need to be deleted as well.
Run hdfs fsck to inspect the blocks.

View the block report (here for the whole filesystem, /):

hdfs fsck /

The -delete option removes the damaged data blocks (corrupted files):

hdfs fsck / -delete

Then upload the data again and execute it again.
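Pulling the steps above together, a rough sketch of the whole recovery sequence (the data directory is the one used on this cluster and /data stands for your own input path; double-check your dfs.datanode.data.dir before deleting anything):

stop-dfs.sh
rm -rf $HADOOP_HOME/data/dfs/data/current      # on every DataNode
start-dfs.sh
hdfs dfsadmin -safemode leave                  # leave safe mode if HDFS stays in it
hdfs fsck / -delete                            # drop the files still reported as corrupt
hdfs dfs -put ./data /data                     # re-upload the input data, then re-run the job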

HDFS Java API operation error (user permission)

Problem Description:

There is a problem when running Hadoop HDFS code in IDEA. The error is as follows:
org.apache.hadoop.security.AccessControlException: Permission denied: user=XXXX, access=WRITE, inode="/":root:supergroup:drwxr-xr-x
The error is reported because the user name on this machine is different from the user on the Linux side that owns the HDFS directory.

Solution:

On the Linux system, find the directory where Hadoop is installed and open etc/hadoop/hdfs-site.xml under that directory.

<property>
  <name>dfs.permissions.enabled</name>
  <value>false</value>
  <description>
    If "true", enable permission checking in HDFS.
    If "false", permission checking is turned off,
    but all other behavior is unchanged.
    Switching from one parameter value to the other does not change the mode,
    owner or group of files or directories.
  </description>
</property>

Add the property above and restart the cluster; the operation will then succeed. Note that this turns off HDFS permission checking entirely.
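A minimal sketch of pushing the change to the other nodes and restarting HDFS, assuming a hand-managed cluster with Hadoop under $HADOOP_HOME and worker hosts named hadoop102/hadoop103 (placeholders):

scp $HADOOP_HOME/etc/hadoop/hdfs-site.xml hadoop102:$HADOOP_HOME/etc/hadoop/
scp $HADOOP_HOME/etc/hadoop/hdfs-site.xml hadoop103:$HADOOP_HOME/etc/hadoop/
stop-dfs.sh
start-dfs.sh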

Solution:

The ultimate one-line solution:
Add System.setProperty("HADOOP_USER_NAME", "root") to the Java code to set the user name the client uses when operating on HDFS. That will do.
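If you prefer not to hard-code the user in the source, the same user can be supplied from outside the program; both forms below are standard Hadoop client behavior (the user name is the one from this post; the classpath and main class are placeholders):

export HADOOP_USER_NAME=root                       # environment variable read by the Hadoop client
java -DHADOOP_USER_NAME=root -cp <classpath> Main  # or pass it as a JVM system property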

DBeaver connects to Hive: how to solve the problem that custom Hive UDF functions cannot be used in SQL queries in DBeaver

1. The problem

Today I connected to Hive with DBeaver and tested several SQL statements that had been executed on the Hive client yesterday. The SQL uses custom UDF, UDTF and UDAF functions, but when the Execute button is pressed in DBeaver an error is reported saying the function is invalid. Yet the function has been registered as a permanent function in Hive and has already been run there. How can it be invalid in DBeaver?

2. The solution

1. Put the CREATE FUNCTION statement that was executed at the Hive command line into DBeaver and execute it again.

(1) The statement to create a permanent function is as follows:

create function testudf as 'test.CustomUDF' using jar 'hdfs://cls:8020/user/hive/warehouse/testudf/TESTUDF.jar';

3. Cause (not carefully verified)

1. My Hive client registered the function over a hive CLI connection, while DBeaver connects to Hive through the HiveServer2 service, i.e. a beeline-style connection. It is said that functions registered from the hive CLI cannot be used through HiveServer2.
2. In practice, when I executed the statement to register the permanent function in DBeaver, the result said the function already exists, and the SQL then ran fine. So the function information was probably just refreshed: the function was reported invalid at the start of execution, which shows the SQL itself had been executed.
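If the function really is registered and HiveServer2 merely has a stale function registry, it may be enough to refresh it from the DBeaver session instead of re-creating it (RELOAD FUNCTION is a standard Hive statement; the function name is the one from this post):

RELOAD FUNCTION;
SHOW FUNCTIONS LIKE 'testudf';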

Start Additional NameNode [How to Solve]

Problem description

When building an HDP cluster and configuring NameNode HA, an error is reported at the "Start Additional NameNode" step.

The cluster itself can still be used normally; the issue is that the active NameNode cannot be found once HA is configured.

Solution

Two possible causes:

1. The command below must be run on the other (additional) NameNode, not on the current one:

sudo su hdfs -l -c 'hdfs namenode -bootstrapStandby'

2. The name entered in "Get Started" conflicts with the host name.
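To check the state of the two NameNodes afterwards (nn1 and nn2 stand for the NameNode service IDs configured in hdfs-site.xml and are placeholders here):

hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2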

Introduction to Hadoop HDFS and the use of basic client commands

HDFS basic functions
HDFS offers the basic operations of a file system.

Data is stored in units of file blocks spread over the disks of different machines; a block is 128 MB by default and is kept in 3 replicas.
NameNode (master node): maintains the virtual directory tree, manages the child nodes (assisted by the SecondaryNameNode), schedules storage resources, and interacts with the client.
DataNode (worker nodes, several of them): store the data and register with the master node when they start, so the master node knows about them and can call on them later.
(Basic conditions for the DataNode cluster nodes Linux01, Linux02, Linux03...: IP/host-name mapping and passwordless SSH between the nodes, because the nodes communicate with each other.)
Linux01: NameNode, DataNode
Linux02: DataNode
Linux03: DataNode
Use of basic client commands

    The commands live under bin/hdfs dfs. Some of them:
    upload: hdfs dfs -put <local path> / (local path first, then the HDFS path)
    list a directory: hdfs dfs -ls / (view the files in the directory; other paths work the same way)
    create a folder: hdfs dfs -mkdir /data (create the folder under the root directory)
    view file content: hdfs dfs -cat <file path>
    download from HDFS: hdfs dfs -get /data/1.txt ./ (HDFS path first, then the local path)
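A minimal end-to-end session with these commands (the file name 1.txt and the /data directory are just examples):

echo "hello hdfs" > 1.txt
hdfs dfs -mkdir /data
hdfs dfs -put ./1.txt /data
hdfs dfs -ls /data
hdfs dfs -cat /data/1.txt
hdfs dfs -get /data/1.txt ./1_copy.txt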

CDH HDFS webui browser authentication (after Kerberos authentication is enabled)


1. Open Firefox and enter about:config in the address bar to reach the settings page (this is not available in other browsers).
2. Search for network.negotiate-auth.trusted-uris and change the value to your server host name.
3. Search for network.auth.use-sspi and double-click to change the value to false.
4. Install MIT Kerberos for Windows (KfW), e.g. kfw-4.1-amd64.msi.
5. Copy the contents of the cluster's /etc/krb5.conf file to C:\ProgramData\MIT\Kerberos5\krb5.ini and delete the path-related configuration, leaving something like:

[logging]

 [libdefaults]
  default_realm = HADOOP.COM
  dns_lookup_realm = false
  dns_lookup_kdc = false
  ticket_lifetime = 24h
  renew_lifetime = 7d
  forwardable = true
  udp_preference_limit = 1

[realms]
 HADOOP.COM = {
  kdc = plum01
  admin_server = plum01
 }

[domain_realm]
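Then obtain a Kerberos ticket on the Windows side before opening the NameNode web UI in Firefox (the principal below is only an example; use one that exists in your KDC):

kinit hdfs@HADOOP.COM
klist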

hdfs 192.168.2.19:9000 failed on connection exception: java.net.ConnectException:Connection refused

HDFS connection failed.
Common causes of the error:
1. Hadoop is not started (or not completely started). A normally started Hadoop includes the NameNode, DataNode, SecondaryNameNode, ResourceManager and NodeManager services (check with jps); if they are not all up, check the logs.
2. Pseudo-distributed installation in which localhost or 127.0.0.1 is used in the configuration files. Change these to the real IP, in core-site.xml, mapred-site.xml, slaves and masters. After modifying the IP, the DataNode may fail to start; in that case check dfs.data.dir in hdfs-site.xml:
<property>
  <name>dfs.data.dir</name>
  <value>/data/hdfs/data</value>
</property>
Delete all files under that folder and restart Hadoop.
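A quick check that the NameNode RPC port from the error message (9000 here) is really listening, and on the address the client connects to rather than only 127.0.0.1 (netstat works equally well):

jps                          # NameNode must be in the list
ss -lnt | grep 9000          # or: netstat -lnt | grep 9000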

sqoop-import ERROR tool.ImportTool: Import failed: No primary key could be found for table user_info

Using sqoop-import to import data from MySQL into HDFS fails because the MySQL table has no primary key:

[walker001@walker001 ~]$ sqoop-import \
> --connect 'jdbc:mysql://192.168.220.129:3306/test?characterEncoding=UTF-8' \
> --username root \
> --password zwk95914 \
> --table user_info \
> --columns userId,userName,password,trueName,addedTime \
> --target-dir /sqoop/mysql 
Warning: /home/walker001/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/walker001/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/walker001/app/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/walker001/app/sqoop-1.4.7.bin__hadoop-2.6.0/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/04/05 11:16:29 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
20/04/05 11:16:29 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/04/05 11:16:29 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
20/04/05 11:16:29 INFO tool.CodeGenTool: Beginning code generation
Sun Apr 05 11:16:30 CST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
20/04/05 11:16:31 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user_info` AS t LIMIT 1
20/04/05 11:16:31 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user_info` AS t LIMIT 1
20/04/05 11:16:31 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/walker001/app/hadoop-2.8.2
Note: /tmp/sqoop-walker001/compile/e8a05a19dfbef5b687215bb6f631fbd2/user_info.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
20/04/05 11:16:40 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-walker001/compile/e8a05a19dfbef5b687215bb6f631fbd2/user_info.jar
20/04/05 11:16:40 WARN manager.MySQLManager: It looks like you are importing from mysql.
20/04/05 11:16:40 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
20/04/05 11:16:40 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
20/04/05 11:16:40 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
20/04/05 11:16:40 ERROR tool.ImportTool: Import failed: No primary key could be found for table user_info. Please specify one with --split-by or perform a sequential import with '-m 1'.


mysql> alter table user_info add primary key(userId);
Query OK, 0 rows affected (0.04 sec)
Records: 0  Duplicates: 0  Warnings: 0

add primary key
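If adding a primary key is not an option, the error message itself points at two alternatives; a rough sketch of each, reusing the connection options from the command above:

sqoop-import --connect 'jdbc:mysql://192.168.220.129:3306/test?characterEncoding=UTF-8' \
  --username root --password zwk95914 \
  --table user_info --target-dir /sqoop/mysql -m 1    # single-mapper sequential import

# or keep several mappers but split on an explicit column
# ... --split-by userId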


[walker001@walker001 ~]$ sqoop-import --connect 'jdbc:mysql://192.168.220.129:3306/test?characterEncoding=UTF-8' --username root --password zwk95914 --table user_info --columns userId,userName,password,trueName,addedTime --target-dir /sqoop/mysql
Warning: /home/walker001/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hbase does not exist! HBase imports will fail.
Please set $HBASE_HOME to the root of your HBase installation.
Warning: /home/walker001/app/sqoop-1.4.7.bin__hadoop-2.6.0/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /home/walker001/app/sqoop-1.4.7.bin__hadoop-2.6.0/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /home/walker001/app/sqoop-1.4.7.bin__hadoop-2.6.0/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/04/05 11:19:46 INFO sqoop.Sqoop: Running Sqoop version: 1.4.7
20/04/05 11:19:47 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
20/04/05 11:19:47 INFO manager.MySQLManager: Preparing to use a MySQL streaming resultset.
20/04/05 11:19:47 INFO tool.CodeGenTool: Beginning code generation
Sun Apr 05 11:19:47 CST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
20/04/05 11:19:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user_info` AS t LIMIT 1
20/04/05 11:19:48 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM `user_info` AS t LIMIT 1
20/04/05 11:19:48 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /home/walker001/app/hadoop-2.8.2
Note: /tmp/sqoop-walker001/compile/72faaf2287c7c39a8586ce10f0e78d74/user_info.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
20/04/05 11:19:52 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-walker001/compile/72faaf2287c7c39a8586ce10f0e78d74/user_info.jar
20/04/05 11:19:52 WARN manager.MySQLManager: It looks like you are importing from mysql.
20/04/05 11:19:52 WARN manager.MySQLManager: This transfer can be faster! Use the --direct
20/04/05 11:19:52 WARN manager.MySQLManager: option to exercise a MySQL-specific fast path.
20/04/05 11:19:52 INFO manager.MySQLManager: Setting zero DATETIME behavior to convertToNull (mysql)
20/04/05 11:19:52 INFO mapreduce.ImportJobBase: Beginning import of user_info
20/04/05 11:19:53 INFO Configuration.deprecation: mapred.jar is deprecated. Instead, use mapreduce.job.jar
20/04/05 11:19:54 INFO Configuration.deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
20/04/05 11:19:54 INFO client.RMProxy: Connecting to ResourceManager at walker001/192.168.220.129:8032
Sun Apr 05 11:20:06 CST 2020 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
20/04/05 11:20:07 INFO db.DBInputFormat: Using read commited transaction isolation
20/04/05 11:20:07 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN(`userId`), MAX(`userId`) FROM `user_info`
20/04/05 11:20:07 INFO db.IntegerSplitter: Split size: 0; Num splits: 4 from: 1 to: 2
20/04/05 11:20:07 INFO mapreduce.JobSubmitter: number of splits:2
20/04/05 11:20:08 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1586054209222_0001
20/04/05 11:20:09 INFO impl.YarnClientImpl: Submitted application application_1586054209222_0001
20/04/05 11:20:09 INFO mapreduce.Job: The url to track the job: http://walker001:8088/proxy/application_1586054209222_0001/
20/04/05 11:20:09 INFO mapreduce.Job: Running job: job_1586054209222_0001
20/04/05 11:20:33 INFO mapreduce.Job: Job job_1586054209222_0001 running in uber mode : false
20/04/05 11:20:33 INFO mapreduce.Job:  map 0% reduce 0%
20/04/05 11:20:56 INFO mapreduce.Job:  map 100% reduce 0%
20/04/05 11:20:58 INFO mapreduce.Job: Job job_1586054209222_0001 completed successfully
20/04/05 11:20:59 INFO mapreduce.Job: Counters: 30
        File System Counters
                FILE: Number of bytes read=0
                FILE: Number of bytes written=318188
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=213
                HDFS: Number of bytes written=68
                HDFS: Number of read operations=8
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=4
        Job Counters 
                Launched map tasks=2
                Other local map tasks=2
                Total time spent by all maps in occupied slots (ms)=39020
                Total time spent by all reduces in occupied slots (ms)=0
                Total time spent by all map tasks (ms)=39020
                Total vcore-milliseconds taken by all map tasks=39020
                Total megabyte-milliseconds taken by all map tasks=39956480
        Map-Reduce Framework
                Map input records=2
                Map output records=2
                Input split bytes=213
                Spilled Records=0
                Failed Shuffles=0
                Merged Map outputs=0
                GC time elapsed (ms)=840
                CPU time spent (ms)=2990
                Physical memory (bytes) snapshot=204382208
                Virtual memory (bytes) snapshot=3781640192
                Total committed heap usage (bytes)=48259072
        File Input Format Counters 
                Bytes Read=0
        File Output Format Counters 
                Bytes Written=68
20/04/05 11:20:59 INFO mapreduce.ImportJobBase: Transferred 68 bytes in 65.254 seconds (1.0421 bytes/sec)
20/04/05 11:20:59 INFO mapreduce.ImportJobBase: Retrieved 2 records.
[walker001@walker001 ~]$ hadoop fs -ls /sqoop/mysql
Found 3 items
-rw-r--r--   2 walker001 supergroup          0 2020-04-05 11:20 /sqoop/mysql/_SUCCESS
-rw-r--r--   2 walker001 supergroup         35 2020-04-05 11:20 /sqoop/mysql/part-m-00000
-rw-r--r--   2 walker001 supergroup         33 2020-04-05 11:20 /sqoop/mysql/part-m-00001
[walker001@walker001 ~]$ hadoop fs -cat /sqoop/mysql/*
1,hello,123456,zhangsan,2017-09-01
2,hello2,123456,lisis,2019-09-01

after re-import

What to do if you repeatedly format a cluster

note: if you just want to solve the problem, you can skip steps 1 and 2 and go straight to steps 3 and 4

  1. Start the cluster with the one-click script and find where the DataNode's log is:
    sh start-all.sh

  2. Open the DataNode log, press Shift+G to jump to the last line, then scroll upward to the first INFO; the WARN below it carries a message saying that the DataNode clusterID and the NameNode clusterID are not consistent.

  3. cd /export/servers/hadoop-server-cdh5.14.0/hadoopDatas/datanodeDatas/current/ and cat VERSION: the clusterID there still matches the NameNode's old ID. Delete the current directory, and do this for each node in the cluster.

  4. Just restart the cluster and check whether everything has started successfully:
    sh start-all.sh
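If the data on the DataNodes should be kept, an alternative to deleting the directory is to make the clusterIDs match by hand; a rough sketch, assuming the NameNode's metadata directory sits next to the DataNode one in this tutorial's layout (namenodeDatas is an assumption, check your dfs.namenode.name.dir):

grep clusterID /export/servers/hadoop-server-cdh5.14.0/hadoopDatas/namenodeDatas/current/VERSION   # note the NameNode's clusterID
vi /export/servers/hadoop-server-cdh5.14.0/hadoopDatas/datanodeDatas/current/VERSION               # set clusterID to that value on every DataNode
sh start-all.sh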