Tag Archives: Big data

[Solved] PhoenixParserException: ERROR 602 (42P00): Syntax error. Missing "EOF"

org.apache.phoenix.exception.PhoenixParserException: ERROR 602 (42P00): Syntax error. Missing "EOF" at line 1, column 36.


public class DimUtil {
    public static JSONObject readDimFromPhoenix(Connection conn, String tableName, Long id) {
        // Bug: no space before "where", so the SQL becomes "select * from <tableName>where id=?"
        String sql = "select * from " + tableName + "where id=?";
        Object[] args = {id.toString()};
        // Get the query result and return it (JdbcUtil is the project's own helper)
        List<JSONObject> list = JdbcUtil.queryList(conn, sql, args, JSONObject.class);
        return list.size() == 1 ? list.get(0) : new JSONObject();
    }
}

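Error analysis:

There is no space between the table name and "where", so the concatenated statement reads "select * from <tableName>where id=?". Phoenix reads "<tableName>where" as one name, cannot parse what follows it as part of the statement, and reports that it expected the statement to end there (Missing "EOF").

Just add a space in front of "where".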
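The corrected concatenation can be sketched as a small helper. This is a minimal sketch, assuming the table name is a trusted identifier (it is concatenated into the SQL, not bound as a parameter); the class name DimSql, the method buildDimQuery and the table name DIM_USER_INFO are hypothetical.

```java
public class DimSql {

    // Hypothetical helper: builds the lookup query used by readDimFromPhoenix.
    // The leading space in " where" keeps the table name and the keyword apart;
    // without it Phoenix fails with ERROR 602 (Missing "EOF").
    static String buildDimQuery(String tableName) {
        return "select * from " + tableName + " where id=?";
    }

    public static void main(String[] args) {
        // DIM_USER_INFO is a made-up table name for illustration.
        System.out.println(buildDimQuery("DIM_USER_INFO"));
    }
}
```

Keeping the id as a "?" placeholder (as the original does) still lets the JDBC driver bind the value safely; only the table name needs concatenation.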

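PySpark error: AttributeError: 'NoneType' object has no attribute '_jvm'

Possible reason 1: after from pyspark.sql.functions import *, Python built-in functions used inside a UDF (such as sum, max or round) are shadowed by the Spark functions of the same name. Those Spark functions need the driver's SparkContext, which is None inside a UDF running on an executor, hence the '_jvm' error. Import the built-ins again (for example, import builtins and call builtins.max), or import the Spark functions under an alias such as import pyspark.sql.functions as F.

Possible reason 2: the UDF is not defined inside the main function, which results in this error.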

ERROR queue.BoundedInMemoryExecutor: error producing records0] org.apache.parquet.io.ParquetDecoding


1 Error reproduction

ERROR queue.BoundedInMemoryExecutor: error producing records0]
org.apache.parquet.io.ParquetDecodingException: Can not read value at 1 in block 0 in file hdfs://hdp-yl-1:8020/user/testJoin/test_join27/join/default/1d0f7a5b-fcbc-40aa-994d-ada47e3a3257-0_0-59-5054_20211119171950.parquet

2 Cause and solution

The error occurs because the data type of a column in the data being written differs from the type of the same column in the destination table.

The solution is to cast the column in the written data to the destination table's type, for example:

from pyspark.sql.functions import col
write_df2 = write_df2.withColumn("superior_emp_id", col("superior_emp_id").cast("string"))

Apple M1: How to Solve a Spark Runtime Error

snappy-java- (2021-01-20)

Could not initialize class org.xerial.snappy.Snappy
m1 no native library is found for os.name=mac and os.arch=aarch64



Upgrading snappy-java to the latest version fixes this: newer releases include a native library for the M1 (Mac/aarch64) platform.
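Before upgrading, it can help to confirm which platform the JVM actually reports, for example whether it is a native arm64 JDK rather than an x86_64 JDK. As far as I know, snappy-java derives the os.name=Mac and os.arch=aarch64 values in the error from these JVM system properties (after normalizing them); the class name below is hypothetical.

```java
public class NativeArchCheck {

    // Returns the raw JVM platform properties, e.g. "Mac OS X/aarch64" on an
    // Apple M1 running a native arm64 JDK.
    static String platform() {
        return System.getProperty("os.name") + "/" + System.getProperty("os.arch");
    }

    public static void main(String[] args) {
        System.out.println(platform());
    }
}
```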

Errors When Integrating Spring with HBase [How to Solve]

Problem 1
Replace the jar package with spring-data-hadoop-1.0.0.RELEASE version
Problem 2
Introduce hadoop-client-3.1.3.jar and hadoop-common-3.1.3.jar
Problem 3
java.lang.NoClassDefFoundError: org/apache/commons/configuration2/Configuration
Introduce commons-configuration2-2.3.jar
Problem 4
java.lang.NoClassDefFoundError: org/apache/hadoop/util/PlatformName
Introduce hadoop-auth-3.1.3.jar
Problem 5
java.lang.NoClassDefFoundError: org/apache/hadoop/mapred/JobConf
Introduce hadoop-mapreduce-client-common-3.1.3.jar, hadoop-mapreduce-client-core-3.1.3.jar and
Problem 6
java.lang.NoClassDefFoundError: com/ctc/wstx/io/SystemId
Introduce woodstox-core-5.0.3.jar
Problem 7
java.lang.NoClassDefFoundError: com/google/common/collect/Interners
Introduce guava-30.1.1-jre.jar
Problem 8
java.lang.NoSuchMethodError: com.google.common.collect.MapMaker.keyEquivalence(Lcom/google/common/base/Equivalence;)Lcom/google/common/collect/MapMaker;
Remove google-collect-1.0.jar, which conflicts with Guava
Problem 9
java.lang.NoClassDefFoundError: com/fasterxml/jackson/core/JsonGenerator
Introduce jackson-annotations-2.12.4.jar, jackson-core-2.12.4.jar and jackson-databind-2.12.4.jar
Problem 10
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/HBaseConfiguration
Introduce hbase-common-2.2.4.jar
Problem 11
java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/HTableInterface
After searching for a long time, I found that the configuration file contains this bean definition:
<bean id="htemplate" class="org.springframework.data.hadoop.hbase.HbaseTemplate">
<property name="configuration" ref="hbaseConfiguration"/>
</bean>
HbaseTemplate relies on HTableInterface, which no longer exists in HBase 2.x, so comment this bean out.
Summary
Most of these problems come from missing jar packages; integrating Spring with HBase requires 15 jars in total.
Some of these packages are also required when integrating HDFS.
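Since most of the problems above are missing classes, they can be found in one pass instead of one error at a time. A minimal sketch, assuming the class list below (taken from the NoClassDefFoundError messages above) is what your setup needs; the class and method names are hypothetical. It only detects missing classes, not version conflicts like Problem 8.

```java
import java.util.ArrayList;
import java.util.List;

public class ClasspathCheck {

    // Classes reported missing in the problems above; extend as needed.
    static final String[] REQUIRED = {
        "org.apache.commons.configuration2.Configuration",
        "org.apache.hadoop.util.PlatformName",
        "org.apache.hadoop.mapred.JobConf",
        "com.ctc.wstx.io.SystemId",
        "com.google.common.collect.Interners",
        "com.fasterxml.jackson.core.JsonGenerator",
        "org.apache.hadoop.hbase.HBaseConfiguration"
    };

    // Returns the class names that cannot be loaded from the current classpath.
    static List<String> missing(String... classNames) {
        List<String> result = new ArrayList<>();
        for (String name : classNames) {
            try {
                // initialize=false: load the class without running its static initializers
                Class.forName(name, false, ClasspathCheck.class.getClassLoader());
            } catch (ClassNotFoundException | LinkageError e) {
                // Not found, or found but one of its own dependencies is missing
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String name : missing(REQUIRED)) {
            System.out.println("Missing: " + name);
        }
    }
}
```

Running it with the same classpath as the Spring application lists every jar that still has to be added.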

[Solved] Error: java.io.EOFException: Premature EOF from inputStream


1. Problem

1. What happened

The log parsing task had always been very stable, but it suddenly failed with an error. How could that happen? My heart sank.

2. Error details

The log shows the following error:

21/11/18 14:36:29 INFO mapreduce.Job: Task Id : attempt_1628497295151_1290365_m_000002_2, Status : FAILED
Error: java.io.EOFException: Premature EOF from inputStream
	at com.hadoop.compression.lzo.LzopInputStream.readFully(LzopInputStream.java:75)
	at com.hadoop.compression.lzo.LzopInputStream.readHeader(LzopInputStream.java:114)
	at com.hadoop.compression.lzo.LzopInputStream.<init>(LzopInputStream.java:54)
	at com.hadoop.compression.lzo.LzopCodec.createInputStream(LzopCodec.java:83)
	at com.hadoop.mapreduce.LzoSplitRecordReader.initialize(LzoSplitRecordReader.java:58)
	at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:548)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:786)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341)
	at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1907)
	at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)

Searching for the error mostly points to the upper limit set by the dfs.datanode.max.transfer.threads parameter (the maximum number of threads a DataNode uses to transfer data).

However, the cluster configuration shows that this parameter has already been raised to 8192, so the cause must be something else.

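Eventually I found an empty LZO file among the log files. After deleting it, the task was rerun and succeeded. This matches the stack trace: the failure happens in LzopInputStream.readHeader, which cannot read a header from an empty file.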

2. Solution

To keep this from happening again, write a script that deletes empty LZO files before the parsing task runs.

1. List the files under the given path

# sed drops the "Found N items" header and squeezes spaces; field 8 of each ls line is the path
for file in `hdfs dfs -ls /xxx/xxx/2037-11-05/pageview | sed '1d;s/  */ /g' | cut -d\  -f8`;
do
	echo $file;
done


2. Check whether each file is empty

for file in `hdfs dfs -ls /xxx/xxx/2037-11-05/pageview | sed '1d;s/  */ /g' | cut -d\  -f8`;
do
	echo $file;
	# Field 3 of `hdfs dfs -count` is CONTENT_SIZE in bytes: 0 means the file is empty
	lzoIsEmpty=$(hdfs dfs -count $file | awk '{print $3}')
	echo $lzoIsEmpty;
	if [[ $lzoIsEmpty -eq 0 ]];then
		# is empty, delete the file
		hdfs dfs -rm $file;
	else
		echo "Loading data"
	fi
done

3. Final script

# do_date (the date partition to process) is assumed to be set earlier in the full script
for type in webclick error pageview exposure login
do
    # Field 2 of `hdfs dfs -count` is FILE_COUNT: skip directories that contain no files
    isEmpty=$(hdfs dfs -count /xxx/xxx/$do_date/$type | awk '{print $2}')
    if [[ $isEmpty -eq 0 ]];then
        echo "------ Given Path:/xxx/xxx/$do_date/$type is empty"
    else
        for file in `hdfs dfs -ls /xxx/xxx/$do_date/$type | sed '1d;s/  */ /g' | cut -d\  -f8`;
        do
            echo $file;
            lzoIsEmpty=$(hdfs dfs -count $file | awk '{print $3}')
            echo $lzoIsEmpty;
            if [[ $lzoIsEmpty -eq 0 ]];then
                echo Delete Files: $file
                hdfs dfs -rm $file;
            fi
        done
        echo ================== Import log data of type $do_date $type into ods layer ==================
        # ... Handling log parsing logic
    fi
done
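The script above works on HDFS. For illustration, the same empty-file check can be sketched in Java against a local directory (for example, to screen logs before they are uploaded). This is a sketch under assumptions: it uses java.nio.file rather than the HDFS FileSystem API, only looks at files ending in ".lzo", and only reports them; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class EmptyLzoFinder {

    // Lists regular files directly under dir whose name ends in ".lzo" and whose
    // size is 0 bytes: the same condition the script checks (CONTENT_SIZE == 0).
    static List<Path> findEmptyLzoFiles(Path dir) throws IOException {
        try (Stream<Path> entries = Files.list(dir)) {
            return entries
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().endsWith(".lzo"))
                .filter(p -> {
                    try {
                        return Files.size(p) == 0;
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                })
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : findEmptyLzoFiles(dir)) {
            // Report only; call Files.delete(p) once the list looks right.
            System.out.println("Empty LZO file: " + p);
        }
    }
}
```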

Hadoop ERROR: Attempting to operate on hdfs namenode as root ERROR: but there is no HDFS_NAMENODE_US

In the sbin directory of the Hadoop installation, modify start-dfs.sh, stop-dfs.sh, start-yarn.sh and stop-yarn.sh. Add the new content at the top of each file, right below the #!/usr/bin/env bash line.

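Content to add to start-dfs.sh and stop-dfs.sh: define the user each HDFS daemon runs as, for example HDFS_NAMENODE_USER=root, HDFS_DATANODE_USER=root and HDFS_SECONDARYNAMENODE_USER=root. (On Hadoop 3.2 and above, HADOOP_SECURE_DN_USER has been replaced by HDFS_DATANODE_SECURE_USER.)

Content to add to start-yarn.sh and stop-yarn.sh: likewise, for example YARN_RESOURCEMANAGER_USER=root and YARN_NODEMANAGER_USER=root.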
[Solved] Beeline Error: Could not open client transport with JDBC Uri (Connection Refused)


Error: Could not open client transport with JDBC Uri: jdbc:hive2://node01:10000: java.net.ConnectException: Connection refused (Connection refused) (state=08S01,code=0)



Connecting with beeline from the command line fails:

[root@node03 ~]# beeline -u jdbc:hive2://node01:10000 -n root

Checking port 10000 shows that nothing is listening on it yet:

[root@node01 ~]# netstat -anp|grep 10000

HiveServer2 takes a while to start, so you need to wait. In my case, it only accepted connections after it had printed four Hive Session IDs.

That explained why the teacher had said to wait a while before connecting with beeline.

After that, the connection succeeded. So when an error comes up, don't panic; deal with it calmly.
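Instead of retrying beeline by hand, a client can poll the port until HiveServer2 is listening. A minimal sketch, assuming a plain TCP connect is a good enough readiness signal (it says the port is open, not that HiveServer2 is fully initialized); the class name, method name, timeout and polling interval are hypothetical choices.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortWaiter {

    // Polls host:port until a TCP connection succeeds or timeoutMillis elapses.
    // Returns true once something is listening on the port.
    static boolean waitForPort(String host, int port, long timeoutMillis, long intervalMillis)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (true) {
            try (Socket socket = new Socket()) {
                socket.connect(new InetSocketAddress(host, port), (int) intervalMillis);
                return true;
            } catch (IOException e) {
                // Connection refused, timed out or host unknown: not up yet.
            }
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            Thread.sleep(intervalMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        // node01:10000 is the HiveServer2 address used in the beeline command above.
        boolean up = waitForPort("node01", 10000, 5 * 60 * 1000, 2000);
        System.out.println(up ? "Port 10000 is open; try beeline now"
                              : "Port 10000 did not open in time");
    }
}
```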

Error GENERIC_INTERNAL_ERROR (65536) Processing [How to Solve]

When Presto queries data from Apache Druid and inserts it into Kudu, it occasionally fails with GENERIC_INTERNAL_ERROR (65536):
java.lang.IllegalArgumentException: Unknown field druid.druid.<data source name>.<field name>: varchar
at com.google.common.base.Preconditions.checkArgument(Preconditions.java:216)

The Druid log shows the corresponding error:
Unknown exception (org.apache.calcite.tools.ValidationException): org.apache.calcite.runtime.CalciteContextException: From line 2, column 8 to line 2, column 12: Column '<field name>' not found in any table
In addition, Superset occasionally reports errors when fetching a data preview of the same Druid data source.

Problem analysis:
1. The error should come from Druid. Because the same operation fails only occasionally, a Druid Broker problem is suspected.
2. Querying the column count of the same data source through different Brokers returns inconsistent results (SQL: select count(*) columnCount from INFORMATION_SCHEMA.COLUMNS where TABLE_NAME = 'data source name'). This is probably because, during a rolling restart, a Broker came back up before the Historicals were serving all segments (after a Historical restarts, it loads its segments gradually rather than all at once). Brokers build schema information such as dimensions and metrics from the segments available to them, so Brokers restarted at different times obtained different schemas.
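The check in step 2 boils down to comparing what each Broker reports. A minimal sketch, assuming you have already fetched the column names for the data source from each Broker (for example with the INFORMATION_SCHEMA.COLUMNS query above, run against each Broker); the class name, method name and column names are hypothetical.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

public class BrokerSchemaDiff {

    // Returns the columns that only one of the two Brokers knows about.
    // An empty result means both Brokers see the same set of columns.
    static Set<String> columnDiff(Set<String> brokerA, Set<String> brokerB) {
        Set<String> diff = new TreeSet<>(brokerA);
        diff.addAll(brokerB);                 // union of both column sets
        Set<String> common = new TreeSet<>(brokerA);
        common.retainAll(brokerB);            // intersection
        diff.removeAll(common);               // symmetric difference
        return diff;
    }

    public static void main(String[] args) {
        // Made-up column lists for illustration only.
        Set<String> a = new TreeSet<>(Arrays.asList("__time", "city", "pv"));
        Set<String> b = new TreeSet<>(Arrays.asList("__time", "city"));
        System.out.println(columnDiff(a, b)); // prints [pv]
    }
}
```

Comparing the actual names rather than just the counts also shows which column a lagging Broker is missing.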

Solution: restart the Brokers after the Historicals have finished loading their segments, so that the schema information (dimensions, metrics, etc.) each Broker builds from the segments is consistent. When restarting a Broker in future, first make sure all segments in the cluster are being served normally.

Summary: the error is caused by two schema lookups returning inconsistent results. Such inconsistency can also come from adding or dropping columns, or from changing column names or data types. Resolving the schema inconsistency resolves the error.

ES Error: Alternatively, set fielddata=true on [How to Solve]

Elasticsearch reports an error containing "Alternatively, set fielddata=true on" when running this aggregation query:

GET /megacorp/employee/_search
{
  "aggs": {
    "all_interests": {
      "terms": { "field": "interests" }
    }
  }
}

Run the following request to enable fielddata on the field. Fielddata is held in JVM heap memory, so enabling it on a large text field can be expensive.

PUT /your_index_name/_mapping?pretty
{
  "properties": {
    "your_field_name": {
      "type": "text",
      "fielddata": true
    }
  }
}

NameNode startup error: OutOfMemoryError: Java heap space

1. The problem

Symptom: after the Hadoop cluster was restarted, the NameNode failed to start with an error.

2. Analysis

An "OutOfMemoryError: Java heap space" error points to the JVM parameters, so check the hadoop-env.sh configuration file. Its settings are as follows:

export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS" 

No heap size is set in this parameter.

In that case, every HDFS role (NameNode, SecondaryNameNode, DataNode) uses the default heap size of 1000 MB, which was not enough for this NameNode.

3. Problem solving

After changing the parameters as follows and restarting the cluster, it started successfully:

export HADOOP_NAMENODE_OPTS="-Xms4096m -Xmx4096m -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS"
export HADOOP_SECONDARYNAMENODE_OPTS="-Xms4096m -Xmx4096m -Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS"
export HADOOP_DATANODE_OPTS="-Xms2048M -Xmx2048M -Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS"

Parameter description:

-Xmx4096m: maximum heap size

-Xms4096m: initial heap size
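To check that an -Xmx value actually reached a JVM, the JVM can report its own heap limit. This standalone sketch (hypothetical class name) only shows the mechanism; for a running NameNode you would instead inspect its command-line flags, for example with jps -v.

```java
public class HeapCheck {

    // Maximum heap this JVM may use, in MiB. Started with -Xmx4096m it should
    // report a value close to 4096 (often slightly below the flag).
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

For example, java -Xmx4096m HeapCheck should print a figure near 4096; if it prints something near the old default instead, the option was not picked up.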

Reference: "HDFS memory configuration", blog post by "Flowers not fully open, moon not yet round" on cnblogs