Tag Archives: Spark

[Solved] Spark Error: org.apache.spark.SparkException: A master URL must be set in your configuration

Error when running the project to connect to Spark:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/10/08 21:02:10 INFO SparkContext: Running Spark version 3.0.0
22/10/08 21:02:10 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: A master URL must be set in your configuration
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:380)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:120)
	at test.wyh.wordcount.TestWordCount$.main(TestWordCount.scala:10)
	at test.wyh.wordcount.TestWordCount.main(TestWordCount.scala)
22/10/08 21:02:10 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: A master URL must be set in your configuration
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:380)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:120)
	at test.wyh.wordcount.TestWordCount$.main(TestWordCount.scala:10)
	at test.wyh.wordcount.TestWordCount.main(TestWordCount.scala)

Process finished with exit code 1

Solution:

Configure the following parameters:

-Dspark.master=local[*]

Restart IDEA.

[Solved] Spark job failed during runtime. Please check stacktrace for the root cause.

spark SQL Export Data to Kafka error [How to Solve]

Failed to find data source: kafka. Please deploy the application as per the deployment section of "Structured Streaming + Kafka Integration Guide"

The reason for this error is the lack of spark-sql-kafka-0-10_2.11-2.4.5.jar dependency

Download the jar package, put it on the server, and add it to the submission command

–jars spark-sql-kafka-0-10_2.11-2.4.5.jar

Error is still reported, and error is reported at this time

ommandExec.sideEffectResult(commands.scala:69)
	at org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:87)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:177)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:173)
	at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:201)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:198)
	at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:173)
	at org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:93)
	at org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:91)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:727)
	at org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:727)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1$$anonfun$apply$1.apply(SQLExecution.scala:95)
	at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:144)
	at org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:86)
	at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:789)
	at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:63)
	at org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:727)
	at org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:313)
	at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:288)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:694)
Caused by: java.lang.ClassNotFoundException: org.apache.kafka.common.serialization.ByteArrayDeserializer
	at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:419)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:352)
	... 45 more

Check the spark directory. There is no kafka-clients jar package

Just add the Kafka-clients dependency package to the submit command

spark-submit --master yarn --deploy-mode cluster --jars spark-sql-kafka-0-10_2.11-2.4.5.jar,kafka-clients-2.0.0.jar

Resubmit and solve the problem

[Solved] .java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a runn

. java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a runn

Running Spark Program on the server reports an error
java.lang.IllegalArgumentException: requirement failed: Can only call getServletHandlers on a running MetricsSystem

Solution:

After checking the spark Master and Worker services did not start, and then submit it to run normally after starting.

[Solved] Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.

Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.

This error occurs when executing sql with a collection in the hive on spark engine:
[42000][3] Error while processing statement: FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Spark job failed during runtime. Please check stacktrace for the root cause.

Solution 1: Switching engine using mr
set hive.execution.engine=mr;

Solution 2:
set mapred.map.tasks.speculative.execution=true
set mapred.reduce.tasks.speculative.execution=true

[Solved] YarnClientSchedulerBackend: Yarn application has already exited with state FAILED

In starting spark shell –master yarn, we will find an error when spark shell is started

YarnClientSchedulerBackend: Yarn application has already exited with state FAILED

At this point we visit the yarn process to see the history of the start-up time error exception: ERRORorg.apache.hadoop.hdfs.server.namenode.NameNode: RECEIVED SIGNAL 15: SIGTERM (as shown), the general access to the port number is http://Localhost_name+8088 (default)

Solution:

This problem often occurs when the jdk version is 1.8. You can directly modify the configuration of yarn-site.xml in hadoop and distribute it to each cluster, and restart the cluster.

 <property>
    <name>yarn.nodemanager.vmem-pmem-ratio</name>
    <value>10</value>
</property>
<property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
</property>

[Solved] error: value createSchemaRDD is not a member of org.apache.spark.sql.SQLContext

<console>:23: error: value createSchemaRDD is not a member of org.apache.spark.sql.SQLContext

After spark 1.3, spark SQL cancels createSchemaRDD and replaces it with implicits.

import sqlContext.implicits._

[Solved] ERROR SparkContext: Error initializing SparkContext. org.apache.spark.SparkException: Could not pars

ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Could not parse Master URL: ‘<pyspark.conf.SparkConf object at 0x7fb5da40a850>’
Error Messages:

22/03/14 10:58:26 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/03/14 10:58:26 INFO SparkContext: Running Spark version 3.0.0
22/03/14 10:58:26 INFO ResourceUtils: ==============================================================
22/03/14 10:58:26 INFO ResourceUtils: Resources for spark.driver:

22/03/14 10:58:26 INFO ResourceUtils: ==============================================================
22/03/14 10:58:26 INFO SparkContext: Submitted application: Spark01_FindWord.py
22/03/14 10:58:26 INFO SecurityManager: Changing view acls to: root
22/03/14 10:58:26 INFO SecurityManager: Changing modify acls to: root
22/03/14 10:58:26 INFO SecurityManager: Changing view acls groups to: 
22/03/14 10:58:26 INFO SecurityManager: Changing modify acls groups to: 
22/03/14 10:58:26 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
22/03/14 10:58:27 INFO Utils: Successfully started service 'sparkDriver' on port 44307.
22/03/14 10:58:27 INFO SparkEnv: Registering MapOutputTracker
22/03/14 10:58:27 INFO SparkEnv: Registering BlockManagerMaster
22/03/14 10:58:27 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/03/14 10:58:27 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/03/14 10:58:27 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
22/03/14 10:58:27 INFO DiskBlockManager: Created local directory at /tmp/blockmgr-a4554f81-46f5-4499-b6fd-2d5a9005552b
22/03/14 10:58:27 INFO MemoryStore: MemoryStore started with capacity 366.3 MiB
22/03/14 10:58:27 INFO SparkEnv: Registering OutputCommitCoordinator
22/03/14 10:58:27 WARN Utils: Service 'SparkUI' could not bind on port 4040. Attempting port 4041.
22/03/14 10:58:27 WARN Utils: Service 'SparkUI' could not bind on port 4041. Attempting port 4042.
22/03/14 10:58:27 WARN Utils: Service 'SparkUI' could not bind on port 4042. Attempting port 4043.
22/03/14 10:58:27 WARN Utils: Service 'SparkUI' could not bind on port 4043. Attempting port 4044.
22/03/14 10:58:27 WARN Utils: Service 'SparkUI' could not bind on port 4044. Attempting port 4045.
22/03/14 10:58:27 INFO Utils: Successfully started service 'SparkUI' on port 4045.
22/03/14 10:58:27 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://node01:4045
22/03/14 10:58:27 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: Could not parse Master URL: '<pyspark.conf.SparkConf object at 0x7fb5da40a850>'
	at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2924)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:528)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
22/03/14 10:58:27 INFO SparkUI: Stopped Spark web UI at http://node01:4045
22/03/14 10:58:27 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/03/14 10:58:27 INFO MemoryStore: MemoryStore cleared
22/03/14 10:58:27 INFO BlockManager: BlockManager stopped
22/03/14 10:58:28 INFO BlockManagerMaster: BlockManagerMaster stopped
22/03/14 10:58:28 WARN MetricsSystem: Stopping a MetricsSystem that is not running
22/03/14 10:58:28 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/03/14 10:58:28 INFO SparkContext: Successfully stopped SparkContext
Traceback (most recent call last):
  File "/ext/servers/spark3.0/MyCode/Spark01_FindWord.py", line 4, in <module>
    sc = SparkContext(conf)
  File "/ext/servers/spark3.0/python/lib/pyspark.zip/pyspark/context.py", line 131, in __init__
  File "/ext/servers/spark3.0/python/lib/pyspark.zip/pyspark/context.py", line 193, in _do_init
  File "/ext/servers/spark3.0/python/lib/pyspark.zip/pyspark/context.py", line 310, in _initialize_context
  File "/ext/servers/spark3.0/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1569, in __call__
  File "/ext/servers/spark3.0/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 328, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: org.apache.spark.SparkException: Could not parse Master URL: '<pyspark.conf.SparkConf object at 0x7fb5da40a850>'
	at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2924)
	at org.apache.spark.SparkContext.<init>(SparkContext.scala:528)
	at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:238)
	at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
	at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)

22/03/14 10:58:28 INFO ShutdownHookManager: Shutdown hook called
22/03/14 10:58:28 INFO ShutdownHookManager: Deleting directory /tmp/spark-32987338-486b-448a-b9f4-3cf4d68aa85d
22/03/14 10:58:28 INFO ShutdownHookManager: Deleting directory /tmp/spark-2160f024-2430-4ab1-b5ed-5c3afc402351
(Spark_python) root@node01:/ext/servers/spark3.0/MyCode# spark-submit Spark01_FindWord.py

Modification method

error code

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("MyApp")
sc = SparkContext(conf)
hdfs_file = "hdfs://node01:/user/hadoop/test06.txt"
hdfs_rdd = sc.textFile(hdfs_file).count()
print(hdfs_rdd)

Modified code

from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("MyApp")
sc = SparkContext(conf=conf)
hdfs_file = "hdfs://node01:/user/hadoop/test06.txt"
hdfs_rdd = sc.textFile(hdfs_file).count()
print(hdfs_rdd)

Summary

The first is the parameter transfer of the sparkcontext method. If we don’t specify it, you can’t write conf directly for default parameter transfer, so an error will be reported later, so we need to specify the parameters

[Solved] spark sql Error: Can‘t zip RDDs with unequal numbers of partitions

Caused by: java.lang.IllegalArgumentException: Can't zip RDDs with unequal numbers of partitions: List(1, 200)

Solution:
Close AQE: set spark.sql.adaptive.enabled = false;

[Solved] Hive On Spark Error: Remote Spark Driver – HiveServer2 connection has been closed

Error Messages:

Failed to monitor Job[-1] with exception ‘java.lang.IllegalStateException(Connection to remote Spark driver was lost)’ Last known state = SENT
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. Unable to send message SyncJobRequest{job=org.apache.hadoop.hive.ql.exec.spark.status.impl.RemoteSparkJobStatus$GetAppIDJob@7805478c} because the Remote Spark Driver - HiveServer2 connection has been closed.

The real cause of this problem requires going to Yarn and looking at the Application’s detailed logs at

It turns out that the executor-memory is too small and needs to be modified in the hive configuration page

Save the changes and restart the relevant components, the problem is solved.

[Solved] IDEA: Internal error (java.lang.UnsupportedClassVersionError)

After creating a spark project and testing a “hello word”, I encountered the following error：

Internal error (java.lang.UnsupportedClassVersionError): org/intellij/erlang/jps/model/JpsErlangModelSerializerExtension has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0
java.lang.UnsupportedClassVersionError: org/intellij/erlang/jps/model/JpsErlangModelSerializerExtension has been compiled by a more recent version of the Java Runtime (class file version 55.0), this version of the Java Runtime only recognizes class file versions up to 52.0

Solution: Close the Erlang plugin and restart IDEA
Steps: Intel iJ IDEA -> Preferences -> Pluglinis -> (search Erlang) Remove the checkbox

Reference: https://youtrack.jetbrains.com/issue/IDEA-256900

IDEA: How to Solve spark source code Modified Error

Scalastyle examines your Scala code and indicates potential problems with it. Place scalastyle_config.xml in the <root>/.idea or <root>/project directory.

Download scalastyle_config.xml file from scalastyle website to spark code root directory, replace the original xml file and restart IDEA.

ProgrammerAH

Programmer Guide, Tips and Tutorial