Tag Archives: Big data

[Solved] ERROR PythonRunner: Python worker exited unexpectedly (crashed)

Some time ago, I received a private letter from my fans and reported an error when running in pychart. Error Python runner: Python worker exited unexpectedly (crashed)

The test run print (input_rdd. First()) can be printed, but the print (input_rdd. Count()) trigger function will report an error

print(input_rdd.count())

Error Python runner: Python worker exited unexpectedly (crashed) means Python worker exited unexpectedly (crashed)

21/10/24 10:24:48 ERROR PythonRunner: Python worker exited unexpectedly (crashed)
java.net.SocketException: Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
	at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
	at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:476)
	at org.apache.spark.api.python.PythonRDD$.write$1(PythonRDD.scala:297)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1$adapted(PythonRDD.scala:307)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:621)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:397)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:232)
21/10/24 10:24:48 ERROR PythonRunner: This may have been caused by a prior exception:
java.net.SocketException: Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
	at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
	at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:476)
	at org.apache.spark.api.python.PythonRDD$.write$1(PythonRDD.scala:297)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1$adapted(PythonRDD.scala:307)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:621)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:397)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:232)
21/10/24 10:24:48 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.net.SocketException: Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
	at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
	at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:476)
	at org.apache.spark.api.python.PythonRDD$.write$1(PythonRDD.scala:297)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1$adapted(PythonRDD.scala:307)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:621)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:397)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:232)
21/10/24 10:24:48 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (LAPTOP-RK2V2UMB executor driver): java.net.SocketException: Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
	at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
	at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:476)
	at org.apache.spark.api.python.PythonRDD$.write$1(PythonRDD.scala:297)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1$adapted(PythonRDD.scala:307)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:621)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:397)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:232)

21/10/24 10:24:48 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
Traceback (most recent call last):
  File "D:/Code/pycode/exercise/pyspark-study/pyspark-learning/pyspark-day04/main/01_web_analysis.py", line 28, in <module>
    print(input_rdd.first())
  File "D:\opt\Anaconda3-2020.11\lib\site-packages\pyspark\rdd.py", line 1586, in first
    rs = self.take(1)
  File "D:\opt\Anaconda3-2020.11\lib\site-packages\pyspark\rdd.py", line 1566, in take
    res = self.context.runJob(self, takeUpToNumLeft, p)
  File "D:\opt\Anaconda3-2020.11\lib\site-packages\pyspark\context.py", line 1233, in runJob
    sock_info = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, partitions)
  File "D:\opt\Anaconda3-2020.11\lib\site-packages\py4j\java_gateway.py", line 1304, in __call__
    return_value = get_return_value(
  File "D:\opt\Anaconda3-2020.11\lib\site-packages\py4j\protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (LAPTOP-RK2V2UMB executor driver): java.net.SocketException: Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
	at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
	at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:476)
	at org.apache.spark.api.python.PythonRDD$.write$1(PythonRDD.scala:297)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1$adapted(PythonRDD.scala:307)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:621)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:397)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:232)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2258)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2207)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2206)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2206)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1079)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1079)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1079)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2445)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2387)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2376)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:868)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2196)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2217)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2236)
	at org.apache.spark.api.python.PythonRDD$.runJob(PythonRDD.scala:166)
	at org.apache.spark.api.python.PythonRDD.runJob(PythonRDD.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
	at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
	at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:476)
	at org.apache.spark.api.python.PythonRDD$.write$1(PythonRDD.scala:297)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1$adapted(PythonRDD.scala:307)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:621)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:397)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:232)


Process finished with exit code 1

For the solution to this problem, Xiaobian inquired online. This problem may be caused by many situations. For the current situation that Xiaobian helps solve, spark running locally on Windows system is a software problem. The amount of data is a little large, and errors may be reported when running on pycharm.

Without much nonsense, let’s talk about the solution to the problem of fans. It’s very simple. After pycharm is closed, open it again and run it again. Note that if not, shut down again and run again.

hdfs-bug:DataXceiver error processing WRITE_BLOCK operation

The error message and screenshot are as follows:

calculation112.aggrx:50010:DataXceiver error processing WRITE_BLOCK operation  src: /10.1.1.116:36274 dst: /10.1.1.112:50010
java.io.IOException: Premature EOF from inputStream
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:203)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:501)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:901)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:808)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
    at java.lang.Thread.run(Thread.java:748)
  ......

Reason: the file operation exceeds the lease term, that is, the file is deleted during the data stream operation

Scheme:

Step 1: modify the maximum number of files opened in the process

cat /etc/security/limits.conf  | grep -v ^#


*        soft    nofile        1000000
*         hard    nofile        1048576
*        soft    nproc        65536
*        hard    nproc        unlimited
*        soft    memlock        unlimited
*        hard    memlock        unlimited
*         -      nofile          1000000

Step 2: (modify the number of data transmission threads)

Done.

 

org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph

An error is reported when the Flink SQL client submits the job:

2021-10-21 15:23:54,232 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2021-10-21 15:23:54,233 WARN  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set.The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
2021-10-21 15:23:54,291 INFO  org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider [] - Failing over to rm2
2021-10-21 15:23:54,337 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Found Web Interface n101:36989 of application 'application_1634635118307_0001'.
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.

Failing over to rm2

Only active namenode can be submitted successfully. My active namenode is RM2

Clickhouse error: XXXX.XXXX_local20211009 (8fdb18e9-bb4c-42d8-8fdb-18e9bb4c02d8): auto…

Code: 49, e.displayText() = DB::Exception: Part 20211009_67706_67706_0 is covered by 20211009_67118_67714_12 but should be merged into 20211009_67706_67715_1. This shouldn’t happen often., Stack trace (when copying this message, always include the lines below):
Error Messages:
XXXX.XXXX_local20211009 (8fdb18e9-bb4c-42d8-8fdb-18e9bb4c02d8): auto DB::StorageReplicatedMergeTree::processQueueEntry(ReplicatedMergeTreeQueue::SelectedEntryPtr)::(anonymous class)::operator()(DB::StorageReplicatedMergeTree::LogEntryPtr &) const: Code: 49, e.displayText() = DB::Exception: Part 20211009_67706_67706_0 is covered by 20211009_67118_67714_12 but should be merged into 20211009_67706_67715_1. This shouldn’t happen often., Stack trace (when copying this message, always include the lines below):
Solution 1.
1. Try to delete the local table and the distributed table XXXX.XXXX_local20211009

CDH operation and maintenance: child node cloudera SCM agent starts Error

1.Startup Error:
./cloudera-scm-agent start
Error Screenshot:

2.Delete or backup to other corresponding pid files according to the error report
find/-name cloudera-scm-agent.pid
mv cloudera-scm-agent.pid cloudera-scm-agent.pid20211019
or rm -rf cloudera-scm-agent.pid
3.restart cloudera-scm-agent
./cloudera-scm-agent start
Done!

[Solved] Android Room: Database Common Error ‘missing database’

Common error 1:

D:\AndroidProjectsDemo\JetpeckTest\app\build\tmp\kapt3\stubs\debug\com\example\jetpecktest\room\BookDao.
java:15: error: There is a problem with the query: [SQLITE_ERROR] 
SQL error or missing database (no such table: BookEntity)
public abstract java.util.List<com.example.jetpecktest.room.BookEntity> loadAllBooks();

**Solution: * * the error report mentions no such table: bookentity , so first check whether your class is added to the database, that is, check the entities =?In the
annotation in your database class
did you add your entity class.

@Database(version = 2, entities = [BookEntity::class])
abstract class DataBase:RoomDatabase() {

How to Solve Azkaban startup error

Environment.
One virtual machine (web and exector both on one machine)
MySQL 8.x (most of the later problems are due to him)
Hive-3.1.2
Azkaban-exec-server-3.84.4
Azkaban-web-server-3.84.4
1. SLF4J problems:
ERROR [StdOutErrRedirect] [Azkaban] SLF4J: Class path contains multiple SLF4J bindings.

2021/09/30 10:02:31.820 +0800 ERROR [StdOutErrRedirect] [Azkaban] SLF4J: Class path contains multiple SLF4J bindings.
2021/09/30 10:02:31.820 +0800 ERROR [StdOutErrRedirect] [Azkaban] SLF4J: Found binding in [jar:file:/opt/module/azkaban/azkaban-exec/lib/slf4j-log4j12-1.7.21.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2021/09/30 10:02:31.820 +0800 ERROR [StdOutErrRedirect] [Azkaban] SLF4J: Found binding in [jar:file:/data/soft/apache-hive-3.1.2-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
2021/09/30 10:02:31.820 +0800 ERROR [StdOutErrRedirect] [Azkaban] SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
2021/09/30 10:02:31.823 +0800 ERROR [StdOutErrRedirect] [Azkaban] SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]

Cause: conflict with log4j package in hive
solution: delete Azkaban exec/lib/slf4j-log4j12-1.7.21.jar or add. Bak suffix
command:

cd /azkaban-exec/lib
mv slf4j-log4j12-1.7.21.jar slf4j-log4j12-1.7.21.jar.bak

2. DB connection problems(MySQL 8)
ERROR [MySQLDataSource] [Azkaban] Failed to find write-enabled DB connection.

2021/09/30 10:23:33.973 +0800 ERROR [MySQLDataSource] [Azkaban] Failed to find write-enabled DB connection. Wait 15 seconds and retry. No.Attempt = 1
java.sql.SQLException: Cannot create PoolableConnectionFactory (java.lang.ClassCastException: java.math.BigInteger cannot be cast to java.lang.Long)
        at org.apache.commons.dbcp2.BasicDataSource.createPoolableConnectionFactory(BasicDataSource.java:2294)

Reason: there is a problem with the MySQL driver package. The MySQL driver package provided by Azkaban cannot connect to MySQL 8. Just change to mysql-connector-java-5.1.47, and then a usessl error will be reported. This can be changed to MySQL. Database = Azkaban?Usessl = false in azkaban.properties. (both web and exector will report this error. Just modify it in turn)

3.
ERROR [PluginCheckerAndActionsLoader] [Azkaban] plugin path plugins/triggers doesn’t exist!
ERROR [ExecutorManager] [Azkaban] No active executors found

2021/09/30 10:47:52.799 +0800 ERROR [PluginCheckerAndActionsLoader] [Azkaban] plugin path plugins/triggers doesn't exist!
2021/09/30 10:47:52.838 +0800 INFO [AzkabanWebServer] [Azkaban] Setting timezone to Asia/Shanghai
2021/09/30 10:47:52.838 +0800 INFO [AzkabanWebServer] [Azkaban] Registering MBeans...
2021-09-30T10:47:52,869 INFO [main] azkaban.server.MBeanRegistrationManager - Bean azkaban.jmx.JmxJettyServer registered.
2021-09-30T10:47:52,871 INFO [main] azkaban.server.MBeanRegistrationManager - Bean azkaban.jmx.JmxTriggerManager registered.
2021-09-30T10:47:52,890 INFO [main] azkaban.server.MBeanRegistrationManager - Bean azkaban.jmx.JmxExecutorManager registered.
2021-09-30T10:47:52,915 INFO [main] azkaban.server.MBeanRegistrationManager - Bean org.apache.log4j.jmx.HierarchyDynamicMBean registered.
2021/09/30 10:47:52.915 +0800 INFO [AzkabanWebServer] [Azkaban] ************* loginLoggerObjName is null, make sure there is a logger with name azkaban.webapp.servlet.LoginAbstractAzkabanServlet
2021/09/30 10:47:52.916 +0800 INFO [ExecutorManager] [Azkaban] Initializing executors from database.
2021/09/30 10:47:52.941 +0800 ERROR [ExecutorManager] [Azkaban] No active executors found
2021/09/30 10:47:52.942 +0800 ERROR [StdOutErrRedirect] [Azkaban] Exception in thread "main"
2021/09/30 10:47:52.942 +0800 ERROR [StdOutErrRedirect] [Azkaban] azkaban.executor.ExecutorManagerException: No active executors found
2021/09/30 10:47:52.942 +0800 ERROR [StdOutErrRedirect] [Azkaban]       at azkaban.executor.ActiveExecutors.setupExecutors(ActiveExecutors.java:52)
2021/09/30 10:47:52.943 +0800 ERROR [StdOutErrRedirect] [Azkaban]       at azkaban.executor.ExecutorManager.setupExecutors(ExecutorManager.java:192)
2021/09/30 10:47:52.943 +0800 ERROR [StdOutErrRedirect] [Azkaban]       at azkaban.executor.ExecutorManager.initialize(ExecutorManager.java:127)
2021/09/30 10:47:52.943 +0800 ERROR [StdOutErrRedirect] [Azkaban]       at azkaban.executor.ExecutorManager.start(ExecutorManager.java:141)
2021/09/30 10:47:52.943 +0800 ERROR [StdOutErrRedirect] [Azkaban]       at azkaban.webapp.AzkabanWebServer.launch(AzkabanWebServer.java:234)
2021/09/30 10:47:52.943 +0800 ERROR [StdOutErrRedirect] [Azkaban]       at azkaban.webapp.AzkabanWebServer.main(AzkabanWebServer.java:227)

ERROR [PluginCheckerAndActionsLoader] [Azkaban] plugin path plugins/triggers doesn’t exist!
(you can refer to this.) https://blog.csdn.net/liumu243/article/details/81288884 )
this error can be ignored or run. The main problem here is that the exector is not activated

curl -G "localhsot:12321/executor?action=activate" && echo
//Appears to prove successful activation
{status:success}

Administrative Region:
https://192.168.xx.xx:8443
http://192.168.xx.xx:8081

Run hadoop fs -put Command Error: java.io.IOException: Got error, status message , ack with firstBadLink

After the Hadoop cluster is set up, the following port error occurs on one node when the Hadoop FS – put command is executed

location:org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1495)
throwable:java.io.IOException: Got error, status message , ack with firstBadLink as ip:port
at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:142)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1482)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1385)
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:554)

First, check whether the firewalls of the three nodes are closed

firewall-cmd --state

Turn off firewall command

systemctl stop firewalld.service

If this situation still occurs after the firewall is closed, it may be that the ports of ECs are not open to the public. Take Alibaba cloud server as an example (Tencent cloud opens all ports by default):

After setting, execute the command again to succeed.

How to Solve Error in importing scala word2vecmodel

import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
model save:
Link: http://spark.apache.org/docs/2.3.4/api/scala/index.html#org.apache.spark.mllib.feature.Word2VecModel

var model = Word2VecModel.load(spark.sparkContext, config.model_path)

model read:
Link: http://spark.apache.org/docs/2.3.4/api/scala/index.html#org.apache.spark.mllib.feature.Word2VecModel$

var model = Word2VecModel.load(spark.sparkContext, config.model_path)

Read Error:

Exception in thread "main" java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:312)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1337)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.RDD.take(RDD.scala:1331)
	at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1372)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.RDD.first(RDD.scala:1371)
	at org.apache.spark.mllib.util.Loader$.loadMetadata(modelSaveLoad.scala:129)
	at org.apache.spark.mllib.feature.Word2VecModel$.load(Word2Vec.scala:699)
	at job.ml.embeddingModel.graphEmbedding$.run(graphEmbedding.scala:40)
	at job.ml.embeddingModel.graphEmbedding$.main(graphEmbedding.scala:24)
	at job.ml.embeddingModel.graphEmbedding.main(graphEmbedding.scala)
	

POM file add

    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>15.0</version>
    </dependency>

Run OK again!

[Solved] java.lang.noclassdeffounderror when idea runs Flink: org/Apache/flick/API/common/executionconfig

Solution:
change provided to compile ,for example:

		<dependency>
			<groupId>org.apache.flink</groupId>
			<artifactId>flink-java</artifactId>
			<version>${flink.version}</version>
			<scope>compile</scope>
		</dependency>
		<dependency>
			<groupId>org.apache.flink</groupId>
			<artifactId>flink-streaming-java_${scala.binary.version}</artifactId>
			<version>${flink.version}</version>
			<scope>compile</scope>
		</dependency>
		<dependency>
			<groupId>org.apache.flink</groupId>
			<artifactId>flink-clients_${scala.binary.version}</artifactId>
			<version>${flink.version}</version>
			<scope>compile</scope>
		</dependency>

Click POM. XML refresh in idea to refresh