
Errors when Kettle connects to Oracle and MySQL databases

1. Error when Kettle connects to an Oracle database

When Kettle connects to an Oracle database for the first time, it reports an error like "wrong connection to database [Oracle]". This happens because Spoon lacks the JDBC jar that Oracle requires. If Oracle or an Oracle client is installed, you can copy the corresponding jar file from its installation directory to fix the problem.

Solution:

Copy all files whose names start with ojdbc (for example ojdbc5.jar and ojdbc6.jar) from the Oracle installation into Spoon's lib directory, for example D:\kettle\data-integration-8.2.0.7-719\lib\.

2. Error when Kettle connects to a MySQL database

This error is caused by the missing MySQL JDBC connector jar.

Solution:

You need the MySQL JDBC connector jar, for example mysql-connector-java-8.0.11.jar; copy it into Spoon's lib directory, for example D:\kettle\data-integration-8.2.0.7-719\lib\.
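For reference, a rough sketch of copying both drivers on Windows; the source paths are illustrative and should be replaced with wherever your driver jars actually live:

REM Copy the Oracle JDBC driver into Kettle's lib directory (source path is an example)
copy "C:\oracle\jdbc\lib\ojdbc6.jar" "D:\kettle\data-integration-8.2.0.7-719\lib"
REM Copy the MySQL Connector/J jar into the same directory (source path is an example)
copy "C:\Downloads\mysql-connector-java-8.0.11.jar" "D:\kettle\data-integration-8.2.0.7-719\lib"

Restart Spoon after copying so the new drivers are picked up.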


Cesium: error when loading online images on a billboard: "Tainted canvases may not be loaded"

Images bundled with the project load fine, but loading online images throws this error.

Solution:

image.setAttribute('crossOrigin', 'anonymous');

Reason: when the canvas is drawn, online images are subject to cross-origin (CORS) restrictions.

 

Of course, besides this front-end setting, the server hosting the images must also be configured to allow cross-origin access.
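A quick way to check the server side, assuming the URL below is a placeholder for your actual picture, is to inspect the response headers with curl and make sure an Access-Control-Allow-Origin header is returned; without it the canvas will still be tainted:

# Inspect the response headers of the online image (placeholder URL)
curl -I https://img.example.com/billboard.png
# The response should contain a header such as:
# Access-Control-Allow-Origin: *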

 

 

How to Solve "Error creating bean with name 'mappingJackson2HttpMessageConverter'" when the Elasticsearch high-level client starts

While learning Elasticsearch, I wrote code to write data into an index. When starting the project, the error "Error creating bean with name 'mappingJackson2HttpMessageConverter'" was reported.

So I looked through the jars under lib and found that Jackson's packages were there, even though no Jackson dependency is declared in the POM. Checking the related dependencies showed that they are pulled in transitively by the high-level client.
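One way to confirm which Jackson artifacts arrive transitively and from where is Maven's dependency tree; a sketch, run in the project root:

# Print the dependency tree filtered to the Jackson group IDs,
# to see which top-level dependency drags each Jackson jar in
mvn dependency:tree -Dincludes=com.fasterxml.jackson.core,com.fasterxml.jackson.dataformat,com.fasterxml.jackson.datatype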

It seems Jackson is required by the Elasticsearch high-level client. To test this conjecture, I deleted all of the Jackson jars and started the project. It did start successfully, but an error occurred when sending an index request, on the line indexRequest.source(json).

So it is indeed a required package, but I could not find the right answer online. To solve the problem, I manually declared the Jackson dependencies in the POM.

Then I deleted all Jackson jars that were not the ones I had imported, keeping only the manually declared artifacts and removing the ones pulled in automatically by the high-level client. With those removed, the project starts normally and index requests can be sent.

However, there is a fatal problem: whenever Maven is reloaded, the transitive Jackson jars come back. To prevent this, first work out which dependencies in the POM introduce them, then exclude those artifacts with <exclusions>. My final POM is shown below:


    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
            <version>2.3.12.RELEASE</version>
            <exclusions>
                <exclusion>
                    <groupId>com.fasterxml.jackson.datatype</groupId>
                    <artifactId>jackson-datatype-jdk8</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.fasterxml.jackson.datatype</groupId>
                    <artifactId>jackson-datatype-jsr310</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <dependency>
            <groupId>org.elasticsearch.client</groupId>
            <artifactId>elasticsearch-rest-high-level-client</artifactId>
            <version>7.2.1</version>
            <exclusions>
                <exclusion>
                    <groupId>com.fasterxml</groupId>
                    <artifactId>jackson-xml-databind</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.fasterxml.jackson.core</groupId>
                    <artifactId>jackson-core</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.fasterxml.jackson.dataformat</groupId>
                    <artifactId>jackson-dataformat-smile</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.fasterxml.jackson.dataformat</groupId>
                    <artifactId>jackson-dataformat-yaml</artifactId>
                </exclusion>
                <exclusion>
                    <groupId>com.fasterxml.jackson.dataformat</groupId>
                    <artifactId>jackson-dataformat-cbor</artifactId>
                </exclusion>
            </exclusions>
        </dependency>

        <!-- Jackson JSON dependencies, declared manually -->
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-databind</artifactId>
            <version>${jackson.version}</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-core</artifactId>
            <version>${jackson.version}</version>
        </dependency>
        <dependency>
            <groupId>com.fasterxml.jackson.core</groupId>
            <artifactId>jackson-annotations</artifactId>
            <version>${jackson.version}</version>
        </dependency>
    </dependencies>

/usr/bin/ssh-copy-id: ERROR [How to Solve]

While setting up Hadoop, a hostname was accidentally misconfigured,
which caused the password step to fail when configuring passwordless login.

Even after correcting the hostname, the error was still reported during configuration.

Solution:
Remove the entry for the wrong hostname from the known_hosts file in the ~/.ssh directory (deleting that line is enough), then run ssh-copy-id slave2 again and enter the password.
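A minimal sketch of those commands, assuming the target host is slave2 as above:

# Remove the stale host key for slave2 from ~/.ssh/known_hosts
ssh-keygen -R slave2
# Copy the public key again; enter the password when prompted
ssh-copy-id slave2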

Installing an additional StreamSets stage library for CDH 6.3.0 fails: REST API call error: java.io.EOFException

Versions
StreamSets Data Collector 3.16.1 (core)
CDH 6.3.2

1. Problem

The installed StreamSets package is streamsets-datacollector-core-3.16.1.tgz, and after installation an error is reported when downloading the CDH 6.3 stage library.

1. Steps

Installing the CDH 6.3.0 stage library through the StreamSets UI reports an error.

Click Show Error to see the details.

2. Full error output

java.io.EOFException
	at org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream.read(GzipCompressorInputStream.java:303)
	at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.read(TarArchiveInputStream.java:608)
	at java.io.InputStream.read(InputStream.java:101)
	at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1792)
	at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1769)
	at org.apache.commons.io.IOUtils.copy(IOUtils.java:1744)
	at com.streamsets.datacollector.restapi.StageLibraryResource.installLibraries(StageLibraryResource.java:363)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
	at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:160)
	at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
	at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
	at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
	at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
	at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
	at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
	at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
	at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
	at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
	at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
	at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:760)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1617)
	at com.streamsets.datacollector.http.GroupsInScopeFilter.lambda$doFilter$0(GroupsInScopeFilter.java:82)
	at com.streamsets.datacollector.security.GroupsInScope.execute(GroupsInScope.java:34)
	at com.streamsets.datacollector.http.GroupsInScopeFilter.doFilter(GroupsInScopeFilter.java:81)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1604)
	at com.streamsets.datacollector.restapi.rbean.rest.RestResourceContextFilter.doFilter(RestResourceContextFilter.java:42)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1604)
	at org.eclipse.jetty.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:310)
	at org.eclipse.jetty.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:264)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1604)
	at com.streamsets.datacollector.http.LocaleDetectorFilter.doFilter(LocaleDetectorFilter.java:39)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1604)
	at org.eclipse.jetty.servlets.HeaderFilter.doFilter(HeaderFilter.java:117)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1604)
	at com.streamsets.pipeline.http.MDCFilter.doFilter(MDCFilter.java:47)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1604)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
	at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:717)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:501)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1592)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1296)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1562)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1211)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:322)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:146)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:221)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.eclipse.jetty.server.Server.handle(Server.java:500)
	at com.streamsets.lib.security.http.LimitedMethodServer.handle(LimitedMethodServer.java:41)
	at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:386)
	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:562)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:378)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:270)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
	at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:388)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
	at java.lang.Thread.run(Thread.java:748)

3. Key error in sdc.log

2021-11-02 11:02:04,278 [user:admin] [pipeline:] [runner:] [thread:webserver-48] [stage:] INFO  StageLibraryResource - Installing stage library streamsets-datacollector-cdh_6_3-lib from http://archives.streamsets.com/datacollector/3.16.1/tarball/streamsets-datacollector-cdh_6_3-lib-3.16.1.tgz
2021-11-02 11:21:13,324 [user:admin] [pipeline:] [runner:] [thread:webserver-48] [stage:] ERROR ExceptionToHttpErrorProvider - REST API call error: java.io.EOFException
java.io.EOFException
        at org.apache.commons.compress.compressors.gzip.GzipCompressorInputStream.read(GzipCompressorInputStream.java:303)
        at org.apache.commons.compress.archivers.tar.TarArchiveInputStream.read(TarArchiveInputStream.java:608)
        at java.io.InputStream.read(InputStream.java:101)
        at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1792)
        at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1769)

2. Diagnosis

1. Idea A

According to the log, StreamSets downloads streamsets-datacollector-cdh_6_3-lib from http://archives.streamsets.com/datacollector/3.16.1/tarball/streamsets-datacollector-cdh_6_3-lib-3.16.1.tgz.
Visiting that address in a browser shows the file can be downloaded, so try downloading the package manually and putting it in the target directory.

Over SSH the server's network turned out to be very slow, so the package was downloaded on a PC instead. After extracting it, the jars are under streamsets-datacollector-3.16.1/streamsets-libs/streamsets-datacollector-cdh_6_3-lib/lib. These were copied into the corresponding StreamSets directory on the server, but when checking the installed packages the library still does not appear as installed.

This approach is not great: the installer probably writes additional configuration besides copying the jars, which cannot easily be reproduced by hand. Time to change tack.

2. Idea B (works)

The installed package is the core edition, so consider reinstalling the full edition instead: download streamsets-datacollector-all-3.22.3.tgz and follow the installation steps in the official documentation.

After reinstalling, you can see that the required CDH package is already installed.
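A rough sketch of the reinstall, assuming the full tarball follows the same archive URL pattern as the core 3.16.1 download above and that /opt is the install location; the official documentation remains the authoritative reference:

# Download and unpack the full edition (URL pattern assumed from the 3.16.1 core tarball)
wget http://archives.streamsets.com/datacollector/3.22.3/tarball/streamsets-datacollector-all-3.22.3.tgz
tar -xzf streamsets-datacollector-all-3.22.3.tgz -C /opt
# Start Data Collector from the unpacked directory
/opt/streamsets-datacollector-3.22.3/bin/streamsets dc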

Reference

StreamSets official documentation

HBase shell commands fail with PleaseHoldException: Master is initializing [Solution]



Project scenario:

Ubuntu 20.04, Hadoop 3.2.2, HBase 2.2.2


Problem description:

The main error is: ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing

After starting the HBase shell, commands such as create and list fail with the following error:

hbase(main):001:0> list
TABLE 
                                                                                                                    
ERROR: org.apache.hadoop.hbase.PleaseHoldException: Master is initializing
        at org.apache.hadoop.hbase.master.HMaster.checkInitialized(HMaster.java:2452)
        at org.apache.hadoop.hbase.master.MasterRpcServices.getTableNames(MasterRpcServices.java:915)
        at org.apache.hadoop.hbase.protobuf.generated.MasterProtos$MasterService$2.callBlockingMethod(MasterProtos.java:58517)
        at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2339)
        at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:123)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:188)
        at org.apache.hadoop.hbase.ipc.RpcExecutor$Handler.run(RpcExecutor.java:168)

For usage try 'help "list"'

Took 10.297 seconds

Cause analysis:

My machine only runs HBase on a pseudo-distributed Hadoop cluster, so the commonly suggested cause of clock skew between the HBase and ZooKeeper servers does not apply here. The more likely reason is that the Hadoop and HBase state are out of sync, which leaves the HBase master stuck in initialization.


Solution:

Delete HBase's data directory in HDFS, restart HBase, and let the two resynchronize:

Shut down all HBase services first:

cd /usr/local/hbase
bin/stop-hbase.sh

Then close all Hadoop services:

cd /usr/local/hadoop
sbin/stop-all.sh

Run jps to make sure all Hadoop and HBase processes have stopped:

zq@fzqs-Laptop:~$ jps
4673 Jps

Then start the Hadoop service:

cd /usr/local/hadoop
sbin/start-all.sh

To view files in HDFS:

bin/hdfs dfs -ls /

The output should look like the following (note the /hbase entry):

zq@fzqs-Laptop:/usr/local/hadoop$ bin/hdfs dfs -ls /
Found 1 items
drwxr-xr-x		- root supergroup 		0 2021-10-28 21:49 /hbase

Delete the /hbase directory:

bin/hdfs dfs -rm -r /hbase

Start HBase service:

cd /usr/local/hbase
bin/start-hbase.sh

Then start the HBase shell; the commands should now work:

bin/hbase shell

[Solved] ERROR PythonRunner: Python worker exited unexpectedly (crashed)

Some time ago a reader sent me a private message about an error when running PySpark in PyCharm: ERROR PythonRunner: Python worker exited unexpectedly (crashed).

In a test run, print(input_rdd.first()) prints fine, but print(input_rdd.count()), which triggers a full job, reports the error:

print(input_rdd.count())

The key message, ERROR PythonRunner: Python worker exited unexpectedly (crashed), means the Python worker process crashed. The full output is:

21/10/24 10:24:48 ERROR PythonRunner: Python worker exited unexpectedly (crashed)
java.net.SocketException: Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
	at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
	at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:476)
	at org.apache.spark.api.python.PythonRDD$.write$1(PythonRDD.scala:297)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1$adapted(PythonRDD.scala:307)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:621)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:397)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:232)
21/10/24 10:24:48 ERROR PythonRunner: This may have been caused by a prior exception:
java.net.SocketException: Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
	at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
	at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:476)
	at org.apache.spark.api.python.PythonRDD$.write$1(PythonRDD.scala:297)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1$adapted(PythonRDD.scala:307)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:621)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:397)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:232)
21/10/24 10:24:48 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.net.SocketException: Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
	at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
	at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:476)
	at org.apache.spark.api.python.PythonRDD$.write$1(PythonRDD.scala:297)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1$adapted(PythonRDD.scala:307)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:621)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:397)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:232)
21/10/24 10:24:48 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0) (LAPTOP-RK2V2UMB executor driver): java.net.SocketException: Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
	at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
	at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:476)
	at org.apache.spark.api.python.PythonRDD$.write$1(PythonRDD.scala:297)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1$adapted(PythonRDD.scala:307)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:621)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:397)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:232)

21/10/24 10:24:48 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
Traceback (most recent call last):
  File "D:/Code/pycode/exercise/pyspark-study/pyspark-learning/pyspark-day04/main/01_web_analysis.py", line 28, in <module>
    print(input_rdd.first())
  File "D:\opt\Anaconda3-2020.11\lib\site-packages\pyspark\rdd.py", line 1586, in first
    rs = self.take(1)
  File "D:\opt\Anaconda3-2020.11\lib\site-packages\pyspark\rdd.py", line 1566, in take
    res = self.context.runJob(self, takeUpToNumLeft, p)
  File "D:\opt\Anaconda3-2020.11\lib\site-packages\pyspark\context.py", line 1233, in runJob
    sock_info = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, partitions)
  File "D:\opt\Anaconda3-2020.11\lib\site-packages\py4j\java_gateway.py", line 1304, in __call__
    return_value = get_return_value(
  File "D:\opt\Anaconda3-2020.11\lib\site-packages\py4j\protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.runJob.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (LAPTOP-RK2V2UMB executor driver): java.net.SocketException: Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
	at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
	at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:476)
	at org.apache.spark.api.python.PythonRDD$.write$1(PythonRDD.scala:297)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1$adapted(PythonRDD.scala:307)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:621)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:397)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:232)

Driver stacktrace:
	at org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2258)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2207)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2206)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2206)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1079)
	at org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1079)
	at scala.Option.foreach(Option.scala:407)
	at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1079)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2445)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2387)
	at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2376)
	at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
	at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:868)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2196)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2217)
	at org.apache.spark.SparkContext.runJob(SparkContext.scala:2236)
	at org.apache.spark.api.python.PythonRDD$.runJob(PythonRDD.scala:166)
	at org.apache.spark.api.python.PythonRDD.runJob(PythonRDD.scala)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
	at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
	at py4j.Gateway.invoke(Gateway.java:282)
	at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
	at py4j.commands.CallCommand.execute(CallCommand.java:79)
	at py4j.GatewayConnection.run(GatewayConnection.java:238)
	at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Connection reset by peer: socket write error
	at java.net.SocketOutputStream.socketWrite0(Native Method)
	at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:111)
	at java.net.SocketOutputStream.write(SocketOutputStream.java:155)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:95)
	at java.io.DataOutputStream.writeInt(DataOutputStream.java:199)
	at org.apache.spark.api.python.PythonRDD$.writeUTF(PythonRDD.scala:476)
	at org.apache.spark.api.python.PythonRDD$.write$1(PythonRDD.scala:297)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRDD$.$anonfun$writeIteratorToStream$1$adapted(PythonRDD.scala:307)
	at scala.collection.Iterator.foreach(Iterator.scala:941)
	at scala.collection.Iterator.foreach$(Iterator.scala:941)
	at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
	at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:307)
	at org.apache.spark.api.python.PythonRunner$$anon$2.writeIteratorToStream(PythonRunner.scala:621)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.$anonfun$run$1(PythonRunner.scala:397)
	at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1996)
	at org.apache.spark.api.python.BasePythonRunner$WriterThread.run(PythonRunner.scala:232)


Process finished with exit code 1

I looked into this problem online; it can be caused by many different things. In the case I helped with, Spark was running locally on a Windows machine, the data volume was fairly large, and the error appeared when running from PyCharm, so it looked like a tooling problem rather than a code problem.

The fix that worked here is very simple: close PyCharm, open it again, and rerun the job. If it still fails, shut it down and run it once more.

HDFS bug: DataXceiver error processing WRITE_BLOCK operation

The error message is as follows:

calculation112.aggrx:50010:DataXceiver error processing WRITE_BLOCK operation  src: /10.1.1.116:36274 dst: /10.1.1.112:50010
java.io.IOException: Premature EOF from inputStream
    at org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:203)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:213)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:109)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receivePacket(BlockReceiver.java:501)
    at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.receiveBlock(BlockReceiver.java:901)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:808)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:169)
    at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:106)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:246)
    at java.lang.Thread.run(Thread.java:748)
  ......

Reason: the file operation outlives its lease, i.e. the file is deleted while the data stream operation is still in progress.

Solution:

Step 1: raise the maximum number of files a process may open

cat /etc/security/limits.conf  | grep -v ^#


*        soft    nofile        1000000
*         hard    nofile        1048576
*        soft    nproc        65536
*        hard    nproc        unlimited
*        soft    memlock        unlimited
*        hard    memlock        unlimited
*         -      nofile          1000000

Step 2: increase the number of DataNode data transfer threads.
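A sketch of this step, assuming a recent Hadoop release: check the effective value, then add the property below (called dfs.datanode.max.xcievers in very old releases) inside the <configuration> section of hdfs-site.xml on each DataNode and restart the DataNodes; 8192 is only an example value.

# Check the currently effective value (defaults to 4096 in recent releases)
hdfs getconf -confKey dfs.datanode.max.transfer.threads
# Property to place inside <configuration> in hdfs-site.xml on every DataNode:
cat <<'EOF'
<property>
  <name>dfs.datanode.max.transfer.threads</name>
  <value>8192</value>
</property>
EOF
# Restart the DataNodes afterwards so the new limit takes effect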

Done.

 

org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph

An error is reported when the Flink SQL client submits the job:

2021-10-21 15:23:54,232 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - No path for the flink jar passed. Using the location of class org.apache.flink.yarn.YarnClusterDescriptor to locate the jar
2021-10-21 15:23:54,233 WARN  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Neither the HADOOP_CONF_DIR nor the YARN_CONF_DIR environment variable is set.The Flink YARN Client needs one of these to be set to properly load the Hadoop configuration for accessing YARN.
2021-10-21 15:23:54,291 INFO  org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider [] - Failing over to rm2
2021-10-21 15:23:54,337 INFO  org.apache.flink.yarn.YarnClusterDescriptor                  [] - Found Web Interface n101:36989 of application 'application_1634635118307_0001'.
[ERROR] Could not execute SQL statement. Reason:
org.apache.flink.runtime.client.JobSubmissionException: Failed to submit JobGraph.

Failing over to rm2

A job is only submitted successfully through the active YARN ResourceManager; in this cluster the active one is rm2, which is what the "Failing over to rm2" line in the log reflects.
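The log also warns that neither HADOOP_CONF_DIR nor YARN_CONF_DIR is set, so the Flink client may not be picking up the correct Hadoop/YARN configuration at all. A sketch of setting it before starting the SQL Client (the conf path is an assumption; use your cluster's actual Hadoop configuration directory):

# Point the Flink client at the Hadoop/YARN configuration, then start the SQL Client
export HADOOP_CONF_DIR=/etc/hadoop/conf
./bin/sql-client.sh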

ClickHouse error: XXXX.XXXX_local20211009 (8fdb18e9-bb4c-42d8-8fdb-18e9bb4c02d8): auto…

Error Messages:

XXXX.XXXX_local20211009 (8fdb18e9-bb4c-42d8-8fdb-18e9bb4c02d8): auto DB::StorageReplicatedMergeTree::processQueueEntry(ReplicatedMergeTreeQueue::SelectedEntryPtr)::(anonymous class)::operator()(DB::StorageReplicatedMergeTree::LogEntryPtr &) const: Code: 49, e.displayText() = DB::Exception: Part 20211009_67706_67706_0 is covered by 20211009_67118_67714_12 but should be merged into 20211009_67706_67715_1. This shouldn't happen often., Stack trace (when copying this message, always include the lines below):
Solution:
1. Try deleting (dropping) the local table and the distributed table XXXX.XXXX_local20211009.
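A minimal sketch of the drop with clickhouse-client; the table name keeps the masked placeholder used above, and the matching distributed table should be dropped the same way before both are recreated from their original DDL:

# Drop the problematic local table (name is the masked placeholder from the log)
clickhouse-client --query "DROP TABLE XXXX.XXXX_local20211009"
# Drop the corresponding distributed table as well, then recreate both from the original DDL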