Tag Archives: Scala

How to Solve Error in Importing Scala Word2VecModel

import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
Model save:
Link: http://spark.apache.org/docs/2.3.4/api/scala/index.html#org.apache.spark.mllib.feature.Word2VecModel

model.save(spark.sparkContext, config.model_path)

Model load:
Link: http://spark.apache.org/docs/2.3.4/api/scala/index.html#org.apache.spark.mllib.feature.Word2VecModel$

val model = Word2VecModel.load(spark.sparkContext, config.model_path)

Error when loading:

Exception in thread "main" java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapred.FileInputFormat
	at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:312)
	at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:200)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:253)
	at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:251)
	at scala.Option.getOrElse(Option.scala:121)
	at org.apache.spark.rdd.RDD.partitions(RDD.scala:251)
	at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1337)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.RDD.take(RDD.scala:1331)
	at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1372)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
	at org.apache.spark.rdd.RDD.first(RDD.scala:1371)
	at org.apache.spark.mllib.util.Loader$.loadMetadata(modelSaveLoad.scala:129)
	at org.apache.spark.mllib.feature.Word2VecModel$.load(Word2Vec.scala:699)
	at job.ml.embeddingModel.graphEmbedding$.run(graphEmbedding.scala:40)
	at job.ml.embeddingModel.graphEmbedding$.main(graphEmbedding.scala:24)
	at job.ml.embeddingModel.graphEmbedding.main(graphEmbedding.scala)
	

Fix: add the following Guava dependency to the POM file:

    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>15.0</version>
    </dependency>

Run it again and it works.
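For context, here is a minimal end-to-end sketch of training, saving, and loading an MLlib Word2VecModel with the API used above; the app name, path, and toy corpus are placeholders.

import org.apache.spark.mllib.feature.{Word2Vec, Word2VecModel}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("word2vec-example").getOrCreate()
val sc = spark.sparkContext

// Toy corpus: each record is a sequence of tokens.
val input = sc.parallelize(Seq(
  "spark mllib word2vec example".split(" ").toSeq,
  "save and load the trained model".split(" ").toSeq
))

// Train; the model instance exposes save(sc, path).
val model = new Word2Vec().setVectorSize(10).setMinCount(1).fit(input)
model.save(sc, "/tmp/word2vec-model")

// Load it back through the companion object.
val loaded = Word2VecModel.load(sc, "/tmp/word2vec-model")
println(loaded.findSynonyms("spark", 2).mkString(", "))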

Scala Flink Watermark Error: Static Methods in Interface Require -target:jvm-1.8

Environment

Gradle project + Scala 2.11 + Java 8 + Flink 1.12

Code that triggers the error:

kafkaSource.assignTimestampsAndWatermarks(WatermarkStrategy
      .forBoundedOutOfOrderness[JSONObject](Duration.ofSeconds(10)))

Error message: Static methods in interface require -target:jvm-1.8

Judging from the message, the Scala compiler fails on a call to a static method of a Java interface (static interface methods require the JVM 1.8 bytecode target). However, the IDEA configuration already uses JDK 1.8 for compiling and packaging.

Modifying the corresponding IDEA compiler settings does not help.

So another approach is needed: adding the following configuration to the build.gradle file solves the problem.

project.tasks.compileScala.scalaCompileOptions.additionalParameters = ["-target:jvm-1.8"]
project.tasks.compileTestScala.scalaCompileOptions.additionalParameters = ["-target:jvm-1.8"]
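For reference, here is a slightly fuller sketch of the same watermark setup with an explicit timestamp assigner; it assumes kafkaSource is a DataStream[JSONObject] (fastjson) and that each record carries an event-time field named "ts" in milliseconds.

import java.time.Duration
import com.alibaba.fastjson.JSONObject
import org.apache.flink.api.common.eventtime.{SerializableTimestampAssigner, WatermarkStrategy}

// kafkaSource: DataStream[JSONObject] is assumed to exist, as in the snippet above.
val withWatermarks = kafkaSource.assignTimestampsAndWatermarks(
  WatermarkStrategy
    .forBoundedOutOfOrderness[JSONObject](Duration.ofSeconds(10)) // static interface method, needs -target:jvm-1.8
    .withTimestampAssigner(new SerializableTimestampAssigner[JSONObject] {
      override def extractTimestamp(element: JSONObject, recordTimestamp: Long): Long =
        element.getLongValue("ts") // hypothetical event-time field
    })
)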

How to Solve Spark Write to Hudi Error

Error message

java.lang.UnsatisfiedLinkError: org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Ljava/lang/String;)Lorg/apache/hadoop/io/nativeio/NativeIO$POSIX$Stat;
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.stat(Native Method)
        at org.apache.hadoop.io.nativeio.NativeIO$POSIX.getStat(NativeIO.java:456)
        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfoByNativeIO(RawLocalFileSystem.java:821)
        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.loadPermissionInfo(RawLocalFileSystem.java:735)
        at org.apache.hadoop.fs.RawLocalFileSystem$DeprecatedRawLocalFileStatus.getPermission(RawLocalFileSystem.java:703)
        at org.apache.hadoop.fs.LocatedFileStatus.<init>(LocatedFileStatus.java:52)
        at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:2091)
        at org.apache.hadoop.fs.FileSystem$4.next(FileSystem.java:2071)
        at org.apache.hadoop.fs.FileSystem$5.hasNext(FileSystem.java:2190)
        at org.apache.hudi.table.marker.DirectWriteMarkers.lambda$createdAndMergedDataPaths$69cdea3b$1(DirectWriteMarkers.java:138)
        at org.apache.hudi.client.common.HoodieSparkEngineContext.lambda$flatMap$7d470b86$1(HoodieSparkEngineContext.java:78)
        at org.apache.spark.api.java.JavaRDDLike.$anonfun$flatMap$1(JavaRDDLike.scala:125)
        at scala.collection.Iterator$$anon$11.nextCur(Iterator.scala:484)
        at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:490)
        at scala.collection.Iterator.foreach(Iterator.scala:941)
        at scala.collection.Iterator.foreach$(Iterator.scala:941)
        at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
        at scala.collection.generic.Growable.$plus$plus$eq(Growable.scala:62)
        at scala.collection.generic.Growable.$plus$plus$eq$(Growable.scala:53)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:105)
        at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:49)
        at scala.collection.TraversableOnce.to(TraversableOnce.scala:315)
        at scala.collection.TraversableOnce.to$(TraversableOnce.scala:313)
        at scala.collection.AbstractIterator.to(Iterator.scala:1429)
        at scala.collection.TraversableOnce.toBuffer(TraversableOnce.scala:307)
        at scala.collection.TraversableOnce.toBuffer$(TraversableOnce.scala:307)
        at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1429)
        at scala.collection.TraversableOnce.toArray(TraversableOnce.scala:294)
        at scala.collection.TraversableOnce.toArray$(TraversableOnce.scala:288)
        at scala.collection.AbstractIterator.toArray(Iterator.scala:1429)
        at org.apache.spark.rdd.RDD.$anonfun$collect$2(RDD.scala:1030)
        at org.apache.spark.SparkContext.$anonfun$runJob$5(SparkContext.scala:2236)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
        at org.apache.spark.scheduler.Task.run(Task.scala:131)
        at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:497)
        at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:500)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Cause of the error

The Hadoop version loaded at runtime is 3.2, while the Hadoop version in the cluster is 3.0.0-cdh6.3.2, and the native code behind NativeIO$POSIX.stat differs between the two versions.

Solution:

Download the Spark source code, build Spark yourself, and change the bundled Hadoop to version 3.0.0-cdh6.3.2.
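For reference, the error above surfaces during an ordinary Hudi write such as the following minimal sketch (table name, key/partition fields, and path are placeholders); the failure happens inside save() when the Hadoop version bundled with Spark does not match the cluster's.

import org.apache.spark.sql.{SaveMode, SparkSession}

val spark = SparkSession.builder().appName("hudi-write-example").getOrCreate()
import spark.implicits._

// Toy data; the native stat() call fails while Hudi scans marker files during save().
val df = Seq((1L, "a", "2024-01-01", 1L)).toDF("id", "name", "dt", "ts")

df.write.format("hudi")
  .option("hoodie.table.name", "example_table")
  .option("hoodie.datasource.write.recordkey.field", "id")
  .option("hoodie.datasource.write.precombine.field", "ts")
  .option("hoodie.datasource.write.partitionpath.field", "dt")
  .mode(SaveMode.Append)
  .save("/tmp/hudi/example_table")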

Solution to the "cannot resolve symbol" error reported by build.sbt


1. Background 2. Error reporting 3. Solution

1. Background

IDEA version 2017.2, development language Scala. When re-importing the SBT project, build.sbt was flagged with errors and reported "cannot resolve symbol".

2. Error reporting

build.sbt is highlighted with "cannot resolve symbol" errors.

3. Solution

3.1 Delete the .idea folder

3.2 Use File -> Invalidate Caches / Restart to restart IDEA


3.3 After restarting, wait for IDEA to regenerate the .idea folder, and the error is resolved.
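For reference, a minimal build.sbt of the kind that should resolve cleanly once the .idea folder has been regenerated; the project name and versions below are placeholders.

// build.sbt (names and versions are placeholders)
name := "example-project"

version := "0.1.0"

scalaVersion := "2.12.15"

libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.1.2" % Provided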

[Solved] Scala signature package has wrong version expected: 5.0 found: 5.2 in package.class

XXX.jar broken
scala signature package has wrong version expected: 5.0 found: 5.2 in package.class

Solution:
Add the dependency for the corresponding Scala SDK in pom.xml, and the version must match exactly.
For Scala code, you must use the library built for the same version as your Scala SDK.
Even if a library claims that Scala 2.12 and 2.13 are compatible, it is better to choose only the 2.12 artifact.
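For example, if the project compiles with Scala 2.12, the matching scala-library dependency in pom.xml might look like the following; the exact version number is a placeholder and must match the Scala SDK actually in use.

    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.12.15</version>
    </dependency>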

Error:scalac: Scala compiler JARs not found [How to Solve]

Error:scalac: Scala compiler JARs not found (module 'SparkSql'): C:\Users\***\.m2\repository\org\scala-lang\scala-compiler\2.11.8\scala-compiler-2.11.8.jar

The reason for the problem is that Scala libraries are not added to the project.

To add the Scala libraries in IDEA, go to File -> Project Structure -> Libraries and add them there.

Yarn: runtime.ContainerExecutionException: launch container failed

Introduction:

After the task is submitted with spark-submit, the driver-side code executes normally, but the program gets stuck at the executor stage and keeps reporting errors until the task fails.

 

Locating the problem:

The failing point in the log prints a lot of warnings:

"The initial job has not accepted any resources; check the cluster UI to make sure that the workers are registered and have sufficient resources." The initial analysis points to a resource problem. Pulling the logs down with yarn logs then shows:

The initial heap size of the JVM exceeds the maximum heap size. Checking the task environment reveals the real cause.

 

Solution:

The JVM's initial heap size -Xms (the minimum heap value) requires 13g, but executor.memory is only given 12g, which causes the problem above. Modify the submit script so that executor.memory is at least the -Xms size, and the problem is solved.

Tip: in general, -Xms and -Xmx (the maximum heap value) can be set to the same size.

Oracle recommends setting the minimum heap size (-Xms) equal to the maximum heap size (-Xmx) to minimize garbage collections.
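For reference, a minimal sketch (values are assumptions) of the relevant setting: spark.executor.memory must be at least as large as the -Xms that the launch script passes to the executor JVM.

import org.apache.spark.sql.SparkSession

// spark.executor.memory (13g here) must cover the -Xms requested in the submit script.
val spark = SparkSession.builder()
  .appName("yarn-memory-example")
  .config("spark.executor.memory", "13g")
  .getOrCreate()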

 

Ubuntu: Failed to initialize compiler: object java.lang.Object In compiler mirror not found

I haven't used Spark for a long time. Today I logged in to the Ubuntu system remotely and executed ./spark-shell, which reported the error "Failed to initialize compiler: object java.lang.Object in compiler mirror not found".

I checked the installed Java version and found no problem, so I checked the ~/.bashrc file and found that the Java path was not set correctly; after rewriting it, everything works.