Tag Archives: Data mining

[Solved] Error in Summary.factor ‘sum’ not meaningful for factors

 

Question:

The root cause is the wrong data type.

The factor type has no sum method

#create a vector of class vector
factor_vector <- as.factor(c(1, 7, 12, 14, 15))

#attempt to find min value in the vector
sum(factor_vector)

Error in Summary.factor(1:5, na.rm = FALSE) : 
  ‘sum’ not meaningful for factors

Solution:
Convert to numeric values and use the as.numeric function.
mydata$value<-as.numeric(mydata$value)
is.numeric(mydata$value)

#convert factor vector to numeric vector and find the min value
new_vector <- as.numeric(as.character(factor_vector))
sum(new_vector)

#[1] 49

Complete error:

#create a vector of class vector
factor_vector <- as.factor(c(1, 7, 12, 14, 15))

#attempt to find min value in the vector
sum(factor_vector)

Error in Summary.factor(1:5, na.rm = FALSE) : 
  ‘sum’ not meaningful for factors

Other (the minimum value can be obtained for numeric value, string and date type)

Numeric value, string and date type can all be maximized. Similarly, the maximum value can be obtained.

numeric_vector <- c(1, 2, 12, 14)
max(numeric_vector)

#[1] 14

character_vector <- c("a", "b", "f")
max(character_vector)

#[1] "f"

date_vector <- as.Date(c("2019-01-01", "2019-03-05", "2019-03-04"))
max(date_vector)

#[1] "2019-03-05"

The R language is called R partly because of the names of the two R authors (Robert gentleman and Ross ihaka) and partly because of the influence of Bell Labs s language (called the dialect of s language).

R language is a mathematical programming language designed for mathematical researchers. It is mainly used for statistical analysis, drawing and data mining.

If you are a beginner of computer programs and are eager to understand the general programming of computers, R language is not an ideal choice. You can choose python, C or Java.

Both R language and C language are the research achievements of Bell Laboratories, but they have different emphasis areas. R language is an explanatory language for mathematical theory researchers, while C language is designed for computer software engineers.

R language is a language for interpretation and operation (different from the compilation and operation of C language). Its execution speed is much slower than that of C language, which is not conducive to optimization. However, it provides more abundant data structure operation at the syntax level and can easily output text and graphic information, so it is widely used in mathematics, especially in statistics

[Solved] Error in Summary.factor ‘max’ not meaningful for factors

Question:

The root cause is the wrong data type.

The factor type has no max method

#create a vector of class vector
factor_vector <- as.factor(c(1, 7, 12, 14, 15))

#attempt to find max value in the vector
max(factor_vector)

#Error in Summary.factor(1:5, na.rm = FALSE) : 
#  'max' not meaningful for factors

Solution:

Convert to numeric value or string, here convert to numeric value.

mydata$value<-as.numeric(mydata$value)
is.numeric(mydata$value)

#convert factor vector to numeric vector and find the max value
new_vector <- as.numeric(as.character(factor_vector))
max(new_vector)

#[1] 15

Full error:

#create a vector of class vector
factor_vector <- as.factor(c(1, 7, 12, 14, 15))

#attempt to find max value in the vector
max(factor_vector)

#Error in Summary.factor(1:5, na.rm = FALSE) : 
#  'max' not meaningful for factors

 

Other (numeric value, string, date type can be the maximum value)

Numeric value, string and date type can all be the maximum value, and similarly, the minimum value can be obtained.

numeric_vector <- c(1, 2, 12, 14)
max(numeric_vector)

#[1] 14

character_vector <- c("a", "b", "f")
max(character_vector)

#[1] "f"

date_vector <- as.Date(c("2019-01-01", "2019-03-05", "2019-03-04"))
max(date_vector)

#[1] "2019-03-05"

[Solved] Error: ‘attrition‘ is not an exported object from ‘namespace:rsample‘

Error: ‘attrition’ is not an exported object from ‘namespace:rsample’


# Import package and library

# load required packages
library(rsample)
library(dplyr)
library(h2o)
library(DALEX)

# initialize h2o session
h2o.no_progress()
h2o.init()
##  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         4 hours 30 minutes 
##     H2O cluster timezone:       America/New_York 
##     H2O data parsing timezone:  UTC 
##     H2O cluster version:        3.18.0.11 
##     H2O cluster version age:    1 month and 17 days  
##     H2O cluster name:           H2O_started_from_R_bradboehmke_gny210 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   1.01 GB 
##     H2O cluster total cores:    4 
##     H2O cluster allowed cores:  4 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     H2O Internal Security:      FALSE 
##     H2O API Extensions:         XGBoost, Algos, AutoML, Core V3, Core V4 
##     R Version:                  R version 3.5.0 (2018-04-23)

#Data preprocessing and processing to h2o format;

Error: ‘attrition’ is not an exported object from ‘namespace:rsample’

#

# classification data
df <- rsample::attrition %>% 
  mutate_if(is.ordered, factor, ordered = FALSE) %>%
  mutate(Attrition = recode(Attrition, "Yes" = "1", "No" = "0") %>% factor(levels = c("1", "0")))

# convert to h2o object
df.h2o <- as.h2o(df)

# create train, validation, and test splits
set.seed(123)
splits <- h2o.splitFrame(df.h2o, ratios = c(.7, .15), destination_frames = c("train","valid","test"))
names(splits) <- c("train","valid","test")

# variable names for resonse & features
y <- "Attrition"
x <- setdiff(names(df), y) 

Solution:

You can use the attrition dataset of DALEX package directly;

Remove resample:

#

# classification data
df <- attrition %>% 
  mutate_if(is.ordered, factor, ordered = FALSE) %>%
  mutate(Attrition = recode(Attrition, "Yes" = "1", "No" = "0") %>% factor(levels = c("1", "0")))

# convert to h2o object
df.h2o <- as.h2o(df)

# create train, validation, and test splits
set.seed(123)
splits <- h2o.splitFrame(df.h2o, ratios = c(.7, .15), destination_frames = c("train","valid","test"))
names(splits) <- c("train","valid","test")

# variable names for resonse & features
y <- "Attrition"
x <- setdiff(names(df), y) 

Full Error Messages:

> # classification data
> df <- rsample::attrition %>% 
+     mutate_if(is.ordered, factor, ordered = FALSE) %>%
+     mutate(Attrition = recode(Attrition, "Yes" = "1", "No" = "0") %>% factor(levels = c("1", "0")))
Error: 'attrition' is not an exported object from 'namespace:rsample'
>

How to Solve no module fcntl Error in Windows

This library is specific to Linux, if you use this library in Windows system development, it will generate some errors, how to solve it?
Create a new fcntl.py file

DN_ACCESS = 1
DN_ATTRIB = 32
DN_CREATE = 4
DN_DELETE = 8
DN_MODIFY = 2
DN_MULTISHOT = 2147483648
DN_RENAME = 16

FASYNC = 8192

FD_CLOEXEC = 1

F_ADD_SEALS = 1033

F_DUPFD = 0

F_DUPFD_CLOEXEC = 1030

F_EXLCK = 4
F_GETFD = 1
F_GETFL = 3
F_GETLEASE = 1025
F_GETLK = 5
F_GETLK64 = 5
F_GETOWN = 9
F_GETSIG = 11

F_GET_SEALS = 1034

F_NOTIFY = 1026
F_RDLCK = 0

F_SEAL_GROW = 4
F_SEAL_SEAL = 1
F_SEAL_SHRINK = 2
F_SEAL_WRITE = 8

F_SETFD = 2
F_SETFL = 4
F_SETLEASE = 1024
F_SETLK = 6
F_SETLK64 = 6
F_SETLKW = 7
F_SETLKW64 = 7
F_SETOWN = 8
F_SETSIG = 10
F_SHLCK = 8
F_UNLCK = 2
F_WRLCK = 1

LOCK_EX = 2
LOCK_MAND = 32
LOCK_NB = 4
LOCK_READ = 64
LOCK_RW = 192
LOCK_SH = 1
LOCK_UN = 8
LOCK_WRITE = 128

def fcntl(fd, op, arg=0):
    return 0


def ioctl(fd, op, arg=0, mutable_flag=True):
    if mutable_flag:
        return 0
    else:
        return ""


def flock(fd, op):
    return


def lockf(fd, operation, length=0, start=0, whence=0):
    return

Just find your Python path and put it in

C:\Users\{Username}\AppData\Local\Programs\Python\Python38\Lib\fcntl.py

Done!

[Solved] R Language Error: Discrete value supplied to continuous scale

Error: discrete value supplied to continuous scale

#Simulation data

set.seed(123)
my_df1 <- data.frame(a=rnorm(100), b=runif(100), c=rep(1:10, 10))
my_df2 <- data.frame(a=rnorm(100), b=runif(100), c=factor(rep(LETTERS[1:5], 20)))

#Error: Discrete value supplied to continuous scale

ggplot() +
  geom_point(data=my_df1, aes(x=a, y=b, color=c)) +
  geom_polygon(data=my_df2, aes(x=a, y=b, color=c), alpha=0.5)

#Solution:

Use the fill parameter;

ggplot() +
  geom_point(data=my_df1, aes(x=a, y=b, color=c)) +
  geom_polygon(data=my_df2, aes(x=a, y=b, fill=c), alpha=0.5)

[Solved] Error: package or namespace load failed for ‘ggplot2’ in loadNamespace(i, c(lib.loc, .libPaths()), v

Error Messages:
> library(ggplot2)
Error: package or namespace load failed for ‘ggplot2’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):

Loaded namespace ‘ellipsis’ 0.3.1, but what is needed is >= 0.3.2

Solution:
In Rstudio, find the packages module, remove the package that reported the error and then re-install it using install.packages for the corresponding installation.

[Solved] Python Pandas Read Error: OSError: initializing from file failed

Problem Description:

error when loading CSV format data in pandas

B = pd.read_csv("C:/Users/hp/Desktop/Hands-On Data Analysis/Unit 1 Project Collection/train.csv")
B.head(3)

report errors:

OSError: Initializing from file failed

Cause analysis:

When calling the read_csv() method of pandas, the C engine is used as the parser engine by default, and when the file name contains Chinese, using the C engine will be wrong in some cases.


Solution:

Specify the engine as Python when calling the read_csv() method

B = pd.read_csv("C:/Users/hp/Desktop/Hands-On-Data-Analysis/Unit-1-Project-Collection/train.csv",engine='python')
B.head(3)

Error: Discrete value supplied to continuous scale [How to Solve]

 

#Simulation data

df <- structure(list(`10` = c(0, 0, 0, 0, 0, 0), `33.95` = c(0, 0, 
0, 0, 0, 0), `58.66` = c(0, 0, 0, 0, 0, 0), `84.42` = c(0, 0, 
0, 0, 0, 0), `110.21` = c(0, 0, 0, 0, 0, 0), `134.16` = c(0, 
0, 0, 0, 0, 0), `164.69` = c(0, 0, 0, 0, 0, 0), `199.1` = c(0, 
0, 0, 0, 0, 0), `234.35` = c(0, 0, 0, 0, 0, 0), `257.19` = c(0, 
0, 0, 0, 0, 0), `361.84` = c(0, 0, 0, 0, 0, 0), `432.74` = c(0, 
0, 0, 0, 0, 0), `506.34` = c(1, 0, 0, 0, 0, 0), `581.46` = c(0, 
0, 0, 0, 0, 0), `651.71` = c(0, 0, 0, 0, 0, 0), `732.59` = c(0, 
0, 0, 0, 0, 1), `817.56` = c(0, 0, 0, 1, 0, 0), `896.24` = c(0, 
0, 0, 0, 0, 0), `971.77` = c(0, 1, 1, 1, 0, 1), `1038.91` = c(0, 
0, 0, 0, 0, 0), MW = c(3.9, 6.4, 7.4, 8.1, 9, 9.4)), .Names = c("10", 
"33.95", "58.66", "84.42", "110.21", "134.16", "164.69", "199.1", 
"234.35", "257.19", "361.84", "432.74", "506.34", "581.46", "651.71", 
"732.59", "817.56", "896.24", "971.77", "1038.91", "MW"), row.names = c("Merc", 
"Peug", "Fera", "Fiat", "Opel", "Volv"
), class = "data.frame")


df

Question:

library(reshape)

## Plotting
meltDF = melt(df, id.vars = 'MW')
ggplot(meltDF[meltDF$value == 1,]) + geom_point(aes(x = MW, y = variable)) +
  scale_x_continuous(limits=c(0, 1200), breaks=c(0, 400, 800, 1200)) +
  scale_y_continuous(limits=c(0, 1200), breaks=c(0, 400, 800, 1200))

Solution:

After the meltdf variable is defined, the factor variable can be transformed into numerical white energy;

If x is a numeric value, add scale_x_continual(); If x is a character/factor, add scale_x_discreate().

meltDF$variable=as.numeric(levels(meltDF$variable))[meltDF$variable]


ggplot(meltDF[meltDF$value == 1,]) + geom_point(aes(x = MW, y =   variable)) +
     scale_x_continuous(limits=c(0, 1200), breaks=c(0, 400, 800, 1200)) +
     scale_y_continuous(limits=c(0, 1200), breaks=c(0, 400, 800, 1200))

Full Error Messages:
> library(reshape)
>
> ## Plotting
> meltDF = melt(df, id.vars = ‘MW’)
> ggplot(meltDF[meltDF$value == 1,]) + geom_point(aes(x = MW, y = variable)) +
+     scale_x_continuous(limits=c(0, 1200), breaks=c(0, 400, 800, 1200)) +
+     scale_y_continuous(limits=c(0, 1200), breaks=c(0, 400, 800, 1200))
Error: Discrete value supplied to continuous scale
>

[Solved] Scala error: type mismatch; found : java.util.List[?0] required: java.util.List[B]

Scala error: type mismatch; found : java.util.List[?0] required: java.util.List[B]


Problem:
Due to the incompatibility between Scala type inference and Java type inference;

import java.util
import java.util.stream.Collectors
class Animal
class Dog extends Animal
class Cat extends Animal

object ObjectConversions extends App {

  import java.util.{List => JList}
  implicit  def convertLowerBound[ B <: Animal] (a: JList[Animal]): JList[B] = a.stream().map(a => a.asInstanceOf[B]).collect(Collectors.toList())
  val a= new util.ArrayList[Animal]()
  a.add(new Cat)
  convertLowerBound[Cat](a)
}

Solution:

When calling Java methods and still want to infer generic types, you need to pass generic types specifically

#Or the implicit conversion statement is not imported
import scala.collection.JavaConversions._

def convertLowerBound[ B <: Animal] (a: JList[Animal]): JList[B] = a.stream().map[B](a => a.asInstanceOf[B]).collect(Collectors.toList[B]())


#or

def convertLowerBound[B <: Animal : TypeTag] (a: JList[Animal]) = a.asInstanceOf[JList[B]]

scala> def convertLowerBound[ B <: Animal] (a: JList[Animal]): JList[B] = a.stream().map[B](a => a.asInstanceOf[B]).collect(Collectors.toList[B]())
convertLowerBound: [B <: Animal](a: java.util.List[Animal])java.util.List[B]
scala> convertLowerBound[Cat](a)
res30: java.util.List[Cat] = [Cat@6325af19, Dog@6ff6743f]
scala> a.add(new Cat())
res16: Boolean = true
scala> convertLowerBound[Cat](a)
res17: java.util.List[Cat] = [Cat@6325af19]
scala> a.add(new Dog())
res19: Boolean = true
scala> convertLowerBound[Cat](a)
res20: java.util.List[Cat] = [Cat@6325af19, Dog@6ff6743f]

完整错误:
<console>:15: error: type mismatch;
found   : java.util.List[?0]
required: java.util.List[B]
Note: ?0 >: B, but Java-defined trait List is invariant in type E.
You may wish to investigate a wildcard type such as `_ >: B`. (SLS 3.2.10)
implicit  def convertLowerBound[ B <: Animal] (a: JList[Animal]): JList[B] = a.stream().map(a => a.asInstanceOf[B]).collect(Collectors.toList())

[Solved] ParserError: NULL byte detected. This byte cannot be processed in Python‘s native csv library

ParserError: NULL byte detected. This byte cannot be processed in Python’s native csv library at the moment, so please pass in engine=’c’ instead



Error:

file_name = os.listdir(base_dir)[0]

col_list = [feature list]
col = col_list
#encoding
#data = pd.read_csv("D:\\test\\repo\\data.csv",sep = ',',encoding="GBK",usecols=range(len(col)))
    
data = pd.read_csv("D:\\test\\repo\\data.csv",sep = ',',encoding = 'unicode_escape', engine ='python')


#data = pd.read_csv("D:\\test\\repo\\data.csv",sep = ',',encoding = 'utf-8', engine ='python')

path = "D:\\test\\repo\\data.csv"

Solution:

engine =’c’

file_name = os.listdir(base_dir)[0]

#encoding
#data = pd.read_csv("D:\\test\\repo\\data.csv",sep = ',',encoding="GBK",usecols=range(len(col)))
    
data = pd.read_csv("D:\\test\\repo\\data.csv",sep = ',',encoding = 'unicode_escape', engine ='c')


#data = pd.read_csv("D:\\test\\repo\\data.csv",sep = ',',encoding = 'utf-8', engine ='python')

path = "D:\\test\\repo\\data.csv"

Full Error Messages:
—————————————————————————

Error                                     Traceback (most recent call last)
D:\anaconda\lib\site-packages\pandas\io\parsers.py in _next_iter_line(self, row_num)
2967             assert self.data is not None
-> 2968             return next(self.data)
2969         except csv.Error as e:
Error: line contains NULL byte
During handling of the above exception, another exception occurred:
ParserError                               Traceback (most recent call last)
<ipython-input-12-c5d0c651c50e> in <module>
85                    ]
86
---> 87     data = inference_process(data_dir)
88     #print(data.head())
89     f=open("break_model1.pkl",'rb')
<ipython-input-12-c5d0c651c50e> in inference_process(base_dir)
18     #encoding
19 #     data = pd.read_csv("D:\\test\\repo\\data.csv",sep = ',',encoding="GBK",usecols=range(len(col)))
---> 20     data = pd.read_csv("D:\\test\\repo\\data.csv",sep = ',',encoding = 'unicode_escape', engine ='python')
21 #     data = pd.read_csv("D:\\test\\repo\\data.csv",sep = ',',encoding = 'utf-8', engine ='python')
22
D:\anaconda\lib\site-packages\pandas\io\parsers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
608     kwds.update(kwds_defaults)
609
--> 610     return _read(filepath_or_buffer, kwds)
611
612
D:\anaconda\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
460
461     # Create the parser.
--> 462     parser = TextFileReader(filepath_or_buffer, **kwds)
463
464     if chunksize or iterator:
D:\anaconda\lib\site-packages\pandas\io\parsers.py in __init__(self, f, engine, **kwds)
817             self.options["has_index_names"] = kwds["has_index_names"]
818
--> 819         self._engine = self._make_engine(self.engine)
820
821     def close(self):
D:\anaconda\lib\site-packages\pandas\io\parsers.py in _make_engine(self, engine)
1048             )
1049         # error: Too many arguments for "ParserBase"
-> 1050         return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
1051
1052     def _failover_to_python(self):
D:\anaconda\lib\site-packages\pandas\io\parsers.py in __init__(self, f, **kwds)
2308                 self.num_original_columns,
2309                 self.unnamed_cols,
-> 2310             ) = self._infer_columns()
2311         except (TypeError, ValueError):
2312             self.close()
D:\anaconda\lib\site-packages\pandas\io\parsers.py in _infer_columns(self)
2615             for level, hr in enumerate(header):
2616                 try:
-> 2617                     line = self._buffered_line()
2618
2619                     while self.line_pos <= hr:
D:\anaconda\lib\site-packages\pandas\io\parsers.py in _buffered_line(self)
2809             return self.buf[0]
2810         else:
-> 2811             return self._next_line()
2812
2813     def _check_for_bom(self, first_row):
D:\anaconda\lib\site-packages\pandas\io\parsers.py in _next_line(self)
2906
2907             while True:
-> 2908                 orig_line = self._next_iter_line(row_num=self.pos + 1)
2909                 self.pos += 1
2910
D:\anaconda\lib\site-packages\pandas\io\parsers.py in _next_iter_line(self, row_num)
2989                     msg += ". " + reason
2990
-> 2991                 self._alert_malformed(msg, row_num)
2992             return None
2993
D:\anaconda\lib\site-packages\pandas\io\parsers.py in _alert_malformed(self, msg, row_num)
2946         """
2947         if self.error_bad_lines:
-> 2948             raise ParserError(msg)
2949         elif self.warn_bad_lines:
2950             base = f"Skipping line {row_num}: "
ParserError: NULL byte detected. This byte cannot be processed in Python's native csv library at the moment, so please pass in engine='c' instea

Error: could not find function … in R [How to Solve]

Error: could not find function … in R

Question:

solve:

Full error:


Question:

> mytest.ax(lable,prediction)
Error in mytest.ax(lable, prediction) :
could not find function “mytest.ax”

Solution:

First, is the function name written correctly?R language function names are case sensitive.

Second, is the package containing the function installed?install.packages(“package_name”)

Third,

require(package_name)

library(package)

Require (package_name) (and check its return value) or library (package) (this should be done every time you start a new R session)

Fourth, are you using an old r version that does not yet exist?Or the version of R package; Or after the version is updated, some functions are removed from the original package;

Fifth, functions are added and removed over time, and the referenced code may expect an updated or older version than the package you installed. Or it’s too new. Cran doesn’t contain the latest version;

Full error:

> mytest.ax(lable,prediction)
Error in mytest.ax(lable, prediction) :
could not find function "mytest.ax"