Tag Archives: Machine learning

[Solved] AttributeError: 'GridSearchCV' object has no attribute 'grid_scores_'

The reason is that grid_scores_ was removed in scikit-learn 0.20 and replaced by cv_results_.

Method 1 (removed in version 0.20):
grid_search.grid_scores_
Method 2 (version 0.20 and later):
means = grid_search.cv_results_['mean_test_score']
params = grid_search.cv_results_['params']
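
As a minimal sketch of the replacement (the logistic-regression model and the tiny parameter grid below are made up for illustration), the information that used to live in grid_scores_ can be read from cv_results_ like this:

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)
params = {"C": [0.1, 1.0, 10.0]}  # example grid, adjust to your model
grid_search = GridSearchCV(LogisticRegression(max_iter=1000), params, cv=3)
grid_search.fit(X, y)

# Old (<= 0.19): iterate over grid_search.grid_scores_
# New (>= 0.20): read the same information from cv_results_
means = grid_search.cv_results_['mean_test_score']
param_list = grid_search.cv_results_['params']
for mean, p in zip(means, param_list):
    print("%.3f with %r" % (mean, p))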

[Solved] Python Error: An attempt has been made to start a new process before the current process has finished …

This error usually occurs on Windows systems when using multiple processes. For example, execute the following code in PyCharm:

import torch
import torch.utils.data as Data
import numpy as np
from sklearn.datasets import load_iris

iris_x, irisy = load_iris(return_X_y=True)
print("iris_x.dtype:", iris_x.dtype)
print("irisy:", irisy.dtype)

## transform the training set x into a tensor, and the training set y into a tensor
train_xt = torch.from_numpy(iris_x.astype(np.float32))
train_yt = torch.from_numpy(irisy.astype(np.int64))
print("train_xt.dtype:", train_xt.dtype)
print("train_yt.dtype:", train_yt.dtype)

## After converting the training set into a tensor, use TensorDataset to collate X and Y together
train_data = Data.TensorDataset(train_xt, train_yt)
## Define a data loader to batch the training dataset
train_loader = Data.DataLoader(
    dataset=train_data,  # the dataset to use
    batch_size=10,  # batch sample size
    shuffle=True,  # shuffle the data before each epoch
    num_workers=2,  # [Note: 2 worker processes are used here]
)

## Check if the dimensionality of the samples of a batch of the training dataset is correct
for step, (b_x, b_y) in enumerate(train_loader):
    if step > 0:
        break
## Output the dimensions of the training image and the labels, and the data type
print("b_x.shape:", b_x.shape)
print("b_y.shape:", b_y.shape)
print("b_x.dtype:", b_x.dtype)
print("b_y.dtype:", b_y.dtype)


## ---------- The correct output is as follows ----------

# iris_x.dtype: float64
# irisy: int32
# train_xt.dtype: torch.float32
# train_yt.dtype: torch.int64
# b_x.shape: torch.Size([10, 4])
# b_y.shape: torch.Size([10])
# b_x.dtype: torch.float32
# b_y.dtype: torch.int64

Running the script in PyCharm produces the error below. (No error is reported when the same code is run in a Jupyter notebook in the same environment; I don't know why.)

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

 

Solution 1:

Remove the statement setting up multiple processes. In this example, comment or delete the following line.

num_workers=2,  # [Note: 2 processes are used here]
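
With that line removed, the DataLoader falls back to its default of num_workers=0, which loads data in the main process, so no child process has to be started:

train_loader = Data.DataLoader(
    dataset=train_data,
    batch_size=10,
    shuffle=True,
    # num_workers defaults to 0: the data is loaded in the main process,
    # so no new process is spawned and the error disappears
)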

Solution 2:

Move the code that drives the worker processes (here, the loop over train_loader) under an if __name__ == '__main__': guard.

if __name__ == '__main__':
    ## Check if the dimensionality of the samples of a batch of the training dataset is correct
    for step, (b_x, b_y) in enumerate(train_loader):
        if step > 0:
            break
    ## Output the dimensions of the training images and the labels, and their data types
    print("b_x.shape:", b_x.shape)
    print("b_y.shape:", b_y.shape)
    print("b_x.dtype:", b_x.dtype)
    print("b_y.dtype:", b_y.dtype)

However, in PyCharm the part of the script outside the if __name__ == '__main__': guard is executed again when the worker processes re-import the module (on Windows, multiprocessing uses the spawn start method), so the output below shows those lines printed twice.

## ---------- The result of running in PyCharm is as follows ----------
iris_x.dtype: float64
irisy: int32
train_xt.dtype: torch.float32
train_yt.dtype: torch.int64
iris_x.dtype: float64
irisy: int32
train_xt.dtype: torch.float32
train_yt.dtype: torch.int64
b_x.shape: torch.Size([10, 4])
b_y.shape: torch.Size([10])
b_x.dtype: torch.float32
b_y.dtype: torch.int64
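
If the duplicated output bothers you, a common pattern (sketched here for the same iris example, not part of the original post) is to move all top-level work into a function that is only called under the guard, so nothing substantial runs when the worker processes re-import the module:

import numpy as np
import torch
import torch.utils.data as Data
from sklearn.datasets import load_iris


def main():
    iris_x, iris_y = load_iris(return_X_y=True)
    train_xt = torch.from_numpy(iris_x.astype(np.float32))
    train_yt = torch.from_numpy(iris_y.astype(np.int64))
    train_data = Data.TensorDataset(train_xt, train_yt)
    train_loader = Data.DataLoader(
        dataset=train_data,
        batch_size=10,
        shuffle=True,
        num_workers=2,  # worker processes are now started only from main()
    )
    for step, (b_x, b_y) in enumerate(train_loader):
        if step > 0:
            break
    print("b_x.shape:", b_x.shape)
    print("b_y.shape:", b_y.shape)


if __name__ == '__main__':
    main()  # nothing runs at import time, so the spawned workers re-import the module safely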

Error: Discrete value supplied to continuous scale [How to Solve]

 

#Simulation data

df <- structure(list(`10` = c(0, 0, 0, 0, 0, 0), `33.95` = c(0, 0, 
0, 0, 0, 0), `58.66` = c(0, 0, 0, 0, 0, 0), `84.42` = c(0, 0, 
0, 0, 0, 0), `110.21` = c(0, 0, 0, 0, 0, 0), `134.16` = c(0, 
0, 0, 0, 0, 0), `164.69` = c(0, 0, 0, 0, 0, 0), `199.1` = c(0, 
0, 0, 0, 0, 0), `234.35` = c(0, 0, 0, 0, 0, 0), `257.19` = c(0, 
0, 0, 0, 0, 0), `361.84` = c(0, 0, 0, 0, 0, 0), `432.74` = c(0, 
0, 0, 0, 0, 0), `506.34` = c(1, 0, 0, 0, 0, 0), `581.46` = c(0, 
0, 0, 0, 0, 0), `651.71` = c(0, 0, 0, 0, 0, 0), `732.59` = c(0, 
0, 0, 0, 0, 1), `817.56` = c(0, 0, 0, 1, 0, 0), `896.24` = c(0, 
0, 0, 0, 0, 0), `971.77` = c(0, 1, 1, 1, 0, 1), `1038.91` = c(0, 
0, 0, 0, 0, 0), MW = c(3.9, 6.4, 7.4, 8.1, 9, 9.4)), .Names = c("10", 
"33.95", "58.66", "84.42", "110.21", "134.16", "164.69", "199.1", 
"234.35", "257.19", "361.84", "432.74", "506.34", "581.46", "651.71", 
"732.59", "817.56", "896.24", "971.77", "1038.91", "MW"), row.names = c("Merc", 
"Peug", "Fera", "Fiat", "Opel", "Volv"
), class = "data.frame")


df

Question:

library(reshape)
library(ggplot2)

## Plotting
meltDF = melt(df, id.vars = 'MW')
ggplot(meltDF[meltDF$value == 1,]) + geom_point(aes(x = MW, y = variable)) +
  scale_x_continuous(limits=c(0, 1200), breaks=c(0, 400, 800, 1200)) +
  scale_y_continuous(limits=c(0, 1200), breaks=c(0, 400, 800, 1200))

Solution:

After meltDF is created, the factor variable can be converted to a numeric one (as done below);

if the variable mapped to an axis is numeric, use scale_x_continuous()/scale_y_continuous(); if it is a character or factor, use scale_x_discrete()/scale_y_discrete().

meltDF$variable=as.numeric(levels(meltDF$variable))[meltDF$variable]


ggplot(meltDF[meltDF$value == 1,]) + geom_point(aes(x = MW, y =   variable)) +
     scale_x_continuous(limits=c(0, 1200), breaks=c(0, 400, 800, 1200)) +
     scale_y_continuous(limits=c(0, 1200), breaks=c(0, 400, 800, 1200))

Full Error Messages:
> library(reshape)
>
> ## Plotting
> meltDF = melt(df, id.vars = 'MW')
> ggplot(meltDF[meltDF$value == 1,]) + geom_point(aes(x = MW, y = variable)) +
+     scale_x_continuous(limits=c(0, 1200), breaks=c(0, 400, 800, 1200)) +
+     scale_y_continuous(limits=c(0, 1200), breaks=c(0, 400, 800, 1200))
Error: Discrete value supplied to continuous scale
>

[Solved] Scala error: type mismatch; found : java.util.List[?0] required: java.util.List[B]

Scala error: type mismatch; found : java.util.List[?0] required: java.util.List[B]


Problem:
The mismatch comes from combining Scala's type inference with Java's invariant generics: in the code below, the result of Collectors.toList() is inferred as java.util.List[?0] instead of java.util.List[B].

import java.util
import java.util.stream.Collectors
class Animal
class Dog extends Animal
class Cat extends Animal

object ObjectConversions extends App {

  import java.util.{List => JList}
  implicit  def convertLowerBound[ B <: Animal] (a: JList[Animal]): JList[B] = a.stream().map(a => a.asInstanceOf[B]).collect(Collectors.toList())
  val a= new util.ArrayList[Animal]()
  a.add(new Cat)
  convertLowerBound[Cat](a)
}

Solution:

When calling Java methods and you still want the generic type to be carried through, you need to pass the type parameter explicitly:

// Another possible cause: the implicit conversion import is missing
import scala.collection.JavaConversions._

def convertLowerBound[ B <: Animal] (a: JList[Animal]): JList[B] = a.stream().map[B](a => a.asInstanceOf[B]).collect(Collectors.toList[B]())


// or

import scala.reflect.runtime.universe.TypeTag

def convertLowerBound[B <: Animal : TypeTag] (a: JList[Animal]) = a.asInstanceOf[JList[B]]

scala> def convertLowerBound[ B <: Animal] (a: JList[Animal]): JList[B] = a.stream().map[B](a => a.asInstanceOf[B]).collect(Collectors.toList[B]())
convertLowerBound: [B <: Animal](a: java.util.List[Animal])java.util.List[B]
scala> a.add(new Cat())
res16: Boolean = true
scala> convertLowerBound[Cat](a)
res17: java.util.List[Cat] = [Cat@6325af19]
scala> a.add(new Dog())
res19: Boolean = true
scala> convertLowerBound[Cat](a)
res20: java.util.List[Cat] = [Cat@6325af19, Dog@6ff6743f]

Full error:
<console>:15: error: type mismatch;
found   : java.util.List[?0]
required: java.util.List[B]
Note: ?0 >: B, but Java-defined trait List is invariant in type E.
You may wish to investigate a wildcard type such as `_ >: B`. (SLS 3.2.10)
implicit  def convertLowerBound[ B <: Animal] (a: JList[Animal]): JList[B] = a.stream().map(a => a.asInstanceOf[B]).collect(Collectors.toList())

Error in scikit-learn installation with conda

Installing scikit-learn with conda under Windows 7 reports the following error:

ERROR conda.core.link:_execute(701): An error occurred while installing package 'defaults::scikit-learn-1.0.1-py37hf11a4ad_0'.
Rolling back transaction: done

LinkError: post-link script failed for package defaults::scikit-learn-1.0.1-py37hf11a4ad_0
location of failed script: E:\PPY\Scripts\.scikit-learn-post-link.bat
==> script messages <==
<None>
==> script output <==
stdout:
stderr:
return code: 1

This is most likely caused by the conda channel (source) being used. It can be solved by installing from the anaconda channel with the following command:

conda install -c anaconda scikit-learn

[Solved] ParserError: NULL byte detected. This byte cannot be processed in Python's native csv library

ParserError: NULL byte detected. This byte cannot be processed in Python's native csv library at the moment, so please pass in engine='c' instead



Error:

import os
import pandas as pd

# excerpt from the original inference_process() function; base_dir and the
# feature list are defined elsewhere in the script
file_name = os.listdir(base_dir)[0]

col_list = [feature list]
col = col_list
#encoding
#data = pd.read_csv("D:\\test\\repo\\data.csv",sep = ',',encoding="GBK",usecols=range(len(col)))
    
data = pd.read_csv("D:\\test\\repo\\data.csv",sep = ',',encoding = 'unicode_escape', engine ='python')


#data = pd.read_csv("D:\\test\\repo\\data.csv",sep = ',',encoding = 'utf-8', engine ='python')

path = "D:\\test\\repo\\data.csv"

Solution:

Change engine='python' to engine='c':

file_name = os.listdir(base_dir)[0]

#encoding
#data = pd.read_csv("D:\\test\\repo\\data.csv",sep = ',',encoding="GBK",usecols=range(len(col)))
    
data = pd.read_csv("D:\\test\\repo\\data.csv",sep = ',',encoding = 'unicode_escape', engine ='c')


#data = pd.read_csv("D:\\test\\repo\\data.csv",sep = ',',encoding = 'utf-8', engine ='python')

path = "D:\\test\\repo\\data.csv"
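
As a minimal self-contained sketch of the difference (the file name and contents are made up for illustration), a file containing a stray NUL byte breaks the pure-Python parser but can be read by the C parser:

import pandas as pd

# build a tiny CSV that contains a NUL byte (\x00) in one field
with open("data_with_nul.csv", "wb") as f:
    f.write(b"a,b\n1,2\n3,\x004\n")

# pd.read_csv("data_with_nul.csv", engine='python')  # raises "NULL byte detected"
data = pd.read_csv("data_with_nul.csv", engine='c')  # the C parser accepts the file
print(data)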

Full Error Messages:
—————————————————————————

Error                                     Traceback (most recent call last)
D:\anaconda\lib\site-packages\pandas\io\parsers.py in _next_iter_line(self, row_num)
2967             assert self.data is not None
-> 2968             return next(self.data)
2969         except csv.Error as e:
Error: line contains NULL byte
During handling of the above exception, another exception occurred:
ParserError                               Traceback (most recent call last)
<ipython-input-12-c5d0c651c50e> in <module>
85                    ]
86
---> 87     data = inference_process(data_dir)
88     #print(data.head())
89     f=open("break_model1.pkl",'rb')
<ipython-input-12-c5d0c651c50e> in inference_process(base_dir)
18     #encoding
19 #     data = pd.read_csv("D:\\test\\repo\\data.csv",sep = ',',encoding="GBK",usecols=range(len(col)))
---> 20     data = pd.read_csv("D:\\test\\repo\\data.csv",sep = ',',encoding = 'unicode_escape', engine ='python')
21 #     data = pd.read_csv("D:\\test\\repo\\data.csv",sep = ',',encoding = 'utf-8', engine ='python')
22
D:\anaconda\lib\site-packages\pandas\io\parsers.py in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, dialect, error_bad_lines, warn_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options)
608     kwds.update(kwds_defaults)
609
--> 610     return _read(filepath_or_buffer, kwds)
611
612
D:\anaconda\lib\site-packages\pandas\io\parsers.py in _read(filepath_or_buffer, kwds)
460
461     # Create the parser.
--> 462     parser = TextFileReader(filepath_or_buffer, **kwds)
463
464     if chunksize or iterator:
D:\anaconda\lib\site-packages\pandas\io\parsers.py in __init__(self, f, engine, **kwds)
817             self.options["has_index_names"] = kwds["has_index_names"]
818
--> 819         self._engine = self._make_engine(self.engine)
820
821     def close(self):
D:\anaconda\lib\site-packages\pandas\io\parsers.py in _make_engine(self, engine)
1048             )
1049         # error: Too many arguments for "ParserBase"
-> 1050         return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
1051
1052     def _failover_to_python(self):
D:\anaconda\lib\site-packages\pandas\io\parsers.py in __init__(self, f, **kwds)
2308                 self.num_original_columns,
2309                 self.unnamed_cols,
-> 2310             ) = self._infer_columns()
2311         except (TypeError, ValueError):
2312             self.close()
D:\anaconda\lib\site-packages\pandas\io\parsers.py in _infer_columns(self)
2615             for level, hr in enumerate(header):
2616                 try:
-> 2617                     line = self._buffered_line()
2618
2619                     while self.line_pos <= hr:
D:\anaconda\lib\site-packages\pandas\io\parsers.py in _buffered_line(self)
2809             return self.buf[0]
2810         else:
-> 2811             return self._next_line()
2812
2813     def _check_for_bom(self, first_row):
D:\anaconda\lib\site-packages\pandas\io\parsers.py in _next_line(self)
2906
2907             while True:
-> 2908                 orig_line = self._next_iter_line(row_num=self.pos + 1)
2909                 self.pos += 1
2910
D:\anaconda\lib\site-packages\pandas\io\parsers.py in _next_iter_line(self, row_num)
2989                     msg += ". " + reason
2990
-> 2991                 self._alert_malformed(msg, row_num)
2992             return None
2993
D:\anaconda\lib\site-packages\pandas\io\parsers.py in _alert_malformed(self, msg, row_num)
2946         """
2947         if self.error_bad_lines:
-> 2948             raise ParserError(msg)
2949         elif self.warn_bad_lines:
2950             base = f"Skipping line {row_num}: "
ParserError: NULL byte detected. This byte cannot be processed in Python's native csv library at the moment, so please pass in engine='c' instead

Error: could not find function … in R [How to Solve]

Error: could not find function … in R

Question:

> mytest.ax(lable,prediction)
Error in mytest.ax(lable, prediction) :
could not find function “mytest.ax”

Solution:

First, is the function name spelled correctly? R function names are case-sensitive.

Second, is the package that contains the function installed? If not, run install.packages("package_name").

Third, is the package loaded? Call require(package_name) (and check its return value) or library(package_name); this has to be done in every new R session.

Fourth, are you using an old version of R (or of the package) in which the function does not yet exist? Functions are sometimes removed from a package after a version update.

Fifth, functions are added and removed over time, so the code you are following may expect a newer or older version of the package than the one you installed; CRAN may also not carry the very latest version.

Full error:

> mytest.ax(lable,prediction)
Error in mytest.ax(lable, prediction) :
could not find function "mytest.ax"

[Solved] Grid Search Error (GridSearchCV): 'ascii' codec can't encode characters in position 18-20: ordinal not in range(128)

Grid Search Error: UnicodeEncodeError: 'ascii' codec can't encode characters in position 18-20: ordinal not in range(128)

E:\DLstudy\Scripts\python.exe E:/PycharmProjects/DLstudy/run/train_model.py
[INFO] tuning hyperparameters...
Traceback (most recent call last):
  File "E:\PycharmProjects\DLstudy\run\train_model.py", line 22, in <module>
    model.fit(trainX, trainY)
  File "E:\DLstudy\lib\site-packages\sklearn\model_selection\_search.py", line 820, in fit
    with parallel:
  File "E:\DLstudy\lib\site-packages\joblib\parallel.py", line 725, in __enter__
    self._initialize_backend()
  File "E:\DLstudy\lib\site-packages\joblib\parallel.py", line 735, in _initialize_backend
    n_jobs = self._backend.configure(n_jobs=self.n_jobs, parallel=self,
  File "E:\DLstudy\lib\site-packages\joblib\_parallel_backends.py", line 494, in configure
    self._workers = get_memmapping_executor(
  File "E:\DLstudy\lib\site-packages\joblib\executor.py", line 20, in get_memmapping_executor
    return MemmappingExecutor.get_memmapping_executor(n_jobs, **kwargs)
  File "E:\DLstudy\lib\site-packages\joblib\executor.py", line 42, in get_memmapping_executor
    manager = TemporaryResourcesManager(temp_folder)
  File "E:\DLstudy\lib\site-packages\joblib\_memmapping_reducer.py", line 531, in __init__
    self.set_current_context(context_id)
  File "E:\DLstudy\lib\site-packages\joblib\_memmapping_reducer.py", line 535, in set_current_context
    self.register_new_context(context_id)
  File "E:\DLstudy\lib\site-packages\joblib\_memmapping_reducer.py", line 560, in register_new_context
    self.register_folder_finalizer(new_folder_path, context_id)
  File "E:\DLstudy\lib\site-packages\joblib\_memmapping_reducer.py", line 590, in register_folder_finalizer
    resource_tracker.register(pool_subfolder, "folder")
  File "E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py", line 191, in register
    self._send('REGISTER', name, rtype)
  File "E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py", line 204, in _send
    msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 18-20: ordinal not in range(128)

Process finished with exit code 1

Solution:

Original error code:

model = GridSearchCV(LogisticRegression(), params, cv=3, n_jobs=-1)

Delete the n_jobs=-1 parameter, so the call becomes:

model = GridSearchCV(LogisticRegression(), params, cv=3)

Looking at the docstring, this parameter controls how many processors are used for the fit:

 

n_jobs : int, default=None
        Number of jobs to run in parallel.
        ``None`` means 1 unless in a :obj:`joblib.parallel_backend` context.
        ``-1`` means using all processors. See :term:`Glossary <n_jobs>`
        for more details.

If n_jobs=-1 is specified, a step deep inside joblib encodes a message as ASCII, and that encoding fails here. If the parameter is not specified, a single processor is used by default and the problem does not occur.

If you really want to use multiple processors, then you need to modify the file at the path shown in the error message.
For example, in our error message:

  File "E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py", line 204, in _send
    msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('ascii')
UnicodeEncodeError: 'ascii' codec can't encode characters in position 18-20: ordinal not in range(128)

The error points to line 204 of E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py, inside the _send method. The source of _send is:

  def _send(self, cmd, name, rtype):
        msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('ascii')
        if len(name) > 512:
            # posix guarantees that writes to a pipe of less than PIPE_BUF
            # bytes are atomic, and that PIPE_BUF >= 512
            raise ValueError('name too long')
        nbytes = os.write(self._fd, msg)
        assert nbytes == len(msg)

Change
msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('ascii')
to
msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('utf8')
That is, the encoding is changed to utf-8, and the changed code is as follows.

  def _send(self, cmd, name, rtype):
        msg = '{0}:{1}:{2}\n'.format(cmd, name, rtype).encode('utf8')
        if len(name) > 512:
            # posix guarantees that writes to a pipe of less than PIPE_BUF
            # bytes are atomic, and that PIPE_BUF >= 512
            raise ValueError('name too long')
        nbytes = os.write(self._fd, msg)
        assert nbytes == len(msg)

Then run the code again.
Don't worry if it still reports an error: we have only changed the encoding side, not the decoding side.
The new error information is as follows:

     .............(...)
    splitted = line.strip().decode('ascii').split(':')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 18: ordinal not in range(128)
Traceback (most recent call last):
  File "E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py", line 253, in main
    splitted = line.strip().decode('ascii').split(':')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe5 in position 18: ordinal not in range(128)

Similarly, follow the path in the error message: E:\DLstudy\lib\site-packages\joblib\externals\loky\backend\resource_tracker.py. The error is at line 253 of that file, and the corresponding location is:

......(...)
       with open(fd, 'rb') as f:
            while True:
                line = f.readline()
                if line == b'':  # EOF
                    break
                try:
                    splitted = line.strip().decode('ascii').split(':')
                    # name can potentially contain separator symbols (for
                    # instance folders on Windows)
                    cmd, name, rtype = (
                        splitted[0], ':'.join(splitted[1:-1]), splitted[-1])
......(...)

We just need to replace
splitted = line.strip().decode('ascii').split(':')
with
splitted = line.strip().decode('utf8').split(':')
and run the file again; this time it succeeds.
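
A closing note that is not part of the original fix: editing files inside site-packages is undone whenever joblib is reinstalled. If, as the non-ASCII characters in the message suggest, the problem comes from a non-ASCII path in joblib's temporary folder (for example a user name in Chinese), an alternative worth trying is to point that folder at an ASCII-only directory through the JOBLIB_TEMP_FOLDER environment variable and keep n_jobs=-1; the folder below is just an assumed example and must exist beforehand:

import os

# assumed ASCII-only folder; create it before running
os.environ["JOBLIB_TEMP_FOLDER"] = r"C:\joblib_tmp"

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

if __name__ == '__main__':
    X, y = load_iris(return_X_y=True)
    params = {"C": [0.1, 1.0, 10.0]}  # example grid
    model = GridSearchCV(LogisticRegression(max_iter=1000), params, cv=3, n_jobs=-1)
    model.fit(X, y)
    print(model.best_params_)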