Tag Archives: Machine learning

[Solved] Python Pandas Read Error: OSError: initializing from file failed

Problem Description:

An error occurs when loading a CSV file with pandas:

B = pd.read_csv("C:/Users/hp/Desktop/Hands-On Data Analysis/Unit 1 Project Collection/train.csv")
B.head(3)

The error reported:

OSError: Initializing from file failed

Cause analysis:

By default, pandas' read_csv() uses the C engine as its parser. When the file path contains Chinese (or other non-ASCII) characters, the C engine can fail to open the file in some cases.


Solution:

Specify the Python engine when calling read_csv():

B = pd.read_csv("C:/Users/hp/Desktop/Hands-On-Data-Analysis/Unit-1-Project-Collection/train.csv", engine='python')
B.head(3)
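
A minimal sketch of the workaround, assuming the failure comes from non-ASCII characters in the path. The directory name and the CSV contents below are made up for illustration. Passing an already-open file object to read_csv() is an alternative that is not in the original post; it sidesteps the problem because the parser never has to open the path itself.

```python
import os
import tempfile

import pandas as pd

# Hypothetical setup: a temporary directory whose name contains Chinese characters.
base_dir = tempfile.mkdtemp()
data_dir = os.path.join(base_dir, "数据")
os.makedirs(data_dir)
csv_path = os.path.join(data_dir, "train.csv")

# Made-up sample data, only so the sketch can run end to end.
with open(csv_path, "w", encoding="utf-8") as f:
    f.write("PassengerId,Survived\n1,0\n2,1\n3,1\n")

# Workaround from the post: use the Python parser engine.
df_python = pd.read_csv(csv_path, engine="python")

# Alternative (assumption, not from the post): open the file yourself
# and hand the file object to read_csv().
with open(csv_path, encoding="utf-8") as f:
    df_handle = pd.read_csv(f)

print(df_python.head(3))
```

Either variant may be enough on its own; which one is preferable depends on the pandas version and on whether parser speed matters, since the Python engine is slower than the C engine.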

[Solved] PyG Load Dataset Error: AttributeError [PyTorch Geometric]

The following function raises AttributeError: 'GlobalStorage' object has no attribute 'train_mask':

def create_masks(data):
    """
    Splits data into training, validation, and test splits in a stratified manner if
    it is not already split. Each split is associated with a mask vector, which
    specifies the indices for that split. The data will be modified in-place.
    :param data: Data object
    :return: The modified data
    """
    if not hasattr(data, "val_mask"):

        data.train_mask = data.dev_mask = data.test_mask = None

        for i in range(20):
            labels = data.y.numpy()
            dev_size = int(labels.shape[0] * 0.1)
            test_size = int(labels.shape[0] * 0.8)

            perm = np.random.permutation(labels.shape[0])
            test_index = perm[:test_size]
            dev_index = perm[test_size:test_size + dev_size]

            data_index = np.arange(labels.shape[0])
            test_mask = torch.tensor(np.in1d(data_index, test_index), dtype=torch.bool)
            dev_mask = torch.tensor(np.in1d(data_index, dev_index), dtype=torch.bool)
            train_mask = ~(dev_mask + test_mask)

            test_mask = test_mask.reshape(1, -1)
            dev_mask = dev_mask.reshape(1, -1)
            train_mask = train_mask.reshape(1, -1)


            if data.train_mask is None:
                data.train_mask = train_mask
                data.val_mask = dev_mask
                data.test_mask = test_mask
            else:

                data.train_mask = torch.cat((data.train_mask, train_mask), dim=0)
                data.val_mask = torch.cat((data.val_mask, dev_mask), dim=0)
                data.test_mask = torch.cat((data.test_mask, test_mask), dim=0)

    else:  # in the case of WikiCS
        data.train_mask = data.train_mask.T
        data.val_mask = data.val_mask.T

    return data

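AttributeError: 'GlobalStorage' object has no attribute 'train_mask'

Fix: inside the loop, change
if data.train_mask is None: to if 'train_mask' not in data:
In recent PyTorch Geometric versions, assigning None to an attribute appears not to store it, so reading data.train_mask afterwards raises the AttributeError. The membership test avoids the attribute access altogether.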
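
The effect of the fix can be sketched without installing PyTorch Geometric. TinyStorage below is a hypothetical stand-in, not the real PyG class; it only mimics the assumed behavior that assigning None does not store the attribute. Plain Python lists stand in for mask tensors.

```python
class TinyStorage:
    """Hypothetical stand-in for a PyG data object (not the real class).

    Assumption: as in recent PyG versions, assigning None drops the key
    instead of storing it.
    """

    def __init__(self):
        object.__setattr__(self, "_mapping", {})

    def __setattr__(self, key, value):
        if value is None:
            self._mapping.pop(key, None)
        else:
            self._mapping[key] = value

    def __getattr__(self, key):
        # Only reached when normal attribute lookup fails.
        try:
            return self._mapping[key]
        except KeyError:
            raise AttributeError(
                f"'{type(self).__name__}' object has no attribute '{key}'"
            ) from None

    def __contains__(self, key):
        return key in self._mapping


data = TinyStorage()
data.train_mask = None  # silently dropped, nothing is stored

# The original check fails because the attribute does not exist.
try:
    data.train_mask is None
    raised = False
except AttributeError as exc:
    raised = True
    print(exc)

# The fixed check tests membership instead of reading the attribute.
first_mask = [True, False, False]
if "train_mask" not in data:
    data.train_mask = [first_mask]
else:
    data.train_mask = data.train_mask + [first_mask]
```

With a real PyG Data object the same membership test, 'train_mask' not in data, is used in place of the attribute read.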

[Solved] AttributeError: 'DataParallel' object has no attribute 'save'

Error message:

trainer.model.save(self.dir, epoch, is_best=is_best)
AttributeError: 'DataParallel' object has no attribute 'save'

Source code analysis:

trainer.model.save(self.dir, epoch, is_best=is_best)

The line above was written before the model was wrapped for single-machine multi-GPU training. My parallel code is as follows:

os.environ["CUDA_VISIBLE_DEVICES"] = "3,2,1"
model = torch.nn.DataParallel(model, device_ids=[0, 1]).cuda()

Cause analysis: AttributeError: 'DataParallel' object has no attribute 'save'

torch.nn.DataParallel wraps the original model and exposes it through the .module attribute. Custom methods such as save() are defined on the original model, not on the wrapper, so they must be called through model.module. After this change, the code is as follows:

trainer.model.module.save(self.dir, epoch, is_best=is_best)
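
A small sketch of the same idea, runnable on a machine without GPUs. TinyModel, its save() method, and the unwrap() helper are hypothetical names introduced for illustration; they are not part of the original project.

```python
import torch.nn as nn


class TinyModel(nn.Module):
    """Hypothetical model with a custom save() method, for illustration only."""

    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return self.fc(x)

    def save(self):
        # Return the weights instead of writing a file, to keep the sketch side-effect free.
        return self.state_dict()


def unwrap(model):
    """Return the underlying model whether or not it is wrapped in DataParallel."""
    return model.module if isinstance(model, nn.DataParallel) else model


wrapped = nn.DataParallel(TinyModel())

# The wrapper does not forward custom methods of the wrapped model.
has_save_on_wrapper = hasattr(wrapped, "save")

# Going through .module (here via unwrap) reaches the original model.
state = unwrap(wrapped).save()
print(has_save_on_wrapper, sorted(state.keys()))
```

A helper like unwrap() lets the same saving code run with and without DataParallel, which is convenient when switching between single-GPU and multi-GPU runs.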

[Solved] Octave Error: error: 'squareThisNumber' undefined near line 1 column 1

Question 1:

The .m file name must match the function name, including capitalization: squareThisNumber.m

Question 2:

parse error near line 1 of file C:\Users\asus\squareThisNumber.m

syntax error

>>> {\rtf1\ansi\ansicpg936\deff0\nouicompat{\fonttbl{\f0\fnil\fcharset134 \'cb\'ce\'cc\'e5;}}

Solution: the file was saved as rich text (RTF), for example by WordPad, so it contains many redundant formatting characters. Open it in a plain-text editor such as Notepad, delete them, and add endfunction at the end.

[Solved] AttributeError: 'GridSearchCV' object has no attribute 'grid_scores_'

The reason is that grid_scores_ was removed in version 0.20 of sklearn and replaced by cv_results_.

Method 1 (removed in version 0.20):
grid_search.grid_scores_
Method 2 (version 0.20 and later):
means = grid_search.cv_results_['mean_test_score']
params = grid_search.cv_results_['params']
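
As a rough end-to-end sketch, the snippet below runs a small grid search and reads the scores from cv_results_. The estimator (SVC), the parameter grid, and the iris dataset are arbitrary choices for illustration, not taken from the original post.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Arbitrary example grid: three values of the SVC regularization parameter C.
grid_search = GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]}, cv=3)
grid_search.fit(X, y)

# cv_results_ is a dict of arrays/lists with one entry per parameter combination.
means = grid_search.cv_results_["mean_test_score"]
stds = grid_search.cv_results_["std_test_score"]
params = grid_search.cv_results_["params"]

for mean, std, param in zip(means, stds, params):
    print(f"{mean:.3f} (+/- {std:.3f}) for {param}")
```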

[Solved] Python Error: An attempt has been made to start a new process before the current process has finished …

This error usually occurs on Windows when multiple processes are used. For example, run the following code in PyCharm:

import torch
import torch.utils.data as Data
import numpy as np
from sklearn.datasets import load_iris

iris_x, irisy = load_iris(return_X_y=True)
print("iris_x.dtype:", iris_x.dtype)
print("irisy:", irisy.dtype)

## transform the training set x into a tensor, and the training set y into a tensor
train_xt = torch.from_numpy(iris_x.astype(np.float32))
train_yt = torch.from_numpy(irisy.astype(np.int64))
print("train_xt.dtype:", train_xt.dtype)
print("train_yt.dtype:", train_yt.dtype)

## After converting the training set into tensors, use TensorDataset to combine X and Y
train_data = Data.TensorDataset(train_xt, train_yt)
## Define a data loader to batch the training dataset
train_loader = Data.DataLoader(
    dataset=train_data,  # the dataset to use
    batch_size=10,  # batch size
    shuffle=True,  # shuffle the data before each epoch
    num_workers=2,  # [Note: 2 processes are used here]
)
)

## Check if the dimensionality of the samples of a batch of the training dataset is correct
for step, (b_x, b_y) in enumerate(train_loader):
    if step > 0:
        break
## Output the dimensions of the training image and the labels, and the data type
print("b_x.shape:", b_x.shape)
print("b_y.shape:", b_y.shape)
print("b_x.dtype:", b_x.dtype)
print("b_y.dtype:", b_y.dtype)


## ---------- The correct result is as follows ----------

# iris_x.dtype: float64
# irisy: int32
# train_xt.dtype: torch.float32
# train_yt.dtype: torch.int64
# b_x.shape: torch.Size([10, 4])
# b_y.shape: torch.Size([10])
# b_x.dtype: torch.float32
# b_y.dtype: torch.int64

Instead, the following error is reported. (No error is reported when running in Jupyter Notebook in the same environment. I don't know why…)

RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

 

Solution 1:

Remove the statement setting up multiple processes. In this example, comment or delete the following line.

num_workers=2,  # [Note: 2 processes are used here]

Solution 2:

Move the code that starts the worker processes under if __name__ == '__main__':

if __name__ == '__main__':
    ##  Check if the dimensionality of the samples of a batch of the training dataset is correct
    for step, (b_x, b_y) in enumerate(train_loader):
        if step > 0:
            break
    ## Output the dimensions of the training samples and of the labels, and the data types
    print("b_x.shape:", b_x.shape)
    print("b_y.shape:", b_y.shape)
    print("b_x.dtype:", b_x.dtype)
    print("b_y.dtype:", b_y.dtype)

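However, in PyCharm, the part before for step, (b_x, b_y) in enumerate(train_loader): is executed twice. This is most likely because on Windows each worker process is started by re-importing the main module, which re-runs all code outside the if __name__ == '__main__': guard.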

## ——————————The result of running in Pycharm is as follows——————————
iris_x.dtype: float64
irisy: int32
train_xt.dtype: torch.float32
train_yt.dtype: torch.int64
iris_x.dtype: float64
irisy: int32
train_xt.dtype: torch.float32
train_yt.dtype: torch.int64
b_x.shape: torch.Size([10, 4])
b_y.shape: torch.Size([10])
b_x.dtype: torch.float32
b_y.dtype: torch.int64
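
One way to avoid the repeated output is to keep all work inside functions and leave only the guarded entry point at module level, so that re-importing the module in a worker process has no side effects. The sketch below restructures the example this way; the function names make_loader and main are hypothetical.

```python
import numpy as np
import torch
import torch.utils.data as Data
from sklearn.datasets import load_iris


def make_loader(num_workers):
    """Build the iris DataLoader from the example above."""
    iris_x, iris_y = load_iris(return_X_y=True)
    train_xt = torch.from_numpy(iris_x.astype(np.float32))
    train_yt = torch.from_numpy(iris_y.astype(np.int64))
    train_data = Data.TensorDataset(train_xt, train_yt)
    return Data.DataLoader(
        dataset=train_data,
        batch_size=10,
        shuffle=True,
        num_workers=num_workers,
    )


def main():
    # Worker processes are only started from inside main().
    train_loader = make_loader(num_workers=2)
    b_x, b_y = next(iter(train_loader))
    print("b_x.shape:", b_x.shape)
    print("b_y.shape:", b_y.shape)
    print("b_x.dtype:", b_x.dtype)
    print("b_y.dtype:", b_y.dtype)


if __name__ == "__main__":
    main()
```

Because nothing but imports and definitions runs at import time, a worker that re-imports the module should not repeat the dtype printouts.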

Error: Discrete value supplied to continuous scale [How to Solve]

 

# Simulated data

df <- structure(list(`10` = c(0, 0, 0, 0, 0, 0), `33.95` = c(0, 0, 
0, 0, 0, 0), `58.66` = c(0, 0, 0, 0, 0, 0), `84.42` = c(0, 0, 
0, 0, 0, 0), `110.21` = c(0, 0, 0, 0, 0, 0), `134.16` = c(0, 
0, 0, 0, 0, 0), `164.69` = c(0, 0, 0, 0, 0, 0), `199.1` = c(0, 
0, 0, 0, 0, 0), `234.35` = c(0, 0, 0, 0, 0, 0), `257.19` = c(0, 
0, 0, 0, 0, 0), `361.84` = c(0, 0, 0, 0, 0, 0), `432.74` = c(0, 
0, 0, 0, 0, 0), `506.34` = c(1, 0, 0, 0, 0, 0), `581.46` = c(0, 
0, 0, 0, 0, 0), `651.71` = c(0, 0, 0, 0, 0, 0), `732.59` = c(0, 
0, 0, 0, 0, 1), `817.56` = c(0, 0, 0, 1, 0, 0), `896.24` = c(0, 
0, 0, 0, 0, 0), `971.77` = c(0, 1, 1, 1, 0, 1), `1038.91` = c(0, 
0, 0, 0, 0, 0), MW = c(3.9, 6.4, 7.4, 8.1, 9, 9.4)), .Names = c("10", 
"33.95", "58.66", "84.42", "110.21", "134.16", "164.69", "199.1", 
"234.35", "257.19", "361.84", "432.74", "506.34", "581.46", "651.71", 
"732.59", "817.56", "896.24", "971.77", "1038.91", "MW"), row.names = c("Merc", 
"Peug", "Fera", "Fiat", "Opel", "Volv"
), class = "data.frame")


df

Question:

library(reshape)
library(ggplot2)

## Plotting
meltDF = melt(df, id.vars = 'MW')
ggplot(meltDF[meltDF$value == 1,]) + geom_point(aes(x = MW, y = variable)) +
  scale_x_continuous(limits=c(0, 1200), breaks=c(0, 400, 800, 1200)) +
  scale_y_continuous(limits=c(0, 1200), breaks=c(0, 400, 800, 1200))

Solution:

After meltDF is defined, convert the factor column variable to numeric values.

If x is numeric, use scale_x_continuous(); if x is a character or factor, use scale_x_discrete(). The same applies to the y scales.

meltDF$variable=as.numeric(levels(meltDF$variable))[meltDF$variable]


ggplot(meltDF[meltDF$value == 1,]) + geom_point(aes(x = MW, y = variable)) +
  scale_x_continuous(limits=c(0, 1200), breaks=c(0, 400, 800, 1200)) +
  scale_y_continuous(limits=c(0, 1200), breaks=c(0, 400, 800, 1200))

Full Error Messages:
> library(reshape)
>
> ## Plotting
> meltDF = melt(df, id.vars = 'MW')
> ggplot(meltDF[meltDF$value == 1,]) + geom_point(aes(x = MW, y = variable)) +
+     scale_x_continuous(limits=c(0, 1200), breaks=c(0, 400, 800, 1200)) +
+     scale_y_continuous(limits=c(0, 1200), breaks=c(0, 400, 800, 1200))
Error: Discrete value supplied to continuous scale
>

[Solved] Scala error: type mismatch; found : java.util.List[?0] required: java.util.List[B]



Problem:
The error is caused by a mismatch between Scala's type inference and Java's generic methods: Scala cannot infer the type parameters of map and Collectors.toList in the code below.

import java.util
import java.util.stream.Collectors
class Animal
class Dog extends Animal
class Cat extends Animal

object ObjectConversions extends App {

  import java.util.{List => JList}
  implicit  def convertLowerBound[ B <: Animal] (a: JList[Animal]): JList[B] = a.stream().map(a => a.asInstanceOf[B]).collect(Collectors.toList())
  val a= new util.ArrayList[Animal]()
  a.add(new Cat)
  convertLowerBound[Cat](a)
}

Solution:

When calling Java generic methods from Scala, do not rely on type inference; pass the type parameters explicitly.

// Alternatively, the implicit conversions may not have been imported
import scala.collection.JavaConversions._

def convertLowerBound[ B <: Animal] (a: JList[Animal]): JList[B] = a.stream().map[B](a => a.asInstanceOf[B]).collect(Collectors.toList[B]())


// or

def convertLowerBound[B <: Animal : TypeTag] (a: JList[Animal]) = a.asInstanceOf[JList[B]]

scala> def convertLowerBound[ B <: Animal] (a: JList[Animal]): JList[B] = a.stream().map[B](a => a.asInstanceOf[B]).collect(Collectors.toList[B]())
convertLowerBound: [B <: Animal](a: java.util.List[Animal])java.util.List[B]
scala> convertLowerBound[Cat](a)
res30: java.util.List[Cat] = [[email protected], [email protected]]
scala> a.add(new Cat())
res16: Boolean = true
scala> convertLowerBound[Cat](a)
res17: java.util.List[Cat] = [[email protected]]
scala> a.add(new Dog())
res19: Boolean = true
scala> convertLowerBound[Cat](a)
res20: java.util.List[Cat] = [[email protected], [email protected]]

Full error:
<console>:15: error: type mismatch;
found   : java.util.List[?0]
required: java.util.List[B]
Note: ?0 >: B, but Java-defined trait List is invariant in type E.
You may wish to investigate a wildcard type such as `_ >: B`. (SLS 3.2.10)
implicit  def convertLowerBound[ B <: Animal] (a: JList[Animal]): JList[B] = a.stream().map(a => a.asInstanceOf[B]).collect(Collectors.toList())

Error Installing scikit-learn with conda

Installing scikit-learn on Windows 7 reports an error:

ERROR conda.core.link:_execute(701): An error occurred while installing package 'defaults::scikit-learn-1.0.1-py37hf11a4ad_0'.
Rolling back transaction: done

LinkError: post-link script failed for package defaults::scikit-learn-1.0.1-py37hf11a4ad_0
location of failed script: E:\PPY\Scripts\.scikit-learn-post-link.bat
==> script messages <==
<None>
==> script output <==
stdout:
stderr:
return code: 1

This is probably caused by the conda package source (channel). Installing from the anaconda channel with the following command resolves it:

conda install -c anaconda scikit-learn