Tag Archives: data analysis

[Solved] raise KeyError(key) from err KeyError: 'Dates'

Error description:

While reading Excel data with pandas and processing it, the following error was raised:

raise KeyError(key) from err
KeyError: 'Dates'

Cause:

The data that pandas read from the Excel file is not aligned, so the resulting DataFrame has no column named 'Dates'. I found this by printing the DataFrame right after reading the Excel file.

Solution:

Delete Sheet2 and Sheet3 from the Excel workbook so that pandas reads the data aligned.

The root cause of this kind of problem is misaligned data.

Debug method:
1. Print the DataFrame and check the format of the data after it is read in.
2. Adjust how the data is read, or the data itself, accordingly.
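
The two debug steps can be sketched in pandas as follows. This is only an illustration under assumptions: the frame `raw`, its "Unnamed" column labels, the "Sales" column, and the helper `has_column` are hypothetical stand-ins for a misaligned read, not the data from the post.

```python
import pandas as pd

# Hypothetical frame imitating a misaligned read: the real header ended up
# in the first data row, so "Dates" is not a column label.
raw = pd.DataFrame(
    [["Dates", "Sales"], ["2021-01-01", 10], ["2021-01-02", 12]],
    columns=["Unnamed: 0", "Unnamed: 1"],
)


def has_column(df, name):
    """Return True if `name` is one of the column labels of `df`."""
    return name in df.columns


# Step 1: look at what was actually read in.
print(list(raw.columns))
print(has_column(raw, "Dates"))  # False, so raw["Dates"] would raise KeyError

# Step 2: adjust the data, here by promoting the first row to the header.
fixed = raw.iloc[1:].reset_index(drop=True)
fixed.columns = raw.iloc[0].tolist()
print(fixed["Dates"].tolist())
```

Checking `df.columns` before indexing makes it clear whether the KeyError comes from the file layout or from a typo in the column name.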

[Solved] Error in Summary.factor: 'sum' not meaningful for factors

Problem:

The root cause is the wrong data type: sum() is not defined for factors.

# create a factor vector
factor_vector <- as.factor(c(1, 7, 12, 14, 15))

# attempt to sum the values in the vector
sum(factor_vector)

Error in Summary.factor(1:5, na.rm = FALSE) : 
  'sum' not meaningful for factors

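Solution:
Convert the factor to numeric values. Go through as.character() first, because as.numeric() applied directly to a factor returns the internal level codes, not the original values.
mydata$value <- as.numeric(as.character(mydata$value))
is.numeric(mydata$value)

# convert the factor vector to a numeric vector and compute the sum
new_vector <- as.numeric(as.character(factor_vector))
sum(new_vector)

#[1] 49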

Other (min and max work for numeric, character, and date vectors)

The maximum can be computed for numeric, character, and date vectors, and the minimum works the same way.

numeric_vector <- c(1, 2, 12, 14)
max(numeric_vector)

#[1] 14

character_vector <- c("a", "b", "f")
max(character_vector)

#[1] "f"

date_vector <- as.Date(c("2019-01-01", "2019-03-05", "2019-03-04"))
max(date_vector)

#[1] "2019-03-05"

R is so named partly after the first names of its two authors, Robert Gentleman and Ross Ihaka, and partly as a play on the name of the S language from Bell Labs (R is often described as a dialect of S).

R is a programming language designed for statistical computing. It is mainly used for statistical analysis, graphics, and data mining.

If you are new to programming and want to learn general-purpose programming, R is not an ideal first choice; consider Python, C, or Java instead.

S, the predecessor of R, and C were both developed at Bell Laboratories, but they target different areas. R is an interpreted language aimed at statisticians and researchers, while C is designed for software engineers.

R is interpreted rather than compiled like C, so it runs much more slowly than C and is harder to optimize. In exchange, it offers richer data-structure operations at the syntax level and can easily output text and graphics, which is why it is widely used in mathematics and especially in statistics.

[Solved] Error in Summary.factor: 'max' not meaningful for factors

Problem:

The root cause is the wrong data type: max() is not defined for factors.

# create a factor vector
factor_vector <- as.factor(c(1, 7, 12, 14, 15))

#attempt to find max value in the vector
max(factor_vector)

#Error in Summary.factor(1:5, na.rm = FALSE) : 
#  'max' not meaningful for factors

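Solution:

Convert the factor to numeric values or to strings; here it is converted to numeric values. Go through as.character() first, because as.numeric() applied directly to a factor returns the internal level codes, not the original values.

mydata$value <- as.numeric(as.character(mydata$value))
is.numeric(mydata$value)

# convert the factor vector to a numeric vector and find the max value
new_vector <- as.numeric(as.character(factor_vector))
max(new_vector)

#[1] 15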

Other (max and min work for numeric, character, and date vectors)

The maximum can be computed for numeric, character, and date vectors, and the minimum works the same way.

numeric_vector <- c(1, 2, 12, 14)
max(numeric_vector)

#[1] 14

character_vector <- c("a", "b", "f")
max(character_vector)

#[1] "f"

date_vector <- as.Date(c("2019-01-01", "2019-03-05", "2019-03-04"))
max(date_vector)

#[1] "2019-03-05"

[Solved] Geopy library Error: ConfigurationError

Geopy library Error: Configuration Error

Error details

geopy.exc.ConfigurationError: 
Using Nominatim with default or sample `user_agent` "geopy/2.2.0" is strongly discouraged, as it violates Nominatim's ToS https://operations.osmfoundation.org/policies/nominatim/ and may possibly cause 403 and 429 HTTP errors. Please specify a custom `user_agent` with `Nominatim(user_agent="my-application")` or by overriding the default `user_agent`: `geopy.geocoders.options.default_user_agent = "my-application"`.

Solution:
This error occurs because the default user agent is rejected. Specify a unique string as user_agent when initializing Nominatim, for example 'BuyiXiao':

from geopy.geocoders import Nominatim
geolocator = Nominatim(user_agent='BuyiXiao')

[Solved] Error: 'attrition' is not an exported object from 'namespace:rsample'

Error: ‘attrition’ is not an exported object from ‘namespace:rsample’


# Import package and library

# load required packages
library(rsample)
library(dplyr)
library(h2o)
library(DALEX)

# initialize h2o session
h2o.no_progress()
h2o.init()
##  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         4 hours 30 minutes 
##     H2O cluster timezone:       America/New_York 
##     H2O data parsing timezone:  UTC 
##     H2O cluster version:        3.18.0.11 
##     H2O cluster version age:    1 month and 17 days  
##     H2O cluster name:           H2O_started_from_R_bradboehmke_gny210 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   1.01 GB 
##     H2O cluster total cores:    4 
##     H2O cluster allowed cores:  4 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     H2O Internal Security:      FALSE 
##     H2O API Extensions:         XGBoost, Algos, AutoML, Core V3, Core V4 
##     R Version:                  R version 3.5.0 (2018-04-23)

# Preprocess the data and convert it to h2o format. The rsample::attrition line below raises:

Error: ‘attrition’ is not an exported object from ‘namespace:rsample’

# classification data
df <- rsample::attrition %>% 
  mutate_if(is.ordered, factor, ordered = FALSE) %>%
  mutate(Attrition = recode(Attrition, "Yes" = "1", "No" = "0") %>% factor(levels = c("1", "0")))

# convert to h2o object
df.h2o <- as.h2o(df)

# create train, validation, and test splits
set.seed(123)
splits <- h2o.splitFrame(df.h2o, ratios = c(.7, .15), destination_frames = c("train","valid","test"))
names(splits) <- c("train","valid","test")

# variable names for response & features
y <- "Attrition"
x <- setdiff(names(df), y) 

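Solution:

Drop the rsample:: prefix and refer to the attrition dataset directly; according to the original post, it is then taken from the DALEX package loaded above. (In recent rsample releases the dataset was moved to the modeldata package, so modeldata::attrition is another option.)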

# classification data
df <- attrition %>% 
  mutate_if(is.ordered, factor, ordered = FALSE) %>%
  mutate(Attrition = recode(Attrition, "Yes" = "1", "No" = "0") %>% factor(levels = c("1", "0")))

# convert to h2o object
df.h2o <- as.h2o(df)

# create train, validation, and test splits
set.seed(123)
splits <- h2o.splitFrame(df.h2o, ratios = c(.7, .15), destination_frames = c("train","valid","test"))
names(splits) <- c("train","valid","test")

# variable names for response & features
y <- "Attrition"
x <- setdiff(names(df), y) 

Full Error Messages:

> # classification data
> df <- rsample::attrition %>% 
+     mutate_if(is.ordered, factor, ordered = FALSE) %>%
+     mutate(Attrition = recode(Attrition, "Yes" = "1", "No" = "0") %>% factor(levels = c("1", "0")))
Error: 'attrition' is not an exported object from 'namespace:rsample'
>

[Solved] transformers Install Error: error: can't find Rust compiler

I reinstalled transformers after reinstalling the operating system and hit an error, recorded here for future reference.

When reinstalling with the pip install transformers command on Windows, the following error is reported:

error: can't find Rust compiler
      
    If you are using an outdated pip version, it is possible a prebuilt wheel is available for this package but pip is not able to install from it. Installing from the wheel would avoid the need for a Rust compiler.
      
    To update pip, run:
      
        pip install --upgrade pip
      
    and then retry package installation.
      
    If you did intend to build this package from source, try installing a Rust compiler from your system package manager and ensure it is on the PATH during installation. Alternatively, rustup (available at https://rustup.rs) is the recommended way to download and update the Rust compiler toolchain.
    [end of output]
  
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for tokenizers
Failed to build tokenizers
ERROR: Could not build wheels for tokenizers, which is required to install pyproject.toml-based projects

Following the error message, I first ran pip install --upgrade pip, which did not help, so I installed the Rust compiler as the message suggests. I downloaded the installer from the official website (the 64-bit version in my case), ran the downloaded exe file, and accepted the default configuration during installation.

According to the official website, all Rust tools, including the rustc, cargo, and rustup commands, are installed in the ~/.cargo/bin directory, which therefore has to be on the PATH environment variable. On Windows the installer configures this automatically, but in my case the change only took effect after restarting the computer. After restarting, run the installation command again:

pip install transformers

The installation then succeeds.
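
Before rerunning pip, it can help to confirm that the Rust tools are actually visible on PATH. The following is a minimal sketch using only the Python standard library; the helper name `find_rust_tools` is hypothetical and not part of pip or rustup.

```python
import shutil


def find_rust_tools(names=("rustc", "cargo", "rustup")):
    """Map each tool name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


# Report which of the Rust commands the current shell session can see.
for tool, path in find_rust_tools().items():
    print(f"{tool}: {path or 'not found on PATH'}")
```

If a tool is reported as not found right after installing Rust, the environment variable change has probably not reached the current session yet.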

[Solved] R Language Error: Discrete value supplied to continuous scale

Error: discrete value supplied to continuous scale

# Simulated data
library(ggplot2)

set.seed(123)
my_df1 <- data.frame(a=rnorm(100), b=runif(100), c=rep(1:10, 10))
my_df2 <- data.frame(a=rnorm(100), b=runif(100), c=factor(rep(LETTERS[1:5], 20)))

#Error: Discrete value supplied to continuous scale

ggplot() +
  geom_point(data=my_df1, aes(x=a, y=b, color=c)) +
  geom_polygon(data=my_df2, aes(x=a, y=b, color=c), alpha=0.5)

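#Solution:

Both layers map a variable to color: c is numeric (continuous) in my_df1 but a factor (discrete) in my_df2, and a single color scale cannot be both. Map the factor to fill in the polygon layer instead: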

ggplot() +
  geom_point(data=my_df1, aes(x=a, y=b, color=c)) +
  geom_polygon(data=my_df2, aes(x=a, y=b, fill=c), alpha=0.5)

[Solved] Error: package or namespace load failed for ‘ggplot2’ in loadNamespace(i, c(lib.loc, .libPaths()), v

Error Messages:
> library(ggplot2)
Error: package or namespace load failed for ‘ggplot2’ in loadNamespace(i, c(lib.loc, .libPaths()), versionCheck = vI[[i]]):

Loaded namespace ‘ellipsis’ 0.3.1, but what is needed is >= 0.3.2

Solution:
In RStudio, open the Packages pane, remove the package named in the error (here ellipsis, which is older than the required version), and reinstall it with install.packages().

[Solved] Python Pandas Read Error: OSError: initializing from file failed

Problem Description:

An error occurs when loading a CSV file with pandas:

import pandas as pd

B = pd.read_csv("C:/Users/hp/Desktop/Hands-On Data Analysis/Unit 1 Project Collection/train.csv")
B.head(3)

Error message:

OSError: Initializing from file failed

Cause analysis:

read_csv() uses the C engine as its parser by default, and in some cases the C engine fails when the file path contains Chinese characters.


Solution:

Specify the Python engine when calling read_csv():

B = pd.read_csv("C:/Users/hp/Desktop/Hands-On Data Analysis/Unit 1 Project Collection/train.csv", engine='python')
B.head(3)
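
Another commonly suggested workaround, not taken from the original post, is to open the file in Python and hand the open file object to read_csv(), so that pandas never has to resolve the path itself. The sketch below assumes a small stand-in file; the temporary path and the column names are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for train.csv, written to a temporary directory.
tmp_dir = tempfile.mkdtemp()
csv_path = os.path.join(tmp_dir, "train.csv")
with open(csv_path, "w", encoding="utf-8", newline="") as f:
    f.write("PassengerId,Survived\n1,0\n2,1\n3,1\n")

# Python opens the file; pandas only parses the already-open handle.
with open(csv_path, "r", encoding="utf-8", newline="") as f:
    B = pd.read_csv(f)

print(B.head(3))
```

This keeps the default parser and only changes who opens the file, which may matter for large files where the Python engine is noticeably slower.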

[Solved] python Error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.

Python Error: ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Code that raises the error:

if code in list(changed_code['Old material code']):
    temp_index = changed_code.loc[changed_code['Old material code'] == code].index

Here code is a float.

Cause analysis:

pandas raises this ValueError when a whole Series is used where a single True/False is expected. In this membership test, the value being checked could not be a float.

Solution:

Convert the float to int before the test:

if int(code) in list(changed_code['Old material code']):
    temp_index = changed_code.loc[changed_code['Old material code'] == code].index

Problem solved!
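
The error message itself can be reproduced in a few lines. This is a minimal sketch with hypothetical data standing in for changed_code; it shows the general mechanism (a boolean Series used in an if statement) and the reductions named in the message, not the exact code path of the post.

```python
import pandas as pd

# Hypothetical data standing in for changed_code.
changed_code = pd.DataFrame({"Old material code": [1001, 1002, 1003]})
code = 1002.0  # a float, as in the post

# Element-wise comparison: the result is a boolean Series, not a single bool.
mask = changed_code["Old material code"] == code

ambiguous = False
try:
    if mask:  # a Series has no single truth value
        pass
except ValueError as err:
    ambiguous = True
    print(err)

# Reduce the Series to one bool explicitly before branching.
if mask.any():
    temp_index = changed_code.loc[mask].index
    print(list(temp_index))
```

Using `.any()` or `.all()` states explicitly whether one match or all matches are required, which is what the error message asks for.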

Python Pandas Error: KeyError: 0 [How to Solve]

Error: KeyError: 0

Code that raises the error

I call a function from my own library that uses apply to implement Excel's VLOOKUP. The call is:

data2 = super_function.vlook_up(data1, ['material group', 'material description'], data, ['material group', 'material group description'])

Error message

KeyError: 0

Cause

This is an index problem. Some rows were deleted during earlier processing of the data, so the index starts at 6 instead of 0.

Solution:

Reassign the index:

data1.index = list(range(len(data1)))

Result

Runs successfully.
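
The index problem and its fix can be reproduced on a small frame. This is a sketch with hypothetical data; it uses reset_index(drop=True), which has the same effect as reassigning the index by hand.

```python
import pandas as pd

# Hypothetical data; the first six rows are dropped during cleaning.
data1 = pd.DataFrame({"material group": list("abcdefghi")})
data1 = data1.iloc[6:]
print(list(data1.index))  # the labels now start at 6

# Label 0 no longer exists, so a label-based lookup fails.
try:
    data1.loc[0]
except KeyError as err:
    print("KeyError:", err)

# Renumber the rows from 0 and discard the old labels.
data1 = data1.reset_index(drop=True)
print(data1.loc[0, "material group"])
```

Passing drop=True prevents the old labels from being kept as an extra column.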

[Solved] Excel plug-in installation failed: unable to resolve the value of property 'type'

[Description of the problem]
A third-party Excel plug-in installation package (Figure 1) failed to install. I have never built an Excel plug-in installer myself; presumably it calls VSTOInstaller.exe.

The installation failed with the following message:

Error message: "The value of the property 'type' cannot be parsed. The error is: Could not load file or assembly 'Microsoft.Office.BusinessApplications.Fba,Version=14.0.0.0,Culture=nutral, PublicKeyToken=71e9ce111e9429c' or one of its dependencies. The system cannot find the file specified. (C:\Program Files\Common Files\Microsoft Shared\VSTO\10.0\VSTOInstaller.exe.Config Line 10)"

[Solution]
Locate the VSTO folder:

    1. C:\Program Files (x86)\Common Files\Microsoft shared\VSTO\10.0 or C:\Program Files\Common Files\Microsoft shared\VSTO\10.0.

Rename VSTOInstaller.exe.config, for example to VSTOInstaller.exe.config.old, and run the installer again; the installation then succeeds.

[Run Result]
After installation, the plug-in directory is present and the plug-in runs normally.