Tag Archives: Python ValueError

[Solved] str.contains() ValueError: cannot index with vector containing NA / NaN values

Problem description:
When working with a DataFrame, run the following:

df[df.line.str.contains('G')]

The goal is to select every row of df whose value in the line column contains the character 'G'. Instead, pandas raises:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-10f8503f73f2> in <module>()
---->  df.line.str.contains('G')

D:\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2983 
   2984         # Do we have a (boolean) 1d indexer?
-> 2985         if com.is_bool_indexer(key):
   2986             return self._getitem_bool_array(key)
   2987 

D:\Anaconda3\lib\site-packages\pandas\core\common.py in is_bool_indexer(key)
    128             if not lib.is_bool_array(key):
    129                 if isna(key).any():
--> 130                     raise ValueError(na_msg)
    131                 return False
    132             return True

ValueError: cannot index with vector containing NA / NaN values

The message says that the line column contains NA or NaN values, and a search on Baidu turns up plenty of methods for deleting the NA / NaN values from the line column.

However, deleting the rows that contain NA / NaN in the line column still does not solve the problem. What now?

Solution:
It is simple. Most likely the elements of the line column are not all str; some may be int or another type. For a non-string element, str.contains() returns NaN rather than True or False, so the mask still contains NaN even after the missing rows are dropped.
So you just need to convert the whole line column to str.
The operation is as follows:

df['line'] = df['line'].apply(str)  # Convert every element of the line column to str

df[df.line.str.contains('G')]  # Then run the original statement

Problem solved!
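
As a hedged illustration of the fix above, the sketch below builds a small made-up DataFrame (the values are hypothetical, not from the original data) whose line column mixes strings, an integer, and a missing value. It also shows the na=False argument of str.contains(), an alternative not mentioned above that treats entries without a True/False answer as False and leaves the column unchanged.

```python
import pandas as pd

# Hypothetical data: strings, an int, and a missing value in one column.
df = pd.DataFrame({"line": ["G1", "B2", 7, None, "xG"]})

# Non-string entries (7 and None) produce NaN in the mask instead of True/False.
mask = df["line"].str.contains("G")
print(mask.isna().sum())

# Option A (the fix described above): convert every element to str first.
as_str = df["line"].apply(str)
result_a = df[as_str.str.contains("G")]

# Option B (an alternative): keep the column as-is and count NaN as "no match".
result_b = df[df["line"].str.contains("G", na=False)]

print(result_a)
print(result_b)
```

Note that Option A turns missing values into ordinary text such as 'None' or 'nan', which matters if the column is searched for those letters later.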

Python 3.x error: ValueError: data type must provide an itemsize

1. Overview

The error occurs when multiplying the loaded data.
The cause is that the data matrix holds strings (read from a file).
The solution is to convert the data row by row, turning each string into a floating-point number or an integer.

2. Solutions

2.1 Code that triggers the error
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# @Time    : 2019/2/21 14:16
# @Author  : Arrow and Bullet
# @FileName: error.py
# @Software: PyCharm


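def loadDataSet(fileName, numFeat=None):
    # numFeat was undefined in the original snippet; as an assumption it is
    # exposed here as an optional parameter (None keeps every column).
    dataMat = []
    with open(fileName) as fr:  # the file is closed automatically
        for line in fr:
            currLineListStr = line.strip().split("\t")
            dataMat.append(currLineListStr[0:numFeat])  # values are still strings
    return dataMat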

The data read here are all strings, for example:

# [['1.000000', '0.067732'], ['1.000000', '0.427810']]

When you then multiply with a matrix like this, the error "data type must provide an itemsize" is raised.
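
A quick way to confirm this diagnosis before multiplying is to look at the dtype NumPy assigns to the loaded rows. The sketch below is only an illustration: it reuses the two example rows shown above, and the variable names are made up.

```python
import numpy as np

# The example rows from above, still stored as text.
rows_as_text = [['1.000000', '0.067732'], ['1.000000', '0.427810']]

arr = np.array(rows_as_text)

# dtype.kind is 'U' for Unicode strings, 'S' for bytes, 'O' for generic objects;
# numeric arrays report 'f' (float) or 'i' (int) instead.
is_text = arr.dtype.kind in ("U", "S", "O")
print(arr.dtype, is_text)
```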

2.2 Correct code (solution)

def loadDataSet(fileName, numFeat=None):
    # numFeat was undefined in the original snippet; as an assumption it is
    # exposed here as an optional parameter (None keeps every column).
    dataMat = []
    with open(fileName) as fr:  # the file is closed automatically
        for line in fr:
            currLineListStr = line.strip().split("\t")
            # Convert the string data to floating-point numbers row by row
            currLineListFloat = [float(i) for i in currLineListStr]
            dataMat.append(currLineListFloat[0:numFeat])
    return dataMat

The data read here are all floating-point numbers, for example:

# [[1.0, 0.067732], [1.0, 0.42781]]

Multiplying with such a matrix then raises no error.
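
If the rows have already been loaded as strings, the conversion can also be done in one step with NumPy instead of a Python loop. This is an alternative to the loop above, not the author's method; the sketch reuses the example rows, and the variable names are made up.

```python
import numpy as np

# Rows as they come out of the string-based loader.
rows_as_text = [['1.000000', '0.067732'], ['1.000000', '0.427810']]

# astype(float) parses every string element into a 64-bit float.
data = np.array(rows_as_text).astype(float)

# With a numeric dtype, matrix multiplication works as expected.
product = data.T @ data
print(data.dtype)
print(product)
```

astype(float) raises a ValueError of its own if any cell is not a valid number, which makes a malformed input file easy to spot.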

I hope this helps. If you have any questions, leave a comment. If anything here is not detailed enough, say so and I will reply promptly.

Python ValueError: only 2 non-keyword arguments accepted

The following code raises this error because the matrix is not written in a valid form: each row is passed to np.array as a separate argument. The fix is to wrap the group of rows in one more pair of square brackets. See below for details.

Original code

import time
import numpy as np

A = np.array([56.0, 0.0, 4.4, 68.0],
             [1.2, 104.0, 52.0, 8.0],
             [1.8, 135.0, 99.0, 0.9])

cal=A.sum(axis=0)
print(cal)

After modification

import time
import numpy as np

A = np.array([[56.0, 0.0, 4.4, 68.0],
              [1.2, 104.0, 52.0, 8.0],
              [1.8, 135.0, 99.0, 0.9]])

cal=A.sum(axis=0)
print(cal)
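
The reason the extra brackets matter is that np.array takes the data as its first argument and dtype as its second, so separate row lists are read as extra arguments rather than as rows. The sketch below reuses the matrix from the corrected code and checks its shape; the names col_totals and row_totals are only illustrative.

```python
import numpy as np

# One nested list: the outer brackets hold the rows, each inner list is a row.
A = np.array([[56.0, 0.0, 4.4, 68.0],
              [1.2, 104.0, 52.0, 8.0],
              [1.8, 135.0, 99.0, 0.9]])

print(A.shape)  # 3 rows, 4 columns

col_totals = A.sum(axis=0)  # sum down each column (one value per column)
row_totals = A.sum(axis=1)  # sum across each row (one value per row)
print(col_totals)
print(row_totals)
```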

Python error: ValueError: If using all scalar values, you must pass an index (four solutions)

1. Error scenario:

import pandas as pd
data_dict = {'a': 1, 'b': 2, 'c': 3}  # every value is a scalar
data = pd.DataFrame(data_dict)  # raises the ValueError

2. Error reason:

When every value in the dictionary is a scalar, pandas cannot tell how many rows the DataFrame should have, so you must supply an index when creating the DataFrame object.
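
To see the failure in isolation, the following sketch catches the exception and keeps its message. The dictionary is the same as in the error scenario; the variable names are illustrative.

```python
import pandas as pd

scalars = {'a': 1, 'b': 2, 'c': 3}

message = None
try:
    pd.DataFrame(scalars)  # no index, and no value has a length
except ValueError as err:
    message = str(err)

print(message)
```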

3. Solution:

Creating DataFrame objects from dictionaries is a common requirement, and the right way to write it depends on the form of the values. Any of the four methods below avoids this error. The first three produce the same one-row DataFrame, while the fourth builds one row per key/value pair instead. Choose whichever suits your needs.

import pandas as pd

# Method 1: Set the index directly when creating the DataFrame
data_dict = {'a': 1, 'b': 2, 'c': 3}
data = pd.DataFrame(data_dict, index=[0])
print(data)

# Method 2: Build the DataFrame from the scalar-valued dictionary with from_dict, then transpose
data_dict = {'a': 1, 'b': 2, 'c': 3}
data = pd.DataFrame.from_dict(data_dict, orient='index').T
print(data)

# Method 3: Wrap each scalar value in a list before passing the dictionary in
data_dict = {'a': [1], 'b': [2], 'c': [3]}
data = pd.DataFrame(data_dict)
print(data)

# Method 4: Take out the keys and values and convert them to a list of pairs
# (this yields one row per key/value pair, with columns 0 and 1)
data_dict = {'a': 1, 'b': 2, 'c': 3}
data = pd.DataFrame(list(data_dict.items()))
print(data)
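
To compare what the four methods actually build, the sketch below runs them side by side on the same dictionary and prints each shape. The names m1 to m4 and the column labels "key" and "value" are illustrative additions, not part of the original code.

```python
import pandas as pd

scalars = {'a': 1, 'b': 2, 'c': 3}

m1 = pd.DataFrame(scalars, index=[0])                         # Method 1
m2 = pd.DataFrame.from_dict(scalars, orient='index').T        # Method 2
m3 = pd.DataFrame({k: [v] for k, v in scalars.items()})       # Method 3
m4 = pd.DataFrame(list(scalars.items()),
                  columns=["key", "value"])                   # Method 4, with named columns

# Methods 1-3: one row, one column per key. Method 4: one row per key.
for name, frame in [("m1", m1), ("m2", m2), ("m3", m3), ("m4", m4)]:
    print(name, frame.shape)
```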

[How to Fix] ValueError: The truth value of a Series is ambiguous

ValueError: The truth value of a Series is ambiguous.

You are most likely using pandas when this error occurs. If so, you have found the solution.

# General example

FI_lasso[(FI_lasso["columns"]<0.001) and (FI_lasso["columns"]>=0)]
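
The reason this fails is that Python's and needs a single True or False from its left operand, and a Series with several elements has no single truth value. The sketch below reproduces that on a small made-up Series (the numbers are hypothetical) and captures the message.

```python
import pandas as pd

# Hypothetical values standing in for FI_lasso["columns"].
s = pd.Series([-0.5, 0.0, 0.0005, 0.002])

message = None
try:
    # `and` calls bool() on (s < 0.001), which pandas refuses to answer.
    result = (s < 0.001) and (s >= 0)
except ValueError as err:
    message = str(err)

print(message)
```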

If you hit this problem too, the solution is simple.

The key is to use & in place of and, and | in place of or,
for element-wise logical tests on a DataFrame, and to wrap each comparison in parentheses.

Like this:


FI_lasso[(FI_lasso["columns"]<0.001) & (FI_lasso["columns"]>=0)]
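
A small runnable version of the same filter, assuming a made-up FI_lasso frame with a single column named "columns" (the values are hypothetical). It also shows | for "or" and ~ for "not", which follow the same element-wise rule.

```python
import pandas as pd

# Hypothetical stand-in for the FI_lasso frame used above.
FI_lasso = pd.DataFrame({"columns": [-0.5, 0.0, 0.0005, 0.002]})

small = FI_lasso["columns"] < 0.001
non_negative = FI_lasso["columns"] >= 0

in_range = FI_lasso[small & non_negative]          # element-wise AND
out_of_range = FI_lasso[~small | ~non_negative]    # element-wise OR with NOT

print(in_range)
print(out_of_range)
```

The parentheses in the one-line form matter because & and | bind more tightly than comparison operators such as < and >=.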

Problem solved.