Tag Archives: Pandas

[Solved] pandas ExcelWriter AttributeError: 'NoneType' object has no attribute 'group'

Writing to a specific cell fails. When the following two statements are executed, xlsxwriter raises the AttributeError shown in the traceback after them:

name_format = workbook.add_format({'font_color': '#0000FF'})
worksheet.write('k1', os.path.basename(gtk_agile_bom_path), name_format)
worksheet.write('k1', os.path.basename(gtk_agile_bom_path), name_format)
  File "E:\Gerrit_Project\pyenv\bomflyenv\lib\site-packages\xlsxwriter\worksheet.py", line 82, in cell_wrapper
    new_args = xl_cell_to_rowcol(first_arg)
  File "E:\Gerrit_Project\pyenv\bomflyenv\lib\site-packages\xlsxwriter\utility.py", line 126, in xl_cell_to_rowcol
    col_str = match.group(2)
AttributeError: 'NoneType' object has no attribute 'group'

Solution:

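The lowercase column letter in 'k1' is the problem: the cell reference is not recognized, so the regular-expression match in xl_cell_to_rowcol is None. Changing the reference to uppercase 'K1' makes the write succeed: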

worksheet.write('K1', os.path.basename(gtk_agile_bom_path), name_format)
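When cell references come from user input or a config file, one way to avoid this error is to normalize them before they reach worksheet.write. The following is a minimal sketch using only the standard library. The helper name normalize_cell_ref is hypothetical (it is not part of xlsxwriter), and the pattern is only my approximation of A1 notation: one to three column letters followed by a row number.

```python
import re

# Hypothetical helper, not part of xlsxwriter: upper-case an A1-style
# cell reference such as 'k1' so that it is passed on as 'K1'.
# The pattern is an assumed approximation of A1 notation.
_A1_PATTERN = re.compile(r"^\$?[A-Za-z]{1,3}\$?[0-9]+$")


def normalize_cell_ref(ref):
    """Return the reference in upper case, or raise ValueError if it is malformed."""
    ref = ref.strip()
    if not _A1_PATTERN.match(ref):
        raise ValueError(f"not an A1-style cell reference: {ref!r}")
    return ref.upper()


print(normalize_cell_ref("k1"))
print(normalize_cell_ref("ab12"))
```

It would be used as worksheet.write(normalize_cell_ref('k1'), value, name_format). Another option is to skip A1 notation entirely and pass zero-indexed row and column numbers, for example worksheet.write(0, 10, value, name_format) for cell K1.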

[Solved] raise KeyError(key) from err KeyError: 'Dates'

Description of error reporting:

While reading data from an Excel file and processing it, pandas raised the error in the title: KeyError: 'Dates'.

Error reason:

The Excel data read in by pandas is not aligned, so the 'Dates' column is not where the code expects it. Check the data that was actually read in.
Printing the DataFrame right after reading the Excel file makes the misalignment visible.

Solution:

Delete Sheet2 and Sheet3 from the Excel workbook so that the data pandas reads is aligned.

The root cause of this kind of problem is misaligned data.

Debug method:
1. Print the DataFrame and check the layout of the data after it is read in.
2. Then adjust the data that was read.
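The two debugging steps above can be sketched as follows. This is only an illustration under assumptions: require_columns is a hypothetical helper, and the small DataFrame is made-up data standing in for a misaligned Excel read (the original workbook is not available here).

```python
import pandas as pd


def require_columns(df, required):
    """Hypothetical helper: raise a descriptive KeyError if a column is absent."""
    missing = [name for name in required if name not in df.columns]
    if missing:
        raise KeyError(
            f"missing columns {missing}; columns actually read: {list(df.columns)}"
        )
    return df


# Made-up frame standing in for a misaligned read: the real header ended up
# in the first data row, so the column labels are placeholders.
misaligned = pd.DataFrame(
    [["Dates", "Price"], ["2021-01-01", 10.5]],
    columns=["Unnamed: 0", "Unnamed: 1"],
)

# Step 1: print the DataFrame and look at what was actually read.
print(misaligned)
try:
    require_columns(misaligned, ["Dates"])
except KeyError as err:
    print("check failed:", err)

# Step 2: adjust the data, here by promoting the first row to the header.
fixed = misaligned.iloc[1:].reset_index(drop=True)
fixed.columns = list(misaligned.iloc[0])
print(require_columns(fixed, ["Dates"])["Dates"].tolist())
```

Checking for the column up front turns a bare KeyError: 'Dates' into a message that lists the columns pandas actually found, which usually points straight at the alignment problem.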

[Solved] Python Pandas Read Error: OSError: initializing from file failed

Problem Description:

An error occurs when loading a CSV file with pandas:

B = pd.read_csv("C:/Users/hp/Desktop/Hands-On Data Analysis/Unit 1 Project Collection/train.csv")
B.head(3)

Error reported:

OSError: Initializing from file failed

Cause analysis:

By default, pandas read_csv() uses the C engine as its parser. When the file path contains Chinese characters, the C engine can fail in some cases.


Solution:

Specify the Python engine when calling read_csv():

B = pd.read_csv("C:/Users/hp/Desktop/Hands-On Data Analysis/Unit 1 Project Collection/train.csv", engine='python')
B.head(3)
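A commonly used alternative is to let Python open the file and hand the open file object to read_csv(), which accepts file-like objects as well as paths. The sketch below assumes a UTF-8 encoded file. The helper name read_csv_via_handle, the demo file name, and the demo columns are all made up for illustration.

```python
import os
import tempfile

import pandas as pd


def read_csv_via_handle(path, encoding="utf-8", **kwargs):
    """Hypothetical helper: Python opens the file, pandas only parses it."""
    with open(path, encoding=encoding, newline="") as handle:
        return pd.read_csv(handle, **kwargs)


# Demo with a throwaway file whose name contains Chinese characters.
# The file name and the columns are invented for this example.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "数据_train.csv")
    with open(demo_path, "w", encoding="utf-8", newline="") as out:
        out.write("PassengerId,Survived\n1,0\n2,1\n3,1\n")
    B = read_csv_via_handle(demo_path)

print(B.head(3))
```

Because the path is resolved by Python's own open(), the parser never has to interpret the file name itself.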

[Solved] SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 10-11: malformed

#Read a *.txt file using the read_table() function in the Pandas library
data = pd.read_table(r'D:\New\test.txt',delimiter=',',encoding = 'UTF-8')
print(data)

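Solution to the error in the title: put r before the path string, as in the code above. The r prefix makes it a raw string, so the backslashes in the Windows path are not treated as escape sequences.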

python D:\New\MyTest.py
        name  date   id
0   jianghu  20210201  00001
1  jianghu1  20210202  00002
2  jianghu2  20210203  00003
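To make the effect of the r prefix concrete, here is a small standard-library sketch. It shows three equivalent ways to write the same path and reproduces the SyntaxError by compiling the unprefixed literal from a string, so the demo script itself stays valid. The variable names are mine.

```python
from pathlib import PureWindowsPath

raw_path = r'D:\New\test.txt'        # raw string: backslashes are kept as-is
escaped_path = 'D:\\New\\test.txt'   # every backslash doubled
forward_path = 'D:/New/test.txt'     # forward slashes avoid escapes entirely

same_text = raw_path == escaped_path
same_path = PureWindowsPath(raw_path) == PureWindowsPath(forward_path)
print(same_text, same_path)

# The source text below is: path = 'D:\New\test.txt'
# Without the r prefix, \N is read as the start of a named Unicode escape.
bad_source = "path = 'D:\\New\\test.txt'"
try:
    compile(bad_source, "<demo>", "exec")
    error_message = None
except SyntaxError as err:
    error_message = str(err)
print(error_message)
```

Any of the three spellings works with pd.read_table; the raw string is simply the smallest change to an existing Windows path.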

[Solved] AttributeError: module 'pandas' has no attribute 'rolling_count'

Problem Description:

I ran into this problem while experimenting with automated modeling. I used the iris dataset to initialize the AutoML framework and passed in the training data. The last line, the call to fit, raised AttributeError: module 'pandas' has no attribute 'rolling_count'. Posts on the Internet blamed the pandas version, but reinstalling pandas did not help.

The framework is Microsoft's FLAML automated modeling library, installed directly with pip install flaml. The code:

from flaml import AutoML
from sklearn.datasets import load_iris
import pandas as pd



iris = load_iris()
iris_data = pd.concat([pd.DataFrame(iris.data),pd.Series(iris.target)],axis=1)
iris_data.columns = ["_".join(feature.split(" ")[:2]) for feature in iris.feature_names]+["target"]
iris_data = iris_data[(iris_data.target==0) |(iris_data.target==1)]


flaml_automl = AutoML()
flaml_automl.fit(pd.DataFrame(iris_data.iloc[:,:-1]),iris_data.iloc[:,-1],time_budget=10,estimator_list=['lgbm','xgboost'])

Upgrading dask (pip install --upgrade dask) finally solved the problem, and the code ran normally. Oddly, the error message does not mention dask at all. Some bloggers say that dask provides interfaces to pandas and NumPy, so the error may be caused by an outdated version of that interface.
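Since the traceback does not name the package at fault, it helps to list the installed versions of everything involved before upgrading anything. A minimal sketch using only the standard library (Python 3.8 or later); installed_versions is a hypothetical helper name, and the package list is just the three packages mentioned in this post.

```python
from importlib import metadata


def installed_versions(package_names):
    """Map each package name to its installed version, or None if it is absent."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Packages named in this post; the output depends on the local environment.
report = installed_versions(["pandas", "dask", "flaml"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

A very old dask next to a recent pandas would be a hint that dask is still calling a top-level pandas function that newer pandas releases no longer provide.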

ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied

Error content: ImportError: C extension: DLL load failed: access denied. not built. If you want to import pandas from the source directory, you may need to run 'python setup.py build_ext --inplace --force' to build the C extensions first.

Possible cause: a file that pandas depends on was deleted from the environment, either by mistake or by antivirus software. Usually it is the latter.

Solution 1: uninstall pandas with pip uninstall pandas, then reinstall it with pip install pandas.

—————————————————————————
If you use the method above, you may hit another error.

Error content: ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: 'your project path\venv\lib\site-packages\pandas\_libs\tslibs\period.cp36-win_amd64.pyd'
Check the permissions.

Cause: files are missing from the folder named in the error, and pip cannot put them back.

Solution 2: delete the folder and reinstall pandas. In this example, delete the pandas folder under site-packages. Be careful to delete only the pandas folder inside site-packages, not site-packages itself. If you delete site-packages, the whole environment is gone.

—————————————————————————

If you still get the previous error, congratulations: this one had me stuck for half an hour.

Why? The reason is antivirus software. I don't need to spell out which one, everyone knows: the one named after the sum of the interior angles of a circle. Because it was running, it deleted the .pyd file while pandas was being installed.

Solution 3: turn off the antivirus software, then try Solution 2 again. It should succeed.
If it still fails, you are just unlucky. Just kidding: send me a private message and we can work through the pitfall together!

Pandas read_csv pandas.errors.ParserError: Error tokenizing data

What you will learn:
How to make pandas read_csv handle escaped commas and double quotes
Prepare the data

# test.csv or test.txt
"1","123","4","\"data\""
"test","123","4","if(\"data\" = \"<test>\", (10*24))"

Wrong way

import pandas as pd

datas = pd.read_csv('test.txt', header=None, skip_blank_lines=True)

You get

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

pandas\_libs\parsers.pyx in pandas._libs.parsers.raise_parser_error()

ParserError: Error tokenizing data. C error: Expected 4 fields in line 2, saw 5

Right way

import pandas as pd

datas = pd.read_csv('test.txt', header=None, skip_blank_lines=True, escapechar='\\')

Digression

Many people who hit this problem add the parameter error_bad_lines=False (tested: for the data above, the second row is lost). Newer pandas versions replace this parameter with on_bad_lines='skip'. If the file is not large, you can inspect the offending lines directly: cat -n filename | head -n end_line_no | tail -n +start_line_no
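On systems without cat, head and tail, the same inspection can be done in Python. The following is a rough standard-library equivalent of that pipeline. The function name numbered_lines is made up, and the demo writes the two sample rows from this post to a temporary file.

```python
import os
import tempfile
from itertools import islice


def numbered_lines(path, start_line_no, end_line_no, encoding="utf-8"):
    """Return (line_number, text) pairs for lines start..end, 1-indexed and inclusive."""
    with open(path, encoding=encoding) as handle:
        selected = islice(handle, start_line_no - 1, end_line_no)
        return [
            (number, line.rstrip("\n"))
            for number, line in enumerate(selected, start=start_line_no)
        ]


# The two sample rows from this post, written to a throwaway file.
rows = [
    r'"1","123","4","\"data\""',
    r'"test","123","4","if(\"data\" = \"<test>\", (10*24))"',
]
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "test.txt")
    with open(demo_path, "w", encoding="utf-8") as out:
        out.write("\n".join(rows) + "\n")
    # Line 2 is the one named in "Expected 4 fields in line 2, saw 5".
    line_two = numbered_lines(demo_path, 2, 2)

for number, text in line_two:
    print(number, text)
```

Looking at the raw line usually shows at once whether the extra field comes from an escaped quote, a stray delimiter, or a genuinely malformed row.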

Pandas Error: ValueError: setting an array element with a sequence.

Pandas apply returns multiple columns

Originally, I wanted to process the DataFrame row by row with np.vectorize() and return several new fields, but this raised ValueError: setting an array element with a sequence.

import numpy as np
import pandas as pd

def test():
    arr = np.random.randn(4,4)
    cols = ['a', 'b', 'c']
    df = pd.DataFrame(data=arr,columns=['e','f','g','h'])
    def func(a,b,c):
        output1 = a+1
        output2 = b*2
        output3 = c-4
        return pd.Series([output1,output2,output3])
    vfunc = np.vectorize(func)
    # This is the line that raises the ValueError
    df[cols] = vfunc(df['e'],df['f'],df['g'])
    print(df)
test()

The error occurs because the df[cols] being assigned does not match the dimensions of what vfunc returns: the shape of the returned data does not match the target columns. Using apply solves it. The parameter result_type="expand" means that the result is converted into columns, so each returned value becomes the value of one column in the resulting DataFrame. In apply(func), the number of values returned by func must equal the number of columns in df[cols].

import numpy as np
import pandas as pd

def test():
    arr = np.random.randn(4,4)
    cols = ['a', 'b', 'c']
    df = pd.DataFrame(data=arr,columns=['e','f','g','h'])
    def func(row):
        a,b,c = row['e'],row['f'],row['g']
        output1 = a+1
        output2 = b*2
        output3 = c-4
        # Return a plain tuple; result_type="expand" spreads it over the columns
        return output1,output2,output3
    df[cols] = df.apply(func,axis=1, result_type="expand")
    print(df)
test()

Output:

          e         f         g         h         a         b         c
0  0.493280 -0.092513 -3.014135 -0.361842  1.493280 -0.185027 -7.014135
1  0.300695 -0.745392  0.591653 -1.752471  1.300695 -1.490785 -3.408347
2 -0.033944 -1.556307 -0.359979  1.808213  0.966056 -3.112615 -4.359979
3  0.701741 -0.272337  0.041114  0.150049  1.701741 -0.544674 -3.958886
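For arithmetic as simple as in this example, neither np.vectorize nor apply is strictly needed, because operations on whole columns are already vectorized. The sketch below is an alternative I am adding, not the approach from the original post. It builds the same three columns with DataFrame.assign and checks them against the apply version. The seeded random generator is only there to make the demo repeatable.

```python
import numpy as np
import pandas as pd

# Seeded generator so the demo data is repeatable (an assumption of this sketch).
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((4, 4)), columns=['e', 'f', 'g', 'h'])

# Column arithmetic is vectorized, so each new field is a single expression.
vectorized = df.assign(a=df['e'] + 1, b=df['f'] * 2, c=df['g'] - 4)


# The apply-based version from the post, for comparison.
def func(row):
    return row['e'] + 1, row['f'] * 2, row['g'] - 4


applied = df.copy()
applied[['a', 'b', 'c']] = df.apply(func, axis=1, result_type='expand')

results_match = np.allclose(
    vectorized[['a', 'b', 'c']].to_numpy(),
    applied[['a', 'b', 'c']].to_numpy(),
)
print(vectorized)
print(results_match)
```

apply with result_type="expand" remains the better fit when the per-row logic cannot be written as column expressions, for example when it branches on several fields.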

For a single column

df['id'] 

And

ID = ['id']
df[ID]

The results are different. The former is a Series with the values [1, 2, 3, 4], and the latter is a one-column DataFrame with the values [[1], [2], [3], [4]].
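A quick way to see the difference is to compare the types and shapes of the two results. This is a small sketch with a made-up four-row frame:

```python
import pandas as pd

# Made-up data matching the values mentioned in the text.
df = pd.DataFrame({'id': [1, 2, 3, 4]})

single = df['id']   # string key: one-dimensional Series
ID = ['id']
framed = df[ID]     # list key: two-dimensional DataFrame with one column

print(type(single).__name__, single.shape)
print(type(framed).__name__, framed.shape)
print(single.tolist())
print(framed.values.tolist())
```

This matters for the assignment df[cols] = ... shown earlier: a list of column names on the left-hand side expects two-dimensional data on the right.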

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html

AttributeError: DatetimeProperties object has no attribute

1. Question

AttributeError: 'DatetimeProperties' object has no attribute 'weekday_name'

Simple test, run the following code:

import pandas as pd

# Create dates
dates = pd.Series(pd.date_range("7/26/2021", periods=3, freq="D"))
# Check the day of the week
print(dates.dt.weekday_name)
# Show only values
print(dates.dt.weekday)

2. Solution

Change weekday_name to day_name():

import pandas as pd

# Create dates
dates = pd.Series(pd.date_range("7/26/2021", periods=3, freq="D"))
# Check the day of the week
print(dates.dt.day_name())
# Show only values
print(dates.dt.weekday)
