
Pandas read_csv pandas.errors.ParserError: Error tokenizing data

What you will learn
How to make pandas read_csv handle escaped commas and double quotes
Prepare the data

# test.csv or test.txt
"1","123","4","\"data\""
"test","123","4","if(\"data\" = \"<test>\", (10*24))"
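To reproduce the example, you can also create the file from Python. This is a minimal sketch that assumes the file should be named test.txt in the current working directory. The raw string keeps the backslashes exactly as they appear in the data above.

```python
from pathlib import Path

# The two sample rows; r'''...''' keeps each \" as a literal backslash + quote.
SAMPLE = r'''"1","123","4","\"data\""
"test","123","4","if(\"data\" = \"<test>\", (10*24))"
'''

# Assumption: test.txt in the current directory is the file read below.
Path("test.txt").write_text(SAMPLE, encoding="utf-8")
```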

Wrong way

import pandas as pd

# Fails: without escapechar, \" is not treated as an escaped quote
datas = pd.read_csv('test.txt', header=None, skip_blank_lines=True)

You get:

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas\_libs\parsers.pyx in pandas._libs.parsers.TextReader._tokenize_rows()

pandas\_libs\parsers.pyx in pandas._libs.parsers.raise_parser_error()

ParserError: Error tokenizing data. C error: Expected 4 fields in line 2, saw 5
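The same failure can be reproduced without a file. The sketch below reads the sample from an io.StringIO buffer and catches pandas.errors.ParserError. SAMPLE and error_message are illustrative names that do not come from the original post.

```python
import io

import pandas as pd

# Same two rows as test.txt, kept in memory so the snippet is self-contained.
SAMPLE = r'''"1","123","4","\"data\""
"test","123","4","if(\"data\" = \"<test>\", (10*24))"
'''

try:
    pd.read_csv(io.StringIO(SAMPLE), header=None)
    error_message = None
except pd.errors.ParserError as exc:
    # Without an escape character, the quote after the backslash closes the
    # quoted section, so the comma before " (10*24))" on line 2 is read as a
    # delimiter and the row ends up with 5 fields.
    error_message = str(exc)

print(error_message)
```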

Right way

import pandas as pd

# escapechar='\\' makes \" inside a quoted field a literal double quote
datas = pd.read_csv('test.txt', header=None, skip_blank_lines=True, escapechar='\\')
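Here is a quick check of what the escape character changes. It reads the sample from memory again, and SAMPLE is an illustrative name. With escapechar='\\', each \" inside a quoted field becomes a plain double quote, so the comma inside the formula stays part of the fourth field.

```python
import io

import pandas as pd

SAMPLE = r'''"1","123","4","\"data\""
"test","123","4","if(\"data\" = \"<test>\", (10*24))"
'''

# A backslash now escapes the following character inside quoted fields.
df = pd.read_csv(io.StringIO(SAMPLE), header=None, escapechar="\\")

print(df.shape)       # (2, 4)
print(df.iloc[0, 3])  # "data"
print(df.iloc[1, 3])  # if("data" = "<test>", (10*24))
```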

Digression

Many people online who hit this error add the parameter error_bad_lines=False. I tested this, and with the data above the second row is lost. (error_bad_lines was deprecated in pandas 1.3 and removed in 2.0. Its replacement, on_bad_lines='skip', drops the row in the same way.) If the file is not too large, you can print the offending lines with their line numbers:

cat -n filename | head -n end_line_no | tail -n +start_line_no
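For comparison, the sketch below uses the skip approach in current pandas (on_bad_lines='skip', available from pandas 1.3). It also defines a show_lines helper that mimics the cat/head/tail pipeline in Python. The helper's name and signature are hypothetical and are not part of pandas or the standard library.

```python
import io
from itertools import islice
from pathlib import Path

import pandas as pd

# Same two rows as test.txt (SAMPLE is an illustrative name).
SAMPLE = r'''"1","123","4","\"data\""
"test","123","4","if(\"data\" = \"<test>\", (10*24))"
'''

# on_bad_lines="skip" replaces the removed error_bad_lines=False.
# Line 2 yields 5 fields instead of 4, so it is dropped without an error.
skipped = pd.read_csv(io.StringIO(SAMPLE), header=None, on_bad_lines="skip")
print(len(skipped))  # 1


def show_lines(path, start, end):
    """Hypothetical helper: return (line_number, text) pairs for lines
    start..end (1-based, inclusive), similar to
    cat -n path | head -n end | tail -n +start."""
    with open(path, encoding="utf-8") as fh:
        numbered = enumerate(fh, start=1)
        return [(n, line.rstrip("\n")) for n, line in islice(numbered, start - 1, end)]


Path("test.txt").write_text(SAMPLE, encoding="utf-8")
for number, text in show_lines("test.txt", 2, 2):
    print(number, text)
```

Skipping only hides the problem, because the second row never reaches the DataFrame. Setting escapechar keeps it.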