error_bad_lines not work, but how do I read data successfully? #57257

Closed
zysNLP opened this issue Feb 5, 2024 · 4 comments
Labels
Closing Candidate (May be closeable, needs more eyeballs), IO CSV (read_csv, to_csv), Needs Info (Clarification about behavior needed to assess issue), Usage Question

Comments

@zysNLP

zysNLP commented Feb 5, 2024

Since moving to pandas 2.2, I found that the "error_bad_lines" parameter has been dropped, but when I call pd.read_csv("unknown.csv") I get an error:

Traceback (most recent call last):
  File "D:\work\email_reply\data_process.py", line 11, in <module>
    df = pd.read_csv('./data/data_0101.csv', on_bad_lines="warn")
  File "D:\miniconda3\envs\py310\lib\site-packages\pandas\io\parsers\readers.py", line 1024, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "D:\miniconda3\envs\py310\lib\site-packages\pandas\io\parsers\readers.py", line 624, in _read
    return parser.read(nrows)
  File "D:\miniconda3\envs\py310\lib\site-packages\pandas\io\parsers\readers.py", line 1921, in read
    ) = self._engine.read( # type: ignore[attr-defined]
  File "D:\miniconda3\envs\py310\lib\site-packages\pandas\io\parsers\c_parser_wrapper.py", line 234, in read
    chunks = self._reader.read_low_memory(nrows)
  File "parsers.pyx", line 838, in pandas._libs.parsers.TextReader.read_low_memory
  File "parsers.pyx", line 905, in pandas._libs.parsers.TextReader._read_rows
  File "parsers.pyx", line 874, in pandas._libs.parsers.TextReader._tokenize_rows
  File "parsers.pyx", line 891, in pandas._libs.parsers.TextReader._check_tokenize_status
  File "parsers.pyx", line 2061, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Buffer overflow caught - possible malformed input file.

So how can I read this data successfully now? Is there a better way to deal with this?

@lithomas1
Member

You can look into the on_bad_lines parameter.
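
For context, `error_bad_lines` and `warn_bad_lines` were removed in pandas 2.0 in favour of `on_bad_lines`. The following is a minimal sketch of how that parameter can be used, assuming the problem rows simply have too many fields. The sample data and the `keep_first_fields` helper are made up for illustration and are not taken from the reporter's file. The traceback in this issue already passes `on_bad_lines="warn"`, so this option on its own may not be enough for that particular file.

```python
import io

import pandas as pd

# Hypothetical sample data: the second data row has one field too many.
raw = "a,b,c\n1,2,3\n4,5,6,7\n8,9,10\n"

# "skip" silently drops rows that have too many fields.
df_skip = pd.read_csv(io.StringIO(raw), on_bad_lines="skip")

# A callable receives the split fields of each bad row and returns a
# repaired list of fields (or None to drop the row). This sketch uses
# engine="python", which supports callables from pandas 1.4 onwards.
def keep_first_fields(fields):
    # Illustrative policy only: keep the first three fields.
    return fields[:3]

df_fixed = pd.read_csv(
    io.StringIO(raw),
    on_bad_lines=keep_first_fields,
    engine="python",
)

print(df_skip)
print(df_fixed)
```

Passing `on_bad_lines="warn"` behaves like `"skip"` but also emits a warning for each dropped row, which can help locate the offending lines.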

lithomas1 added the Usage Question, IO CSV, and Closing Candidate labels Feb 5, 2024
@cLOWNgOD

I'm seeing the identical error (down to nearly the same line numbers) on pandas 2.2.0 installed from pip. I tried updating to 2.2.1 before reading this, which didn't help. Has anyone found a solution?
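
One workaround that is often suggested for tokenizer errors from the C parser is to retry with `engine="python"`, which is slower but more tolerant of irregular input. Whether it helps here is an assumption, since the file behind this issue was never shared. The sketch below uses a hypothetical helper, `read_csv_with_fallback`, and made-up sample data that triggers a different `ParserError` (too many fields) purely to exercise the fallback path.

```python
import io

import pandas as pd


def read_csv_with_fallback(source, **kwargs):
    """Hypothetical helper: try the default parser, then the Python engine."""
    try:
        return pd.read_csv(source, **kwargs)
    except pd.errors.ParserError:
        # File-like objects must be rewound before a second attempt.
        if hasattr(source, "seek"):
            source.seek(0)
        # Assumption: dropping unparseable rows is acceptable for this data.
        fallback = {**kwargs, "engine": "python", "on_bad_lines": "skip"}
        return pd.read_csv(source, **fallback)


# Made-up data: the second data row has an extra field, so the first
# attempt raises ParserError and the fallback path is taken.
demo = io.StringIO("a,b\n1,2\n3,4,5\n6,7\n")
df = read_csv_with_fallback(demo)
print(df)
```

If the fallback also fails, the delimiter, quoting, or encoding of the file is worth checking before anything else.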

@phofl
Member

phofl commented Mar 18, 2024

Can you provide a reproducible example?
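
A reproducible example for a `read_csv` problem usually means a few lines of inline data plus the exact call, so that others can run it without the original file. The template below is only a sketch of that shape: the data is a placeholder that raises a different `ParserError` than the one reported, and would need to be replaced with the smallest slice of the real file that still fails.

```python
import io

import pandas as pd

# Include the pandas version so others can match the environment.
print(pd.__version__)

# Placeholder data, not from the reporter's file: the last row has an
# extra field. Replace with the smallest input that still fails.
raw = "a,b\n1,2\n3,4,5\n"

error_message = None
try:
    pd.read_csv(io.StringIO(raw))
except pd.errors.ParserError as exc:
    # Capture the message so it can be pasted into the report.
    error_message = str(exc)

print(error_message)
```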

phofl added the Needs Info label Mar 18, 2024
@mroeschke
Member

Closing as a usage question with no reproducible example to act on.
