BUG: Assignment of pyarrow arrays yields unexpected dtypes #56994

Open
3 tasks done
WillAyd opened this issue Jan 21, 2024 · 2 comments · May be fixed by #58601
Labels: Arrow (pyarrow functionality), Bug, Indexing (related to indexing on series/frames, not to indexes themselves)

Comments

@WillAyd
Member

WillAyd commented Jan 21, 2024

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import datetime
import pandas as pd
import pyarrow as pa

df = pd.DataFrame([[42]], columns=["col"])
df["int16"] = pa.array([16], type=pa.int16())
df["date"] = pa.array([datetime.date(2024, 1, 1)], type=pa.date32())
df["string"] = pa.array(["foo"], pa.string())

Issue Description

>>> df.dtypes
col               int64
int16             int16
date      datetime64[s]
string           object
dtype: object

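I am surprised that the pyarrow types are not preserved during assignment: the int16 column falls back to NumPy int16, the date32 column becomes datetime64[s], and the string column becomes object.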

Expected Behavior

>>> df.dtypes
col               int64
int16             int16[pyarrow]
date              date32[pyarrow]
string            string[pyarrow]
dtype: object

Installed Versions

on main

@WillAyd added the Bug, Needs Triage, Indexing, and Arrow labels and removed the Needs Triage label on Jan 21, 2024
@WillAyd changed the title from "BUG: Assignment of pyarrow string array yields unexpected dtypes" to "BUG: Assignment of pyarrow arrays yields unexpected dtypes" on Jan 21, 2024
@droussea2001
Contributor

take

@droussea2001
Contributor

Hi @WillAyd: in PR #58601 I propose that, during column assignment in a DataFrame, sanitize_array be called with a dtype equal to ArrowDtype(value.type) when the assigned value is a pa.lib.Array; otherwise the standard behaviour is kept.

Would that be acceptable?
