BUG: possible inconsistency between inplace=True and inplace=False in DataFrame.where/mask #57083

Open
3 tasks done
yuanx749 opened this issue Jan 26, 2024 · 4 comments · May be fixed by #58576
Labels
Bug Conditionals E.g. where, mask, case_when inplace Relating to inplace parameter or equivalent

Comments

@yuanx749
Contributor

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this bug exists on the latest version of pandas.

  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

# examples from docstrings, inplace=False
s = pd.Series(range(5))
t = pd.Series([True, False])
print(s.where(t, 99))
# 0     0
# 1    99
# 2    99
# 3    99
# 4    99
# dtype: int64
print(s.mask(t, 99))
# 0    99
# 1     1
# 2    99
# 3    99
# 4    99
# dtype: int64

# inplace=True
s = pd.Series(range(5))
s.where(t, 99, inplace=True)
print(s)
# 0     0
# 1    99
# 2     2
# 3     3
# 4     4
# dtype: int64
s = pd.Series(range(5))
s.mask(t, 99, inplace=True)
print(s)
# 0    99
# 1     1
# 2     2
# 3     3
# 4     4
# dtype: int64

Issue Description

The first two examples are from the docstrings of DataFrame.where and DataFrame.mask. They agree with the documentation on how the values of cond are filled at misaligned index positions.
However, with inplace=True, both where and mask give results that differ from those with inplace=False.

Expected Behavior

I would expect the inplace parameter not to affect the results. However, I noticed the first line of the code below in the source of where, so I wonder: is this behaviour expected?
Thank you in advance.

pandas/pandas/core/generic.py

Lines 10665 to 10674 in d928a5c

# make sure we are boolean
fill_value = bool(inplace)
with warnings.catch_warnings():
    warnings.filterwarnings(
        "ignore",
        "Downcasting object dtype arrays",
        category=FutureWarning,
    )
    cond = cond.fillna(fill_value)
cond = cond.infer_objects(copy=False)
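
To illustrate why that fill value matters, here is a minimal sketch that aligns cond by hand and then calls where with a fully aligned condition. It is not the pandas implementation: the helper name where_with_fill is hypothetical, and it uses reindex(fill_value=...) as a stand-in for the align-then-fillna step in the excerpt.

```python
import pandas as pd

s = pd.Series(range(5))
t = pd.Series([True, False])


def where_with_fill(ser, cond, other, fill_value):
    # Hypothetical helper: extend cond to the caller's index, using
    # fill_value for labels that are missing from cond (here 2, 3, 4).
    aligned = cond.reindex(ser.index, fill_value=fill_value)
    # cond is now fully aligned, so no implicit filling is left to do.
    return ser.where(aligned, other)


# Missing positions treated as False: they are replaced by `other`.
out_false = where_with_fill(s, t, 99, fill_value=False)
# Missing positions treated as True: the original values are kept.
out_true = where_with_fill(s, t, 99, fill_value=True)

print(out_false.tolist())
print(out_true.tolist())
```

Filling with False reproduces the inplace=False output of where in the example above, and filling with True reproduces the inplace=True output.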

Installed Versions

INSTALLED VERSIONS

commit : 4c520e3
python : 3.10.13.final.0
python-bits : 64
OS : Linux
OS-release : 5.10.16.3-microsoft-standard-WSL2
Version : #1 SMP Fri Apr 2 22:23:49 UTC 2021
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.0dev0+743.g4c520e35f9
numpy : 1.26.2
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 68.2.2
pip : 23.3.1
Cython : 3.0.5
pytest : 7.4.3
hypothesis : 6.91.0
sphinx : 7.2.6
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.18.1
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.2
numba : 0.58.1
numexpr : 2.8.7
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : 14.0.1
pyreadstat : None
python-calamine : None
pyxlsb : 1.0.10
s3fs : None
scipy : None
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : 2.0.1
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

@yuanx749 yuanx749 added Bug Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 26, 2024
@rhshadrach
Member

Thanks for the report. Agreed this looks suspect, but the code seems quite deliberate. I haven't been able to track down where this behavior was introduced; I think the origin should be better understood.

Note that these methods will retain inplace under PDEP-8.

@rhshadrach rhshadrach added inplace Relating to inplace parameter or equivalent Conditionals E.g. where, mask, case_when Needs Discussion Requires discussion from core team before further action and removed Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 26, 2024
@mitlabence
Contributor

The relevant commit seems to be this, with the corresponding comment. I believe the corresponding Python version is 3.1-3.2. How would one go about testing with such an old release?

@rhshadrach
Member

Thanks for finding this! I don't think we need to test - understanding comes from the discussion around the changes made.

It does seem to me the comment you found has things backwards, even according to the docstring at the time:

Return a DataFrame with the same shape as self and whose corresponding entries are from self where cond is True and otherwise are from other.

I think this is easy to mix up (especially since the semantics are somewhat different from np.where).
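
For readers comparing the two, a small sketch of the difference in semantics, using a fully aligned condition so that no index filling is involved (the data values are made up for illustration):

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3, 4])
cond = s > 2

# Series.where keeps values from self where cond is True, else uses other.
kept = s.where(cond, -1)
# Series.mask is the inverse: it replaces values where cond is True.
masked = s.mask(cond, -1)
# np.where takes both branches explicitly: x where cond is True, else y.
via_numpy = np.where(cond, s, -1)

print(kept.tolist())
print(masked.tolist())
print(via_numpy.tolist())
```

In np.where the condition chooses between two explicit arguments, whereas in Series.where the "True" branch is implicitly the calling object, which may be part of why the direction is easy to invert.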

@rhshadrach rhshadrach removed the Needs Discussion Requires discussion from core team before further action label Apr 10, 2024
@mitlabence
Contributor

mitlabence commented May 5, 2024

To my understanding, there is now an inconsistency between what the documentation of mask and where says about misaligned indices (they are replaced by other, as in the inplace=False examples above) and what bracket indexing is expected to do:

import pandas as pd
df = pd.DataFrame({"a" : [0, 1, 2, -3], "b": [0, -1, 2, 3]})
#    a  b
# 0  0  0
# 1  1 -1
# 2  2  2
# 3 -3  3
df[df[:-1] < 0] = 4
#    a  b
# 0  0  0
# 1  1  4
# 2  2  2
# 3 -3  3

This latter behavior is expected in the tests here, here and here.
It is also (obviously) syntactically similar to inplace mask.
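
As a sketch of what that expectation amounts to, the condition can be aligned explicitly, treating the missing row as False, before calling mask. This only illustrates the semantics; it is an assumption about how to express them, not how pandas implements bracket assignment.

```python
import pandas as pd

df = pd.DataFrame({"a": [0, 1, 2, -3], "b": [0, -1, 2, 3]})

# The condition covers rows 0..2 only; row 3 is missing from it.
cond = df[:-1] < 0
# Treat the missing row as False so that mask leaves it untouched.
aligned = cond.reindex(df.index, fill_value=False)

result = df.mask(aligned, 4)
print(result)
```

With the missing row filled as False, only the -1 in column b is replaced and the -3 in row 3 is left alone, which matches the bracket-indexing output shown above.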
