ExtensionDtype is missing hasobject #26531

mitar · 2019-05-26T15:22:56Z

Problem description

For better compatibility with numpy dtypes, hasobject could be added to ExtensionDtype. We use it to determine whether columns contain Python objects, but it does not work with sparse column types, which extend ExtensionDtype.
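A minimal sketch of the situation described above, assuming a reasonably recent numpy and pandas. The helper name `has_object` is hypothetical and not part of either library; it prefers `hasobject` when the dtype provides it and otherwise falls back to checking `kind`.

```python
import numpy as np
import pandas as pd


def has_object(dtype):
    """Hypothetical helper: use `hasobject` if present, else fall back to kind == 'O'."""
    flag = getattr(dtype, "hasobject", None)
    if flag is not None:
        return bool(flag)
    # Extension dtypes expose `kind`, so this fallback works for them too.
    return getattr(dtype, "kind", None) == "O"


# numpy dtypes always provide `hasobject`.
print(np.dtype("int64").hasobject)
print(np.dtype("O").hasobject)

# A sparse extension dtype may not provide the attribute at all,
# so read it defensively instead of accessing it directly.
sparse = pd.SparseDtype("float64")
print(getattr(sparse, "hasobject", "<missing>"))
print(has_object(sparse))
```

The fallback is only an approximation: it cannot see objects nested inside struct fields, which is the one case where `hasobject` and `kind` disagree.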

Output of `pd.show_versions()`

INSTALLED VERSIONS

commit: None
python: 3.6.7.final.0
python-bits: 64
OS: Linux
OS-release: 4.18.0-20-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.25.0.dev0+610.gd2beaf3c8
pytest: None
pip: 18.1
setuptools: 40.7.1
Cython: 0.29.7
numpy: 1.15.4
scipy: 1.2.0
pyarrow: 0.13.0
xarray: None
IPython: None
sphinx: None
patsy: None
dateutil: 2.7.5
pytz: 2018.9
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: None
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

TomAugspurger · 2019-05-26T16:58:27Z

Is this different from .kind being ‘O’?

mitar · 2019-05-26T21:52:49Z

To my understanding, the difference is that hasobject also returns True if fields in structs contain objects. So you could have a C struct type with fields pointing to objects. Its kind would not be O by itself, but hasobject would still be True.
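The distinction can be seen with plain numpy structured dtypes. This is a small illustration only; the field names `id`, `payload`, and `value` are made up for the example.

```python
import numpy as np

# A plain object dtype, a struct with one object field, and a struct without any.
plain_object = np.dtype("O")
struct_with_object = np.dtype([("id", "i4"), ("payload", "O")])
struct_without_object = np.dtype([("id", "i4"), ("value", "f8")])

for dt in (plain_object, struct_with_object, struct_without_object):
    # Structured dtypes report kind "V" (void) regardless of their fields,
    # while hasobject looks inside the fields.
    print(dt, dt.kind, dt.hasobject)
```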

jorisvandenbossche · 2019-05-27T07:45:37Z

One question is: what should this return for extension arrays/dtypes that store data "natively" but box into Python objects (e.g. when converting to a numpy array)?
The answer might depend on the reason you are checking hasobject.

mitar · 2019-05-27T15:13:10Z

For us, the reason we check hasobject is to know whether we have to recurse into the object, i.e. whether the value is a scalar (final) value or something we have to recurse into when searching for all scalar values.

I think that for our own use kind == 'O' might even be enough, if we treat struct types as scalar values.

jorisvandenbossche · 2019-05-27T15:16:54Z

How does object dtype signal that the value is not scalar and needs to be recursed into further?
E.g. in an object array of strings, the values are also "scalars"? Or not for your use case?

mitar · 2019-05-27T15:19:23Z

They are. Sadly, that is a false positive. Ideally, Python strings would have their own dtype and that would make our life much easier. So we recurse and then discover it is a string.
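A rough sketch of the kind of traversal described in this exchange, not the actual code used by the reporter. The function name `iter_scalars` is hypothetical. It recurses only into containers and object-dtype arrays, and treats strings as final values once it reaches them, which is the false positive mentioned above.

```python
import numpy as np


def iter_scalars(value):
    """Yield leaf values, recursing only where Python objects may be nested."""
    if isinstance(value, (str, bytes)):
        # Strings live in object arrays but are final values.
        yield value
    elif isinstance(value, np.ndarray):
        if value.dtype.kind == "O":
            # Object arrays can hold anything, so inspect every element.
            for item in value.flat:
                yield from iter_scalars(item)
        else:
            # Native dtypes (including structs, treated as scalars here) need no recursion.
            yield from value.ravel().tolist()
    elif isinstance(value, (list, tuple)):
        for item in value:
            yield from iter_scalars(item)
    else:
        yield value


# Build an object array element by element to avoid ragged-array inference.
data = np.empty(3, dtype=object)
data[0] = "text"
data[1] = [1, 2]
data[2] = np.array([3.0, 4.0])

print(list(iter_scalars(data)))
```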

jbrockmendel · 2023-07-27T17:27:25Z

> Is this different from .kind being ‘O’?

FWIW PeriodDtype.kind is "O".

My gut here is that hasobject is little-used and adding it is more likely to introduce problems than fix them.
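The PeriodDtype remark ties back to the earlier question about dtypes that store data natively but box into Python objects. A short illustration, assuming a pandas version where `PeriodDtype.kind` is "O" as stated in the comment above:

```python
import numpy as np
import pandas as pd

# A PeriodArray obtained from a PeriodIndex.
periods = pd.period_range("2019-05-26", periods=3, freq="D").array

# The dtype reports kind "O" ...
print(periods.dtype.kind)

# ... although the underlying storage is plain int64 ordinals ...
print(periods.asi8.dtype)

# ... and values are only boxed into Period objects on conversion to numpy.
print(np.asarray(periods).dtype)
```

So a check based on `kind` (or a hypothetical `hasobject`) would have to decide whether it describes the storage or the boxed values.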

mroeschke · 2024-05-19T19:16:41Z

Looks like there is not much support from the core team for adding this attribute, so closing.

jorisvandenbossche added the Dtype Conversions and ExtensionArray labels on May 27, 2019

mroeschke added the Enhancement label and removed the Dtype Conversions label on Jul 10, 2021

jbrockmendel added the Closing Candidate label on Oct 3, 2023

mroeschke closed this as completed May 19, 2024
