Relplot refline error in situations when using dataframes w/ duplicate indicies #3690

Open
zacharygibbs opened this issue May 10, 2024 · 0 comments


zacharygibbs commented May 10, 2024

This error occurs when calling relplot and then adding a refline. From my investigation, it appears that relplot is duplicating the data. In this example, when I concatenated my 3 dataframes, I did not use 'ignore_index', so there are duplicate indices in the input data.

The problem goes away when I use ignore_index or pass the data in as df.reset_index(); however, the error message was not useful in discovering this! After tracking down the relplot source code, it appears the problem is related to the grid_data merge at the end of the function. I was able to solve this by skipping the merge if all of the columns are already present. I have submitted pull request #3692.
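A minimal sketch of the two workarounds, using small hypothetical frames rather than the data from this report. `Index.is_unique` is a quick way to check whether the input has duplicate labels before plotting. Note that `drop=True` is my choice here; plain `df.reset_index()` also works but keeps the old labels as an extra `index` column.

```python
import pandas as pd

# Hypothetical small frames standing in for the real data.
parts = [pd.DataFrame({"x": [0.1, 0.2], "origin": k}) for k in (1, 2, 3)]

# Default concat keeps each frame's own 0..n-1 index, so labels repeat.
dup = pd.concat(parts)
print(dup.index.is_unique)  # False

# Workaround 1: build a fresh RangeIndex while concatenating.
fixed_a = pd.concat(parts, ignore_index=True)

# Workaround 2: reset the index of an already-concatenated frame.
fixed_b = dup.reset_index(drop=True)

print(fixed_a.index.is_unique, fixed_b.index.is_unique)  # True True
```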

The error was: ValueError: operands could not be broadcast together with shapes (45000,) (15000,)
(The input data was 15000 rows long, with 3 different "hue" values.)
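The 45000 in the message is consistent with an index-aligned merge on non-unique labels: each of the 5000 labels appears 3 times on both sides, so the merge yields 3 × 3 = 9 rows per label. The sketch below reproduces only that pandas behavior with made-up columns. It does not call seaborn, and the assumption that relplot performs this kind of merge comes from the description in this report, not from anything checked here.

```python
import numpy as np
import pandas as pd

n_items = 5000

# Three stacked blocks that each keep a 0..4999 index, as pd.concat does by default.
idx = np.tile(np.arange(n_items), 3)
left = pd.DataFrame({"x": np.arange(3 * n_items)}, index=idx)
right = pd.DataFrame({"row": np.arange(3 * n_items)}, index=idx)

# Index-on-index merge pairs every left row with every right row sharing its label.
merged = pd.merge(left, right, left_index=True, right_index=True)
print(len(left), len(merged))  # 15000 45000

# With unique labels the merge stays one-to-one.
merged_ok = pd.merge(
    left.reset_index(drop=True),
    right.reset_index(drop=True),
    left_index=True,
    right_index=True,
)
print(len(merged_ok))  # 15000
```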

Reproducible example:

import pandas as pd
import numpy as np

import seaborn as sns
import matplotlib.pyplot as plt

n_items = 5000
n_floats = 5
n_categorical = 3

df1 = pd.DataFrame(
    np.random.random((n_items, n_floats)),
    columns=[f'float{i}' for i in range(n_floats)]
)
df1 = df1.assign( **{f'categorical{i}': np.random.randint(0, 15, n_items) for i in range(n_categorical)})

df2 = pd.DataFrame(
    np.random.random((n_items, n_floats)),
    columns=[f'float{i}' for i in range(n_floats)]
)
df2 = df2.assign( **{f'categorical{i}': np.random.randint(0, 15, n_items) for i in range(n_categorical)})

df3 = pd.DataFrame(
    np.random.random((n_items, n_floats)),
    columns=[f'float{i}' for i in range(n_floats)]
)
df3 = df3.assign( **{f'categorical{i}': np.random.randint(0, 15, n_items) for i in range(n_categorical)})

df = pd.concat([df1.assign(origin=1), df2.assign(origin=2), df3.assign(origin=3)])
print(df)

fg = sns.relplot(data=df, x='float1', y='float2', hue='origin', row='categorical1')
print('main', fg.data.shape)
fg.refline(y=0.5)  # raises the ValueError quoted above

plt.show()
@zacharygibbs zacharygibbs changed the title Relplot refline error in situations when using concatenated dataframes Relplot refline error in situations when using dataframes w/ duplicate indicies May 10, 2024