Problem Description
The available validation methods lack checks for (left-/right-)totality. I frequently encounter cases where I need to manually check that, e.g., a one-to-one merge also finds a match in the right DF for every row in the left DF, or vice versa.
Feature Description
Add the following to one_to_one, one_to_many and many_to_one merge validations:
left_total ... Each row in the left DataFrame is matched to (at least) one row in the right DataFrame
right_total ... Each row in the right DataFrame is matched to (at least) one row in the left DataFrame
total ... Both left_total and right_total must hold
A combination of join relation and totality constraint should be possible by combining with a +: one_to_one+left_total
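To make the proposed syntax concrete, here is a minimal sketch of a wrapper that accepts a relation+totality string. The function name merge_with_totality and the helper _key_tuples are hypothetical and not part of pandas. The sketch assumes both frames share the key column(s) passed via on, and it compares keys as plain tuples, so missing-value keys are not handled specially.

```python
import pandas as pd
from pandas.errors import MergeError

# Hypothetical names for illustration; none of this is pandas API.
_TOTALITY = {"left_total", "right_total", "total"}


def _key_tuples(df, keys):
    """Return the key columns of df as a list of plain tuples."""
    return list(df[keys].itertuples(index=False, name=None))


def merge_with_totality(left, right, on, validate, **kwargs):
    """Merge on shared key column(s), accepting 'relation+totality' strings."""
    relation, _, totality = validate.partition("+")
    if totality and totality not in _TOTALITY:
        raise ValueError(f"unknown totality constraint: {totality!r}")

    keys = [on] if isinstance(on, str) else list(on)
    left_keys = _key_tuples(left, keys)
    right_keys = _key_tuples(right, keys)

    if totality in ("left_total", "total"):
        known = set(right_keys)
        missing = [k for k in dict.fromkeys(left_keys) if k not in known]
        if missing:
            raise MergeError(f"left keys without a match in right: {missing}")
    if totality in ("right_total", "total"):
        known = set(left_keys)
        missing = [k for k in dict.fromkeys(right_keys) if k not in known]
        if missing:
            raise MergeError(f"right keys without a match in left: {missing}")

    # The relation part (e.g. 'one_to_one' or 'm:1') is checked by pandas itself.
    return pd.merge(left, right, on=on, validate=relation or None, **kwargs)


# Illustrative data: key 'c' exists only on the left side.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1, 2, 3]})
right = pd.DataFrame({"key": ["a", "b"], "y": [10, 20]})

# Every right row has a partner on the left, so this passes.
ok = merge_with_totality(left, right, on="key", validate="one_to_one+right_total")

# 'c' has no partner on the right, so left totality is violated.
try:
    merge_with_totality(left, right, on="key", validate="one_to_one+left_total")
    left_total_error = None
except MergeError as err:
    left_total_error = str(err)
```

Checking totality before the merge keeps the error independent of the how argument; an actual implementation inside pandas could probably reuse the key factorization the merge already performs.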
Alternative Solutions
Currently, doing an outer join and checking for NaN values in the "foreign" columns works to find unmerged rows. However, this will fail if there are already NaN values in the initial DataFrames.
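The NaN ambiguity described above can be sketched as follows, together with one possible way around it: the existing indicator option of pd.merge, which records where each row came from in a _merge column. The data is made up for illustration, and this is only a workaround sketch, not a replacement for the proposed validation.

```python
import numpy as np
import pandas as pd

# Illustrative data: key 'b' matches on both sides, but its y value is NaN.
left = pd.DataFrame({"key": ["a", "b", "c"], "x": [1.0, np.nan, 3.0]})
right = pd.DataFrame({"key": ["a", "b", "d"], "y": [10.0, np.nan, 40.0]})

outer = pd.merge(left, right, on="key", how="outer", indicator=True)

# NaN-based check: flags 'c' (truly unmatched) but also 'b' (matched, NaN data).
nan_flagged = set(outer.loc[outer["y"].isna(), "key"])

# Indicator-based check: relies on the row's origin, not on its values.
left_only = set(outer.loc[outer["_merge"] == "left_only", "key"])
right_only = set(outer.loc[outer["_merge"] == "right_only", "key"])

print(nan_flagged, left_only, right_only)
```

This still requires an extra outer merge and a manual assertion on left_only and right_only, which is the overhead the proposed totality options would remove.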
Additional Context
No response
To add a common use case: here the goal is to add the biological class to the favorite animal of certain people:
import pandas as pd

# Create the first DataFrame with person names and favorite animals
df1_data = {
    'Person': ['John', 'Emma', 'Alex', 'Darleen'],
    'Animal': ['Dog', 'Spider', 'Snake', 'Cat']
}
df1 = pd.DataFrame(df1_data)

# Create the second DataFrame with mapping of animals to biological class
df2_data = {
    'Animal': ['Dog', 'Snake', 'Cat'],
    'Biological_Class': ['Mammal', 'Reptile', 'Mammal']
}
df2 = pd.DataFrame(df2_data)

# Merge the DataFrames on the 'Animal' column
merged_df = pd.merge(
    df1,
    df2,
    on='Animal',
    validate='m:1'
)
The merged_df will lack Emma's row entirely, because 'Spider' has no class defined in df2 and the default inner join silently drops unmatched rows. With the proposed feature, validate could be set to m:1+left_total. This would raise an error, as not all keys from the left df1 are contained in the right df2.
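The silent row loss in this use case can be reproduced with current pandas as a small sketch. The variable names inner and unmatched are chosen here for illustration; the isin comparison is the kind of manual check the issue would like to avoid.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Person': ['John', 'Emma', 'Alex', 'Darleen'],
    'Animal': ['Dog', 'Spider', 'Snake', 'Cat'],
})
df2 = pd.DataFrame({
    'Animal': ['Dog', 'Snake', 'Cat'],
    'Biological_Class': ['Mammal', 'Reptile', 'Mammal'],
})

# validate='m:1' only checks key uniqueness in df2, so this passes silently.
inner = pd.merge(df1, df2, on='Animal', validate='m:1')

# Manual left-totality check: people whose animal is missing from df2.
unmatched = df1.loc[~df1['Animal'].isin(df2['Animal']), 'Person'].tolist()

print(len(df1), len(inner), unmatched)
```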
Feature Type
Adding new functionality to pandas
Changing existing functionality in pandas
Removing existing functionality in pandas