Releases · modin-project/modin

Stability and Bugfixes
- FIX-#6968: Align API with pandas (#6969)
- FIX-#7302: Pin numpy<2 (072453b)
New Features
- FEAT-#7265: Automatic publication of Modin wheel to PyPI (#7262)

Contributors

@anmyachev
@dchigarev
@sfc-gh-dpetersohn

Released 15 May 10:28 by anmyachev · tag 0.30.0 · commit 51b0a78

Modin 0.30.0

This release introduces support for the DataFrame API standard, a distributed implementation of right merge/join,
a more efficient implementation of internal operators that speeds up almost all distributed Modin functions,
improved compatibility with pandas on the pyarrow backend, and type hints for the pandas API to improve UX.

Key Features and Updates Since 0.29.0

Stability and Bugfixes
- FIX-#0000: Fix badge in README.md (#7213)
- FIX-#0000: Make merge tests more stable by sorting results (#7266)
- FIX-#6967: Remove read_pickle_distributed/to_pickle_distributed functions as deprecated (#7258)
- FIX-#7093: Make sure idxmax and idxmin can work with string columns (#7193)
- FIX-#7102: Remove enable_api_only mode in modin logging (#7194)
- FIX-#7103: Move lower-level functionality logging to debug (#7184)
- FIX-#7143: Constructing a DataFrame from a Modin Series with tuple name should produce MultiIndex columns (#7214)
- FIX-#7185: Add extra check for some config classes (#7189)
- FIX-#7201: Update docs on how to enable Modin logs for high-level API and low-level API (#7209)
- FIX-#7206: Make sure df.melt handles duplicate value_vars correctly (#7208)
- FIX-#7219: Pin dataframe-api-compat>=0.2.7 (#7220)
- FIX-#7221: Don't use use_legacy_dataset=False for ParquetDataset (#7222)
- FIX-#7224: Importing modin.pandas.api.extensions overwrites re-export of pandas.api submodules (#7225)
- FIX-#7233: Display property name in default_to_pandas error messages (#7269)
- FIX-#7234: Deprecate HDK engine (#7235)
- FIX-#7238: Fix docstring inheritance for cached_property and use it (#7239)
- FIX-#7240: Allow doc_checker.py to work with functools.cached_property (#7241)
- FIX-#7246: Pin pyarrow>=10.0.1 as pandas 2.2.* does (#7247)
- FIX-#7248: Make sure _validate_dtypes_sum_prod_mean works correctly with datetime types (#7237)
- FIX-#7250: Revert "PERF-#6666: Avoid internal reset_index for left merge" (#7251)
Performance enhancements
- PERF-#7227: Call modin_frame.combine() for merge and join only when necessary (#7228)
- PERF-#7230: Don't preserve bad partition for merge (#7229)
Refactor Codebase
- REFACTOR-#7242: Add type hints for modin/core/dataframe/algebra/ (#7243)
- REFACTOR-#7260: Use extract_dtype internal function in more places (#7261)
Update testing suite
- TEST-#7049: Add some sanity tests with pyarrow-backed pandas dataframes (#7199)
- TEST-#7191: Fix ASV after changing default branch (#7190)
Documentation improvements
- DOCS-#0000: Fix a typo with MODIN_CPUS number (#7198)
- DOCS-#0000: Supplement Optimization Notes with a link to configs (#7197)
- DOCS-#7217: Update docs as to when Modin operators work best (#7218)
- DOCS-#7255: Update docs as to from_* functions (#7256)
New Features
- FEAT-#5394: Reduce amount of remote calls for Map operator (#7136)
- FEAT-#5394: Reduce amount of remote calls for TreeReduce and GroupByReduce operators (#7245)
- FEAT-#6492: Add from_map feature to create dataframe (#7215)
- FEAT-#6498: Make Fold operator more flexible (#7257)
- FEAT-#6808: Implement __arrow_array__ for Series (#7200)
- FEAT-#6890: Modin implementation of DataFrame API standard (#7216)
- FEAT-#7139: Use ray-core instead of ray-default (#6955)
- FEAT-#7187: Change master branch to main (#7188)
- FEAT-#7202: Use custom resources for Ray (#7205)
- FEAT-#7203: Make sure Modin works correctly with pandas, which uses pyarrow as a backend (#7204)
- FEAT-#7207: Add the ability to assign a df to a columns selection without d2p (#7210)
- FEAT-#7252: Add type hints for base.py (#7253)
- FEAT-#7254: Support right merge/join (#7226)
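
FEAT-#7254 adds a distributed implementation of right merge/join. Because Modin follows the pandas API, the call is the usual how="right". The sketch below imports modin.pandas when it is available and otherwise falls back to plain pandas so it still runs; the frames and column names are made up, and results are sorted by key before inspection instead of relying on row order.

```python
try:
    import modin.pandas as pd  # distributed drop-in replacement for pandas
except ImportError:
    import pandas as pd  # fallback so the sketch runs without Modin installed

left = pd.DataFrame({"key": [1, 2, 3], "lval": ["a", "b", "c"]})
right = pd.DataFrame({"key": [2, 3, 4], "rval": ["x", "y", "z"]})

# Right merge: keep every row of `right`; keys missing from `left` get NaN in `lval`.
merged = left.merge(right, on="key", how="right").sort_values("key")

# Right join on the index expresses the same semantics through DataFrame.join.
joined = left.set_index("key").join(right.set_index("key"), how="right").sort_index()
```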

Contributors

@Retribution98
@YarShev
@anmyachev
@arunjose696
@noloerino
@sfc-gh-jkew

Released 15 Apr 18:05 by anmyachev · tag 0.29.0 · commit 6d64e08

Modin 0.29.0

This release introduces the modin.pandas.testing and modin.pandas.arrays modules, faster (range-partitioning) implementations of
pivot_table, unique, drop_duplicates, nunique and df.resample, new functions to interact with Dask (to/from_dask),
a distributed implementation of Series.case_when, and an optimization of astype with a scalar dtype.
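
Several of the speedups above rely on range partitioning. The core idea, shown here as a pure-Python sketch with hypothetical helper names: choose key boundaries (in this sketch, from a random sample), route each value to the partition whose range contains it, and process every partition independently. Because equal values always land in the same partition, per-partition deduplication yields a correct global unique(), and its length gives nunique(). Modin's actual implementation operates on dataframe partitions on remote workers; this only illustrates the principle.

```python
import random
from bisect import bisect_right


def choose_pivots(values, num_partitions, sample_size=100, seed=0):
    """Pick num_partitions - 1 boundaries from a sorted random sample of the values."""
    rng = random.Random(seed)
    sample = sorted(rng.sample(values, min(sample_size, len(values))))
    step = len(sample) / num_partitions
    return [sample[int(step * i)] for i in range(1, num_partitions)]


def range_partition(values, pivots):
    """Route each value to the partition whose key range contains it."""
    parts = [[] for _ in range(len(pivots) + 1)]
    for value in values:
        parts[bisect_right(pivots, value)].append(value)
    return parts


def range_partitioned_unique(values, num_partitions=4):
    """unique() where each partition is deduplicated on its own."""
    if not values:
        return []
    parts = range_partition(values, choose_pivots(values, num_partitions))
    result = []
    for part in parts:  # in a distributed engine, each partition would be a separate task
        result.extend(sorted(set(part)))
    return result


values = [5, 3, 5, 1, 9, 3, 7, 1, 9, 2]
uniques = range_partitioned_unique(values, num_partitions=3)
print(uniques)       # [1, 2, 3, 5, 7, 9]
print(len(uniques))  # nunique() -> 6
```

Since partitions are ordered by key range, concatenating the per-partition results also produces a globally sorted output without a final merge step.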

Key Features and Updates Since 0.28.0

Stability and Bugfixes
- FIX-#6227: Make sure Series.unique() with pyarrow dtype returns ArrowExtensionArray (#7042)
- FIX-#6793: Use pandas_dtype instead of np.dtype for some more places in Modin code (#6794)
- FIX-#7039: Pass scalar dtype as is to astype query compiler (#7152)
- FIX-#7051: Update exception message for astype function (#7052)
- FIX-#7054: Update exception message for shift function (#7055)
- FIX-#7056: Update exception message for iloc/loc functions (#7057)
- FIX-#7058: Update exception message for insert function (#7059)
- FIX-#7060: Fix pivot when index or columns are of Index type (#7061)
- FIX-#7062: Update exception message for aggregate function (#7063)
- FIX-#7072: Replace MaterializationHook with the materialized object on serialization (#7075)
- FIX-#7088: Make sure rank raises No axis named None... exception (#7089)
- FIX-#7115: Exclude Ray 2.10.0 from deps installation (#7116)
- FIX-#7135: Fix appending a new row (#7172)
- FIX-#7153: Fix Series.corr with method != pearson (#7158)
- FIX-#7157: Make sure quantile function works with numeric_only=True (#7160)
- FIX-#7170: Don't use MinPartitionSize configuration variable in remote context (#7177)
Performance enhancements
- PERF-#5296: Partition parquet file if it has too few row groups (#7016)
- PERF-#7068: Provide shape_hint="column" for some more operations with Series (#7069)
- PERF-#7123: Preserve shape_hint for dropna (#7124)
- PERF-#7130: Preserve partition lengths in apply_full_axis with keep_partitioning=True (#7131)
- PERF-#7132: Preserve partition lengths in apply_full_axis with keep_partitioning=False (#7133)
- PERF-#7150: Reduce peak memory consumption (#7149)
Refactor Codebase
- REFACTOR-#3257: Move logging and caching to the gen_data internal function (#7046)
- REFACTOR-#7105: Deprecate cfg.RangePartitioningGroupby (#7161)
- REFACTOR-#7106: Rename from/to_ray_dataset to from/to_ray (#7107)
- REFACTOR-#7109: Remove the outdated aws_example.yaml file (#7110)
Update testing suite
- TEST-#3622: Centralize tests in Modin (#7137)
- TEST-#6016: Make sure eval_general doesn't expect exceptions by default (#6954)
- TEST-#7064: Explicitly check for exceptions in test_groupby.py (#7065)
- TEST-#7066: Explicitly check for exceptions in test_io.py (#7067)
- TEST-#7073: Explicitly check for exceptions in test_default.py (#7074)
- TEST-#7076: Explicitly check for exceptions in test_map_metadata.py (#7077)
- TEST-#7082: Explicitly check for exceptions in test_series.py (#7083)
- TEST-#7084: Explicitly check for exceptions in test_indexing.py (#7085)
- TEST-#7086: Explicitly check for exceptions in test_reduce.py (#7087)
- TEST-#7094: Rename raising_exceptions argument of eval_general testing function (#7095)
- TEST-#7125: Explicitly install modin in CI tests (#7126)
- TEST-#7165: Add codecov token to fix CI on master (#7175)
- TEST-#7166: Fix HDF tests in CI (#7167)
- TEST-#7173: Update github actions (#7168)
Documentation improvements
- DOCS-#2434: Clarify the use of --signoff option (#7145)
- DOCS-#6987: Rework range-partitioning docs (#7169)
- DOCS-#7144: Add information about logging from user defined function (#7155)
New Features
- FEAT-#4527: Add Modin logging to AxisPartition and BlockPartition classes (#7079)
- FEAT-#6783: Implement modin.pandas.testing module (#7045)
- FEAT-#6929: Implement Series.case_when in a distributed way (#6972)
- FEAT-#7004: Use generators when returning from _deploy_ray_func remote function. (#7005)
- FEAT-#7021: Implement to/from_dask functions (#7022)
- FEAT-#7047: Add range-partitioning implementation for .pivot_table() (#7048)
- FEAT-#7070: Add modin.pandas.arrays module (#7071)
- FEAT-#7078: Add modin_layer names to classes that inherit ClassLogger (#7099)
- FEAT-#7090: Add range-partitioning implementation for .unique() and .drop_duplicates() (#7091)
- FEAT-#7100: Add range-partitioning impl for nunique() (#7101)
- FEAT-#7102: Deprecate enable_api_only mode in modin logging (#7114)
- FEAT-#7111: Implemented @remote_function decorator with cache (#7112)
- FEAT-#7117: Support building range-partitioning from an index level (#7120)
- FEAT-#7118: Add range-partitioning impl for df.resample() (#7140)
- FEAT-#7128: Update minimal supported version of Ray up to 2.1.0 (#7129)
- FEAT-#7141: Add an ability to use config variables with a context manager (#7142)
- FEAT-#7146: Use BaseQueryCompiler, BasePandasDataset, DataFrame or Series type hints at a high level (#7147)
- FEAT-#7156: Add type hints for Series (#7154)
- FEAT-#7178: Add type hints for DataFrame (#7179)
- FEAT-#7180: Add type hints for modin.pandas.[functions] (#7181)
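
FEAT-#6783 adds modin.pandas.testing, which mirrors pandas.testing. A minimal sketch, assuming its assert_frame_equal behaves like the pandas function of the same name; it falls back to pandas when Modin is not installed, and the frame contents are invented for illustration.

```python
try:
    import modin.pandas as pd
    from modin.pandas.testing import assert_frame_equal
except ImportError:
    import pandas as pd
    from pandas.testing import assert_frame_equal

df = pd.DataFrame({"a": [1, 2, 2, 3], "b": ["x", "y", "y", "z"]})

# drop_duplicates keeps the first occurrence of the duplicated (2, "y") row.
deduplicated = df.drop_duplicates().reset_index(drop=True)
expected = pd.DataFrame({"a": [1, 2, 3], "b": ["x", "y", "z"]})

assert_frame_equal(deduplicated, expected)  # passes silently when the frames match

# A mismatch (here, a renamed column) raises AssertionError.
try:
    assert_frame_equal(deduplicated, expected.rename(columns={"b": "c"}))
    mismatch_detected = False
except AssertionError:
    mismatch_detected = True
```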

Contributors

@AndreyPavlenko
@Retribution98
@YarShev
@anmyachev
@arunjose696
@dchigarev
@sfc-gh-mvashishtha

Released 12 Apr 08:46 by YarShev · tag 0.28.2 · commit caed912

Modin 0.28.2

This release reverts the pandas requirement from 2.2.1 to >=2.2,<2.3.

Key Features and Updates Since 0.28.1

New Features
- FEAT-#7162: Revert pandas version to >=2.2,<2.3 (67e2541)
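
The difference between the 0.28.1 pin (exactly 2.2.1) and the 0.28.2 range (>=2.2,<2.3) can be checked with a small helper. This is a simplified, stdlib-only sketch with hypothetical function names: it compares only the numeric release segment and ignores PEP 440 details such as pre-release ordering, which real installers handle through the packaging library.

```python
import re


def release_tuple(version: str) -> tuple:
    """Return the numeric release segment, e.g. '2.2.1' -> (2, 2, 1).

    Simplification: suffixes such as 'rc1' or '+local' are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"not a version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def matches_0_28_1_pin(version: str) -> bool:
    """Modin 0.28.1 pinned pandas to exactly 2.2.1."""
    return release_tuple(version) == (2, 2, 1)


def matches_0_28_2_range(version: str) -> bool:
    """Modin 0.28.2 restored the range >=2.2,<2.3."""
    major_minor = release_tuple(version)[:2]
    return (2, 2) <= major_minor < (2, 3)


for candidate in ["2.1.4", "2.2.0", "2.2.1", "2.2.2", "2.3.0"]:
    print(candidate, matches_0_28_1_pin(candidate), matches_0_28_2_range(candidate))
```

In a real environment, importlib.metadata.version("pandas") returns the installed version string that can be passed to these helpers.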

Contributors

@sfc-gh-mvashishtha

Released 09 Apr 23:04 by sfc-gh-dpetersohn · tag 0.28.1 · commit eac21a8

Modin 0.28.1

This release pins pandas to 2.2.1. This pin will be removed
in a subsequent release.

Key Features and Updates Since 0.28.0

New Features
- FEAT-#7162: Pin pandas to 2.2.1 (87d147f)

Contributors

@sfc-gh-dpetersohn

Released 07 Mar 18:35 by anmyachev · tag 0.28.0 · commit 14452a8

Modin 0.28.0

This release introduces the modin.pandas.api.extensions module, faster implementations of merge and
groupby.rolling (the latter enabled by default), and new functions to work with Ray Dataset: to/from_ray_dataset.
It also includes other new features, performance optimizations, and bug fixes.
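
FEAT-#6965 reimplements .merge() with range partitioning. The idea, sketched in pure Python with hypothetical helper names: split both sides by the same key boundaries so that matching keys meet in the same partition pair, then join each pair independently with a local hash index. Only an inner merge is shown, the pivots are passed in rather than sampled, and Modin's actual implementation works on dataframe partitions rather than lists of dicts.

```python
from bisect import bisect_right
from collections import defaultdict


def split_by_key_range(rows, key, pivots):
    """Route each row to the partition whose key range contains row[key]."""
    parts = [[] for _ in range(len(pivots) + 1)]
    for row in rows:
        parts[bisect_right(pivots, row[key])].append(row)
    return parts


def range_partitioned_inner_merge(left, right, key, pivots):
    """Inner merge in which each partition pair is joined on its own."""
    left_parts = split_by_key_range(left, key, pivots)
    right_parts = split_by_key_range(right, key, pivots)
    merged = []
    # Equal keys always fall into the same partition index on both sides,
    # so no partition pair ever needs rows from another pair.
    for left_part, right_part in zip(left_parts, right_parts):
        lookup = defaultdict(list)  # hash index over this right partition only
        for right_row in right_part:
            lookup[right_row[key]].append(right_row)
        for left_row in left_part:
            for right_row in lookup.get(left_row[key], []):
                merged.append({**left_row, **right_row})
    return merged


left = [{"k": 1, "a": "x"}, {"k": 5, "a": "y"}, {"k": 9, "a": "z"}]
right = [{"k": 5, "b": 10}, {"k": 9, "b": 20}, {"k": 2, "b": 30}]
result = range_partitioned_inner_merge(left, right, key="k", pivots=[3, 7])
print(result)  # [{'k': 5, 'a': 'y', 'b': 10}, {'k': 9, 'a': 'z', 'b': 20}]
```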

Key Features and Updates Since 0.27.0

Stability and Bugfixes
- FIX-#6935: Fix merge when right operand is an empty dataframe (#6941)
- FIX-#6936: Fix read_parquet when dataset is created with to_parquet and index=False (#6937)
- FIX-#6944: Apply isort formatting for scripts from tutorials (#6945)
- FIX-#6946: Remove needs: [lint-black-isort, ...] (#6947)
- FIX-#6948: Fix groupby when Modin dataframe has several column partitions (#6951)
- FIX-#6952: Use render_as_string to get sqlalchemy engine url (#6953)
- FIX-#6968: Align API with pandas (#6969)
- FIX-#6974: Always use actual pandas version in test_all_urls_exist (#6975)
- FIX-#6982: Updating data in notebooks from yellow taxi to green taxi dataset (#6993)
- FIX-#6984: Ensure the results of inplace operations materialize (for tests) (#6985)
Performance enhancements
- PERF-#6976: Do not trigger unnecessary computations on ._propagate_index_objs() (#6977)
- PERF-#6979: Do not trigger ._copartition() for identical indices on binary operations (#6980)
Refactor Codebase
- REFACTOR-#6856: Rename read_pickle_distributed/to_pickle_distributed to read_pickle_glob/to_pickle_glob (#6957)
- REFACTOR-#6939: Make modin.pandas.DataFrame._to_pandas a public method (#6940)
- REFACTOR-#6958: Remove DataFrame.to_pickle_distributed in favour of DataFrame.modin.to_pickle_distributed (#6959)
- REFACTOR-#7002: Get more information about exceptions from eval_general utility (#7003)
- REFACTOR-#7008: Remove check_exception_type argument of eval_general function (#7009)
- REFACTOR-#7013: Move to_pandas and to_ray_dataset into modin namespace (#7014)
- REFACTOR-#7017: Align to_hdf and hist signatures to pandas (#7018)
Update testing suite
- TEST-#6932: Don't use deprecated pandas._testing.makeStringIndex (#6933)
- TEST-#6994: Update tests in test_series.py (#6995)
- TEST-#6996: Update tests in test_io.py (#6997)
Documentation improvements
- DOCS-#6871: Update Modin on Ray cluster tutorial (#6872)
- DOCS-#6949: Create Modin on Dask cluster tutorial (#6950)
- DOCS-#6962: Remove links to https://discuss.modin.org (#6963)
New Features
- FEAT-#3044: Create Extensions Module in Modin (#6961)
- FEAT-#4622: Unify data type of log_level in logging module (#6992)
- FEAT-#6913: Support sqlalchemy connectables in read_sql by getting connection url (#6956)
- FEAT-#6934: Support include_groups=False parameter in groupby.apply() (#6938)
- FEAT-#6942: Enable range-partitioning impl for groupby().rolling() by default (#6943)
- FEAT-#6965: Implement .merge() using range-partitioning implementation (#6966)
- FEAT-#6970: Implement to/from_ray_dataset functions (#6971)
- FEAT-#6983: Add Pluggable Documentation Module Support (#6986)
- FEAT-#7001: Do not force materialization in MetaList.__getitem__() (#7006)

Contributors

@AndreyPavlenko
@Retribution98
@YarShev
@anmyachev
@arunjose696
@dchigarev
@sfc-gh-dpetersohn
@tochigiv

Released 14 Feb 14:00 by anmyachev · tag 0.27.0 · commit d54dcfd

Modin 0.27.0

This release updates pandas to 2.2, introduces a lazy execution mode on Ray and new functions that support glob
syntax, and speeds up several more groupby cases. It also includes other new features, performance
optimizations, and many bug fixes.
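
The glob functions added in this release (read_json_glob/to_json_glob, read_parquet_glob/to_parquet_glob, read_xml_glob/to_xml_glob) treat one wildcard path as a single dataset spread over many files. The stdlib-only sketch below illustrates that idea with JSON-lines files; the helper names and file-naming scheme are hypothetical, and it says nothing about how Modin distributes the reads or names its output files.

```python
import glob
import json
import os
import tempfile


def write_jsonl_parts(records, directory, prefix="part", chunk_size=2):
    """Split records across several JSON-lines files: <prefix>_0000.jsonl, ..."""
    for n, start in enumerate(range(0, len(records), chunk_size)):
        path = os.path.join(directory, f"{prefix}_{n:04d}.jsonl")
        with open(path, "w", encoding="utf-8") as fh:
            for record in records[start:start + chunk_size]:
                fh.write(json.dumps(record) + "\n")


def read_jsonl_glob(pattern):
    """Expand a glob pattern and read all matching files as one list of records."""
    records = []
    for path in sorted(glob.glob(pattern)):  # zero-padded names keep write order
        with open(path, encoding="utf-8") as fh:
            records.extend(json.loads(line) for line in fh if line.strip())
    return records


rows = [{"id": i, "value": i * i} for i in range(5)]
with tempfile.TemporaryDirectory() as tmp:
    write_jsonl_parts(rows, tmp)
    pattern = os.path.join(tmp, "part_*.jsonl")
    file_count = len(glob.glob(pattern))
    roundtrip = read_jsonl_glob(pattern)

print(file_count, roundtrip == rows)  # 3 True
```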

Key Features and Updates Since 0.26.0

Stability and Bugfixes
- FIX-#2405: Make sure named aggregation works for Series objects (#6892)
- FIX-#5925: Put a sorting-hack into groupby tests to hide #6875 bug (#6896)
- FIX-#6830: Pass AWS related env vars to mpiexec (#6867)
- FIX-#6840: Call tolist function in DtypesDescriptor._merge_dtypes (#6844)
- FIX-#6855: Make sure read_parquet works with integer columns for pyarrow engine (#6874)
- FIX-#6879: Convert the right DF to single partition before broadcasting in query_compiler.merge (#6880)
- FIX-#6881: Make sure astype works correctly with int32 and float32 dtypes (#6884)
- FIX-#6897: Preprocess kernel function that aligns columns in groupby (#6898)
- FIX-#6897: Revert unidist specific fix for groupby (#6902)
- FIX-#6899: Avoid sending lazy categorical proxies to workers (#6900)
- FIX-#6904: Align levels of partially known dtypes with MultiIndex labels (#6905)
- FIX-#6911: Remove unidist specific workaround in .from_pandas() (#6912)
- FIX-#6916: Unpin pydantic dependency (#6917)
- FIX-#6924: HDK: Use JoinNode instead of MaskNode for non-range row_position (#6926)
Performance enhancements
- PERF-#6876: Skip the masking stage on iloc where beneficial (#6878)
- PERF-#6922: Set DaskThreadsPerWorker to 1 (#6923)
Refactor Codebase
- REFACTOR-#6293: Corrected missmatch to mismatch in ErrorMessage.missmatch_with_pandas method (#6901)
- REFACTOR-#6812: Remove PyarrowOnRay execution in favour of pyarrow-backed pandas dataframes (#6848)
- REFACTOR-#6833: Remove SocksProxy, DoLogRpyc, DoTraceRpyc outdated classes (#6834)
- REFACTOR-#6845: Fix import issues found by CodeQL (#6837)
- REFACTOR-#6852: Remove OrderedDict in favor of builtin dict (#6853)
- REFACTOR-#6858: Rename _get_dimensions and change arguments (#6859)
- REFACTOR-#6889: Define __all__ in modin.config.__init__.py (#6886)
- REFACTOR-#6903: Remove duplicated definitions of create_test_series (#6910)
- REFACTOR-#6918: Docstring and type hints fixes (#6925)
Update testing suite
- TEST-#6708: Create test files using tmp_path fixture (#6709)
- TEST-#6777: Make to_csv tests on Unidist more stable (for test-all-unidist CI job) (#6851)
- TEST-#6830: Use local s3 server instead of public s3 buckets (#6863)
- TEST-#6846: Skip unstable Unidist to_csv tests (#6847)
- TEST-#6868: Remove tests for gs remote protocol since we rely on fsspec (#6882)
- TEST-#6885: Switch to black>=24.1.0 (#6887)
- TEST-#6893: Added support for pytest 8.0.0 (#6894)
- TEST-#6920: Remove testing for Ray client (#6921)
Documentation improvements
- DOCS-#6860: Add an ecosystem page to the docs (#6861)
New Features
- FEAT-#3450: Implement read_json_glob and to_json_glob (#6873)
- FEAT-#5809: New implementation of the Ray lazy execution queue (#6731)
- FEAT-#5925: Enable grouping on categoricals with range-partitioning impl (#6862)
- FEAT-#6382: Execute bitwise NOT (~) operations on HDK (#6383)
- FEAT-#6398: Improved performance of list-like objects insertion into HDK DataFrames (#6412)
- FEAT-#6830: Remove public s3 bucket reference (#6829)
- FEAT-#6831: Implement read_parquet_glob and to_parquet_glob (#6854)
- FEAT-#6832: Implement read_xml_glob, to_xml_glob (#6930)
- FEAT-#6835: Do not put binary functions to the Ray storage multiple times (#6836)
- FEAT-#6838: Prefer lazy execution for binary operations with scalar (#6839)
- FEAT-#6841: Fixing ray anti pattern with .length() and .width() being called in a loop (#6842)
- FEAT-#6849: Removing to_pandas call in merge and join functions (#6850)
- FEAT-#6883: Support grouping on a Series with range-partitioning impl (#6888)
- FEAT-#6906: Update to pandas 2.2.* (#6907)
- FEAT-#6908: Remove the warning regarding engine initialization (#6909)
- FEAT-#6914: Add a config for setting a number of threads per Dask worker (#6915)
- FEAT-#6918: Add auto mode to the lazy execution. (#6919)

Contributors

@AndreyPavlenko
@YarShev
@anmyachev
@arunjose696
@dchigarev
@leshikus
@vedant
