Releases · matthewwardrop/formulaic

21 Sep 05:28

v1.2.1

35430e0

This is a minor release with only one change: fixing compatibility with the new string dtype in the upcoming release of Pandas 3. (#261)

14 Jul 06:44

matthewwardrop

v1.2.0

61d9bdd

Breaking changes:

- Formulaic now targets Python 3.9+ (dropping support for 3.7 and 3.8). (#234)
- ModelSpec.required_variables (and all other .required_variables
  implementations) now include only the root variable name (e.g. a.fillna ->
  a), since this is expected to be used to validate that dataframes have the
  expected columns. The full variable usage is still exposed via .variables
  and related attributes. You can look up the root variable using the .root
  property of each Variable instance. (#254)
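
To illustrate why only the root name matters for column validation, here is a stdlib-only sketch. It is not formulaic's implementation: `root_name` and `missing_columns` are hypothetical helpers, and treating the root as the text before the first `.` is an assumption drawn from the `a.fillna -> a` example above. In real code you would read `.root` from each Variable instead.

```python
# Hypothetical sketch, NOT formulaic's code: assumes a variable's root is the
# text before the first "." (e.g. "a.fillna" -> "a").


def root_name(variable: str) -> str:
    """Return the root column name of a (possibly attribute-accessed) variable."""
    return variable.split(".", 1)[0]


def missing_columns(variables: set[str], columns: set[str]) -> set[str]:
    """Return the root names that are absent from the available columns."""
    return {root_name(v) for v in variables} - columns


variables = {"a.fillna", "b", "c"}  # full variable usage
print(sorted(root_name(v) for v in variables))  # ['a', 'b', 'c']
print(missing_columns(variables, {"a", "b"}))   # {'c'}
```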

New features and enhancements:

- We now use narwhals for all non-pandas dataframe types supported by narwhals
  (including polars dataframes and pyarrow tables). This should substantially
  improve performance for these types, since historically the relevant parts of
  these dataframes were converted lazily to pandas objects. (#189)

Bugfixes and cleanups:

- Fixed compatibility with SciPy 1.16+.
- Factor instances no longer include backticks around factors evaluated using
  Python, so that string representations can be interpreted correctly when
  parsed anew. (#256)
- FactorValue types are now always unwrapped during categorical encoding due
  to issues with wrapped numpy arrays.
- The entire code-base was updated to use Python 3.9+ typing annotation idioms.
- The docsite is now versioned (allowing historical versions to be explored).
- Various documentation typo fixes.

Special thanks to @MarcoGorelli for his help in implementing Narwhals support.

Contributors

MarcoGorelli

20 Dec 19:43

matthewwardrop

v1.1.1

5fe4317

New features and enhancements:

- Formula.differentiate() is now considered stable, with
  ModelMatrix.differentiate() to follow in a future release. (#236)

Bugfixes and cleanups:

- Fixed a regression introduced in v1.1.0 regarding ordering of terms in a
  differentiated formula. (#236)

16 Dec 03:41

matthewwardrop

v1.1.0

ae9e5d0

This is a major feature release that was motivated in many aspects by the migration of statsmodels from patsy to formulaic. Many thanks to @bashtage for driving those invasive changes forward. There are some semantic breaking changes, but unless you are deep in the internals of formulaic (which I do not believe to be the case for any external library) these are not expected to break common usage.

Breaking changes:

- Formula is no longer always "structured" with special cases to handle the
  case where it has no structure. Legacy shims have been added to support old
  patterns, with DeprecationWarnings raised when they are used. This is not
  expected to break anyone who is not explicitly checking whether Formula.root
  is a list instance (which formerly could simply be assumed) [it is now a
  SimpleFormula instance that acts like an ordered sequence of Term
  instances].
- The column names associated with categorical factors have changed.
  Previously, a prefix was unconditionally added to the level in the column
  name (e.g. feature[T.A]), whether or not the encoding resulted in that term
  acting as a contrast. Now, in keeping with patsy, we only add the prefix if
  the categorical factor is encoded with reduced rank. Otherwise, feature[A] is
  used instead.
- formulaic.parsers.types.structured has been promoted to
  formulaic.utils.structured.
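
The naming rule can be modelled with a tiny stdlib-only function. `categorical_column_name` is hypothetical and only mirrors the behavior described above; formulaic itself decides between reduced and full rank during encoding.

```python
# Hypothetical sketch of the v1.1.0 column-naming rule; not formulaic's code.


def categorical_column_name(factor: str, level: str, reduced_rank: bool) -> str:
    """Build a dummy-column name, adding "T." only for reduced-rank (contrast) encodings."""
    prefix = "T." if reduced_rank else ""
    return f"{factor}[{prefix}{level}]"


# Reduced-rank encoding (e.g. alongside an intercept): the column is a contrast.
print(categorical_column_name("feature", "A", reduced_rank=True))   # feature[T.A]
# Full-rank encoding: the column is a plain indicator for the level.
print(categorical_column_name("feature", "A", reduced_rank=False))  # feature[A]
```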

New features and enhancements:

- Formula now instantiates to SimpleFormula or StructuredFormula, the
  latter being a tree-structure of SimpleFormula instances (rather than
  List[Term], as previously). This simplifies various internal logic and makes
  the propagation of formula metadata more explicit. (#222)
- Added support for restricting the set of features used by the default formula
  parser so that libraries can more easily restrict the structure of output
  formulae. (#207)
- dict and recarray types are now associated with the pandas materializer
  by default (rather than raising), simplifying some user workflows. (#225)
- Added support for the . operator (which is replaced with all variables not
  used on the left-hand side of formulae). (#216)
- Added experimental support for nested formulae of form [ ... ~ ... ].
  This is useful for (e.g.) generating formulae for IV 2SLS. (#108)
- Added support for subsetting ModelSpec[s] based on an arbitrary
  strictly reduced FormulaSpec. (#208)
- Added Formula.required_variables to more easily surface the expected data
  requirements of the formula. (#205)
- Added support for extracting rows dropped during materialization. (#197)
- Added cubic spline support for cyclic (cc) and natural (cr) splines. See
  formulaic.materializers.transforms.cubic_spline.cubic_spline for
  more details.
- Added a lag() transform.
- Constructing LinearConstraints can now be done from a list of strings (for
  increased parity with patsy). (#201)
- Categorical factors are now prefixed with (e.g.) T. only when they actually
  describe contrasts (i.e. when they are encoded with reduced rank). (#220)
- Contrasts metadata is now added to the encoder state via encode_categorical,
  and is surfaced via ModelSpec.factor_contrasts. (#204)
- Operator instances now receive context, which is optionally specified by
  the user during formula parsing and updated by the parser. This is what makes
  the . implementation possible. (#216)
- Given the generic usefulness of Structured, it has been promoted to
  formulaic.utils. (#223)
- Added explicit support and testing for Python 3.13. (#202)
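
The semantics of the . operator can be sketched with plain lists. `expand_dot` is a hypothetical helper: in formulaic the substitution happens inside the parser using the operator context mentioned above, so this only models the documented result, assuming the data's columns are known up front.

```python
# Hypothetical sketch of "." expansion; formulaic's parser works on tokens and
# operator context rather than on plain strings like this.


def expand_dot(lhs: list[str], columns: list[str]) -> str:
    """Return what "." stands for: every column not used on the left-hand side."""
    used = set(lhs)
    return " + ".join(c for c in columns if c not in used)


columns = ["y", "x1", "x2", "x3"]
print(f"y ~ {expand_dot(['y'], columns)}")  # y ~ x1 + x2 + x3
```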

Bugfixes and cleanups:

- Fixed nested ordering of Formula instances. (#200)
- Python tokens can now contain multiple chained parentheses and brackets
  without using quotes, as long as the parentheses are balanced. (#214, #218)
- Reduced the number of redundant initialisation operations in Structured
  instances. (#200)
- Fixed pickling of ModelMatrix and FactorValues instances (whenever the
  wrapped objects are picklable). (#209; thanks @bashtage)
- basis_spline: Fixed evaluation involving datasets with null values, and
  disallowed out-of-bounds knots. (#217; thanks @bashtage)
- Improved robustness of data contexts involving PyArrow datasets.
- We now use the same sentinels throughout the code-base, rather than having
  module-specific sentinels in some places.
- Migrated to ruff for linting, and updated mypy and pre-commit tooling.
  Fixes from ruff are applied automatically when using
  hatch run lint:format.
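
A stack-based bracket check is the usual way to decide whether parentheses and brackets are balanced. The sketch below is hypothetical and is not formulaic's tokenizer; for brevity it also ignores brackets that appear inside string literals, which a real tokenizer must handle.

```python
# Hypothetical sketch: stack-based bracket balancing, ignoring string literals.

PAIRS = {")": "(", "]": "[", "}": "{"}


def is_balanced(token: str) -> bool:
    """Return True if every bracket in `token` is closed, in the right order."""
    stack: list[str] = []
    for ch in token:
        if ch in "([{":
            stack.append(ch)  # remember the opener we must see closed later
        elif ch in PAIRS:
            if not stack or stack.pop() != PAIRS[ch]:
                return False  # closer with no opener, or mismatched kind
    return not stack  # leftover openers mean the token is unbalanced


print(is_balanced("f(g(x))[0]"))  # True
print(is_balanced("f(g(x)]"))     # False
```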

Documentation:

- Fixed and updated docsite build, as well as other minor tweaks.

Contributors

bashtage

12 Jul 19:05

matthewwardrop

v1.0.2

0d02a8f

Bugfixes and cleanups:

- Fix compatibility with pandas >=3.
- Fix mypy type inference in materializer subclasses.

Documentation:

- Add column name extraction to sklearn integration example.
- Add section to allow users to indicate their usage of formulaic.

25 Dec 05:46

matthewwardrop

v1.0.1

34c667e

This is identical to v1.0.0, but with the package status marked to production/stable rather than beta [facepalm].

25 Dec 05:45

matthewwardrop

v1.0.0

cd3c997

This is the first officially stable release of formulaic, with a relatively small diff from the 0.6.x series.

Breaking changes:

- Python tokens are now canonically formatted (see below).
- Methods deprecated during the 0.x series have been removed: Formula.terms,
  ModelSpec.feature_names, and ModelSpec.feature_indices.

New features and enhancements:

- Python tokens are now sanitized and canonically formatted to prevent
  ambiguities and better align with patsy.
- Added official support for Python 3.12 (no code changes were necessary).
- Added the hashed transform for categorically encoding deterministically
  hashed representations of a dataset. [Contributed by @rishi-kulkarni]
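
The key property of a hashed encoding is determinism: the same value must land in the same bucket in every process. Python's built-in `hash()` is salted per process for strings, so a sketch has to use a stable digest instead. `hashed_bucket` and the choice of MD5 below are assumptions for illustration, not the hashed transform's actual signature or hash function.

```python
import hashlib

# Hypothetical sketch of deterministic hashed bucketing; not formulaic's API.


def hashed_bucket(value: object, levels: int) -> int:
    """Map a value to one of `levels` buckets, stable across interpreter runs."""
    digest = hashlib.md5(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % levels


data = ["red", "green", "blue", "red"]
buckets = [hashed_bucket(v, levels=8) for v in data]
print(buckets[0] == buckets[3])  # True: equal values always share a bucket
```

The resulting bucket numbers can then be treated as ordinary categorical levels, which bounds the number of encoded columns regardless of how many distinct values appear.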

Bugfixes and cleanups:

- Fixed transform state not propagating correctly when Python code tokens were
  not canonically formatted.
- Literals in formulae are no longer silently ignored, and feature scaling
  is now fully supported.
- Improved code parsing and formatting utilities and dropped the requirement for
  astor for Python 3.9 and newer.
- Fixed all warnings emitted during unit tests.
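
One way to canonically format a Python token is to round-trip it through the standard library's AST; `ast.unparse` (new in Python 3.9) provides AST-to-source generation of the kind astor was typically used for. The sketch below uses a hypothetical `canonicalize` helper and is not necessarily formulaic's exact approach, but it shows why canonical formatting matters for transform state: differently spaced spellings of the same call map to one key.

```python
import ast

# Hypothetical sketch: normalize a Python expression via the stdlib AST
# (ast.unparse requires Python 3.9+).


def canonicalize(code: str) -> str:
    """Return a canonical string for an expression, ignoring spacing and quote style."""
    return ast.unparse(ast.parse(code, mode="eval").body)


print(canonicalize("np.log( x )"))    # np.log(x)
print(canonicalize('center(x,"a")'))  # center(x, 'a')
print(canonicalize("np.log( x )") == canonicalize("np.log(x)"))  # True
```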

Documentation:

- Removed incompleteness warnings.
- Added some lightweight developer documents.
- Fixed some broken links.

Contributors

rishi-kulkarni

04 Oct 21:48

matthewwardrop

v0.6.6

a8fdf13

This is a minor release with one important bugfix.

Bugfixes and cleanups:

- Fixes a regression introduced by 0.6.4 whereby missing variables were
  silently dropped from the formula, rather than raising an exception.

25 Sep 23:42

matthewwardrop

v0.6.5

da6d11c

This is a minor release with several important bugfixes.

Bugfixes and cleanups:

- Fixed intercept terms sorting after other features (by not counting literal
  factors toward the degree of a term). (#156)
- Fixed a regression in 0.6.4 around quoted field names in Python evaluations. (#154)
- Fixed detection and dropping of null rows in sparse datasets. (#155)
- Fixed poly() transforms operating on datasets that include null values. (#155)
- Arguments can now be passed when running the unit tests using hatch run tests.
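
A small model of the intercept fix: if a term's degree counts only non-literal factors, the intercept (`1`) gets degree 0 and sorts first. `is_literal` and `term_degree` are hypothetical helpers that treat anything parseable as a number as a literal; formulaic's internal representation of terms differs.

```python
# Hypothetical sketch of degree-based term ordering; not formulaic's code.


def is_literal(factor: str) -> bool:
    """Treat any factor that parses as a number (e.g. "1") as a literal."""
    try:
        float(factor)
    except ValueError:
        return False
    return True


def term_degree(factors: list[str]) -> int:
    """Count only the non-literal factors of a term."""
    return sum(1 for f in factors if not is_literal(f))


terms = [["a", "b"], ["a"], ["1"]]
print(sorted(terms, key=term_degree))  # [['1'], ['a'], ['a', 'b']]
```

Had the literal `1` been counted, it would tie with `a` at degree 1, and a stable sort would leave it wherever it happened to appear in the input, which is how an intercept could end up after other features.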

11 Jul 04:27

matthewwardrop

v0.6.4

b3d2d92

This is a minor release with several new features and cleanups.

New features and enhancements:

- Added support for keeping track of the source of variables being used to
  evaluate a formula. Refer to the ModelSpec documentation for more details.

Bugfixes and cleanups:

- All functions and methods now have type signatures that are statically checked
  during unit testing.
- Removed OrderedDict usage, since Python 3.7+ guarantees that dictionaries
  preserve insertion order.
- Suppressed terms/factors in model matrices for which the factors evaluate to
  None.
