Releases: matthewwardrop/formulaic
v1.2.1
v1.2.0
Breaking changes:
- Formulaic now targets Python 3.9+ (dropping support for 3.7 and 3.8). (#234)
- `ModelSpec.required_variables` (and all other `.required_variables`
implementations) now only include the root variable name (e.g. `a.fillna` ->
`a`), since this is expected to be used to validate that dataframes have the
expected columns. The full variable usage is still exposed via `.variables`
and related attributes. You can look up the root variable using the `.root`
property of each `Variable` instance. (#254)
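To illustrate the root-variable rule, here is a minimal sketch in plain Python. It is not formulaic's implementation: `root_variable` is a hypothetical helper, and it assumes that the root of a dotted name is simply the text before the first dot, as in the `a.fillna` -> `a` example above.

```python
def root_variable(name: str) -> str:
    """Return the part of a dotted variable name before the first dot."""
    return name.split(".", 1)[0]


# Hypothetical variable usages collected from a formula.
used_variables = ["a.fillna", "a", "b.values.mean", "c"]

# Deduplicate the roots; these are the columns a dataframe would need.
required = sorted({root_variable(name) for name in used_variables})
print(required)  # ['a', 'b', 'c']
```

Validating a dataframe against `required` then only needs a column-name check, which is the use case the note describes.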
New features and enhancements:
- We now use `narwhals` for all non-pandas dataframe types supported by narwhals
(including polars dataframes and pyarrow tables). This should substantially
improve performance for these types since historically the relevant parts of
these dataframes were converted lazily to pandas objects. (#189)
Bugfixes and cleanups:
- Fixed compatibility with Scipy 1.16+.
- `Factor` instances no longer include backticks around factors evaluated using
Python so that string representations can be interpreted correctly when
parsed anew. (#256)
- `FactorValue` types are now always unwrapped during categorical encoding due
to issues with wrapped numpy arrays.
- The entire code-base was updated to use Python 3.9+ typing annotation idioms.
- The docsite is now versioned (allowing historical versions to be explored).
- Various documentation typo fixes.
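For readers unfamiliar with the Python 3.9+ annotation idioms mentioned in the list above, the sketch below contrasts the older `typing` aliases with built-in generics. The function names are hypothetical and are not taken from formulaic's code base.

```python
from typing import Dict, List  # only needed for the pre-3.9 spelling


def index_columns_old(columns: List[str]) -> Dict[str, int]:
    """Pre-3.9 style: generic aliases imported from `typing`."""
    return {name: position for position, name in enumerate(columns)}


def index_columns_new(columns: list[str]) -> dict[str, int]:
    """3.9+ style: the built-in collections are subscriptable directly."""
    return {name: position for position, name in enumerate(columns)}
```

Both functions behave identically at runtime; only the annotations differ.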
Special thanks to @MarcoGorelli for his help in implementing Narwhals support.
v1.1.1
v1.1.0
This is a major feature release that was motivated in many aspects by the migration of statsmodels from patsy to formulaic. Many thanks to @bashtage for driving those invasive changes forward. There are some semantic breaking changes, but unless you are deep in the internals of formulaic (which I do not believe to be the case for any external library) these are not expected to break common usage.
Breaking changes:
- `Formula` is no longer always "structured" with special cases to handle the
case where it has no structure. Legacy shims have been added to support old
patterns, with `DeprecationWarning`s raised when they are used. It is not
expected to break anyone not explicitly checking whether the `Formula.root` is
a list instance (which formerly should have been simply assumed) [it is now a
`SimpleFormula` instance that acts like an ordered sequence of `Term`
instances].
- The column names associated with categorical factors have changed. Previously,
a prefix was unconditionally added to the level in the column name like
`feature[T.A]`, whether or not the encoding would result in that term acting
as a contrast. Now, in keeping with `patsy`, we only add the prefix if the
categorical factor is encoded with reduced rank. Otherwise, `feature[A]` will
be used instead.
- `formulaic.parsers.types.structured` has been promoted to
`formulaic.utils.structured`.
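The naming rule for categorical columns can be sketched in a few lines of plain Python. This is an illustration only, not formulaic's encoder: `categorical_column_names` is a hypothetical helper, and the reduced-rank branch assumes treatment coding with the first level dropped as the reference.

```python
def categorical_column_names(
    factor: str, levels: list[str], reduced_rank: bool
) -> list[str]:
    """Build column names following the prefix rule described above."""
    if reduced_rank:
        # Assumption: treatment coding, with the first level as the reference.
        return [f"{factor}[T.{level}]" for level in levels[1:]]
    # Full rank: one column per level, with no `T.` prefix.
    return [f"{factor}[{level}]" for level in levels]


print(categorical_column_names("feature", ["A", "B", "C"], reduced_rank=True))
# ['feature[T.B]', 'feature[T.C]']
print(categorical_column_names("feature", ["A", "B", "C"], reduced_rank=False))
# ['feature[A]', 'feature[B]', 'feature[C]']
```

Code that matches on column names such as `feature[T.A]` may therefore need updating when a factor is encoded at full rank.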
New features and enhancements:
- `Formula` now instantiates to `SimpleFormula` or `StructuredFormula`, the
latter being a tree-structure of `SimpleFormula` instances (as compared to
`List[Term]` previously). This simplifies various internal logic and makes the
propagation of formula metadata more explicit. (#222)
- Added support for restricting the set of features used by the default formula
parser so that libraries can more easily restrict the structure of output
formulae. (#207)
- `dict` and `recarray` types are now associated with the `pandas` materializer
by default (rather than raising), simplifying some user workflows. (#225)
- Added support for the `.` operator (which is replaced with all variables not
used on the left-hand-side of formulae). (#216)
- Added experimental support for nested formulae of form `[ ... ~ ... ]`.
This is useful for (e.g.) generating formulae for IV 2SLS. (#108)
- Added support for subsetting `ModelSpec[s]` based on an arbitrary
strictly reduced `FormulaSpec`. (#208)
- Added `Formula.required_variables` to more easily surface the expected data
requirements of the formula. (#205)
- Added support for extracting rows dropped during materialization. (#197)
- Added cubic spline support for cyclic (`cc`) and natural (`cr`). See
`formulaic.materializers.transforms.cubic_spline.cubic_spline` for
more details.
- Added a `lag()` transform.
- Constructing `LinearConstraints` can now be done from a list of strings (for
increased parity with `patsy`). (#201)
- Categorical factors are now preceded with (e.g.) `T.` when they actually
describe contrasts (i.e. when they are encoded with reduced rank). (#220)
- Contrasts metadata is now added to the encoder state via `encode_categorical`,
which is surfaced via `ModelSpec.factor_contrasts`. (#204)
- `Operator` instances now receive `context` which is optionally specified by
the user during formula parsing, and updated by the parser. This is what makes
the `.` implementation possible. (#216)
- Given the generic usefulness of `Structured`, it has been promoted to
`formulaic.utils`. (#223)
- Added explicit support and testing for Python 3.13. (#202)
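The `.` operator described in the list above is replaced with every variable not already used on the left-hand side. The sketch below shows that expansion on plain lists. It is not the parser's implementation, and `expand_dot` is a hypothetical helper.

```python
def expand_dot(lhs_variables: list[str], columns: list[str]) -> list[str]:
    """Return the columns that `.` would stand for, preserving column order."""
    used = set(lhs_variables)
    return [column for column in columns if column not in used]


# Hypothetical dataset columns, with `y` as the response.
columns = ["y", "a", "b", "c"]
rhs_terms = expand_dot(["y"], columns)
print("y ~ " + " + ".join(rhs_terms))  # y ~ a + b + c
```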
Bugfixes and cleanups:
- Fixed nested ordering of `Formula` instance. (#200)
- Allow Python tokens to contain multiple chained parentheses and brackets
without using quotes as long as the parentheses are balanced. (#214, #218)
- Reduced the number of redundant initialisation operations in `Structured`
instances. (#200)
- Fixed pickling `ModelMatrix` and `FactorValues` instances (whenever wrapped
objects are picklable). (#209; thanks @bashtage)
- `basis_spline`: Fixed evaluation involving datasets with null values, and
disallow out-of-bounds knots. (#217; thanks @bashtage)
- Improved robustness of data contexts involving PyArrow datasets.
- We now use the same sentinels throughout the code-base, rather than having
module specific sentinels in some places.
- Migrated to `ruff` for linting, and updated `mypy` and `pre-commit` tooling.
- Automatic fixes from `ruff` are automatically applied when using
`hatch run lint:format`.
Documentation:
- Fixed and updated docsite build, as well as other minor tweaks.
v1.0.2
Bugfixes and cleanups:
- Fix compatibility with `pandas>=3`.
- Fix `mypy` type inference in materializer subclasses.
Documentation:
- Add column name extraction to `sklearn` integration example.
- Add section to allow users to indicate their usage of formulaic.
v1.0.1
This is identical to v1.0.0, but with the package status marked to production/stable rather than beta [facepalm].
v1.0.0
This is the first officially stable release of formulaic, with a relatively small diff from the 0.6.x series.
Breaking changes:
- Python tokens are now canonically formatted (see below).
- Methods deprecated during the 0.x series have been removed: `Formula.terms`,
`ModelSpec.feature_names`, and `ModelSpec.feature_indices`.
New features and enhancements:
- Python tokens are now sanitized and canonically formatted to prevent
ambiguities and better align with `patsy`.
- Added official support for Python 3.12 (no code changes were necessary).
- Added the `hashed` transform for categorically encoding deterministically
hashed representations of a dataset. [Contributed by @rishi-kulkarni]
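As a rough illustration of what "deterministically hashed" means for categorical encoding, the sketch below maps arbitrary values onto a fixed number of buckets. It is an assumption-laden sketch rather than the `hashed` transform itself: the helper name, the use of SHA-256, and the `levels` parameter are all hypothetical choices made for this example.

```python
import hashlib


def hashed_level(value: object, levels: int) -> int:
    """Map a value to a bucket in [0, levels) using a stable digest."""
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16) % levels


# Equal inputs always land in the same bucket, across runs and machines.
buckets = [hashed_level(value, levels=8) for value in ["red", "green", "red"]]
```

A cryptographic digest is used here because Python's built-in `hash()` is randomized per process for strings, so it would not give reproducible buckets.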
Bugfixes and cleanups:
- Fixed transform state not propagating correctly when Python code tokens were
not canonically formatted.
- Literals in formulae will no longer be silently ignored, and feature scaling
is now fully supported.
- Improved code parsing and formatting utilities and dropped the requirement for
`astor` for Python 3.9 and newer.
- Fixed all warnings emitted during unit tests.
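To show why canonical formatting of Python tokens matters, the sketch below normalizes two spellings of the same expression with the standard library's `ast.unparse` (available from Python 3.9). Whether formulaic formats tokens this way internally is an assumption, and `canonical` is a hypothetical helper.

```python
import ast


def canonical(code: str) -> str:
    """Round-trip an expression through the AST to normalize its spelling."""
    return ast.unparse(ast.parse(code.strip(), mode="eval").body)


print(canonical("np.log( x+1 )"))  # np.log(x + 1)
print(canonical("np.log( x+1 )") == canonical("np.log(x + 1)"))  # True
```

Anything keyed by the token string, such as stored transform state, can only be found again if both spellings reduce to the same canonical form.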
Documentation:
- Removed incompleteness warnings.
- Added some lightweight developer documents.
- Fixed some broken links.
v0.6.6
This is a minor release with one important bugfix.
Bugfixes and cleanups:
- Fixes a regression introduced by 0.6.4 whereby missing variables were
silently dropped from the formula, rather than raising an exception.
v0.6.5
This is a minor release with several important bugfixes.
Bugfixes and cleanups:
- Fixed intercept terms sorting after other features (by not counting literal
factors toward the degree of a term). #156
- Fixed a regression in 0.6.4 around quoted field names in Python evaluations. #154
- Fixed detection and dropping of null rows in sparse datasets. #155
- Fixed `poly()` transforms operating on datasets that include null values. #155
- Arguments can now be passed when running the unit tests using
`hatch run tests`.
v0.6.4
This is a minor release with several new features and cleanups.
New features and enhancements:
- Added support for keeping track of the source of variables being used to
evaluate a formula. Refer to the `ModelSpec` documentation for more details.
Bugfixes and cleanups:
- All functions and methods now have type signatures that are statically checked
during unit testing.
- Removed `OrderedDict` usage, since Python guarantees the orderedness of
dictionaries in Python 3.7+.
- Suppress terms/factors in model matrices for which the factors evaluate to
`None`.
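The ordering guarantee that made `OrderedDict` unnecessary can be seen in a short sketch. The mapping below is hypothetical and is not a formulaic data structure.

```python
# Plain dicts preserve insertion order in Python 3.7+.
column_positions = {}
for name in ["Intercept", "a", "b", "a:b"]:
    column_positions[name] = len(column_positions)

print(list(column_positions))  # ['Intercept', 'a', 'b', 'a:b']
```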