Releases: snowflakedb/snowpark-python
1.47.0 (2026-03-05)
Snowpark Python API Updates
New Features
- Added support for the `array_union_agg` function in the `snowflake.snowpark.functions` module.
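The aggregate's behavior can be sketched in plain Python (a simplified distinct-union illustration, not Snowpark code; the server-side function operates on ARRAY columns and its exact duplicate handling follows Snowflake semantics):

```python
# Simplified sketch of an array-union aggregate: union array values across
# rows, keeping each distinct element once (illustration only).
def array_union_agg(rows):
    seen = []
    for arr in rows:
        for item in arr:
            if item not in seen:
                seen.append(item)
    return seen

result = array_union_agg([[1, 2], [2, 3], [3, 4]])
```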
Bug Fixes
- Fixed a bug where `Session.udf.register_from_file` did not properly process the `strict` and `secure` parameters.
- Fixed a bug where creating a DataFrame from small data (below the array binding threshold) raised an error when a `DecimalType` column contained a string value.
1.46.0 (2026-02-23)
Snowpark Python API Updates
New Features
- Added support for the `DECFLOAT` data type that allows users to represent decimal numbers exactly with 38 digits of precision and a dynamic base-10 exponent.
- Added support for the `DEFAULT_PYTHON_ARTIFACT_REPOSITORY` parameter that allows users to configure the default artifact repository at the account, database, and schema level.
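The precision behavior described for `DECFLOAT` can be approximated client-side with Python's `decimal` module (an illustration only; `DECFLOAT` itself is a server-side Snowflake type):

```python
from decimal import Decimal, getcontext

# Emulate 38 significant digits of exact base-10 arithmetic, which is what
# DECFLOAT provides server-side (client-side illustration only).
getcontext().prec = 38

total = Decimal("0.1") + Decimal("0.2")   # exact, unlike binary floats
scaled = Decimal("1.5E+30") * 2           # dynamic base-10 exponent
```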
Bug Fixes
- Fixed a bug where `cloudpickle` was not automatically added to the package list when using `artifact_repository` with custom packages, causing `ModuleNotFoundError` at runtime.
- Fixed a bug where reading XML with a custom schema included element attributes in the result even when the column was not of `StructType`.
- Fixed a bug where `Session.udf.register_from_file` did not properly process the `strict` and `secure` parameters.
Improvements
- Reduced the size of queries generated by certain `DataFrame.join` operations.
- Removed redundant aliases in generated queries (for example, `SELECT "A" AS "A"` is now always simplified to `SELECT "A"`).
1.45.0 (2026-02-02)
Snowpark Python API Updates
New Features
- Added support for a user-provided schema when reading an XML file on a stage.
- Added support for the following functions in `functions.py`:
  - String and binary functions: `hex_decode_string`, `jarowinkler_similarity`, `parse_url`, `regexp_instr`, `regexp_like`, `regexp_substr`, `regexp_substr_all`, `rtrimmed_length`, `space`, `split_part`
- Added a `preserve_parameter_names` flag to sproc, UDF, UDTF, and UDAF creation.
Bug Fixes
- Fixed a bug where opentelemetry was not correctly imported when using `Session.client_telemetry.enable_event_table_telemetry_collection`.
Improvements
- `snowflake.snowpark.context.configure_development_features` now takes effect for multiple sessions, including sessions created after the configuration, and no longer emits duplicate experimental warnings.
- Removed the experimental warning from `DataFrame.to_arrow` and `DataFrame.to_arrow_batches`.
- When both `Session.reduce_describe_query_enabled` and `Session.cte_optimization_enabled` are enabled, fewer DESCRIBE queries are issued when resolving table attributes.
1.44.0 (2025-12-15)
Snowpark Python API Updates
New Features
- Added support for targeted delete-insert via the `overwrite_condition` parameter in `DataFrameWriter.save_as_table`.
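The delete-insert pattern behind this feature can be sketched with a generic DB-API connection (using sqlite3 purely as a stand-in; the table and variable names here are illustrative, not the Snowpark API):

```python
import sqlite3

# Targeted overwrite = delete the rows matching a condition, then insert the
# new data, inside one transaction (sqlite3 stand-in; not Snowpark code).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INT)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 10), ("west", 20), ("east", 30)])

overwrite_condition = "region = 'east'"  # rows to be replaced
new_rows = [("east", 99)]

with conn:  # single transaction, like an atomic overwrite
    conn.execute(f"DELETE FROM sales WHERE {overwrite_condition}")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", new_rows)

rows = sorted(conn.execute("SELECT * FROM sales").fetchall())
```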
Improvements
- Improved `DataFrameReader` to return columns in deterministic order when using `INFER_SCHEMA`.
Dependency Updates
- Updated the dependency constraint to `protobuf<6.34` (was `<6.32`).
1.43.0 (2025-12-03)
Snowpark Python API Updates
New Features
- Added support for `DataFrame.lateral_join`.
- Added support for the PrPr feature `Session.client_telemetry`.
- Added support for `Session.udf_profiler`.
- Added support for `functions.ai_translate`.
- Added support for the following `iceberg_config` options in `DataFrameWriter.save_as_table` and `DataFrame.copy_into_table`: `target_file_size`, `partition_by`
- Added support for the following functions in `functions.py`:
  - String and binary functions: `base64_decode_binary`, `bucket`, `compress`, `day`, `decompress_binary`, `decompress_string`, `md5_binary`, `md5_number_lower64`, `md5_number_upper64`, `sha1_binary`, `sha2_binary`, `soundex_p123`, `strtok`, `truncate`, `try_base64_decode_binary`, `try_base64_decode_string`, `try_hex_decode_binary`, `try_hex_decode_string`, `unicode`, `uuid_string`
  - Conditional expressions: `booland_agg`, `boolxor_agg`, `regr_valy`, `zeroifnull`
  - Numeric expressions: `cot`, `mod`, `pi`, `square`, `width_bucket`
Bug Fixes
- Fixed a bug where automatically-generated temporary objects were not properly cleaned up.
- Fixed a bug in SQL generation when joining two DataFrames created using `DataFrame.alias` while CTE optimization is enabled.
- Fixed a bug in `XMLReader` where finding the start position of a row tag could return an incorrect file position.
Improvements
- Enhanced `DataFrame.sort()` to support `ORDER BY ALL` when no columns are specified.
- Removed the experimental warning from `Session.cte_optimization_enabled`.
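`ORDER BY ALL` sorts by every column from left to right; in plain Python terms it is equivalent to sorting the row tuples directly (illustration, not Snowpark code):

```python
# ORDER BY ALL == sort by (col1, col2, ...) left to right; Python tuple
# ordering gives the same result for these in-memory rows.
rows = [(2, "b"), (1, "z"), (1, "a")]
ordered = sorted(rows)
```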
Snowpark pandas API Updates
New Features
- Added support for `DataFrame.groupby.rolling()`.
- Added support for mapping `np.percentile` with DataFrame and Series inputs to `Series.quantile`.
- Added support for setting the `random_state` parameter to an integer when calling `DataFrame.sample` or `Series.sample`.
- Added support for the following `iceberg_config` options in `to_iceberg`: `target_file_size`, `partition_by`
Improvements
- Enhanced autoswitching functionality from Snowflake to native pandas for methods with unsupported argument combinations:
  - `shift()` with `suffix` or non-integer `periods` parameters
  - `sort_index()` with `axis=1` or `key` parameter
  - `sort_values()` with `axis=1`
  - `melt()` with `col_level` parameter
  - `apply()` with `result_type` parameter for DataFrame
  - `pivot_table()` with `sort=True`, non-string `index` list, non-string `columns` list, non-string `values` list, or `aggfunc` dict with non-string values
  - `fillna()` with `downcast` parameter or using `limit` together with `value`
  - `dropna()` with `axis=1`
  - `asfreq()` with `how` parameter, `fill_value` parameter, `normalize=True`, or `freq` parameter being week, month, quarter, or year
  - `groupby()` with `axis=1`, `by != None and level != None`, or `by` containing any non-pandas hashable labels
  - `groupby_fillna()` with `downcast` parameter
  - `groupby_first()` with `min_count > 1`
  - `groupby_last()` with `min_count > 1`
  - `groupby_shift()` with `freq` parameter
- Slightly improved the performance of `agg`, `nunique`, `describe`, and related methods on 1-column DataFrame and Series objects.
Bug Fixes
- Fixed a bug in `DataFrameGroupBy.agg` where `func` is a list of tuples used to set the names of the output columns.
- Fixed a bug where converting a modin datetime index with a timezone to a numpy array with `np.asarray` would cause a `TypeError`.
- Fixed a bug where `Series.isin` with a Series argument matched index labels instead of the row position.
Improvements
- Add support for the following in faster pandas: `groupby.apply`, `groupby.nunique`, `groupby.size`, `concat`, `copy`, `str.isdigit`, `str.islower`, `str.isupper`, `str.istitle`, `str.lower`, `str.upper`, `str.title`, `str.match`, `str.capitalize`, `str.__getitem__`, `str.center`, `str.count`, `str.get`, `str.pad`, `str.len`, `str.ljust`, `str.rjust`, `str.split`, `str.replace`, `str.strip`, `str.lstrip`, `str.rstrip`, `str.translate`, `dt.tz_localize`, `dt.tz_convert`, `dt.ceil`, `dt.round`, `dt.floor`, `dt.normalize`, `dt.month_name`, `dt.day_name`, `dt.strftime`, `dt.dayofweek`, `dt.weekday`, `dt.dayofyear`, `dt.isocalendar`, `rolling.min`, `rolling.max`, `rolling.count`, `rolling.sum`, `rolling.mean`, `rolling.std`, `rolling.var`, `rolling.sem`, `rolling.corr`, `expanding.min`, `expanding.max`, `expanding.count`, `expanding.sum`, `expanding.mean`, `expanding.std`, `expanding.var`, `expanding.sem`, `cumsum`, `cummin`, `cummax`, `groupby.groups`, `groupby.indices`, `groupby.first`, `groupby.last`, `groupby.rank`, `groupby.shift`, `groupby.cumcount`, `groupby.cumsum`, `groupby.cummin`, `groupby.cummax`, `groupby.any`, `groupby.all`, `groupby.unique`, `groupby.get_group`, `groupby.rolling`, `groupby.resample`, `to_snowflake`, `to_snowpark`, `resample.min`, `resample.max`, `resample.count`, `resample.sum`, `resample.mean`, `resample.median`, `resample.std`, `resample.var`, `resample.size`, `resample.first`, `resample.last`, `resample.quantile`, `resample.nunique`
- Disabled faster pandas by default (opt-in instead of opt-out).
- Improve performance of `drop_duplicates` by avoiding joins when `keep != False` in faster pandas.
1.42.0 (2025-10-28)
Snowpark Python API Updates
New Features
- The Snowpark Python DB-API reader is now generally available. Access this feature with `DataFrameReader.dbapi()` to read data from a database table or query into a DataFrame using a DBAPI connection.
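`DataFrameReader.dbapi()` is driven by a callable that returns a DB-API 2.0 connection. A minimal sketch of such a callable, using sqlite3 as a stand-in driver (the table and data here are hypothetical):

```python
import sqlite3

def create_connection():
    # Any zero-argument callable returning a DB-API 2.0 connection works;
    # sqlite3 stands in here for a real external database driver.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
    conn.execute("INSERT INTO t VALUES (1, 'a')")
    return conn

# In Snowpark this callable would be handed to the dbapi reader;
# here we simply exercise it directly.
rows = create_connection().execute("SELECT id, name FROM t").fetchall()
```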
1.41.0 (2025-10-23)
Snowpark Python API Updates
New Features
- Added a new function `service` in `snowflake.snowpark.functions` that allows users to create a callable representing a Snowpark Container Services (SPCS) service.
- Added a `connection_parameters` parameter to the `DataFrameReader.dbapi()` (PuPr) method to allow passing keyword arguments to the `create_connection` callable.
- Added support for `Session.begin_transaction`, `Session.commit`, and `Session.rollback`.
- Added support for the following functions in `functions.py`:
  - Geospatial functions: `st_interpolate`, `st_intersection`, `st_intersection_agg`, `st_intersects`, `st_isvalid`, `st_length`, `st_makegeompoint`, `st_makeline`, `st_makepolygon`, `st_makepolygonoriented`, `st_disjoint`, `st_distance`, `st_dwithin`, `st_endpoint`, `st_envelope`, `st_geohash`, `st_geomfromgeohash`, `st_geompointfromgeohash`, `st_hausdorffdistance`, `st_makepoint`, `st_npoints`, `st_perimeter`, `st_pointn`, `st_setsrid`, `st_simplify`, `st_srid`, `st_startpoint`, `st_symdifference`, `st_transform`, `st_union`, `st_union_agg`, `st_within`, `st_x`, `st_xmax`, `st_xmin`, `st_y`, `st_ymax`, `st_ymin`, `st_geogfromgeohash`, `st_geogpointfromgeohash`, `st_geographyfromwkb`, `st_geographyfromwkt`, `st_geometryfromwkb`, `st_geometryfromwkt`, `try_to_geography`, `try_to_geometry`
- Added a parameter to enable and disable automatic column name aliasing for the `interval_day_time_from_parts` and `interval_year_month_from_parts` functions.
Bug Fixes
- Fixed a bug where `DataFrameReader.xml` failed to parse XML files with undeclared namespaces when `ignoreNamespace` is `True`.
- Added a fix for floating point precision discrepancies in `interval_day_time_from_parts`.
- Fixed a bug where writing Snowpark pandas dataframes on the pandas backend with a column multiindex to Snowflake with `to_snowflake` would raise `KeyError`.
- Fixed a bug where `DataFrameReader.dbapi` (PuPr) was not compatible with oracledb 3.4.0.
- Fixed a bug where `modin` would unintentionally be imported during session initialization in some scenarios.
- Fixed a bug where `session.udf|udtf|udaf|sproc.register` failed when an extra session argument was passed. These methods do not expect a session argument; please remove it if provided.
Improvements
- Increased the default maximum length for inferred StringType columns during schema inference in `DataFrameReader.dbapi` from 16 MB to 128 MB in Parquet-file-based ingestion.
Dependency Updates
- Updated the dependency on `snowflake-connector-python` to `>=3.17,<5.0.0`.
Snowpark pandas API Updates
New Features
- Added support for the `dtypes` parameter of `pd.get_dummies`.
- Added support for `nunique` in `df.pivot_table`, `df.agg`, and other places where aggregate functions can be used.
- Added support for `DataFrame.interpolate` and `Series.interpolate` with the "linear", "ffill"/"pad", and "backfill"/"bfill" methods. These use the SQL `INTERPOLATE_LINEAR`, `INTERPOLATE_FFILL`, and `INTERPOLATE_BFILL` functions (PuPr).
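The forward-fill and linear strategies can be sketched in plain Python (illustration only; Snowpark pandas delegates the actual work to the SQL `INTERPOLATE_*` functions):

```python
# Forward fill: carry the last known value into gaps.
def ffill(values):
    out, last = [], None
    for v in values:
        last = v if v is not None else last
        out.append(last)
    return out

# Linear: fill a single-element gap with the midpoint of its known neighbors
# (simplified; real linear interpolation spreads longer gaps proportionally).
def linear_single_gap(values):
    out = list(values)
    for i, v in enumerate(out):
        if v is None:
            out[i] = (out[i - 1] + out[i + 1]) / 2
    return out

filled = ffill([1.0, None, 3.0])
interp = linear_single_gap([1.0, None, 3.0])
```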
Improvements
- Improved performance of `Series.to_snowflake` and `pd.to_snowflake(series)` for large data by uploading data via a parquet file. You can control the dataset size at which Snowpark pandas switches to parquet with the variable `modin.config.PandasToSnowflakeParquetThresholdBytes`.
- Enhanced autoswitching functionality from Snowflake to native pandas for methods with unsupported argument combinations:
  - `get_dummies()` with `dummy_na=True`, `drop_first=True`, or custom `dtype` parameters
  - `cumsum()`, `cummin()`, `cummax()` with `axis=1` (column-wise operations)
  - `skew()` with `axis=1` or `numeric_only=False` parameters
  - `round()` with `decimals` parameter as a Series
  - `corr()` with `method != pearson` parameter
- Set `cte_optimization_enabled` to `True` for all Snowpark pandas sessions.
- Add support for the following in faster pandas: `isin`, `isna`, `isnull`, `notna`, `notnull`, `str.contains`, `str.startswith`, `str.endswith`, `str.slice`, `dt.date`, `dt.time`, `dt.hour`, `dt.minute`, `dt.second`, `dt.microsecond`, `dt.nanosecond`, `dt.year`, `dt.month`, `dt.day`, `dt.quarter`, `dt.is_month_start`, `dt.is_month_end`, `dt.is_quarter_start`, `dt.is_quarter_end`, `dt.is_year_start`, `dt.is_year_end`, `dt.is_leap_year`, `dt.days_in_month`, `dt.daysinmonth`, `sort_values`, `loc` (setting columns), `to_datetime`, `rename`, `drop`, `invert`, `duplicated`, `iloc`, `head`, `columns` (e.g., `df.columns = ["A", "B"]`), `agg`, `min`, `max`, `count`, `sum`, `mean`, `median`, `std`, `var`, `groupby.agg`, `groupby.min`, `groupby.max`, `groupby.count`, `groupby.sum`, `groupby.mean`, `groupby.median`, `groupby.std`, `groupby.var`, `drop_duplicates`
- Reuse row count from the relaxed query compiler in `get_axis_len`.
Bug Fixes
- Fixed a bug where the row count was not getting cached in the ordered dataframe each time count_rows() is called.
1.40.0 (2025-10-02)
Snowpark Python API Updates
New Features
- Added a new module `snowflake.snowpark.secrets` that provides Python wrappers for accessing Snowflake Secrets within Python UDFs and stored procedures that execute inside Snowflake:
  - `get_generic_secret_string`
  - `get_oauth_access_token`
  - `get_secret_type`
  - `get_username_password`
  - `get_cloud_provider_token`
- Added support for the following scalar functions in `functions.py`:
  - Conditional expression functions: `booland`, `boolnot`, `boolor`, `boolxor`, `boolor_agg`, `decode`, `greatest_ignore_nulls`, `least_ignore_nulls`, `nullif`, `nvl2`, `regr_valx`
  - Semi-structured and structured data functions: `array_remove_at`, `as_boolean`, `map_delete`, `map_insert`, `map_pick`, `map_size`
  - String and binary functions: `chr`, `hex_decode_binary`
  - Numeric functions: `div0null`
  - Differential privacy functions: `dp_interval_high`, `dp_interval_low`
  - Context functions: `last_query_id`, `last_transaction`
  - Geospatial functions: `h3_cell_to_boundary`, `h3_cell_to_children`, `h3_cell_to_children_string`, `h3_cell_to_parent`, `h3_cell_to_point`, `h3_compact_cells`, `h3_compact_cells_strings`, `h3_coverage`, `h3_coverage_strings`, `h3_get_resolution`, `h3_grid_disk`, `h3_grid_distance`, `h3_int_to_string`, `h3_polygon_to_cells`, `h3_polygon_to_cells_strings`, `h3_string_to_int`, `h3_try_grid_path`, `h3_try_polygon_to_cells`, `h3_try_polygon_to_cells_strings`, `h3_uncompact_cells`, `h3_uncompact_cells_strings`, `haversine`, `h3_grid_path`, `h3_is_pentagon`, `h3_is_valid_cell`, `h3_latlng_to_cell`, `h3_latlng_to_cell_string`, `h3_point_to_cell`, `h3_point_to_cell_string`, `h3_try_coverage`, `h3_try_coverage_strings`, `h3_try_grid_distance`, `st_area`, `st_asewkb`, `st_asewkt`, `st_asgeojson`, `st_aswkb`, `st_aswkt`, `st_azimuth`, `st_buffer`, `st_centroid`, `st_collect`, `st_contains`, `st_coveredby`, `st_covers`, `st_difference`, `st_dimension`
Bug Fixes
- Fixed a bug where `DataFrame.limit()` failed if there was parameter binding in the executed SQL when used in a non-stored-procedure/UDxF environment.
- Added an experimental fix for a bug in schema query generation that could cause invalid SQL to be generated when using nested structured types.
- Fixed multiple bugs in `DataFrameReader.dbapi` (PuPr):
  - Fixed UDTF ingestion failure with the `pyodbc` driver caused by unprocessed row data.
  - Fixed SQL Server query input failure due to incorrect select query generation.
  - Fixed UDTF ingestion not preserving column nullability in the output schema.
  - Fixed an issue that caused the program to hang during multithreaded Parquet-based ingestion when a data fetching error occurred.
  - Fixed a bug in schema parsing when custom schema strings used upper-cased data type names (NUMERIC, NUMBER, DECIMAL, VARCHAR, STRING, TEXT).
- Fixed a bug in `Session.create_dataframe` where schema string parsing failed when using upper-cased data type names (e.g., NUMERIC, NUMBER, DECIMAL, VARCHAR, STRING, TEXT).
Improvements
- Improved `DataFrameReader.dbapi` (PuPr) so that it does not retry on non-retryable errors, such as SQL syntax errors in the external data source query.
- Removed unnecessary warnings about local package version mismatch when using `session.read.option('rowTag', <tag_name>).xml(<stage_file_path>)` or `xpath` functions.
- Improved `DataFrameReader.dbapi` (PuPr) reading performance by setting the default `fetch_size` parameter value to 100000.
- Improved the error message for XSD validation failure when reading XML files using `session.read.option('rowValidationXSDPath', <xsd_path>).xml(<stage_file_path>)`.
Snowpark pandas API Updates
Dependency Updates
- Updated the supported `modin` versions to `>=0.36.0,<0.38.0` (was previously `>=0.35.0,<0.37.0`).
New Features
- Added support for `DataFrame.query` for dataframes with single-level indexes.
- Added support for `DataFrameGroupby.__len__` and `SeriesGroupBy.__len__`.
Improvements
- Hybrid execution mode is now enabled by default. Certain operations on smaller data will now automatically execute in native pandas in-memory. Use `from modin.config import AutoSwitchBackend; AutoSwitchBackend.disable()` to turn this off and force all execution to occur in Snowflake.
- Added a session parameter `pandas_hybrid_execution_enabled` to enable/disable hybrid execution as an alternative to using `AutoSwitchBackend`.
- Removed an unnecessary `SHOW OBJECTS` query issued from `read_snowflake` under certain conditions.
- When hybrid execution is enabled, `pd.merge`, `pd.concat`, `DataFrame.merge`, and `DataFrame.join` may now move arguments to backends other than those among the function arguments.
- Improved performance of `DataFrame.to_snowflake` and `pd.to_snowflake(dataframe)` for large data by uploading data via a parquet file. You can control the dataset size at which Snowpark pandas switches to parquet with the variable `modin.config.PandasToSnowflakeParquetThresholdBytes`.
1.39.1 (2025-09-25)
Snowpark Python API Updates
Bug Fixes
- Added an experimental fix for a bug in schema query generation that could cause invalid SQL to be generated when using nested structured types.
1.39.0 (2025-09-17)
Snowpark Python API Updates
New Features
- Added support for unstructured data engineering in Snowpark, powered by Snowflake AISQL and Cortex functions:
  - `DataFrame.ai.complete`: Generate per-row LLM completions from prompts built over columns and files.
  - `DataFrame.ai.filter`: Keep rows where an AI classifier returns TRUE for the given predicate.
  - `DataFrame.ai.agg`: Reduce a text column into one result using a natural-language task description.
  - `RelationalGroupedDataFrame.ai_agg`: Perform the same natural-language aggregation per group.
  - `DataFrame.ai.classify`: Assign single or multiple labels from given categories to text or images.
  - `DataFrame.ai.similarity`: Compute cosine-based similarity scores between two columns via embeddings.
  - `DataFrame.ai.sentiment`: Extract overall and aspect-level sentiment from text into JSON.
  - `DataFrame.ai.embed`: Generate VECTOR embeddings for text or images using configurable models.
  - `DataFrame.ai.summarize_agg`: Aggregate and produce a single comprehensive summary over many rows.
  - `DataFrame.ai.transcribe`: Transcribe audio files to text with optional timestamps and speaker labels.
  - `DataFrame.ai.parse_document`: OCR/layout-parse documents or images into structured JSON.
  - `DataFrame.ai.extract`: Pull structured fields from text or files using a response schema.
  - `DataFrame.ai.count_tokens`: Estimate token usage for a given model and input text per row.
  - `DataFrame.ai.split_text_markdown_header`: Split Markdown into hierarchical header-aware chunks.
  - `DataFrame.ai.split_text_recursive_character`: Split text into size-bounded chunks using recursive separators.
  - `DataFrameReader.file`: Create a DataFrame containing all files from a stage as FILE data type for downstream unstructured data processing.
- Added a new datatype `YearMonthIntervalType` that allows users to create intervals for datetime operations.
- Added a new function `interval_year_month_from_parts` that allows users to easily create `YearMonthIntervalType` without using SQL.
- Added a new datatype `DayTimeIntervalType` that allows users to create intervals for datetime operations.
- Added a new function `interval_day_time_from_parts` that allows users to easily create `DayTimeIntervalType` without using SQL.
- Added support for `FileOperation.list` to list files in a stage with metadata.
- Added support for `FileOperation.remove` to remove files in a stage.
- Added an option to specify `copy_grants` for the following `DataFrame` APIs: `create_or_replace_view`, `create_or_replace_temp_view`, `create_or_replace_dynamic_table`
- Added a new function `snowflake.snowpark.functions.vectorized` that allows users to mark a function as a vectorized UDF.
- Added support for the parameter `use_vectorized_scanner` in the function `Session.write_pandas()`.
- Added support for the following scalar functions in `functions.py`: `getdate`, `getvariable`, `invoker_role`, `invoker_share`, `is_application_role_in_session`, `is_database_role_in_session`, `is_granted_to_invoker_role`, `is_role_in_session`, `localtime`, `systimestamp`
Deprecations
- Deprecation warnings will be triggered when using snowpark-python with Python 3.9. For more details, please refer to https://docs.snowflake.com/en/developer-guide/python-runtime-support-policy.
Improvements
- Unsupported types in `DataFrameReader.dbapi` (PuPr) are now ingested as `StringType`.
- Improved the error message to list available columns when a dataframe cannot resolve a given column name.
- Added a new option `cacheResult` to `DataFrameReader.xml` that allows users to cache the result of the XML reader to a temporary table after calling `xml`. This helps improve performance when subsequent operations are performed on the same DataFrame.
Snowpark pandas API Updates
Improvements
- Downgraded to level `logging.DEBUG - 1` the log message saying that the Snowpark `DataFrame` reference of an internal `DataFrameReference` object has changed.
- Eliminated duplicate parameter check queries for casing status when retrieving the session.
- Retrieve dataframe row counts through object metadata to avoid a `COUNT(*)` query (performance).
- Added support for applying the Snowflake Cortex function `Complete`.
- Introduced faster pandas: improved performance by deferring row position computation.
  - The following operations are currently supported and can benefit from the optimization: `read_snowflake`, `repr`, `loc`, `reset_index`, `merge`, and binary operations.
  - If a lazy object (e.g., DataFrame or Series) depends on a mix of supported and unsupported operations, the optimization will not be used.
- Updated the error message for when Snowpark pandas is referenced within `apply`.
- Added a session parameter `dummy_row_pos_optimization_enabled` to enable/disable dummy row position optimization in faster pandas.
Dependency Updates
- Updated the supported `modin` versions to `>=0.35.0,<0.37.0` (was previously `>=0.34.0,<0.36.0`).
Bug Fixes
- Fixed an issue with `drop_duplicates` where the same data source could be read multiple times in the same query but in a different order each time, resulting in missing rows in the final result. The fix ensures that the data source is read only once.
- Fixed a bug with hybrid execution mode where an `AssertionError` was unexpectedly raised by certain indexing operations.
Snowpark Local Testing Updates
New Features
- Added support to allow patching `functions.ai_complete`.