@@ -389,6 +389,80 @@ The `on_start_query_execution` callback is supported by the following cursor typ
389389Note: `AsyncCursor` and its variants do not support this callback as they already
390390return the query ID immediately through their different execution model.
391391
392+ ## Type hints for complex types
393+
394+ *New in version 3.30.0.*
395+
396+ The Athena API does not return element-level type information for complex types
397+ (array, map, row/struct). PyAthena parses the string representation returned by
398+ Athena, but without type metadata the converter can only apply heuristics — which
399+ may produce incorrect Python types for nested values (e.g. integers left as strings
400+ inside a struct).
401+
402+ The `result_set_type_hints` parameter solves this by letting you provide Athena DDL
403+ type signatures for specific columns. The converter then uses precise, recursive
404+ type-aware conversion instead of heuristics.
405+
406+ ```python
407+ from pyathena import connect
408+
409+ cursor = connect(s3_staging_dir="s3://YOUR_S3_BUCKET/path/to/",
410+                  region_name="us-west-2").cursor()
411+ cursor.execute(
412+     "SELECT col_array, col_map, col_struct FROM one_row_complex",
413+     result_set_type_hints={
414+         "col_array": "array(integer)",
415+         "col_map": "map(integer, integer)",
416+         "col_struct": "row(a integer, b integer)",
417+     },
418+ )
419+ row = cursor.fetchone()
420+ # col_struct values are now integers, not strings:
421+ # {"a": 1, "b": 2} instead of {"a": "1", "b": "2"}
422+ ```
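
To make "recursive type-aware conversion" concrete, here is a minimal sketch of the idea. It is not PyAthena's converter: it assumes the value has already been parsed into nested Python containers whose leaves are strings, and it describes types with a hypothetical tuple-based mini-language (`SCALAR_CASTS`, `convert`) instead of parsing DDL signatures.

```python
from typing import Any

# Hypothetical mini type language for this sketch (NOT PyAthena's internals):
#   scalar -> "integer" | "bigint" | "double" | "varchar"
#   array  -> ("array", element_spec)
#   map    -> ("map", key_spec, value_spec)
#   row    -> ("row", {"field_name": field_spec, ...})
SCALAR_CASTS = {"integer": int, "bigint": int, "double": float, "varchar": str}


def convert(value: Any, spec: Any) -> Any:
    """Recursively cast string leaves to the Python type named by `spec`."""
    if value is None:
        return None
    if isinstance(spec, str):
        # Scalar leaf: apply the cast registered for this type name.
        return SCALAR_CASTS[spec](value)
    kind = spec[0]
    if kind == "array":
        return [convert(item, spec[1]) for item in value]
    if kind == "map":
        return {convert(k, spec[1]): convert(v, spec[2]) for k, v in value.items()}
    if kind == "row":
        return {name: convert(value[name], field_spec)
                for name, field_spec in spec[1].items()}
    raise ValueError(f"unsupported type spec: {spec!r}")


# Without type information, every leaf stays a string.
raw = {"a": "1", "b": "2"}
typed = convert(raw, ("row", {"a": "integer", "b": "integer"}))
print(typed)
```

Because each complex type recurses into its element, key/value, or field specs, the same function handles any nesting depth without special cases.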
423+
424+ Column name matching is case-insensitive. Type hints support arbitrarily nested types:
425+
426+ ```python
427+ cursor.execute(
428+     """
429+     SELECT CAST(
430+         ROW(ROW('2024-01-01', 123), 4.736, 0.583)
431+         AS ROW(header ROW(stamp VARCHAR, seq INTEGER), x DOUBLE, y DOUBLE)
432+     ) AS positions
433+     """,
434+     result_set_type_hints={
435+         "positions": "row(header row(stamp varchar, seq integer), x double, y double)",
436+     },
437+ )
438+ row = cursor.fetchone()
439+ positions = row[0]
440+ # positions["header"]["seq"] == 123 (int, not "123")
441+ # positions["x"] == 4.736 (float, not "4.736")
442+ ```
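
As an illustration of what case-insensitive column matching means in practice, the sketch below normalizes hint keys once and lowercases the column name at lookup time. The helper names `normalize_hints` and `hint_for` are hypothetical, and PyAthena's internal lookup may be implemented differently.

```python
from typing import Dict, Optional


def normalize_hints(hints: Dict[str, str]) -> Dict[str, str]:
    """Lowercase every column name so lookups ignore case."""
    return {name.lower(): signature for name, signature in hints.items()}


def hint_for(column_name: str, normalized: Dict[str, str]) -> Optional[str]:
    """Return the type signature for a column, or None if no hint was given."""
    return normalized.get(column_name.lower())


hints = normalize_hints({"Positions": "row(x double, y double)"})
print(hint_for("POSITIONS", hints))
print(hint_for("other_column", hints))
```

Under this scheme, a hint keyed as `"Positions"` applies to a result column reported as `positions` or `POSITIONS` alike, while columns without a hint simply return `None`.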
443+
444+ ### Constraints
445+
446+ * **Nested arrays in native format**: Athena's native (non-JSON) string representation
447+   does not clearly delimit nested arrays. If your query returns nested arrays
448+   (e.g. `array(array(integer))`), use `CAST(... AS JSON)` in your query to get
449+   JSON-formatted output, which is parsed reliably.
450+ * **Arrow, Pandas, and Polars cursors**: These cursors accept `result_set_type_hints`
451+   but their converters do not currently use the hints because they rely on their own
452+   type systems. The parameter is passed through for forward compatibility and for
453+   result sets that fall back to the default conversion path.
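
The nested-array workaround can be sketched as follows. The table and column names in `QUERY` are hypothetical, and `decode_json_column` is an illustrative helper, not part of PyAthena. Depending on the cursor and converter in use, the JSON column may already arrive decoded, so the helper passes non-string values through unchanged.

```python
import json
from typing import Any

# Hypothetical table and column names; the CAST makes Athena emit JSON text
# such as "[[1,2],[3]]", where the nesting is unambiguous.
QUERY = "SELECT CAST(col_nested AS JSON) AS col_nested FROM my_table"


def decode_json_column(value: Any) -> Any:
    """Decode a JSON-formatted column value if it is still text."""
    if isinstance(value, (str, bytes)):
        return json.loads(value)
    # Already decoded (or NULL): return unchanged.
    return value


print(decode_json_column("[[1, 2], [3]]"))
```

JSON brackets and commas delimit every nesting level explicitly, so the standard `json` module recovers the structure without guessing.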
454+
455+ ### Breaking change in 3.30.0
456+
457+ Prior to 3.30.0, PyAthena attempted to infer Python types for scalar values inside
458+ complex types using heuristics (e.g. `"123"` → `123`). Starting with 3.30.0, values
459+ inside complex types are **kept as strings** unless `result_set_type_hints` is provided.
460+ This change avoids silent misconversion but means existing code that relied on the
461+ heuristic behavior may see string values where it previously saw integers or floats.
462+
463+ To restore typed conversion, pass `result_set_type_hints` with the appropriate type
464+ signatures for the affected columns.
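
When migrating several queries, one possible approach is to keep the signatures in a single mapping and pick out the entries each query needs. This is a sketch only: `COMPLEX_TYPE_HINTS` and `select_hints` are hypothetical names, and the column names and signatures are reused from the examples above.

```python
from typing import Dict, Iterable

# Hypothetical project-wide registry of complex-column signatures.
COMPLEX_TYPE_HINTS: Dict[str, str] = {
    "col_array": "array(integer)",
    "col_map": "map(integer, integer)",
    "col_struct": "row(a integer, b integer)",
}


def select_hints(columns: Iterable[str]) -> Dict[str, str]:
    """Return only the hints for the columns a query actually selects."""
    wanted = {column.lower() for column in columns}
    return {name: signature
            for name, signature in COMPLEX_TYPE_HINTS.items()
            if name.lower() in wanted}


# Intended usage with a PyAthena cursor (not executed here):
#   cursor.execute(
#       "SELECT col_struct FROM one_row_complex",
#       result_set_type_hints=select_hints(["col_struct"]),
#   )
print(select_hints(["COL_STRUCT", "unrelated"]))
```

Keeping the signatures in one place means a schema change only has to be reflected once, rather than in every `execute` call.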
465+
392466## Environment variables
393467
394468Support [Boto3 environment variables](https://boto3.amazonaws.com/v1/documentation/api/latest/guide/configuration.html#using-environment-variables).