Skip to content

Conversation

@jtv
Copy link
Owner

@jtv jtv commented Dec 22, 2024

Changes in 8.0

Many things have changed. Most code written against 7.x will still work, but there may now be nicer ways of doing things. Some of your code may need updating, and I'm sorry about that! But I hope you'll be happy overall.

Let's go through the main things.

C++20

You will need at least C++20 to compile libpqxx. Support does not need to be absolutely complete, but the basics must be there: concepts, various utility functions in the standard library, the language's new object lifetime rules, and so on.

Thanks to std::span, there are now very few raw pointers in the API (or even internally). This makes libpqxx not only safer, but also easier to prove safe. It is important to be able to run static analysis tools that may detect bugs and vulnerabilities. But, in the C++ tradition, it shouldn't cost us anything in terms of speed.

Retiring deprecated items

Some types and functions were already deprecated and are now gone:

  • binarystring. Use blob instead.
  • connection_base was an alias. Use just regular connection.
  • encrypt_password(). Use the equivalent member functions in connection.
  • dynamic_params. Use params.
  • Some stream_to constructors. Use static "factory" member functions.
  • transaction_base::unesc_raw(). Use unesc_bin().
  • transaction_base::quote_raw(). Use quote(), passing a bytes_view.
  • Some field constructors.
  • Result row splicing. I don't think anyone ever used it.

Result iterators

Lifetimes: result and row iterators no longer reference-count their result object. In libpqxx 7, a `result object's data would stay alive in memory for as long as you had an iterator referring to it. It seemed like a good idea once, many years ago, but it made iteration staggeringly inefficient.

So, that's changed now. Just like any other iterator in C++, result and row iterators no longer keep your container alive. It's your own problem to ensure that. However result objects are still reference-counted smart pointers to the underlying data, so copying them is relatively cheap.

Semantics: Iterators for the result and row classes have never fully complied with the standard requirements for iterators.

There was a good reason for this: I wanted these iterators to be convenient and work a lot like C pointers. I wanted you to be able to dereference them as if a result object were just an array of pointers to arrays.

But it really started getting in the way. Much as I hate it, the standard requirements say that for any iterator i, i[n] should mean *(i + n). In libpqxx, if you had a result iterator i, i[n] meant "field n in the row to which i points." Really convenient, but not compliant with the standard.

So, that is no longer the case. If you want "field n in the row to which i points," say (*i)[n].

By the way, these changes mean that result and row iterators are now proper, standard random-access iterators. Some algorithms from the standard library may magically become more efficient when they detect this.

Comparing results, rows, and fields

When you compare two result, row, or field objects with == or !=, it now only checks whether they refer to (the same row/field in) the same underlying data object. It does not look at the data inside.

Row and field references

The row and field classes were cumbersome,inefficient, and hopelessly intertwined with iterators.

To avoid all that, use row_ref instead of row and field_ref instead of field. These assume that you keep the original result around, and in a stable location in memory.

The indexing operations now return row_ref and field_ref instead of row and field respectively. It probably won't affect your code, but you're running static analysis and instrumented builds to check for these things, right?

Binary data

As the documentation predicted, pqxx::bytes alias has changed to stand for std::vector<std::byte>. It used to be std::basic_string<std::byte>. And pqxx::bytes_view is now an alias for std::span<std::byte>.

This may require changes to your code. The APIs for std::basic_string and std::vector differ, perhaps in more places than they should. Do not read the data using the c_str() member function; use data() instead.

Hate to do this to you. However there were real problems with using std::basic_string the way we did. The basic_string template wasn't built for what we did, and there was no guarantee that it would work with any given compiler. Even where it did, we had to work around differences between compilers and compiler versions.

But there's also good news! Thanks to C++20's Concepts, most functions that previously only accepted a pqxx::bytes argument will now accept just about any block of binary data, such as std::array<std::byte> or a C-style array.

String conversion API

This is a whole chapter in itself. If you were specialising pqxx::type_name or pqxx::string_traits, things just got simpler, safer, and more efficient. But there's also a change in lifetime rules. This can be important for writing
safe and correct code, so read on!

Type names

Let's get the easiest part out of the way first: type_name is now deprecated. Instead of specialising type_name, you now specialises a function called name_type(). It returns a std::string_view, so the underlying data can be a string literal instead of a std::string. Some static analysis tools would report false positives about static deallocation of the type_name strings.

Defining a string conversion

Then there's pqxx::string_traits. Several changes here. If you were defining your own conversions to/from SQL strings, the 8.x API is...

  1. leaner, losing several members but mainly into_string().
  2. safer, using views and spans rather than raw C-style pointers.
  3. simpler, dropping the terminating zero at the end of various strings.
  4. friendlier, accepting std::source_location for better error reporting.
  5. richer, capable of dealing with different text encodings.
  6. faster, because of the lifetime rule changes (described below).

In addition, the conversions also get access to some information about the client text encoding, so that they'll know what they're parsing. See "Text encodings" below.

Your existing conversions may still work without changes, but that's only thanks to some specific compatibility shims. These will go away in libpqxx 9 at the latest, so I urge you to update your code.

Lifetime rule changes

In libpqxx 7.x, when you converted a C++ value to an SQL string or vice versa, you could count on having a valid output for at least as long as you kept the object in which you stored it. The details depend on the type, e.g. pqxx::string_traits<bool>::to_string() returns a string constant, either "true" or "false".

We're tightening that up in 8.x. Now, the output remains valid for at least as long as you keep the object in which you stored it, and also keep the original in place. This streamlines some trivial conversions. For example, converting a std::string_view or a C-style string to SQL is now just a matter of returning a view on that same data. It's just the same data in both C++ and SQL.

It also enables some conversions that weren't possible before. You can now convert an SQL string to a std::string_view or a C-style string pointer, because the conversion is allowed to refer to its input data.

Most code won't need to care about this change. A calling function is usually done very quickly with its converted value, or immediately arranges for more durable storage. If you call pqxx::to_string() for example, you get a std::string containing the SQL string. All that changes in that case is the conversion process skipping an internal copy step, making it a bit faster.

Using conversions

There is now a richer, more complete set of global conversion functions. This means you can convert C++ values to SQL strings without calling their pqxx::string_traits functions yourself.

The options are, from easiest to most efficient:

  • pqxx::to_string() returns the SQL value as a std::string.
  • pqxx::into_buf() renders the SQL value into a buffer you provide.
  • pqxx::to_buf() returns the SQL value, in a location of its own choosing.

The to_buf() variant can make use of optimisations such as returning a view or pointer referring to the input you give it, or returning a string constant (e.g. "true" or "false").

That concludes the changes to the string conversion API. On to the new features that won't require you to change your code, but may improve your life.

Arrays and composite types

You can now generate and parse SQL arrays, just like you can convert so many other types between their C++ representations and their SQL representations.

To parse the SQL representation, use pqxx::from_string<pqxx::array<...>>.

The template parameters in <...> are:

  1. The type for the elements inside the array.
  2. The number of dimensions. A simple array has 1 dimension, which is the default, but multi-dimensional arrays also work.;
  3. The separator character between the elements in the SQL array. It defaults to whatever the default is for the element type, but you can override it. However it must be a single ASCII byte.

Now, pqxx::array is mostly useful for loading SQL values into C++. it does not support changing or manipulating the array's contents once you've got it. When going in the other direction however, you can use the standard C++ container or range types: std::vector, std::array, std::range, std::view, and so on. Nest these inside each other to form multi-dimensional SQL arrays.

Composite types are a bit harder. These are complex types of your own, which you can define in SQL and which you map to a corresponding C++ type. To support converting these between their C++ form and their SQL form, you'll have
to define a pqxx::string_traits type. The composite.hxx header contains some functions that you can call to handle most of the work.

Source locations

Most of the functions in the libpqxx API now accept a std::source_location as a final argument. They will default to the location from where you call them.

In many cases where libpqxx throws an exception, the error message will now include a reference to that source location to help you debug the error.

Where possible, this will be the most precise location where you called into libpqxx. In places where libpqxx functions call other libpqxx functions, they will (where possible) pass the original source location along, so you're not left with a source reference that has no meaning to you and is nothing to do with your application.

You can also override this and pass your own source location in the call.

In some situations it is not possible to pass a source location, but libpqxx will try to give you at least some useful location, such as the location where you created the object on which the error occurred.

Expect this whole mechanism to change again in the future as C++ moves towards full traceback support.

Text encodings

PostgreSQL lets you communicate with the database in a choice of client encodings. SQL that goes back and forth between your client and the server will be represented in this encoding.

For a lot of purposes your code will not need to be aware of the encoding. For example, all SQL statements themselves as well as all special characters such as quotes and commas and various field separators will all be in ASCII.

There are cases where libpqxx needs to know what kind of encoding it's getting. It doesn't care about the exact encoding, but it needs to know where each character begins and ends, so it can detect special characters such as closing quotes, withing being confused by the same byte values occurring inside a multi-byte character.

This information is now also available to string conversions, to help them parse SQL data that may contain text in the client encoding.

Build changes

The build has improved in many ways. As before, you get a choice of CMake or the configure script. The CMake build now also supports "unity builds."

All shell scripts now have a .sh suffix. It takes some getting used to, but it simplifies some things such as running lint checkers on all of them without having to name them all.

The build no longer tries to figure out whether it needs to link the standard C++ filesystems library. In most cases this seems to be part of the regular standard library now. If you do need to add a link option to get std::filesystem to work, you'll have to pass that option yourself.

In the lint script (now called tools/lint.sh), set PQXX_LINT=skip to skip lint checks. This can speed up make check runs, and enable them to run without network access.

Internal changes

The "monobyte" encoding group is now called "ASCII-safe," which covers it better. I was able to fold a few encoding groups into this group, since the only encoding support we need is "find these ASCII characters" and for the ASCII-safe encodings this works just like in ASCII itself.

Thanks to constexpr there are now a few compile-time unit tests inside the headers and source files. To avoid slowing down the build with these, they get compiled only when NDEBUG is not defined. (This is the standard macro that also controls whether assert() statements get compiled.)

The project's Continuous Integration setup now tests against more tools and environments in a more systematic way. Every change gets tested against...

  • Compilers: gcc, clang, and Microsoft Visual Studio.
  • Compiler run-time sanitizers (exact list dependent on compiler): address, address use after scope, alignment, bool, bounds, bounds-strict, builtin, enum, float cast overflow, float divide-by-zero, integer divide-by-zero, leak, nonnull attribute, null, object size, pointer compare, pointer overflow, pointer subtract, return, returns-nonnull attribute, shift, shift-base, shift-exponent, signed integer overflow, undefined, unreachable, VLA bound, vptr, stack protector.
  • Operating systems: Linux, macOS, Windows.
  • GNU & Linux distributions: Arch Linux, Debian, Fedora, MinGW, Ubuntu.
  • Static analysis: clang-tidy, cmake-lint, CodeQL, Facebook's infer, mdl (a.k.a. Markdown lint), pyflakes, ruff, shellcheck,yamllint.
  • Run-time integrity checks: valgrind (all tests).
  • Compiler static analyzers (exact list dependent on compiler): dangling gsl, deprecated literal operator, double fclose, double free, exposure through output file, file leak, free of non-heap, malloc leak, possible null dereference, mismatching allocation, null dereference, null argument, possible null argument, shift count negative, stale setjmp buffer, tainted array index, unsafe call within signal handler, use after free, use of pointer in stale stack frame, use of uninitialized value, write to const, write to string literal.
  • Standard library debug switches: _FORTIFY_SOURCE (level 2), _GLIBCXX_ASSERTIONS, _GLIBCXX_DEBUG, -GLIBCXX_DEBUG_PEDANTIC, _GLIBCXX_SANITIZE_VECTOR.
  • LLM ("AI") reviews: Copilot.

@jtv jtv changed the title Bump to 8.0.0 in preparation. 8.0.0 development Dec 23, 2024
@jtv jtv changed the title 8.0.0 development WIP: 8.0.0 development Dec 23, 2024
@jtv jtv force-pushed the start-8 branch 12 times, most recently from 72d1847 to 6335fbe Compare December 26, 2024 22:51
@jtv jtv added the 8.0 label Jan 12, 2025
@jtv jtv force-pushed the start-8 branch 2 times, most recently from 2a7eda0 to 5449a00 Compare January 19, 2025 14:13
@jtv jtv force-pushed the start-8 branch 2 times, most recently from 1596f43 to 2fd99ba Compare March 14, 2025 12:43
jtv added 2 commits December 21, 2025 23:06
Annotate zero-terminated string arguments.

Uses a gcc-specific attribute, `null_terminated_string_arg`, to mark
`char *`, `char const *`, `char[]`, and `char const[]` parameters that
are zero-terminated.

I don't think this does anything for optimisation, but it makes debug
builds take a closer look.

Does seem to slow the build down.  :-/  But better safe than sorry,
especially with `zview`.
jtv and others added 3 commits December 23, 2025 12:11
Okay, you misinterpreted it, so clearly you're not smart enough to tell
me exactly _this._  But you _sort of_ noticed that I forgot to specify a
return type in my config check, and tried to warn me.

Thing is, you did it _after_ merging.  Instead of focusing on irrelevant
trivialities in review and then complaining afterwards.
There were problems with this that I couldn't figure out, and from more
reading, I get the impression that there's nobody designing it who does
know exactly how it's supposed to work.  There's no magic solution to
portability problems there, just new and unknown intricacies to replace
the old, somewhat-known ones.
jtv and others added 2 commits December 24, 2025 10:06
Make `composite_into_buf()` return a `zview` instead of `size_t`.  The
terminating zero is a convenience for the case where the composite is
being passed as an SQL statement parameter.  But it also simplifies
calling code because it no longer needs to construct its own view.
Which is what it was most likely to do as the very next step.

Adds a test for the case where there's a `std::vector<std::string>` in
the composite (as in #1108).  It reproduces part of the problem.

(Oh and yes, Copilot, the formatter removed a space in the new config
check.  Do I really have to mention such trivia in the description?)
Give `params` an optional encoding argument.

There's no really nice way to do this.  How do you add an optional
parameter to a constructor that takes variadic arguments?  Concepts
don't help.  There's a trick, but it just complicates things too much.

So instead I just use `if constexpr`.  To keep things convenient (and
terse, which really matters for an offhand little class like this) I
let the caller pass either an `encoding_group`, or a `connection`, or a
transaction.  Luckily this set of types is disjunct from the types that
you may want to pass as an actual parameter.  It's messy, but I think
reasonably effective.

It does complicate the explanations a bit: pass this as your first
argument _if needed._  Which you find out from an exception.  Not ideal.
Pass `std::source_location` in more places.

The purpose of all this is to make any error messages that we generate refer to either (by default) the location where the application called libpqxx that led to the error; or one that the caller passed in explicitly.

Unfortunately it's hard to avoid situations where the error message will point to somewhere inside libpqxx instead.
@jtv jtv requested a review from Copilot December 30, 2025 18:59
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 135 out of 309 changed files in this pull request and generated no new comments.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants