-
-
Notifications
You must be signed in to change notification settings - Fork 275
8.0.0 development #914
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
jtv
wants to merge
875
commits into
master
Choose a base branch
from
start-8
base: master
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
8.0.0 development #914
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
72d1847 to
6335fbe
Compare
2a7eda0 to
5449a00
Compare
1596f43 to
2fd99ba
Compare
Annotate zero-terminated string arguments. Uses a gcc-specific attribute, `null_terminated_string_arg`, to mark `char *`, `char const *`, `char[]`, and `char const[]` parameters that are zero-terminated. I don't think this does anything for optimisation, but it makes debug builds take a closer look. Does seem to slow the build down. :-/ But better safe than sorry, especially with `zview`.
Okay, you misinterpreted it, so clearly you're not smart enough to tell me exactly _this._ But you _sort of_ noticed that I forgot to specify a return type in my config check, and tried to warn me. Thing is, you did it _after_ merging. Instead of focusing on irrelevant trivialities in review and then complaining afterwards.
There were problems with this that I couldn't figure out, and from more reading, I get the impression that there's nobody designing it who does know exactly how it's supposed to work. There's no magic solution to portability problems there, just new and unknown intricacies to replace the old, somewhat-known ones.
Make `composite_into_buf()` return a `zview` instead of `size_t`. The terminating zero is a convenience for the case where the composite is being passed as an SQL statement parameter. But it also simplifies calling code because it no longer needs to construct its own view. Which is what it was most likely to do as the very next step. Adds a test for the case where there's a `std::vector<std::string>` in the composite (as in #1108). It reproduces part of the problem. (Oh and yes, Copilot, the formatter removed a space in the new config check. Do I really have to mention such trivia in the description?)
Give `params` an optional encoding argument. There's no really nice way to do this. How do you add an optional parameter to a constructor that takes variadic arguments? Concepts don't help. There's a trick, but it just complicates things too much. So instead I just use `if constexpr`. To keep things convenient (and terse, which really matters for an offhand little class like this) I let the caller pass either an `encoding_group`, or a `connection`, or a transaction. Luckily this set of types is disjunct from the types that you may want to pass as an actual parameter. It's messy, but I think reasonably effective. It does complicate the explanations a bit: pass this as your first argument _if needed._ Which you find out from an exception. Not ideal.
Pass `std::source_location` in more places. The purpose of all this is to make any error messages that we generate refer to either (by default) the location where the application called libpqxx that led to the error; or one that the caller passed in explicitly. Unfortunately it's hard to avoid situations where the error message will point to somewhere inside libpqxx instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
Copilot reviewed 135 out of 309 changed files in this pull request and generated no new comments.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Changes in 8.0
Many things have changed. Most code written against 7.x will still work, but there may now be nicer ways of doing things. Some of your code may need updating, and I'm sorry about that! But I hope you'll be happy overall.
Let's go through the main things.
C++20
You will need at least C++20 to compile libpqxx. Support does not need to be absolutely complete, but the basics must be there: concepts, various utility functions in the standard library, the language's new object lifetime rules, and so on.
Thanks to
std::span, there are now very few raw pointers in the API (or even internally). This makes libpqxx not only safer, but also easier to prove safe. It is important to be able to run static analysis tools that may detect bugs and vulnerabilities. But, in the C++ tradition, it shouldn't cost us anything in terms of speed.Retiring deprecated items
Some types and functions were already deprecated and are now gone:
binarystring. Useblobinstead.connection_basewas an alias. Use just regularconnection.encrypt_password(). Use the equivalent member functions inconnection.dynamic_params. Useparams.stream_toconstructors. Use static "factory" member functions.transaction_base::unesc_raw(). Useunesc_bin().transaction_base::quote_raw(). Usequote(), passing abytes_view.fieldconstructors.Result iterators
Lifetimes:
resultandrowiterators no longer reference-count theirresultobject. In libpqxx 7, a `result object's data would stay alive in memory for as long as you had an iterator referring to it. It seemed like a good idea once, many years ago, but it made iteration staggeringly inefficient.So, that's changed now. Just like any other iterator in C++,
resultandrowiterators no longer keep your container alive. It's your own problem to ensure that. Howeverresultobjects are still reference-counted smart pointers to the underlying data, so copying them is relatively cheap.Semantics: Iterators for the
resultandrowclasses have never fully complied with the standard requirements for iterators.There was a good reason for this: I wanted these iterators to be convenient and work a lot like C pointers. I wanted you to be able to dereference them as if a
resultobject were just an array of pointers to arrays.But it really started getting in the way. Much as I hate it, the standard requirements say that for any iterator
i,i[n]should mean*(i + n). In libpqxx, if you had aresultiteratori,i[n]meant "fieldnin the row to whichipoints." Really convenient, but not compliant with the standard.So, that is no longer the case. If you want "field
nin the row to whichipoints," say(*i)[n].By the way, these changes mean that result and row iterators are now proper, standard random-access iterators. Some algorithms from the standard library may magically become more efficient when they detect this.
Comparing results, rows, and fields
When you compare two
result,row, orfieldobjects with==or!=, it now only checks whether they refer to (the same row/field in) the same underlying data object. It does not look at the data inside.Row and field references
The
rowandfieldclasses were cumbersome,inefficient, and hopelessly intertwined with iterators.To avoid all that, use
row_refinstead ofrowandfield_refinstead offield. These assume that you keep the originalresultaround, and in a stable location in memory.The indexing operations now return
row_refandfield_refinstead ofrowandfieldrespectively. It probably won't affect your code, but you're running static analysis and instrumented builds to check for these things, right?Binary data
As the documentation predicted,
pqxx::bytesalias has changed to stand forstd::vector<std::byte>. It used to bestd::basic_string<std::byte>. Andpqxx::bytes_viewis now an alias forstd::span<std::byte>.This may require changes to your code. The APIs for
std::basic_stringandstd::vectordiffer, perhaps in more places than they should. Do not read the data using thec_str()member function; usedata()instead.Hate to do this to you. However there were real problems with using
std::basic_stringthe way we did. Thebasic_stringtemplate wasn't built for what we did, and there was no guarantee that it would work with any given compiler. Even where it did, we had to work around differences between compilers and compiler versions.But there's also good news! Thanks to C++20's Concepts, most functions that previously only accepted a
pqxx::bytesargument will now accept just about any block of binary data, such asstd::array<std::byte>or a C-style array.String conversion API
This is a whole chapter in itself. If you were specialising
pqxx::type_nameorpqxx::string_traits, things just got simpler, safer, and more efficient. But there's also a change in lifetime rules. This can be important for writingsafe and correct code, so read on!
Type names
Let's get the easiest part out of the way first:
type_nameis now deprecated. Instead of specialisingtype_name, you now specialises a function calledname_type(). It returns astd::string_view, so the underlying data can be a string literal instead of astd::string. Some static analysis tools would report false positives about static deallocation of thetype_namestrings.Defining a string conversion
Then there's
pqxx::string_traits. Several changes here. If you were defining your own conversions to/from SQL strings, the 8.x API is...into_string().std::source_locationfor better error reporting.In addition, the conversions also get access to some information about the client text encoding, so that they'll know what they're parsing. See "Text encodings" below.
Your existing conversions may still work without changes, but that's only thanks to some specific compatibility shims. These will go away in libpqxx 9 at the latest, so I urge you to update your code.
Lifetime rule changes
In libpqxx 7.x, when you converted a C++ value to an SQL string or vice versa, you could count on having a valid output for at least as long as you kept the object in which you stored it. The details depend on the type, e.g.
pqxx::string_traits<bool>::to_string()returns a string constant, either "true" or "false".We're tightening that up in 8.x. Now, the output remains valid for at least as long as you keep the object in which you stored it, and also keep the original in place. This streamlines some trivial conversions. For example, converting a
std::string_viewor a C-style string to SQL is now just a matter of returning a view on that same data. It's just the same data in both C++ and SQL.It also enables some conversions that weren't possible before. You can now convert an SQL string to a
std::string_viewor a C-style string pointer, because the conversion is allowed to refer to its input data.Most code won't need to care about this change. A calling function is usually done very quickly with its converted value, or immediately arranges for more durable storage. If you call
pqxx::to_string()for example, you get astd::stringcontaining the SQL string. All that changes in that case is the conversion process skipping an internal copy step, making it a bit faster.Using conversions
There is now a richer, more complete set of global conversion functions. This means you can convert C++ values to SQL strings without calling their
pqxx::string_traitsfunctions yourself.The options are, from easiest to most efficient:
pqxx::to_string()returns the SQL value as astd::string.pqxx::into_buf()renders the SQL value into a buffer you provide.pqxx::to_buf()returns the SQL value, in a location of its own choosing.The
to_buf()variant can make use of optimisations such as returning a view or pointer referring to the input you give it, or returning a string constant (e.g. "true" or "false").That concludes the changes to the string conversion API. On to the new features that won't require you to change your code, but may improve your life.
Arrays and composite types
You can now generate and parse SQL arrays, just like you can convert so many other types between their C++ representations and their SQL representations.
To parse the SQL representation, use
pqxx::from_string<pqxx::array<...>>.The template parameters in
<...>are:Now,
pqxx::arrayis mostly useful for loading SQL values into C++. it does not support changing or manipulating the array's contents once you've got it. When going in the other direction however, you can use the standard C++ container or range types:std::vector,std::array,std::range,std::view, and so on. Nest these inside each other to form multi-dimensional SQL arrays.Composite types are a bit harder. These are complex types of your own, which you can define in SQL and which you map to a corresponding C++ type. To support converting these between their C++ form and their SQL form, you'll have
to define a
pqxx::string_traitstype. Thecomposite.hxxheader contains some functions that you can call to handle most of the work.Source locations
Most of the functions in the libpqxx API now accept a
std::source_locationas a final argument. They will default to the location from where you call them.In many cases where libpqxx throws an exception, the error message will now include a reference to that source location to help you debug the error.
Where possible, this will be the most precise location where you called into libpqxx. In places where libpqxx functions call other libpqxx functions, they will (where possible) pass the original source location along, so you're not left with a source reference that has no meaning to you and is nothing to do with your application.
You can also override this and pass your own source location in the call.
In some situations it is not possible to pass a source location, but libpqxx will try to give you at least some useful location, such as the location where you created the object on which the error occurred.
Expect this whole mechanism to change again in the future as C++ moves towards full traceback support.
Text encodings
PostgreSQL lets you communicate with the database in a choice of client encodings. SQL that goes back and forth between your client and the server will be represented in this encoding.
For a lot of purposes your code will not need to be aware of the encoding. For example, all SQL statements themselves as well as all special characters such as quotes and commas and various field separators will all be in ASCII.
There are cases where libpqxx needs to know what kind of encoding it's getting. It doesn't care about the exact encoding, but it needs to know where each character begins and ends, so it can detect special characters such as closing quotes, withing being confused by the same byte values occurring inside a multi-byte character.
This information is now also available to string conversions, to help them parse SQL data that may contain text in the client encoding.
Build changes
The build has improved in many ways. As before, you get a choice of CMake or the
configurescript. The CMake build now also supports "unity builds."All shell scripts now have a
.shsuffix. It takes some getting used to, but it simplifies some things such as running lint checkers on all of them without having to name them all.The build no longer tries to figure out whether it needs to link the standard C++ filesystems library. In most cases this seems to be part of the regular standard library now. If you do need to add a link option to get
std::filesystemto work, you'll have to pass that option yourself.In the lint script (now called
tools/lint.sh), setPQXX_LINT=skipto skip lint checks. This can speed upmake checkruns, and enable them to run without network access.Internal changes
The "monobyte" encoding group is now called "ASCII-safe," which covers it better. I was able to fold a few encoding groups into this group, since the only encoding support we need is "find these ASCII characters" and for the ASCII-safe encodings this works just like in ASCII itself.
Thanks to
constexprthere are now a few compile-time unit tests inside the headers and source files. To avoid slowing down the build with these, they get compiled only whenNDEBUGis not defined. (This is the standard macro that also controls whetherassert()statements get compiled.)The project's Continuous Integration setup now tests against more tools and environments in a more systematic way. Every change gets tested against...
gcc,clang, and Microsoft Visual Studio.clang-tidy,cmake-lint, CodeQL, Facebook'sinfer,mdl(a.k.a. Markdown lint),pyflakes,ruff,shellcheck,yamllint.valgrind(all tests).fclose, doublefree, exposure through output file, file leak,freeof non-heap,mallocleak, possible null dereference, mismatching allocation, null dereference, null argument, possible null argument, shift count negative, stalesetjmpbuffer, tainted array index, unsafe call within signal handler, use afterfree, use of pointer in stale stack frame, use of uninitialized value, write toconst, write to string literal._FORTIFY_SOURCE(level 2),_GLIBCXX_ASSERTIONS,_GLIBCXX_DEBUG,-GLIBCXX_DEBUG_PEDANTIC,_GLIBCXX_SANITIZE_VECTOR.