Skip to content

Conversation

@Bodigrim
Copy link
Collaborator

@Bodigrim Bodigrim commented Jan 24, 2026

Template Α: This PR modifies behaviour or interface

Include the following checklist in your PR:

@Bodigrim Bodigrim force-pushed the utf8 branch 3 times, most recently from 52b3ba5 to 550ea8f Compare January 25, 2026 14:14
@Bodigrim Bodigrim marked this pull request as ready for review January 25, 2026 14:14
@ulysses4ever
Copy link
Collaborator

In principle, this looks like the right direction of travel. But I'm wary of the CPP and dependence on an internal text module.

@Bodigrim
Copy link
Collaborator Author

CPP can be avoided if everyone is happy to require text >= 2.0. This would require regenerating bootstrap build plan for GHC 9.2.8, but otherwise I hardly see any drawbacks. Shall we?

@geekosaur
Copy link
Collaborator

With all the stuff we already need for bootstrapping, and 9.2.8 being disrecommended due to bugs, I think we should go ahead and bump it if it avoids CPP.

This currently requires CPP for text<2, required for Cabal bootstrap plan on GHC 9.2.
Hopefully we will be able to get rid of it in a not so distant future:
`text-2.0` itself is buildable with GHC >= 8.0 and shipped as a boot package
starting from GHC 9.4.
@Bodigrim
Copy link
Collaborator Author

One does not simply update a bootstrap plan. Raised #11464 to discuss.

@geekosaur
Copy link
Collaborator

And for what it's worth, I don't see any other text version checks in the codebase.

@ulysses4ever
Copy link
Collaborator

Since it's a change to the library, not the tool, I'd be more careful. In particular, I'd only require text>=2.0 after the last GHC shipping text<2.0 falls out of our support window. Thoughts @Mikolaj @mpickering @ffaf1?

@Bodigrim
Copy link
Collaborator Author

In particular, I'd only require text>=2.0 after the last GHC shipping text<2.0 falls out of our support window.

Could you elaborate why? What's the relationship?

@geekosaur
Copy link
Collaborator

All I can think of is someone building such an older ghc from source, since that would require the newer text to be a bootlib.

@Bodigrim
Copy link
Collaborator Author

All I can think of is someone building such an older ghc from source, since that would require the newer text to be a bootlib.

If they are building an old GHC from source, surely they are building an old Cabal boot library from the GHC source tree, so all is fine.

(For the record, not a hill I'm going to die on; I'm genuinely curious)

@Mikolaj
Copy link
Member

Mikolaj commented Jan 26, 2026

This would require regenerating bootstrap build plan for GHC 9.2.8

We can drop the plan for GHC 9.2.8, even because we already have more plans than we normally offer.

Re requiring new text vs CPP, that's a tough one, in particular, because CPP (and non-strict deps) make it much harder to test all paths in CI. Text 2.0 is 4 years old, so maybe it's time to switch? That obviously needs a major version, asking major lib users ahead of time (especially @mpilgrem and the illustrious distro maintainers) and I may still be missing some old promise or custom around this that @ulysses4ever remembers.

@ulysses4ever
Copy link
Collaborator

I don't have any reasoning, just things I came to believe over time, overhearing various talks and reading various ancient threads on this bug tracker (no links, sorry). I'd ask Oleg hadn't he explicitly asked not to ping him on ongoing cabal matters.

For distributors, apart from Mike, I think it'd be prudent to ping @juhp for feedback. Maybe @maralorn could also opine. (Apologies in advance for a rather vague ping.) The question is whether you guys see any issue in Cabal-the-library requiring text>=2.

@Bodigrim
Copy link
Collaborator Author

Text 2.0 is 4 years old, so maybe it's time to switch? That obviously needs a major version, asking major lib users ahead of time (especially @mpilgrem and the illustrious distro maintainers) and I may still be missing some old promise or custom around this that @ulysses4ever remembers.

Stack already requires text>=2.1.3.

text-2.0 has been a boot package since GHC 9.4 and pretty much all major Linux distributives are already shipping it or newer.

@maralorn
Copy link
Contributor

Yeah, I this would be no problem on Nixos.

@Bodigrim
Copy link
Collaborator Author

@ulysses4ever @Mikolaj @geekosaur is this enough of assurances to go ahead with this or do you have further reservations?

@ulysses4ever
Copy link
Collaborator

With respect to text 2, yes, thank you!

Is there no way to avoid depending on an internal module?

bs1
((acc `shiftL` 6) .|. fromIntegral cn .&. 0x3F)
| otherwise -> (replacementChar, f bs1)
go decoderResult bs = case decoderResult of
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we're using the same replacementChar, couldn't we just use Data.Text.Encoding.decodeUtf8Lenient? (It's also in 2.0, if you're worried about that.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is all a bit awkward for historical reasons. In the modern Haskell ecosystem Cabal-syntax should not be parsing ByteString but with UTF-8 semantics, this is obviously stupid. And all FieldLineStream and FieldLine should wrap Text, not ByteString. But fixing it properly is a significant breaking change and I don't want to disrupt ongoing work on Cabal formatters.

@Bodigrim
Copy link
Collaborator Author

Bodigrim commented Feb 1, 2026

Is there no way to avoid depending on an internal module?

It's possible with decodeUtf8Chunk and decodeUtf8More, but very awkward, to a degree that I'd rather revert changes to Distribution.Parsec.FieldLineStream if it proves to be a sticking point. What do you think?

@mpilgrem
Copy link
Collaborator

mpilgrem commented Feb 1, 2026

Stack would be unaffected:

  • Stack builds (by default) with the version of Cabal that is the boot package of the specified GHC version
  • Stack's own reliance on Cabal is on a single version - most often the version in the most recent Stackage LTS Haskell snapshot (unless there is a compelling reason to jump forwards).

@Bodigrim
Copy link
Collaborator Author

Bodigrim commented Feb 3, 2026

It's possible with decodeUtf8Chunk and decodeUtf8More, but very awkward, to a degree that I'd rather revert changes to Distribution.Parsec.FieldLineStream if it proves to be a sticking point. What do you think?

@ulysses4ever @Mikolaj @geekosaur @ffaf1?

(While my preference is to keep changes, I will not be grossly offended if you ask me to revert. I'd rather move forward one way or another then drain myself on a protracted dicussion about a relatively minor change)

Copy link
Collaborator

@ulysses4ever ulysses4ever left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have an informed opinion and will trust your judgement w.r.t. internals.

Otherwise, great job!

@geekosaur
Copy link
Collaborator

Artem was the one concerned about it; I was just making a suggestion, but in thinking about it the whole Text thing probably means a different internal function to access the resulting Text directly to make it a ByteString.

So probably just go with it as is until we have a chance to redesign the APIs to use Text like they should have in the first place.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants