Upgrade ClickHouse to 21.10.2.15-stable or later #1385

Closed
0xr1 opened this issue Mar 16, 2022 · 20 comments

@0xr1

0xr1 commented Mar 16, 2022

Problem Statement

Seven RCE and DoS vulnerabilities were recently disclosed in the ClickHouse DBMS. More details here: https://jfrog.com/blog/7-rce-and-dos-vulnerabilities-found-in-clickhouse-dbms/

Fix

In order to fix the issues, update ClickHouse to the v21.10.2.15-stable version or later.
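
A minimal sketch of checking a version string against that minimum, in Python. The function names are hypothetical and not part of Sentry or ClickHouse. A plain numeric comparison like this cannot tell whether an older release line received backported fixes, so treat a `False` result as "needs a closer look" rather than "vulnerable".

```python
import re

# Minimum fixed release named in this issue.
MIN_FIXED = (21, 10, 2, 15)


def parse_clickhouse_version(version: str) -> tuple:
    """Extract the leading dotted numeric part of a version or image tag.

    Example input shapes: 'v21.10.2.15-stable', '20.3.9.70'.
    """
    match = re.match(r"v?(\d+(?:\.\d+)*)", version.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def is_at_least(version: str, minimum: tuple = MIN_FIXED) -> bool:
    """Return True if `version` is numerically >= `minimum`."""
    return parse_clickhouse_version(version) >= minimum


if __name__ == "__main__":
    for candidate in ("20.3.9.70", "21.10.2.15-stable"):
        print(candidate, is_at_least(candidate))
```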

If upgrading is not possible, add firewall rules in the server that will restrict the access to the web port (8123) and the TCP server’s port (9000) to specific clients only.
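
One rough way to check whether those two ports are reachable from a given client machine, sketched in Python. The helper names and the default timeout are assumptions for illustration. A successful TCP connect only shows that the port is open from wherever the script runs; it does not prove the firewall rules are complete.

```python
import socket


def port_is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def exposed_clickhouse_ports(host: str, ports=(8123, 9000)) -> list:
    """Return the subset of `ports` that accept TCP connections on `host`."""
    return [port for port in ports if port_is_reachable(host, port)]


if __name__ == "__main__":
    # Run this from a machine that should NOT have access; an empty list is the goal.
    print(exposed_clickhouse_ports("127.0.0.1"))
```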

Solution Brainstorm

The ClickHouse version needs to be bumped here: https://github.com/getsentry/self-hosted/blob/master/docker-compose.yml#L189

@chadwhitacre
Member

Seems like an easy PR @0xr1. 😉

Not sure how strict we need/want to be with ClickHouse version in SaaS vs. self-hosted, tho ... cf. #1097

restrict the access to the web port (8123) and the TCP server’s port (9000) to specific clients only

Are these locked down in any way in a stock self-hosted install? I think not but maybe, or maybe there is something we can do to lock them down if upgrading ClickHouse isn't as easy as it seems.

@chadwhitacre chadwhitacre changed the title from "Upgrading ClickHouse" to "Upgrade ClickHouse to 21.10.2.15-stable or later" Mar 16, 2022
@chadwhitacre
Member

chadwhitacre commented Apr 18, 2022

I asked around internally about this and did some digging:

  1. The version we're pinned to in self-hosted tracks the minimum version we're pinned to in Snuba: 20.3.9.70.
  2. In dev envs we switch to altinity/clickhouse-server:21.6.1.6734-testing-arm under ARM, because there are no stable builds supporting ARM.
  3. Altinity is a third-party ClickHouse hosting company (TIL). We are migrating to their Altinity Stable Build, which tracks upstream ClickHouse, sometimes adding critical fixes. The main benefit is that these are known to have been operated by Altinity for their cloud offering, and they include docs on how to upgrade safely.
  4. We’re trying to move everything from stock 20.3 to Altinity 21.8. We have an initial cluster stood up, but it's not yet serving errors/transactions in SaaS. There are unresolved issues (internal doc) with versions > 20.3.9.70, so it's a no-go for self-hosted at this point.
  5. There are multiple breaking changes between 20.3 and 21.8, so it is not a simple drop-in migration even if we were already on Altinity 20.3 (vs. stock).
  6. The path we are currently taking to upgrade errors/transactions in SaaS is not the same as the one self-hosted customers will take. We are currently doing dual writes to avoid landing in a situation where we don’t know how to recover. We will eventually get to discussing how to upgrade in place, which would be applicable for self-hosted.

tl;dr It seems we're a ways off from moving past 20.3.

@chadwhitacre
Member

Is there an Altinity Stable 20.3 that has security fixes and would work for self-hosted?

@emmatyping
Contributor

emmatyping commented Oct 14, 2022

It sounds like SaaS has moved to 21.8, so the hard part of the work is out of the way. For ARM I think we can use a stable image altinity/clickhouse-server:21.8.12.29.altinitydev.arm, and whatever version of clickhouse SaaS uses for x86. It sounds like the upgrade is not straightforward, so we may need to make a hard stop soon (which could also be useful for #1703).
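
As an illustration of picking an image per architecture, here is a small Python sketch. `pick_clickhouse_image` is a hypothetical helper, and the x86 image is passed in by the caller because this thread does not name it; only the ARM tag is taken from the comment above.

```python
import platform
from typing import Optional

# ARM tag quoted in the comment above.
ARM_IMAGE = "altinity/clickhouse-server:21.8.12.29.altinitydev.arm"


def pick_clickhouse_image(x86_image: str, machine: Optional[str] = None) -> str:
    """Return the ARM image on arm64/aarch64 hosts, otherwise `x86_image`.

    `machine` defaults to the current host's platform.machine() value.
    """
    detected = (machine or platform.machine()).lower()
    if detected in {"arm64", "aarch64"}:
        return ARM_IMAGE
    return x86_image


if __name__ == "__main__":
    # "example/clickhouse-x86:tag" is a placeholder, not a real image name.
    print(pick_clickhouse_image("example/clickhouse-x86:tag"))
```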

As far as I can tell, since 21.8 is an LTS, it got the security patches for the above mentioned CVEs

@chadwhitacre
Member

Update

  1. Prod is on 21.8; it’s difficult for Sentry devs on M1 Macs to test features on CH 20.3 for compatibility
  2. Some changes in the future might be blocked on CH
  3. Replays broke for CH 20.3; what’s to say that won’t happen again in the future?
  4. Feels like it’s a very low-priority item for SNS to internally validate whether data ingested on 20.3 is compatible with 21.8; don’t know if they will ever get to it (slack thread🔒)
  5. ClickHouse vulnerabilities in the current version (CVEs)

@hubertdeng123
Member

Notes from talking with SnS team:

  • We should be able to go directly onto 21.8 after shutting down the single node cluster of clickhouse, so this can be baked into the install script
  • No new configurations for clickhouse 21.8

Workflow to get this done:

  1. Put up a PR in self-hosted to upgrade the clickhouse images
  2. Test it on https://self-hosted.getsentry.net
  3. Put up a public notice in the develop docs to notify people of this upgrade, especially the folks using their own Clickhouse setup.
  4. Merge in PR and release
  5. Mention in release notes

@hubertdeng123
Member

Going to attempt to perform this upgrade in prod after backing up clickhouse containers. Using the steps outlined here

@hubertdeng123
Member

Not yet updating to >21.10.2.15, but making progress!

#2536

@williamdes
Contributor

New request at #2741

@hubertdeng123
Member

Commenting here that the newest clickhouse versions have ARM images, which would be great for us!

@williamdes
Contributor

Hello,

I was wondering what is blocking upgrades to v22 or v23 of ClickHouse?
https://hub.docker.com/r/altinity/clickhouse-server/tags?page=1&name=22
https://hub.docker.com/r/altinity/clickhouse-server/tags?page=1&name=23

As far as I understand, we are all using a two-year-old version. What are the impacts of upgrading?
Where can I find the code samples that interact with CH?

@getsantry getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 2 Mar 16, 2024
@aldy505
Collaborator

aldy505 commented Mar 19, 2024

As far as I understand, we are all using a two-year-old version. What are the impacts of upgrading?

ClickHouse has two release tracks: LTS and stable. At my company, I use the stable ones, since I'm too lazy to handle big breaking changes when an LTS version is released. But since 2021 or so, I haven't met any breaking change that broke my app with ClickHouse.

This is their changelog: https://github.com/ClickHouse/ClickHouse/blob/master/CHANGELOG.md

Most of the time, if there are any "backward incompatible changes", the existing query will still run fine, but it won't do anything.

Where can I find the code samples that interact with CH?

I found some here: https://github.com/getsentry/snuba/blob/master/snuba/replacers/errors_replacer.py. It is executed from here: https://github.com/getsentry/snuba/blob/338ae983506f787852c07d16e13a544bb64c5055/snuba/replacer.py#L348-L397

@williamdes
Contributor

@getsantry getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 2 Mar 19, 2024
@williamdes
Contributor

ClickHouse has two release tracks: LTS and stable. At my company, I use the stable ones, since I'm too lazy to handle big breaking changes when an LTS version is released. But since 2021 or so, I haven't met any breaking change that broke my app with ClickHouse.

Can you confirm the exact version that is working with Sentry on your setup?
Maybe I can also bump and confirm that it works fine too. It may end up in a bump for self-hosted.

@aldy505
Collaborator

aldy505 commented Mar 19, 2024

ClickHouse has two release tracks: LTS and stable. At my company, I use the stable ones, since I'm too lazy to handle big breaking changes when an LTS version is released. But since 2021 or so, I haven't met any breaking change that broke my app with ClickHouse.

Can you confirm the exact version that is working with Sentry on your setup? Maybe I can also bump and confirm that it works fine too. It may end up in a bump for self-hosted.

I'm using the default value on the repo right now. The only thing that's different in my deployment is that I switched from Kafka to Redpanda.

@williamdes
Contributor

I'm using the default value on the repo right now. The only thing that's different in my deployment is that I switched from Kafka to Redpanda.

Very cool!
Could you open a PR to share the implementation details?

@getsantry getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 2 Mar 19, 2024
@aldy505
Collaborator

aldy505 commented Mar 19, 2024

I'm using the default value on the repo right now. The only thing that's different in my deployment is that I switched from Kafka to Redpanda.

Very cool! Could you open a PR to share the implementation details?

It's on Sentry's Discord: https://discord.com/channels/621778831602221064/796028405833007104/1201076383426809948

@williamdes
Contributor

See #3001 for more upgrade news

@hubertdeng123
Member

Yep, we are working on this :)

@aldy505
Collaborator

aldy505 commented Jun 14, 2024

Done in #2536 and #3009

@aldy505 aldy505 closed this as completed Jun 14, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jun 30, 2024
6 participants