Skip to content

Conversation

@ekexium
Copy link
Contributor

@ekexium ekexium commented Mar 24, 2025

Enable the check on the master branch

Signed-off-by: ekexium <eke@fastmail.com>
@ti-chi-bot ti-chi-bot bot added dco-signoff: yes Indicates the PR's author has signed the dco. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Mar 24, 2025
@ti-chi-bot ti-chi-bot bot added needs-1-more-lgtm Indicates a PR needs 1 more LGTM. approved labels Mar 25, 2025
Copy link
Contributor

@zyguan zyguan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we introduce an "escape route" in case users encounter related issues after upgrading to the new version?

@cfzjywxk
Copy link
Contributor

Should we introduce an "escape route" in case users encounter related issues after upgrading to the new version?

Currently, it is enabled experimentally on the master branch, and it may be reverted if any issues arise.
Based on previous lessons, it is crucial to avoid false positive in production environments, as even with workarounds, they have already caused service unavailability.

The challendge is it's difficult to verify no false positive would happen.

@zyguan
Copy link
Contributor

zyguan commented Mar 26, 2025

Should we introduce an "escape route" in case users encounter related issues after upgrading to the new version?

Currently, it is enabled experimentally on the master branch, and it may be reverted if any issues arise. Based on previous lessons, it is crucial to avoid false positive in production environments, as even with workarounds, they have already caused service unavailability.

The challendge is it's difficult to verify no false positive would happen.

I know it's difficult to verify no false positive would happen. What if there is an unfound false positive which causes service unavailability in the customer‘s production environment?

  • if there is no "escape route", we have to urgently release a version with the bug fix.
  • if there is a switch to turn off the validation, we only need to inform the customer to disable this feature.

IMO, an even better approach is to keep this feature disabled by default for upgraded clusters.

@cfzjywxk
Copy link
Contributor

Makes sense, it is still needed to introduce the relevant configuration option or system variable, and as @zyguan mentioned, it should only take effect for newly created clusters. Perhaps this is the safest way to enable the check.

@ekexium
Copy link
Contributor Author

ekexium commented Mar 31, 2025

If we add a switch to this, what should be the behvaior when the switch is off? Skip all checks or still check stale reads?

@you06
Copy link
Contributor

you06 commented Mar 31, 2025

If we add a switch to this, what should be the behvaior when the switch is off? Skip all checks or still check stale reads?

I prefer skipping checks than logging the violations. Because the logs may be ignored, and too many logs can be another serious issue.

@cfzjywxk
Copy link
Contributor

If we add a switch to this, what should be the behvaior when the switch is off? Skip all checks or still check stale reads?

I prefer skipping checks than logging the violations. Because the logs may be ignored, and too many logs can be another serious issue.

The stale read check is already enabled in the LTS version with error logging, the behaviour of the new switch needs some considerations. Disabling for all is a more common approach.

Signed-off-by: ekexium <eke@fastmail.com>
@ti-chi-bot ti-chi-bot bot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Apr 14, 2025
@ti-chi-bot ti-chi-bot bot added lgtm and removed needs-1-more-lgtm Indicates a PR needs 1 more LGTM. labels Apr 17, 2025
@ti-chi-bot ti-chi-bot bot removed the lgtm label Apr 25, 2025
@ekexium ekexium force-pushed the enable-ts-validation-for-normal-read branch from 3ef5e15 to b5a008d Compare April 25, 2025 04:04
Signed-off-by: ekexium <eke@fastmail.com>
@ekexium ekexium force-pushed the enable-ts-validation-for-normal-read branch from b5a008d to 726d0bc Compare April 25, 2025 04:46
@ti-chi-bot ti-chi-bot bot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. and removed size/S Denotes a PR that changes 10-29 lines, ignoring generated files. labels Apr 25, 2025
@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Apr 25, 2025
@cfzjywxk cfzjywxk requested review from you06 and zyguan April 25, 2025 06:30
@ti-chi-bot ti-chi-bot bot added the lgtm label Apr 25, 2025
@ti-chi-bot
Copy link

ti-chi-bot bot commented Apr 25, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: cfzjywxk, zyguan

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot removed the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Apr 25, 2025
@ti-chi-bot
Copy link

ti-chi-bot bot commented Apr 25, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-03-25 09:04:15.127900615 +0000 UTC m=+951148.812136713: ☑️ agreed by cfzjywxk.
  • 2025-04-17 14:35:14.951396572 +0000 UTC m=+2958208.635632668: ☑️ agreed by zyguan.
  • 2025-04-25 03:59:15.945923527 +0000 UTC m=+587299.757713910: ✖️🔁 reset by ekexium.
  • 2025-04-25 06:29:39.716934822 +0000 UTC m=+596323.528725198: ☑️ agreed by cfzjywxk.
  • 2025-04-25 08:52:33.481919747 +0000 UTC m=+604897.293710129: ☑️ agreed by zyguan.

@ti-chi-bot ti-chi-bot bot merged commit 8645f93 into tikv:master Apr 25, 2025
11 checks passed
serprex added a commit to PeerDB-io/tikv-client-go that referenced this pull request Jun 30, 2025
* *: bump pd client (tikv#1575)

 

Signed-off-by: Ryan Leung <rleungx@gmail.com>

* OWNERS: Auto Sync OWNERS files from community membership (tikv#1576)

 

Signed-off-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>

* sync etcd endpoints immediately after initializing the client (tikv#1573)

 

Signed-off-by: Vlad Dmitriev <vldmit@gmail.com>

* p-dml: resolve locks concurrently (tikv#1584)

close tikv#1577

Signed-off-by: you06 <you1474600@gmail.com>

* memdb: prevent iterator invalidation (tikv#1563)

ref pingcap/tidb#59153

Signed-off-by: ekexium <eke@fastmail.com>

* locate: refactor RegionRequestSender.SendReqCtx (tikv#1565)

 

Signed-off-by: zyguan <zhongyangguan@gmail.com>

* locate: fix the default settings of circuit breaker (tikv#1593)

ref tikv/pd#8678

Signed-off-by: Ryan Leung <rleungx@gmail.com>

* util: define and implement core interfaces for async api (tikv#1591)

ref tikv#1586

Signed-off-by: zyguan <zhongyangguan@gmail.com>

* Add a retry when getting ts from PD for validating read ts (tikv#1600)

 

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>

* pdclient: Add caller info to pd client (tikv#1516)

ref tikv/pd#8593

Signed-off-by: okJiang <819421878@qq.com>

* locate: fix TestTiKVClientReadTimeout (tikv#1601)

 

Signed-off-by: zyguan <zhongyangguan@gmail.com>

* *: update pd client (tikv#1605)

 

Signed-off-by: Ryan Leung <rleungx@gmail.com>

* Validate ts only for stale read (tikv#1607)

ref pingcap/tidb#59402

Signed-off-by: ekexium <eke@fastmail.com>

* execdetails: export scheduler write details (tikv#1606)

 

Signed-off-by: Neil Shen <overvenus@gmail.com>

* Update pd client (tikv#1615)

Signed-off-by: disksing <i@disksing.com>

* ci: allow use label to skip integration tests (tikv#1616)

 

Signed-off-by: disksing <i@disksing.com>

* remove useless metric tidb_tikvclient_cop_duration_seconds_bucket (tikv#1602)

 

Signed-off-by: XuHuaiyu <391585975@qq.com>

* client: implement SendRequestAsync for RPCClient (tikv#1604)

ref tikv#1586

Signed-off-by: zyguan <zhongyangguan@gmail.com>

* execdetails: export grpc process and wait time to time details (tikv#1614)

 

Signed-off-by: Neil Shen <overvenus@gmail.com>

Co-authored-by: Bisheng Huang <hbisheng@gmail.com>

* Refine pessimistic lock related metrics and stats (tikv#1620)

 

Signed-off-by: yibin87 <huyibin@pingcap.com>

* metrics: adjust bucket count to reduce metrics data (tikv#1609)

 

Signed-off-by: Lynn <zimu_xia@126.com>

* update tidb for integration tests (tikv#1621)

 

Signed-off-by: disksing <i@disksing.com>

* support redact key in logs (tikv#1612)

ref pingcap/tidb#59279

Signed-off-by: tangenta <tangenta@126.com>

Co-authored-by: you06 <you1474600@gmail.com>

* update integration_test/go.mod (tikv#1624)

 

Signed-off-by: tangenta <tangenta@126.com>

* memdb: introduce snapshot interface (tikv#1623)

 

Signed-off-by: you06 <you1474600@gmail.com>

Co-authored-by: ekexium <eke@fastmail.com>

* pd: enable OutputMustContainAllKeyRange (tikv#1632)

 

Signed-off-by: lhy1024 <admin@liudos.us>

* Fix backoff lose info when forked (tikv#1627)

ref pingcap/tidb#60271

Signed-off-by: yibin87 <huyibin@pingcap.com>

* tikv: disable health-feedback in next-gen (tikv#1635)

 

Signed-off-by: zyguan <zhongyangguan@gmail.com>

* enable ts validation for normal read (tikv#1619)

 

Signed-off-by: ekexium <eke@fastmail.com>

* Add txn write conflict metrics (tikv#1551)

close tikv#1550

Signed-off-by: sujuntao <juntao.su@foxmail.com>

Co-authored-by: sujuntao <juntao.su@foxmail.com>

* apicodec: fix a typo when encoding request for CmdMvccGetByKey (tikv#1638)

 

Signed-off-by: tiancaiamao <tiancaiamao@gmail.com>

Co-authored-by: cfzjywxk <lsswxrxr@163.com>

* *: update kvproto version (tikv#1636)

ref tikv#1631

Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>

* txn: provide more information in commit RPC / log mvcc debug info when commit failed for `TxnLockNotFound` (tikv#1640)

ref tikv#1631

Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>

* txn: handle undetermined error in client go (tikv#1642)

close tikv#1641

Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>

* txn: fix the implemention of undetermined error (tikv#1644)

close tikv#1641

Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>

* locate: implement SendReqAsync for RegionRequestSender (tikv#1618)

ref tikv#1586

Signed-off-by: zyguan <zhongyangguan@gmail.com>

* update pd client for resource group and keyspace (tikv#1645)

 

Signed-off-by: lhy1024 <admin@liudos.us>

* tests: bump tidb to fix integration tests (tikv#1650)

 

Signed-off-by: zyguan <zhongyangguan@gmail.com>

* Fix some metrics that miss const labels (tikv#1652)

 

Signed-off-by: yibin87 <huyibin@pingcap.com>

* Replace etcd safe point with txn safe point for read safety check (tikv#1634)

 

Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>

* Fix stale read metrics (tikv#1649)

close tikv#1648

Signed-off-by: you06 <you1474600@gmail.com>

* *: support async batch get (tikv#1646)

ref tikv#1586

Signed-off-by: zyguan <zhongyangguan@gmail.com>

* Update kvproto dependancy and set keyspace name for rpc context (tikv#1667)

close tikv#1668

Signed-off-by: yibin87 <huyibin@pingcap.com>

* ci: add next-gen integration tests (tikv#1661)

 

Signed-off-by: ekexium <eke@fastmail.com>

* snapshot: set `ReplicaRead` to false when `ReplicaReadType` fallbacks to `ReplicaReadLeader` (tikv#1663)

ref pingcap/tidb#61745

Signed-off-by: you06 <you1474600@gmail.com>

* resource_control: support collecting cross AZ traffic in ru consumption (tikv#1669)

 

Signed-off-by: glorv <glorvs@163.com>

* txnkv: prevent some actions from being interrupted by kill (tikv#1665)

fix pingcap/tidb#61454

Signed-off-by: zyguan <zhongyangguan@gmail.com>

* region_cache: add ForceRefreshAllStores function (tikv#1686)

 

Signed-off-by: guo-shaoge <shaoge1994@163.com>

* upgrade gRPC to allow consumption by peerdb

* Bump gprc version to 1.73.0

Remove references to `grpc.NewSharedBufferPool()`, it was removed from
the grpc package and causes build to fail - it's always on.
(https://pkg.go.dev/google.golang.org/grpc/experimental#WithBufferPool)

Signed-off-by: Tiago Scolari <git@tscolari.me>

* go mod tidy

---------

Signed-off-by: Ryan Leung <rleungx@gmail.com>
Signed-off-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Signed-off-by: Vlad Dmitriev <vldmit@gmail.com>
Signed-off-by: you06 <you1474600@gmail.com>
Signed-off-by: ekexium <eke@fastmail.com>
Signed-off-by: zyguan <zhongyangguan@gmail.com>
Signed-off-by: MyonKeminta <MyonKeminta@users.noreply.github.com>
Signed-off-by: okJiang <819421878@qq.com>
Signed-off-by: Neil Shen <overvenus@gmail.com>
Signed-off-by: disksing <i@disksing.com>
Signed-off-by: XuHuaiyu <391585975@qq.com>
Signed-off-by: yibin87 <huyibin@pingcap.com>
Signed-off-by: Lynn <zimu_xia@126.com>
Signed-off-by: tangenta <tangenta@126.com>
Signed-off-by: lhy1024 <admin@liudos.us>
Signed-off-by: Chao Wang <cclcwangchao@hotmail.com>
Signed-off-by: glorv <glorvs@163.com>
Signed-off-by: guo-shaoge <shaoge1994@163.com>
Signed-off-by: Tiago Scolari <git@tscolari.me>
Co-authored-by: Ryan Leung <rleungx@gmail.com>
Co-authored-by: Ti Chi Robot <ti-community-prow-bot@tidb.io>
Co-authored-by: Vlad Dmitriev <vldmit@gmail.com>
Co-authored-by: you06 <you1474600@gmail.com>
Co-authored-by: ekexium <eke@fastmail.com>
Co-authored-by: zyguan <zhongyangguan@gmail.com>
Co-authored-by: MyonKeminta <9948422+MyonKeminta@users.noreply.github.com>
Co-authored-by: okJiang <819421878@qq.com>
Co-authored-by: Neil Shen <overvenus@gmail.com>
Co-authored-by: disksing <i@disksing.com>
Co-authored-by: HuaiyuXu <xuhuaiyu@pingcap.com>
Co-authored-by: Bisheng Huang <hbisheng@gmail.com>
Co-authored-by: yibin <huyibin@pingcap.com>
Co-authored-by: Lynn <zimu_xia@126.com>
Co-authored-by: tangenta <tangenta@126.com>
Co-authored-by: lhy1024 <lhylhy1024@gmail.com>
Co-authored-by: JT <43174723+ImSjt@users.noreply.github.com>
Co-authored-by: sujuntao <juntao.su@foxmail.com>
Co-authored-by: tiancaiamao <tiancaiamao@gmail.com>
Co-authored-by: cfzjywxk <lsswxrxr@163.com>
Co-authored-by: 王超 <cclcwangchao@hotmail.com>
Co-authored-by: glorv <glorvs@163.com>
Co-authored-by: guo-shaoge <shaoge1994@163.com>
Co-authored-by: Kevin Biju <kevin@peerdb.io>
Co-authored-by: Tiago Scolari <tiago@diagrid.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved dco-signoff: yes Indicates the PR's author has signed the dco. lgtm size/M Denotes a PR that changes 30-99 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants