Augment VTAdmin test to also test VTOrc setup #283
Signed-off-by: Manan Gupta <manan@planetscale.com>
frouioui left a comment
Should we have two separate tests? One for vtadmin, another for vtorc?
I don't think two are required. I actually think one is preferable, since we should have a test that launches all the Vitess components as users would, instead of launching only some of them in each separate test. @frouioui, is there a reason you would prefer two separate tests?
@GuptaManan100 Okay, sounds good to me! I was thinking about this in case we wanted more dedicated tests for vtorc and vtadmin. But setting up one big cluster with all the components is good!
frouioui left a comment
Left two comments, otherwise, it looks good to me!
* test: augment vtadmin test to also test vtorc setup
  Signed-off-by: Manan Gupta <manan@planetscale.com>
* ci: change the test name to reflect VTOrc addition too
  Signed-off-by: Manan Gupta <manan@planetscale.com>
Description
This PR augments the VTAdmin end-to-end test to also test VTOrc. The initial cluster-setup file is changed to also launch VTOrc. After verifying that VTAdmin works as expected, we go on to verify that VTOrc runs as expected. We do this by stopping replication on all the valid replicas and then trying to write to the primary. The write succeeds only if VTOrc detects the failure and repairs replication.
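The verification flow above can be modeled as a toy simulation to make the pass/fail logic explicit. This is a minimal Python sketch, not the actual test (which drives a real Kubernetes cluster): `Replica`, `FakeCluster`, `vtorc_tick`, and `verify_vtorc_repairs_replication` are hypothetical names, and the rule that a write succeeds only once some replica is replicating again is a modeling assumption standing in for the behavior the PR describes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    name: str
    replicating: bool = True


@dataclass
class FakeCluster:
    """Toy model of one shard's replicas (hypothetical, not Vitess code)."""
    replicas: List[Replica]
    vtorc_running: bool = True

    def stop_replication_on_all_replicas(self) -> None:
        # Mirrors the test step: break replication on every valid replica.
        for r in self.replicas:
            r.replicating = False

    def vtorc_tick(self) -> None:
        # Assumption: one VTOrc recovery pass restarts replication wherever it stopped.
        if not self.vtorc_running:
            return
        for r in self.replicas:
            r.replicating = True

    def try_write(self) -> bool:
        # Modeling assumption: the primary acknowledges a write only once
        # at least one replica is replicating again.
        return any(r.replicating for r in self.replicas)


def verify_vtorc_repairs_replication(cluster: FakeCluster, max_attempts: int = 10) -> bool:
    """Break replication, then retry the write, letting VTOrc run between attempts."""
    cluster.stop_replication_on_all_replicas()
    for _ in range(max_attempts):
        if cluster.try_write():
            return True
        cluster.vtorc_tick()
    return False
```

In this model, setting `vtorc_running=False` makes the check give up after `max_attempts`, whereas in the real test the write simply stalls.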
An additional flag, `disable-active-reparent`, has been added to the vttablet in the test setup, matching how we recommend users run it. Furthermore, VTop only repairs replication if the source set on a tablet doesn't match the primary information. Stopping replication leaves each replica's source unchanged, so VTop does not intervene and any repair has to come from VTOrc. Together, these pieces ensure that the test actually exercises the VTOrc setup. I have also manually verified that if VTOrc is not running, the writes stall indefinitely.
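The argument about VTop reduces to a single predicate. A minimal sketch, assuming VTop's check compares only the configured source host and port against the current primary; `ReplicationStatus` and `vtop_would_repair` are hypothetical names for illustration, not vitess-operator code.

```python
from dataclasses import dataclass


@dataclass
class ReplicationStatus:
    """Hypothetical snapshot of a replica's replication configuration."""
    source_host: str
    source_port: int
    replication_running: bool


def vtop_would_repair(status: ReplicationStatus, primary_host: str, primary_port: int) -> bool:
    # Per the PR description: VTop repairs replication only when the configured
    # source differs from the primary. Whether replication is running is
    # ignored, so a replica with stopped replication but the correct source
    # is left for VTOrc to fix.
    return (status.source_host, status.source_port) != (primary_host, primary_port)
```

A replica whose replication was stopped by the test still points at the primary, so the predicate is false and only VTOrc can bring it back.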
There is scope for improving the test so that it times out on its own. Currently, that is left to the CI timeout, which is configured to 1 hour in the Buildkite pipeline configuration.
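One way the test could enforce its own deadline is a poll-with-timeout loop. This is a minimal Python sketch under the assumption that the write check can be wrapped as a callable returning `True` on success; `wait_until` is a hypothetical helper, not part of the test or of Vitess.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout_s: float, interval_s: float = 1.0) -> bool:
    """Poll condition() until it returns True or timeout_s elapses.

    Returns True on success and False on timeout.
    """
    deadline = time.monotonic() + timeout_s  # monotonic clock is immune to wall-clock jumps
    while True:
        if condition():
            return True
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        # Never sleep past the deadline.
        time.sleep(min(interval_s, remaining))
```

Because a stalled write never returns, this only helps if each attempt is itself bounded (for example, by a client-side query timeout); the helper bounds the retry loop, not a single blocked call.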