Skip to content

Conversation

@xwduan
Copy link
Contributor

@xwduan xwduan commented May 26, 2025

What changed?

Add Dynamic Config for safe keepalive rollout

Why?

We need ability to switch on/off the keepalive setting for keepalive config

How did you test it?

  • built
  • run locally and tested manually
  • covered by existing tests
  • added new unit test(s)
  • added new functional test(s)

Potential risks

No risk is expected.

@xwduan xwduan marked this pull request as ready for review May 26, 2025 23:10
@xwduan xwduan requested a review from a team as a code owner May 26, 2025 23:10
Copy link
Contributor

@dnr dnr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need this? Can't we roll out with the static config changes?

On the server side, the dynamic config isn't dynamic at all, it requires a process restart. On the client side, it would only apply to new connections.

"go.temporal.io/server/common/searchattribute"
"go.temporal.io/server/common/telemetry"
"go.temporal.io/server/common/testing/testhooks"
"go.temporal.io/server/service/history/configs"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in common, shared by all services. It shouldn't have a dependency on history service.

Comment on lines 238 to 241
EnableInterNodeServerKeepAlive = NewGlobalBoolSetting(
"system.enableInterNodeServerKeepAlive",
false,
`enableInterNodeServerKeepAlive is the config to enable keep alive for inter-node connections on server side.`,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
EnableInterNodeServerKeepAlive = NewGlobalBoolSetting(
"system.enableInterNodeServerKeepAlive",
false,
`enableInterNodeServerKeepAlive is the config to enable keep alive for inter-node connections on server side.`,
EnableInternodeServerKeepalive = NewGlobalBoolSetting(
"system.enableInternodeServerKeepalive",
false,
`enableInternodeServerKeepalive is the config to enable keep alive for inter-node connections on server side.`,

Comment on lines 243 to 247
EnableInterNodeClientKeepAlive = NewGlobalBoolSetting(
"system.enableInterNodeClientKeepAlive",
false,
`enableInterNodeClientKeepAlive is the config to enable keep alive for inter-node connections on client side.`,
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
EnableInterNodeClientKeepAlive = NewGlobalBoolSetting(
"system.enableInterNodeClientKeepAlive",
false,
`enableInterNodeClientKeepAlive is the config to enable keep alive for inter-node connections on client side.`,
)
EnableInternodeClientKeepalive = NewGlobalBoolSetting(
"system.enableInternodeClientKeepalive",
false,
`enableInternodeClientKeepalive is the config to enable keep alive for inter-node connections on client side.`,
)

resolver *membership.GRPCResolver,
tracingStatsHandler telemetry.ClientStatsHandler,
monitor membership.Monitor,
dynamicConfig *configs.Config,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why do we need history service configs here? this rpc factory is for all services

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed.

Comment on lines 261 to 270
if !d.dynamicConfig.EnableInterNodeClientKeepAlive() {
// default keepalive settings for clients
return grpc.WithKeepaliveParams(keepalive.ClientParameters{
Time: time.Duration(math.MaxInt64),
Timeout: 20 * time.Second,
PermitWithoutStream: false,
})
}
serviceConfig := d.config.Services[string(serviceName)]
return grpc.WithKeepaliveParams(serviceConfig.RPC.ClientConnectionConfig.GetKeepAliveClientParameters())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe write this reversed, like:

Suggested change
if !d.dynamicConfig.EnableInterNodeClientKeepAlive() {
// default keepalive settings for clients
return grpc.WithKeepaliveParams(keepalive.ClientParameters{
Time: time.Duration(math.MaxInt64),
Timeout: 20 * time.Second,
PermitWithoutStream: false,
})
}
serviceConfig := d.config.Services[string(serviceName)]
return grpc.WithKeepaliveParams(serviceConfig.RPC.ClientConnectionConfig.GetKeepAliveClientParameters())
// default keepalive settings for clients
params := keepalive.ClientParameters{
Time: time.Duration(math.MaxInt64),
Timeout: 20 * time.Second,
PermitWithoutStream: false,
}
if d.dynamicConfig.EnableInterNodeClientKeepAlive() {
serviceConfig := d.config.Services[string(serviceName)]
params = serviceConfig.RPC.ClientConnectionConfig.GetKeepAliveClientParameters()
}
return grpc.WithKeepaliveParams(params)

)
}

func ConfigProvider(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is here (and it shouldn't be), it needs a new name, "ConfigProvider" for something that provides history configs is too generic outside of the context of the history service.

)
factory.EnableInternodeServerKeepalive = enableServerKeepalive
factory.EnableInternodeClientKeepalive = enableClientKeepalive
logger.Info(fmt.Sprintf("RPC factory created. enableServerKeepalive: %v, enableClientKeepalive: %v", enableServerKeepalive, enableClientKeepalive))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we really need this log line? I'd rather drop it unless you think it is very important for debugging (and then maybe debug level?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is temporary and I will remove it when fully rolled out. I want to make sure the dc change is taking effect when rolling out. I think debug level make sense.

Comment on lines +368 to +369
factory.EnableInternodeServerKeepalive = enableServerKeepalive
factory.EnableInternodeClientKeepalive = enableClientKeepalive
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm guessing you set these externally instead of passing them to NewFactory to avoid having to change all the tests? that's kind of messy but it's okay since it's just temporary.

Copy link
Contributor Author

@xwduan xwduan May 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, and when removing, we don't have to deal with signature change again. I will add todo comments for that as well.

@xwduan xwduan enabled auto-merge (squash) May 30, 2025 23:58
@xwduan xwduan merged commit 0a1aa5f into main May 31, 2025
56 checks passed
@xwduan xwduan deleted the will/add_dc_for_keepalive branch May 31, 2025 00:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants