[Feature Request] Migration APIs from self-hosted Temporal to Temporal Cloud #49

@atihkin

Goals

The purpose of this FR is to outline our strategy for migrating customers' namespaces from their self-hosted Temporal instance to Temporal Cloud. We assume the following requirements:

  • No changes are needed in the customer’s application code, other than pointing the Temporal client at the cloud endpoint (see User flows).
  • Live migration is handled by Temporal end-to-end via internal workflows.
  • Signals and queries to workflows continue to be handled during migration without customers modifying code.
  • The setup and coordination required from customers is minimized, with the caveat that the self-hosted server must run a version of Temporal that supports migration to Cloud.
  • Customers have full control over the migration process. They decide when to hand over and can abort or roll back (reverse the migration) before handing over.
  • Migration of namespaces in the other direction, i.e. from Temporal Cloud to self-hosted Temporal, is out of scope for this FR.

Glossary of terms

  • Migration server: A single-tenant Temporal server that can access both the self-hosted server and the Temporal Cloud server via secure network connections.
  • Migration proxy: By default, the migration server requires admin access to the self-hosted server, and vice versa, during migration. To enhance security, we will introduce proxies between the self-hosted server and the migration server: a customer-side proxy and a cloud-side proxy.

User flows

  • Temporal coordinates with the customer prior to the migration and creates a migration server. This could take several hours.
  • The customer installs the migration proxy to allow connections between the self-hosted server and the migration server (steps may vary based on the customer's network setup).
  • StartMigrationRequest: The customer initiates the request to migrate namespace(s) from the self-hosted server to cloud.
  • The migration request includes the namespace(s) to migrate and the migration endpoint/cert.
  • Temporal creates a cloud namespace with a “non-active” namespace status. The customer sets the right permissions and access controls for the cloud namespace.
  • GetMigrationRequest: Customers can monitor the progress of workflow replication, the time remaining until completion, and whether the migration is complete (returned in GetMigrationResponse).
  • HandoverNamespaceRequest: Customers can hand over back and forth between their source namespace and the cloud namespace. This provides time to validate that everything is working as expected in cloud.
  • To validate, the customer switches worker traffic from the self-hosted endpoint to the cloud endpoint.
  • The customer updates the Temporal client in the application code to connect to the cloud namespace endpoint instead of the self-hosted namespace endpoint.
  • ConfirmMigrationRequest or AbortMigrationRequest: Customers either confirm and complete the migration to cloud or abort it. Common reasons to abort include migrating the wrong namespace or encountering replication errors.
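
One way to read the flow above is as a small state machine over the Migration.State values defined in the API section. The sketch below is only an illustration: the transition table is an assumption inferred from the user flows, not confirmed server behaviour, and `ALLOWED` and `can_transition` are hypothetical names.

```python
from enum import Enum


class State(Enum):
    # Values mirror the Migration.State enum in this FR.
    MIGRATION_STARTED = 1
    REPLICATION_IN_PROGRESS = 2
    WAITING_FOR_HANDOVER = 3
    HANDOVER_IN_PROGRESS = 4
    READY_FOR_CONFIRMATION = 5
    COMPLETE = 6
    FAILED = 7
    ABORT_IN_PROGRESS = 8
    ABORTED = 9


# Assumed transitions, inferred from the user flows (not specified by the FR).
ALLOWED = {
    State.MIGRATION_STARTED: {State.REPLICATION_IN_PROGRESS, State.FAILED,
                              State.ABORT_IN_PROGRESS},
    State.REPLICATION_IN_PROGRESS: {State.WAITING_FOR_HANDOVER, State.FAILED,
                                    State.ABORT_IN_PROGRESS},
    State.WAITING_FOR_HANDOVER: {State.HANDOVER_IN_PROGRESS,
                                 State.ABORT_IN_PROGRESS},
    # Handover can be repeated in either direction before confirmation.
    State.HANDOVER_IN_PROGRESS: {State.READY_FOR_CONFIRMATION,
                                 State.WAITING_FOR_HANDOVER, State.FAILED},
    State.READY_FOR_CONFIRMATION: {State.HANDOVER_IN_PROGRESS, State.COMPLETE,
                                   State.ABORT_IN_PROGRESS},
    State.ABORT_IN_PROGRESS: {State.ABORTED, State.FAILED},
    # Terminal states have no outgoing transitions.
    State.COMPLETE: set(),
    State.FAILED: set(),
    State.ABORTED: set(),
}


def can_transition(current: State, nxt: State) -> bool:
    """Return True if the assumed table allows moving from current to nxt."""
    return nxt in ALLOWED[current]
```

A client could use such a table to reject requests that make no sense in the current state, for example confirming a migration that has not yet been handed over.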

Cloud APIs for migration

message StartMigrationRequest {
    // The migration specification.
    temporal.api.cloud.namespace.v1.MigrationSpec spec = 1;
    // The id to use for this async operation.
    // Optional, if not provided a random id will be generated.
    string async_operation_id = 2;
}

message StartMigrationResponse {
    // The migration id.
    string migration_id = 1;
    // The cloud namespace.
    string namespace = 2; 
    // The async operation.
    temporal.api.cloud.operation.v1.AsyncOperation async_operation = 3;
}
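
To make the request shape concrete, here is a minimal Python sketch that mirrors StartMigrationRequest and the MigrationSpec and MigrationToCloudSpec messages defined just below as plain dataclasses. It is not a generated client: `build_start_request`, the dict stand-in for NamespaceSpec, and the sample values `orders-prod` and `endpoint-123` are all hypothetical.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class MigrationToCloudSpec:
    source_namespace: str
    # Stand-in for NamespaceSpec, whose fields are not shown in this FR.
    target_namespace_spec: dict = field(default_factory=dict)


@dataclass
class MigrationSpec:
    to_cloud_spec: MigrationToCloudSpec
    migration_endpoint_id: str


@dataclass
class StartMigrationRequest:
    spec: MigrationSpec
    async_operation_id: str = ""


def build_start_request(source_namespace: str, endpoint_id: str,
                        async_operation_id: str = "") -> StartMigrationRequest:
    """Assemble a request, generating an operation id when none is given.

    The FR says the server generates a random id if the field is empty;
    generating one client-side is only a convenience so the caller knows
    the id before the response arrives.
    """
    if not source_namespace or not endpoint_id:
        raise ValueError("source_namespace and endpoint_id are required")
    spec = MigrationSpec(MigrationToCloudSpec(source_namespace), endpoint_id)
    return StartMigrationRequest(spec, async_operation_id or str(uuid.uuid4()))


# Hypothetical namespace and endpoint ids, for illustration only.
req = build_start_request("orders-prod", "endpoint-123")
```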

message MigrationSpec {
    oneof variant {
        // Details for migration from self-hosted to cloud.
        MigrationToCloudSpec to_cloud_spec = 1;
    }

    // The id of the migration endpoint used for connecting 
    // the self-hosted Temporal cluster to Temporal cloud.
    string migration_endpoint_id = 3;
}

message MigrationToCloudSpec {
    // The source namespace name for the migration.
    string source_namespace = 1;
    // Details for the namespace that will be created as a result of the migration.
    NamespaceSpec target_namespace_spec = 2;
}

message GetMigrationRequest {
    // The migration id.
    string migration_id = 1;
}

message GetMigrationResponse {
    // The migration.
    temporal.api.cloud.namespace.v1.Migration migration = 1;
}

message GetMigrationsRequest {
    // The requested size of the page to retrieve.
    // Cannot exceed 1000.
    // Optional, defaults to 100.
    int32 page_size = 1;
    // The page token if this is continuing from another response.
    // Optional, defaults to empty.
    string page_token = 2;
}

message GetMigrationsResponse {
    // The list of migrations.
    repeated temporal.api.cloud.namespace.v1.Migration migrations = 1;
    // The next page's token.
    string next_page_token = 2;
}
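
The page_size and page_token fields imply the usual token-based pagination loop. The sketch below shows one way a caller might drain all pages. It assumes an empty next_page_token marks the last page, which the FR does not state explicitly. `iter_migrations`, `FetchPage`, and `fake_fetch` are hypothetical names, and the in-memory data stands in for a real server.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical transport: takes (page_size, page_token) and returns
# (migrations, next_page_token), mirroring GetMigrationsRequest/Response.
FetchPage = Callable[[int, str], Tuple[List[dict], str]]


def iter_migrations(fetch_page: FetchPage, page_size: int = 100) -> Iterator[dict]:
    """Yield every migration, following next_page_token until it is empty."""
    # Upper bound comes from the FR; the lower bound of 1 is an assumption.
    if not 1 <= page_size <= 1000:
        raise ValueError("page_size must be between 1 and 1000")
    token = ""
    while True:
        migrations, token = fetch_page(page_size, token)
        yield from migrations
        if not token:
            return


# In-memory stand-in for the server, for illustration only.
_DATA = [{"migration_id": f"m-{i}"} for i in range(5)]


def fake_fetch(page_size: int, page_token: str) -> Tuple[List[dict], str]:
    start = int(page_token or 0)
    end = start + page_size
    return _DATA[start:end], (str(end) if end < len(_DATA) else "")


ids = [m["migration_id"] for m in iter_migrations(fake_fetch, page_size=2)]
```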

message Migration {
    // The unique id of this migration.
    string migration_id = 1;
    // The MigrationSpec provided in the StartMigrationRequest.
    MigrationSpec spec = 2;
    // The state of the migration.
    State state = 3;
    // The source and destination replicas involved in the migration.
    repeated MigrationReplica replicas = 4;
    // The number of workflows replicated.
    int64 replicated_workflows = 5;
    // The number of workflows remaining to be replicated.
    int64 replicated_workflows_remaining = 6;
    // An error message if the migration failed.
    string failure_message = 7;

    enum State {
        STATE_UNSPECIFIED = 0;
        STATE_MIGRATION_STARTED = 1;
        STATE_REPLICATION_IN_PROGRESS = 2;
        STATE_WAITING_FOR_HANDOVER = 3;
        STATE_HANDOVER_IN_PROGRESS = 4;
        STATE_READY_FOR_CONFIRMATION = 5;
        STATE_COMPLETE = 6;
        STATE_FAILED = 7;
        STATE_ABORT_IN_PROGRESS = 8;
        STATE_ABORTED = 9;
    }
}
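
The two workflow counters on Migration are enough to derive a rough progress figure. The sketch below assumes the total is simply replicated_workflows plus replicated_workflows_remaining, which the FR does not guarantee (new workflows may start during replication). `replication_progress` and `describe_progress` are hypothetical helpers.

```python
def replication_progress(replicated: int, remaining: int) -> float:
    """Fraction of workflows replicated so far, in the range [0.0, 1.0]."""
    if replicated < 0 or remaining < 0:
        raise ValueError("counts must be non-negative")
    total = replicated + remaining
    # With nothing to replicate, treat the migration as fully caught up.
    return 1.0 if total == 0 else replicated / total


def describe_progress(replicated: int, remaining: int) -> str:
    """Human-readable summary of the two counters."""
    pct = replication_progress(replicated, remaining) * 100
    return f"{replicated} replicated, {remaining} remaining ({pct:.1f}%)"


summary = describe_progress(750, 250)
```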

message MigrationReplica {
    // The id of this replica. Indicates whether the replica is on the source
    // or destination side of the migration.
    string id = 1; // e.g. "source" / "target"
    // The state of this replica.
    State state = 2;

    enum State {
        STATE_UNSPECIFIED = 0;
        STATE_ACTIVE = 1;
        STATE_PASSIVE_OUT_OF_SYNC = 2;
        STATE_PASSIVE_IN_SYNC = 3;
        // Set if the migration was aborted or replication failed.
        STATE_ABANDONED = 4;
    }
}

message HandoverNamespaceRequest {
    // The migration id.
    string migration_id = 1;
    // The id of replica to make active.
    string to_replica_id = 2;
    // The id to use for this async operation.
    // Optional, if not provided a random id will be generated.
    string async_operation_id = 3;
}
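
Before sending a HandoverNamespaceRequest, a caller may want to check the replica states reported on the Migration. The sketch below assumes that handing over is only sensible when the target replica is STATE_PASSIVE_IN_SYNC. That rule is an inference from the MigrationReplica.State names, not something the FR specifies, and `Replica` and `check_handover` are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Replica:
    id: str     # e.g. "source" or "target"
    state: str  # a MigrationReplica.State name


def check_handover(replicas: List[Replica], to_replica_id: str) -> None:
    """Raise ValueError if handing over to to_replica_id looks unsafe."""
    by_id = {r.id: r for r in replicas}
    target = by_id.get(to_replica_id)
    if target is None:
        raise ValueError(f"unknown replica {to_replica_id!r}")
    if target.state == "STATE_ACTIVE":
        raise ValueError(f"replica {to_replica_id!r} is already active")
    # Assumed rule: only a passive, fully caught-up replica may become active.
    if target.state != "STATE_PASSIVE_IN_SYNC":
        raise ValueError(f"replica {to_replica_id!r} is {target.state}, not in sync")


replicas = [
    Replica("source", "STATE_ACTIVE"),
    Replica("target", "STATE_PASSIVE_IN_SYNC"),
]
check_handover(replicas, "target")  # passes silently
```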

message HandoverNamespaceResponse {
    // The async operation.
    temporal.api.cloud.operation.v1.AsyncOperation async_operation = 1;
}

message ConfirmMigrationRequest {
    // The migration id.
    string migration_id = 1;
    // The id to use for this async operation.
    // Optional, if not provided a random id will be generated.
    string async_operation_id = 2;
}

message ConfirmMigrationResponse {
    // The async operation.
    temporal.api.cloud.operation.v1.AsyncOperation async_operation = 1;
}

message AbortMigrationRequest {
    // The migration id.
    string migration_id = 1;
    // The id to use for this async operation.
    // Optional, if not provided a random id will be generated.
    string async_operation_id = 2;
}

message AbortMigrationResponse {
    // The async operation.
    temporal.api.cloud.operation.v1.AsyncOperation async_operation = 1;
}
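
Start, handover, confirm and abort all return an AsyncOperation, so callers need some way to wait for it to finish. The sketch below is a generic polling loop, not part of the proposed API. The terminal state names in `DONE` are assumptions about how AsyncOperation reports completion, and `wait_for_operation` and its `get_state` callback are hypothetical.

```python
import time
from typing import Callable

# Assumed terminal states of an async operation; treat these as illustrative.
DONE = {"fulfilled", "failed", "cancelled"}


def wait_for_operation(get_state: Callable[[], str],
                       interval_s: float = 1.0,
                       timeout_s: float = 60.0,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll get_state until it returns a terminal state or the timeout elapses."""
    waited = 0.0
    while True:
        state = get_state()
        if state in DONE:
            return state
        if waited >= timeout_s:
            raise TimeoutError(f"operation still {state!r} after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s


# Simulated operation that finishes on the third poll; sleep is stubbed out.
_states = iter(["pending", "in_progress", "fulfilled"])
result = wait_for_operation(lambda: next(_states), sleep=lambda _: None)
```

Injecting `sleep` keeps the loop testable without real delays. A production client would likely also add backoff and surface the operation's failure reason.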

Pre-requisites & limitations

Refer to this document for pre-requisites and limitations of using this migration tool.
