Migrations may fail if started from replica #56

Closed
@opomuc

Description

So, the exact reproduction scenario for this bug is the following:

  • a time-consuming DDL-changing migration is added in production (e.g. changing the space format on an empty space)
  • an admin triggers migrator.up on a replica node (let's call it the coordinator)
  • the coordinator triggers migrations on all replicaset leaders (including the leader of its own replicaset)
  • all leaders apply the migrations and respond 'ok' to the coordinator
  • the space format change is sent to the coordinator from its leader via the replication channel, and it takes considerable time to apply, so the coordinator's actual DDL remains unchanged for some time
  • upon receiving 'ok' from all leaders, the coordinator triggers config.patch_clusterwide with the supposedly new DDL schema, which it collects from its local spaces
  • BUT since the schema has not yet changed on the coordinator itself (the coordinator is an async, lagging replica), it tries to apply the "old" DDL, and the whole operation fails with something like CheckSchemaError: Incompatible schema: spaces["somespace"] //format/3 (expected table, got nil)
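The race in the steps above can be reproduced in miniature with the hypothetical Python model below. It is a sketch, not the real code path: it does not use Tarantool, the migrator, or cartridge, and `Node`, `AsyncReplication`, `check_schema` and `migrator_up` are made-up names. Spaces are reduced to lists of field names, and `check_schema` is only an assumed stand-in for the validation that runs during config.patch_clusterwide. The point it illustrates is that the DDL the coordinator collects from its local spaces is rejected whenever replication has not caught up, and accepted once it has.

```python
# Hypothetical, simplified model of the race described in this issue.
# None of these names are real tarantool/migrations or cartridge APIs.

OLD_FORMAT = ["id", "name"]
NEW_FORMAT = ["id", "name", "value"]  # the migration adds a third field


class CheckSchemaError(Exception):
    pass


class Node:
    def __init__(self):
        # Local view of each space's format, reduced to field names.
        self.spaces = {"somespace": list(OLD_FORMAT)}


class AsyncReplication:
    """Leader -> replica channel: changes are queued and applied later."""

    def __init__(self, replica):
        self.replica = replica
        self.queue = []

    def send(self, space, fmt):
        self.queue.append((space, list(fmt)))

    def catch_up(self):
        # Apply everything the lagging replica has not applied yet.
        for space, fmt in self.queue:
            self.replica.spaces[space] = fmt
        self.queue.clear()


def check_schema(actual, proposed):
    # Assumed stand-in for the schema check run on a leader during the
    # clusterwide patch: the proposed DDL must describe every existing field.
    for space, fields in actual.items():
        proposed_fields = proposed.get(space, [])
        if len(proposed_fields) < len(fields):
            missing = len(proposed_fields) + 1  # 1-based index of first gap
            raise CheckSchemaError(
                f'Incompatible schema: spaces["{space}"] //format/{missing} '
                "(expected table, got nil)"
            )


def migrator_up(replication_caught_up):
    leader, coordinator = Node(), Node()
    channel = AsyncReplication(coordinator)

    # The leader applies the migration, answers 'ok', and replicates it.
    leader.spaces["somespace"] = list(NEW_FORMAT)
    channel.send("somespace", NEW_FORMAT)

    # The change may or may not have reached the lagging coordinator yet.
    if replication_caught_up:
        channel.catch_up()

    # The coordinator collects the "new" DDL from its own local spaces...
    proposed = {name: list(fmt) for name, fmt in coordinator.spaces.items()}

    # ...and the clusterwide patch is validated against the leader's spaces.
    try:
        check_schema(leader.spaces, proposed)
    except CheckSchemaError as err:
        return str(err)
    return "ok"


print(migrator_up(replication_caught_up=False))
print(migrator_up(replication_caught_up=True))
```

In this model the first call returns an error shaped like the one in the report, and the second returns ok: the only difference is whether the coordinator's local spaces have caught up with its leader before the DDL is collected.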

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working), customer

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests