Notifications for event type "monitor" are generating with wrong instance ID #1595

lelenanam · 2017-11-29T17:05:17Z

eventmanager receives requests with event type monitor and a wrong instanceID.
When eventmanager tries to get the instance name from the users service:
https://github.com/weaveworks/notification/blob/4e1d40e0cba471d0393e4b657e31b6291aaa6d3f/eventmanager/manager.go#L349
it receives the error "Not found".

Maybe the instance was deleted.
Logs from prod eventmanager:

WARN: 2017/11/23 10:40:22.517185 instance name for ID 3436 not found
WARN: 2017/11/23 10:40:23.312762 instance name for ID 5681 not found
WARN: 2017/11/23 10:40:27.137901 instance name for ID 3097 not found
WARN: 2017/11/23 10:40:27.448799 instance name for ID 3183 not found
WARN: 2017/11/23 10:40:27.714566 instance name for ID 7194 not found
WARN: 2017/11/23 10:40:28.243433 instance name for ID 849 not found
WARN: 2017/11/23 10:40:28.464883 instance name for ID 4788 not found
WARN: 2017/11/23 10:40:32.132018 instance name for ID 3097 not found
WARN: 2017/11/23 10:40:32.476875 instance name for ID 6159 not found
WARN: 2017/11/23 10:40:35.957953 instance name for ID 3436 not found
WARN: 2017/11/23 10:40:36.520205 instance name for ID 4127 not found

for dev:

WARN: 2017/11/29 02:06:36.931257 instance name for ID 100 not found for event type monitor
WARN: 2017/11/29 02:06:41.933785 instance name for ID 100 not found for event type monitor
WARN: 2017/11/29 02:06:51.900865 instance name for ID 100 not found for event type monitor
WARN: 2017/11/29 02:06:56.892187 instance name for ID 100 not found for event type monitor

Cortex shouldn't generate events with nonexistent instanceID.

rndstr · 2017-12-06T12:06:30Z

I see 7.2k errors/hour on prod

aaron7 · 2017-12-13T18:29:53Z

Moving conversation from Slack

Question: How should the cortex ruler, notifications service, flux etc know whether an instance has been deleted or not?

One option would be to use the deleted flag set in the instances table as the single source of information as to whether an instance is active or not. Services would use the users client (or an agreed API in the case of open source projects) to determine whether an instance is active or not. This can be cached on the client side.

bboreham · 2017-12-13T18:31:20Z

Seems a bit poor to have to poll the "is it deleted" service; I would just remove the rules for deleted instances via some sync process.

lelenanam · 2017-12-13T19:08:29Z

Can we do it in the users service after instance deletion, by calling cortex's API to delete configs?
@bboreham what is "some sync process"? Is it a separate service?

aaron7 · 2017-12-13T19:37:12Z

> Seems a bit poor to have to poll the "is it deleted" service; I would just remove the rules for deleted instances via some sync process.

Instances can be inactive for other reasons. One would be when a trial expires. Should the ruler stop running for these instances as well?
Can instances be undeleted? If so, should we be deleting the rules?
A sync process would be another thing to manage and ensure is running. It's duplicating state across DBs whereas pinging an "instance details" endpoint feels less error prone and many services already talk to the users service.

jml · 2017-12-14T13:17:40Z

Presenting the options as I understand them, and trying to write up the trade-offs for each fairly.

(Apologies for the formality, but I don't really know how to do this without writing a design doc.)

Problem

Instances can be deleted / disabled
Only the users service knows about this
Services which perform actions on behalf of active instances need to stop doing those actions, to avoid unnecessary work and log spam

Options

1. Provide an "is deleted?" endpoint

As described in #1595 (comment)

users would provide an "is deleted" endpoint
each project (flux, cortex, notifications) would query this endpoint and then update their own databases to remove or disable configs for deleted instances

Pros:

Don't have to manage extra sync process

Cons:

Open source projects have to know about proprietary users service

2. Projects have "instance deleted" endpoint

each project (flux, cortex, notifications) provides an "instance deleted" endpoint that can be called by a third party
this endpoint would remove or disable configs for deleted instances
we would have a system internally that periodically
- queries the users service to find recently deleted instances
- calls this endpoint on each of our systems (flux, cortex, notifications)

Pros:

cortex could delete its useful configs service
open source projects maintain conceptual integrity

Cons:

have to duplicate this for each endpoint
we would need to run a synchronisation job

3. Central configs service

We have a single, proprietary "all-singing, all-dancing" configs service
Configs for flux, cortex, notifications are stored there
It would have its own single database
We would migrate existing configs from their respective DBs there
It would have dedicated get & set endpoints for flux, cortex, notifications
- enables validation
- allows for different consistency requirements (e.g. cortex wants CAS, see "CAS operation on configs" cortexproject/cortex#330; not sure if others need it)
Would still need to provide "rump" configs service for Cortex, so that it actually meaningfully functions as a standalone project
- jml unsure whether this applies to flux

Pros:

Wouldn't have to duplicate synchronization effort for each project (flux, notifications, cortex)

Cons:

Have to provide rump configs services to maintain conceptual integrity

3.1. Combine with users service

Rather than this all-singing, all-dancing configs service be a separate service with a separate DB, it would just be a part of the users service, with the configs stored in the users DB.

Pros:

fewer services = less management overhead
can disable configs in the same transaction as instance being disabled, removing a host of consistency issues

3.2. Standalone service

Standalone configs service with its own database.

jml can't think of advantages to this.

Common components

Whether it's deleted, disabled, deactivated, etc. AFAICT, each option handles these cases equally well
Whichever option, the Cortex ruler is still going to need logic to remove deleted instances from its in-memory scheduling queue

bboreham · 2017-12-14T13:25:05Z

> the Cortex ruler is still going to need logic to remove deleted instances from its in-memory scheduling queue

If it gets an update saying the set of rules for an instance is now empty, that should have the desired effect of stopping it doing any work. It already handles this kind of update. The instance object would disappear on next restart.

bboreham · 2017-12-14T13:26:40Z

> Instances can be inactive for other reasons. One would be when a trial expires. Should the ruler stop running for these instances as well?

That one seems an obvious yes to me.

jml · 2017-12-14T14:24:06Z

My own preferences are for 2 or 3.1.

I'm considering cortexproject/cortex#620 blocked on this. If we go with 2, I want to keep its current direction. If we go with either 3 option, I'll move the new endpoints back to the configs service.

leth · 2017-12-14T14:31:54Z

> Instances can be inactive for other reasons. One would be when a trial expires. Should the ruler stop running for these instances as well?

If we do this, I suggest we use 'RefuseDataUpload' to stop rule evaluation, instead of trial expiry.
The intent of the flag is to provide some leeway after trial expiry for people to be able to give us money without experiencing any data loss.

lelenanam · 2017-12-14T19:16:08Z

@jml @bboreham What about a combination of 2 and 3.1?

each project (flux, cortex, notifications) provides an "instance deleted" endpoint that can be called by a third party
this endpoint would remove or disable configs for deleted instances
the users service would call these endpoints after instance deletion, and on a timer, to delete/disable configs for deleted instances (to minimize the interval of unclean state)

If the users service cannot call an endpoint (service unavailable, or users was restarted):

add a flag to the DB: clean_up (true / false)
add a goroutine with a ticker (1 minute or so) to the users service, which checks whether everything is cleaned up (SELECT id FROM organizations WHERE deleted_at IS NOT NULL AND clean_up = false) and calls the endpoints again
if everything is cleaned up, set clean_up = true for this instanceID; otherwise do the "clean up" again
check this flag on users restart
services can respond with 200 if the clean up is done and 404 if not found (data was already cleaned up before), so we can repeat the clean up until all services respond with 200 or 404

Pros:

don't have to manage extra sync process
open source projects maintain conceptual integrity
fewer services = less management overhead
easy to implement

bboreham · 2017-12-15T16:39:08Z

Seems workable. The endpoints to call should be configurable, on the expectation we will add more.
I think your "goroutine with ticker" is the same thing as an "extra sync process".

squaremo · 2017-12-15T16:41:44Z

Flux has an events history database that is amenable to 2., but rather less so to 3.
It also keeps a "last connected" value in a config database (which still has old flux configs in it, which will all go away at some point real soon now).

squaremo · 2017-12-15T16:47:52Z

@lelenanam's design is pretty sound I reckon. A subtlety that it nicely accounts for is that cleanup might take longer than a request timeout, so the cleaner-up service should keep asking until the cleanee gives a definitive answer. (I'd suggest a 202 Accepted is mandated unless the cleanee definitely finished cleaning stuff up).

jml · 2017-12-15T17:02:51Z

Cool. I also like @lelenanam's design.

I think we've got enough consensus. Full steam ahead!

Also, this unblocks cortexproject/cortex#620, which makes me happy.

bboreham · 2017-12-15T17:07:32Z

Note we need to support both "deleting" and "undeleting", on the expectation that some people will pay up after being blocked, or call support after hitting the wrong button.

leth · 2017-12-15T17:13:49Z

> Note we need to support both "deleting" and "undeleting", on the expectation that some people will pay up after being blocked, or call support after hitting the wrong button.

Within a grace period of N days, after which no "undeleting" is possible? Or just forever?

bboreham · 2017-12-15T17:19:39Z

On the assumption we are going to get asymptotically-increasing numbers of customers, it should not be forever.

bboreham · 2018-01-03T14:13:26Z

Pending release of the automated solution, I would like to fix manually in prod.

Check in users_vpc I have a good set of instances:

users_vpc=> select id, name, created_at, updated_at, deleted_at from organizations where id::integer in (7194, 4127, 7872, 3436, 6429, 3097, 7948, 3183, 7030, 6881, 7851, 7215, 6159);
  id  |           name            |          created_at           |          updated_at           |          deleted_at           
------+---------------------------+-------------------------------+-------------------------------+-------------------------------
 6881 | dokku.alemayhu.com        | 2017-10-27 07:56:27.382758+00 | 2017-10-27 07:56:27.413257+00 | 2017-12-20 08:49:10.442969+00
 6159 | Demo cluster              | 2017-09-25 15:46:08.626422+00 | 2017-09-25 15:46:08.643229+00 | 2017-10-04 15:04:00.317968+00
 7948 | CDA_AWS                   | 2018-01-03 09:22:17.856354+00 | 2018-01-03 09:22:17.832977+00 | 2018-01-03 10:08:23.58787+00
 7030 | Untitled Cluster          | 2017-11-05 20:13:41.880082+00 | 2017-11-05 20:13:41.900732+00 | 2017-12-05 17:09:13.798918+00
 7851 | Test Kubernetes 1.9 - AWS | 2017-12-21 13:29:45.775124+00 | 2017-12-21 13:29:45.772331+00 | 2017-12-21 14:43:33.689161+00
 7872 | Fragrant Water 98         | 2017-12-23 05:52:51.323363+00 | 2017-12-23 05:52:51.323429+00 | 2017-12-23 07:15:51.045066+00
 6429 | Untitled Cluster          | 2017-10-11 14:34:42.004431+00 | 2017-10-11 14:34:42.026041+00 | 2017-12-21 13:40:17.687019+00
 7215 | Infotec Cluster           | 2017-11-15 22:52:55.382829+00 | 2017-11-15 22:52:55.393004+00 | 2017-11-30 02:14:10.184838+00
 7194 | Dev                       | 2017-11-14 18:02:08.448751+00 | 2017-11-14 18:02:08.518782+00 | 2017-11-20 21:11:24.586089+00
 3183 | ufleet                    | 2017-04-20 02:25:02.491618+00 | 2017-04-20 02:25:02.505831+00 | 2017-08-31 13:30:54.639802+00
 3436 | Untitled Cluster          | 2017-05-09 08:07:59.208617+00 | 2017-05-09 08:07:59.23524+00  | 2017-05-09 08:26:20.480151+00
 3097 | develop                   | 2017-04-12 22:58:17.933778+00 | 2017-04-12 22:58:17.939226+00 | 2017-08-23 20:22:00.793044+00
 4127 | WebStorage                | 2017-06-15 02:25:02.579848+00 | 2017-06-15 02:25:02.591922+00 | 2017-07-04 06:49:58.021162+00
(13 rows)

Proposed SQL to run in configs_vpc:

begin;
update configs set deleted_at=now() where subsystem='cortex' and owner_type='org' and owner_id::integer in (7194, 4127, 7872, 3436, 6429, 3097, 7948, 3183, 7030, 6881, 7851, 7215, 6159);      
-- check 13 rows affected
commit;
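
The manual "check N rows affected" step before the commit could also be enforced in code. The Go sketch below is one way to do that, under stated assumptions: `Tx`, `execExpecting`, and `fakeTx` are hypothetical names, and `fakeTx` only simulates a transaction so the logic can be tried without a database. `Tx` lists the three `*sql.Tx` methods the sketch needs, so a real transaction from `db.Begin()` could be passed instead.

```go
package main

import (
	"database/sql"
	"database/sql/driver"
	"fmt"
)

// Tx is the subset of *sql.Tx used here.
type Tx interface {
	Exec(query string, args ...interface{}) (sql.Result, error)
	Commit() error
	Rollback() error
}

// execExpecting runs query inside tx and commits only if exactly want rows
// were affected. In every other case it rolls back and returns an error.
func execExpecting(tx Tx, want int64, query string, args ...interface{}) error {
	res, err := tx.Exec(query, args...)
	if err != nil {
		tx.Rollback()
		return err
	}
	got, err := res.RowsAffected()
	if err != nil {
		tx.Rollback()
		return err
	}
	if got != want {
		tx.Rollback()
		return fmt.Errorf("expected %d rows affected, got %d; rolled back", want, got)
	}
	return tx.Commit()
}

// fakeTx pretends every statement affects a fixed number of rows.
// It exists only for this demo.
type fakeTx struct {
	affected   int64
	committed  bool
	rolledBack bool
}

func (f *fakeTx) Exec(string, ...interface{}) (sql.Result, error) {
	return driver.RowsAffected(f.affected), nil
}
func (f *fakeTx) Commit() error   { f.committed = true; return nil }
func (f *fakeTx) Rollback() error { f.rolledBack = true; return nil }

func main() {
	// Shortened version of the update above, with two IDs for illustration.
	const query = `update configs set deleted_at=now() where subsystem='cortex' and owner_type='org' and owner_id::integer in (7194, 4127)`

	good := &fakeTx{affected: 2}
	fmt.Println(execExpecting(good, 2, query), "committed:", good.committed)

	bad := &fakeTx{affected: 3}
	fmt.Println(execExpecting(bad, 2, query), "rolled back:", bad.rolledBack)
}
```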

awh · 2018-01-03T14:17:54Z

LGTM

bboreham · 2018-01-03T15:10:19Z

I had to restart the ruler to get it to notice the deleted configs. Bit of background at cortexproject/cortex#629 (comment)

bboreham · 2018-01-15T10:55:58Z

There are a few more dead instances now:

users_vpc=> select id, name, created_at, updated_at, deleted_at from organizations where id::integer in (7880,7889,7937,7950,7960,7965,7977,7987,7988,7992,8004,8008,8010,8021,8069,8086,8097);
  id  |        name         |          created_at           |          updated_at           |          deleted_at           
------+---------------------+-------------------------------+-------------------------------+-------------------------------
 8021 | k8skaldera          | 2018-01-07 23:09:19.595403+00 | 2018-01-07 23:09:19.581745+00 | 2018-01-07 23:16:49.372384+00
 8097 | Loud Smoke 32       | 2018-01-12 04:26:40.449985+00 | 2018-01-12 04:26:40.454457+00 | 2018-01-12 04:49:51.439579+00
 8086 | Kubernetes demo     | 2018-01-11 09:19:21.475861+00 | 2018-01-11 09:19:21.474558+00 | 2018-01-11 10:59:41.534514+00
 8069 | Cold Meadow 39      | 2018-01-10 17:19:16.014111+00 | 2018-01-10 17:19:16.019992+00 | 2018-01-11 01:49:19.81741+00
 7937 | Pale Sky 66         | 2018-01-02 11:51:25.718927+00 | 2018-01-02 11:51:25.701633+00 | 2018-01-03 14:50:09.061251+00
 7950 | CDA_AWS             | 2018-01-03 10:10:29.394609+00 | 2018-01-03 10:10:29.396749+00 | 2018-01-03 11:17:25.169663+00
 7960 | ysung_project       | 2018-01-04 03:53:57.597988+00 | 2018-01-04 03:53:57.602639+00 | 2018-01-06 03:46:53.459064+00
 7965 | kube-1.9            | 2018-01-04 10:39:33.065076+00 | 2018-01-04 10:39:33.03896+00  | 2018-01-04 13:34:57.782992+00
 7992 | Dry Glitter 09      | 2018-01-05 18:07:20.301864+00 | 2018-01-05 18:07:20.308132+00 | 2018-01-06 10:52:00.230072+00
 7987 | Katacoda            | 2018-01-05 13:37:16.411692+00 | 2018-01-05 13:37:16.398824+00 | 2018-01-07 23:09:02.52032+00
 7977 | Natural Mountain 08 | 2018-01-05 05:20:11.955521+00 | 2018-01-05 05:20:11.95937+00  | 2018-01-09 12:07:39.507649+00
 7988 | Weathered Pond 87   | 2018-01-05 14:02:11.305702+00 | 2018-01-05 14:02:11.295524+00 | 2018-01-05 14:23:57.101085+00
 8010 | weave cloud cluster | 2018-01-07 03:48:40.665608+00 | 2018-01-07 03:48:40.65374+00  | 2018-01-08 08:07:58.192334+00
 8004 | Lingering Nebula 82 | 2018-01-06 12:28:42.290327+00 | 2018-01-06 12:28:42.296545+00 | 2018-01-06 15:07:05.740274+00
 8008 | Summer Pulsar 96    | 2018-01-07 01:11:22.317214+00 | 2018-01-07 01:11:22.321397+00 | 2018-01-08 21:34:25.892606+00
 7880 | Katacoda            | 2017-12-24 18:33:22.742926+00 | 2017-12-24 18:33:22.745753+00 | 2018-01-10 19:18:03.240745+00
 7889 | CDA                 | 2017-12-26 09:20:07.345884+00 | 2017-12-26 09:20:07.346221+00 | 2018-01-03 09:21:37.127121+00
(17 rows)

configs_vpc=> select id, owner_id, updated_at from configs where owner_id::integer in (7880,7889,7937,7950,7960,7965,7977,7987,7988,7992,8004,8008,8010,8021,8069,8086,8097) order by owner_id;
  id  | owner_id |          updated_at           
------+----------+-------------------------------
 4342 | 7880     | 2017-12-24 18:41:40.058104+00
 4345 | 7889     | 2017-12-26 09:45:41.081177+00
 4369 | 7937     | 2018-01-03 11:46:45.975614+00
 4368 | 7950     | 2018-01-03 10:53:53.484958+00
 4382 | 7960     | 2018-01-04 06:08:16.548535+00
 4385 | 7965     | 2018-01-04 10:40:24.588699+00
 4416 | 7977     | 2018-01-05 05:21:55.965438+00
 4445 | 7987     | 2018-01-05 13:38:31.80088+00
 4446 | 7988     | 2018-01-05 14:04:05.802222+00
 4447 | 7988     | 2018-01-05 14:04:11.778524+00
 4470 | 7992     | 2018-01-06 08:02:17.516438+00
 4473 | 8004     | 2018-01-06 12:30:34.597514+00
 4475 | 8008     | 2018-01-07 01:16:54.99508+00
 4477 | 8010     | 2018-01-07 16:12:09.772199+00
 4482 | 8021     | 2018-01-07 23:09:57.181069+00
 4590 | 8069     | 2018-01-11 01:29:45.683579+00
 4596 | 8086     | 2018-01-11 09:26:52.512932+00
 4628 | 8097     | 2018-01-12 04:27:47.769741+00
(18 rows)

Proposed bandaid to run in configs_vpc:

begin;
update configs set deleted_at=now() where subsystem='cortex' and owner_type='org' and owner_id::integer in (7880,7889,7937,7950,7960,7965,7977,7987,7988,7992,8004,8008,8010,8021,8069,8086,8097);
-- check 18 rows affected
commit;

bboreham · 2018-01-25T11:24:41Z

select id, name, created_at, updated_at, deleted_at from organizations where id::integer in (8221,7798,8146,8150,8128,4576,8125,8250,8151,8143,7963,7966,8205,7986,8141,8118,8222,8221,8203,7964,7892,8115,8208,8122,8038,8276,8051,8116,7997,8194,7991,8205,7978,8206,8028,8146);
  id  |             name              |          created_at           |          updated_at           |          deleted_at           
------+-------------------------------+-------------------------------+-------------------------------+-------------------------------
 8194 | Joyful Frost 30               | 2018-01-18 10:22:41.556206+00 | 2018-01-18 10:22:41.556017+00 | 2018-01-23 09:13:06.315787+00
 7892 | dev-mco                       | 2017-12-26 16:06:43.142625+00 | 2017-12-26 16:06:43.143551+00 | 2018-01-17 15:05:38.548542+00
 8203 | Matthias' GKE cluster-3       | 2018-01-18 17:42:22.916803+00 | 2018-01-18 17:42:22.908651+00 | 2018-01-18 18:14:58.607752+00
 8028 | k8skaldera                    | 2018-01-08 11:51:08.18631+00  | 2018-01-08 11:51:08.181953+00 | 2018-01-09 06:44:29.875469+00
 8205 | Continuous Delivery Lab       | 2018-01-18 18:15:17.959194+00 | 2018-01-18 18:15:17.955281+00 | 2018-01-18 18:44:02.054871+00
 8206 | Matthias' GKE cluster-4       | 2018-01-18 20:14:46.338147+00 | 2018-01-18 20:14:46.339742+00 | 2018-01-18 20:25:37.324175+00
 8208 | Matthias' GKE cluster-5       | 2018-01-18 20:31:49.395869+00 | 2018-01-18 20:31:49.395924+00 | 2018-01-18 20:43:01.147786+00
 8038 | Proud Dust 26                 | 2018-01-08 21:35:13.120613+00 | 2018-01-08 21:35:13.118286+00 | 2018-01-10 16:17:20.533928+00
 8221 | Dark Haze 49                  | 2018-01-19 18:30:24.767358+00 | 2018-01-19 18:30:24.771951+00 | 2018-01-19 18:52:42.583574+00
 8222 | Solitary Hill 93              | 2018-01-19 18:49:16.312972+00 | 2018-01-19 18:49:16.315565+00 | 2018-01-19 19:06:37.014265+00
 8051 | Katacoda                      | 2018-01-09 18:04:03.217683+00 | 2018-01-09 18:04:03.181472+00 | 2018-01-22 21:31:25.741103+00
 8250 | Troubleshooting Dashboard Lab | 2018-01-22 08:58:05.54134+00  | 2018-01-22 08:58:05.545126+00 | 2018-01-23 07:31:52.886287+00
 7798 | Little Dust 29                | 2017-12-18 17:26:39.332012+00 | 2017-12-18 17:26:39.050603+00 | 2018-01-18 16:41:46.905568+00
 8276 | Billowing Violet 92           | 2018-01-23 11:37:27.21717+00  | 2018-01-23 11:37:27.215747+00 | 2018-01-23 12:13:13.165184+00
 7963 | Restless Blossom 41           | 2018-01-04 08:15:53.045786+00 | 2018-01-04 08:15:53.049369+00 | 2018-01-06 07:02:07.799174+00
 7964 | Rough Water 56                | 2018-01-04 08:39:17.952285+00 | 2018-01-04 08:39:17.947623+00 | 2018-01-17 14:59:35.344015+00
 7966 | Scaleway K8S v1.9             | 2018-01-04 11:33:00.440643+00 | 2018-01-04 11:33:00.423112+00 | 2018-01-16 09:39:53.281618+00
 8141 | Falling Meteor 51             | 2018-01-16 09:21:33.347285+00 | 2018-01-16 09:21:33.351855+00 | 2018-01-16 12:05:45.068978+00
 8118 | taius-weavenet-01             | 2018-01-14 06:37:05.574075+00 | 2018-01-14 06:37:05.573295+00 | 2018-01-17 02:38:36.089439+00
 8115 | Taius Weave                   | 2018-01-14 02:11:04.811992+00 | 2018-01-14 02:11:04.817304+00 | 2018-01-14 06:36:38.41711+00
 8116 | TTYS0                         | 2018-01-14 02:43:48.76743+00  | 2018-01-14 02:43:48.772683+00 | 2018-01-14 16:10:54.922181+00
 8122 | Lively River 48               | 2018-01-14 17:50:42.094882+00 | 2018-01-14 17:50:42.094592+00 | 2018-01-14 20:16:04.879431+00
 8125 | Delicate Cloud 75             | 2018-01-14 22:05:15.561556+00 | 2018-01-14 22:05:15.562532+00 | 2018-01-23 20:15:05.662821+00
 7978 | Ancient Glitter 40            | 2018-01-05 06:07:21.431979+00 | 2018-01-05 06:07:21.436613+00 | 2018-01-05 06:44:21.357913+00
 8128 | Misty Lake 38                 | 2018-01-15 03:30:45.794722+00 | 2018-01-15 03:30:45.799824+00 | 2018-01-17 12:54:13.705835+00
 8143 | imp-clus-eu-01                | 2018-01-16 09:43:29.729405+00 | 2018-01-16 09:43:29.731379+00 | 2018-01-17 14:17:08.929985+00
 7986 | Cold Snow 06                  | 2018-01-05 12:59:54.122827+00 | 2018-01-05 12:59:54.127882+00 | 2018-01-05 13:36:56.184495+00
 7991 | Bold Rain 29                  | 2018-01-05 18:04:38.667873+00 | 2018-01-05 18:04:38.674139+00 | 2018-01-22 21:32:42.715592+00
 7997 | weave scope                   | 2018-01-06 06:37:21.731052+00 | 2018-01-06 06:37:21.732513+00 | 2018-01-06 06:55:35.426185+00
 8146 | Continuous Delivery Lab       | 2018-01-16 18:22:55.42937+00  | 2018-01-16 18:22:55.427111+00 | 2018-01-16 23:27:22.840116+00
 8150 | taius-wn-01                   | 2018-01-17 02:38:50.215728+00 | 2018-01-17 02:38:50.223807+00 | 2018-01-17 03:15:48.634601+00
 8151 | Container Firewalls Lab       | 2018-01-17 03:11:23.767276+00 | 2018-01-17 03:11:23.777558+00 | 2018-01-17 05:09:18.204714+00
 4576 | wanfuuyu                      | 2017-07-09 03:37:34.978414+00 | 2017-07-09 03:37:34.989376+00 | 2018-01-23 09:14:29.776885+00
(33 rows)

configs_vpc=> select id, owner_id, updated_at from configs where owner_id::integer in (8221,7798,8146,8150,8128,4576,8125,8250,8151,8143,7963,7966,8205,7986,8141,8118,8222,8221,8203,7964,7892,8115,8208,8122,8038,8276,8051,8116,7997,8194,7991,8205,7978,8206,8028,8146);
  id  | owner_id |          updated_at           
------+----------+-------------------------------
 4347 | 7892     | 2017-12-26 16:08:08.776413+00
 4354 | 4576     | 2017-12-29 13:01:39.415683+00
 4383 | 7963     | 2018-01-04 08:18:52.574655+00
 4384 | 7964     | 2018-01-04 08:40:04.244923+00
 4391 | 7966     | 2018-01-04 11:33:41.184624+00
 4417 | 7978     | 2018-01-05 06:09:36.2951+00
 4442 | 7986     | 2018-01-05 13:18:50.091111+00
 4469 | 7997     | 2018-01-06 06:40:17.098917+00
 4493 | 8028     | 2018-01-08 11:51:54.826364+00
 4560 | 7991     | 2018-01-09 17:13:18.660008+00
 4564 | 8051     | 2018-01-09 18:04:31.423364+00
 4583 | 8038     | 2018-01-10 16:10:53.705202+00
 4633 | 7798     | 2018-01-12 11:17:15.699857+00
 4670 | 8115     | 2018-01-14 02:12:15.412447+00
 4673 | 8116     | 2018-01-14 02:47:34.757343+00
 4675 | 8118     | 2018-01-14 07:39:09.492622+00
 4678 | 8122     | 2018-01-14 17:58:38.59462+00
 4680 | 8125     | 2018-01-14 22:07:09.220705+00
 4683 | 8128     | 2018-01-15 03:31:54.634187+00
 4728 | 8141     | 2018-01-16 09:22:44.509884+00
 4729 | 8143     | 2018-01-16 09:44:01.269041+00
 4771 | 8146     | 2018-01-16 18:23:17.443464+00
 4779 | 8150     | 2018-01-17 02:45:51.092203+00
 4780 | 8151     | 2018-01-17 03:15:02.667248+00
 4839 | 8194     | 2018-01-18 10:49:38.820544+00
 4865 | 8203     | 2018-01-18 17:47:12.606109+00
 4866 | 8205     | 2018-01-18 18:15:44.521633+00
 4886 | 8206     | 2018-01-18 20:15:35.017216+00
 4972 | 4576     | 2018-01-23 08:58:30.604403+00
 4888 | 8208     | 2018-01-18 20:32:37.601448+00
 4918 | 8221     | 2018-01-19 18:31:00.809736+00
 4919 | 8222     | 2018-01-19 18:49:36.995865+00
 4935 | 8250     | 2018-01-22 08:58:36.677845+00
 4977 | 8276     | 2018-01-23 11:40:59.018686+00
(34 rows)

Proposed bandaid to run in configs_vpc:

begin;
update configs set deleted_at=now() where subsystem='cortex' and owner_type='org' and owner_id::integer in (8221,7798,8146,8150,8128,4576,8125,8250,8151,8143,7963,7966,8205,7986,8141,8118,8222,8221,8203,7964,7892,8115,8208,8122,8038,8276,8051,8116,7997,8194,7991,8205,7978,8206,8028,8146);
-- check 34 rows affected
commit;

lelenanam · 2018-02-08T22:04:55Z

Fixed via cortexproject/cortex#629 and cortexproject/cortex#683

lelenanam mentioned this issue Nov 29, 2017

Notifications for event type "monitor" generates with wrong instance ID cortexproject/cortex#616

Closed

This was referenced Dec 20, 2017

Add organization cleaner #1652

Merged

Add endpoints to deactivate and restore config cortexproject/cortex#629

Merged

lelenanam self-assigned this Jan 19, 2018

rade added the "bug" label (broken end user functionality; not working as the developers intended it) Jan 22, 2018

lelenanam closed this as completed Feb 8, 2018

weaveworks-admin-bot unassigned lelenanam Jan 3, 2019
