Skip to content

Wraping a function Sentry.GCPFunction.wrapEventFunction leads to premature function exit #3325

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
3 tasks done
killthekitten opened this issue Mar 12, 2021 · 12 comments · Fixed by #3740
Closed
3 tasks done

Comments

@killthekitten
Copy link

killthekitten commented Mar 12, 2021

Package + Version

@sentry/serverless

Version:

6.2.2

Description

It looks like Sentry.GCPFunction.wrapEventFunction does not correctly signal function termination in async background cloud functions. The function exits prematurely while the async code continues running in the background. You can see this behavior in the logs:

Screenshot 2021-03-12 at 14 53 10

This is a follow up to an issue in the nodejs-pubsub client, where multiple users reported that pub/sub event functions operate with an unexpectedly high latency. In my case, removing the sentry wrapper solved the issue.

image

The access to the CPU is limited once the function terminates, and any code that keeps running in background takes minutes to execute or eventually times out.

It looks like only the pub/sub triggered functions are affected. The HTTP triggered functions wrapped in Sentry.GCPFunction.wrapHttpFunction work as expected.

package.json

{
  "name": "sentry-gcf-repro",
  "version": "1.0.0",
  "main": "function.js",
  "license": "MIT",
  "dependencies": {
    "@sentry/serverless": "^6.2.2"
  },
  "engines": {
    "node": ">= 14.15 <15"
  }
}

function.js

const Sentry = require("@sentry/serverless");

Sentry.GCPFunction.init({
  dsn: process.env.SENTRY_DSN,
  tracesSampleRate: 1.0,
});

const sleep = (ms) => {
  return new Promise((resolve) => {
    setTimeout(resolve, ms);
  });
}

const sayHello = async () => {
  await sleep(5000);
  console.log("Hello")
}

exports.sayHelloHandler = Sentry.GCPFunction.wrapEventFunction(
  sayHello
);

Steps to reproduce

You would need to create a topic, deploy the function and send a message to the topic:

gcloud pubsub topics create hello
gcloud functions deploy sayHello --trigger-topic hello --runtime nodejs14 --entry-point sayHelloHandler --region europe-west3
gcloud pubsub topics publish hello --message="{}"
@ace-n
Copy link

ace-n commented Mar 17, 2021

Speaking to the Cloud Functions side - it looks like Sentry's wrapEventFunction does not return a Promise. The Cloud Functions Node.js runtime expects promises to be returned by any functions that wait on them - otherwise, it has no way of knowing when your function is finished running.

More concretely:

exports.badFunction = (event) => {
  // BAD: not returning a promise
  return [Promise.resolve()]
}

exports.goodFunction = (event) => {
  // OK: returns a raw promise
  return Promise.resolve()
}

exports.badFunction = (event) => {
  // OK: also returns a promise
  return Promise.all([Promise.resolve()])
}

Suggested fix

I'm not very familiar with the Sentry code, so take this suggested fix with a large grain of salt:

This method appears to be returning a collection of Promises. Maybe that should be changed to return a single Promise via Promise.all?

@killthekitten
Copy link
Author

@ace-n Thanks for chiming in, I was thinking something along those lines too!

@danieljimeneznz
Copy link

danieljimeneznz commented May 9, 2021

Also facing this issue in @senty/serverless: "6.3.5" - wondering what's the progress on this?

@killthekitten This should be classified as a bug? (I hypothesize wrapHttpFunction only works correctly for promises as you have to call res.send to terminate the function).

Looking at @ace-n suggested change, seems like the right thing to do, granted the scope of the change to domainify isn't too wide, and I can come up with some valid unit tests, I'm tempted to give fixing it a go.

@killthekitten
Copy link
Author

killthekitten commented May 9, 2021

@danieljimeneznz I removed the wrapper for the time being, and I haven't seen anyone from the sentry team acknowledging the issue. It would be great if you could come up with a fix, or at least some tests 👍

@danieljimeneznz
Copy link

danieljimeneznz commented May 9, 2021

Seems like a pretty big bug to me... basically makes any GCP asynchronous functions unusable with sentry. In my case, I have a function starting up, and dying before one of the async tasks in the function resolves.

Started looking at this today, tests seem quite complex, and the code looks like its doing what its supposed to.
Interestingly enough it appears to convert the function into a callback based method, but looking at the code it would still correctly resolve the promise before calling the callback.

@danieljimeneznz
Copy link

@marshall-lee would you have any idea why the wrapper would not await for the promise to finish resolving?
Since you wrote most of that code, I'm wondering if you have any ideas?

Otherwise I'm quite happy to do my own investigation, and possible fix for this.

@danieljimeneznz
Copy link

danieljimeneznz commented May 10, 2021

Figured out the problem!

This wrapper converts whatever function its given into a function that has a signature:

(data: any, context: Context, callback: Function): void

However for some Cloud functions, specifically for new trigger services such as firestore, pubsub, https callables, etc. expect a promiselike to be returned and do not support callback based functions. i.e. they expect

(data: any, context: Context): Promise<any>

Based on this, I am guessing that the code starts putting async tasks on the micro task queue, and since there is no callback to call on completion, the function runner moves to the next tick, sees there are no macro tasks and kills the function, eventually having the promises resolve out of the function runner environment.

Now I know what the underlying issue is I can write a fix but keeping support for callbacks makes it a little more complex (I think I can also write a test for this now I understand it better).

I believe the best fix is to only return a callback based function when the passed in fn to _wrapEventFunction contains 3 arguments. - There are no bugs in the domainify or proxyFunction that I could see.

I'd be interested to see what the sentry team thinks of this, as dropping support for callbacks would make the code simpler but would technically be a breaking change.

@danieljimeneznz
Copy link

danieljimeneznz commented May 10, 2021

I have found a workaround for this bug, if you wrap the wrapped function in a new Promise and call the sentry wrapped function passing in the resolve function this works as expected i.e. for your hello function example above (which I tested in GCP):

/**
  * Possible callback calls will either be:
  * callback(null, value);        // Success - where 'value' is the return value of the wrapped function
  * callback(error, undefined);   // Error - where 'error' is the error thrown by the wrapped function
  */

exports.sayHelloHandler = (message, context) => new Promise((resolve, reject) => {
  const callback = (error: any | null, value?: any) => setImmediate(!error ? resolve(value) : reject(error));
  Sentry.GCPFunction.wrapEventFunction(sayHello)(message, context, callback);
});

Correct log output:
image

I haven't started a fix as I have found this workaround, which works for the time being. I will be open to giving a proper fix a go sometime I have some more bandwidth (I've spent most of a day digging the root cause out).

@dbousamra
Copy link

dbousamra commented May 19, 2021

This has just caught us out. We neglected to look too closely at the types/code. I think the docs need updating asap, as it's grossly misleading.

@danieljimeneznz
Copy link

For anyone using my workaround (@dbousamra), I have added a setImmediate to the callback.
This was added to ensure node completed its current event loop and ensure the call stack was basically empty before the promise was resolved. I added this due to repetitive function invocations causing a Maximum call stack size exceeded error (at the end of the function invocation) - functions were still working with the error correctly though.

@HazAT
Copy link
Member

HazAT commented Jun 24, 2021

We are working on a fix for this
#3740

@killthekitten
Copy link
Author

@HazAT thanks for the update!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants