-
Notifications
You must be signed in to change notification settings - Fork 103
Osquery runtime refactoring, bugfixes and stability improvements #176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
osquery/runtime.go
Outdated
|
||
// Error case | ||
err := r.instance.eg.Wait() | ||
level.Error(r.instance.logger).Log( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is going to be controversial, but let's keep the logs as level.Info
with err
as a message. field
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought we were doing:
level.Info(logger).Log(
"msg", "this is the message about what went wrong",
"err", err, // this may also be a wrapped error. if so, you can leave off msg.
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I clarify... Are you both suggesting the same thing, or is there some difference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we're suggesting the same thing.
cmd/launcher/launcher.go
Outdated
@@ -147,7 +147,7 @@ func main() { | |||
defer ext.Shutdown() | |||
|
|||
// Start the osqueryd instance | |||
instance, err := osquery.LaunchOsqueryInstance( | |||
runner := osquery.NewRunner( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm interested as to why you broke up LaunchOsqueryInstance
into this two step process of creating a runner and explicitly starting it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels a bit more natural to me, and is patterned after https://golang.org/pkg/net/http/#Server. I don't think it has to be this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be LaunchOsqueryInstance
for a few reasons.
- There is no time in the codebase where you start a runner and don't immediately start it.
- It is not possible to create a runner, pass it to two functions, and have them both start it with different instances.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decoupling the runner from the Start method has the advantage of making the whole thing easier to test.
I do think it should be called Run not Start, but I agree with the separation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really care and I'm happy to agree for the sake of not talking about it further, but tests should use the private interface, the public interface should reflect the expected external usage of the library in my opinion.
As far as I can tell, there is no advantage to launching the instance in two steps from this perspective. Does having a "runner" allow you to do anything different or more advanced?
osquery/runtime.go
Outdated
// the following are instance artifacts that are created and held as a result | ||
// of launching an osqueryd process | ||
eg *errgroup.Group |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: can this be called something more descriptive than eg
cmd/launcher/launcher.go
Outdated
@@ -147,7 +147,7 @@ func main() { | |||
defer ext.Shutdown() | |||
|
|||
// Start the osqueryd instance | |||
instance, err := osquery.LaunchOsqueryInstance( | |||
runner := osquery.NewRunner( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be LaunchOsqueryInstance
for a few reasons.
- There is no time in the codebase where you start a runner and don't immediately start it.
- It is not possible to create a runner, pass it to two functions, and have them both start it with different instances.
// updated wholesale without updating the actual OsqueryInstance pointer which | ||
// may be held by the original caller. | ||
type osqueryInstanceFields struct { | ||
type osqueryOptions struct { | ||
// the following are options which may or may not be set by the functional | ||
// options included by the caller of LaunchOsqueryInstance |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Update this comment with the new name is LaunchOsqueryInstance is decided to not be used.
osquery/runtime.go
Outdated
for _, opt := range opts { | ||
opt(o) | ||
} | ||
r.instanceLock.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this lock being held if r.instance.eg.Wait()
will block forever (if there are no errors)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r.instance.Wait()
will wait until all of the async routines have completed, and should not block indefinitely. We only reached this point after a write was made to r.instance.doneCtx.Done()
, which indicates to all the async threads that they should exit.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I may be able to move this below line 338 though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But don't the async routines that are started in the err group loop indefinitely until Shutdown()
is called? Which also requires the acquisition of this lock?
I haven't tested this, but as it is now, once this goroutine starts, won't one not be able to call Shutdown()
, Restart()
, or Healthy()
since they all require the acquisition of r.instanceLock
I'm sorry if I'm missing something obvious here, I'm just confused and trying to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is only hit after some error (or call to Shutdown
or Restart
) causes the r.instance.doneCtx.Done()
channel to close (causing the channel read to complete on line 325). Moving the lock below 338 may help avoid any deadlock that could be caused by holding the lock during the wait.
osquery/runtime.go
Outdated
} | ||
go func() { | ||
for { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is this in an infinite for loop if r.instance.eg.Wait()
is supposed to block until something goes wrong
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The loop ensures that we start a new instance each time the existing instance dies, until the shutdown channel is closed (Shutdown
was called).
prevent leaking of goroutines (relative to existing implementation).
(previously only the extension server was checked).
os/exec: Inconsistent behaviour in
exec.Cmd.Wait
golang/go#20730).