Skip to content

Conversation

@Babar
Copy link
Contributor

@Babar Babar commented Dec 9, 2021

Basically a re-do of #158.
This got rolled back because @Babar wrote 3 different PRs and one of them was bad.
This one wasn't. Technically it wasn't good either, as it only did test, not really untest, run or keeptesting.

This re-do fixes this.

Still the same features:

  • Can parametrize the number of threads in the config, or on the command line, default to the number of cores
  • Reports hosts depending whether they failed ssh or something else
  • Doesn't have any extra dependency, as the thread pool is done in a very simple way

Test Plan:
Ran it on my machine. Tried test and untest, both worked.

Signed-off-by: Olivier Raginel babar@fb.com

@jaymzh
Copy link
Collaborator

jaymzh commented Dec 9, 2021

IIRC, this broke internal wrappers... so CC @NaomiReeves on that.

But you definitely should make sure you've tested this with internal wrappers.

@Babar
Copy link
Contributor Author

Babar commented Dec 9, 2021

The other PR broke the wrappers, from what I remember. The part where I moved the ARGS to be hostnames, 'cause some parts of our wrappers rely on TT just dropping unknown parameters.
I've fixed the wrapper too, but this is mostly to get speed improvements when testing a bunch of hosts.
Also, I've fixed the lint, so updating the PR shortly.

@Babar
Copy link
Contributor Author

Babar commented Dec 9, 2021

Oh one thing I'd love feedback on:

  • How do people think we should deal with failures?
  • How do we deal with logs?
    In test, we report the failures and exit with:
  • 0 if no failure
  • 1 if all hosts failed and at least one failure was due to ssh connection issue
  • 2 otherwise (meaning at least one host failed for whatever reason and at least one succeeded)
  • 3 if all hosts failed because they were already tested by another user
    I kept this logic for the other modes (so untest, run and keeptesting), even though 3 cannot happen.

As for logging, run can get very noisy with all the threads writing at the same time, and not knowing what's what. So I started writing something to prefix the logs with the hostname. But then I went down the rabbit hole, so I'll try to finish this some other night.

@jaymzh
Copy link
Collaborator

jaymzh commented Dec 9, 2021

Oh one thing I'd love feedback on:

  • How do people think we should deal with failures?
  • How do we deal with logs?
    In test, we report the failures and exit with:
  • 0 if no failure
  • 1 if all hosts failed and at least one failure was due to ssh connection issue
  • 2 otherwise (meaning at least one host failed for whatever reason and at least one succeeded)
  • 3 if all hosts failed because they were already tested by another user
    I kept this logic for the other modes (so untest, run and keeptesting), even though 3 cannot happen.

As for logging, run can get very noisy with all the threads writing at the same time, and not knowing what's what. So I started writing something to prefix the logs with the hostname. But then I went down the rabbit hole, so I'll try to finish this some other night.

I'm glad you asked!

A lot of this is actually the kind of stuff that we (we = TM folks, not FB folks, so past-past-past-life) solved over in https://github.com/txcketmaster-xx/onall - it forks a bunch of ssh's, joins 'em all back, and handles output nicely. It has a few options - live, buffered, prefixed-by-hostname, in-a-directory one-file-per-host, etc. and I think it's a great model for how to manage output and make it usable for folks.

As for return values: 0 - no errors, we can all agree on that. But maybe some more nuance for errors:

  • All failed, all SSH failures (no test-setup/run failures)
  • All failed, all because in-testing
  • All failed, other test-setup/run failures
  • All failed, mix of issues
  • Some failed, all SSH related
  • Some failed, all because in-testing
  • Some failed, other test-setup/run failures
  • Some failed, mix of issues

Further, I might recommend outputing a JSON structure of host=>failure, perhaps on stderr, or perhaps to a configurable file.

@Babar
Copy link
Contributor Author

Babar commented Dec 9, 2021

Stupid question, but what's the difference between onall and parallel? They're bot written in perl, and seem to be doing pretty much the same thing. I mean, parallel --tag ssh {} .... Haven't read the entire code yet, but I know parallel's code fairly well, which is why I'm asking.

@Babar Babar force-pushed the parallel-testing-v2 branch from 8652279 to 7b35279 Compare April 21, 2024 07:32
Babar added 4 commits April 21, 2024 00:36
Summary:
Basically a re-do of facebook#158.
This got rolled back because @Babar wrote 3 different PRs and one of them was bad.
This one wasn't. Technically it wasn't good either, as it only did `test`, not really `untest`, `run` or `keeptesting`.

This re-do fixes this.

Still the same features:
* Can parametrize the number of threads in the config, or on the command line, default to the number of cores
* Reports hosts depending whether they failed ssh or something else
* Doesn't have any extra dependency, as the thread pool is done in a very simple way

Test Plan:
Ran it on my machine. Tried test and untest, both worked.

Reviewers: nreeves, dcavalca, jaymzh

Subscribers:

Tasks:

Tags:
Signed-off-by: Olivier Raginel <babar@fb.com>
Summary:
As upload can take a very long time, we can just have a dedicated thread for it
@Babar Babar force-pushed the parallel-testing-v2 branch from ed550cf to 76ef58a Compare April 21, 2024 07:36
Olivier Raginel added 3 commits April 21, 2024 00:59
@Babar Babar requested a review from gogsbread May 24, 2024 16:36
@dafyddcrosby dafyddcrosby requested review from dafyddcrosby and removed request for gogsbread November 21, 2025 20:52
@Babar Babar force-pushed the parallel-testing-v2 branch from 635b96b to a464fd4 Compare November 21, 2025 21:24
return hostname, host_thread[:status]
rescue TasteTester::Exceptions::AlreadyTestingError => e
logger.error("User #{e.username} is already testing on #{hostname}")
return hostname, 42
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider using constants for the status codes - will make future debugging much easier (thread debugging is so fun)

@Babar Babar changed the title Allow parallel testing, take #2 Allow parallel testing, take 2 - (was #158) Nov 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants