Skip to content

Conversation

@JamesKunstle
Copy link
Contributor

@JamesKunstle JamesKunstle commented Feb 4, 2025

adds a smoke-test.yaml workflow that's very similar to unit-test.yaml but calls a different tox+pytest path and requires a cuda runner.

@mergify mergify bot added CI/CD Affects CI/CD configuration testing Relates to testing ci-failure dependencies Pull requests that update a dependency file labels Feb 4, 2025
@JamesKunstle
Copy link
Contributor Author

This PR removes the reusable workflow contributions in #419 but keeps the smoke testing.

@JamesKunstle
Copy link
Contributor Author

This PR needs to go in so that we can test the manual functionality of the smoke-test addition. It works on a dev machine.



@pytest.fixture(scope="module")
def custom_tmp_path():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Apologies for the drive-by comment. The rest of this PR is over my head 😆 . I was just reading this PR and noticed something related to this fixture.)

The implementation of this method looks similar to pytest's own tmp_path fixture. I wondered if you could call custom_tmp_path(tmp_path) and simply return tmp_path here, or even use tmp_path_factory if you want a session-scoped fixture.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's been a while since I wrote this- I kinda remember doing exactly what you recommend and having the behavior be strange, and this was a convenient workaround. I don't really remember why though- I obviously should have made a note!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add a note to revisit and then it's ok

Copy link
Contributor

@cdoern cdoern left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

initial pass, some of these are nits, some are comments for the future. Overall I like the shape of this

./scripts/e2e-ci.sh -mp
# HACK(osilkin): The above test runs the medium workflow test which does not actually test the training library.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so, if this job doesn't use the training library should we remove it?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this uses --pipeline full which uses the full loop from ilab (sorry I did this 😆 )

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be tangential to this PR but might be nice to see a green CI on this by just removing the test.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do that in a follow-up so we're doing on thing at a time in this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure!

on:
# TEMP - only runs when manually invoked
# and only runs against branches in the repo.
workflow_dispatch:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can add on_pull so that it runs and passes on this PR!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true- I should do that

ipykernel
jupyter

huggingface_hub
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want a version for that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only if we know there's a minimal version that we have to have.

from instructlab.training.main_ds import run_training

MINIMAL_TRAINING_ARGS = {
"max_seq_len": 140, # this config fits nicely on 4xL40s and may need modification for other setups
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this makes sense, but if we want to use these PROFILES then we should scope work to lift and shift the system profiles from ilab to the training repo

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

true, let's scope that for future work



@pytest.fixture(scope="module")
def prepared_data_dir(custom_tmp_path: pathlib.Path) -> pathlib.Path:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing docstring

return pathlib.Path(__file__).resolve()


def data_in_repo_path() -> pathlib.Path:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing docstring

return data_in_repo_path


def chat_template_in_repo_path() -> pathlib.Path:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

docstring plz

assert True


@pytest.mark.slow
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we aren't skipping this one but its also running training, why is that?

name: "Run smoke tests via Tox::pytest"
# These tests will be long running and require accelerated hardware.
# They will help to verify that the library is *functionally* correct but
# will not try to verify that the libary is *correct*.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: library


I don't understand the distinction. :) Functionally correct is correct; just maybe not "non-functionally" correct (benchmarks, code quality etc.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True- I was trying to express that this will run tests that execute the code but won't benchmark the output model to see if it's better. That'd be my definition of a correctness test for this.

- name: "Verify cuda environment is setup"
run: |
export CUDA_HOME="/usr/local/cuda"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: use $CUDA_HOME when defining LD_LIBRARY_PATH:

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$CUDA_HOME/lib64:$CUDA_HOME/extras/CUPTI/lib64"

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gotcha, updated and changed the bash variable expansion to be correctly scoped

run: |
df -h
- name: "Run unit tests with Tox and Pytest"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not "unit"

ipykernel
jupyter

huggingface_hub
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only if we know there's a minimal version that we have to have.

@@ -0,0 +1,227 @@
# Standard
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: please rename test_smoke and test_unit into smoke and unit. It's duplicative.

pytest-asyncio
pytest-cov
pytest-html
-r requirements-dev.txt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should some of these go to requirements-dev.txt itself? (what's the distinction between putting dependencies here and in that requirements file?)

deps =
pytest
pytest-asyncio
pytest-cov
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we use cov / html anywhere?

tox.ini Outdated
# `--` delimits flags that are meant for tox vs. those that are positional arguments for
# the command that's being run in the environment.

# format, check, and linting targets don't build and install the project to
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this comment belongs to the next section

[testenv:py3-smoke]
description = run accelerated smoke tests with pytest
passenv =
HF_HOME
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you could define it once in testenv section not to repeat

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm isolating this to the smoketest case specifically- I'm thinking that if it needs to be more broadly available we can change it as you say

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair but should you then passenv for unit tests like it's done here?

pytest-cov
pytest-html
-r requirements-dev.txt
-r requirements-cuda.txt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what if I want to run with a different accelerator? is it a choice to be made by the code that sets the test environment up? (github workflow?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I think the workflow has to call specific code. This is set to cuda because we only have cuda runners

@booxter
Copy link
Contributor

booxter commented Apr 8, 2025

I'd like to see @courtneypacheco or @ktdreyer to confirm the AWS runner configuration here is valid. And if any of this actions code could be shifted to common actions defined for the whole org.

Test groups are divided into three categories:
1) unit tests
2) smoke tests
3) benchmark tests

They each have a dedicated tox entrypoint.

Adds outer product of [FSDP, DeepSpeed] x [CPU offload, Not] test
matrix.

DEEPSPEED TESTS ARE BROKEN IN THIS COMMIT and are marked xFail- to be
fixed in another, later commit.

Signed-off-by: James Kunstle <[email protected]>
# These tests will be long running and require accelerated hardware.

on:
# TEMP - only runs when manually invoked
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it? isn't pull_request_target now triggering it automatically?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could definitely update this comment to be more accurate, but I do think it could actually be valuable to have workflow_dispatch in addition to pull_request_target. If you have workflow_dispatch set, then you can trigger these smoke tests any time on existing branches. This could be particularly useful if you want to quickly rerun the smoke tests against a release branch when you know there has been a CUDA update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update the comment

[testenv:py3-smoke]
description = run accelerated smoke tests with pytest
passenv =
HF_HOME
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair but should you then passenv for unit tests like it's done here?

- name: "Install packages"
run: |
cat /etc/os-release
sudo dnf install -y gcc gcc-c++ make git python3.11 python3.11-devel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been trying to clean this up in instructlab itself (instructlab/instructlab#3140), so when I see it here, we can also simplify it:

Suggested change
sudo dnf install -y gcc gcc-c++ make git python3.11 python3.11-devel
sudo dnf install -y gcc gcc-c++ make git-core python3.11 python3.11-devel

@mergify mergify bot added the one-approval label Apr 10, 2025
users can dispatch a workflow that runs smoke tests against a selected
branch

Signed-off-by: James Kunstle <[email protected]>
@mergify mergify bot removed the ci-failure label Apr 10, 2025
@JamesKunstle JamesKunstle requested a review from cdoern April 10, 2025 21:44
@JamesKunstle JamesKunstle self-assigned this Apr 10, 2025
@JamesKunstle JamesKunstle merged commit fce38cf into main Apr 10, 2025
17 checks passed
@JamesKunstle JamesKunstle deleted the smoke-testing branch April 10, 2025 23:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI/CD Affects CI/CD configuration dependencies Pull requests that update a dependency file one-approval testing Relates to testing

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants