Skip to content

Conversation

@booxter
Copy link
Contributor

@booxter booxter commented Apr 19, 2025

This is a completion of a workaround for flash-attn. The initial attempt didn't work because we still used tox to run smoke tests, which means another venv was created that did not include flash-attn.

To achieve sharing the venv for both torch installation and smoke tests, we are using tox-current-env plugin here.

@mergify mergify bot added CI/CD Affects CI/CD configuration ci-failure labels Apr 19, 2025
Copy link
Member

@RobotSail RobotSail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@booxter Is the problem that this fixes only in CI? Or would the end user be expected to run tox in the same way (i.e. having a venv)?

@booxter
Copy link
Contributor Author

booxter commented Apr 28, 2025

@RobotSail until Dao-AILab/flash-attention#958 is merged (and they don't seem keen to do so for reasons that I don't agree with), every project that depends on flash-attn has to pre-install their build dependencies, particularly pytorch. tox doesn't allow to order installation of dependencies to guarantee that torch is always installed before flash-attn, so I think any tox user is potentially affected by this. tox being a development tool and not something an end-user is supposed to use in production is probably a mitigating factor, but yes - this sucks. :(

@booxter
Copy link
Contributor Author

booxter commented Apr 28, 2025

@Mergifyio rebase

@mergify
Copy link
Contributor

mergify bot commented Apr 28, 2025

rebase

✅ Branch has been successfully rebased

@booxter booxter force-pushed the ihrachyshka-tox-flash-attn-venv branch from d644d8e to 2d0ed54 Compare April 28, 2025 17:04
@mergify mergify bot removed the ci-failure label Apr 28, 2025
@booxter
Copy link
Contributor Author

booxter commented Apr 28, 2025

@mergify mergify bot added the ci-failure label Apr 28, 2025
@booxter booxter force-pushed the ihrachyshka-tox-flash-attn-venv branch from 2d0ed54 to 25361bd Compare April 28, 2025 17:28
@mergify mergify bot added ci-failure and removed ci-failure labels Apr 28, 2025
@booxter booxter force-pushed the ihrachyshka-tox-flash-attn-venv branch from 25361bd to ee2403b Compare April 28, 2025 17:42
@mergify mergify bot added ci-failure and removed ci-failure labels Apr 28, 2025
@booxter booxter force-pushed the ihrachyshka-tox-flash-attn-venv branch from ee2403b to e221dc8 Compare April 28, 2025 20:12
@mergify mergify bot added ci-failure and removed ci-failure labels Apr 28, 2025
@booxter
Copy link
Contributor Author

booxter commented Apr 28, 2025

(The more I patch this, the more I don't like it. But let's see how horrible the final "fix" would look like...)

This is a completion of a workaround for flash-attn. The initial attempt
didn't work because we still used tox to run smoke tests, which means
another venv was created that did not include flash-attn.

To achieve sharing the venv for both torch+ installation and smoke
tests, we are using tox-current-env plugin here.

Signed-off-by: Ihar Hrachyshka <[email protected]>
@booxter booxter force-pushed the ihrachyshka-tox-flash-attn-venv branch from e221dc8 to 926fbfe Compare April 28, 2025 23:20
@mergify mergify bot added ci-failure and removed ci-failure labels Apr 28, 2025
@booxter
Copy link
Contributor Author

booxter commented Apr 29, 2025

This run confirms that tox-current-env works now: https://github.com/instructlab/training/actions/runs/14719818289/job/41311408922

Now to decide if we want to see this solution. (If not, I'm all ears to hear about alternatives that would allow us to run smoke tests with [cuda] depenendencies that include the flash-attn library with broken build instructions.)

@booxter booxter marked this pull request as ready for review April 29, 2025 00:44
@mergify mergify bot added the one-approval label Apr 29, 2025
@mergify mergify bot removed the one-approval label Apr 29, 2025
@mergify mergify bot merged commit e902da1 into main Apr 29, 2025
10 of 12 checks passed
@mergify mergify bot deleted the ihrachyshka-tox-flash-attn-venv branch April 29, 2025 13:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

CI/CD Affects CI/CD configuration ci-failure

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants