
Add Falcon H1 model support #1616

Open
wants to merge 1 commit into base: main

Conversation

HamzaYousLM

This pull request adds support for the Falcon H1 model to the GPTQModel repository. The following changes have been made:

  • Added the falcon_h1 definition to the gptqmodel/models/definitions folder.
  • Imported the falcon_h1 model in gptqmodel/models/definitions/__init__.py.
  • Included falcon_h1 in the MODEL_MAP within gptqmodel/models/auto.py.

These updates enable the integration and usage of the Falcon H1 model within the framework.
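
For context, a model definition in GPTQModel is a small class that declares which submodules of each decoder layer are quantized and in what order. The sketch below shows the general shape such a falcon_h1.py definition takes; the class name FalconH1GPTQ and every attribute except the feed_forward.* entries (which are visible in this PR's diff) are assumptions and would need to match the actual Falcon H1 modeling code.

    # gptqmodel/models/definitions/falcon_h1.py -- sketch only; the class name and
    # layers_node value are assumptions, the feed_forward.* entries come from this PR's diff
    from ..base import BaseGPTQModel

    class FalconH1GPTQ(BaseGPTQModel):
        # path to the stack of decoder layers inside the HF model (assumed llama-style)
        layers_node = "model.layers"
        # per-layer module groups, quantized in order
        layer_modules = [
            ["feed_forward.gate_proj"],
            ["feed_forward.up_proj"],
            ["feed_forward.down_proj"],
        ]

Registration is then an entry in MODEL_MAP in gptqmodel/models/auto.py mapping the model type string to this class (the exact key string, e.g. "falcon_h1", is assumed here).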

@Qubitium
Collaborator

Qubitium commented May 21, 2025

@HamzaYousLM Thanks for the PR! Just a few quick questions.

  1. Is Falcon H1 released to the public? I want to check the modeling.py file to see if the correct order of layer modules is set.
  2. Related to 1: right now gate, up, and down are all separate. Usually gate and up are grouped together as [gate, up], and the down calculation depends on the result of both gate and up, while gate and up do not depend on each other.

Asking because, if the down calculation depends on the result of gate + up, and gate and up are not dependent on each other in the modeling forward, we can make quantization faster and more accurate by grouping gate and up together. This depends on the forward code of the modeling.py file for Falcon H1.

For most transformer models:

[ gate, up ],
[ down ]

Comment on lines +28 to +31
["feed_forward.gate_proj"],
["feed_forward.up_proj"],
["feed_forward.down_proj"],
]
Collaborator

@Qubitium May 21, 2025


@HamzaYousLM If Falcon H1 is like Falcon E, which is llama based, the layer modules should be the following for faster quantization. Please check.

layer_modules = [
        ["feed_forward.gate_proj", "feed_forward.up_proj"],
        ["feed_forward.down_proj"],
    ]
