
1D Matrix Multiplication example for HAT #276


Open
wants to merge 14 commits into base: code-reflection

Conversation

jjfumero

@jjfumero jjfumero commented Nov 19, 2024

Add new example for 1D Matrix Multiplication in HAT.

How to test?

## Compile 
java --add-modules jdk.incubator.code --enable-preview --source 24 bld

## Run with the OpenCL Backend
java @bldr/hatrun ffi-opencl matmul  

## Run with the CUDA Backend
java @bldr/hatrun ffi-ptx matmul 

Note that the generated kernel for OpenCL contains a race condition:

__kernel void matrixMultiplyKernel(
    __global KernelContext_t *kc, __global F32Array_t* matrixA, __global F32Array_t* matrixB, __global F32Array_t* matrixC, int size
){
    kc->x=get_global_id(0);                   //  << Shared struct across all threads to store the thread-id 
    if(kc->x<kc->maxX){
        for(int j = 0; j<size; j=j+1){
            float acc = (float)0;
            for(int k = 0; k<size; k=k+1){
                acc=acc+matrixA->array[(long)(kc->x*size+k)]*matrixB->array[(long)(k*size+j)];
            }
            matrixC->array[(long)(kc->x*size+j)]=acc;
        }
    }
    return;
}

After applying a patch provided by Gary Frost that resolves the race condition, the example works.

Patch:

diff --git a/hat/hat/src/main/java/hat/backend/c99codebuilders/C99HatKernelBuilder.java b/hat/hat/src/main/java/hat/backend/c99codebuilders/C99HatKernelBuilder.java
index ade90914d7e..2719fed31ed 100644
--- a/hat/hat/src/main/java/hat/backend/c99codebuilders/C99HatKernelBuilder.java
+++ b/hat/hat/src/main/java/hat/backend/c99codebuilders/C99HatKernelBuilder.java
@@ -26,7 +26,6 @@
 
 
 import hat.buffer.Buffer;
-import hat.buffer.KernelContext;
 import hat.callgraph.KernelCallGraph;
 import hat.callgraph.KernelEntrypoint;
 import hat.optools.FuncOpWrapper;
@@ -72,9 +71,13 @@ T typedefStructOrUnion(boolean isStruct, String name, Consumer<T> consumer) {
 
 
     public final T scope() {
-        return
-                identifier("kc").rarrow().identifier("x").equals().globalId().semicolon().nl();
-                //.identifier("kc").rarrow().identifier("maxX").equals().globalSize().semicolon().nl();
+
+        identifier("KernelContext_t").space().identifier("mine").semicolon().nl();
+        identifier("KernelContext_t").asterisk().space().identifier("kc").equals().ampersand().identifier("mine").semicolon().nl();
+        identifier("kc").rarrow().identifier("x").equals().globalId().semicolon().nl();
+        identifier("kc").rarrow().identifier("maxX").equals().identifier("global_kc").rarrow().identifier("maxX").semicolon().nl();
+        return self();
+
     }
 
     public abstract T globalPtrPrefix();
@@ -137,7 +140,7 @@ public T kernelEntrypoint(KernelEntrypoint kernelEntrypoint, Object[] args) {
                 }
             }
             parenNlIndented(_ -> {
-                        globalPtrPrefix().space().suffix_t("KernelContext").space().asterisk().identifier("kc");
+                        globalPtrPrefix().space().suffix_t("KernelContext").space().asterisk().identifier("global_kc");
                         list.stream().skip(1).forEach(info ->
                                 comma().space().type(info.javaType).space().varName(info.varOp)
                         );

Note: this PR does not include this patch, only the example and the runner extension to run the matrix multiplication.


Progress

  • Change must not contain extraneous whitespace

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/babylon.git pull/276/head:pull/276
$ git checkout pull/276

Update a local copy of the PR:
$ git checkout pull/276
$ git pull https://git.openjdk.org/babylon.git pull/276/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 276

View PR using the GUI difftool:
$ git pr show -t 276

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/babylon/pull/276.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper bridgekeeper bot added the oca Needs verification of OCA signatory status label Nov 19, 2024
@bridgekeeper

bridgekeeper bot commented Nov 19, 2024

Hi @jjfumero, welcome to this OpenJDK project and thanks for contributing!

We do not recognize you as Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing /signed in a comment in this pull request.

If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user jjfumero" as summary for the issue.

If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing /covered in a comment in this pull request.

@openjdk

openjdk bot commented Nov 19, 2024

@jjfumero This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

1D Matrix Multiplication example for HAT

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the code-reflection branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@jjfumero
Author

/signed

@bridgekeeper bridgekeeper bot added the oca-verify Needs verification of OCA signatory status label Nov 19, 2024
@bridgekeeper

bridgekeeper bot commented Nov 19, 2024

Thank you! Please allow for up to two weeks to process your OCA, although it is usually done within one to two business days. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated!

@bridgekeeper

bridgekeeper bot commented Dec 17, 2024

@jjfumero This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@jjfumero
Author

Still waiting for the OCA approval.

@bridgekeeper

bridgekeeper bot commented Jan 15, 2025

@jjfumero This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@jjfumero
Author

Still waiting for the OCA approval

@SirYwell
Member

You might want to contact [email protected] regarding your OCA approval status. Not sure what's up there, but it shouldn't take that long.

@bridgekeeper bridgekeeper bot removed oca Needs verification of OCA signatory status oca-verify Needs verification of OCA signatory status labels Jan 17, 2025
@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jan 17, 2025
@mlbridge

mlbridge bot commented Jan 17, 2025

Webrevs

@bridgekeeper

bridgekeeper bot commented Feb 12, 2025

@jjfumero This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@jjfumero
Author

Pending for review

@bridgekeeper

bridgekeeper bot commented Mar 13, 2025

@jjfumero This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@SidneyLann

@jjfumero Hi, can Babylon be used to implement the operations needed for llama inference now? Does Babylon have all the basic ops for at least one platform, e.g. CUDA, yet?

@jjfumero
Author

jjfumero commented Apr 7, 2025

Hi @SidneyLann , I am not the core maintainer of Babylon. Probably Gary Frost can help you with your questions. From my view, I think you need to access shared memory and some synchronisation primitives to be able to perform reductions. I am not sure if this is implemented in HAT yet.

@grfrost
Collaborator

grfrost commented Apr 7, 2025 via email

@SidneyLann

@grfrost
Hi Gary
Are you developing many platforms (PTX, CUDA, SPIR-V, HIP, etc.) simultaneously? How about completing one platform (e.g. CUDA) first?

@openjdk

openjdk bot commented Apr 18, 2025

@jjfumero this pull request can not be integrated into code-reflection due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout dev/examples
git fetch https://git.openjdk.org/babylon.git code-reflection
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge code-reflection"
git push

@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 18, 2025
@bridgekeeper

bridgekeeper bot commented May 6, 2025

@jjfumero This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@SidneyLann

openjdk/valhalla#1478 (comment)

@SidneyLann Valhalla is ready for experimental use, you can either build the project from source (build instructions can be found here) or you can grab a prebuilt package here. Please give it a try and report to us any issue you find, it would be a great help in the stabilization of Valhalla.

If you want to know whether Valhalla can be released to mainline soon then the answer is we don't know and we are trying our best. I believe an act of trying, reporting issues, and even contributing will help Valhalla to land sooner.

@grfrost
Hi Gary
Is Babylon waiting for Valhalla to be ready? Valhalla is ready for experimental use now; is Babylon as well? Thank you.

@grfrost
Collaborator

grfrost commented Jun 2, 2025

@SidneyLann

No, Babylon is not waiting for Valhalla. We don't use it at present, but it's possible that we might down the line.

@grfrost
Collaborator

grfrost commented Jun 2, 2025

@SidneyLann Sorry, I just saw your question above regarding 'why not finish the CUDA version first?'

The reason we have multiple backends, at various stages of development is because we want to ensure that HAT can be implemented on the widest possible set of backends (CUDA/HIP/OpenCL/SPIRV), so we are building 'reference' implementations of each.

I am attempting to provide multiple 'reference' backends (i.e. almost certainly not maximally performant :) ) to make sure this is plausible, and to ensure the programming model scales.

Our eventual hope is to persuade CUDA/OpenCL/HIP experts (maybe the vendor runtime owners themselves) to eventually help us build out more robust implementations.

OpenCL is probably the most thoroughly tested and complete backend, simply because I am most familiar with OpenCL.

@jjfumero
Author

Conflicts resolved. It works with the latest tip: 5bdc8ff

@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review merge-conflict Pull request has merge conflict with target branch and removed merge-conflict Pull request has merge conflict with target branch ready Pull request is ready to be integrated labels Jun 11, 2025
@openjdk openjdk bot added ready Pull request is ready to be integrated and removed merge-conflict Pull request has merge conflict with target branch labels Jun 16, 2025
Labels
ready Pull request is ready to be integrated rfr Pull request is ready for review

4 participants