
1D Matrix Multiplication example for HAT #276


Open
wants to merge 14 commits into base: code-reflection

Conversation

jjfumero

@jjfumero jjfumero commented Nov 19, 2024

Add new example for 1D Matrix Multiplication in HAT.

How to test?

## Compile 
java --add-modules jdk.incubator.code --enable-preview --source 24 bld

## Run with the OpenCL Backend
java @bldr/hatrun ffi-opencl matmul  

## Run with the CUDA Backend
java @bldr/hatrun ffi-ptx matmul 

Note that the generated kernel for OpenCL contains a race condition:

__kernel void matrixMultiplyKernel(
    __global KernelContext_t *kc, __global F32Array_t* matrixA, __global F32Array_t* matrixB, __global F32Array_t* matrixC, int size
){
    kc->x=get_global_id(0);                   //  << Shared struct across all threads to store the thread-id 
    if(kc->x<kc->maxX){
        for(int j = 0; j<size; j=j+1){
            float acc = (float)0;
            for(int k = 0; k<size; k=k+1){
                acc=acc+matrixA->array[(long)(kc->x*size+k)]*matrixB->array[(long)(k*size+j)];
            }
            matrixC->array[(long)(kc->x*size+j)]=acc;
        }
    }
    return;
}

After applying a patch provided by Gary Frost that resolves the race condition, the example works.

Patch:

diff --git a/hat/hat/src/main/java/hat/backend/c99codebuilders/C99HatKernelBuilder.java b/hat/hat/src/main/java/hat/backend/c99codebuilders/C99HatKernelBuilder.java
index ade90914d7e..2719fed31ed 100644
--- a/hat/hat/src/main/java/hat/backend/c99codebuilders/C99HatKernelBuilder.java
+++ b/hat/hat/src/main/java/hat/backend/c99codebuilders/C99HatKernelBuilder.java
@@ -26,7 +26,6 @@
 
 
 import hat.buffer.Buffer;
-import hat.buffer.KernelContext;
 import hat.callgraph.KernelCallGraph;
 import hat.callgraph.KernelEntrypoint;
 import hat.optools.FuncOpWrapper;
@@ -72,9 +71,13 @@ T typedefStructOrUnion(boolean isStruct, String name, Consumer<T> consumer) {
 
 
     public final T scope() {
-        return
-                identifier("kc").rarrow().identifier("x").equals().globalId().semicolon().nl();
-                //.identifier("kc").rarrow().identifier("maxX").equals().globalSize().semicolon().nl();
+
+        identifier("KernelContext_t").space().identifier("mine").semicolon().nl();
+        identifier("KernelContext_t").asterisk().space().identifier("kc").equals().ampersand().identifier("mine").semicolon().nl();
+        identifier("kc").rarrow().identifier("x").equals().globalId().semicolon().nl();
+        identifier("kc").rarrow().identifier("maxX").equals().identifier("global_kc").rarrow().identifier("maxX").semicolon().nl();
+        return self();
+
     }
 
     public abstract T globalPtrPrefix();
@@ -137,7 +140,7 @@ public T kernelEntrypoint(KernelEntrypoint kernelEntrypoint, Object[] args) {
                 }
             }
             parenNlIndented(_ -> {
-                        globalPtrPrefix().space().suffix_t("KernelContext").space().asterisk().identifier("kc");
+                        globalPtrPrefix().space().suffix_t("KernelContext").space().asterisk().identifier("global_kc");
                         list.stream().skip(1).forEach(info ->
                                 comma().space().type(info.javaType).space().varName(info.varOp)
                         );

Note: this PR does not include this patch, only the example and the runner extension to run the matrix multiplication.


Progress

  • Change must not contain extraneous whitespace

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/babylon.git pull/276/head:pull/276
$ git checkout pull/276

Update a local copy of the PR:
$ git checkout pull/276
$ git pull https://git.openjdk.org/babylon.git pull/276/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 276

View PR using the GUI difftool:
$ git pr show -t 276

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/babylon/pull/276.diff

Using Webrev

Link to Webrev Comment

@bridgekeeper bridgekeeper bot added the oca Needs verification of OCA signatory status label Nov 19, 2024
@bridgekeeper

bridgekeeper bot commented Nov 19, 2024

Hi @jjfumero, welcome to this OpenJDK project and thanks for contributing!

We do not recognize you as Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing /signed in a comment in this pull request.

If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user jjfumero" as summary for the issue.

If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing /covered in a comment in this pull request.

@openjdk

openjdk bot commented Nov 19, 2024

@jjfumero This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

1D Matrix Multiplication example for HAT

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the code-reflection branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@jjfumero
Author

/signed

@bridgekeeper bridgekeeper bot added the oca-verify Needs verification of OCA signatory status label Nov 19, 2024
@bridgekeeper

bridgekeeper bot commented Nov 19, 2024

Thank you! Please allow for up to two weeks to process your OCA, although it is usually done within one to two business days. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated!

@bridgekeeper

bridgekeeper bot commented Dec 17, 2024

@jjfumero This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@jjfumero
Author

Still waiting for the OCA approval.

@bridgekeeper

bridgekeeper bot commented Jan 15, 2025

@jjfumero This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@jjfumero
Author

Still waiting for the OCA approval

@SirYwell
Member

You might want to contact [email protected] regarding your OCA approval status. Not sure what's up there, but it shouldn't take that long.

@bridgekeeper bridgekeeper bot removed oca Needs verification of OCA signatory status oca-verify Needs verification of OCA signatory status labels Jan 17, 2025
@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review labels Jan 17, 2025
@mlbridge

mlbridge bot commented Jan 17, 2025

Webrevs

@bridgekeeper

bridgekeeper bot commented Feb 12, 2025

@jjfumero This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@jjfumero
Author

Pending for review

@bridgekeeper

bridgekeeper bot commented Mar 13, 2025

@jjfumero This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@SidneyLann

@jjfumero Hi, can Babylon be used to implement the operations needed for llama inference now? Does Babylon have all the basic ops for at least one platform, e.g. CUDA, yet?

@jjfumero
Author

jjfumero commented Apr 7, 2025

Hi @SidneyLann , I am not the core maintainer of Babylon. Probably Gary Frost can help you with your questions. From my view, I think you need to access shared memory and some synchronisation primitives to be able to perform reductions. I am not sure if this is implemented in HAT yet.

@grfrost
Collaborator

grfrost commented Apr 7, 2025 via email

@SidneyLann

@grfrost
Hi Gary
Are you developing many platforms (PTX, CUDA, SPIR-V, HIP, etc.) simultaneously? How about completing one platform (e.g. CUDA) first?

@openjdk

openjdk bot commented Apr 18, 2025

@jjfumero this pull request can not be integrated into code-reflection due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout dev/examples
git fetch https://git.openjdk.org/babylon.git code-reflection
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge code-reflection"
git push

@openjdk openjdk bot added merge-conflict Pull request has merge conflict with target branch and removed ready Pull request is ready to be integrated rfr Pull request is ready for review labels Apr 18, 2025
@bridgekeeper

bridgekeeper bot commented May 6, 2025

@jjfumero This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

@SidneyLann

openjdk/valhalla#1478 (comment)

@SidneyLann Valhalla is ready for experimental use, you can either build the project from source (build instructions can be found here) or you can grab a prebuilt package here. Please give it a try and report to us any issue you find, it would be a great help in the stabilization of Valhalla.

If you want to know whether Valhalla can be released to mainline soon then the answer is we don't know and we are trying our best. I believe an act of trying, reporting issues, and even contributing will help Valhalla to land sooner.

@grfrost
Hi Gary
Is Babylon waiting for Valhalla to be ready? Valhalla is ready for experimental use now; is Babylon as well? Thank you.

@grfrost
Collaborator

grfrost commented Jun 2, 2025

@SidneyLann

No, Babylon is not waiting for Valhalla. We don't use it at present, but it's possible that we might down the line.

@grfrost
Collaborator

grfrost commented Jun 2, 2025

@SidneyLann Sorry, I just saw your question above regarding 'why not finish the CUDA version first?'

The reason we have multiple backends, at various stages of development is because we want to ensure that HAT can be implemented on the widest possible set of backends (CUDA/HIP/OpenCL/SPIRV), so we are building 'reference' implementations of each.

I am attempting to provide multiple 'reference' backends (i.e. almost certainly not maximally performant :) ) to make sure this is plausible, and to ensure the programming model scales.

Our eventual hope is to persuade CUDA/OpenCL/HIP experts (maybe the vendor runtime owners themselves) to eventually help us build out more robust implementations.

OpenCL is probably the most thoroughly tested and complete backend, simply because I am most familiar with OpenCL.

@jjfumero
Author

Conflicts resolved. It works with the latest tip: 5bdc8ff

@openjdk openjdk bot added ready Pull request is ready to be integrated rfr Pull request is ready for review merge-conflict Pull request has merge conflict with target branch and removed merge-conflict Pull request has merge conflict with target branch ready Pull request is ready to be integrated labels Jun 11, 2025
@openjdk openjdk bot added ready Pull request is ready to be integrated and removed merge-conflict Pull request has merge conflict with target branch labels Jun 16, 2025
Labels
ready Pull request is ready to be integrated rfr Pull request is ready for review

4 participants