modeld: autodetect tinygrad backend #35405


Merged
10 commits merged into commaai:master on Jul 12, 2025

Conversation

andiradulescu
Contributor

@andiradulescu andiradulescu commented May 31, 2025

PR does the following:

  • set tinygrad backend from the compiled model
  • fix for METAL brew issue on macOS
  • use llvm@19 instead of llvm, since the current latest llvm (20) isn't detected as a device by tinygrad and also installs Python 3.13 as a dependency (tinygrad currently uses llvm@19)

The reason I don't detect the devices again at runtime is that Device.get_available_devices() takes between 0.9 and 1.4 seconds (on my M1 Mac), which would delay the start of modeld and dmonitoringmodeld.
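The idea of pinning the backend from the compiled model instead of probing devices at startup can be sketched as follows. This is an illustrative sketch, not the PR's actual code: `KNOWN_BACKENDS`, `backend_env_from_device`, and `select_backend` are invented names, and the env-var convention (setting e.g. `METAL=1` before tinygrad resolves `Device.DEFAULT`) follows tinygrad's usual backend-selection scheme.

```python
import os

# tinygrad picks its backend when Device.DEFAULT is first resolved; setting an
# env var such as CPU=1 or METAL=1 pins the choice without probing devices,
# avoiding the ~1 s cost of Device.get_available_devices() at startup.
KNOWN_BACKENDS = {"CPU", "GPU", "METAL", "LLVM", "QCOM"}  # restricted set, illustrative

def backend_env_from_device(device: str) -> str:
    """Map a device string like 'METAL' or 'GPU:0' to its env-var name."""
    name = device.split(":")[0].upper()  # strip any ':<index>' suffix
    if name not in KNOWN_BACKENDS:
        raise ValueError(f"unexpected tinygrad device: {device}")
    return name

def select_backend(device: str) -> None:
    """Pin the tinygrad backend to the device the model was compiled for."""
    os.environ[backend_env_from_device(device)] = "1"
```

Restricting the mapping to a known set of backend names (rather than exporting an arbitrary string into the environment) keeps a corrupted pickle from silently setting a meaningless variable.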

@adeebshihadeh
Contributor

trigger-jenkins

@andiradulescu andiradulescu mentioned this pull request Jun 5, 2025

This PR has had no activity for 9 days. It will be automatically closed in 2 days if there is no activity.

@github-actions github-actions bot added the stale label Jun 11, 2025

This PR has been automatically closed due to inactivity. Feel free to re-open once activity resumes.

@github-actions github-actions bot closed this Jun 13, 2025
@adeebshihadeh adeebshihadeh reopened this Jun 13, 2025
@github-actions github-actions bot removed the stale label Jun 14, 2025

This PR has had no activity for 9 days. It will be automatically closed in 2 days if there is no activity.

@github-actions github-actions bot added the stale label Jun 23, 2025
est2mzd added a commit to est2mzd/openpilot that referenced this pull request Jun 24, 2025

@adeebshihadeh adeebshihadeh left a comment


Let's restrict the tinygrad backend types

andiradulescu and others added 2 commits June 25, 2025 21:19
@andiradulescu andiradulescu force-pushed the modeld-autodetect branch 2 times, most recently from d4c46ba to 17ea215 on June 25, 2025 21:06
Comment on lines +84 to +87
if not TICI:
backend = backend_from_jit(self.model_run)
os.environ[backend] = '1'
cloudlog.warning(f"dmonitoringmodeld backend set to {backend}")


this feels like it should be an upstream tinygrad PR

Contributor Author

@andiradulescu andiradulescu Jun 26, 2025


More like the backend should be exposed as a parameter, e.g. self.model_run.backend?

Or the model/JIT should run on, or fall back to, the device it was compiled for?

opened a tinygrad issue: tinygrad/tinygrad#10989
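The first suggestion above can be illustrated with a minimal sketch: a thin wrapper that surfaces the compile-time device as a `.backend` attribute, so callers don't have to dig it out of the JIT internals. All names here (`PickledModel`, `.backend`) are invented for the example and are not tinygrad API.

```python
# Hypothetical wrapper around a pickled/jitted model that records the device
# it was compiled for and exposes it directly.
class PickledModel:
    def __init__(self, fn, device: str):
        self._fn = fn
        self.backend = device.split(":")[0]  # e.g. "GPU:0" -> "GPU"

    def __call__(self, *args):
        # Delegate to the wrapped callable (the jitted model in practice).
        return self._fn(*args)

model = PickledModel(lambda x: x * 2, "GPU:0")
```

With something like this, modeld could read `model.backend` directly instead of reaching into the JIT's buffers to infer the device.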


github-actions bot commented Jul 9, 2025

This PR has had no activity for 9 days. It will be automatically closed in 2 days if there is no activity.

@github-actions github-actions bot added the stale label Jul 9, 2025
@adeebshihadeh
Contributor

Should be good to go after rebasing

@adeebshihadeh
Contributor

trigger-jenkins

@github-actions github-actions bot removed the stale label Jul 10, 2025
@adeebshihadeh adeebshihadeh merged commit ce92fd1 into commaai:master Jul 12, 2025
16 checks passed
@sshane
Contributor

sshane commented Jul 12, 2025

Breaks scons compilation for me:

PYTHONPATH=":/home/batman/openpilot/tinygrad_repo" GPU=1 python3 /home/batman/openpilot/tinygrad_repo/examples/openpilot/compile3.py /home/batman/openpilot/selfdrive/modeld/models/dmonitoring_model.onnx /home/batman/openpilot/selfdrive/modeld/models/dmonitoring_model_tinygrad.pkl
Traceback (most recent call last):
  File "/home/batman/openpilot/tinygrad_repo/examples/openpilot/compile3.py", line 136, in <module>
    test_val = compile(onnx_file) if not getenv("RUN") else None
               ^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad_repo/examples/openpilot/compile3.py", line 22, in compile
    run_onnx = OnnxRunner(onnx_model)
               ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad_repo/extra/onnx.py", line 143, in __init__
    self.graph_nodes = tuple(OnnxNode(num, n.op_type, tuple(n.input), tuple(n.output), {x.name:attribute_parse(x) for x in n.attribute})
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad_repo/extra/onnx.py", line 143, in <genexpr>
    self.graph_nodes = tuple(OnnxNode(num, n.op_type, tuple(n.input), tuple(n.output), {x.name:attribute_parse(x) for x in n.attribute})
                                                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad_repo/extra/onnx.py", line 143, in <dictcomp>
    self.graph_nodes = tuple(OnnxNode(num, n.op_type, tuple(n.input), tuple(n.output), {x.name:attribute_parse(x) for x in n.attribute})
                                                                                               ^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad_repo/extra/onnx.py", line 47, in attribute_parse
    return attribute_types[onnx_attribute.type](onnx_attribute)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad_repo/extra/onnx.py", line 20, in <lambda>
    4: lambda a: buffer_parse(a.t),
                 ^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad_repo/extra/onnx.py", line 72, in buffer_parse
    ret = Tensor(ret.item(), dtype=dtype).reshape(shape)
                 ^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/tensor.py", line 4385, in _wrapper
    ret = fn(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/tensor.py", line 328, in item
    return self.data()[(0,) * len(self.shape)]
           ^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/tensor.py", line 4360, in _wrapper
    if _METADATA.get() is not None: return fn(*args, **kwargs)
                                           ^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/tensor.py", line 316, in data
    return self._buffer().as_typed_buffer(self.shape)
           ^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/tensor.py", line 4360, in _wrapper
    if _METADATA.get() is not None: return fn(*args, **kwargs)
                                           ^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/tensor.py", line 302, in _buffer
    def _buffer(self) -> Buffer: return cast(Buffer, self.cast(self.dtype.base).contiguous().to("CPU").realize().uop.base.buffer)
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/tensor.py", line 4360, in _wrapper
    if _METADATA.get() is not None: return fn(*args, **kwargs)
                                           ^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/tensor.py", line 269, in realize
    run_schedule(*self.schedule_with_vars(*lst), do_update_stats=do_update_stats)
  File "/home/batman/openpilot/tinygrad/engine/realize.py", line 192, in run_schedule
    for si, ei in lower_schedule(schedule):
  File "/home/batman/openpilot/tinygrad/engine/realize.py", line 185, in lower_schedule
    raise e
  File "/home/batman/openpilot/tinygrad/engine/realize.py", line 179, in lower_schedule
    try: yield (si, lower_schedule_item(si))
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/engine/realize.py", line 174, in lower_schedule_item
    return ExecItem(*cast(tuple[Runner,list], si_lowerer.rewrite(si.ast, si.bufs)), si.metadata, si.fixedvars)
                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/uop/ops.py", line 731, in rewrite
    if (ret:=match(uop, ctx)) is not None and ret is not uop: return ret
             ^^^^^^^^^^^^^^^
  File "<string>", line 3, in compiled_match
  File "/home/batman/openpilot/tinygrad/engine/realize.py", line 167, in <lambda>
    (UPat(Ops.SINK, name="sink"), lambda ctx,sink: (runner:=get_runner(ctx[0].device, sink), [ctx[x] for x in runner.p.globals])),
                                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/engine/realize.py", line 135, in get_runner
    method_cache[ckey] = method_cache[bkey] = ret = CompiledRunner(replace(prg, device=device))
                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/engine/realize.py", line 66, in __init__
    self.lib:bytes = precompiled if precompiled is not None else Device[p.device].compiler.compile_cached(p.src)
                                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/device.py", line 341, in compile_cached
    lib = self.compile(src)
          ^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/runtime/ops_gpu.py", line 27, in compile
    raise CompileError(f"OpenCL Compile Error\n\n{mstr.value.decode()}")
tinygrad.device.CompileError: OpenCL Compile Error

Compilation started
1:1:26: warning: unsupported OpenCL extension 'cl_khr_fp16' - ignoring
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
                         ^
1:3:8: error: declaring variable of type 'half' is not allowed
  half val0 = *(data1+0);
       ^
Compilation failed

scons: *** [selfdrive/modeld/models/dmonitoring_model_tinygrad.pkl] Error 1
scons: building terminated because of errors.
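The error above is an OpenCL device (here, the Intel CPU runtime) that does not support the cl_khr_fp16 extension, so a kernel declaring `half` variables fails to compile. A defensive check could inspect the device's extension list before selecting an fp16-compiling backend; the sketch below parses a space-separated extension string of the kind `clGetDeviceInfo(..., CL_DEVICE_EXTENSIONS, ...)` returns, with made-up example strings.

```python
# Check whether an OpenCL device advertises half-precision (fp16) support.
def supports_fp16(extensions: str) -> bool:
    """extensions: space-separated list as reported by CL_DEVICE_EXTENSIONS."""
    return "cl_khr_fp16" in extensions.split()

# Example extension strings (illustrative, not real device output):
cpu_exts = "cl_khr_icd cl_khr_global_int32_base_atomics"  # CPU runtime, no fp16
gpu_exts = "cl_khr_fp16 cl_khr_icd cl_khr_fp64"           # fp16-capable device
```

A check like this would let backend autodetection skip GPU=1 on devices that would fail exactly as in the log above.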

sshane added a commit that referenced this pull request Jul 12, 2025
sshane added a commit that referenced this pull request Jul 12, 2025
Revert "modeld: autodetect tinygrad backend (#35405)"

This reverts commit ce92fd1.
@andiradulescu
Contributor Author

@sshane can you please paste the output of:

clinfo -l
lsb_release -a
# if you don't have clinfo:
sudo apt install clinfo

I will make the next PR always fall back to CPU when the detected backend fails for any reason.
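The planned fallback can be sketched as follows; `set_backend_with_fallback` and `try_backend` are hypothetical names (in practice the probe would realize a tiny tensor on the candidate backend), not code from any PR.

```python
import os

def set_backend_with_fallback(detected: str, try_backend) -> str:
    """Pin `detected` if it works, otherwise fall back to the CPU backend.

    try_backend: callable that raises if the backend is unusable
    (e.g. a probe that realizes a small tensor on that device).
    """
    try:
        try_backend(detected)
        chosen = detected
    except Exception:
        chosen = "CPU"  # the CPU backend is assumed to always be available
    os.environ[chosen] = "1"
    return chosen
```

With this shape, an OpenCL compile error like the one reported above would be caught by the probe and the process would continue on CPU instead of aborting the build.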

@sshane
Contributor

sshane commented Jul 12, 2025

batman@workstation-shane:~$ clinfo -l
Platform #0: Intel(R) CPU Runtime for OpenCL(TM) Applications
 `-- Device #0: AMD Ryzen Threadripper PRO 3955WX 16-Cores     
Platform #1: NVIDIA CUDA
 `-- Device #0: NVIDIA GeForce GTX 1080 Ti

batman@workstation-shane:~$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 24.04.2 LTS
Release:	24.04
Codename:	noble

@andiradulescu
Contributor Author

andiradulescu commented Jul 12, 2025

@sshane can you also paste the output of:

nvidia-smi

I’m interested in the CUDA info/version from that last one. Thanks 🙏

@sshane
Contributor

sshane commented Jul 13, 2025

batman@workstation-shane:~$ nvidia-smi
Sat Jul 12 21:26:33 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti     On  |   00000000:61:00.0  On |                  N/A |
| 40%   65C    P0             63W /  250W |    3367MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
