modeld: autodetect tinygrad backend #35405


Merged
10 commits merged into commaai:master on Jul 12, 2025

Conversation

andiradulescu
Contributor

@andiradulescu andiradulescu commented May 31, 2025

PR does the following:

  • set tinygrad backend from the compiled model
  • fix for METAL brew issue on macOS
  • use llvm@19 instead of llvm, since the current latest llvm (20) isn't detected as a device by tinygrad and also installs Python 3.13 as a dependency (tinygrad currently uses llvm@19)

The reason I don't detect the devices again at runtime is that Device.get_available_devices() takes between 0.9 and 1.4 seconds (on my M1 Mac), which would delay the start of modeld and dmonitoringmodeld.
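The idea of pinning the backend from the compiled model instead of probing devices at startup can be sketched as follows. This is an illustrative sketch, not the PR's actual code: `KNOWN_BACKENDS`, `backend_env_from_device`, and `select_backend` are invented names, and the env-var convention (setting e.g. `METAL=1` before tinygrad resolves `Device.DEFAULT`) follows tinygrad's usual backend-selection scheme.

```python
import os

# tinygrad picks its backend when Device.DEFAULT is first resolved; setting an
# env var such as CPU=1 or METAL=1 pins the choice without probing devices,
# avoiding the ~1 s cost of Device.get_available_devices() at startup.
KNOWN_BACKENDS = {"CPU", "GPU", "METAL", "LLVM", "QCOM"}  # restricted set, illustrative

def backend_env_from_device(device: str) -> str:
    """Map a device string like 'METAL' or 'GPU:0' to its env-var name."""
    name = device.split(":")[0].upper()  # strip any ':<index>' suffix
    if name not in KNOWN_BACKENDS:
        raise ValueError(f"unexpected tinygrad device: {device}")
    return name

def select_backend(device: str) -> None:
    """Pin the tinygrad backend to the device the model was compiled for."""
    os.environ[backend_env_from_device(device)] = "1"
```

Restricting the mapping to a known set of backend names (rather than exporting an arbitrary string into the environment) keeps a corrupted pickle from silently setting a meaningless variable.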

@adeebshihadeh
Contributor

trigger-jenkins

@andiradulescu andiradulescu mentioned this pull request Jun 5, 2025

This PR has had no activity for 9 days. It will be automatically closed in 2 days if there is no activity.

@github-actions github-actions bot added the stale label Jun 11, 2025

This PR has been automatically closed due to inactivity. Feel free to re-open once activity resumes.

@github-actions github-actions bot closed this Jun 13, 2025
@adeebshihadeh adeebshihadeh reopened this Jun 13, 2025
@github-actions github-actions bot removed the stale label Jun 14, 2025

This PR has had no activity for 9 days. It will be automatically closed in 2 days if there is no activity.

@github-actions github-actions bot added the stale label Jun 23, 2025
est2mzd added a commit to est2mzd/openpilot that referenced this pull request Jun 24, 2025

@adeebshihadeh adeebshihadeh left a comment


Let's restrict the tinygrad backend types

andiradulescu and others added 2 commits June 25, 2025 21:19
@andiradulescu andiradulescu force-pushed the modeld-autodetect branch 2 times, most recently from d4c46ba to 17ea215 on June 25, 2025 21:06
Comment on lines +84 to +87
if not TICI:
backend = backend_from_jit(self.model_run)
os.environ[backend] = '1'
cloudlog.warning(f"dmonitoringmodeld backend set to {backend}")


this feels like it should be an upstream tinygrad PR

Contributor Author

@andiradulescu andiradulescu Jun 26, 2025


More like the backend should be exposed as a parameter, e.g. self.model_run.backend?

Or the model/JIT should run on, or fall back to, the device it was compiled for?

opened a tinygrad issue: tinygrad/tinygrad#10989
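The first suggestion above can be illustrated with a minimal sketch: a thin wrapper that surfaces the compile-time device as a `.backend` attribute, so callers don't have to dig it out of the JIT internals. All names here (`PickledModel`, `.backend`) are invented for the example and are not tinygrad API.

```python
# Hypothetical wrapper around a pickled/jitted model that records the device
# it was compiled for and exposes it directly.
class PickledModel:
    def __init__(self, fn, device: str):
        self._fn = fn
        self.backend = device.split(":")[0]  # e.g. "GPU:0" -> "GPU"

    def __call__(self, *args):
        # Delegate to the wrapped callable (the jitted model in practice).
        return self._fn(*args)

model = PickledModel(lambda x: x * 2, "GPU:0")
```

With something like this, modeld could read `model.backend` directly instead of reaching into the JIT's buffers to infer the device.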


github-actions bot commented Jul 9, 2025

This PR has had no activity for 9 days. It will be automatically closed in 2 days if there is no activity.

@github-actions github-actions bot added the stale label Jul 9, 2025
@adeebshihadeh
Contributor

Should be good to go after rebasing

@adeebshihadeh
Contributor

trigger-jenkins

@github-actions github-actions bot removed the stale label Jul 10, 2025
@adeebshihadeh adeebshihadeh merged commit ce92fd1 into commaai:master Jul 12, 2025
16 checks passed
@sshane
Contributor

sshane commented Jul 12, 2025

Breaks scons compilation for me:

PYTHONPATH=":/home/batman/openpilot/tinygrad_repo" GPU=1 python3 /home/batman/openpilot/tinygrad_repo/examples/openpilot/compile3.py /home/batman/openpilot/selfdrive/modeld/models/dmonitoring_model.onnx /home/batman/openpilot/selfdrive/modeld/models/dmonitoring_model_tinygrad.pkl
Traceback (most recent call last):
  File "/home/batman/openpilot/tinygrad_repo/examples/openpilot/compile3.py", line 136, in <module>
    test_val = compile(onnx_file) if not getenv("RUN") else None
               ^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad_repo/examples/openpilot/compile3.py", line 22, in compile
    run_onnx = OnnxRunner(onnx_model)
               ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad_repo/extra/onnx.py", line 143, in __init__
    self.graph_nodes = tuple(OnnxNode(num, n.op_type, tuple(n.input), tuple(n.output), {x.name:attribute_parse(x) for x in n.attribute})
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad_repo/extra/onnx.py", line 143, in <genexpr>
    self.graph_nodes = tuple(OnnxNode(num, n.op_type, tuple(n.input), tuple(n.output), {x.name:attribute_parse(x) for x in n.attribute})
                                                                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad_repo/extra/onnx.py", line 143, in <dictcomp>
    self.graph_nodes = tuple(OnnxNode(num, n.op_type, tuple(n.input), tuple(n.output), {x.name:attribute_parse(x) for x in n.attribute})
                                                                                               ^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad_repo/extra/onnx.py", line 47, in attribute_parse
    return attribute_types[onnx_attribute.type](onnx_attribute)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad_repo/extra/onnx.py", line 20, in <lambda>
    4: lambda a: buffer_parse(a.t),
                 ^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad_repo/extra/onnx.py", line 72, in buffer_parse
    ret = Tensor(ret.item(), dtype=dtype).reshape(shape)
                 ^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/tensor.py", line 4385, in _wrapper
    ret = fn(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/tensor.py", line 328, in item
    return self.data()[(0,) * len(self.shape)]
           ^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/tensor.py", line 4360, in _wrapper
    if _METADATA.get() is not None: return fn(*args, **kwargs)
                                           ^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/tensor.py", line 316, in data
    return self._buffer().as_typed_buffer(self.shape)
           ^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/tensor.py", line 4360, in _wrapper
    if _METADATA.get() is not None: return fn(*args, **kwargs)
                                           ^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/tensor.py", line 302, in _buffer
    def _buffer(self) -> Buffer: return cast(Buffer, self.cast(self.dtype.base).contiguous().to("CPU").realize().uop.base.buffer)
                                                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/tensor.py", line 4360, in _wrapper
    if _METADATA.get() is not None: return fn(*args, **kwargs)
                                           ^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/tensor.py", line 269, in realize
    run_schedule(*self.schedule_with_vars(*lst), do_update_stats=do_update_stats)
  File "/home/batman/openpilot/tinygrad/engine/realize.py", line 192, in run_schedule
    for si, ei in lower_schedule(schedule):
  File "/home/batman/openpilot/tinygrad/engine/realize.py", line 185, in lower_schedule
    raise e
  File "/home/batman/openpilot/tinygrad/engine/realize.py", line 179, in lower_schedule
    try: yield (si, lower_schedule_item(si))
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/engine/realize.py", line 174, in lower_schedule_item
    return ExecItem(*cast(tuple[Runner,list], si_lowerer.rewrite(si.ast, si.bufs)), si.metadata, si.fixedvars)
                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/uop/ops.py", line 731, in rewrite
    if (ret:=match(uop, ctx)) is not None and ret is not uop: return ret
             ^^^^^^^^^^^^^^^
  File "<string>", line 3, in compiled_match
  File "/home/batman/openpilot/tinygrad/engine/realize.py", line 167, in <lambda>
    (UPat(Ops.SINK, name="sink"), lambda ctx,sink: (runner:=get_runner(ctx[0].device, sink), [ctx[x] for x in runner.p.globals])),
                                                            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/engine/realize.py", line 135, in get_runner
    method_cache[ckey] = method_cache[bkey] = ret = CompiledRunner(replace(prg, device=device))
                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/engine/realize.py", line 66, in __init__
    self.lib:bytes = precompiled if precompiled is not None else Device[p.device].compiler.compile_cached(p.src)
                                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/device.py", line 341, in compile_cached
    lib = self.compile(src)
          ^^^^^^^^^^^^^^^^^
  File "/home/batman/openpilot/tinygrad/runtime/ops_gpu.py", line 27, in compile
    raise CompileError(f"OpenCL Compile Error\n\n{mstr.value.decode()}")
tinygrad.device.CompileError: OpenCL Compile Error

Compilation started
1:1:26: warning: unsupported OpenCL extension 'cl_khr_fp16' - ignoring
#pragma OPENCL EXTENSION cl_khr_fp16 : enable
                         ^
1:3:8: error: declaring variable of type 'half' is not allowed
  half val0 = *(data1+0);
       ^
Compilation failed

scons: *** [selfdrive/modeld/models/dmonitoring_model_tinygrad.pkl] Error 1
scons: building terminated because of errors.
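The error above is an OpenCL device (here, the Intel CPU runtime) that does not support the cl_khr_fp16 extension, so a kernel declaring `half` variables fails to compile. A defensive check could inspect the device's extension list before selecting an fp16-compiling backend; the sketch below parses a space-separated extension string of the kind `clGetDeviceInfo(..., CL_DEVICE_EXTENSIONS, ...)` returns, with made-up example strings.

```python
# Check whether an OpenCL device advertises half-precision (fp16) support.
def supports_fp16(extensions: str) -> bool:
    """extensions: space-separated list as reported by CL_DEVICE_EXTENSIONS."""
    return "cl_khr_fp16" in extensions.split()

# Example extension strings (illustrative, not real device output):
cpu_exts = "cl_khr_icd cl_khr_global_int32_base_atomics"  # CPU runtime, no fp16
gpu_exts = "cl_khr_fp16 cl_khr_icd cl_khr_fp64"           # fp16-capable device
```

A check like this would let backend autodetection skip GPU=1 on devices that would fail exactly as in the log above.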

sshane added a commit that referenced this pull request Jul 12, 2025
sshane added a commit that referenced this pull request Jul 12, 2025
Revert "modeld: autodetect tinygrad backend (#35405)"

This reverts commit ce92fd1.
@andiradulescu
Contributor Author

@sshane can you please paste the output of:

clinfo -l
lsb_release -a
# if you don't have clinfo:
sudo apt install clinfo

I will make the next PR always fall back to CPU when the detected backend fails for any reason.
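The planned fallback can be sketched as follows; `set_backend_with_fallback` and `try_backend` are hypothetical names (in practice the probe would realize a tiny tensor on the candidate backend), not code from any PR.

```python
import os

def set_backend_with_fallback(detected: str, try_backend) -> str:
    """Pin `detected` if it works, otherwise fall back to the CPU backend.

    try_backend: callable that raises if the backend is unusable
    (e.g. a probe that realizes a small tensor on that device).
    """
    try:
        try_backend(detected)
        chosen = detected
    except Exception:
        chosen = "CPU"  # the CPU backend is assumed to always be available
    os.environ[chosen] = "1"
    return chosen
```

With this shape, an OpenCL compile error like the one reported above would be caught by the probe and the process would continue on CPU instead of aborting the build.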

@sshane
Contributor

sshane commented Jul 12, 2025

batman@workstation-shane:~$ clinfo -l
Platform #0: Intel(R) CPU Runtime for OpenCL(TM) Applications
 `-- Device #0: AMD Ryzen Threadripper PRO 3955WX 16-Cores     
Platform #1: NVIDIA CUDA
 `-- Device #0: NVIDIA GeForce GTX 1080 Ti

batman@workstation-shane:~$ lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 24.04.2 LTS
Release:	24.04
Codename:	noble

@andiradulescu
Contributor Author

andiradulescu commented Jul 12, 2025

@sshane can you also paste the output of:

nvidia-smi

I’m interested in the CUDA info/version from that last one. Thanks 🙏

@sshane
Contributor

sshane commented Jul 13, 2025

batman@workstation-shane:~$ nvidia-smi
Sat Jul 12 21:26:33 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.57.01              Driver Version: 565.57.01      CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1080 Ti     On  |   00000000:61:00.0  On |                  N/A |
| 40%   65C    P0             63W /  250W |    3367MiB /  11264MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
