Object of type BitsAndBytesConfig is not JSON serializable error with TensorBoard integration #37518

@astefanutti

System Info

transformers==4.51.3
Python version: 3.11

Who can help?

@zach-huggingface @SunMarc @MekkCyber

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

When using SFTTrainer with BitsAndBytes quantization and the TensorBoard integration, serializing the TrainingArguments to JSON fails with:

[rank0]: Traceback (most recent call last):
[rank0]:     main({'model_name_or_path': 'meta-llama/Llama-4-Scout-17B-16E-Instruct', 'model_revision': 'main', 'torch_dtype': 'bfloat16', 'attn_implementation': 'flex_attention', 'use_liger': False, 'use_peft': False, 'lora_r': 16, 'lora_alpha': 8, 'lora_dropout': 0.05, 'lora_target_modules': ['q_proj', 'v_proj', 'k_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj'], 'lora_modules_to_save': [], 'load_in_4bit': False, 'load_in_8bit': True, 'dataset_name': 'gsm8k', 'dataset_config': 'main', 'dataset_train_split': 'train', 'dataset_test_split': 'test', 'dataset_text_field': 'text', 'dataset_kwargs': {'add_special_tokens': False, 'append_concat_token': False}, 'max_seq_length': 512, 'dataset_batch_size': 1000, 'packing': False, 'num_train_epochs': 10, 'per_device_train_batch_size': 1, 'per_device_eval_batch_size': 1, 'auto_find_batch_size': False, 'eval_strategy': 'epoch', 'bf16': True, 'tf32': False, 'learning_rate': 0.0002, 'warmup_steps': 10, 'lr_scheduler_type': 'inverse_sqrt', 'optim': 'adamw_torch_fused', 'max_grad_norm': 1.0, 'seed': 42, 'gradient_accumulation_steps': 1, 'gradient_checkpointing': False, 'gradient_checkpointing_kwargs': {'use_reentrant': False}, 'fsdp': 'full_shard auto_wrap', 'fsdp_config': {'activation_checkpointing': True, 'cpu_ram_efficient_loading': False, 'sync_module_states': True, 'use_orig_params': True, 'limit_all_gathers': False}, 'save_strategy': 'epoch', 'save_total_limit': 1, 'resume_from_checkpoint': False, 'log_level': 'info', 'logging_strategy': 'steps', 'logging_steps': 1, 'report_to': ['tensorboard'], 'output_dir': '/mnt/shared/Llama-4-Scout-17B-16E-Instruct'})
[rank0]:   File "/tmp/tmp.jsNRcydokN/ephemeral_script.py", line 126, in main
[rank0]:     trainer.train(resume_from_checkpoint=checkpoint)
[rank0]:   File "/opt/app-root/lib64/python3.11/site-packages/transformers/trainer.py", line 2238, in train
[rank0]:     return inner_training_loop(
[rank0]:            ^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/app-root/lib64/python3.11/site-packages/transformers/trainer.py", line 2462, in _inner_training_loop
[rank0]:     self.control = self.callback_handler.on_train_begin(args, self.state, self.control)
[rank0]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/app-root/lib64/python3.11/site-packages/transformers/trainer_callback.py", line 506, in on_train_begin
[rank0]:     return self.call_event("on_train_begin", args, state, control)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/app-root/lib64/python3.11/site-packages/transformers/trainer_callback.py", line 556, in call_event
[rank0]:     result = getattr(callback, event)(
[rank0]:              ^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/app-root/lib64/python3.11/site-packages/transformers/integrations/integration_utils.py", line 698, in on_train_begin
[rank0]:     self.tb_writer.add_text("args", args.to_json_string())
[rank0]:                                     ^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/opt/app-root/lib64/python3.11/site-packages/transformers/training_args.py", line 2509, in to_json_string
[rank0]:     return json.dumps(self.to_dict(), indent=2)
[rank0]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank0]:   File "/usr/lib64/python3.11/json/__init__.py", line 238, in dumps
[rank0]:     **kw).encode(obj)
[rank0]:           ^^^^^^^^^^^
[rank0]:   File "/usr/lib64/python3.11/json/encoder.py", line 202, in encode
[rank0]:     chunks = list(chunks)
[rank0]:              ^^^^^^^^^^^^
[rank0]:   File "/usr/lib64/python3.11/json/encoder.py", line 432, in _iterencode
[rank0]:     yield from _iterencode_dict(o, _current_indent_level)
[rank0]:   File "/usr/lib64/python3.11/json/encoder.py", line 406, in _iterencode_dict
[rank0]:     yield from chunks
[rank0]:   File "/usr/lib64/python3.11/json/encoder.py", line 406, in _iterencode_dict
[rank0]:     yield from chunks
[rank0]:   File "/usr/lib64/python3.11/json/encoder.py", line 439, in _iterencode
[rank0]:     o = _default(o)
[rank0]:         ^^^^^^^^^^^
[rank0]:   File "/usr/lib64/python3.11/json/encoder.py", line 180, in default
[rank0]:     raise TypeError(f'Object of type {o.__class__.__name__} '
[rank0]: TypeError: Object of type BitsAndBytesConfig is not JSON serializable
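
For reference, the failure can be isolated without running training. A minimal sketch, assuming the BitsAndBytesConfig reaches the training arguments through TRL's SFTConfig.model_init_kwargs, as it does for the script above (load_in_8bit: True):

# Minimal sketch reproducing the serialization failure directly,
# without a Trainer. Assumes TRL's SFTConfig (a TrainingArguments
# subclass) with a BitsAndBytesConfig inside model_init_kwargs.
from transformers import BitsAndBytesConfig
from trl import SFTConfig

args = SFTConfig(
    output_dir="out",
    report_to=["tensorboard"],
    model_init_kwargs={
        "quantization_config": BitsAndBytesConfig(load_in_8bit=True),
    },
)

# The TensorBoard callback calls this in on_train_begin; it raises
# TypeError: Object of type BitsAndBytesConfig is not JSON serializable
args.to_json_string()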

Expected behavior

The BitsAndBytesConfig should be converted to a dict before the TrainingArguments are serialized, so that args.to_json_string() succeeds and the arguments can be logged to TensorBoard.
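
One possible shape for such a fix (a sketch only, not an actual patch): have TrainingArguments.to_json_string fall back to an object's to_dict() when the JSON encoder cannot handle it.

# Sketch of a possible fix in transformers/training_args.py.
import json

def to_json_string(self):
    """Serialize this instance to a JSON string."""
    def _default(obj):
        # Quantization configs such as BitsAndBytesConfig expose
        # to_dict() via QuantizationConfigMixin.
        if hasattr(obj, "to_dict"):
            return obj.to_dict()
        return str(obj)  # last resort: never crash the TensorBoard callback
    return json.dumps(self.to_dict(), indent=2, default=_default)

In the meantime, a possible user-side workaround is to pass the quantization config as a plain dict, e.g. quantization_config=BitsAndBytesConfig(load_in_8bit=True).to_dict(), which from_pretrained also accepts and which serializes cleanly.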
