
Commit 90df093

Merge branch 'master' into fix-missing-bmi2-avx-cpus-cuda
2 parents 9889226 + 3244ccc commit 90df093

File tree

16 files changed (+5559, -203 lines)


.github/workflows/dependabot_auto.yml

Lines changed: 1 addition & 1 deletion
@@ -14,7 +14,7 @@ jobs:
     steps:
       - name: Dependabot metadata
         id: metadata
-        uses: dependabot/fetch-metadata@v2.4.0
+        uses: dependabot/fetch-metadata@v2.5.0
         with:
           github-token: "${{ secrets.GITHUB_TOKEN }}"
           skip-commit-verification: true

AGENTS.md

Lines changed: 59 additions & 0 deletions
@@ -2,6 +2,19 @@

Building and testing the project depends on the components involved and the platform where development is taking place. Due to the amount of context required, it's usually best not to try building or testing the project unless the user requests it. If you must build the project, inspect the Makefile in the project root and the Makefiles of any backends that are affected by the changes you are making. In addition, the workflows in .github/workflows can be used as a reference when it is unclear how to build or test a component. The primary Makefile contains targets for building inside or outside Docker; if the user has not previously specified a preference, ask which they would like to use.

## Building a specified backend

Suppose the user wants to build a particular backend for a given platform; for example, bark for ROCm/hipblas.
- The Makefile has targets like `docker-build-bark`, created with `generate-docker-build-target` at the time of writing. Recently added backends may require a new target.
- At a minimum, the `BUILD_TYPE` and `BASE_IMAGE` build-args need to be set.
- Use .github/workflows/backend.yml as a reference; it lists the needed args in the `include` job strategy matrix.
- l4t and cublas also require the CUDA major and minor versions.
- You can print a complete command such as `DOCKER_MAKEFLAGS=-j$(nproc --ignore=1) BUILD_TYPE=hipblas BASE_IMAGE=rocm/dev-ubuntu-24.04:6.4.4 make docker-build-bark`
- Unless the user explicitly asks you to run the command, just print it: not all agent frontends handle long-running jobs well, and the output may overflow your context.
- The user may say they want to build AMD or ROCm instead of hipblas, Intel instead of SYCL, or NVIDIA instead of l4t or cublas. Ask for confirmation if there is ambiguity.
- Sometimes the user may need extra parameters to be added to `docker build` (e.g. `--platform` for cross-platform builds or `--progress` to view the full logs), in which case you can generate the `docker build` command directly.
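If an agent needs to compose this invocation programmatically, the steps above can be sketched in a few lines of Python. This is a minimal sketch: the helper name and the `-j4` default are illustrative, while the target pattern and the `BUILD_TYPE`/`BASE_IMAGE` variables come from the bullets above.

```python
import shlex

def docker_build_cmd(backend: str, build_type: str, base_image: str,
                     makeflags: str = "-j4") -> str:
    """Compose a `make docker-build-<backend>` invocation as described above."""
    env = {
        "DOCKER_MAKEFLAGS": makeflags,
        "BUILD_TYPE": build_type,
        "BASE_IMAGE": base_image,
    }
    # shlex.quote keeps values safe if they ever contain shell metacharacters
    prefix = " ".join(f"{k}={shlex.quote(v)}" for k, v in env.items())
    return f"{prefix} make docker-build-{backend}"

# Print the command rather than running it, per the guidance above
print(docker_build_cmd("bark", "hipblas", "rocm/dev-ubuntu-24.04:6.4.4"))
```

Printing instead of executing matches the advice above about long-running jobs and context overflow.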

# Coding style

- The project has the following .editorconfig

@@ -77,3 +90,49 @@ When fixing compilation errors after upstream changes:

- HTTP server uses `server_routes` with HTTP handlers
- Both use the same `server_context` and task queue infrastructure
- gRPC methods: `LoadModel`, `Predict`, `PredictStream`, `Embedding`, `Rerank`, `TokenizeString`, `GetMetrics`, `Health`
## Tool Call Parsing Maintenance

When working on JSON/XML tool call parsing functionality, always check llama.cpp for the reference implementation and updates:

### Checking for XML Parsing Changes

1. **Review XML Format Definitions**: Check `llama.cpp/common/chat-parser-xml-toolcall.h` for `xml_tool_call_format` struct changes
2. **Review Parsing Logic**: Check `llama.cpp/common/chat-parser-xml-toolcall.cpp` for parsing algorithm updates
3. **Review Format Presets**: Check `llama.cpp/common/chat-parser.cpp` for new XML format presets (search for `xml_tool_call_format form`)
4. **Review Model Lists**: Check `llama.cpp/common/chat.h` for `COMMON_CHAT_FORMAT_*` enum values that use XML parsing:
   - `COMMON_CHAT_FORMAT_GLM_4_5`
   - `COMMON_CHAT_FORMAT_MINIMAX_M2`
   - `COMMON_CHAT_FORMAT_KIMI_K2`
   - `COMMON_CHAT_FORMAT_QWEN3_CODER_XML`
   - `COMMON_CHAT_FORMAT_APRIEL_1_5`
   - `COMMON_CHAT_FORMAT_XIAOMI_MIMO`
   - Any new formats added
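One quick way to spot newly added formats is to diff the enum names found in `chat.h` against the set LocalAI already handles. A sketch, assuming a local llama.cpp checkout; the `KNOWN` set here is an illustrative subset, not the real mapping:

```python
import re

# Illustrative subset of formats already handled; keep in sync with the real list
KNOWN = {
    "COMMON_CHAT_FORMAT_GLM_4_5",
    "COMMON_CHAT_FORMAT_MINIMAX_M2",
    "COMMON_CHAT_FORMAT_KIMI_K2",
}

def new_formats(chat_h_source: str, known=KNOWN):
    """Return COMMON_CHAT_FORMAT_* names in the given source that are not yet known."""
    found = set(re.findall(r"COMMON_CHAT_FORMAT_\w+", chat_h_source))
    return sorted(found - known)

# Against a real checkout you would pass open("llama.cpp/common/chat.h").read()
sample = "COMMON_CHAT_FORMAT_GLM_4_5, COMMON_CHAT_FORMAT_QWEN3_CODER_XML,"
print(new_formats(sample))  # prints ['COMMON_CHAT_FORMAT_QWEN3_CODER_XML']
```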

### Model Configuration Options

Always check `llama.cpp` for new model configuration options that should be supported in LocalAI:

1. **Check Server Context**: Review `llama.cpp/tools/server/server-context.cpp` for new parameters
2. **Check Chat Params**: Review `llama.cpp/common/chat.h` for `common_chat_params` struct changes
3. **Check Server Options**: Review `llama.cpp/tools/server/server.cpp` for command-line argument changes
4. **Examples of options to check**:
   - `ctx_shift` - Context shifting support
   - `parallel_tool_calls` - Parallel tool calling
   - `reasoning_format` - Reasoning format options
   - Any new flags or parameters
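A rough way to notice new struct members is to extract field names from a `common_chat_params`-style body and compare them with the options already wired through. A sketch only: the snippet and the `SUPPORTED` set are illustrative, not the actual struct contents.

```python
import re

SUPPORTED = {"parallel_tool_calls", "reasoning_format"}  # illustrative subset

def struct_fields(struct_body: str):
    """Extract member names from C++ declarations like 'bool parallel_tool_calls = false;'."""
    return {m.group(1) for m in re.finditer(r"\w[\w:<>]*\s+(\w+)\s*(?:=[^;]*)?;", struct_body)}

snippet = """
    bool parallel_tool_calls = false;
    std::string reasoning_format;
    bool ctx_shift = true;
"""
# Fields present in the struct but not yet supported
missing = sorted(struct_fields(snippet) - SUPPORTED)
print(missing)  # prints ['ctx_shift']
```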

### Implementation Guidelines

1. **Feature Parity**: Always aim for feature parity with llama.cpp's implementation
2. **Test Coverage**: Add tests for new features matching llama.cpp's behavior
3. **Documentation**: Update relevant documentation when adding new formats or options
4. **Backward Compatibility**: Ensure changes don't break existing functionality
### Files to Monitor

- `llama.cpp/common/chat-parser-xml-toolcall.h` - Format definitions
- `llama.cpp/common/chat-parser-xml-toolcall.cpp` - Parsing logic
- `llama.cpp/common/chat-parser.cpp` - Format presets and model-specific handlers
- `llama.cpp/common/chat.h` - Format enums and parameter structures
- `llama.cpp/tools/server/server-context.cpp` - Server configuration options
