@FIR-1991 - Fix std::filesystem link failure in CI by ensuring proper compiler usage in Go/cgo build #56
Merged
atrivedi-tsavoritesi
approved these changes
Jun 22, 2026
This PR fixes CI build failures caused by unresolved std::filesystem symbols during the final link of the Ollama binary. The Go/cgo linker in CI was falling back to GCC 8 (/usr/lib64/ccache/g++), which does not link std::filesystem automatically and requires an explicit -lstdc++fs flag. Because the flag was missing, builds failed with multiple "undefined reference" errors for std::filesystem symbols. As a safe fallback, CI now sets export CGO_LDFLAGS="${CGO_LDFLAGS:-} -lstdc++fs" to stay compatible with GCC 8 environments. The longer-term fix aligns the toolchain so that Go/cgo uses the same compiler as the native build (GCC 13), which prevents fallback to older system compilers and avoids similar issues in the future. Together, these changes stabilize CI builds and keep behavior consistent across mixed compiler environments.
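The fallback described above could be made conditional on the compiler that cgo will actually invoke. The following is only a sketch, not the script used in this PR: the helper names needs_stdcxxfs and add_stdcxxfs_flag are hypothetical, and it assumes the compiler named by CXX (or g++ on PATH) is a GCC whose -dumpversion output starts with the major version.

```shell
# Return success (0) when the given GCC major version needs an explicit
# -lstdc++fs to link std::filesystem (true for GCC 8; GCC 9 and later
# ship it inside libstdc++ itself).
needs_stdcxxfs() {
  [ "$1" -lt 9 ]
}

# Hypothetical helper: append the flag to CGO_LDFLAGS only once.
add_stdcxxfs_flag() {
  case " ${CGO_LDFLAGS:-} " in
    *" -lstdc++fs "*) ;;                                   # already present
    *) export CGO_LDFLAGS="${CGO_LDFLAGS:-} -lstdc++fs" ;; # fallback from the PR
  esac
}

# CXX may be unset; fall back to g++ on PATH (an assumption of this sketch).
cxx="${CXX:-g++}"
if command -v "$cxx" >/dev/null 2>&1; then
  # -dumpversion prints e.g. "8" or "13.2.0"; keep only the major part.
  major="$("$cxx" -dumpversion | cut -d. -f1)"
  if needs_stdcxxfs "$major"; then
    add_stdcxxfs_flag
  fi
  echo "cgo C++ compiler: $cxx (GCC major $major)"
fi
echo "CGO_LDFLAGS=${CGO_LDFLAGS:-}"
```

For the longer-term alignment, cgo takes its compilers from the CC and CXX environment variables, so exporting both to the GCC 13 binaries before running go build, and confirming the result with go env CC CXX, is one way to check that no older system compiler is picked up.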
Built Ollama images on wssw01 and wspd0 and tested them on tsisim.
#########
LOG
root@tsisim:/tsi/tsi-sw/anoop_ollama# pwd
/tsi/tsi-sw/anoop_ollama
root@tsisim:/tsi/tsi-sw/anoop_ollama# ls -lrt
total 35916
-rw-r--r-- 1 root root 36775163 Jun 22 19:00 ollama-arm64-release.tar.gz
root@tsisim:/tsi/tsi-sw/anoop_ollama# chmod 777 *
root@tsisim:/tsi/tsi-sw/anoop_ollama# tar -zxvf ollama-arm64-release.tar.gz
ollama-arm64-release/
ollama-arm64-release/bin/
ollama-arm64-release/bin/ollama
ollama-arm64-release/bin/libggml-base.so
ollama-arm64-release/bin/libggml-cpu.so
ollama-arm64-release/bin/libggml-tsavorite.so
ollama-arm64-release/bin/tsavorite-model-deployment.yaml
ollama-arm64-release/lib/
ollama-arm64-release/lib/libggml-base.so
ollama-arm64-release/lib/libggml-cpu.so
ollama-arm64-release/lib/libggml-tsavorite.so
ollama-arm64-release/blobs/
ollama-arm64-release/blobs/txe_mul_mat_tile_f32_k128.blob
ollama-arm64-release/blobs/txe_mul_mat_tile_f32_k64.blob
ollama-arm64-release/blobs/txe_mul_mat_tile_f32_k32.blob
ollama-arm64-release/blobs/txe_add.blob
ollama-arm64-release/blobs/txe_sub.blob
ollama-arm64-release/blobs/txe_neg.blob
ollama-arm64-release/blobs/txe_sqrt.blob
ollama-arm64-release/blobs/txe_neg_16.blob
ollama-arm64-release/blobs/txe_sqrt_16.blob
ollama-arm64-release/blobs/txe_mult.blob
ollama-arm64-release/blobs/txe_div.blob
ollama-arm64-release/blobs/txe_abs.blob
ollama-arm64-release/blobs/txe_sqr.blob
ollama-arm64-release/blobs/txe_inv.blob
ollama-arm64-release/blobs/txe_sin.blob
ollama-arm64-release/blobs/txe_sigmoid.blob
ollama-arm64-release/blobs/txe_silu.blob
ollama-arm64-release/blobs/txe_swiglu.blob
ollama-arm64-release/blobs/txe_rms_norm.blob
ollama-arm64-release/blobs/txe_soft_max.blob
ollama-arm64-release/blobs/txe_add_16.blob
ollama-arm64-release/blobs/txe_sub_16.blob
ollama-arm64-release/blobs/txe_mult_16.blob
ollama-arm64-release/blobs/txe_div_16.blob
ollama-arm64-release/blobs/txe_abs_16.blob
ollama-arm64-release/blobs/txe_sqr_16.blob
ollama-arm64-release/blobs/txe_inv_16.blob
ollama-arm64-release/blobs/txe_sin_16.blob
ollama-arm64-release/blobs/txe_sigmoid_16.blob
ollama-arm64-release/blobs/txe_silu_16.blob
ollama-arm64-release/blobs/txe_swiglu_16.blob
ollama-arm64-release/blobs/txe_rms_norm_16.blob
ollama-arm64-release/blobs/txe_triton_add/
ollama-arm64-release/blobs/txe_triton_add/txe_blob_0.blob
ollama-arm64-release/README.md
ollama-arm64-release/tsi-ggml/
ollama-arm64-release/tsi-ggml/blobs/
ollama-arm64-release/tsi-ggml/blobs/txe_mul_mat_tile_f32_k128.blob
ollama-arm64-release/tsi-ggml/blobs/txe_mul_mat_tile_f32_k64.blob
ollama-arm64-release/tsi-ggml/blobs/txe_mul_mat_tile_f32_k32.blob
ollama-arm64-release/tsi-ggml/blobs/txe_add.blob
ollama-arm64-release/tsi-ggml/blobs/txe_sub.blob
ollama-arm64-release/tsi-ggml/blobs/txe_neg.blob
ollama-arm64-release/tsi-ggml/blobs/txe_sqrt.blob
ollama-arm64-release/tsi-ggml/blobs/txe_neg_16.blob
ollama-arm64-release/tsi-ggml/blobs/txe_sqrt_16.blob
ollama-arm64-release/tsi-ggml/blobs/txe_mult.blob
ollama-arm64-release/tsi-ggml/blobs/txe_div.blob
ollama-arm64-release/tsi-ggml/blobs/txe_abs.blob
ollama-arm64-release/tsi-ggml/blobs/txe_sqr.blob
ollama-arm64-release/tsi-ggml/blobs/txe_inv.blob
ollama-arm64-release/tsi-ggml/blobs/txe_sin.blob
ollama-arm64-release/tsi-ggml/blobs/txe_sigmoid.blob
ollama-arm64-release/tsi-ggml/blobs/txe_silu.blob
ollama-arm64-release/tsi-ggml/blobs/txe_swiglu.blob
ollama-arm64-release/tsi-ggml/blobs/txe_rms_norm.blob
ollama-arm64-release/tsi-ggml/blobs/txe_soft_max.blob
ollama-arm64-release/tsi-ggml/blobs/txe_add_16.blob
ollama-arm64-release/tsi-ggml/blobs/txe_sub_16.blob
ollama-arm64-release/tsi-ggml/blobs/txe_mult_16.blob
ollama-arm64-release/tsi-ggml/blobs/txe_div_16.blob
ollama-arm64-release/tsi-ggml/blobs/txe_abs_16.blob
ollama-arm64-release/tsi-ggml/blobs/txe_sqr_16.blob
ollama-arm64-release/tsi-ggml/blobs/txe_inv_16.blob
ollama-arm64-release/tsi-ggml/blobs/txe_sin_16.blob
ollama-arm64-release/tsi-ggml/blobs/txe_sigmoid_16.blob
ollama-arm64-release/tsi-ggml/blobs/txe_silu_16.blob
ollama-arm64-release/tsi-ggml/blobs/txe_swiglu_16.blob
ollama-arm64-release/tsi-ggml/blobs/txe_rms_norm_16.blob
ollama-arm64-release/tsi-ggml/blobs/txe_triton_add/
ollama-arm64-release/tsi-ggml/blobs/txe_triton_add/txe_blob_0.blob
ollama-arm64-release/tsi-ggml/ggml.sh
root@tsisim:/tsi/tsi-sw/anoop_ollama# ls -lrt
total 35928
drwxr-xr-x 6 100041 100003 4096 Jun 22 18:55 ollama-arm64-release
-rwxrwxrwx 1 root root 36775163 Jun 22 19:00 ollama-arm64-release.tar.gz
-rwxrwxrwx 1 root root 6466 Jun 22 19:03 tsi-ollama-install.sh
root@tsisim:/tsi/tsi-sw/anoop_ollama# ./tsi-ollama-install.sh
Amazon is a vast, diverse landmass that covers over 75 countries. It is known for its lush forests,
crystal-clear rivers, and stunning natural landscapes.
root@tsisim:/tsi/tsi-sw/anoop_ollama# ollama run Gemma3:270M "Where is Amazon river?"
pulling manifest
pulling 735af2139dc6: 100% ▕██████████████████████████████████████████████████▏ 291 MB
pulling 4b19ac7dd2fb: 100% ▕██████████████████████████████████████████████████▏ 476 B
pulling 3e2c24001f9e: 100% ▕██████████████████████████████████████████████████▏ 8.4 KB
pulling 339e884a40f6: 100% ▕██████████████████████████████████████████████████▏ 61 B
pulling 74156d92caf6: 100% ▕██████████████████████████████████████████████████▏ 490 B
verifying sha256 digest
writing manifest
success
Amazon River is located in Brazil.
root@tsisim:/tsi/tsi-sw/anoop_ollama#
root@tsisim:/tsi/tsi-sw/anoop_ollama#
root@tsisim:/tsi/tsi-sw/anoop_ollama#
#######
root@tsisim:# journalctl -u ollama -f
Jun 22 19:04:28 tsisim ollama[1357]: [GIN-debug] POST /v1/chat/completions --> github.com/ollama/ollama/server.(*Server).ChatHandler-fm (6 handlers)
Jun 22 19:04:28 tsisim ollama[1357]: [GIN-debug] POST /v1/completions --> github.com/ollama/ollama/server.(*Server).GenerateHandler-fm (6 handlers)
Jun 22 19:04:28 tsisim ollama[1357]: [GIN-debug] POST /v1/embeddings --> github.com/ollama/ollama/server.(*Server).EmbedHandler-fm (6 handlers)
Jun 22 19:04:28 tsisim ollama[1357]: [GIN-debug] GET /v1/models --> github.com/ollama/ollama/server.(*Server).ListHandler-fm (6 handlers)
Jun 22 19:04:28 tsisim ollama[1357]: [GIN-debug] GET /v1/models/:model --> github.com/ollama/ollama/server.(*Server).ShowHandler-fm (6 handlers)
Jun 22 19:04:28 tsisim ollama[1357]: time=2026-06-22T19:04:28.171Z level=INFO source=routes.go:1569 msg="Listening on [::]:11434 (version 0.0.0)"
Jun 22 19:04:28 tsisim ollama[1357]: time=2026-06-22T19:04:28.184Z level=INFO source=runner.go:80 msg="discovering available GPUs..."
Jun 22 19:04:29 tsisim ollama[1357]: time=2026-06-22T19:04:29.096Z level=INFO source=runner.go:551 msg="failure during GPU discovery" OLLAMA_LIBRARY_PATH=[/tsi/tsi-sw/anoop_ollama/ollama-arm64-release/bin] extra_envs=[] error="llamarunner free vram reporting not supported"
Jun 22 19:04:29 tsisim ollama[1357]: time=2026-06-22T19:04:29.105Z level=INFO source=types.go:129 msg="inference compute" id=cpu library=cpu compute="" name=cpu description=cpu libdirs=ollama driver="" pci_id="" type="" total="3.3 GiB" available="3.0 GiB"
Jun 22 19:04:29 tsisim ollama[1357]: time=2026-06-22T19:04:29.105Z level=INFO source=routes.go:1610 msg="entering low vram mode" "total vram"="0 B" threshold="20.0 GiB"
Jun 22 19:05:46 tsisim ollama[1357]: [GIN] 2026/06/22 - 19:05:46 | 200 | 2.108059ms | 127.0.0.1 | HEAD "/"
Jun 22 19:05:46 tsisim ollama[1357]: [GIN] 2026/06/22 - 19:05:46 | 404 | 8.658635ms | 127.0.0.1 | POST "/api/show"
Jun 22 19:05:47 tsisim ollama[1357]: time=2026-06-22T19:05:47.846Z level=INFO source=download.go:177 msg="downloading fad2a06e4cc7 in 4 100 MB part(s)"
Jun 22 19:06:24 tsisim ollama[1357]: time=2026-06-22T19:06:24.156Z level=INFO source=download.go:177 msg="downloading 41c2cf8c272f in 1 7.3 KB part(s)"
Jun 22 19:06:25 tsisim ollama[1357]: time=2026-06-22T19:06:25.466Z level=INFO source=download.go:177 msg="downloading 1da0581fd4ce in 1 130 B part(s)"
Jun 22 19:06:26 tsisim ollama[1357]: time=2026-06-22T19:06:26.776Z level=INFO source=download.go:177 msg="downloading f02dd72bb242 in 1 59 B part(s)"
Jun 22 19:06:28 tsisim ollama[1357]: time=2026-06-22T19:06:28.098Z level=INFO source=download.go:177 msg="downloading ea0a531a015b in 1 485 B part(s)"
Jun 22 19:06:31 tsisim ollama[1357]: [GIN] 2026/06/22 - 19:06:31 | 200 | 44.938625408s | 127.0.0.1 | POST "/api/pull"
Jun 22 19:06:32 tsisim ollama[1357]: [GIN] 2026/06/22 - 19:06:32 | 200 | 567.513118ms | 127.0.0.1 | POST "/api/show"
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from /tsi/ollama-models/blobs/sha256-fad2a06e4cc705c2fa8bec5477ddb00dc0c859ac184c34dcc5586663774161ca (version GGUF V3 (latest))
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: - kv 0: general.architecture str = qwen2
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: - kv 1: general.name str = Qwen2-beta-0_5B-Chat
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: - kv 2: qwen2.block_count u32 = 24
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: - kv 3: qwen2.context_length u32 = 32768
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: - kv 4: qwen2.embedding_length u32 = 1024
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: - kv 5: qwen2.feed_forward_length u32 = 2816
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: - kv 6: qwen2.attention.head_count u32 = 16
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: - kv 7: qwen2.attention.head_count_kv u32 = 16
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: - kv 8: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: - kv 9: qwen2.use_parallel_residual bool = true
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: - kv 10: tokenizer.ggml.model str = gpt2
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: - kv 11: tokenizer.ggml.tokens arr[str,151936] = ["!", """, "#", "$", "%", "&", "'", ...
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: - kv 12: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: - kv 13: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: - kv 14: tokenizer.ggml.eos_token_id u32 = 151643
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: - kv 15: tokenizer.ggml.padding_token_id u32 = 151643
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 151643
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: - kv 17: tokenizer.chat_template str = {% for message in messages %}{% if lo...
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: - kv 18: general.quantization_version u32 = 2
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: - kv 19: general.file_type u32 = 2
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: - type f32: 121 tensors
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: - type q4_0: 169 tensors
Jun 22 19:06:34 tsisim ollama[1357]: llama_model_loader: - type q6_K: 1 tensors
Jun 22 19:06:34 tsisim ollama[1357]: print_info: file format = GGUF V3 (latest)
Jun 22 19:06:34 tsisim ollama[1357]: print_info: file type = Q4_0
Jun 22 19:06:34 tsisim ollama[1357]: print_info: file size = 371.02 MiB (5.02 BPW)
Jun 22 19:06:35 tsisim ollama[1357]: load: missing or unrecognized pre-tokenizer type, using: 'default'
Jun 22 19:06:35 tsisim ollama[1357]: load: printing all EOG tokens:
Jun 22 19:06:35 tsisim ollama[1357]: load: - 151643 ('<|endoftext|>')
Jun 22 19:06:35 tsisim ollama[1357]: load: - 151645 ('<|im_end|>')
Jun 22 19:06:35 tsisim ollama[1357]: load: special tokens cache size = 293
Jun 22 19:06:36 tsisim ollama[1357]: load: token to piece cache size = 0.9338 MB
Jun 22 19:06:36 tsisim ollama[1357]: print_info: arch = qwen2
Jun 22 19:06:36 tsisim ollama[1357]: print_info: vocab_only = 1
Jun 22 19:06:36 tsisim ollama[1357]: print_info: model type = ?B
Jun 22 19:06:36 tsisim ollama[1357]: print_info: model params = 619.57 M
Jun 22 19:06:36 tsisim ollama[1357]: print_info: general.name = Qwen2-beta-0_5B-Chat
Jun 22 19:06:36 tsisim ollama[1357]: print_info: vocab type = BPE
Jun 22 19:06:36 tsisim ollama[1357]: print_info: n_vocab = 151936
Jun 22 19:06:36 tsisim ollama[1357]: print_info: n_merges = 151387
Jun 22 19:06:36 tsisim ollama[1357]: print_info: BOS token = 151643 '<|endoftext|>'
Jun 22 19:06:36 tsisim ollama[1357]: print_info: EOS token = 151643 '<|endoftext|>'
Jun 22 19:06:36 tsisim ollama[1357]: print_info: EOT token = 151645 '<|im_end|>'
Jun 22 19:06:36 tsisim ollama[1357]: print_info: PAD token = 151643 '<|endoftext|>'
Jun 22 19:06:36 tsisim ollama[1357]: print_info: LF token = 198 'Ċ'
Jun 22 19:06:36 tsisim ollama[1357]: print_info: EOG token = 151643 '<|endoftext|>'
Jun 22 19:06:36 tsisim ollama[1357]: print_info: EOG token = 151645 '<|im_end|>'
Jun 22 19:06:36 tsisim ollama[1357]: print_info: max token length = 256
Jun 22 19:06:36 tsisim ollama[1357]: llama_model_load: vocab only - skipping tensors
Jun 22 19:06:36 tsisim ollama[1357]: time=2026-06-22T19:06:36.049Z level=INFO source=server.go:402 msg="starting runner" cmd="/tsi/tsi-sw/anoop_ollama/ollama-arm64-release/bin/ollama runner --model /tsi/ollama-models/blobs/sha256-fad2a06e4cc705c2fa8bec5477ddb00dc0c859ac184c34dcc5586663774161ca --port 42995"
Jun 22 19:06:36 tsisim ollama[1357]: time=2026-06-22T19:06:36.078Z level=INFO source=server.go:507 msg="system memory" total="3.3 GiB" free="2.9 GiB" free_swap="8.0 GiB"
Jun 22 19:06:36 tsisim ollama[1357]: time=2026-06-22T19:06:36.096Z level=INFO source=memory.go:36 msg="new model will fit in available VRAM across minimum required GPUs, loading" model=/tsi/ollama-models/blobs/sha256-fad2a06e4cc705c2fa8bec5477ddb00dc0c859ac184c34dcc5586663774161ca library=cpu parallel=1 required="0 B" gpus=1
Jun 22 19:06:36 tsisim ollama[1357]: time=2026-06-22T19:06:36.106Z level=INFO source=server.go:547 msg=offload library=cpu layers.requested=-1 layers.model=25 layers.offload=0 layers.split=[] memory.available="[3.0 GiB]" memory.gpu_overhead="0 B" memory.required.full="993.2 MiB" memory.required.partial="0 B" memory.required.kv="384.0 MiB" memory.required.allocations="[993.2 MiB]" memory.weights.total="287.6 MiB" memory.weights.repeating="165.8 MiB" memory.weights.nonrepeating="121.7 MiB" memory.graph.full="298.8 MiB" memory.graph.partial="420.5 MiB"
Jun 22 19:06:36 tsisim ollama[1357]: time=2026-06-22T19:06:36.312Z level=INFO source=runner.go:907 msg="starting llama runner"
Jun 22 19:06:39 tsisim ollama[1357]: time=2026-06-22T19:06:39.091Z level=INFO source=ggml.go:104 msg=system CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.LLAMAFILE=1 compiler=cgo(gcc)
Jun 22 19:06:39 tsisim ollama[1357]: time=2026-06-22T19:06:39.108Z level=INFO source=runner.go:967 msg="Server listening on 127.0.0.1:42995"
Jun 22 19:06:39 tsisim ollama[1357]: time=2026-06-22T19:06:39.120Z level=INFO source=runner.go:830 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:false KvSize:4096 KvCacheType: NumThreads:4 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Jun 22 19:06:39 tsisim ollama[1357]: time=2026-06-22T19:06:39.125Z level=INFO source=server.go:1274 msg="waiting for llama runner to start responding"
Jun 22 19:06:39 tsisim ollama[1357]: llama_model_load_from_file_impl: using device Tsavorite (txe) (unknown id) - 128 MiB free
Jun 22 19:06:39 tsisim ollama[1357]: time=2026-06-22T19:06:39.137Z level=INFO source=server.go:1308 msg="waiting for server to become available" status="llm server loading model"
Jun 22 19:06:39 tsisim ollama[1357]: llama_model_loader: loaded meta data with 20 key-value pairs and 291 tensors from /tsi/ollama-models/blobs/sha256-fad2a06e4cc705c2fa8bec5477ddb00dc0c859ac184c34dcc5586663774161ca (version GGUF V3 (latest))
Jun 22 19:06:39 tsisim ollama[1357]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Jun 22 19:06:39 tsisim ollama[1357]: llama_model_loader: - kv 0: general.architecture str = qwen2
Jun 22 19:06:39 tsisim ollama[1357]: llama_model_loader: - kv 1: general.name str = Qwen2-beta-0_5B-Chat
Jun 22 19:06:39 tsisim ollama[1357]: llama_model_loader: - kv 2: qwen2.block_count u32 = 24
Jun 22 19:06:39 tsisim ollama[1357]: llama_model_loader: - kv 3: qwen2.context_length u32 = 32768
Jun 22 19:06:39 tsisim ollama[1357]: llama_model_loader: - kv 4: qwen2.embedding_length u32 = 1024
Jun 22 19:06:39 tsisim ollama[1357]: llama_model_loader: - kv 5: qwen2.feed_forward_length u32 = 2816
Jun 22 19:06:39 tsisim ollama[1357]: llama_model_loader: - kv 6: qwen2.attention.head_count u32 = 16
Jun 22 19:06:39 tsisim ollama[1357]: llama_model_loader: - kv 7: qwen2.attention.head_count_kv u32 = 16
Jun 22 19:06:39 tsisim ollama[1357]: llama_model_loader: - kv 8: qwen2.attention.layer_norm_rms_epsilon f32 = 0.000001
Jun 22 19:06:39 tsisim ollama[1357]: llama_model_loader: - kv 9: qwen2.use_parallel_residual bool = true
Jun 22 19:06:39 tsisim ollama[1357]: llama_model_loader: - kv 10: tokenizer.ggml.model str = gpt2
Jun 22 19:06:39 tsisim ollama[1357]: llama_model_loader: - kv 11: tokenizer.ggml.tokens arr[str,151936] = ["!", """, "#", "$", "%", "&", "'", ...
Jun 22 19:06:39 tsisim ollama[1357]: llama_model_loader: - kv 12: tokenizer.ggml.token_type arr[i32,151936] = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
Jun 22 19:06:40 tsisim ollama[1357]: llama_model_loader: - kv 13: tokenizer.ggml.merges arr[str,151387] = ["Ġ Ġ", "ĠĠ ĠĠ", "i n", "Ġ t",...
Jun 22 19:06:40 tsisim ollama[1357]: llama_model_loader: - kv 14: tokenizer.ggml.eos_token_id u32 = 151643
Jun 22 19:06:40 tsisim ollama[1357]: llama_model_loader: - kv 15: tokenizer.ggml.padding_token_id u32 = 151643
Jun 22 19:06:40 tsisim ollama[1357]: llama_model_loader: - kv 16: tokenizer.ggml.bos_token_id u32 = 151643
Jun 22 19:06:40 tsisim ollama[1357]: llama_model_loader: - kv 17: tokenizer.chat_template str = {% for message in messages %}{% if lo...
Jun 22 19:06:40 tsisim ollama[1357]: llama_model_loader: - kv 18: general.quantization_version u32 = 2
Jun 22 19:06:40 tsisim ollama[1357]: llama_model_loader: - kv 19: general.file_type u32 = 2
Jun 22 19:06:40 tsisim ollama[1357]: llama_model_loader: - type f32: 121 tensors
Jun 22 19:06:40 tsisim ollama[1357]: llama_model_loader: - type q4_0: 169 tensors
Jun 22 19:06:40 tsisim ollama[1357]: llama_model_loader: - type q6_K: 1 tensors
Jun 22 19:06:40 tsisim ollama[1357]: print_info: file format = GGUF V3 (latest)
Jun 22 19:06:40 tsisim ollama[1357]: print_info: file type = Q4_0
Jun 22 19:06:40 tsisim ollama[1357]: print_info: file size = 371.02 MiB (5.02 BPW)
Jun 22 19:06:40 tsisim ollama[1357]: load: missing or unrecognized pre-tokenizer type, using: 'default'
Jun 22 19:06:41 tsisim ollama[1357]: load: printing all EOG tokens:
Jun 22 19:06:41 tsisim ollama[1357]: load: - 151643 ('<|endoftext|>')
Jun 22 19:06:41 tsisim ollama[1357]: load: - 151645 ('<|im_end|>')
Jun 22 19:06:41 tsisim ollama[1357]: load: special tokens cache size = 293
Jun 22 19:06:41 tsisim ollama[1357]: load: token to piece cache size = 0.9338 MB
Jun 22 19:06:41 tsisim ollama[1357]: print_info: arch = qwen2
Jun 22 19:06:41 tsisim ollama[1357]: print_info: vocab_only = 0
Jun 22 19:06:41 tsisim ollama[1357]: print_info: n_ctx_train = 32768
Jun 22 19:06:41 tsisim ollama[1357]: print_info: n_embd = 1024
Jun 22 19:06:41 tsisim ollama[1357]: print_info: n_layer = 24
Jun 22 19:06:41 tsisim ollama[1357]: print_info: n_head = 16
Jun 22 19:06:41 tsisim ollama[1357]: print_info: n_head_kv = 16
Jun 22 19:06:41 tsisim ollama[1357]: print_info: n_rot = 64
Jun 22 19:06:41 tsisim ollama[1357]: print_info: n_swa = 0
Jun 22 19:06:41 tsisim ollama[1357]: print_info: is_swa_any = 0
Jun 22 19:06:41 tsisim ollama[1357]: print_info: n_embd_head_k = 64
Jun 22 19:06:41 tsisim ollama[1357]: print_info: n_embd_head_v = 64
Jun 22 19:06:41 tsisim ollama[1357]: print_info: n_gqa = 1
Jun 22 19:06:41 tsisim ollama[1357]: print_info: n_embd_k_gqa = 1024
Jun 22 19:06:41 tsisim ollama[1357]: print_info: n_embd_v_gqa = 1024
Jun 22 19:06:41 tsisim ollama[1357]: print_info: f_norm_eps = 0.0e+00
Jun 22 19:06:41 tsisim ollama[1357]: print_info: f_norm_rms_eps = 1.0e-06
Jun 22 19:06:41 tsisim ollama[1357]: print_info: f_clamp_kqv = 0.0e+00
Jun 22 19:06:41 tsisim ollama[1357]: print_info: f_max_alibi_bias = 0.0e+00
Jun 22 19:06:41 tsisim ollama[1357]: print_info: f_logit_scale = 0.0e+00
Jun 22 19:06:41 tsisim ollama[1357]: print_info: f_attn_scale = 0.0e+00
Jun 22 19:06:41 tsisim ollama[1357]: print_info: n_ff = 2816
Jun 22 19:06:41 tsisim ollama[1357]: print_info: n_expert = 0
Jun 22 19:06:41 tsisim ollama[1357]: print_info: n_expert_used = 0
Jun 22 19:06:41 tsisim ollama[1357]: print_info: causal attn = 1
Jun 22 19:06:41 tsisim ollama[1357]: print_info: pooling type = -1
Jun 22 19:06:41 tsisim ollama[1357]: print_info: rope type = 2
Jun 22 19:06:41 tsisim ollama[1357]: print_info: rope scaling = linear
Jun 22 19:06:41 tsisim ollama[1357]: print_info: freq_base_train = 10000.0
Jun 22 19:06:41 tsisim ollama[1357]: print_info: freq_scale_train = 1
Jun 22 19:06:41 tsisim ollama[1357]: print_info: n_ctx_orig_yarn = 32768
Jun 22 19:06:41 tsisim ollama[1357]: print_info: rope_finetuned = unknown
Jun 22 19:06:41 tsisim ollama[1357]: print_info: model type = 0.5B
Jun 22 19:06:41 tsisim ollama[1357]: print_info: model params = 619.57 M
Jun 22 19:06:41 tsisim ollama[1357]: print_info: general.name = Qwen2-beta-0_5B-Chat
Jun 22 19:06:41 tsisim ollama[1357]: print_info: vocab type = BPE
Jun 22 19:06:41 tsisim ollama[1357]: print_info: n_vocab = 151936
Jun 22 19:06:41 tsisim ollama[1357]: print_info: n_merges = 151387
Jun 22 19:06:41 tsisim ollama[1357]: print_info: BOS token = 151643 '<|endoftext|>'
Jun 22 19:06:41 tsisim ollama[1357]: print_info: EOS token = 151643 '<|endoftext|>'
Jun 22 19:06:41 tsisim ollama[1357]: print_info: EOT token = 151645 '<|im_end|>'
Jun 22 19:06:41 tsisim ollama[1357]: print_info: PAD token = 151643 '<|endoftext|>'
Jun 22 19:06:41 tsisim ollama[1357]: print_info: LF token = 198 'Ċ'
Jun 22 19:06:41 tsisim ollama[1357]: print_info: EOG token = 151643 '<|endoftext|>'
Jun 22 19:06:41 tsisim ollama[1357]: print_info: EOG token = 151645 '<|im_end|>'
Jun 22 19:06:41 tsisim ollama[1357]: print_info: max token length = 256
Jun 22 19:06:41 tsisim ollama[1357]: load_tensors: loading model tensors, this can take a while... (mmap = false)
Jun 22 19:06:41 tsisim ollama[1357]: load_tensors: offloading 0 repeating layers to GPU
Jun 22 19:06:41 tsisim ollama[1357]: load_tensors: offloaded 0/25 layers to GPU
Jun 22 19:06:41 tsisim ollama[1357]: load_tensors: CPU model buffer size = 371.02 MiB
Jun 22 19:06:46 tsisim ollama[1357]: llama_context: constructing llama_context
Jun 22 19:06:46 tsisim ollama[1357]: llama_context: n_seq_max = 1
Jun 22 19:06:46 tsisim ollama[1357]: llama_context: n_ctx = 4096
Jun 22 19:06:46 tsisim ollama[1357]: llama_context: n_ctx_per_seq = 4096
Jun 22 19:06:46 tsisim ollama[1357]: llama_context: n_batch = 512
Jun 22 19:06:46 tsisim ollama[1357]: llama_context: n_ubatch = 512
Jun 22 19:06:46 tsisim ollama[1357]: llama_context: causal_attn = 1
Jun 22 19:06:46 tsisim ollama[1357]: llama_context: flash_attn = disabled
Jun 22 19:06:46 tsisim ollama[1357]: llama_context: kv_unified = false
Jun 22 19:06:46 tsisim ollama[1357]: llama_context: freq_base = 10000.0
Jun 22 19:06:46 tsisim ollama[1357]: llama_context: freq_scale = 1
Jun 22 19:06:46 tsisim ollama[1357]: llama_context: n_ctx_per_seq (4096) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
Jun 22 19:06:46 tsisim ollama[1357]: llama_context: CPU output buffer size = 0.58 MiB
Jun 22 19:06:46 tsisim ollama[1357]: llama_kv_cache: CPU KV buffer size = 384.00 MiB
Jun 22 19:06:51 tsisim ollama[1357]: llama_kv_cache: size = 384.00 MiB ( 4096 cells, 24 layers, 1/1 seqs), K (f16): 192.00 MiB, V (f16): 192.00 MiB
Jun 22 19:06:51 tsisim ollama[1357]: llama_context: tsavorite compute buffer size = 20.50 MiB
Jun 22 19:06:51 tsisim ollama[1357]: llama_context: CPU compute buffer size = 298.75 MiB
Jun 22 19:06:51 tsisim ollama[1357]: llama_context: graph nodes = 942
Jun 22 19:06:51 tsisim ollama[1357]: llama_context: graph splits = 196
Jun 22 19:06:51 tsisim ollama[1357]: time=2026-06-22T19:06:51.649Z level=INFO source=server.go:1312 msg="llama runner started in 15.60 seconds"
Jun 22 19:06:51 tsisim ollama[1357]: time=2026-06-22T19:06:51.650Z level=INFO source=sched.go:485 msg="loaded runners" count=1
Jun 22 19:06:51 tsisim ollama[1357]: time=2026-06-22T19:06:51.650Z level=INFO source=server.go:1274 msg="waiting for llama runner to start responding"
Jun 22 19:06:51 tsisim ollama[1357]: time=2026-06-22T19:06:51.653Z level=INFO source=server.go:1312 msg="llama runner started in 15.60 seconds"
Jun 22 19:11:49 tsisim ollama[1472]: TSI deploy yaml=/tsi/tsi-sw/anoop_ollama/ollama-arm64-release/bin/tsavorite-model-deployment.yaml txe_count=1 multi_thread_enable=0
Jun 22 19:11:49 tsisim ollama[1472]: finalize 4
Jun 22 19:11:49 tsisim ollama[1472]: OPU Profiling Results:
Jun 22 19:11:49 tsisim ollama[1472]: Profiler disabled
Jun 22 19:11:49 tsisim ollama[1357]: [GIN] 2026/06/22 - 19:11:49 | 200 | 5m17s | 127.0.0.1 | POST "/api/generate"
Jun 22 19:15:19 tsisim ollama[1357]: [GIN] 2026/06/22 - 19:15:19 | 200 | 99.2µs | 127.0.0.1 | HEAD "/"
Jun 22 19:15:19 tsisim ollama[1357]: [GIN] 2026/06/22 - 19:15:19 | 404 | 3.538958ms | 127.0.0.1 | POST "/api/show"
Jun 22 19:15:20 tsisim ollama[1357]: time=2026-06-22T19:15:20.839Z level=INFO source=download.go:177 msg="downloading 735af2139dc6 in 3 100 MB part(s)"
Jun 22 19:15:47 tsisim ollama[1357]: time=2026-06-22T19:15:47.109Z level=INFO source=download.go:177 msg="downloading 4b19ac7dd2fb in 1 476 B part(s)"
Jun 22 19:15:48 tsisim ollama[1357]: time=2026-06-22T19:15:48.353Z level=INFO source=download.go:177 msg="downloading 3e2c24001f9e in 1 8.4 KB part(s)"
Jun 22 19:15:49 tsisim ollama[1357]: time=2026-06-22T19:15:49.590Z level=INFO source=download.go:177 msg="downloading 339e884a40f6 in 1 61 B part(s)"
Jun 22 19:15:50 tsisim ollama[1357]: time=2026-06-22T19:15:50.851Z level=INFO source=download.go:177 msg="downloading 74156d92caf6 in 1 490 B part(s)"
Jun 22 19:15:53 tsisim ollama[1357]: [GIN] 2026/06/22 - 19:15:53 | 200 | 33.74036263s | 127.0.0.1 | POST "/api/pull"
Jun 22 19:15:54 tsisim ollama[1357]: [GIN] 2026/06/22 - 19:15:54 | 200 | 699.225569ms | 127.0.0.1 | POST "/api/show"
Jun 22 19:15:55 tsisim ollama[1357]: time=2026-06-22T19:15:55.534Z level=WARN source=server.go:1757 msg="llama server stopped" pid=1472
Jun 22 19:15:55 tsisim ollama[1357]: time=2026-06-22T19:15:55.870Z level=INFO source=sched.go:548 msg="updated VRAM based on existing loaded models" gpu=0 library=cpu total="3.3 GiB" available="2.9 GiB"
Jun 22 19:15:56 tsisim ollama[1357]: time=2026-06-22T19:15:56.557Z level=INFO source=server.go:218 msg="enabling flash attention"
Jun 22 19:15:56 tsisim ollama[1357]: time=2026-06-22T19:15:56.561Z level=INFO source=server.go:402 msg="starting runner" cmd="/tsi/tsi-sw/anoop_ollama/ollama-arm64-release/bin/ollama runner --ollama-engine --model /tsi/ollama-models/blobs/sha256-735af2139dc652bf01112746474883d79a52fa1c19038265d363e3d42556f7a2 --port 36747"
Jun 22 19:15:56 tsisim ollama[1357]: time=2026-06-22T19:15:56.566Z level=INFO source=server.go:678 msg="loading model" "model layers"=19 requested=-1
Jun 22 19:15:56 tsisim ollama[1357]: time=2026-06-22T19:15:56.571Z level=INFO source=server.go:684 msg="system memory" total="3.3 GiB" free="2.9 GiB" free_swap="8.0 GiB"
Jun 22 19:15:56 tsisim ollama[1357]: time=2026-06-22T19:15:56.759Z level=INFO source=runner.go:907 msg="starting llama runner"
Jun 22 19:15:59 tsisim ollama[1357]: time=2026-06-22T19:15:59.348Z level=INFO source=ggml.go:104 msg=system CPU.0.NEON=1 CPU.0.ARM_FMA=1 CPU.0.LLAMAFILE=1 compiler=cgo(gcc)
Jun 22 19:15:59 tsisim ollama[1357]: time=2026-06-22T19:15:59.355Z level=INFO source=runner.go:967 msg="Server listening on 127.0.0.1:36747"
Jun 22 19:15:59 tsisim ollama[1357]: time=2026-06-22T19:15:59.378Z level=INFO source=runner.go:830 msg=load request="{Operation:fit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:4 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Jun 22 19:15:59 tsisim ollama[1357]: time=2026-06-22T19:15:59.382Z level=INFO source=runner.go:830 msg=load request="{Operation:alloc LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:4 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Jun 22 19:15:59 tsisim ollama[1357]: time=2026-06-22T19:15:59.387Z level=INFO source=sched.go:485 msg="loaded runners" count=2
Jun 22 19:15:59 tsisim ollama[1357]: time=2026-06-22T19:15:59.387Z level=INFO source=server.go:1274 msg="waiting for llama runner to start responding"
Jun 22 19:15:59 tsisim ollama[1357]: time=2026-06-22T19:15:59.385Z level=INFO source=runner.go:830 msg=load request="{Operation:commit LoraPath:[] Parallel:1 BatchSize:512 FlashAttention:true KvSize:4096 KvCacheType: NumThreads:4 GPULayers:[] MultiUserCache:false ProjectorPath: MainGPU:0 UseMmap:false}"
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_load_from_file_impl: using device Tsavorite (txe) (unknown id) - 128 MiB free
Jun 22 19:15:59 tsisim ollama[1357]: time=2026-06-22T19:15:59.390Z level=INFO source=server.go:1308 msg="waiting for server to become available" status="llm server loading model"
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: loaded meta data with 36 key-value pairs and 236 tensors from /tsi/ollama-models/blobs/sha256-735af2139dc652bf01112746474883d79a52fa1c19038265d363e3d42556f7a2 (version GGUF V3 (latest))
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: - kv 0: general.architecture str = gemma3
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: - kv 1: general.type str = model
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: - kv 2: general.size_label str = 268M
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: - kv 3: general.license str = gemma
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: - kv 4: general.base_model.count u32 = 1
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: - kv 5: general.base_model.0.name str = Gemma 3 270m
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: - kv 6: general.base_model.0.organization str = Google
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: - kv 7: general.base_model.0.repo_url str = https://huggingface.co/google/gemma-3...
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: - kv 8: general.tags arr[str,4] = ["gemma3", "gemma", "google", "text-g...
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: - kv 9: gemma3.context_length u32 = 32768
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: - kv 10: gemma3.embedding_length u32 = 640
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: - kv 11: gemma3.block_count u32 = 18
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: - kv 12: gemma3.feed_forward_length u32 = 2048
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: - kv 13: gemma3.attention.head_count u32 = 4
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: - kv 14: gemma3.attention.layer_norm_rms_epsilon f32 = 0.000001
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: - kv 15: gemma3.attention.key_length u32 = 256
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: - kv 16: gemma3.attention.value_length u32 = 256
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: - kv 17: gemma3.rope.freq_base f32 = 1000000.000000
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: - kv 18: gemma3.attention.sliding_window u32 = 512
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: - kv 19: gemma3.attention.head_count_kv u32 = 1
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: - kv 20: tokenizer.ggml.model str = llama
Jun 22 19:15:59 tsisim ollama[1357]: llama_model_loader: - kv 21: tokenizer.ggml.pre str = default
Jun 22 19:16:00 tsisim ollama[1357]: llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,262144] = ["<pad>", "<eos>", "<bos>", "<unk>", ...
Jun 22 19:16:01 tsisim ollama[1357]: llama_model_loader: - kv 23: tokenizer.ggml.scores arr[f32,262144] = [-1000.000000, -1000.000000, -1000.00...
Jun 22 19:16:01 tsisim ollama[1357]: llama_model_loader: - kv 24: tokenizer.ggml.token_type arr[i32,262144] = [3, 3, 3, 3, 3, 4, 3, 3, 3, 3, 3, 3, ...
Jun 22 19:16:01 tsisim ollama[1357]: llama_model_loader: - kv 25: tokenizer.ggml.bos_token_id u32 = 2
Jun 22 19:16:01 tsisim ollama[1357]: llama_model_loader: - kv 26: tokenizer.ggml.eos_token_id u32 = 1
Jun 22 19:16:01 tsisim ollama[1357]: llama_model_loader: - kv 27: tokenizer.ggml.unknown_token_id u32 = 3
Jun 22 19:16:01 tsisim ollama[1357]: llama_model_loader: - kv 28: tokenizer.ggml.padding_token_id u32 = 0
Jun 22 19:16:01 tsisim ollama[1357]: llama_model_loader: - kv 29: tokenizer.ggml.add_bos_token bool = true
Jun 22 19:16:01 tsisim ollama[1357]: llama_model_loader: - kv 30: tokenizer.ggml.add_sep_token bool = false
Jun 22 19:16:01 tsisim ollama[1357]: llama_model_loader: - kv 31: tokenizer.ggml.add_eos_token bool = false
Jun 22 19:16:01 tsisim ollama[1357]: llama_model_loader: - kv 32: tokenizer.chat_template str = {{ bos_token }}\n{%- if messages[0]['r...
Jun 22 19:16:01 tsisim ollama[1357]: llama_model_loader: - kv 33: tokenizer.ggml.add_space_prefix bool = false
Jun 22 19:16:01 tsisim ollama[1357]: llama_model_loader: - kv 34: general.quantization_version u32 = 2
Jun 22 19:16:01 tsisim ollama[1357]: llama_model_loader: - kv 35: general.file_type u32 = 7
Jun 22 19:16:01 tsisim ollama[1357]: llama_model_loader: - type f32: 109 tensors
Jun 22 19:16:01 tsisim ollama[1357]: llama_model_loader: - type q8_0: 127 tensors
Jun 22 19:16:01 tsisim ollama[1357]: print_info: file format = GGUF V3 (latest)
Jun 22 19:16:01 tsisim ollama[1357]: print_info: file type = Q8_0
Jun 22 19:16:01 tsisim ollama[1357]: print_info: file size = 271.81 MiB (8.50 BPW)
Jun 22 19:16:02 tsisim ollama[1357]: load: special_eos_id is not in special_eog_ids - the tokenizer config may be incorrect
Jun 22 19:16:02 tsisim ollama[1357]: load: printing all EOG tokens:
Jun 22 19:16:02 tsisim ollama[1357]: load: - 1 ('<eos>')
Jun 22 19:16:02 tsisim ollama[1357]: load: - 106 ('<end_of_turn>')
Jun 22 19:16:02 tsisim ollama[1357]: load: special tokens cache size = 6414
Jun 22 19:16:02 tsisim ollama[1357]: load: token to piece cache size = 1.9446 MB
Jun 22 19:16:02 tsisim ollama[1357]: print_info: arch = gemma3
Jun 22 19:16:02 tsisim ollama[1357]: print_info: vocab_only = 0
Jun 22 19:16:02 tsisim ollama[1357]: print_info: n_ctx_train = 32768
Jun 22 19:16:02 tsisim ollama[1357]: print_info: n_embd = 640
Jun 22 19:16:02 tsisim ollama[1357]: print_info: n_layer = 18
Jun 22 19:16:02 tsisim ollama[1357]: print_info: n_head = 4
Jun 22 19:16:02 tsisim ollama[1357]: print_info: n_head_kv = 1
Jun 22 19:16:02 tsisim ollama[1357]: print_info: n_rot = 256
Jun 22 19:16:02 tsisim ollama[1357]: print_info: n_swa = 512
Jun 22 19:16:02 tsisim ollama[1357]: print_info: is_swa_any = 1
Jun 22 19:16:02 tsisim ollama[1357]: print_info: n_embd_head_k = 256
Jun 22 19:16:02 tsisim ollama[1357]: print_info: n_embd_head_v = 256
Jun 22 19:16:02 tsisim ollama[1357]: print_info: n_gqa = 4
Jun 22 19:16:02 tsisim ollama[1357]: print_info: n_embd_k_gqa = 256
Jun 22 19:16:02 tsisim ollama[1357]: print_info: n_embd_v_gqa = 256
Jun 22 19:16:02 tsisim ollama[1357]: print_info: f_norm_eps = 0.0e+00
Jun 22 19:16:02 tsisim ollama[1357]: print_info: f_norm_rms_eps = 1.0e-06
Jun 22 19:16:02 tsisim ollama[1357]: print_info: f_clamp_kqv = 0.0e+00
Jun 22 19:16:02 tsisim ollama[1357]: print_info: f_max_alibi_bias = 0.0e+00
Jun 22 19:16:02 tsisim ollama[1357]: print_info: f_logit_scale = 0.0e+00
Jun 22 19:16:02 tsisim ollama[1357]: print_info: f_attn_scale = 6.2e-02
Jun 22 19:16:02 tsisim ollama[1357]: print_info: n_ff = 2048
Jun 22 19:16:02 tsisim ollama[1357]: print_info: n_expert = 0
Jun 22 19:16:02 tsisim ollama[1357]: print_info: n_expert_used = 0
Jun 22 19:16:02 tsisim ollama[1357]: print_info: causal attn = 1
Jun 22 19:16:02 tsisim ollama[1357]: print_info: pooling type = 0
Jun 22 19:16:02 tsisim ollama[1357]: print_info: rope type = 2
Jun 22 19:16:02 tsisim ollama[1357]: print_info: rope scaling = linear
Jun 22 19:16:02 tsisim ollama[1357]: print_info: freq_base_train = 1000000.0
Jun 22 19:16:02 tsisim ollama[1357]: print_info: freq_scale_train = 1
Jun 22 19:16:02 tsisim ollama[1357]: print_info: n_ctx_orig_yarn = 32768
Jun 22 19:16:02 tsisim ollama[1357]: print_info: rope_finetuned = unknown
Jun 22 19:16:02 tsisim ollama[1357]: print_info: model type = 270M
Jun 22 19:16:02 tsisim ollama[1357]: print_info: model params = 268.10 M
Jun 22 19:16:02 tsisim ollama[1357]: print_info: general.name = n/a
Jun 22 19:16:02 tsisim ollama[1357]: print_info: vocab type = SPM
Jun 22 19:16:02 tsisim ollama[1357]: print_info: n_vocab = 262144
Jun 22 19:16:02 tsisim ollama[1357]: print_info: n_merges = 0
Jun 22 19:16:02 tsisim ollama[1357]: print_info: BOS token = 2 '<bos>'
Jun 22 19:16:02 tsisim ollama[1357]: print_info: EOS token = 1 '<eos>'
Jun 22 19:16:02 tsisim ollama[1357]: print_info: EOT token = 106 '<end_of_turn>'
Jun 22 19:16:02 tsisim ollama[1357]: print_info: UNK token = 3 '<unk>'
Jun 22 19:16:02 tsisim ollama[1357]: print_info: PAD token = 0 '<pad>'
Jun 22 19:16:02 tsisim ollama[1357]: print_info: LF token = 248 '<0x0A>'
Jun 22 19:16:02 tsisim ollama[1357]: print_info: EOG token = 1 '<eos>'
Jun 22 19:16:02 tsisim ollama[1357]: print_info: EOG token = 106 '<end_of_turn>'
Jun 22 19:16:02 tsisim ollama[1357]: print_info: max token length = 48
Jun 22 19:16:02 tsisim ollama[1357]: load_tensors: loading model tensors, this can take a while... (mmap = false)
Jun 22 19:16:02 tsisim ollama[1357]: load_tensors: offloading 0 repeating layers to GPU
Jun 22 19:16:02 tsisim ollama[1357]: load_tensors: offloaded 0/19 layers to GPU
Jun 22 19:16:02 tsisim ollama[1357]: load_tensors: CPU model buffer size = 271.81 MiB
Jun 22 19:16:06 tsisim ollama[1357]: llama_init_from_model: model default pooling_type is [0], but [-1] was specified
Jun 22 19:16:06 tsisim ollama[1357]: llama_context: constructing llama_context
Jun 22 19:16:06 tsisim ollama[1357]: llama_context: n_seq_max = 1
Jun 22 19:16:06 tsisim ollama[1357]: llama_context: n_ctx = 4096
Jun 22 19:16:06 tsisim ollama[1357]: llama_context: n_ctx_per_seq = 4096
Jun 22 19:16:06 tsisim ollama[1357]: llama_context: n_batch = 512
Jun 22 19:16:06 tsisim ollama[1357]: llama_context: n_ubatch = 512
Jun 22 19:16:06 tsisim ollama[1357]: llama_context: causal_attn = 1
Jun 22 19:16:06 tsisim ollama[1357]: llama_context: flash_attn = enabled
Jun 22 19:16:06 tsisim ollama[1357]: llama_context: kv_unified = false
Jun 22 19:16:06 tsisim ollama[1357]: llama_context: freq_base = 1000000.0
Jun 22 19:16:06 tsisim ollama[1357]: llama_context: freq_scale = 1
Jun 22 19:16:06 tsisim ollama[1357]: llama_context: n_ctx_per_seq (4096) < n_ctx_train (32768) -- the full capacity of the model will not be utilized
Jun 22 19:16:06 tsisim ollama[1357]: llama_context: CPU output buffer size = 1.00 MiB
Jun 22 19:16:06 tsisim ollama[1357]: llama_kv_cache_iswa: using full-size SWA cache (ref: https://github.com/ggml-org/llama.cpp/pull/13194#issuecomment-2868343055)
Jun 22 19:16:06 tsisim ollama[1357]: llama_kv_cache_iswa: creating non-SWA KV cache, size = 4096 cells
Jun 22 19:16:06 tsisim ollama[1357]: llama_kv_cache: CPU KV buffer size = 12.00 MiB
Jun 22 19:16:06 tsisim ollama[1357]: llama_kv_cache: size = 12.00 MiB ( 4096 cells, 3 layers, 1/1 seqs), K (f16): 6.00 MiB, V (f16): 6.00 MiB
Jun 22 19:16:06 tsisim ollama[1357]: llama_kv_cache_iswa: creating SWA KV cache, size = 4096 cells
Jun 22 19:16:06 tsisim ollama[1357]: llama_kv_cache: CPU KV buffer size = 60.00 MiB
Jun 22 19:16:07 tsisim ollama[1357]: llama_kv_cache: size = 60.00 MiB ( 4096 cells, 15 layers, 1/1 seqs), K (f16): 30.00 MiB, V (f16): 30.00 MiB
Jun 22 19:16:07 tsisim ollama[1357]: llama_context: tsavorite compute buffer size = 3.25 MiB
Jun 22 19:16:07 tsisim ollama[1357]: llama_context: CPU compute buffer size = 513.25 MiB
Jun 22 19:16:07 tsisim ollama[1357]: llama_context: graph nodes = 729
Jun 22 19:16:07 tsisim ollama[1357]: llama_context: graph splits = 258
Jun 22 19:16:07 tsisim ollama[1357]: time=2026-06-22T19:16:07.289Z level=INFO source=server.go:1312 msg="llama runner started in 10.73 seconds"
Jun 22 19:16:49 tsisim ollama[1357]: time=2026-06-22T19:16:49.772Z level=WARN source=server.go:1757 msg="llama server stopped" pid=1472
Jun 22 19:16:51 tsisim ollama[25460]: TSI deploy yaml=/tsi/tsi-sw/anoop_ollama/ollama-arm64-release/bin/tsavorite-model-deployment.yaml txe_count=1 multi_thread_enable=0
Jun 22 19:16:51 tsisim ollama[25460]: finalize 4
Jun 22 19:16:51 tsisim ollama[25460]: OPU Profiling Results:
Jun 22 19:16:51 tsisim ollama[25460]: Profiler disabled
Jun 22 19:16:51 tsisim ollama[1357]: [GIN] 2026/06/22 - 19:16:51 | 200 | 56.851042482s | 127.0.0.1 | POST "/api/generate"