Cloudron Forum


Ollama - Package Updates

Category: Ollama
34 Posts · 2 Posters · 4.8k Views
  • nebulon (Staff) · #2

    [0.1.0]

    • Initial version for Ollama
    • Package Updates · #3

      [0.2.0]

      • Update ollama to 0.12.6
      • Full Changelog
      • Ollama's app now supports searching when running DeepSeek-V3.1, Qwen3 and other models that support tool calling.
      • Flash attention is now enabled by default for Gemma 3, improving performance and memory utilization
      • Fixed issue where Ollama would hang while generating responses
      • Fixed issue where qwen3-coder would act in raw mode when using /api/generate or ollama run qwen3-coder <prompt>
      • Fixed qwen3-embedding providing invalid results
      • Ollama will now evict models correctly when num_gpu is set
      • Fixed issue where tool_index with a value of 0 would not be sent to the model
      • Thinking models now support structured outputs when using the /api/chat API
      • Ollama's app will now wait until Ollama is running to allow for a conversation to be started
      • Fixed issue where "think": false would show an error instead of being silently ignored
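For the structured-outputs change above, a request can pass a JSON schema in the `format` field of `/api/chat`. A minimal sketch, assuming a local Ollama on the default port; the model name and schema below are placeholders:

```python
import json

# Hypothetical request body for POST http://localhost:11434/api/chat.
# Per this release, the "format" schema is honored for thinking models too.
payload = {
    "model": "deepseek-r1",  # placeholder thinking-capable model
    "messages": [{"role": "user", "content": "Name one planet as JSON."}],
    "format": {  # JSON schema constraining the final answer
        "type": "object",
        "properties": {"planet": {"type": "string"}},
        "required": ["planet"],
    },
    "stream": False,
}

body = json.dumps(payload)  # send this as the request body
```

The response's `message.content` should then parse as JSON matching the schema.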
      • Package Updates · #4

        [0.3.0]

        • Fix wrong documentation URL in package info
        • Package Updates · #5

          [0.3.1]

          • Update ollama to 0.12.7
          • Full Changelog
          • Qwen3-VL is now available in all parameter sizes ranging from 2B to 235B
          • MiniMax-M2: a 230-billion-parameter model built for coding & agentic workflows, available on Ollama's cloud
          • Ollama's new app now includes a way to add one or many files when prompting the model
          • For better responses, thinking levels can now be adjusted for the gpt-oss models
          • New API documentation is available for Ollama's API: https://docs.ollama.com/api
          • Model load failures now include more information on Windows
          • Fixed embedding results being incorrect when running embeddinggemma
          • Fixed gemma3n on Vulkan backend
          • Increased time allocated for ROCm to discover devices
          • Fixed truncation error when generating embeddings
          • Package Updates · #6

            [0.4.0]

            • Update ollama to 0.12.9
            • Full Changelog
            • Fix performance regression on CPU-only systems
            • Package Updates · #7

              [0.5.0]

              • Breaking: Move the /api and /v1 endpoints to the main domain, avoiding the need for a secondary domain.
              • Breaking: Use an OpenAI-compatible API key instead of a JWT token. See the docs for how to use it.
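With the move to an OpenAI-compatible API key, clients authenticate with a standard `Authorization: Bearer` header against the main domain. A sketch under those assumptions; the URL, key, and model name are placeholders, see the package docs for the real values:

```python
import json

API_BASE = "https://ollama.example.com/v1"  # main domain, no secondary domain needed
API_KEY = "sk-placeholder"                  # OpenAI-style key from the app, not a JWT

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json",
}
payload = {
    "model": "llama3",  # placeholder model name
    "messages": [{"role": "user", "content": "Hello"}],
}
body = json.dumps(payload)
# POST f"{API_BASE}/chat/completions" with these headers and this body
```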
              • Package Updates · #8

                [1.0.0]

                • First stable package release with ollama 0.12.9
                • Package Updates · #9

                  [1.0.1]

                  • Update ollama to 0.12.10
                  • Full Changelog
                  • ollama run now works with embedding models
                  • Fixed errors when running qwen3-vl:235b and qwen3-vl:235b-instruct
                  • Enable flash attention for Vulkan (currently needs to be built from source)
                  • Add Vulkan memory detection for Intel GPU using DXGI+PDH
                  • Ollama will now return tool call IDs from the /api/chat API
                  • Fixed hanging due to CPU discovery
                  • Ollama will now show login instructions when switching to a cloud model in interactive mode
                  • Fix reading stale VRAM data
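Since `/api/chat` now returns tool call IDs, a client can echo the ID back with the tool result. A rough sketch; the tool definition shape and the `tool_call_id` field name follow the OpenAI-style convention and are assumptions here, not confirmed package behavior:

```python
# Hypothetical /api/chat request advertising one callable tool.
request_body = {
    "model": "qwen3",  # placeholder model with tool-calling support
    "messages": [{"role": "user", "content": "Weather in Paris?"}],
    "tools": [{
        "type": "function",
        "function": {
            "name": "get_weather",  # hypothetical tool name
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }],
}

def tool_result_message(tool_call_id: str, content: str) -> dict:
    """Build the follow-up message that echoes the returned tool call ID."""
    return {"role": "tool", "tool_call_id": tool_call_id, "content": content}
```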
                  • Package Updates · #10

                    [1.0.2]

                    • Update ollama to 0.12.11
                    • Full Changelog
                    • Ollama's API and the OpenAI-compatible API now support Logprobs
                    • Ollama's new app now supports WebP images
                    • Improved rendering performance in Ollama's new app, especially when rendering code
                    • The "required" field in tool definitions will now be omitted if not specified
                    • Fixed issue where "tool_call_id" would be omitted when using the OpenAI-compatible API.
                    • Fixed issue where ollama create would import data from both consolidated.safetensors and other safetensor files.
                    • Ollama will now prefer dedicated GPUs over iGPUs when scheduling models
                    • Vulkan can now be enabled by setting OLLAMA_VULKAN=1. For example: OLLAMA_VULKAN=1 ollama serve
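The new Logprobs support can be requested through the OpenAI-compatible endpoint. The parameter names below follow the OpenAI convention and are an assumption, as is the model name:

```python
import json

# Hypothetical body for POST /v1/chat/completions asking for log probabilities.
payload = {
    "model": "llama3",  # placeholder model name
    "messages": [{"role": "user", "content": "Say hi"}],
    "logprobs": True,   # return per-token log probabilities
    "top_logprobs": 3,  # also return the 3 most likely alternatives per token
}
body = json.dumps(payload)
```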
                    • Package Updates · #11

                      [1.1.0]

                      • Update ollama to 0.13.0
                      • Full Changelog
                      • DeepSeek-OCR is now supported
                      • DeepSeek-V3.1 architecture is now supported in Ollama's engine
                      • Fixed performance issues that arose in Ollama 0.12.11 on CUDA
                      • Fixed issue where Linux install packages were missing required Vulkan libraries
                      • Improved CPU and memory detection while in containers/cgroups
                      • Improved VRAM information detection for AMD GPUs
                      • Improved KV cache performance to no longer require defragmentation
                      • Package Updates · #12

                        [1.1.1]

                        • Update ollama to 0.13.1
                        • Full Changelog
                        • nomic-embed-text will now use Ollama's engine by default
                        • Tool calling support for cogito-v2.1
                        • Fixed issues with CUDA VRAM discovery
                        • Fixed link to docs in Ollama's app
                        • Fixed issue where models would be evicted on CPU-only systems
                        • Ollama will now better render errors instead of showing Unmarshal: errors
                        • Fixed issue where older CUDA GPUs would fail to be detected
                        • Added thinking and tool parsing for cogito-v2.1
                        • Package Updates · #13

                          [1.1.2]

                          • Increase the proxy read timeout to 1h
                          • Package Updates · #14

                            [1.1.3]

                            • Disable body size check within the app
                            • Package Updates · #15

                              [1.1.4]

                              • Update ollama to 0.13.3
                              • Package Updates · #16

                                [1.1.5]

                                • Update ollama to 0.13.4
                                • Full Changelog
                                • Nemotron 3 Nano: A new Standard for Efficient, Open, and Intelligent Agentic Models
                                • Olmo 3 and Olmo 3.1: A series of Open language models designed to enable the science of language models. These models are pre-trained on the Dolma 3 dataset and post-trained on the Dolci datasets.
                                • Enable Flash Attention automatically for models by default
                                • Fixed handling of long contexts with Gemma 3 models
                                • Fixed issue that would occur with Gemma 3 QAT models or other models imported with the Gemma 3 architecture
                                • Package Updates · #17

                                  [1.1.6]

                                  • Update ollama to 0.13.5
                                  • Full Changelog
                                  • Google's FunctionGemma is now available on Ollama
                                  • bert architecture models now run on Ollama's engine
                                  • Added built-in renderer & tool parsing capabilities for DeepSeek-V3.1
                                  • Fixed issue where nested properties in tools may not have been rendered properly
                                  • Package Updates · #18

                                    [1.2.0]

                                    • Update ollama to 0.14.0
                                    • Full Changelog
                                    • ollama run --experimental CLI will now open a new Ollama CLI that includes an agent loop and the bash tool
                                    • Anthropic API compatibility: support for the /v1/messages API
                                    • A new REQUIRES command for the Modelfile allows declaring which version of Ollama is required for the model
                                    • For older models, Ollama will avoid an integer underflow on low VRAM systems during memory estimation
                                    • More accurate VRAM measurements for AMD iGPUs
                                    • Ollama's app will now highlight Swift source code
                                    • An error is now returned when embeddings produce NaN or -Inf
                                    • Ollama's Linux install bundles now use zst compression
                                    • New experimental support for image generation models, powered by MLX
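The Anthropic compatibility noted above means a request in the Messages API shape can be pointed at `/v1/messages`. A minimal sketch, assuming a local instance on the default port; the model name is a placeholder:

```python
import json

url = "http://localhost:11434/v1/messages"  # Anthropic-compatible endpoint
payload = {
    "model": "llama3",  # placeholder model name
    "max_tokens": 256,  # required field in the Anthropic Messages schema
    "messages": [{"role": "user", "content": "Hello"}],
}
body = json.dumps(payload)  # send as the POST body
```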
                                    • Package Updates · #19

                                      [1.2.1]

                                      • Update ollama to 0.14.1
                                      • Full Changelog
                                      • Fixed macOS auto-update signature verification failure
                                      • Package Updates · #20

                                        [1.2.3]

                                        • Update ollama to 0.14.3
                                        • Full Changelog
                                        • Z-Image Turbo: a 6-billion-parameter text-to-image model from Alibaba's Tongyi Lab that generates high-quality photorealistic images.
                                        • Flux.2 Klein: Black Forest Labs' fastest image-generation models to date.
                                        • Fixed issue where Ollama's macOS app would interrupt system shutdown
                                        • Fixed ollama create and ollama show commands for experimental models
                                        • The /api/generate API can now be used for image generation
                                        • Fixed minor issues in Nemotron-3-Nano tool parsing
                                        • Fixed issue where removing an image generation model would cause it to first load
                                        • Fixed issue where ollama rm would only stop the first model in the list if it were running
                                        • Package Updates · #21

                                          [1.3.0]

                                          • Update ollama to 0.15.0
                                          • Full Changelog
                                          • A new ollama launch command to use Ollama's models with Claude Code, Codex, OpenCode, and Droid without separate configuration.
                                          • Fixed issue where creating multi-line strings with """ would not work when using ollama run
                                          • <kbd>Ctrl</kbd>+<kbd>J</kbd> and <kbd>Shift</kbd>+<kbd>Enter</kbd> now work for inserting newlines in ollama run
                                          • Reduced memory usage for GLM-4.7-Flash models
