Ollama is an open-source platform and toolkit for running large language models (LLMs) locally on your machine (macOS, Linux, or Windows). It lets you download, manage, customize, and run models like LLaMA 3.3, Gemma 3, Phi-4, DeepSeek-R1, Mistral, and more, without reliance on cloud APIs. Ollama is free and open-source under the MIT License.
Do I need an internet connection to use Ollama?
Only to download models. Once downloaded, models run entirely offline, giving you full privacy and local control.
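As a quick illustration, a model is pulled once while online and then everything runs against the locally stored weights (the model name below is just an example):

```bash
# One-time download while online (model name is an example; any model from the library works)
ollama pull llama3.2

# Afterwards, this runs fully offline against the locally stored weights
ollama run llama3.2 "Summarize the benefits of running models locally."
```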
Can I use Ollama with my own models?
Yes. Ollama supports importing custom models in GGUF format and lets you tweak prompts and parameters through a Modelfile.
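As a rough sketch (the file path and values below are placeholders, not official examples), a Modelfile that imports a local GGUF file, sets a sampling parameter, and adds a system prompt might look like this:

```
# Modelfile: build a custom model from a local GGUF file (path is a placeholder)
FROM ./my-model.Q4_K_M.gguf

# Adjust sampling behavior
PARAMETER temperature 0.7

# Default system message applied to every conversation
SYSTEM "You are a concise assistant that answers in plain English."
```

You would then register and run it with `ollama create my-model -f Modelfile` followed by `ollama run my-model`.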
Features
- Local-first and lightweight: Install and run LLMs via a simple CLI (`ollama run <model>`), leveraging your own CPU/GPU instead of cloud servers.
- Extensible and developer-friendly: Integrates well with Docker, a REST API, and libraries in Python and JavaScript (see the sketch after this list).
- Customizable models and prompts: Supports tweaking parameters, system messages (via a Modelfile), and importing custom GGUF/Safetensors models.
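To illustrate the REST API mentioned above, the minimal sketch below assumes Ollama is running locally on its default port (11434) and that a model such as `llama3.2` has already been pulled; treat it as an example, not a complete client:

```python
import json
import urllib.request

# Ollama's local REST endpoint for one-shot text generation (default port 11434).
URL = "http://localhost:11434/api/generate"

payload = {
    "model": "llama3.2",                 # assumes this model is already pulled
    "prompt": "Explain the GGUF format in one sentence.",
    "stream": False,                     # return a single JSON object instead of a stream
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    body = json.loads(resp.read().decode("utf-8"))

# The generated text is returned in the "response" field.
print(body["response"])
```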
What's New
- Fixed issue where tool calls without parameters would not be returned correctly
- Fixed `does not support generate` errors
- Fixed issue where some special tokens would not be tokenized properly for some model architectures
System Requirements
Platform & OS
- macOS: Big Sur (11) or later
- Linux: Ubuntu 18.04 / 20.04 / 22.04 or later
- Windows: Full support via native installer or WSL 2
CPU
- Minimum: Any modern x86-64 CPU supporting AVX2 (Intel/AMD)
- Recommended: 11th-gen Intel Core or newer, or AMD Zen 4, with AVX-512 support and ideally DDR5 memory
Memory (RAM)
- Minimum: 8GB
- Recommended: 16GB
GPU (Optional, but boosts performance significantly)
- Not required, but speeds up inference dramatically
- Minimum recommendation: NVIDIA or AMD GPU with 8 GB of VRAM for 7B–13B models
For large models:
- 30B model: 16 GB VRAM
- 65B model: 32 GB VRAM
- Very large models (70B+): 48 GB or more, possibly multi-GPU
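These figures follow a common rule of thumb rather than a hard requirement: memory use is roughly the parameter count times the bytes per weight for the chosen quantization, plus overhead for the KV cache and runtime. The back-of-the-envelope sketch below uses approximate quantization sizes and an assumed overhead factor (not Ollama-published numbers); actual needs also depend on context length.

```python
def approx_vram_gb(params_billion: float,
                   bytes_per_weight: float = 0.5,
                   overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weights * bytes-per-weight, padded ~20% for KV cache and runtime.

    bytes_per_weight ~= 0.5 for 4-bit quantization, 1.0 for 8-bit, 2.0 for FP16.
    """
    weights_gb = params_billion * bytes_per_weight  # 1B params at 1 byte/weight ~= 1 GB
    return weights_gb * overhead

# Rough estimates at 4-bit quantization; real usage varies with context length and runtime.
for size in (7, 13, 30, 65, 70):
    print(f"{size}B -> ~{approx_vram_gb(size):.1f} GB")
```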