Local AI Models
Run AI text actions privately on your own machine — no cloud, no API key, no cost
Why not built-in?
We would love to offer a one-click “Download & Run” experience for local AI models — and we will, as soon as the quality is there. Today’s small local models (2–7 billion parameters) are not yet good enough to meet our quality standard:
| Task | Cloud AI (GPT-4, Claude) | Local 7B | Local 27–35B (24+ GB VRAM or 32 GB RAM; RTX 5090 recommended) |
|---|---|---|---|
| Grammar correction (English, German, French) | Excellent | Decent | Good |
| Grammar correction (smaller languages) | Very good | Unreliable | Acceptable |
| Translation (common pairs like EN↔DE) | Excellent | Acceptable | Good |
| Translation (rare pairs like FI↔KO) | Good | Often wrong | Acceptable |
| Rewriting & summarization | Excellent | Basic | Good |
| Complex instructions | Excellent | Struggles | Decent |
We refuse to ship a built-in experience that gives you mediocre results. When local models reach cloud-level quality, we will add one-click support. Until then, you can set up local AI yourself in about five minutes using Ollama — a free, open-source tool that runs AI models on your machine.
When local AI makes sense
- Privacy — Your text never leaves your machine. No cloud contact for anything you process.
- Offline use — Works without an internet connection for AI. (A Pro license still re-checks online about once a month to stay valid; no usage or content is sent.)
- No API costs — No API key needed, no per-request charges.
- Experimentation — Try different open-source models and compare results.
Setup Guide: Ollama + PasteSuiteAI
This guide takes you from zero to a working local AI connection in about five minutes. Pick your operating system below:
Download Ollama
Visit ollama.com/download/windows and download the installer. Run it — no configuration needed, just click through.
Download a model
Open a terminal (press Win+R, type cmd, press Enter) and run:
This downloads a multilingual 7B model (~4.7 GB). It only needs to download once. See Recommended Models below for alternative choices based on your hardware.
Verify Ollama is running
Ollama runs as a background service automatically after installation. Verify it by running:
You should see the model you just downloaded in the list. If you get an error, start Ollama from the Start menu.
Connect PasteSuiteAI
Now open PasteSuiteAI and create a new connection:
- Go to Settings > Connections
- Click Add Connection and select Ollama (Local)
- The endpoint URL and auth settings are pre-filled automatically. Just fill in:
| Field | Value |
|---|---|
| Connection Name | Ollama Local |
| Model ID | qwen2.5:7b (or whichever model you downloaded) |
| Capabilities | Enable LLM (and Vision if using a vision model) |
Test it
Click Test Connection in the connection editor. You should see a success message. If the test works, click Done — you can now use local AI with any action in PasteSuiteAI.
Install Ollama
Visit ollama.com/download/mac and download the app. Drag it to your Applications folder and open it once to complete setup.
Alternatively, install via Homebrew:
Download a model
Open Terminal and run:
This downloads a multilingual 7B model (~4.7 GB). It only needs to download once. See Recommended Models below for alternative choices based on your hardware.
Verify Ollama is running
Ollama runs as a background service after installation. Verify:
You should see your downloaded model. If Ollama isn’t running, open it from Applications.
Connect PasteSuiteAI
Open PasteSuiteAI and create a new connection:
- Go to Settings > Connections
- Click Add Connection and select Ollama (Local)
- The endpoint URL and auth settings are pre-filled automatically. Just fill in:
| Field | Value |
|---|---|
| Connection Name | Ollama Local |
| Model ID | qwen2.5:7b (or whichever model you downloaded) |
| Capabilities | Enable LLM (and Vision if using a vision model) |
Test it
Click Test Connection in the connection editor. You should see a success message. Click Done — local AI is now ready to use with any action.
Install Ollama
Open a terminal and run the official installer:
This installs Ollama and sets it up as a systemd service that starts automatically.
Download a model
In the same terminal, run:
This downloads a multilingual 7B model (~4.7 GB). It only needs to download once. See Recommended Models below for alternative choices based on your hardware.
Verify Ollama is running
The service should start automatically. Verify:
If the service isn’t running, start it manually:
Connect PasteSuiteAI
Open PasteSuiteAI and create a new connection:
- Go to Settings > Connections
- Click Add Connection and select Ollama (Local)
- The endpoint URL and auth settings are pre-filled automatically. Just fill in:
| Field | Value |
|---|---|
| Connection Name | Ollama Local |
| Model ID | qwen2.5:7b (or whichever model you downloaded) |
| Capabilities | Enable LLM (and Vision if using a vision model) |
Test it
Click Test Connection in the connection editor. You should see a success message. Click Done — local AI is now ready to use with any action.
Browse All Models
Ollama hosts hundreds of open-source models. Browse the full catalog at ollama.com/search to find the best model for your needs. You can filter by size, capability, and popularity.
Recommended Models
Standard Hardware (8–16 GB RAM, no dedicated GPU)
| Need | Model | Capabilities | Command | Download |
|---|---|---|---|---|
| Translation & correction (many languages) | Qwen 2.5 7B | LLM | ollama pull qwen2.5:7b |
4.7 GB |
| Best English quality | Llama 3.1 8B | LLM | ollama pull llama3.1:8b |
4.9 GB |
| Small download, 140+ languages | Gemma 3 4B | LLM, Vision | ollama pull gemma3:4b |
3.3 GB |
Power User: NVIDIA RTX 4090 (24 GB VRAM)
With a high-end GPU like the RTX 4090, you can run much larger models that deliver quality approaching cloud AI. These models run entirely in GPU memory for fast inference:
| # | Model | Capabilities | Command | Size | Why |
|---|---|---|---|---|---|
| 1 | Qwen 2.5 32B | LLM | ollama pull qwen2.5:32b |
20 GB | Battle-tested multilingual champion. Near cloud-level for translation and correction. |
| 2 | GPT-OSS 20B | LLM | ollama pull gpt-oss:20b |
14 GB | OpenAI’s first open model. Strong at English and reasoning tasks. |
| 3 | Gemma 3 27B | LLM, Vision | ollama pull gemma3:27b |
17 GB | Google model, 140+ languages. |
| 4 | Qwen3.5 27B | LLM, Vision | ollama pull qwen3.5:27b |
17 GB | Multimodal with optional thinking mode. Strong multilingual, 256K context. |
| 5 | Qwen3.5 35B-A3B | LLM, Vision | ollama pull qwen3.5:35b-a3b |
24 GB | MoE: 35B total, only 3B active per token — fast inference. Multimodal. |
LM Studio (Alternative)
If you prefer a graphical interface over the terminal, LM Studio is an excellent alternative. It provides a desktop app where you can browse, download, and run models with a few clicks.
- Download LM Studio from lmstudio.ai and install it
- Search for a model (e.g., “Qwen 2.5 7B”) and click Download
- Go to the Local Server tab and click Start Server
- In PasteSuiteAI, create a LM Studio (Local) connection — the endpoint URL is pre-filled automatically
- Enter your model name as Model ID
Troubleshooting
| Problem | Solution |
|---|---|
| Connection test fails with “connection refused” | Ollama is not running. Start it from the Start menu (Windows), Applications (macOS), or sudo systemctl start ollama (Linux). |
| Very slow responses | Your model may be too large for your RAM and is swapping to disk. Try a smaller model like qwen3:4b. |
| Model not found error | The Model ID in PasteSuiteAI must match the exact name shown by ollama list, including the tag (e.g., qwen2.5:7b, not just qwen2.5). |
| Poor translation or correction quality | This is a known limitation of small local models. Try a larger model (14B, 32B) if your hardware allows it, or use a cloud connection for tasks that require high accuracy. |
| Ollama uses too much memory | Ollama keeps the model in memory for fast responses. If you need to free RAM, run ollama stop <model> to unload it. |
Related Topics
- Connections — Full connection configuration reference
- Actions — How actions use connections for AI routing
- Speech to Text — STT setup and recording actions
- Getting Started — First-time setup guide