Local AI Models

Run AI text actions privately on your own machine — no cloud, no API key, no cost

Why not built-in?

We would love to offer a one-click “Download & Run” experience for local AI models — and we will, as soon as the quality is there. Today’s small local models (2–7 billion parameters) are not yet good enough to meet our quality standard:

Task | Cloud AI (GPT-4, Claude) | Local 7B | Local 27–35B (24+ GB VRAM or 32 GB RAM; RTX 5090 recommended)
Grammar correction (English, German, French) | Excellent | Decent | Good
Grammar correction (smaller languages) | Very good | Unreliable | Acceptable
Translation (common pairs like EN↔DE) | Excellent | Acceptable | Good
Translation (rare pairs like FI↔KO) | Good | Often wrong | Acceptable
Rewriting & summarization | Excellent | Basic | Good
Complex instructions | Excellent | Struggles | Decent

We refuse to ship a built-in experience that gives you mediocre results. When local models reach cloud-level quality, we will add one-click support. Until then, you can set up local AI yourself in about five minutes using Ollama — a free, open-source tool that runs AI models on your machine.

Good news: Local speech-to-text (Whisper) is a different story — it already matches cloud quality. PasteSuiteAI includes built-in one-click Whisper support for local transcription. See Speech to Text for setup details.

When local AI makes sense

Hardware requirements: Local AI models need significant RAM. A 7B model needs at least 8 GB of RAM (16 GB recommended). A dedicated GPU speeds things up considerably but is not strictly required — CPU-only inference works, just slower.
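
As a rough illustration of where these RAM figures come from, the sketch below estimates memory use from the parameter count. The 4.5 bits per weight and the 1.5 GB of overhead are assumptions chosen for illustration (typical of 4-bit quantized downloads), not values published by Ollama or PasteSuiteAI, so treat the result as a ballpark only.

```python
def estimate_model_ram_gb(params_billion: float,
                          bits_per_weight: float = 4.5,
                          overhead_gb: float = 1.5) -> float:
    """Rough RAM needed to load a quantized model, in GB.

    bits_per_weight and overhead_gb are illustrative assumptions:
    roughly 4-bit quantization plus room for context and runtime buffers.
    """
    # 1 billion parameters at 8 bits each is about 1 GB of weights.
    weights_gb = params_billion * bits_per_weight / 8
    return round(weights_gb + overhead_gb, 1)


if __name__ == "__main__":
    for size in (4, 7, 32):
        print(f"{size}B model: about {estimate_model_ram_gb(size)} GB")
```

The estimate covers the model alone. The operating system and other applications need memory as well, which is why 16 GB is the more comfortable choice for a 7B model.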

Setup Guide: Ollama + PasteSuiteAI

This guide takes you from zero to a working local AI connection in about five minutes. Pick your operating system below:

Windows

Step 1: Download Ollama

Visit ollama.com/download/windows and download the installer. Run it — no configuration needed, just click through.

Step 2: Download a model

Open a terminal (press Win+R, type cmd, press Enter) and run:

> ollama pull qwen2.5:7b

This downloads a multilingual 7B model (~4.7 GB). It only needs to download once. See Recommended Models below for alternative choices based on your hardware.

Step 3: Verify Ollama is running

Ollama runs as a background service automatically after installation. Verify it by running:

> ollama list

You should see the model you just downloaded in the list. If you get an error, start Ollama from the Start menu.
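
The same check can be done without the command line by asking Ollama's local HTTP API for its model list. This is a minimal sketch that assumes Ollama is listening on its default address, http://localhost:11434, and uses the /api/tags endpoint. The function names are made up for this example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # assumed default local address


def parse_model_names(tags_json: str) -> list[str]:
    """Extract model names (including tags) from an /api/tags response body."""
    data = json.loads(tags_json)
    return [model["name"] for model in data.get("models", [])]


def list_local_models(base_url: str = OLLAMA_URL, timeout: float = 5.0) -> list[str]:
    """Ask a running Ollama instance which models are installed.

    Raises OSError (for example "connection refused") if Ollama is not running.
    """
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return parse_model_names(resp.read().decode("utf-8"))
```

Calling list_local_models() while Ollama is running should return the same names that ollama list prints, for example qwen2.5:7b.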

Step 4: Connect PasteSuiteAI

Now open PasteSuiteAI and create a new connection:

  1. Go to Settings > Connections
  2. Click Add Connection and select Ollama (Local)
  3. The endpoint URL and auth settings are pre-filled automatically. Just fill in:
Field | Value
Connection Name | Ollama Local
Model ID | qwen2.5:7b (or whichever model you downloaded)
Capabilities | Enable LLM (and Vision if using a vision model)

Step 5: Test it

Click Test Connection in the connection editor. You should see a success message. If the test works, click Done — you can now use local AI with any action in PasteSuiteAI.

If you want Ollama to be your default for all AI actions, click the LLM star (★) on the new connection to make it the default LLM provider.
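
If the connection test fails and you want to rule out PasteSuiteAI as the cause, you can send a prompt to Ollama directly. The sketch below uses Ollama's /api/generate endpoint with streaming turned off. It is an independent check under the assumption of a default local install, not a description of how PasteSuiteAI talks to Ollama internally, and the function names are hypothetical.

```python
import json
import urllib.request


def build_generate_request(model: str, prompt: str,
                           base_url: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a non-streaming request for Ollama's /api/generate endpoint."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt and return the generated text.

    The first call can be slow because the model has to be loaded into memory.
    """
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]
```

For example, generate("qwen2.5:7b", "Correct this sentence: He go to school.") should return a corrected sentence if the model is installed and Ollama is running.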

macOS

Note: PasteSuiteAI is currently Windows-only. These instructions set up Ollama on macOS, which is useful if you run Ollama on a Mac and connect to it from a Windows machine over the network.

Step 1: Install Ollama

Visit ollama.com/download/mac and download the app. Drag it to your Applications folder and open it once to complete setup.

Alternatively, install via Homebrew:

$ brew install ollama

Step 2: Download a model

Open Terminal and run:

$ ollama pull qwen2.5:7b

This downloads a multilingual 7B model (~4.7 GB). It only needs to download once. See Recommended Models below for alternative choices based on your hardware.

Apple Silicon Macs (M1/M2/M3/M4) run local models very well thanks to their unified memory architecture. Even a MacBook Air with 16 GB RAM handles 7B models comfortably.

Step 3: Verify Ollama is running

Ollama runs as a background service after installation. Verify:

$ ollama list

You should see your downloaded model. If Ollama isn’t running, open it from Applications.

Step 4: Connect PasteSuiteAI

Open PasteSuiteAI and create a new connection:

  1. Go to Settings > Connections
  2. Click Add Connection and select Ollama (Local)
  3. The endpoint URL and auth settings are pre-filled automatically. Just fill in:
Field | Value
Connection Name | Ollama Local
Model ID | qwen2.5:7b (or whichever model you downloaded)
Capabilities | Enable LLM (and Vision if using a vision model)

Step 5: Test it

Click Test Connection in the connection editor. You should see a success message. Click Done — local AI is now ready to use with any action.

If you want Ollama to be your default for all AI actions, click the LLM star (★) on the new connection to make it the default LLM provider.

Linux

Note: PasteSuiteAI is currently Windows-only. These instructions set up Ollama on Linux, which is useful if you run Ollama on a Linux server and connect to it from a Windows machine over the network.

Step 1: Install Ollama

Open a terminal and run the official installer:

$ curl -fsSL https://ollama.com/install.sh | sh

This installs Ollama and sets it up as a systemd service that starts automatically.

Step 2: Download a model

In the same terminal, run:

$ ollama pull qwen2.5:7b

This downloads a multilingual 7B model (~4.7 GB). It only needs to download once. See Recommended Models below for alternative choices based on your hardware.

If you have an NVIDIA GPU, Ollama automatically uses CUDA acceleration. For AMD GPUs, ROCm support is available on supported distributions.

Step 3: Verify Ollama is running

The service should start automatically. Verify:

$ ollama list

If the service isn’t running, start it manually:

$ sudo systemctl start ollama

Step 4: Connect PasteSuiteAI

Open PasteSuiteAI and create a new connection:

  1. Go to Settings > Connections
  2. Click Add Connection and select Ollama (Local)
  3. The endpoint URL and auth settings are pre-filled automatically. Just fill in:
Field | Value
Connection Name | Ollama Local
Model ID | qwen2.5:7b (or whichever model you downloaded)
Capabilities | Enable LLM (and Vision if using a vision model)

Step 5: Test it

Click Test Connection in the connection editor. You should see a success message. Click Done — local AI is now ready to use with any action.

If you want Ollama to be your default for all AI actions, click the LLM star (★) on the new connection to make it the default LLM provider.
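
The macOS and Linux notes above mention connecting from a Windows machine to Ollama running on another computer. By default Ollama listens only on 127.0.0.1, so the remote machine usually needs the OLLAMA_HOST environment variable set (for example to 0.0.0.0) before other machines can reach it. The sketch below is one way to check from the Windows side whether the port is reachable at all. The IP address in the example is hypothetical, and the helper name is invented for this illustration.

```python
import socket


def is_port_open(host: str, port: int = 11434, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host, and timeouts.
        return False


if __name__ == "__main__":
    remote = "192.168.1.50"  # hypothetical address of the Mac or Linux machine
    state = "reachable" if is_port_open(remote) else "not reachable"
    print(f"Ollama port on {remote} is {state}")
```

If the port is not reachable, a firewall on the remote machine or the default localhost-only binding are the usual suspects. In the PasteSuiteAI connection, the pre-filled endpoint URL would then need the remote machine's address in place of localhost.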

Browse All Models

Ollama hosts hundreds of open-source models. Browse the full catalog at ollama.com/search to find the best model for your needs. You can filter by size, capability, and popularity.

Standard Hardware (8–16 GB RAM, no dedicated GPU)

Need | Model | Capabilities | Command | Download
Translation & correction (many languages) | Qwen 2.5 7B | LLM | ollama pull qwen2.5:7b | 4.7 GB
Best English quality | Llama 3.1 8B | LLM | ollama pull llama3.1:8b | 4.9 GB
Small download, 140+ languages | Gemma 3 4B | LLM, Vision | ollama pull gemma3:4b | 3.3 GB

Power User: NVIDIA RTX 4090 (24 GB VRAM)

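With a high-end GPU like the RTX 4090, you can run much larger models that deliver quality approaching cloud AI. Models that fit entirely in GPU memory, with some room left for context, give the fastest inference; the largest entry below is about as big as the card's 24 GB and may partly spill into system RAM: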

# | Model | Capabilities | Command | Size | Why
1 | Qwen 2.5 32B | LLM | ollama pull qwen2.5:32b | 20 GB | Battle-tested multilingual champion. Near cloud-level for translation and correction.
2 | GPT-OSS 20B | LLM | ollama pull gpt-oss:20b | 14 GB | OpenAI’s first open-weight model since GPT-2. Strong at English and reasoning tasks.
3 | Gemma 3 27B | LLM, Vision | ollama pull gemma3:27b | 17 GB | Google model, 140+ languages.
4 | Qwen3.5 27B | LLM, Vision | ollama pull qwen3.5:27b | 17 GB | Multimodal with optional thinking mode. Strong multilingual, 256K context.
5 | Qwen3.5 35B-A3B | LLM, Vision | ollama pull qwen3.5:35b-a3b | 24 GB | MoE: 35B total, only 3B active per token, so inference is fast. Multimodal.

Rule of thumb: Bigger models = better results. A 27B+ model running locally on a good GPU will give you noticeably better corrections and translations than a 7B model. If your GPU has the VRAM for it, always pick the larger model.

Don’t see a model that fits? Browse the complete catalog at ollama.com/search, where hundreds of models are available and new ones are added every week. Filter by size to find what fits your hardware.
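
To make the rule of thumb concrete, the sketch below lists which of the models from the tables above would fit into a given amount of RAM or VRAM. The sizes are the download sizes quoted in those tables. The 2 GB of headroom for context and runtime buffers is an assumption for illustration, and real needs vary with context length.

```python
# (model tag, download size in GB), copied from the tables above
MODELS = [
    ("gemma3:4b", 3.3),
    ("qwen2.5:7b", 4.7),
    ("llama3.1:8b", 4.9),
    ("gpt-oss:20b", 14.0),
    ("gemma3:27b", 17.0),
    ("qwen3.5:27b", 17.0),
    ("qwen2.5:32b", 20.0),
    ("qwen3.5:35b-a3b", 24.0),
]


def models_that_fit(memory_gb: float, headroom_gb: float = 2.0) -> list[str]:
    """Return the tags whose download size fits in memory_gb minus headroom.

    headroom_gb is an assumed reserve for context and buffers, not a measured value.
    """
    budget = memory_gb - headroom_gb
    return [tag for tag, size_gb in MODELS if size_gb <= budget]


if __name__ == "__main__":
    for memory in (8, 16, 24):
        print(f"{memory} GB: {', '.join(models_that_fit(memory))}")
```

The last entry of each result is the largest model that fits, which by the rule of thumb is usually the one to try first.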

LM Studio (Alternative)

If you prefer a graphical interface over the terminal, LM Studio is an excellent alternative. It provides a desktop app where you can browse, download, and run models with a few clicks.

  1. Download LM Studio from lmstudio.ai and install it
  2. Search for a model (e.g., “Qwen 2.5 7B”) and click Download
  3. Go to the Local Server tab and click Start Server
  4. In PasteSuiteAI, create an LM Studio (Local) connection. The endpoint URL is pre-filled automatically.
  5. Enter your model name as Model ID

Note: LM Studio’s default port is 1234, while Ollama uses 11434. Make sure the endpoint URL matches the tool you are using.
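
A mismatched port is easy to overlook, so here is a small sketch that checks whether an endpoint URL uses the default port of the tool you picked. The port numbers come from the note above. The helper names are invented for this example, and the exact URL path that PasteSuiteAI pre-fills is not shown here because this page does not state it.

```python
from urllib.parse import urlparse

# Default local ports as stated in the note above.
DEFAULT_PORTS = {"ollama": 11434, "lmstudio": 1234}


def local_endpoint(tool: str, host: str = "localhost") -> str:
    """Build the base URL for a tool running with its default port."""
    return f"http://{host}:{DEFAULT_PORTS[tool.lower()]}"


def endpoint_matches_tool(url: str, tool: str) -> bool:
    """Return True if the URL's port is the default port for the given tool."""
    return urlparse(url).port == DEFAULT_PORTS[tool.lower()]


if __name__ == "__main__":
    print(local_endpoint("ollama"))
    print(endpoint_matches_tool("http://localhost:1234/v1", "ollama"))
```

If you changed the port in either tool's settings, the defaults above no longer apply and the endpoint URL must use your custom port instead.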

Troubleshooting

Problem | Solution
Connection test fails with “connection refused” | Ollama is not running. Start it from the Start menu (Windows), Applications (macOS), or sudo systemctl start ollama (Linux).
Very slow responses | Your model may be too large for your RAM and is swapping to disk. Try a smaller model like qwen3:4b.
Model not found error | The Model ID in PasteSuiteAI must match the exact name shown by ollama list, including the tag (e.g., qwen2.5:7b, not just qwen2.5).
Poor translation or correction quality | This is a known limitation of small local models. Try a larger model (14B, 32B) if your hardware allows it, or use a cloud connection for tasks that require high accuracy.
Ollama uses too much memory | Ollama keeps the model in memory for fast responses. If you need to free RAM, run ollama stop <model> to unload it.
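
The “Model not found” case usually comes down to a missing or different tag. The sketch below shows one way to compare the Model ID you entered against the names printed by ollama list. The function and its messages are hypothetical and only illustrate the exact-match rule described in the table.

```python
def check_model_id(model_id: str, installed: list[str]) -> str:
    """Compare a Model ID against installed model names (as shown by `ollama list`)."""
    if model_id in installed:
        return "ok"
    # Same model family but a different or missing tag, e.g. "qwen2.5" vs "qwen2.5:7b".
    family = model_id.split(":")[0]
    same_family = [name for name in installed if name.split(":")[0] == family]
    if same_family:
        return f"tag mismatch, did you mean {same_family[0]}?"
    return f"not installed, try: ollama pull {model_id}"


if __name__ == "__main__":
    installed_models = ["qwen2.5:7b", "gemma3:4b"]  # example output of `ollama list`
    for candidate in ("qwen2.5:7b", "qwen2.5", "llama3.1:8b"):
        print(candidate, "->", check_model_id(candidate, installed_models))
```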
