Local AI Models

Run AI text actions privately on your own machine — no cloud, no API key, no cost

Why not built-in?

We would love to offer a one-click “Download & Run” experience for local AI models — and we will, as soon as the quality is there. Today’s small local models (2–7 billion parameters) are not yet good enough to meet our quality standard:

Task	Cloud AI (GPT-4, Claude)	Local 7B	Local 27–35B (24+ GB VRAM or 32 GB RAM; RTX 5090 recommended)
Grammar correction (English, German, French)	Excellent	Decent	Good
Grammar correction (smaller languages)	Very good	Unreliable	Acceptable
Translation (common pairs like EN↔DE)	Excellent	Acceptable	Good
Translation (rare pairs like FI↔KO)	Good	Often wrong	Acceptable
Rewriting & summarization	Excellent	Basic	Good
Complex instructions	Excellent	Struggles	Decent

We refuse to ship a built-in experience that gives you mediocre results. When local models reach cloud-level quality, we will add one-click support. Until then, you can set up local AI yourself in about five minutes using Ollama — a free, open-source tool that runs AI models on your machine.

Good news: Local speech-to-text (Whisper) is a different story — it already matches cloud quality. PasteSuiteAI includes built-in one-click Whisper support for local transcription. See Speech to Text for setup details.

When local AI makes sense

Privacy — Your text never leaves your machine. No cloud contact for anything you process.
Offline use — AI actions work without an internet connection. (A Pro license still re-checks online about once a month to stay valid; no usage data or content is sent.)
No API costs — No API key needed, no per-request charges.
Experimentation — Try different open-source models and compare results.

Hardware requirements: Local AI models need significant RAM. A 7B model needs at least 8 GB of RAM (16 GB recommended). A dedicated GPU speeds things up considerably but is not strictly required — CPU-only inference works, just slower.
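The RAM figures above follow from simple arithmetic: a model's weights occupy roughly parameters × bits per weight ÷ 8 bytes. The sketch below assumes 4-bit quantized weights (an assumption; this page does not state which quantization the recommended tags use), and `estimate_weights_gb` is a hypothetical helper name. It covers the weights only. The context cache, the runtime, and the operating system need memory on top, which is why 8 GB is the practical minimum for a 7B model.

```shell
# Rough weights-only memory estimate in GB (10^9 bytes).
# $1 = parameters in billions, $2 = bits per weight (default 4, an assumption).
estimate_weights_gb() {
  awk -v p="$1" -v b="${2:-4}" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_weights_gb 7      # prints 3.5
estimate_weights_gb 32 4   # prints 16.0
```

The download sizes listed in this guide are somewhat larger than this estimate (about 4.7 GB for `qwen2.5:7b`), so treat the result as a floor rather than a prediction.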

Setup Guide: Ollama + PasteSuiteAI

This guide takes you from zero to a working local AI connection in about five minutes. Pick your operating system below:

Windows

Download Ollama

Visit ollama.com/download/windows and download the installer. Run it — no configuration needed, just click through.

Download a model

Open a terminal (press Win+R, type cmd, press Enter) and run:

> ollama pull qwen2.5:7b

This downloads a multilingual 7B model (~4.7 GB). It only needs to download once. See Recommended Models below for alternative choices based on your hardware.

Verify Ollama is running

Ollama runs as a background service automatically after installation. Verify it by running:

> ollama list

You should see the model you just downloaded in the list. If you get an error, start Ollama from the Start menu.

Connect PasteSuiteAI

Now open PasteSuiteAI and create a new connection:

Go to Settings > Connections
Click Add Connection and select Ollama (Local)
The endpoint URL and auth settings are pre-filled automatically. Just fill in:

Field	Value
Connection Name	Ollama Local
Model ID	`qwen2.5:7b` (or whichever model you downloaded)
Capabilities	Enable LLM (and Vision if using a vision model)

Test it

Click Test Connection in the connection editor. You should see a success message. If the test works, click Done — you can now use local AI with any action in PasteSuiteAI.

If you want Ollama to be your default for all AI actions, click the LLM star (★) on the new connection to make it the default LLM provider.

macOS

Note: PasteSuiteAI is currently Windows-only. These instructions set up Ollama on macOS, which is useful if you run Ollama on a Mac and connect to it from a Windows machine over the network.

Install Ollama

Visit ollama.com/download/mac and download the app. Drag it to your Applications folder and open it once to complete setup.

Alternatively, install via Homebrew:

$ brew install ollama

Download a model

Open Terminal and run:

$ ollama pull qwen2.5:7b

This downloads a multilingual 7B model (~4.7 GB). It only needs to download once. See Recommended Models below for alternative choices based on your hardware.

Apple Silicon Macs (M1/M2/M3/M4) run local models very well thanks to their unified memory architecture. Even a MacBook Air with 16 GB RAM handles 7B models comfortably.

Verify Ollama is running

Ollama runs as a background service after installation. Verify:

$ ollama list

You should see your downloaded model. If Ollama isn’t running, open it from Applications.
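PasteSuiteAI reaches Ollama through an endpoint URL, so it can be worth confirming that the HTTP endpoint answers as well. This is a minimal sketch, assuming `curl` is installed and Ollama uses its default port 11434. `ollama_reachable` is a hypothetical helper name, not part of Ollama, and the IP address in the usage comment is only an example.

```shell
# Report whether an Ollama server answers at the given base URL.
# $1 = base URL (defaults to the local default port 11434).
ollama_reachable() {
  url="${1:-http://localhost:11434}"
  if curl -fsS --max-time 5 "$url/api/tags" >/dev/null 2>&1; then
    echo "reachable: $url"
  else
    echo "not reachable: $url"
    return 1
  fi
}

# Usage:
#   ollama_reachable                             # on the Mac itself
#   ollama_reachable http://192.168.1.50:11434   # from another machine (example address)
```

If the check succeeds on the Mac but fails from the Windows machine, Ollama is probably listening only on 127.0.0.1, which is its default. The Ollama FAQ describes setting the `OLLAMA_HOST` environment variable to accept network connections (on macOS with `launchctl setenv OLLAMA_HOST "0.0.0.0:11434"`, followed by restarting the app).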

Connect PasteSuiteAI

On your Windows machine, open PasteSuiteAI and create a new connection:

Go to Settings > Connections
Click Add Connection and select Ollama (Local)
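The endpoint URL and auth settings are pre-filled for an Ollama server on the same machine. Because Ollama runs on your Mac in this setup, replace the host in the endpoint URL with the Mac's network address, then fill in: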

Field	Value
Connection Name	Ollama Local
Model ID	`qwen2.5:7b` (or whichever model you downloaded)
Capabilities	Enable LLM (and Vision if using a vision model)

Test it

Click Test Connection in the connection editor. You should see a success message. Click Done — local AI is now ready to use with any action.

If you want Ollama to be your default for all AI actions, click the LLM star (★) on the new connection to make it the default LLM provider.

Linux

Note: PasteSuiteAI is currently Windows-only. These instructions set up Ollama on Linux, which is useful if you run Ollama on a Linux server and connect to it from a Windows machine over the network.

Install Ollama

Open a terminal and run the official installer:

$ curl -fsSL https://ollama.com/install.sh | sh

This installs Ollama and sets it up as a systemd service that starts automatically.

Download a model

In the same terminal, run:

$ ollama pull qwen2.5:7b

This downloads a multilingual 7B model (~4.7 GB). It only needs to download once. See Recommended Models below for alternative choices based on your hardware.

If you have an NVIDIA GPU, Ollama automatically uses CUDA acceleration. For AMD GPUs, ROCm support is available on supported distributions.

Verify Ollama is running

The service should start automatically. Verify:

$ ollama list

If the service isn’t running, start it manually:

$ sudo systemctl start ollama
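For the network scenario described in the note above, the server also has to accept connections from other machines. By default Ollama listens only on 127.0.0.1. This is a sketch of one way to change that on a systemd install, following the approach in the Ollama FAQ: a drop-in file that sets `OLLAMA_HOST`. The helper name `ollama_dropin` is hypothetical, and the drop-in path assumes the service is named `ollama`, as in the `systemctl` command above.

```shell
# Print a systemd drop-in that makes Ollama listen on the given address.
# $1 = bind address (default 0.0.0.0:11434, i.e. all network interfaces).
ollama_dropin() {
  printf '[Service]\nEnvironment="OLLAMA_HOST=%s"\n' "${1:-0.0.0.0:11434}"
}

# Usage (writes the drop-in, then reloads systemd and restarts the service):
#   sudo mkdir -p /etc/systemd/system/ollama.service.d
#   ollama_dropin | sudo tee /etc/systemd/system/ollama.service.d/override.conf
#   sudo systemctl daemon-reload
#   sudo systemctl restart ollama
```

Ollama's API has no built-in authentication, so expose it only on a network you trust.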

Connect PasteSuiteAI

On your Windows machine, open PasteSuiteAI and create a new connection:

Go to Settings > Connections
Click Add Connection and select Ollama (Local)
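The endpoint URL and auth settings are pre-filled for an Ollama server on the same machine. Because Ollama runs on your Linux server in this setup, replace the host in the endpoint URL with the server's network address, then fill in: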

Field	Value
Connection Name	Ollama Local
Model ID	`qwen2.5:7b` (or whichever model you downloaded)
Capabilities	Enable LLM (and Vision if using a vision model)

Test it

Click Test Connection in the connection editor. You should see a success message. Click Done — local AI is now ready to use with any action.
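If Test Connection fails, sending one request to Ollama directly helps separate a server problem from a connection-settings problem. The sketch below builds a request body for Ollama's `/api/generate` endpoint. `generate_body` is a hypothetical helper, and it assumes the prompt contains no double quotes or backslashes, since it does no JSON escaping.

```shell
# Build a non-streaming request body for Ollama's /api/generate endpoint.
# $1 = model tag, $2 = prompt (must not contain " or \ because nothing is escaped).
generate_body() {
  printf '{"model":"%s","prompt":"%s","stream":false}\n' "$1" "$2"
}

# Usage (replace localhost with the server address when testing remotely):
#   generate_body qwen2.5:7b "Fix the grammar: she go to school" |
#     curl -s http://localhost:11434/api/generate -d @-
```

A JSON reply containing a `response` field means Ollama and the model work, so the problem is more likely in the connection settings.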

If you want Ollama to be your default for all AI actions, click the LLM star (★) on the new connection to make it the default LLM provider.

Browse All Models

Ollama hosts hundreds of open-source models. Browse the full catalog at ollama.com/search to find the best model for your needs. You can filter by size, capability, and popularity.

Recommended Models

Standard Hardware (8–16 GB RAM, no dedicated GPU)

Need	Model	Capabilities	Command	Download
Translation & correction (many languages)	Qwen 2.5 7B	LLM	`ollama pull qwen2.5:7b`	4.7 GB
Best English quality	Llama 3.1 8B	LLM	`ollama pull llama3.1:8b`	4.9 GB
Small download, 140+ languages	Gemma 3 4B	LLM, Vision	`ollama pull gemma3:4b`	3.3 GB

Power User: NVIDIA RTX 4090 (24 GB VRAM)

With a high-end GPU like the RTX 4090, you can run much larger models that deliver quality approaching cloud AI. These models run entirely in GPU memory for fast inference:

#	Model	Capabilities	Command	Size	Why
1	Qwen 2.5 32B	LLM	`ollama pull qwen2.5:32b`	20 GB	Battle-tested multilingual champion. Near cloud-level for translation and correction.
2	GPT-OSS 20B	LLM	`ollama pull gpt-oss:20b`	14 GB	OpenAI’s first open-weight language model since GPT-2. Strong at English and reasoning tasks.
3	Gemma 3 27B	LLM, Vision	`ollama pull gemma3:27b`	17 GB	Google model, 140+ languages.
4	Qwen3.5 27B	LLM, Vision	`ollama pull qwen3.5:27b`	17 GB	Multimodal with optional thinking mode. Strong multilingual, 256K context.
5	Qwen3.5 35B-A3B	LLM, Vision	`ollama pull qwen3.5:35b-a3b`	24 GB	MoE: 35B total, only 3B active per token — fast inference. Multimodal.

Rule of thumb: Bigger models generally give better results. A 27B+ model running locally on a good GPU will give you noticeably better corrections and translations than a 7B model. If your GPU has enough VRAM for it, pick the larger model.

Don’t see a model that fits? Browse the complete catalog at ollama.com/search — hundreds of models available, new ones added every week. Filter by size to find what fits your hardware.

LM Studio (Alternative)

If you prefer a graphical interface over the terminal, LM Studio is an excellent alternative. It provides a desktop app where you can browse, download, and run models with a few clicks.

Download LM Studio from lmstudio.ai and install it
Search for a model (e.g., “Qwen 2.5 7B”) and click Download
Go to the Local Server tab and click Start Server
In PasteSuiteAI, create an LM Studio (Local) connection — the endpoint URL is pre-filled automatically
Enter your model name as Model ID

LM Studio’s default port is 1234, while Ollama uses 11434. Make sure the endpoint URL matches the tool you are using.
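A quick way to confirm which tool is listening is to request its model list. The sketch assumes default ports and that the tool runs on the same machine. `models_url` is a hypothetical helper. Ollama lists models at `/api/tags`, and LM Studio's OpenAI-compatible server lists them at `/v1/models`.

```shell
# Print the model-list URL for a local AI server on its default port.
# $1 = tool name: "ollama" or "lmstudio".
models_url() {
  case "$1" in
    ollama)   echo "http://localhost:11434/api/tags" ;;
    lmstudio) echo "http://localhost:1234/v1/models" ;;
    *)        echo "unknown tool: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   curl -s "$(models_url lmstudio)"
```

If the request is refused, the server for that tool is not running on its default port.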

Troubleshooting

Problem	Solution
Connection test fails with “connection refused”	Ollama is not running. Start it from the Start menu (Windows), Applications (macOS), or `sudo systemctl start ollama` (Linux).
Very slow responses	Your model may be too large for your RAM and is swapping to disk. Try a smaller model like `qwen3:4b`.
Model not found error	The Model ID in PasteSuiteAI must match the exact name shown by `ollama list`, including the tag (e.g., `qwen2.5:7b`, not just `qwen2.5`).
Poor translation or correction quality	This is a known limitation of small local models. Try a larger model (14B, 32B) if your hardware allows it, or use a cloud connection for tasks that require high accuracy.
Ollama uses too much memory	Ollama keeps the model in memory for fast responses. If you need to free RAM, run `ollama stop <model>` to unload it.
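The exact-match rule for Model IDs can be checked from the terminal. This is a minimal sketch, assuming `ollama list` prints a header row followed by one model per line with the name in the first column. `has_model` is a hypothetical helper name.

```shell
# Succeed if the exact model tag appears in the first column of `ollama list`.
# $1 = model tag, e.g. qwen2.5:7b
has_model() {
  ollama list | awk -v tag="$1" 'NR > 1 && $1 == tag { found = 1 } END { exit !found }'
}

# Usage:
#   has_model qwen2.5:7b && echo "installed" || echo "missing: run ollama pull qwen2.5:7b"
```

For memory problems, `ollama ps` shows which models are currently loaded.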
