Question 1

How does Ceven connect to a local Ollama instance?

Accepted Answer

Ceven connects to Ollama through its local REST API. By default Ollama listens on localhost (127.0.0.1), port 11434, so make sure your network configuration lets the Ceven agent reach that endpoint. If Ollama runs on a separate server in your private cloud, set the OLLAMA_HOST environment variable to 0.0.0.0 on that machine so that it accepts connections from other hosts. Once the network path is open, Ceven sends standard HTTP requests to the Ollama endpoints. Ollama does not require authentication by default, so we recommend placing a reverse proxy with basic auth in front of the instance if you expose it across a wider internal network.
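To make the connection concrete, here is a minimal Python sketch of the kind of HTTP request involved. It is not Ceven's internal code. The helper names are made up for illustration, and the model name is an assumption; the /api/generate path and the "response" field follow Ollama's public REST API.

```python
import json
import urllib.request


def ollama_base_url(host: str = "localhost", port: int = 11434) -> str:
    """Build the base URL; 11434 is Ollama's default port."""
    return f"http://{host}:{port}"


def build_generate_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Prepare (but do not send) a non-streaming POST to /api/generate."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(base_url: str, model: str, prompt: str, timeout: float = 60.0) -> str:
    """Send the request and return the completion text."""
    request = build_generate_request(base_url, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]
```

For a remote server, pass its hostname to ollama_base_url instead of the default, for example a hypothetical ollama.internal.example.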

Question 2

Can I use different models for different steps in one workflow?

Accepted Answer

Yes. You can design a workflow that uses a small, fast model such as Phi-3 for initial classification and a larger model such as Llama 3 for final synthesis. In the Ceven workflow builder, you specify the model name on each individual Ollama action, which lets you optimize for speed and hardware usage. For example, the agent can use a lightweight model to decide whether an incoming email is a complaint, and trigger the heavier reasoning model only if a complex resolution is required. This keeps oversized models from occupying GPU memory for simple tasks.
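The email example can be sketched in Python as a two-step routine. This only illustrates the routing idea and is not how the workflow builder is implemented. The model tags, the prompts, and the generate callable (any function that takes a model name and a prompt and returns text) are assumptions.

```python
from typing import Callable, Optional

# Assumed model tags; use whatever names exist in your Ollama library.
CLASSIFIER_MODEL = "phi3"
SYNTHESIS_MODEL = "llama3"

# (model, prompt) -> completion text
Generate = Callable[[str, str], str]


def handle_email(email: str, generate: Generate) -> Optional[str]:
    """Classify with the small model; call the large model only for complaints."""
    label = generate(
        CLASSIFIER_MODEL,
        f"Answer with exactly one word, 'complaint' or 'other'.\n\n{email}",
    )
    if label.strip().lower() != "complaint":
        return None  # The heavy model is never invoked for simple cases.
    return generate(
        SYNTHESIS_MODEL,
        f"Draft a resolution for this customer complaint:\n\n{email}",
    )
```

Passing generate in as a parameter keeps the routing logic testable without a running Ollama server.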

Question 3

Does Ollama have rate limits like OpenAI?

Accepted Answer

Ollama does not impose artificial rate limits or token quotas because it runs on your own hardware. You will, however, hit physical hardware limits. If you send too many concurrent requests, time to first token rises noticeably as the Ollama server queues them. Ollama also has a specific quirk in how it manages models in VRAM: if a single workflow switches between many different models, each switch can incur a delay while Ollama unloads the current model and loads the next one into GPU memory. On slow hardware, that delay can be long enough to cause request timeouts.
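One way to work within these limits on the client side is to cap concurrency and batch requests by model. The Python sketch below shows both ideas under stated assumptions: the function names are hypothetical, call stands for whatever function sends one request, and the right concurrency cap depends on your hardware.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple


def run_bounded(prompts: Sequence[str], call: Callable[[str], str],
                max_concurrent: int = 2) -> List[str]:
    """Run call(prompt) for every prompt with at most max_concurrent in flight.

    Results come back in the same order as the prompts.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(call, prompts))


def order_by_model(jobs: Sequence[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Group (model, prompt) jobs by model so each model is loaded once per batch.

    sorted() is stable, so jobs for the same model keep their original order.
    """
    return sorted(jobs, key=lambda job: job[0])
```

Grouping only helps when the steps are independent of each other; a workflow whose later step needs the output of an earlier one still has to run in sequence.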

Question 4

Is my data sent to the cloud when using Ollama?

Accepted Answer

No. When you use the Ollama integration, the prompt and the resulting completion stay between the Ceven agent and your local Ollama server. The primary benefit of this integration is data residency. Unlike with cloud LLMs, your inputs are not used to train future versions of the model, and no data is logged by a third-party provider. This makes it well suited to processing legal documents, medical records, or trade secrets. You keep full control over the model weights and the execution environment, so your intellectual property never leaves your controlled infrastructure.

Question 5

What happens if the Ollama server is offline?

Accepted Answer

If the Ollama server is unreachable, the Ceven workflow puts that specific action into an error state. You can configure retry logic within the workflow to attempt the call again after a short delay, or you can set up a fallback path. For instance, if your local Ollama instance is down, you can route the request to a backup local server or alert an administrator via Slack. Because the connection is direct, Ceven gives real-time feedback on whether the local API is responding or whether the request timed out because of a hardware crash or network interruption.
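The retry-then-fallback pattern can be sketched in Python as follows. This illustrates the logic only and is not Ceven's retry implementation. The function name, the retry count, and the delay are assumptions, and call stands for any function that sends one request to a given server URL.

```python
import time
from typing import Callable, Optional, Sequence


def call_with_fallback(servers: Sequence[str], call: Callable[[str], str],
                       retries: int = 2, delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Try each server in order, retrying each one before moving to the next.

    OSError covers refused connections, timeouts, and urllib's URLError.
    """
    last_error: Optional[OSError] = None
    for server in servers:
        for attempt in range(retries + 1):
            try:
                return call(server)
            except OSError as error:
                last_error = error
                if attempt < retries:
                    sleep(delay)  # Short pause before retrying the same server.
    # Every server failed; this is the point to alert an administrator.
    raise RuntimeError("all Ollama servers unreachable") from last_error
```

The sleep parameter exists so that the delay can be replaced in tests; in normal use the default time.sleep applies.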

Question 6

How do I add new models to Ollama for Ceven to use?

Accepted Answer

You add models using the Ollama command line interface on the machine where Ollama is installed. By running the ollama pull command followed by the model name, you download the weights to your local library. Once the download is complete, the model becomes immediately available to Ceven. You can verify the model is ready by using the List Models action within Ceven. If you have created a custom Modelfile with a specific system prompt or temperature setting, that custom model will also appear in the list and can be selected for any text generation or chat action in your workflows.
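Outside Ceven, you can run a similar readiness check directly against Ollama's /api/tags endpoint, which lists the models in the local library. The Python sketch below assumes the default local address and uses hypothetical helper names; it is not the implementation of the List Models action.

```python
import json
import urllib.request
from typing import List


def parse_model_names(tags_response: dict) -> List[str]:
    """Extract model names from an /api/tags response body."""
    return [model["name"] for model in tags_response.get("models", [])]


def list_models(base_url: str = "http://localhost:11434",
                timeout: float = 10.0) -> List[str]:
    """Fetch the names of all models in the local Ollama library."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return parse_model_names(json.loads(response.read()))


def is_model_available(names: List[str], wanted: str) -> bool:
    """Check for a model; a name without a tag is treated as ':latest'."""
    if ":" not in wanted:
        wanted = f"{wanted}:latest"
    return wanted in names
```

For example, after running ollama pull llama3, is_model_available(list_models(), "llama3") should return True once the download has finished.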

Question 7

Can I use OpenAI-compatible tools with Ollama?

Accepted Answer

Yes. Ollama provides an OpenAI-compatible API layer that lets you use tools designed for GPT models with your local models. Ceven builds on this with dedicated OpenAI-compatible actions. This is particularly useful if you have existing prompt templates or application logic that expects the OpenAI JSON format: you point those requests at your local Ollama endpoint. The agent maps the chat completion request to the local model, so you benefit from the ecosystem of OpenAI tools while execution and data processing stay entirely on your own hardware.
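As an illustration of the format, the Python sketch below builds a chat completion request for Ollama's OpenAI-compatible endpoint, which is served under the /v1 path, and reads the reply from an OpenAI-shaped response. The helper names and the model name are assumptions, and this is not how Ceven's actions are implemented.

```python
import json
import urllib.request
from typing import Dict, List


def build_chat_request(model: str, messages: List[Dict[str, str]],
                       base_url: str = "http://localhost:11434/v1") -> urllib.request.Request:
    """Prepare (but do not send) an OpenAI-style chat completion request."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body: dict) -> str:
    """Pull the assistant text out of an OpenAI-format response."""
    return response_body["choices"][0]["message"]["content"]
```

Existing code that already targets the OpenAI format usually needs only its base URL changed to the local endpoint; the request and response shapes stay the same.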
