Ollama

Connects your local LLM instances to your SaaS stack to process sensitive data without it leaving your infrastructure, automates text generation, and manages your local model library.

Try Ollama in Ceven

Ask Ceven anything
Standard

Why use Ceven?

  1. AI native Ollama integration

    • Describe the outcome and Ceven picks the right Ollama calls, fills the parameters, and checks the result.
    • Structured, agent friendly tool schemas so each call runs reliably instead of by guesswork.
    • Rich coverage for reading, writing, and querying your Ollama data, across all 8 of its actions.
  2. Managed auth

    • Built in OAuth with automatic token refresh and rotation.
    • One place to manage, scope, and revoke Ollama access.
    • Per user and per environment credentials instead of shared keys.
  3. Agent optimized design

    • Actions are tuned from real success and error rates so reliability climbs over time.
    • Full execution logs so you always know what ran in Ollama, when, and on whose behalf.
    • The agent pauses and asks when Ollama is unclear instead of plowing ahead.
  4. Enterprise grade security

    • Fine grained access so you control which agents and people can reach Ollama.
    • Least privilege by default, read scopes first and only the writes a workflow needs.
    • A full audit trail of every Ollama action to support review and sign off.

Supported tools

Every action Ceven's agents can run on Ollama, and when to use it.

Chat with model
Use this when you need a multi turn conversation with an LLM. Pass the conversation history to maintain context across turns.
Generate text
Use this for single prompt responses. Set raw to true to bypass prompt templates for custom processing.
List models
Pull all installed models including their size, last modified date, and digest information.
OpenAI chat completion
Use this to get conversational responses using the OpenAI API format for compatibility with existing prompt libraries.
OpenAI text completion
Use this for non chat text generation following the OpenAI API specification.
OpenAI list models
Retrieve the list of available local models formatted for OpenAI compatible clients.
Show model info
Pull detailed metadata for a specific model including the system prompt, parameters, and license.
Get version
Check the current version of the Ollama server to ensure compatibility with new model features.
Chat with Ollama model
Tool to send a chat message with conversation history to Ollama. Use when you need to have a multi turn conversation with an LLM model.
Generate Text with Ollama
Tool to generate text responses from Ollama models with optional raw mode. Use raw=true to bypass prompt templating when you need full control over the prompt for debugging or custom processing. Note that raw mode will not return a context.
OpenAI Compatible Chat Completion
Tool to create OpenAI compatible chat completions using Ollama models. Use when you need conversational AI responses with OpenAI API format compatibility.
OpenAI Compatible Text Completion
Tool to create OpenAI compatible text completions using Ollama models. Use when you need text generation with OpenAI API format compatibility beyond chat based interactions.
List Models (OpenAI Compatible)
Tool to list available models using OpenAI compatible API format. Use when you need to retrieve locally available Ollama models with metadata following OpenAI's model list format.
Show Model Information
Tool to show comprehensive information about an Ollama model. Use when you need to retrieve model details, parameters, template, license, or system prompt.
Get Ollama Version
Tool to get the version of Ollama running locally. Use to check which version of Ollama is currently installed.

15 actions · scroll to see them all

Frequently asked questions

Ceven connects to Ollama via the local REST API. Since Ollama typically runs on localhost port 11434, you need to ensure your network configuration allows the Ceven agent to reach that endpoint. If you are running Ollama on a separate server in your private cloud, you must set the OLLAMA_HOST environment variable to 0.0.0.0 on that machine to accept external connections. Once the network path is open, Ceven sends standard HTTP requests to the Ollama endpoints. No complex authentication is required by default, but we recommend placing a reverse proxy with basic auth in front of your Ollama instance if you are exposing it across a wider internal network.
Yes. You can design a workflow that uses a small, fast model like Phi 3 for initial classification and a larger model like Llama 3 for final synthesis. In the Ceven workflow builder, you simply specify the model name for each individual Ollama action. This allows you to optimize for speed and hardware resource usage. For example, the agent can use a lightweight model to determine if an incoming email is a complaint, and only trigger the heavy reasoning model if a complex resolution is required. This approach prevents your GPU memory from being choked by oversized models for simple tasks.
Ollama does not have artificial rate limits or token quotas because it runs on your own hardware. However, you will encounter physical hardware limits. If you send too many concurrent requests, you will notice a significant increase in time to first token as the Ollama server queues requests. A specific quirk of Ollama is how it manages model loading in VRAM. If you switch between many different models in a single workflow, you may experience a delay while Ollama offloads the current model and loads the next one into your GPU memory. This can lead to temporary timeouts if your hardware is slow.
No. When you use the Ollama integration, the prompt and the resulting completion stay between the Ceven agent and your local Ollama server. The primary benefit of this integration is data residency. Unlike cloud LLMs, your inputs are not used to train future versions of the model, and no data is logged by a third party provider. This makes it the ideal choice for processing legal documents, medical records, or trade secrets. You maintain full control over the model weights and the execution environment, ensuring that your intellectual property never leaves your controlled infrastructure.
If the Ollama server is unreachable, the Ceven workflow will trigger an error state for that specific action. You can configure retry logic within the workflow to attempt the call again after a short delay, or you can set up a fallback path. For instance, if your local Ollama instance is down, you can route the request to a backup local server or alert an administrator via Slack. Because the connection is direct, Ceven provides real time feedback on whether the local API is responding or if the request timed out due to a hardware crash or network interruption.
You add models using the Ollama command line interface on the machine where Ollama is installed. By running the ollama pull command followed by the model name, you download the weights to your local library. Once the download is complete, the model becomes immediately available to Ceven. You can verify the model is ready by using the List Models action within Ceven. If you have created a custom Modelfile with a specific system prompt or temperature setting, that custom model will also appear in the list and can be selected for any text generation or chat action in your workflows.
Yes. Ollama provides an OpenAI compatible API layer that allows you to use tools designed for GPT models with your local models. Ceven leverages this by providing specific OpenAI compatible actions. This is particularly useful if you have existing prompt templates or application logic that expects the OpenAI JSON format. You can simply point those requests to your local Ollama endpoint. The agent handles the mapping of the chat completion request to the local model, allowing you to benefit from the ecosystem of OpenAI tools while keeping the actual execution and data processing entirely local on your own hardware.

Alternatives to Ollama

Other tools that solve a similar problem. Ceven supports these too, so you can switch or run more than one at once.

LM Studio logoLM StudioLocalAI logoLocalAIvLLM logovLLM

Try Ceven on your stack

Plug Ceven on top of the tools you already run. Connect Ollama and the rest of your stack, describe the outcome, and its agents handle the work end to end, days of it in minutes.

Get started for free