Honeyhive

Streams model inputs and outputs into your evaluation pipelines, automates the creation of gold datasets from production logs, and triggers evaluation runs when performance metrics dip.

Try Honeyhive in Ceven

Why use Ceven?

  1. AI-native Honeyhive integration

    • Describe the outcome, and Ceven picks the right Honeyhive calls, fills in the parameters, and checks the result.
    • Structured, agent-friendly tool schemas, so each call runs reliably instead of by guesswork.
    • Rich coverage for reading, writing, and querying your Honeyhive data across all 24 supported actions.
  2. Managed auth

    • Built-in OAuth with automatic token refresh and rotation.
    • One place to manage, scope, and revoke Honeyhive access.
    • Per-user and per-environment credentials instead of shared keys.
  3. Agent-optimized design

    • Actions are tuned using real success and error rates, so reliability improves over time.
    • Full execution logs, so you always know what ran in Honeyhive, when, and on whose behalf.
    • The agent pauses and asks for clarification when a Honeyhive request is ambiguous instead of plowing ahead.
  4. Enterprise-grade security

    • Fine-grained access control, so you decide which agents and people can reach Honeyhive.
    • Least privilege by default: read scopes first, and only the write scopes a workflow needs.
    • A full audit trail of every Honeyhive action to support review and sign-off.

Supported tools

Every action Ceven's agents can run on Honeyhive, and when to use it.

Add Datapoints to Dataset
Use this when you need to append multiple entries to an existing dataset, specifying how their input, ground truth, and history fields are mapped.
Create Batch Model Events
Use this when you need to log a batch of model interactions to HoneyHive in a single request to reduce API overhead.
Create Batch Tool Events
Use this when you need to record multiple external API calls as tool events in one request, after all of the event data has been gathered.
Create Dataset
Use this when you need to initialize a new dataset within a project for a new evaluation cycle.
Create Tool
Use this when you need to register a new function or plugin so that its invocations can be tracked.
Delete Datapoint
Use this when you need to remove a specific datapoint from HoneyHive after confirming its identifier.
Get Datasets
Use this to pull a list of datasets for a specific project, with optional filters, to find the right test set.
Get Metrics
Use this to retrieve all metrics for a specific project once you know which project to query.
Retrieve Events
Use this to pull events that match filter criteria and a date range, with pagination, for analysis or export.
Retrieve Experiment Result
Use this to pull the status, metrics, and datapoint-level details of a completed experiment run.
Start Evaluation Run
Use this to initiate an evaluation run using external datasets and linked events.
Start Session
Use this to initiate a new tracking session and retrieve a session ID for grouping events.
Delete Dataset
Delete a dataset by its ID. Use this when you need to remove a dataset after confirming its ID.
End Evaluation Run
Mark an evaluation run as completed. Use this after finishing manual evaluations to set the run's status to completed.
Get Configurations
Retrieve a list of configurations. Use this when you need to fetch all configurations for a specific project before making changes.
Get Projects
Retrieve projects. Use this when you need to list all available projects.
List Tools
List all tools registered in HoneyHive. Use this when you need to discover which functions or plugins are registered for use.
Retrieve Datapoint
Retrieve a specific datapoint by its ID. Use this when you have a datapoint ID and need its full details.
Retrieve Datapoints
Retrieve a list of datapoints. Use this when you need to fetch datapoints for a project, with optional filters.
Update Datapoint
Update a specific datapoint. Use this when you need to modify fields of an existing datapoint.
Update Dataset
Update an existing dataset. Use this when you need to modify a dataset's details (name, description, datapoints, linked evaluations, or metadata) after confirming its ID.
Update Event
Update an existing event. Use this when you need to modify an event's details by its ID.
Update Metric
Update an existing metric. Use this when you need to modify a metric's properties after creation; retrieve the metric first to verify its current state.
Update Project
Update a project's name or description. Use this when you need to modify an existing project by its ID.

24 actions

Frequently asked questions

Ceven uses a secure API key mechanism to connect to your HoneyHive workspace. You provide your project API key in the Ceven integration settings, and we store it with AES-256 encryption at rest. The key is injected into the request header only when the agent performs an action on your behalf. We never log the API key in our internal telemetry or expose it to the LLM during prompt construction. You can rotate your key in the HoneyHive dashboard at any time; after rotating it, update the key in the Ceven settings to restore connectivity.
Yes. Ceven can be configured to pull unlabeled datapoints from a HoneyHive dataset, send them to a more capable model or to a human-in-the-loop reviewer via another tool, and then use the Update Datapoint action to write the ground truth back into HoneyHive. This lets you build a high-quality evaluation set without manual copy-pasting. You can also set up a workflow that triggers every time a set number of new events has been logged, so your test sets grow alongside your production data.
Ceven respects all HoneyHive rate limits. Note that HoneyHive applies different rate limits depending on your plan tier, particularly for the frequency of batch event uploads. If Ceven receives a 429 Too Many Requests error, it retries the request with exponential backoff. For very high-volume production environments, we recommend the Create Batch Model Events action over individual event calls to maximize throughput and avoid hitting these tier-gated limits prematurely during peak traffic.
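The retry behavior described above can be pictured with a minimal, generic Python sketch of exponential backoff. It is not Ceven's actual implementation: `RateLimitError` is a hypothetical stand-in for however a 429 Too Many Requests response surfaces, and the default delays are illustrative assumptions, not HoneyHive's documented limits.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised when the API answers 429 Too Many Requests."""


def call_with_backoff(
    call: Callable[[], T],
    max_retries: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying on RateLimitError with exponentially growing delays."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise  # give up after the final retry
            # Delay doubles on each attempt (base, 2*base, 4*base, ...), capped at max_delay.
            sleep(min(base_delay * (2 ** attempt), max_delay))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

Injecting `sleep` keeps the function testable; outside tests the default `time.sleep` applies. Many real clients also add random jitter to each delay so that concurrent workers do not retry in lockstep.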
Ceven uses separate actions for model and tool events to match the HoneyHive data model. Model events track the prompt-and-completion cycle, while tool events track the input and output of external functions called by the agent. Using both, Ceven can build a full trace of a complex request. For example, if a bot searches a database and then answers, Ceven logs one tool event for the search and one model event for the final response, and links them through the session ID for a complete view.
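The session-based linking described above amounts to grouping events by session ID and ordering them in time. The Python sketch below assumes a simplified, hypothetical `Event` record (`session_id`, `event_type`, `name`, `start_time`); HoneyHive's actual event schema has more fields and may name them differently.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical, simplified event record; not HoneyHive's full schema.
    session_id: str
    event_type: str  # "tool" or "model"
    name: str
    start_time: float  # seconds since the epoch


def build_traces(events: list[Event]) -> dict[str, list[Event]]:
    """Group events by session ID and order each group chronologically."""
    traces: dict[str, list[Event]] = defaultdict(list)
    for event in events:
        traces[event.session_id].append(event)
    return {sid: sorted(evts, key=lambda e: e.start_time) for sid, evts in traces.items()}


# The database-search example: one tool event, then one model event, same session.
events = [
    Event("sess-1", "model", "final_answer", 2.0),
    Event("sess-1", "tool", "db_search", 1.0),
]
trace = build_traces(events)["sess-1"]
print([(e.event_type, e.name) for e in trace])  # [('tool', 'db_search'), ('model', 'final_answer')]
```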
Yes. You can build a workflow in Ceven that polls the Get Metrics action at a set interval. If the returned value for a specific metric, such as faithfulness or relevance, falls below a predefined threshold, Ceven can trigger a series of actions: sending an alert to your engineering team, starting a new evaluation run to isolate the cause, or even rolling your deployment pipeline back to a known stable model version.
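The threshold check in such a workflow can be sketched in Python as below. This is an illustration under stated assumptions: `fetch_values` is a hypothetical callable standing in for whatever step returns the latest metric values, `on_breach` stands in for the follow-up actions (an alert, a new evaluation run, a rollback), and the metric names and thresholds are placeholders.

```python
import time
from typing import Callable


def find_breaches(values: dict[str, float], thresholds: dict[str, float]) -> list[str]:
    """Return the metrics whose current value is below its configured threshold."""
    return sorted(
        name for name, minimum in thresholds.items()
        if name in values and values[name] < minimum
    )


def poll_metrics(
    fetch_values: Callable[[], dict[str, float]],  # hypothetical: latest metric values
    thresholds: dict[str, float],
    on_breach: Callable[[list[str]], None],  # hypothetical: alert, rerun, or rollback
    interval_s: float = 300.0,
    max_polls: int = 1,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll metric values and call `on_breach` whenever any fall below threshold."""
    for i in range(max_polls):
        breaches = find_breaches(fetch_values(), thresholds)
        if breaches:
            on_breach(breaches)
        if i < max_polls - 1:
            sleep(interval_s)  # wait before the next poll, but not after the last one
```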
Ceven can manage multiple HoneyHive projects, either by storing multiple API keys or by passing the project ID dynamically in each request. You can set up a single workflow that aggregates metrics across projects to get a bird's-eye view of your AI portfolio. The agent can also switch context between projects, for example to move datapoints from a staging project to a production project once they pass a quality bar you define in your evaluation runs.
Absolutely. You can create a maintenance workflow that uses Get Datasets to list all datasets created before a given date and then uses Delete Dataset to remove the ones that are no longer needed. This keeps your workspace clean and your project view uncluttered. You can also add a confirmation step in which the agent lists the datasets it intends to delete and waits for a human sign-off before deleting anything.
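A minimal Python sketch of the date-based cleanup with a human confirmation gate follows. The `Dataset` record and the `confirm` and `delete` callables are hypothetical; in a real workflow, `delete` would wrap the Delete Dataset action and `confirm` would present the candidate list to a reviewer.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable


@dataclass
class Dataset:
    # Hypothetical, simplified dataset record.
    dataset_id: str
    name: str
    created_at: datetime


def stale_datasets(datasets: list[Dataset], cutoff: datetime) -> list[Dataset]:
    """Return datasets created strictly before `cutoff`."""
    return [d for d in datasets if d.created_at < cutoff]


def cleanup(
    datasets: list[Dataset],
    cutoff: datetime,
    confirm: Callable[[list[Dataset]], bool],  # hypothetical human sign-off step
    delete: Callable[[str], None],  # hypothetical wrapper around Delete Dataset
) -> list[str]:
    """Delete stale datasets only after a reviewer approves the full list."""
    candidates = stale_datasets(datasets, cutoff)
    if not candidates or not confirm(candidates):
        return []  # nothing to delete, or the reviewer declined
    for d in candidates:
        delete(d.dataset_id)
    return [d.dataset_id for d in candidates]
```

Asking for approval of the whole candidate list at once, rather than per dataset, gives the reviewer one clear decision before anything is removed.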
When Ceven starts an evaluation run, it monitors the run's status via the Retrieve Experiment Result action. If the run enters a failed state, Ceven can capture the error message from HoneyHive and notify the owner. Depending on your workflow, it can then attempt to restart the run or flag the specific datapoints that caused the failure for manual review, so that a failed test never silently blocks your deployment pipeline.
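The monitoring loop can be sketched in Python as a poll-until-terminal function plus a failure handler. The status values (`completed`, `failed`) and result fields (`status`, `error`, `failed_datapoint_ids`) are assumptions for illustration, not HoneyHive's documented response format, and `get_status` stands in for a call to the Retrieve Experiment Result action.

```python
import time
from typing import Callable

TERMINAL_STATES = {"completed", "failed"}  # assumed status values


def wait_for_run(
    get_status: Callable[[], dict],  # hypothetical wrapper around Retrieve Experiment Result
    poll_interval_s: float = 30.0,
    timeout_s: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll a run's status until it reaches a terminal state or the timeout elapses."""
    waited = 0.0
    while True:
        result = get_status()
        if result.get("status") in TERMINAL_STATES:
            return result
        if waited >= timeout_s:
            raise TimeoutError(f"run still {result.get('status')!r} after {waited}s")
        sleep(poll_interval_s)
        waited += poll_interval_s


def handle_run(result: dict, notify: Callable[[str], None]) -> list[str]:
    """Notify the owner on failure and return datapoint IDs flagged for review."""
    if result.get("status") != "failed":
        return []
    notify(f"Evaluation run failed: {result.get('error', 'unknown error')}")
    return list(result.get("failed_datapoint_ids", []))
```

The timeout keeps a stuck run from blocking the pipeline indefinitely, in line with the goal of never letting a failed test go unnoticed.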

Alternatives to Honeyhive

Other tools that solve a similar problem. Ceven supports these too, so you can switch or run more than one at once.

Arize Phoenix · LangSmith · Weights & Biases

Try Ceven on your stack

Run Ceven on top of the tools you already use. Connect Honeyhive and the rest of your stack, describe the outcome, and Ceven's agents handle the work end to end, turning days of effort into minutes.

Get started for free