Rosette Text Analytics

Processes unstructured text across hundreds of languages to resolve entity identities, detect languages, and score similarity between names and addresses.

Try Rosette Text Analytics in Ceven


Why use Ceven?

  1. AI-native Rosette Text Analytics integration

    • Describe the outcome, and Ceven picks the right Rosette Text Analytics calls, fills in the parameters, and checks the result.
    • Structured, agent-friendly tool schemas, so each call runs reliably instead of by guesswork.
    • Rich coverage for reading, writing, and querying your Rosette Text Analytics data across all 13 of its actions.
  2. Managed auth

    • Built-in OAuth with automatic token refresh and rotation.
    • One place to manage, scope, and revoke Rosette Text Analytics access.
    • Per-user and per-environment credentials instead of shared keys.
  3. Agent-optimized design

    • Actions are tuned using observed success and error rates, so reliability improves over time.
    • Full execution logs, so you always know what ran in Rosette Text Analytics, when, and on whose behalf.
    • The agent pauses and asks for clarification when a request or a Rosette Text Analytics result is ambiguous, instead of plowing ahead.
  4. Enterprise-grade security

    • Fine-grained access, so you control which agents and people can reach Rosette Text Analytics.
    • Least privilege by default: read scopes first, and only the write scopes a workflow needs.
    • A full audit trail of every Rosette Text Analytics action to support review and sign-off.

Supported tools

Every action Ceven's agents can run on Rosette Text Analytics, and when to use it.

Compare name similarity
Use this when you need to determine whether two entity names refer to the same person or organization across different languages or scripts.
Score address similarity
Compare two address strings or objects to get a similarity score. Use this for deduplicating customer records.
Identify language
Get the detected language and its confidence score for a block of text so you can route it to the correct translation workflow.
Resolve entity
Map a raw text mention to a known entity ID in your master data management system.
Normalize address
Convert a raw address string into a structured format based on the detected region.
Transliterate text
Convert text from one script to another, such as Cyrillic to Latin, to prepare it for similarity scoring.
Extract entities
Pull names, locations, and organizations out of a block of unstructured text.
Validate language code
Check whether a specific ISO language code is supported by the current Rosette engine version.
Batch process text
Send a large set of documents for language identification and entity extraction in one request.
Update similarity threshold
Adjust the confidence score required to trigger a match for specific entity types.
Search entity index
Query the indexed entities to find potential matches based on a provided name string.
Clear processing cache
Remove temporary text analysis artifacts so the next run recomputes its results instead of reusing cached ones.
Address similarity
Compares two addresses and returns a similarity score. Addresses can be provided as single strings or as structured objects. The tool is optimized for English, Simplified Chinese, and Traditional Chinese addresses.

13 actions

Frequently asked questions

How does Rosette compare names written in different scripts?
Rosette uses transliteration and phonetic modeling to compare names across scripts. When you compare a name written in Arabic with one written in English, the comparison does not simply look for a direct translation. Instead, it converts the sounds and characters of both names into a common internal representation, so the similarity score reflects how each name is actually pronounced and written in its own script. Ceven handles this by first calling the transliteration tool and then passing the normalized strings to the similarity engine, which improves match accuracy across the 364 supported combinations.
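
The transliterate-then-compare flow can be sketched as a small orchestration function. This is a minimal sketch, not Ceven's or Rosette's implementation: `compare_cross_script`, `transliterate`, and `name_similarity` are hypothetical names, and the two callables stand in for whatever wraps the real Rosette transliteration and name similarity calls.

```python
from typing import Callable

# Hypothetical stand-ins for the Rosette transliteration and name similarity tools.
Transliterator = Callable[[str], str]
SimilarityScorer = Callable[[str, str], float]


def compare_cross_script(
    name1: str,
    name2: str,
    transliterate: Transliterator,
    name_similarity: SimilarityScorer,
) -> float:
    """Transliterate both names into a common script, then score the pair.

    Both callables are assumptions: in a real workflow they would wrap
    the corresponding Rosette Text Analytics calls.
    """
    normalized1 = transliterate(name1)
    normalized2 = transliterate(name2)
    score = name_similarity(normalized1, normalized2)
    # Similarity scores are expected to fall in [0, 1]; reject anything else.
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"unexpected similarity score: {score}")
    return score
```

Injecting the two steps as callables keeps the ordering explicit and lets you test the flow offline with toy functions before wiring in live API calls.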

Can I control which similarity score triggers an action?
Yes. Every similarity call returns a score between 0 and 1, and in your Ceven workflow you define the exact threshold that triggers a downstream action. For example, you might decide that a name similarity score of 0.9 or higher triggers an automatic merge, while a score of at least 0.7 but below 0.9 creates a manual review task for a human analyst. This fine-grained control lets you balance the risk of false positives against the effort of manual verification, based on how sensitive your data set is.
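
The example thresholds can be expressed as a routing function. This is an illustrative sketch using the 0.9 and 0.7 values from the example; `Action` and `route_by_score` are hypothetical names, not Ceven configuration syntax.

```python
from enum import Enum


class Action(Enum):
    AUTO_MERGE = "auto_merge"
    MANUAL_REVIEW = "manual_review"
    NO_MATCH = "no_match"


def route_by_score(score: float, merge_at: float = 0.9, review_at: float = 0.7) -> Action:
    """Map a similarity score in [0, 1] to a downstream action.

    Bands are half-open so every score lands in exactly one of them:
    [merge_at, 1] -> merge, [review_at, merge_at) -> review, [0, review_at) -> no match.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"score must be in [0, 1], got {score}")
    if not 0.0 <= review_at <= merge_at <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= review_at <= merge_at <= 1")
    if score >= merge_at:
        return Action.AUTO_MERGE
    if score >= review_at:
        return Action.MANUAL_REVIEW
    return Action.NO_MATCH
```

Using half-open bands means a score such as 0.895 still falls into exactly one band (manual review) instead of slipping between two hand-written ranges.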

How does Ceven handle real-time processing?
Ceven enables real-time processing by connecting Rosette to your event streams. As text arrives from a webhook or a message queue, the agent immediately sends the payload to the Rosette API for language detection and entity extraction. Each call carries a single, small payload, so latency is typically low, although it still depends on network conditions and payload size. You can build a workflow that flags a high-risk entity within the time of a single API round trip, allowing you to block a transaction or freeze an account before the underlying process completes. This turns a batch analytics tool into a real-time defensive layer for your business.
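
A per-event screening loop might look like the sketch below. It assumes a hypothetical `extract_entities` callable standing in for the Rosette entity extraction call, and it uses exact, case-insensitive watchlist matching as a simplification; a production screen would more likely compare each mention with a similarity score. `Event` and `screen_events` are illustrative names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator


@dataclass
class Event:
    event_id: str
    text: str


# Hypothetical stand-in for the Rosette entity extraction tool:
# takes raw text and returns the entity mentions found in it.
EntityExtractor = Callable[[str], list[str]]


def screen_events(
    events: Iterable[Event],
    extract_entities: EntityExtractor,
    watchlist: set[str],
) -> Iterator[tuple[str, list[str]]]:
    """Yield (event_id, matched_entities) for each event that mentions a watchlisted entity.

    Events are handled one at a time as they arrive, so a consumer can act on a
    flag (for example, holding a transaction) without waiting for a batch.
    """
    normalized_watchlist = {name.casefold() for name in watchlist}
    for event in events:
        mentions = extract_entities(event.text)
        hits = [m for m in mentions if m.casefold() in normalized_watchlist]
        if hits:
            yield event.event_id, hits
```

Because `screen_events` is a generator, it works the same whether `events` is a finite list or an unbounded iterator fed by a queue consumer.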

What happens when the language cannot be identified with confidence?
When the language identification tool cannot determine a language with high confidence, it returns a low confidence score and, in some cases, a generic label. You can configure your Ceven workflow to handle these edge cases explicitly. For instance, if the confidence score is below 0.5, the agent can route the text to a human linguist for manual tagging. This keeps your data pipeline from failing silently and ensures that every piece of unstructured text is eventually categorized, regardless of the initial model confidence.
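
The fallback rule can be sketched as follows. This assumes the detection result has been reduced to a language code (ISO 639-3 style, such as `"eng"`, is an assumption here) and a confidence in [0, 1]; `LanguageResult`, `route_language`, and the queue names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LanguageResult:
    language: str      # assumed ISO 639-3 style code, e.g. "eng"
    confidence: float  # assumed to be in [0, 1]


def route_language(result: LanguageResult, min_confidence: float = 0.5) -> str:
    """Return a hypothetical queue name: per-language translation, or human review."""
    if result.confidence < min_confidence:
        # Low confidence: send to a human linguist rather than guessing.
        return "human_review"
    return f"translate_{result.language}"
```

A score exactly at the threshold (0.5 here) is treated as confident; move the comparison if your workflow should send borderline cases to review instead.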

Is there a limit on how much text a single request can contain?
Yes. Rosette imposes strict character limits on individual API requests to maintain performance. If you send a very large document in a single call, the API returns an HTTP 413 (Request Entity Too Large) error. To avoid this, Ceven automatically splits large documents into smaller, overlapping segments. The agent processes each segment individually and then aggregates the entity and similarity results at the end, so you can analyze long legal contracts or detailed reports without hitting the hard limits of the underlying API.
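
Overlapping segmentation can be sketched with a simple character-based splitter. This is an illustration, not Ceven's chunker: `chunk_text` is a hypothetical name, and `max_chars` and `overlap` are placeholders to set from your account's actual request limit, which is not stated here.

```python
def chunk_text(text: str, max_chars: int, overlap: int) -> list[str]:
    """Split text into segments of at most max_chars characters.

    Consecutive segments share `overlap` characters, so any entity mention
    no longer than `overlap` appears whole in at least one segment.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be in [0, max_chars)")
    step = max_chars - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + max_chars])
        # Stop once a segment reaches the end of the text.
        if start + max_chars >= len(text):
            break
    return chunks
```

For example, `chunk_text("abcdefghij", max_chars=4, overlap=1)` produces `["abcd", "defg", "ghij"]`. Because segments overlap, the same entity can be extracted twice, so the aggregation step needs to deduplicate results; a production splitter would also prefer sentence or paragraph boundaries over raw character offsets.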

How is address similarity different from name similarity?
Address similarity is optimized for the structural variations in postal data, such as abbreviations (St. for Street, Rd. for Road) and differing regional formats. While name similarity focuses on phonetic and script variations, address similarity looks for patterns in house numbers, street names, and postal codes, and the agent uses a different Rosette model for each task. When you pass data to the address tool, it ignores common stop words and focuses on the unique identifiers of a location, producing a score that reflects how closely those components match despite formatting differences.
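
To illustrate why normalization matters before scoring, here is a toy normalizer paired with a token-overlap (Jaccard) score. It is not Rosette's address model: the abbreviation table, the stop-word list, and the ASCII-only tokenizer are small hypothetical samples, and `normalize_address` and `token_overlap` are illustrative names.

```python
import re

# Small illustrative tables; a real system would use far larger, region-aware lists.
ABBREVIATIONS = {"st": "street", "rd": "road", "ave": "avenue"}
STOP_WORDS = {"the", "of"}


def normalize_address(address: str) -> set[str]:
    """Lowercase, tokenize (ASCII letters and digits only), expand abbreviations, drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    return {ABBREVIATIONS.get(t, t) for t in tokens if t not in STOP_WORDS}


def token_overlap(a: str, b: str) -> float:
    """Jaccard similarity of the two normalized token sets, in [0, 1]."""
    ta, tb = normalize_address(a), normalize_address(b)
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)
```

With normalization, "12 Main St." and "12 main street" score 1.0, while "12 Main St" and "14 Main St" score 0.5 because the house numbers differ. A flat token set ignores which field each token came from, which is one reason a structure-aware model handles addresses better.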

Does Rosette store the text that Ceven sends to it?
Rosette Text Analytics generally operates as a processing engine rather than a storage system: when Ceven sends text to the API, Rosette analyzes the string and returns the results. Data retention policies vary by deployment model, whether on-premises or cloud. In most standard API configurations, the text is processed in memory and not persisted long term. Review your service level agreement with Babel Street to confirm the exact data handling and privacy protocols for your account.

Can Rosette tell whether two records with the same name are the same person?
Rosette provides a similarity score, not a definitive identity match: it tells you how likely it is that two strings refer to the same entity. It cannot know whether two different people happen to share a name unless you provide additional context. To handle this, use Ceven to combine Rosette similarity scores with other data points, such as date of birth or address similarity. By building a weighted score from several signals, the agent can distinguish between two people with the same name who live in different cities.
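
A weighted combination of signals might look like the sketch below. The weights (0.5, 0.3, 0.2) are arbitrary placeholders, the date-of-birth check is external data rather than a Rosette tool, and `combined_match_score` is a hypothetical name.

```python
from typing import Optional


def combined_match_score(
    name_score: float,
    address_score: float,
    dob_match: Optional[bool] = None,
    weights: tuple[float, float, float] = (0.5, 0.3, 0.2),
) -> float:
    """Weighted average of name similarity, address similarity, and a date-of-birth check.

    If the date of birth is unknown (None), its weight is dropped and the
    remaining weights are renormalized, so missing data neither helps nor hurts.
    """
    w_name, w_addr, w_dob = weights
    signals = [(name_score, w_name), (address_score, w_addr)]
    if dob_match is not None:
        signals.append((1.0 if dob_match else 0.0, w_dob))
    total_weight = sum(w for _, w in signals)
    if total_weight <= 0:
        raise ValueError("weights for the available signals must sum to a positive value")
    return sum(score * w for score, w in signals) / total_weight
```

With these placeholder weights, two records with identical names (name score 1.0) but a weak address match (0.1) and mismatched birth dates combine to about 0.53, well below the 0.9 merge threshold used in the earlier example.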

Alternatives to Rosette Text Analytics

Other tools that solve a similar problem. Ceven supports these too, so you can switch or run more than one at once.

Try Ceven on your stack

Run Ceven on top of the tools you already use. Connect Rosette Text Analytics and the rest of your stack, describe the outcome, and Ceven's agents handle the work end to end, turning days of work into minutes.
