Question 1

How does Rosette handle non-Latin scripts?

Accepted Answer

Rosette uses advanced transliteration and phonetic modeling to compare names across different scripts. When you compare a name in Arabic and a name in English, the agent does not just look for direct translations. It converts the sounds and characters into a common internal representation. This allows the similarity score to reflect how the names are actually pronounced and written in their respective scripts. Ceven manages this by first calling the transliteration tool and then passing the normalized strings into the similarity engine to ensure the highest possible match accuracy across the 364 supported combinations.

Question 2

Can I set my own thresholds for similarity matches?

Accepted Answer

Yes. Every similarity call returns a score between 0 and 1. In your Ceven workflow, you can define the exact threshold that triggers a downstream action. For example, you might decide that a name similarity score of 0.9 or higher triggers an automatic merge, while a score of at least 0.7 but below 0.9 triggers a manual review task for a human analyst. This fine-grained control lets you balance the risk of false positives against the effort required for manual verification, based on the sensitivity of your specific data set.
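The threshold logic described above can be sketched in a few lines of Python. This is an illustration only, not Ceven's or Rosette's actual API: the function name `route_match`, the action labels, and the "no_match" outcome for scores below the review threshold are all assumptions made for the example.

```python
def route_match(score: float, merge_at: float = 0.9, review_at: float = 0.7) -> str:
    """Map a similarity score in [0, 1] to a downstream action.

    Hypothetical helper: the action names are placeholders, and the
    behaviour below `review_at` is an assumption (the answer above does
    not say what happens to low scores).
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("similarity score must be between 0 and 1")
    if score >= merge_at:
        return "auto_merge"      # confident enough to merge automatically
    if score >= review_at:
        return "manual_review"   # borderline: hand off to a human analyst
    return "no_match"            # assumed default for low scores


if __name__ == "__main__":
    for s in (0.95, 0.895, 0.42):
        print(s, route_match(s))
```

Using half-open bands (at or above one threshold, below the next) avoids leaving a gap between the review range and the merge range.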

Question 3

Does Rosette support real-time streaming data?

Accepted Answer

Ceven enables real-time processing by connecting Rosette to your event streams. As text arrives from a webhook or a message queue, the agent immediately sends the payload to the Rosette API for language detection and entity extraction. Because the API calls are lightweight, the added latency is minimal. You can build a workflow that flags a high-risk entity in milliseconds, allowing you to block a transaction or freeze an account before the process completes. This turns a batch analytics tool into a real-time defensive layer for your business.

Question 4

What happens if Rosette cannot identify a language?

Accepted Answer

When the language identification tool cannot determine a language with high confidence, it returns a low confidence score and often a generic label. You can configure your Ceven workflow to handle these edge cases specifically. For instance, if the confidence score is below 0.5, the agent can route the text to a human linguist for manual tagging. This ensures that your data pipeline does not fail silently and that every piece of unstructured text is eventually categorized correctly regardless of the initial model confidence.
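A minimal sketch of that routing rule, assuming the language identification result has already been reduced to a label and a confidence value. The `LanguageResult` type, the `route_language` function, and the route names are hypothetical and are not part of the Rosette or Ceven interfaces.

```python
from dataclasses import dataclass


@dataclass
class LanguageResult:
    """Hypothetical, simplified view of a language identification result."""
    language: str       # detected language label, possibly a generic one
    confidence: float   # model confidence between 0 and 1


def route_language(result: LanguageResult, min_confidence: float = 0.5) -> str:
    """Send low-confidence detections to a human instead of failing silently."""
    if result.confidence < min_confidence:
        return "human_linguist_review"
    return "automated_pipeline"


if __name__ == "__main__":
    print(route_language(LanguageResult("ara", 0.93)))
    print(route_language(LanguageResult("unknown", 0.21)))
```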

Question 5

Are there limits to the amount of text I can process at once?

Accepted Answer

Yes. Rosette imposes strict character limits on individual API requests to maintain performance. If you attempt to send a massive document in a single call, the API will return a 413 Request Entity Too Large error. To solve this, Ceven automatically chunks large documents into smaller, overlapping segments. The agent processes each segment individually and then aggregates the entity and similarity results at the end. This ensures you can analyze long legal contracts or detailed reports without hitting the hard limits of the underlying API.
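To make the chunking idea concrete, here is one simple way to split text into overlapping segments. It is a sketch under stated assumptions, not Ceven's actual implementation: the function name `chunk_text` and the default `max_chars` and `overlap` values are placeholders, and the real per-request character limit should be taken from your Rosette plan.

```python
from typing import List


def chunk_text(text: str, max_chars: int = 1000, overlap: int = 100) -> List[str]:
    """Split text into segments of at most max_chars characters.

    Consecutive segments share `overlap` characters so that an entity
    straddling a boundary appears whole in at least one segment.
    The default sizes are illustrative, not Rosette's documented limits.
    """
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be >= 0 and smaller than max_chars")

    step = max_chars - overlap
    chunks: List[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this segment already reaches the end of the text
        start += step
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", max_chars=4, overlap=1))
```

Because segments overlap, the same entity can be extracted twice, so the aggregation step would need to deduplicate results. A production version would probably also split on sentence boundaries rather than raw character offsets.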

Question 6

How does address similarity differ from name similarity?

Accepted Answer

Address similarity is optimized for the structural variations of postal data, such as abbreviations for street or road and differing regional formats. While name similarity focuses on phonetics and script variations, address similarity looks for patterns in house numbers, street names, and postal codes. The agent uses different Rosette models for each task. When you pass data to the address tool, it ignores common stop words and focuses on the unique identifiers of a location to provide a score that reflects geographic proximity and formatting differences.

Question 7

Does Rosette store the text sent for analysis?

Accepted Answer

Rosette Text Analytics generally operates as a processing engine rather than a storage system. When Ceven sends text to the API, the platform analyzes the string and returns the results. Data retention policies vary with your deployment model, whether on-premises or cloud. In most standard API configurations, the text is processed in memory and not persisted long term. You should review your specific service level agreement with Babel Street to confirm the exact data handling and privacy protocols for your account.

Question 8

Can Rosette distinguish between similar names of different people?

Accepted Answer

Rosette provides a similarity score, not a definitive identity match. It tells you how likely it is that two strings refer to the same entity. It cannot know if two different people happen to have the same name unless you provide additional context. To solve this, you should use Ceven to combine Rosette similarity scores with other data points, such as date of birth or address similarity. By creating a weighted score across multiple Rosette tools, the agent can distinguish between two people with the same name living in different cities.
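One way to combine several signals is a weighted average. The sketch below assumes each signal has already been reduced to a score between 0 and 1. The function name `combined_score`, the signal names, and the weights are illustrative assumptions and would need tuning against your own data.

```python
from typing import Dict


def combined_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of per-signal similarity scores.

    Only signals present in `scores` contribute, and the result is
    normalized by the total weight of those signals so it stays in [0, 1].
    """
    if not scores:
        raise ValueError("at least one score is required")
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(scores[name] * weights[name] for name in scores)
    return weighted_sum / total_weight


if __name__ == "__main__":
    # Hypothetical weights: name matters most, then address, then date of birth.
    weights = {"name": 0.5, "address": 0.3, "date_of_birth": 0.2}
    # Same name, but different city and different birth date.
    scores = {"name": 1.0, "address": 0.0, "date_of_birth": 0.0}
    print(combined_score(scores, weights))
```

With these example weights, two records that share only a name land well below a typical merge threshold, which is the behaviour the answer above describes.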
