Question 1

How does Ceven handle large PDF files with OCR.space?

Accepted Answer

Ceven manages large files by using the asynchronous processing mode of the OCR.space API. When a multi-page PDF is submitted, the agent does not wait for a synchronous response, which would likely time out. Instead, it submits the file, receives a job ID, and polls the status endpoint at regular intervals. Once the engine signals that text extraction is complete, Ceven pulls the full JSON result and continues the workflow. Documents with dozens of pages are therefore processed reliably, without breaking the automation chain or causing a gateway timeout in the agent environment.
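The submit-then-poll pattern described above can be sketched roughly as follows. This is an illustration only, not Ceven's implementation: `submit` and `get_status` are hypothetical callables standing in for whatever HTTP calls the integration makes, and the `"state"` values and `"result"` field are assumed names.

```python
import time
from typing import Any, Callable, Dict


def poll_until_done(
    submit: Callable[[], str],
    get_status: Callable[[str], Dict[str, Any]],
    interval_s: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Submit a job, then poll its status at a fixed interval until it finishes."""
    job_id = submit()  # hypothetical: returns an opaque job ID
    for _ in range(max_attempts):
        status = get_status(job_id)  # hypothetical shape: {"state": ..., "result": ...}
        if status["state"] == "done":
            return status["result"]
        if status["state"] == "failed":
            raise RuntimeError(f"OCR job {job_id} failed")
        sleep(interval_s)  # wait before asking again
    raise TimeoutError(f"OCR job {job_id} not finished after {max_attempts} polls")
```

Capping the number of polls keeps a stuck job from blocking the workflow indefinitely. The `sleep` parameter is injectable so the loop can be exercised without real delays.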

Question 2

What happens if the image quality is poor?

Accepted Answer

The accuracy of the extraction depends heavily on the resolution and clarity of the source image. When OCR.space returns a result, it often includes a confidence score for the detected text, and Ceven can be configured to check this score. If the confidence falls below a threshold you set, the agent can trigger a human-in-the-loop event: instead of passing bad data into your database, the workflow pauses and notifies a user to verify the text manually. This keeps garbage data out of your records while still automating the clear cases.
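The threshold check described above might look something like this. It is a sketch under assumptions: the function name, the 0.85 default, and the 0-to-1 score scale are all hypothetical, and the right cut-off depends on your documents.

```python
from typing import Optional


def route_by_confidence(confidence: Optional[float], threshold: float = 0.85) -> str:
    """Return 'auto' for trusted results and 'review' for anything needing a human."""
    # A missing score is treated as untrusted rather than silently accepted.
    if confidence is None or confidence < threshold:
        return "review"
    return "auto"
```

For example, `route_by_confidence(0.95)` returns `"auto"`, while `route_by_confidence(0.40)` and `route_by_confidence(None)` both return `"review"`.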

Question 3

Are there limits to how many documents I can process?

Accepted Answer

Yes, OCR.space enforces rate limits that depend on your API key tier. The free tier allows a limited number of requests per minute and caps the maximum file size. If a Ceven workflow hits these limits, the API returns a specific error code. Ceven handles this with an exponential backoff strategy: the agent waits a short interval before retrying the request and lengthens the wait after each failed attempt. For high-volume enterprise users, we recommend a paid OCR.space key to avoid these throttles and to unlock larger uploads and faster processing.
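A minimal sketch of exponential backoff, assuming the rate-limit error surfaces as an exception. `RateLimitError` is a hypothetical exception class, and the delays and retry count are illustrative defaults rather than Ceven's actual settings.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical exception raised when the API reports throttling."""


def with_backoff(
    call: Callable[[], T],
    max_retries: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
    jitter: Callable[[], float] = random.random,
) -> T:
    """Retry `call` on RateLimitError, doubling the wait after each failure."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except RateLimitError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the workflow
            # Delays grow as 1 s, 2 s, 4 s, ... and are capped at max_delay_s.
            delay = min(max_delay_s, base_delay_s * 2 ** attempt)
            # Jitter (up to 1 s by default) keeps parallel workers from retrying in lockstep.
            sleep(delay + jitter())
```

Doubling the delay gives the per-minute quota time to reset instead of hammering the API at a fixed pace.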

Question 4

Can Ceven extract data from handwritten notes?

Accepted Answer

OCR.space is primarily optimized for printed text. It can attempt to read handwriting, but accuracy is significantly lower than for typed documents, so workflows that rely heavily on handwriting recognition will see more errors in the extracted text. To mitigate this, we recommend using the agent to flag documents that contain non-standard characters or low confidence scores. The agent can then route those files to a manual review queue while printed invoices and forms continue through the automated pipeline without human intervention.

Question 5

Does OCR.space support languages other than English?

Accepted Answer

Yes, the service supports a wide range of languages. When Ceven calls the OCR.space API, it can specify the language code in the request. If you process documents from global vendors, you can build a workflow that first detects the document's language or uses a predefined mapping based on the sender's email. Once the language is identified, the agent tells the engine which language to use, which greatly improves extraction accuracy for non-Latin scripts and for the accented characters common in many European languages.
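A predefined sender-based mapping of the kind mentioned above could be as simple as the sketch below. The mapping table, the function name, and the specific language codes are assumptions for illustration; check the codes against the OCR.space language list before relying on them.

```python
# Hypothetical mapping from a sender's top-level domain to a language code.
TLD_TO_LANGUAGE = {
    "de": "ger",
    "fr": "fre",
    "jp": "jpn",
}
DEFAULT_LANGUAGE = "eng"  # fallback when the domain gives no hint


def language_for_sender(sender_email: str) -> str:
    """Pick a language code from the top-level domain of the sender's address."""
    domain = sender_email.rsplit("@", 1)[-1].lower()
    tld = domain.rsplit(".", 1)[-1]
    return TLD_TO_LANGUAGE.get(tld, DEFAULT_LANGUAGE)
```

A domain-based guess is coarse (a `.com` sender may still write in German), so it works best as a default that content-based language detection can override.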

Question 6

How is the data secured during the OCR process?

Accepted Answer

When Ceven sends a document to OCR.space, the file is transmitted over an encrypted HTTPS connection. The service processes the image and returns the text in JSON format. You can configure the API request to tell OCR.space not to store the image on its servers after processing is complete. With the no store option enabled, the file exists only in memory during extraction and is deleted immediately after the response is sent back to Ceven. This is critical for users handling sensitive financial or personal identity documents.

Question 7

Can I use OCR.space to read tables and grids?

Accepted Answer

The engine can extract text from tables, but it returns the data as a stream of text or with positional coordinates rather than a perfect spreadsheet. Ceven solves this by taking the raw coordinate data and using a large language model to reconstruct the table. The agent looks at the x and y positions of the words to determine which pieces of text belong in the same row or column. This allows you to turn a picture of a table into a structured CSV or a database entry, though extremely complex layouts may still require manual adjustment.
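To illustrate how x and y positions map to rows and columns, here is a purely geometric heuristic. It is not the language-model approach described above, only a simplified stand-in. The field names `WordText`, `Left`, and `Top` are assumed to match the word entries in the coordinate output, and the 5-pixel tolerance is arbitrary.

```python
from typing import Any, Dict, List


def group_words_into_rows(
    words: List[Dict[str, Any]], y_tolerance: float = 5.0
) -> List[List[str]]:
    """Cluster words by vertical position, then order each row left to right."""
    rows: List[List[Dict[str, Any]]] = []
    for word in sorted(words, key=lambda w: w["Top"]):
        # A word joins the current row if its top edge is close to the row's first word.
        if rows and abs(word["Top"] - rows[-1][0]["Top"]) <= y_tolerance:
            rows[-1].append(word)
        else:
            rows.append([word])
    return [
        [w["WordText"] for w in sorted(row, key=lambda w: w["Left"])]
        for row in rows
    ]
```

A heuristic like this handles clean grids but breaks down on merged cells, wrapped text, or skewed scans, which is where a model-based reconstruction or manual adjustment becomes necessary.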

Question 8

What file formats are supported for extraction?

Accepted Answer

OCR.space supports the most common image and document formats, including JPG, PNG, TIFF, and PDF. Ceven can handle these files whether they are uploaded directly as binary data or provided as a public URL. If you use a cloud storage provider such as S3 or Google Drive, the agent can pass the direct link to the API. Note that the API cannot process password-protected PDFs: the workflow will fail if the file is encrypted, so documents must be decrypted before they reach the OCR stage.
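One way to catch encrypted PDFs before the OCR stage is a pre-flight check such as the rough sketch below. It relies on the fact that an encrypted PDF declares an `/Encrypt` entry in its trailer. The function name is hypothetical, and the byte search is a heuristic that can give false positives if that literal happens to appear elsewhere in the file; a dedicated PDF library would be more robust.

```python
def looks_encrypted_pdf(data: bytes) -> bool:
    """Heuristic: True if the bytes look like a PDF that declares an /Encrypt entry."""
    is_pdf = data.startswith(b"%PDF-")
    return is_pdf and b"/Encrypt" in data
```

A workflow could run this on the uploaded bytes and route flagged files to a decryption step or a manual queue instead of letting the OCR call fail.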
