Transcription API
The Transcription API extracts text from images and PDFs using automatic recognition (ASR/OCR). It accepts one or more files per request and, when applicable, automatically merges the front and back of the same document into a single result.
Authentication
Use the unified X-CapData-Token header. The value can be the API Key of an Owner, an Agency, or an Agent's token.
X-CapData-Token: your_api_key_or_token
Endpoint
POST /api/transcribe
Receives one or more files and returns their transcriptions. The request must be multipart/form-data.
Parameters (multipart/form-data)
- files[] (required): one or more files to transcribe.
- language (optional): preferred language code (e.g., es, en). If omitted, automatic detection is attempted.
Example: one file
curl -X POST "https://capdata.es/api/transcribe" \
-H "X-CapData-Token: YOUR_TOKEN" \
-F "files=@/path/to/customer_call.m4a" \
-F "language=en"
Response (200 OK)
{
  "results": [
    {
      "filename": "customer_call.m4a",
      "text": "Full transcription of the call...",
      "language_detected": "en",
      "duration_seconds": 184.2,
      "merged": false
    }
  ]
}
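For callers working in Python rather than curl, the same request can be sketched with the standard library alone. This is a minimal illustration, not an official client: the helper name `build_transcribe_request` and its `(filename, bytes)` input shape are assumptions; only the URL, the X-CapData-Token header, and the files[] and language fields come from this documentation.

```python
import mimetypes
import uuid
from urllib.request import Request

API_URL = "https://capdata.es/api/transcribe"


def build_transcribe_request(files, token, language=None):
    """Build (but do not send) a multipart/form-data POST request.

    files: list of (filename, content_bytes) tuples (hypothetical input shape).
    """
    boundary = uuid.uuid4().hex
    parts = []

    # Optional plain-text field with the preferred language code.
    if language:
        parts.append(
            (
                f"--{boundary}\r\n"
                'Content-Disposition: form-data; name="language"\r\n\r\n'
                f"{language}\r\n"
            ).encode()
        )

    # One part per file, all under the documented "files[]" key.
    for filename, content in files:
        mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="files[]"; filename="{filename}"\r\n'
            f"Content-Type: {mime}\r\n\r\n"
        )
        parts.append(header.encode() + content + b"\r\n")

    # Closing boundary terminates the multipart body.
    parts.append(f"--{boundary}--\r\n".encode())

    return Request(
        API_URL,
        data=b"".join(parts),
        method="POST",
        headers={
            "X-CapData-Token": token,
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. The sketch does not escape quotes in file names, so sanitize names that come from user input.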
Example: front/back merge (two files)
If you upload two files representing the two sides of the same document (e.g., front/back), the system will try to merge them automatically.
curl -X POST "https://capdata.es/api/transcribe" \
-H "X-CapData-Token: YOUR_TOKEN" \
-F "files[]=@/path/to/booking_front.pdf" \
-F "files[]=@/path/to/booking_back.pdf"
Response (200 OK) with merge
{
  "results": [
    {
      "filename": "booking",
      "text": "Unified text from front and back in reading order...",
      "language_detected": "es",
      "merged": true,
      "merged_from": ["booking_front.pdf", "booking_back.pdf"]
    }
  ]
}
If the file names carry the suffixes _front/_back, -front/-back, _anverso/_reverso, or -anverso/-reverso, the service merges them into a single result. The content is concatenated in logical order (front → back).
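To anticipate which uploads will come back as a single merged result, a client can apply the same naming convention locally before sending. The sketch below is only an approximation: the function name `predict_merges` is hypothetical, and the server's exact matching rules (case sensitivity, mixed separators, extensions) are not specified here, so treat the output as a hint.

```python
import re
from collections import defaultdict
from pathlib import PurePath

# Suffixes listed in this documentation, matched against the name without extension.
_SIDE = re.compile(r"^(?P<base>.+)[_-](?P<side>front|back|anverso|reverso)$")
_FRONT_WORDS = {"front", "anverso"}


def predict_merges(filenames):
    """Return {base_name: [front_file, back_file]} for names that form a front/back pair."""
    sides = defaultdict(dict)
    for name in filenames:
        match = _SIDE.match(PurePath(name).stem)
        if match:
            side = "front" if match["side"] in _FRONT_WORDS else "back"
            sides[match["base"]][side] = name
    # Keep only bases for which both sides were found, in front -> back order.
    return {
        base: [found["front"], found["back"]]
        for base, found in sides.items()
        if len(found) == 2
    }
```

For the request above, `predict_merges(["booking_front.pdf", "booking_back.pdf"])` yields one entry keyed by `booking`, which matches the `filename` and `merged_from` fields in the merged response.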
Specialized Invoice Transcription
This service automates the reading of invoices, transforms them into JSON data, and manages the lifecycle of the document and the associated supplier.
POST /api/transcribe-invoice
Sends one or more invoice files for processing. This action consumes 1 token for each processed file; if it is a PDF, 1 token for each page.
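The per-file and per-page rule can be turned into a quick pre-flight estimate. This is a rough sketch under stated assumptions: the function name `estimate_invoice_tokens` is hypothetical, and the page count of each PDF must be supplied by the caller (for example from a PDF library), since this API documentation does not describe a way to query it.

```python
def estimate_invoice_tokens(files):
    """Estimate token cost for a batch of invoices.

    files: iterable of (filename, page_count) tuples.
    page_count is only used for PDFs; other files cost 1 token each.
    """
    total = 0
    for filename, page_count in files:
        if filename.lower().endswith(".pdf"):
            # PDFs are charged per page; assume at least one page.
            total += max(1, page_count)
        else:
            total += 1
    return total
```

For example, a three-page `invoice_A.pdf` sent together with `invoice_B.jpg` would be estimated at 4 tokens.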
Request Body (multipart/form-data)
The request must be of type multipart/form-data. Files must be sent under the files key.
curl -X POST "https://capdata.es/api/transcribe-invoice" \
-H "X-CapData-Token: your_api_key_or_token" \
-F "files=@/path/to/invoice_A.pdf" \
-F "files=@/path/to/invoice_B.jpg"
Successful Response (200 OK)
Returns a results array with the structured data, a base64 preview, and a secure URL for the full document view.
{
  "results": [
    {
      "source_file": "invoice_A.pdf",
      "supplier_contact_id": 42,
      "supplier_invoice_id": 101,
      "data": {
        "invoice_number": "INV-2025-001",
        "supplier_tax_id": "B12345678",
        "invoice_total": 121.00,
        // ...
      },
      "preview_image_base64": "data:application/pdf;base64,JVBERi0x...",
      "full_view_url": "https://capdata.es/portal/client-slug/serve/invoice_blob/101"
    }
  ],
  "tokens_remaining": 98
}
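The preview_image_base64 field in the example is a data URI, so a client has to split off the MIME type before decoding the payload. A minimal sketch, assuming the field always follows the `data:<mime>;base64,<payload>` shape shown above (the helper name `decode_preview` is hypothetical):

```python
import base64


def decode_preview(data_uri):
    """Split a base64 data URI into (mime_type, raw_bytes)."""
    header, sep, payload = data_uri.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a data URI of the form data:<mime>;base64,<payload>")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)
```

The returned MIME type tells the caller whether the bytes should be handled as a PDF or as an image.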
Viewing the Original Document (2-Step Flow)
The returned full_view_url must be called from your backend (server-to-server) with the same authentication header. Your server receives the binary file (PDF, JPG) and must stream it on to your end-user.
This "proxy" pattern ensures that your API Key is never exposed on the client side.
Practical Example
The following example walks through the complete workflow of the Invoice Transcription API.
Phase 1: Transcription Request
The following HTML and JavaScript code shows how to create a form for the user to select an invoice and how to send it securely to the CapData API.
Form HTML
Create a simple form with a file input, a button, and an area to display the results.
<!-- Container for invoice upload -->
<div>
  <label for="invoiceFileInput">Select Invoice:</label>
  <input type="file" id="invoiceFileInput" accept=".pdf,.jpg,.jpeg,.png">
  <button id="uploadButton">Transcribe Invoice</button>
</div>
<!-- Area to display status and results -->
<div id="statusArea" style="margin-top: 15px;"></div>
<!-- Container for the JSON response (hidden by default) -->
<pre id="responseContainer" style="display:none;"><code class="language-json"></code></pre>
JavaScript (using `fetch`)
This script listens for the button click, constructs the `multipart/form-data` request, and handles the API response.
document.addEventListener("DOMContentLoaded", () => {
  const fileInput = document.getElementById("invoiceFileInput");
  const uploadButton = document.getElementById("uploadButton");
  const statusArea = document.getElementById("statusArea");
  const responseContainer = document.getElementById("responseContainer");
  const jsonCodeBlock = responseContainer.querySelector("code");

  // IMPORTANT: Manage your API Key securely.
  // Never expose it directly in frontend code in production.
  const API_KEY = "YOUR_X-CAPDATA-TOKEN_HERE";
  const API_ENDPOINT = "https://capdata.es/api/transcribe-invoice";

  uploadButton.addEventListener("click", async () => {
    const file = fileInput.files[0];
    if (!file) {
      statusArea.textContent = "Please select a file first.";
      return;
    }

    // --- 1. Prepare the request ---
    const formData = new FormData();
    formData.append("files", file); // The key must be "files"

    // --- 2. Update UI and make the call ---
    statusArea.textContent = "Transcribing, please wait...";
    uploadButton.disabled = true;
    responseContainer.style.display = "none";

    try {
      const response = await fetch(API_ENDPOINT, {
        method: "POST",
        headers: {
          "X-CapData-Token": API_KEY
        },
        body: formData
      });
      const responseData = await response.json();

      if (!response.ok) {
        // If the API returns an error (4xx, 5xx), we throw it
        throw new Error(responseData.error || `HTTP Error: ${response.status}`);
      }

      // --- 3. Process the successful response ---
      statusArea.textContent = "Transcription completed successfully!";
      jsonCodeBlock.textContent = JSON.stringify(responseData, null, 2);
      responseContainer.style.display = "block";

      // Optional: Highlight syntax if you use Prism.js
      if (window.Prism) {
        Prism.highlightElement(jsonCodeBlock);
      }

      // Here you would save `responseData.results[0].full_view_url`
      // to use in Phase 2.
    } catch (error) {
      statusArea.textContent = `Error: ${error.message}`;
      console.error("Error in transcription:", error);
    } finally {
      uploadButton.disabled = false;
    }
  });
});
Phase 2: Secure Viewing (Backend)
In a real application, your frontend does not call the full_view_url directly. Instead, it calls an endpoint on your own backend, which acts as a "proxy" to securely fetch the file, as shown in the following code example.
import requests
from flask import Blueprint, Response, abort

erp_bp = Blueprint("erp_routes", __name__)


# This is a placeholder function: you must implement how your ERP
# retrieves the URL and API Key saved after Phase 1.
def get_capdata_info_from_your_db(invoice_id):
    # Logic to look up in your database...
    # return {"full_view_url": "...", "api_key": "..."}
    pass


@erp_bp.route("/view-capdata-invoice/<int:capdata_invoice_id>")
def view_capdata_invoice(capdata_invoice_id):
    """
    This endpoint acts as a proxy to securely fetch a file
    from CapData and serve it to the end-user.
    """
    # 1. Get the URL and API Key from your database
    capdata_info = get_capdata_info_from_your_db(capdata_invoice_id)
    if not capdata_info:
        abort(404, "Invoice information not found.")

    capdata_url = capdata_info["full_view_url"]
    api_key = capdata_info["api_key"]
    headers = {"X-CapData-Token": api_key}

    try:
        # 2. Make the request to CapData in "stream" mode
        response_from_capdata = requests.get(
            capdata_url, headers=headers, stream=True, timeout=60
        )
        response_from_capdata.raise_for_status()

        # 3. Stream the response (the file) back to the end-user.
        #    Fall back to a generic type if CapData sends no Content-Type.
        return Response(
            response_from_capdata.iter_content(chunk_size=8192),
            content_type=response_from_capdata.headers.get(
                "Content-Type", "application/octet-stream"
            ),
        )
    except requests.exceptions.RequestException as e:
        status_code = e.response.status_code if e.response is not None else 503
        # Build the response directly: abort() does not support every
        # status code an upstream server might return (e.g., 402).
        return Response(
            "Could not retrieve the document from CapData.",
            status=status_code,
            content_type="text/plain",
        )
Formats and Limits
- Supported formats (indicative): image/PDF for OCR (.pdf, .jpg, .jpeg, .png). If you send an unsupported format, you will get a validation error.
- Maximum size: if your deployment applies limits, uploads exceeding the maximum will return a validation error (400).
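Checking files on the client before uploading avoids a round trip that would end in a validation error. The following sketch mirrors the extension list above; the names `validate_uploads` and `MAX_BYTES` are hypothetical, and the 10 MB figure is a placeholder because the real limit depends on the deployment.

```python
from pathlib import PurePath

ALLOWED_EXTENSIONS = {".pdf", ".jpg", ".jpeg", ".png"}
MAX_BYTES = 10 * 1024 * 1024  # Placeholder: replace with your deployment's limit.


def validate_uploads(files):
    """Check (filename, size_in_bytes) pairs and return a list of problems.

    Each problem is a dict with "field" and "message" keys.
    An empty list means the batch passed these local checks.
    """
    if not files:
        return [{"field": "files", "message": "You must attach at least one file."}]

    problems = []
    for index, (filename, size) in enumerate(files):
        extension = PurePath(filename).suffix.lower()
        if extension not in ALLOWED_EXTENSIONS:
            problems.append(
                {"field": f"files[{index}]", "message": f"Unsupported format ({extension})."}
            )
        elif size > MAX_BYTES:
            problems.append(
                {"field": f"files[{index}]", "message": "Size exceeds the maximum allowed limit."}
            )
    return problems
```

A local check like this complements the server's validation but does not replace it, since the server remains the authority on accepted formats and sizes.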
Token Consumption and Behavior
- Consumption: 1 token is consumed per processed file. When front/back merging applies, both sides count as a single logical file and therefore consume 1 token in total.
- Balance: if the operation is successful, the updated balance may be included in the response according to server configuration.
Errors and Failure Responses
402 Payment Required (insufficient tokens)
Returned when the actor does not have sufficient balance to process the requested files. The response includes the current balance and the tokens needed to complete the operation.
{
  "error": "Insufficient tokens.",
  "tokens_remaining": 0,
  "tokens_needed": 2
}
400 Bad Request (validation)
Returned when the request fails validation (e.g., no files were sent, the file type is invalid, the size exceeds the limit, or there are invalid combinations).
{
  "error": "Validation failed.",
  "details": [
    {"field": "files", "message": "You must attach at least one file."},
    {"field": "files[0]", "message": "Unsupported format (.exe)."},
    {"field": "files[1]", "message": "Size exceeds the maximum allowed limit."}
  ]
}
500 Internal Server Error
Unexpected error during file processing or with an external provider. Retry after a few minutes.
{
  "error": "Internal server error. Please try again later."
}
Handle the 402, 400, and 500 statuses in the client. Display clear messages and, in the case of 402, offer options to top up the balance or reduce the number of files.
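One way to centralize this handling is a small function that turns a status code and parsed JSON body into a user-facing message. This is a sketch only: the function name `describe_failure` and the message wording are assumptions, while the field names (error, tokens_remaining, tokens_needed, details) are taken from the error responses documented above.

```python
def describe_failure(status, body):
    """Map an error status and its parsed JSON body to a user-facing message."""
    if status == 402:
        remaining = body.get("tokens_remaining")
        needed = body.get("tokens_needed")
        return (
            f"Not enough tokens: {remaining} available, {needed} required. "
            "Top up your balance or send fewer files."
        )
    if status == 400:
        # Prefer the per-field details when the server provides them.
        details = body.get("details") or []
        lines = [f"{d.get('field', 'request')}: {d.get('message', '')}" for d in details]
        return "\n".join(lines) or body.get("error", "Validation failed.")
    if status == 500:
        return "The service could not process the files. Please retry in a few minutes."
    # Any other status: fall back to the server's own error text if present.
    return body.get("error") or f"Unexpected HTTP status {status}."
```

Keeping the mapping in one place makes it easier to adjust the wording later without touching the upload code.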