Transcription API

The Transcription API extracts text from images and PDFs using automatic recognition (ASR/OCR). You can send one or more files per request; when two files are the front and back of the same document, the service merges them automatically and returns a single result.

Authentication

Use the unified X-CapData-Token header. The value can be the API Key of an Owner, an Agency, or an Agent's token.

Header Example
X-CapData-Token: your_api_key_or_token

Endpoint

POST /api/transcribe

Receives one or more files and returns their transcriptions. The request must be multipart/form-data.

Parameters (multipart/form-data)

Example: one file

cURL (multipart/form-data)
curl -X POST "https://capdata.es/api/transcribe" \
  -H "X-CapData-Token: YOUR_TOKEN" \
  -F "files=@/path/to/customer_call.m4a" \
  -F "language=en"

Response (200 OK)

application/json
{
  "results": [
    {
      "filename": "customer_call.m4a",
      "text": "Full transcription of the call...",
      "language_detected": "en",
      "duration_seconds": 184.2,
      "merged": false
    }
  ]
}
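
For readers calling the endpoint from Python, the request above could be assembled roughly as follows. This is a minimal sketch that uses only the standard library; the helper names encode_multipart and build_transcribe_request are illustrative and not part of the CapData API, and the language field is simply mirrored from the cURL example.

```python
import mimetypes
import urllib.request
import uuid

API_URL = "https://capdata.es/api/transcribe"  # endpoint from the cURL example


def encode_multipart(fields, files):
    """Encode text fields and (field, filename, bytes) files as multipart/form-data.

    Hypothetical helper: returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f'{value}\r\n'.encode("utf-8")
        )
    for field, filename, content in files:
        mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
        header = (
            f'--{boundary}\r\n'
            f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
            f'Content-Type: {mime}\r\n\r\n'
        ).encode("utf-8")
        parts.append(header + content + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcribe_request(token, files, language=None):
    """Build (but do not send) a POST request for /api/transcribe."""
    fields = {"language": language} if language else {}
    body, content_type = encode_multipart(fields, files)
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={"X-CapData-Token": token, "Content-Type": content_type},
    )
```

The request is built separately from sending so it can be inspected without touching the network. Passing it to urllib.request.urlopen and decoding the body with json.load would then yield the results array shown above.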

Example: front/back merge (two files)

If you upload two files representing the two sides of the same document (e.g., front/back), the system will try to merge them automatically.

cURL (two files: front and back)
curl -X POST "https://capdata.es/api/transcribe" \
  -H "X-CapData-Token: YOUR_TOKEN" \
  -F "files[]=@/path/to/booking_front.pdf" \
  -F "files[]=@/path/to/booking_back.pdf"

Response (200 OK) with merge

application/json
{
  "results": [
    {
      "filename": "booking",
      "text": "Unified text from front and back in reading order...",
      "language_detected": "es",
      "merged": true,
      "merged_from": ["booking_front.pdf", "booking_back.pdf"]
    }
  ]
}

How we detect front/back: when two files have matching base names and carry suffixes such as _front/_back, -front/-back, _anverso/_reverso, or -anverso/-reverso, the service merges them into a single result. The content is concatenated in logical order (front → back).
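
To check in advance which uploads are likely to be merged, the naming rule described above can be approximated on the client. The sketch below is not CapData's implementation: the function name pair_front_back is hypothetical, and case-insensitive matching and ignoring the file extension are assumptions.

```python
import os
import re

# Suffixes listed in the text; "front" and "anverso" denote the front side.
_SIDE_RE = re.compile(
    r"^(?P<base>.+)[_-](?P<side>front|back|anverso|reverso)$", re.IGNORECASE
)
_FRONT = {"front", "anverso"}


def pair_front_back(filenames):
    """Return (pairs, singles); pairs maps base name -> [front_file, back_file]."""
    sides = {}    # base name -> {"front": filename, "back": filename}
    singles = []  # files with no recognized suffix or no matching partner
    for name in filenames:
        stem, _ext = os.path.splitext(name)
        match = _SIDE_RE.match(stem)
        if not match:
            singles.append(name)
            continue
        side = "front" if match.group("side").lower() in _FRONT else "back"
        sides.setdefault(match.group("base"), {})[side] = name

    pairs = {}
    for base, found in sides.items():
        if "front" in found and "back" in found:
            pairs[base] = [found["front"], found["back"]]  # front -> back order
        else:
            singles.extend(found.values())  # an unmatched side stays on its own
    return pairs, singles
```

For the two files in the example, pair_front_back(["booking_front.pdf", "booking_back.pdf"]) groups them under the base name "booking", which matches the filename and merged_from values in the response.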

Specialized Invoice Transcription

This service automates the reading of invoices, transforms them into JSON data, and manages the lifecycle of the document and the associated supplier.

POST /api/transcribe-invoice

Sends one or more invoice files for processing. Each processed file consumes 1 token; a PDF consumes 1 token per page.
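
As a worked instance of the pricing rule above, a 3-page PDF sent together with one JPG would cost 3 + 1 = 4 tokens. The sketch below estimates the cost of a batch before sending it; estimate_tokens is a hypothetical helper, the page counts must be supplied by the caller, and treating a PDF as costing at least 1 token is an assumption.

```python
def estimate_tokens(files):
    """Estimate the tokens a batch would consume.

    `files` is a list of (filename, page_count) tuples. page_count is only
    used for PDFs and is supplied by the caller (hypothetical input).
    """
    total = 0
    for filename, page_count in files:
        if filename.lower().endswith(".pdf"):
            total += max(1, page_count)  # 1 token per PDF page (assumed minimum of 1)
        else:
            total += 1                   # 1 token per non-PDF file
    return total
```

Comparing this estimate with the tokens_remaining value from a previous response gives an early warning before a request is rejected for insufficient balance.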

Request Body (multipart/form-data)

The request must be of type multipart/form-data. Files must be sent under the files key.

Example (cURL)
curl -X POST "https://capdata.es/api/transcribe-invoice" \
     -H "X-CapData-Token: your_api_key_or_token" \
     -F "files=@/path/to/invoice_A.pdf" \
     -F "files=@/path/to/invoice_B.jpg"

Successful Response (200 OK)

Returns a results array with the structured data, a base64 preview, and a secure URL for the full document view.

application/json
{
    "results": [
        {
            "source_file": "invoice_A.pdf",
            "supplier_contact_id": 42,
            "supplier_invoice_id": 101,
            "data": {
                "invoice_number": "INV-2025-001",
                "supplier_tax_id": "B12345678",
                "invoice_total": 121.00,
                // ...
            },
            "preview_image_base64": "data:application/pdf;base64,JVBERi0x...",
            "full_view_url": "https://capdata.es/portal/client-slug/serve/invoice_blob/101"
        }
    ],
    "tokens_remaining": 98
}
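
Once the response has been decoded, the fields an integration typically needs can be pulled out as sketched below. The function name summarize_invoice_results is illustrative, and only keys that appear in the sample response are read; the contents of data beyond those keys are not assumed.

```python
def summarize_invoice_results(payload):
    """Return (summaries, tokens_remaining) for a decoded response body.

    `summaries` holds one dict per entry in `results`; missing keys become None.
    """
    summaries = []
    for item in payload.get("results", []):
        data = item.get("data", {})
        summaries.append({
            "source_file": item.get("source_file"),
            "invoice_id": item.get("supplier_invoice_id"),
            "invoice_number": data.get("invoice_number"),
            "total": data.get("invoice_total"),
            # Keep this URL server-side; it is fetched with the API token.
            "full_view_url": item.get("full_view_url"),
        })
    return summaries, payload.get("tokens_remaining")
```

Storing invoice_id together with full_view_url is what makes the later server-to-server retrieval of the original document possible.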

Viewing the Original Document (2-Step Flow)

The returned full_view_url must be called from your backend (server-to-server), adding the same authentication header. Your server will receive the binary file (PDF, JPG) and must re-stream it to your end-user.

This "proxy" pattern ensures that your API Key is never exposed on the client side.


Practical Example

The following example walks through the complete workflow of the Invoice Transcription API.

Transcription Request

The following HTML and JavaScript code shows how to create a form for the user to select an invoice and how to send it to the CapData API. The script embeds the token for demonstration only; in production, never expose it in frontend code.

Form HTML

Create a simple form with a file input, a button, and an area to display the results.

HTML
<!-- Container for invoice upload -->
<div>
    <label for="invoiceFileInput">Select Invoice:</label>
    <input type="file" id="invoiceFileInput" accept=".pdf,.jpg,.jpeg,.png">
    <button id="uploadButton">Transcribe Invoice</button>
</div>

<!-- Area to display status and results -->
<div id="statusArea" style="margin-top: 15px;"></div>

<!-- Container for the JSON response (hidden by default) -->
<pre id="responseContainer" style="display:none;"><code class="language-json"></code></pre>

JavaScript (using `fetch`)

This script listens for the button click, constructs the `multipart/form-data` request, and handles the API response.

JavaScript
document.addEventListener("DOMContentLoaded", () => {
    const fileInput = document.getElementById("invoiceFileInput");
    const uploadButton = document.getElementById("uploadButton");
    const statusArea = document.getElementById("statusArea");
    const responseContainer = document.getElementById("responseContainer");
    const jsonCodeBlock = responseContainer.querySelector("code");

    // IMPORTANT: Manage your API Key securely.
    // Never expose it directly in frontend code in production.
    const API_KEY = "YOUR_X-CAPDATA-TOKEN_HERE";
    const API_ENDPOINT = "https://capdata.es/api/transcribe-invoice";

    uploadButton.addEventListener("click", async () => {
        const file = fileInput.files[0];

        if (!file) {
            statusArea.textContent = "Please select a file first.";
            return;
        }

        // --- 1. Prepare the request ---
        const formData = new FormData();
        formData.append("files", file); // The key must be "files"

        // --- 2. Update UI and make the call ---
        statusArea.textContent = "Transcribing, please wait...";
        uploadButton.disabled = true;
        responseContainer.style.display = "none";

        try {
            const response = await fetch(API_ENDPOINT, {
                method: "POST",
                headers: {
                    "X-CapData-Token": API_KEY
                },
                body: formData
            });

            // Error responses may not carry a JSON body, so parse defensively
            const responseData = await response.json().catch(() => null);

            if (!response.ok) {
                // If the API returns an error (4xx, 5xx), we throw it
                throw new Error(responseData?.error || `HTTP Error: ${response.status}`);
            }

            // --- 3. Process the successful response ---
            statusArea.textContent = "Transcription completed successfully!";
            jsonCodeBlock.textContent = JSON.stringify(responseData, null, 2);
            responseContainer.style.display = "block";

            // Optional: Highlight syntax if you use Prism.js
            if (window.Prism) {
                Prism.highlightElement(jsonCodeBlock);
            }
            
            // Here you would save `responseData.results[0].full_view_url`
            // to use in Phase 2.

        } catch (error) {
            statusArea.textContent = `Error: ${error.message}`;
            console.error("Error in transcription:", error);
        } finally {
            uploadButton.disabled = false;
        }
    });
});

Phase 2: Secure Viewing (Backend)

In a real application, your frontend does not call the full_view_url directly. Instead, it calls an endpoint on your own backend, which acts as a "proxy" to securely fetch the file, as shown in the following code example.

Python (Flask) - Backend Proxy Example
import requests
from flask import Blueprint, Response, abort

erp_bp = Blueprint("erp_routes", __name__)

# This is a placeholder function: you must implement how your ERP
# retrieves the URL and API Key saved after Phase 1.
def get_capdata_info_from_your_db(invoice_id):
    # Logic to look up in your database...
    # return {"full_view_url": "...", "api_key": "..."}
    pass

@erp_bp.route("/view-capdata-invoice/<int:capdata_invoice_id>")
def view_capdata_invoice(capdata_invoice_id):
    """
    This endpoint acts as a proxy to securely fetch a file
    from CapData and serve it to the end-user.
    """
    # 1. Get the URL and API Key from your database
    capdata_info = get_capdata_info_from_your_db(capdata_invoice_id)
    if not capdata_info:
        abort(404, "Invoice information not found.")

    capdata_url = capdata_info["full_view_url"]
    api_key = capdata_info["api_key"]

    headers = {"X-CapData-Token": api_key}

    try:
        # 2. Make the request to CapData in "stream" mode
        response_from_capdata = requests.get(
            capdata_url, headers=headers, stream=True, timeout=60
        )
        response_from_capdata.raise_for_status()

        # 3. Stream the response (the file) back to the end-user
        return Response(
            response_from_capdata.iter_content(chunk_size=8192),
            # Fall back to a generic type if CapData sends no Content-Type
            content_type=response_from_capdata.headers.get(
                "Content-Type", "application/octet-stream"
            )
        )

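    except requests.exceptions.RequestException as e:
        # Relay CapData's status when there is one, otherwise 503.
        # A plain Response is returned because abort() raises LookupError for
        # status codes without a Werkzeug exception class (e.g. 402).
        status_code = e.response.status_code if e.response is not None else 503
        return Response(
            "Could not retrieve the document from CapData.",
            status=status_code,
            content_type="text/plain",
        )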

Formats and Limits


Token Consumption and Behavior


Errors and Failure Responses

402 Payment Required (insufficient tokens)

Returned when the actor does not have sufficient balance to process the requested files. The response includes the current balance and the tokens needed to complete the operation.

application/json
{
  "error": "Insufficient tokens.",
  "tokens_remaining": 0,
  "tokens_needed": 2
}

400 Bad Request (validation)

Returned when the request fails validation (e.g., no files were sent, the file type is invalid, the size exceeds the limit, or there are invalid combinations).

application/json
{
  "error": "Validation failed.",
  "details": [
    {"field": "files", "message": "You must attach at least one file."},
    {"field": "files[0]", "message": "Unsupported format (.exe)."},
    {"field": "files[1]", "message": "Size exceeds the maximum allowed limit."}
  ]
}

500 Internal Server Error

Unexpected error during file processing or with an external provider. Retry after a few minutes.

application/json
{
  "error": "Internal server error. Please try again later."
}

Recommendation: handle the 402, 400, and 500 statuses explicitly in the client. Display clear messages and, for a 402, offer options to reload the balance or reduce the number of files.
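
One way to follow this recommendation is to map each documented status to a user-facing message in a single place. The sketch below assumes the error bodies have the shapes shown in the examples above; describe_api_error is a hypothetical helper and the message wording is only illustrative.

```python
def describe_api_error(status_code, body):
    """Turn an error status and its decoded JSON body into a user-facing message."""
    body = body or {}  # tolerate responses without a JSON body
    if status_code == 402:
        needed = body.get("tokens_needed")
        remaining = body.get("tokens_remaining")
        return (
            f"Insufficient tokens: {needed} needed, {remaining} remaining. "
            "Reload your balance or send fewer files."
        )
    if status_code == 400:
        # List each validation problem next to the field it refers to
        details = body.get("details", [])
        lines = [f"{d.get('field')}: {d.get('message')}" for d in details]
        if lines:
            return "Validation failed. " + " ".join(lines)
        return body.get("error", "Validation failed.")
    if status_code == 500:
        return "Internal server error. Please retry in a few minutes."
    # Any status not covered by the documentation
    return body.get("error") or f"Unexpected HTTP status {status_code}."
```

Keeping the mapping in one function makes it straightforward to attach actions to specific cases, such as showing a "reload balance" button only for 402.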