Voice Capture API

Base path: /api/v1/capture/voice

The voice capture module accepts audio file uploads, runs asynchronous Whisper-based transcription, and saves the resulting transcript as a KnowledgeEntry. A lightweight synchronous transcription endpoint is also available for ephemeral use cases such as voice answers in interview flows.

See API Reference for auth, errors, and pagination.

POST /capture/voice/upload

Auth: JWT required · Plan: Any

Upload an audio file. The server validates it, saves it to storage, creates a CaptureJob, and enqueues an async Whisper transcription task. When the Celery task completes, the transcript is saved as a KnowledgeEntry with source voice_note and status needs_review.

Request — multipart/form-data

Field	Type	Required	Description
`file`	binary	Yes	Audio file. Field name must be exactly `file`.

Accepted formats

Extension	MIME types
`.wav`	`audio/wav`, `audio/x-wav`
`.mp3` / `.mpeg` / `.mpga`	`audio/mpeg`, `audio/mp3`
`.m4a`	`audio/mp4`, `audio/x-m4a`, `audio/m4a`
`.webm`	`audio/webm`, `video/webm`
`.ogg`	`audio/ogg`

Validation passes if either the file extension or the Content-Type matches an allowed value.
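The either/or rule above can be mirrored client-side to reject obviously unsupported files before uploading. This is a minimal sketch, not the server's implementation: the function name is hypothetical, and the case-insensitive comparison and stripping of Content-Type parameters are assumptions.

```python
from pathlib import Path
from typing import Optional

# Allowed values copied from the "Accepted formats" table.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".mpeg", ".mpga", ".m4a", ".webm", ".ogg"}
ALLOWED_MIME_TYPES = {
    "audio/wav", "audio/x-wav",
    "audio/mpeg", "audio/mp3",
    "audio/mp4", "audio/x-m4a", "audio/m4a",
    "audio/webm", "video/webm",
    "audio/ogg",
}


def is_accepted_audio(filename: str, content_type: Optional[str]) -> bool:
    """Return True if either the extension or the Content-Type is allowed."""
    extension = Path(filename).suffix.lower()
    # Assumption: parameters such as "; codecs=opus" are ignored when matching.
    mime = (content_type or "").split(";", 1)[0].strip().lower()
    return extension in ALLOWED_EXTENSIONS or mime in ALLOWED_MIME_TYPES
```

Because only one of the two checks has to match, a file named recording.mp3 sent as application/octet-stream still passes.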

Constraints

Constraint	Limit
Maximum file size	25 MB
Maximum audio duration	60 minutes

Response — 201 Created

{
  "job_id": "d4a1c2b3-0001-4f2e-8abc-111122223333",
  "status": "pending",
  "message": "Audio uploaded successfully. Transcription is in progress."
}

Field	Type	Description
`job_id`	UUID string	ID of the created `CaptureJob`. Poll with GET /capture/voice/jobs/{job_id}.
`status`	string	Always `pending` on creation.
`message`	string	Human-readable confirmation.

Errors

Status	Code	Cause
`400`	`BAD_REQUEST`	`file` field missing or filename is empty.
`400`	`INVALID_AUDIO_FILE`	Unsupported format or audio duration exceeds 60 minutes.
`401`	`UNAUTHORIZED`	Missing or invalid JWT.
`402`	`SUBSCRIPTION_REQUIRED`	Organisation subscription inactive or trial expired.
`413`	`AUDIO_FILE_TOO_LARGE`	File exceeds 25 MB.

curl -X POST https://api.knora.io/api/v1/capture/voice/upload \
  -H "Authorization: Bearer <token>" \
  -F "file=@/path/to/recording.mp3"
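For clients that cannot shell out to curl, the same request can be assembled with the Python standard library. The sketch below only builds the request and does not send it. The function name and the default content type are illustrative assumptions, and the filename is not escaped, so it should not contain double quotes.

```python
import urllib.request
import uuid

UPLOAD_URL = "https://api.knora.io/api/v1/capture/voice/upload"


def build_upload_request(token: str, filename: str, audio: bytes,
                         content_type: str = "audio/mpeg") -> urllib.request.Request:
    """Build (but do not send) a multipart/form-data upload request."""
    boundary = uuid.uuid4().hex
    # The form field name must be exactly "file".
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        UPLOAD_URL,
        data=head + audio + tail,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

Passing the result to urllib.request.urlopen sends it. On success the response body is the 201 JSON shown above; error statuses surface as urllib.error.HTTPError.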

GET /capture/voice/jobs

Auth: JWT required · Plan: Any

Returns a paginated list of voice capture jobs belonging to the authenticated user within their organisation.

Query parameters

Parameter	Type	Default	Max	Description
`page`	integer	`1`	—	Page number (1-based).
`per_page`	integer	`20`	`100`	Results per page. Values above 100 are silently clamped.

Response — 200 OK

{
  "jobs": [
    {
      "id": "d4a1c2b3-0001-4f2e-8abc-111122223333",
      "org_id": "aaaabbbb-0000-4000-8000-ccccddddeeee",
      "type": "voice",
      "status": "completed",
      "source_filename": "meeting-notes.mp3",
      "file_size": 4194304,
      "mime_type": "audio/mpeg",
      "created_by": "ffff0000-1111-4222-8333-444455556666",
      "created_at": "2026-05-30T09:00:00Z",
      "updated_at": "2026-05-30T09:01:22Z",
      "completed_at": "2026-05-30T09:01:22Z",
      "error_message": null,
      "result_entry_id": "11112222-3333-4444-8555-666677778888",
      "metadata_json": "{\"detected_language\": \"en\", \"duration_seconds\": 187.4}"
    }
  ],
  "total": 42,
  "page": 1,
  "per_page": 20
}

Envelope fields

Field	Type	Description
`jobs`	array	List of `CaptureJob` objects (see below).
`total`	integer	Total number of jobs across all pages.
`page`	integer	Current page as requested.
`per_page`	integer	Page size applied (clamped to 100).
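The envelope carries enough information to walk every page. A minimal sketch follows, assuming a hypothetical fetch_page(page, per_page) callable that performs the HTTP request and returns the parsed envelope as a dict.

```python
import math
from typing import Callable, Iterator


def page_count(total: int, per_page: int) -> int:
    """Number of pages needed to cover `total` jobs."""
    return math.ceil(total / per_page) if total > 0 else 0


def iter_jobs(fetch_page: Callable[[int, int], dict],
              per_page: int = 20) -> Iterator[dict]:
    """Yield every job by requesting pages 1..N in order.

    `fetch_page` is a hypothetical helper, not part of the API.
    """
    page = 1
    while True:
        envelope = fetch_page(page, per_page)
        yield from envelope["jobs"]
        # Use the page size the server reports, since it may have been clamped.
        if page >= page_count(envelope["total"], envelope["per_page"]):
            break
        page += 1
```

With the sample response above (total 42, per_page 20), page_count returns 3.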

CaptureJob fields

Field	Type	Nullable	Description
`id`	UUID string	No	Job UUID.
`org_id`	UUID string	No	Organisation this job belongs to.
`type`	string	No	Always `"voice"` for this module.
`status`	string	No	`pending`, `processing`, `completed`, or `failed`.
`source_filename`	string	No	Original filename as uploaded.
`file_size`	integer	No	File size in bytes.
`mime_type`	string	No	MIME type detected at upload time.
`created_by`	UUID string	No	UUID of the user who uploaded the file.
`created_at`	ISO 8601 string	No	Timestamp when the job was created.
`updated_at`	ISO 8601 string	No	Timestamp of the last status change.
`completed_at`	ISO 8601 string	Yes	Timestamp when transcription finished. `null` until completed.
`error_message`	string	Yes	Error description if status is `failed`.
`result_entry_id`	UUID string	Yes	UUID of the resulting `KnowledgeEntry`. `null` until completed.
`metadata_json`	JSON string	Yes	Serialised Whisper output: `detected_language` (ISO 639-1) and `duration_seconds`.
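Because metadata_json is a JSON string rather than a nested object, it needs a second decoding step after the response body is parsed. A small sketch, with a hypothetical helper name:

```python
import json
from typing import Optional


def parse_job_metadata(job: dict) -> dict:
    """Decode `metadata_json`, which arrives as a JSON-encoded string."""
    raw: Optional[str] = job.get("metadata_json")
    # The field is nullable, so fall back to an empty dict.
    return json.loads(raw) if raw else {}
```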

Job status lifecycle

pending → processing → completed
                     → failed
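Read as a transition table, the diagram above might be encoded as follows. This sketch takes the diagram literally, with failed branching from processing; whether a job can move from pending straight to failed is not stated here, so treat that as an open assumption.

```python
# Transitions exactly as drawn in the lifecycle diagram.
ALLOWED_TRANSITIONS = {
    "pending": {"processing"},
    "processing": {"completed", "failed"},
    "completed": set(),
    "failed": set(),
}


def is_terminal(status: str) -> bool:
    """A status is terminal when no further transition leaves it."""
    return not ALLOWED_TRANSITIONS[status]


def can_transition(current: str, new: str) -> bool:
    """Check whether `current` -> `new` appears in the diagram."""
    return new in ALLOWED_TRANSITIONS[current]
```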

Errors

Status	Code	Cause
`401`	`UNAUTHORIZED`	Missing or invalid JWT.
`402`	`SUBSCRIPTION_REQUIRED`	Organisation subscription inactive or trial expired.

curl "https://api.knora.io/api/v1/capture/voice/jobs?page=1&per_page=20" \
  -H "Authorization: Bearer <token>"

GET /capture/voice/jobs/{job_id}

Auth: JWT required · Plan: Any

Retrieve the current status and result of a single voice capture job. Poll this endpoint after uploading to determine when transcription has finished.

Path parameters

Parameter	Type	Description
`job_id`	UUID string	The `job_id` returned by the upload endpoint.

Response — 200 OK

Returns a single CaptureJob object. See the CaptureJob fields table under GET /capture/voice/jobs for field descriptions.

{
  "id": "d4a1c2b3-0001-4f2e-8abc-111122223333",
  "org_id": "aaaabbbb-0000-4000-8000-ccccddddeeee",
  "type": "voice",
  "status": "completed",
  "source_filename": "meeting-notes.mp3",
  "file_size": 4194304,
  "mime_type": "audio/mpeg",
  "created_by": "ffff0000-1111-4222-8333-444455556666",
  "created_at": "2026-05-30T09:00:00Z",
  "updated_at": "2026-05-30T09:01:22Z",
  "completed_at": "2026-05-30T09:01:22Z",
  "error_message": null,
  "result_entry_id": "11112222-3333-4444-8555-666677778888",
  "metadata_json": "{\"detected_language\": \"en\", \"duration_seconds\": 187.4}"
}

Errors

Status	Code	Cause
`400`	`BAD_REQUEST`	`job_id` is not a valid UUID.
`401`	`UNAUTHORIZED`	Missing or invalid JWT.
`402`	`SUBSCRIPTION_REQUIRED`	Organisation subscription inactive or trial expired.
`404`	`VOICE_JOB_NOT_FOUND`	No job with this ID for this user and organisation. Cross-tenant access is silently treated as not found.

curl "https://api.knora.io/api/v1/capture/voice/jobs/d4a1c2b3-0001-4f2e-8abc-111122223333" \
  -H "Authorization: Bearer <token>"

POST /capture/voice/transcribe

Auth: JWT required · Plan: Any

Synchronously transcribe an audio file via Whisper and return the raw transcript text. Unlike the main upload flow, this endpoint does not create a CaptureJob or KnowledgeEntry. The temporary file is deleted from storage immediately after transcription.

Intended for ephemeral use cases such as voice answers in interview flows, where the transcript is embedded into another object rather than stored as a standalone knowledge entry.

Request — multipart/form-data

Field	Type	Required	Description
`file`	binary	Yes	Audio file. Field name must be exactly `file`.

Accepted formats and size/duration limits are identical to POST /capture/voice/upload.

Response — 200 OK

{
  "transcript": "Hello, this is a voice note about the Q2 planning session..."
}

Field	Type	Description
`transcript`	string	Raw text returned by Whisper. Language is auto-detected; no language metadata is included in this response.

Errors

Status	Code	Cause
`400`	`BAD_REQUEST`	`file` field missing or filename is empty.
`400`	`INVALID_AUDIO_FILE`	Unsupported format or audio exceeds 60-minute duration limit.
`401`	`UNAUTHORIZED`	Missing or invalid JWT.
`402`	`SUBSCRIPTION_REQUIRED`	Organisation subscription inactive or trial expired.
`413`	`AUDIO_FILE_TOO_LARGE`	File exceeds 25 MB.
`500`	`TRANSCRIPTION_FAILED`	Whisper API call failed after retries (e.g. `GROQ_API_KEY` not configured, or upstream service error).

curl -X POST https://api.knora.io/api/v1/capture/voice/transcribe \
  -H "Authorization: Bearer <token>" \
  -F "file=@/path/to/voice-answer.webm"

Polling pattern

After calling POST /capture/voice/upload, poll GET /capture/voice/jobs/{job_id} until status is completed or failed. A reasonable interval is 2–5 seconds. The diagram below abbreviates both paths.

POST /upload  →  job_id
  ↓
GET /jobs/{job_id}   (status: pending)
  ↓  wait 2–5s
GET /jobs/{job_id}   (status: processing)
  ↓  wait 2–5s
GET /jobs/{job_id}   (status: completed)
  ↓
result_entry_id  →  fetch KnowledgeEntry

If status is failed, the error_message field describes the reason.
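The polling flow could be wrapped in a small helper like the one below. This is a sketch under assumptions: fetch_job is a hypothetical callable that performs the GET and returns the parsed CaptureJob, and the 3-second interval and 10-minute timeout are illustrative defaults, not documented values.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(fetch_job: Callable[[], dict],
                 interval: float = 3.0,
                 timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll until the job reaches a terminal status, then return it.

    Raises TimeoutError if no terminal status is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        job = fetch_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError("job did not finish before the timeout")
        # Wait before the next poll (2-5 seconds is the suggested range).
        sleep(interval)
```

The caller then inspects the returned job: result_entry_id when status is completed, error_message when it is failed.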

Transcription backend

Audio is transcribed using Groq Whisper large-v3 with automatic language detection. The detected language code (ISO 639-1) and audio duration in seconds are stored in metadata_json on the CaptureJob.

The Whisper language code maps to a KnowledgeLanguage value as follows:

Whisper language code	`KnowledgeLanguage`
`en`	`en`
`ar`	`ar`
anything else	`mixed`
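Expressed as code, the table above reduces to a single conditional. The function name is hypothetical, and the comparison is exact, as in the table.

```python
def to_knowledge_language(whisper_code: str) -> str:
    """Map a Whisper language code to a KnowledgeLanguage value."""
    # Only "en" and "ar" pass through; every other code becomes "mixed".
    return whisper_code if whisper_code in {"en", "ar"} else "mixed"
```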

The Whisper API call is retried up to 3 times with exponential backoff on transient errors (rate limits, connection errors, 5xx responses). Other 4xx responses from the upstream API are not retried.
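The retry policy might look roughly like this sketch. It is not the service's actual code. UpstreamError is a hypothetical exception type, rate limiting is assumed to surface as HTTP 429, "up to 3 times" is read as three attempts in total, and the 1-second base delay is an assumed value.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class UpstreamError(Exception):
    """Hypothetical error carrying the upstream HTTP status code."""

    def __init__(self, status: int) -> None:
        super().__init__(f"upstream returned {status}")
        self.status = status


def is_transient(exc: Exception) -> bool:
    """Rate limits (assumed 429), connection errors, and 5xx are retryable."""
    if isinstance(exc, ConnectionError):
        return True
    if isinstance(exc, UpstreamError):
        return exc.status == 429 or 500 <= exc.status <= 599
    return False


def call_with_retries(call: Callable[[], T],
                      max_attempts: int = 3,
                      base_delay: float = 1.0,
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying transient failures with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return call()
        except Exception as exc:
            # Give up on non-transient errors or once attempts are exhausted.
            if not is_transient(exc) or attempt == max_attempts:
                raise
            # Delay doubles each time: base_delay, then 2 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise RuntimeError("max_attempts must be at least 1")
```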