Voice Capture API
Base path: /api/v1/capture/voice
The voice capture module accepts audio file uploads, runs asynchronous Whisper-based transcription, and saves the resulting transcript as a KnowledgeEntry. A lightweight synchronous transcription endpoint is also available for ephemeral use cases such as voice answers in interview flows.
See API Reference for auth, errors, and pagination.
POST /capture/voice/upload
Auth: JWT required · Plan: Any
Upload an audio file. The server validates it, saves it to storage, creates a CaptureJob, and enqueues an async Whisper transcription task. When the Celery task completes, the transcript is saved as a KnowledgeEntry with source voice_note and status needs_review.
Request — multipart/form-data
| Field | Type | Required | Description |
|---|---|---|---|
| file | binary | Yes | Audio file. Field name must be exactly file. |
Accepted formats
| Extension | MIME types |
|---|---|
| .wav | audio/wav, audio/x-wav |
| .mp3 / .mpeg / .mpga | audio/mpeg, audio/mp3 |
| .m4a | audio/mp4, audio/x-m4a, audio/m4a |
| .webm | audio/webm, video/webm |
| .ogg | audio/ogg |
Validation passes if either the file extension or the Content-Type matches an allowed value.
Constraints
| Constraint | Limit |
|---|---|
| Maximum file size | 25 MB |
| Maximum audio duration | 60 minutes |
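A client can mirror these rules before uploading to avoid a wasted round trip. The following is a minimal sketch, not the server's implementation: the function name `precheck_audio` is hypothetical, the order of checks is an assumption, and "25 MB" is assumed to mean 25 × 1024 × 1024 bytes. Duration is not checked because that requires decoding the audio.

```python
from pathlib import Path
from typing import Optional

# Values copied from the tables above.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".mpeg", ".mpga", ".m4a", ".webm", ".ogg"}
ALLOWED_MIME_TYPES = {
    "audio/wav", "audio/x-wav",
    "audio/mpeg", "audio/mp3",
    "audio/mp4", "audio/x-m4a", "audio/m4a",
    "audio/webm", "video/webm",
    "audio/ogg",
}
MAX_FILE_SIZE_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" means 25 MiB


def precheck_audio(filename: str, content_type: str, size_bytes: int) -> Optional[str]:
    """Return the error code the server would probably send, or None if the file looks acceptable."""
    if not filename:
        return "BAD_REQUEST"
    if size_bytes > MAX_FILE_SIZE_BYTES:
        return "AUDIO_FILE_TOO_LARGE"
    # Drop parameters such as ";codecs=opus" before comparing the MIME type.
    mime = content_type.split(";")[0].strip().lower()
    extension_ok = Path(filename).suffix.lower() in ALLOWED_EXTENSIONS
    mime_ok = mime in ALLOWED_MIME_TYPES
    # Either match is enough, as described above.
    if not (extension_ok or mime_ok):
        return "INVALID_AUDIO_FILE"
    return None
```

A browser recording named `blob` with Content-Type `audio/webm;codecs=opus` passes this check on the MIME type alone, which matches the either/or rule.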
Response — 201 Created
{
  "job_id": "d4a1c2b3-0001-4f2e-8abc-111122223333",
  "status": "pending",
  "message": "Audio uploaded successfully. Transcription is in progress."
}
| Field | Type | Description |
|---|---|---|
| job_id | UUID string | ID of the created CaptureJob. Poll with GET /jobs/{job_id}. |
| status | string | Always pending on creation. |
| message | string | Human-readable confirmation. |
Errors
| Status | Code | Cause |
|---|---|---|
| 400 | BAD_REQUEST | file field missing or filename is empty. |
| 400 | INVALID_AUDIO_FILE | Unsupported format or audio duration exceeds 60 minutes. |
| 401 | UNAUTHORIZED | Missing or invalid JWT. |
| 402 | SUBSCRIPTION_REQUIRED | Organisation subscription inactive or trial expired. |
| 413 | AUDIO_FILE_TOO_LARGE | File exceeds 25 MB. |
curl -X POST https://api.knora.io/api/v1/capture/voice/upload \
-H "Authorization: Bearer <token>" \
-F "file=@/path/to/recording.mp3"
GET /capture/voice/jobs
Auth: JWT required · Plan: Any
Returns a paginated list of voice capture jobs belonging to the authenticated user within their organisation.
Query parameters
| Parameter | Type | Default | Max | Description |
|---|---|---|---|---|
| page | integer | 1 | none | Page number (1-based). |
| per_page | integer | 20 | 100 | Results per page. Values above 100 are silently clamped. |
Response — 200 OK
{
  "jobs": [
    {
      "id": "d4a1c2b3-0001-4f2e-8abc-111122223333",
      "org_id": "aaaabbbb-0000-4000-8000-ccccddddeeee",
      "type": "voice",
      "status": "completed",
      "source_filename": "meeting-notes.mp3",
      "file_size": 4194304,
      "mime_type": "audio/mpeg",
      "created_by": "ffff0000-1111-4222-8333-444455556666",
      "created_at": "2026-05-30T09:00:00Z",
      "updated_at": "2026-05-30T09:01:22Z",
      "completed_at": "2026-05-30T09:01:22Z",
      "error_message": null,
      "result_entry_id": "11112222-3333-4444-8555-666677778888",
      "metadata_json": "{\"detected_language\": \"en\", \"duration_seconds\": 187.4}"
    }
  ],
  "total": 42,
  "page": 1,
  "per_page": 20
}
Envelope fields
| Field | Type | Description |
|---|---|---|
| jobs | array | List of CaptureJob objects (see below). |
| total | integer | Total number of jobs across all pages. |
| page | integer | Current page as requested. |
| per_page | integer | Page size applied (clamped to 100). |
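The envelope carries enough information to walk every page. The sketch below assumes a caller-supplied `fetch_page(page, per_page)` callable (hypothetical) that performs the authenticated GET and returns the decoded envelope; only the pagination logic is shown.

```python
from typing import Callable, Iterator

MAX_PER_PAGE = 100  # the documented clamp


def iter_jobs(fetch_page: Callable[[int, int], dict], per_page: int = 20) -> Iterator[dict]:
    """Yield every job by walking pages until `total` jobs have been seen."""
    # Clamp locally so the page arithmetic matches what the server applies.
    per_page = max(1, min(per_page, MAX_PER_PAGE))
    page = 1
    seen = 0
    while True:
        envelope = fetch_page(page, per_page)
        jobs = envelope["jobs"]
        yield from jobs
        seen += len(jobs)
        # Also stop on an empty page, in case `total` changes between requests.
        if not jobs or seen >= envelope["total"]:
            break
        page += 1
```

Passing the fetch function in keeps the HTTP client and token handling out of the loop, so the same generator works with any client library.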
CaptureJob fields
| Field | Type | Nullable | Description |
|---|---|---|---|
| id | UUID string | No | Job UUID. |
| org_id | UUID string | No | Organisation this job belongs to. |
| type | string | No | Always "voice" for this module. |
| status | string | No | pending, processing, completed, or failed. |
| source_filename | string | No | Original filename as uploaded. |
| file_size | integer | No | File size in bytes. |
| mime_type | string | No | MIME type detected at upload time. |
| created_by | UUID string | No | UUID of the user who uploaded the file. |
| created_at | ISO 8601 string | No | Timestamp when the job was created. |
| updated_at | ISO 8601 string | No | Timestamp of the last status change. |
| completed_at | ISO 8601 string | Yes | Timestamp when transcription finished. null until completed. |
| error_message | string | Yes | Error description if status is failed. |
| result_entry_id | UUID string | Yes | UUID of the resulting KnowledgeEntry. null until completed. |
| metadata_json | JSON string | Yes | Serialised Whisper output: detected_language (ISO 639-1) and duration_seconds. |
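Because metadata_json is a JSON document serialised into a string, it needs a second decode after the response body itself has been parsed. A minimal sketch, using a hypothetical helper name `job_metadata`:

```python
import json


def job_metadata(job: dict) -> dict:
    """Decode metadata_json, returning an empty dict when it is null or absent."""
    raw = job.get("metadata_json")
    if not raw:
        return {}
    return json.loads(raw)


# Example with the job object shown above (already decoded from the response body).
job = {"metadata_json": "{\"detected_language\": \"en\", \"duration_seconds\": 187.4}"}
metadata = job_metadata(job)
print(metadata["detected_language"], metadata["duration_seconds"])
```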
Job status lifecycle
A job starts as pending, moves to processing, and ends as either completed or failed.
Errors
| Status | Code | Cause |
|---|---|---|
| 401 | UNAUTHORIZED | Missing or invalid JWT. |
| 402 | SUBSCRIPTION_REQUIRED | Organisation subscription inactive or trial expired. |
curl "https://api.knora.io/api/v1/capture/voice/jobs?page=1&per_page=20" \
-H "Authorization: Bearer <token>"
GET /capture/voice/jobs/{job_id}
Auth: JWT required · Plan: Any
Retrieve the current status and result of a single voice capture job. Poll this endpoint after uploading to determine when transcription has finished.
Path parameters
| Parameter | Type | Description |
|---|---|---|
| job_id | UUID string | The job_id returned by the upload endpoint. |
Response — 200 OK
Returns a single CaptureJob object. See the field table in GET /jobs for field descriptions.
{
  "id": "d4a1c2b3-0001-4f2e-8abc-111122223333",
  "org_id": "aaaabbbb-0000-4000-8000-ccccddddeeee",
  "type": "voice",
  "status": "completed",
  "source_filename": "meeting-notes.mp3",
  "file_size": 4194304,
  "mime_type": "audio/mpeg",
  "created_by": "ffff0000-1111-4222-8333-444455556666",
  "created_at": "2026-05-30T09:00:00Z",
  "updated_at": "2026-05-30T09:01:22Z",
  "completed_at": "2026-05-30T09:01:22Z",
  "error_message": null,
  "result_entry_id": "11112222-3333-4444-8555-666677778888",
  "metadata_json": "{\"detected_language\": \"en\", \"duration_seconds\": 187.4}"
}
Errors
| Status | Code | Cause |
|---|---|---|
| 400 | BAD_REQUEST | job_id is not a valid UUID. |
| 401 | UNAUTHORIZED | Missing or invalid JWT. |
| 402 | SUBSCRIPTION_REQUIRED | Organisation subscription inactive or trial expired. |
| 404 | VOICE_JOB_NOT_FOUND | No job with this ID for this user and organisation. Cross-tenant access is silently treated as not found. |
curl "https://api.knora.io/api/v1/capture/voice/jobs/d4a1c2b3-0001-4f2e-8abc-111122223333" \
-H "Authorization: Bearer <token>"
POST /capture/voice/transcribe
Auth: JWT required · Plan: Any
Synchronously transcribe an audio file via Whisper and return the raw transcript text. Unlike the main upload flow, this endpoint does not create a CaptureJob or KnowledgeEntry. The temporary file is deleted from storage immediately after transcription.
Intended for ephemeral use cases such as voice answers in interview flows, where the transcript is embedded into another object rather than stored as a standalone knowledge entry.
Request — multipart/form-data
| Field | Type | Required | Description |
|---|---|---|---|
| file | binary | Yes | Audio file. Field name must be exactly file. |
Accepted formats and size/duration limits are identical to POST /upload.
Response — 200 OK
| Field | Type | Description |
|---|---|---|
| transcript | string | Raw text returned by Whisper. Language is auto-detected; no language metadata is included in this response. |
Errors
| Status | Code | Cause |
|---|---|---|
| 400 | BAD_REQUEST | file field missing or filename is empty. |
| 400 | INVALID_AUDIO_FILE | Unsupported format or audio exceeds 60-minute duration limit. |
| 401 | UNAUTHORIZED | Missing or invalid JWT. |
| 402 | SUBSCRIPTION_REQUIRED | Organisation subscription inactive or trial expired. |
| 413 | AUDIO_FILE_TOO_LARGE | File exceeds 25 MB. |
| 500 | TRANSCRIPTION_FAILED | Whisper API call failed after retries (e.g. GROQ_API_KEY not configured, or upstream service error). |
curl -X POST https://api.knora.io/api/v1/capture/voice/transcribe \
-H "Authorization: Bearer <token>" \
-F "file=@/path/to/voice-answer.webm"
Polling pattern
After calling POST /upload, poll GET /jobs/{job_id} until status is completed or failed. A reasonable interval is 2–5 seconds.
POST /upload → job_id
↓
GET /jobs/{job_id} (status: pending)
↓ wait 2–5s
GET /jobs/{job_id} (status: processing)
↓ wait 2–5s
GET /jobs/{job_id} (status: completed)
↓
result_entry_id → fetch KnowledgeEntry
If status is failed, the error_message field describes the reason.
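The polling flow above can be sketched as a small helper. This is an illustration under assumptions rather than an official client: `fetch_job` is a hypothetical callable that performs the authenticated GET /jobs/{job_id} and returns the decoded CaptureJob, and the 3-second interval and 10-minute timeout are arbitrary defaults.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(
    fetch_job: Callable[[], dict],
    interval_seconds: float = 3.0,
    timeout_seconds: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job reaches a terminal status, then return it."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        job = fetch_job()
        if job["status"] in TERMINAL_STATUSES:
            return job
        # Give up rather than poll forever if the job never finishes.
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {job['status']} after {timeout_seconds}s")
        sleep(interval_seconds)
```

On return, the caller checks status: for completed, result_entry_id identifies the KnowledgeEntry; for failed, error_message holds the reason.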
Transcription backend
Audio is transcribed using Groq Whisper large-v3 with automatic language detection. The detected language code (ISO 639-1) and audio duration in seconds are stored in metadata_json on the CaptureJob.
| Whisper language code | KnowledgeLanguage |
|---|---|
| en | en |
| ar | ar |
| anything else | mixed |
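The mapping in the table reduces to a single lookup with a fallback. The function name below is hypothetical, and treating a missing code as mixed is an assumption; the table only covers codes that Whisper actually returns.

```python
from typing import Optional

PASSTHROUGH_LANGUAGES = {"en", "ar"}


def to_knowledge_language(whisper_code: Optional[str]) -> str:
    """Map a Whisper language code to a KnowledgeLanguage value."""
    code = (whisper_code or "").strip().lower()
    # Only en and ar pass through; every other code collapses to "mixed".
    return code if code in PASSTHROUGH_LANGUAGES else "mixed"
```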
The Whisper API call is retried up to 3 times with exponential backoff on transient errors (rate limits, connection errors, 5xx responses). Other 4xx responses from the upstream API are not retried.
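For readers who want to reproduce similar behaviour in a client, here is a sketch of retry with exponential backoff. It is not the server's code: `UpstreamError` is a hypothetical exception type, the 1-second base delay is invented for illustration, rate limits are assumed to arrive as HTTP 429, and "up to 3 times" is read as three retries after the first attempt.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class UpstreamError(Exception):
    """Hypothetical error carrying the upstream HTTP status (None for connection errors)."""

    def __init__(self, status: Optional[int] = None):
        super().__init__(f"upstream error (status={status})")
        self.status = status


def is_transient(error: UpstreamError) -> bool:
    # Connection errors (no status), rate limits (429), and 5xx responses are retried.
    return error.status is None or error.status == 429 or error.status >= 500


def call_with_retries(
    call: Callable[[], T],
    retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying transient failures with delays of base_delay * 2**attempt."""
    attempt = 0
    while True:
        try:
            return call()
        except UpstreamError as error:
            # Re-raise immediately on non-transient errors or when retries are exhausted.
            if attempt >= retries or not is_transient(error):
                raise
            sleep(base_delay * 2 ** attempt)
            attempt += 1
```

With the defaults, the waits between attempts are 1, 2, and 4 seconds; production code often adds random jitter so that many clients do not retry in lockstep.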