Voice Capture API

Base path: /api/v1/capture/voice

The voice capture module accepts audio file uploads, runs asynchronous Whisper-based transcription, and saves the resulting transcript as a KnowledgeEntry. A lightweight synchronous transcription endpoint is also available for ephemeral use cases such as voice answers in interview flows.

See API Reference for auth, errors, and pagination.


POST /capture/voice/upload

Auth: JWT required · Plan: Any

Upload an audio file. The server validates it, saves it to storage, creates a CaptureJob, and enqueues an async Whisper transcription task. When the Celery task completes, the transcript is saved as a KnowledgeEntry with source voice_note and status needs_review.

Request: multipart/form-data

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| file | binary | Yes | Audio file. Field name must be exactly file. |

Accepted formats

| Extension | MIME types |
| --- | --- |
| .wav | audio/wav, audio/x-wav |
| .mp3 / .mpeg / .mpga | audio/mpeg, audio/mp3 |
| .m4a | audio/mp4, audio/x-m4a, audio/m4a |
| .webm | audio/webm, video/webm |
| .ogg | audio/ogg |

Validation passes if either the file extension or the Content-Type matches an allowed value.

Constraints

| Constraint | Limit |
| --- | --- |
| Maximum file size | 25 MB |
| Maximum audio duration | 60 minutes |
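
The format and size rules above can be mirrored in a client-side pre-flight check, so obviously invalid files are rejected before any bytes are uploaded. The following is a minimal sketch, not the server's implementation: the function names are hypothetical, the order of the checks is an assumption, and 25 MB is assumed to mean 25 × 1024 × 1024 bytes. The duration limit is not checked here because it requires decoding the audio.

```python
import os
from typing import Optional

# Values copied from the "Accepted formats" and "Constraints" tables.
ALLOWED_EXTENSIONS = {".wav", ".mp3", ".mpeg", ".mpga", ".m4a", ".webm", ".ogg"}
ALLOWED_MIME_TYPES = {
    "audio/wav", "audio/x-wav", "audio/mpeg", "audio/mp3", "audio/mp4",
    "audio/x-m4a", "audio/m4a", "audio/webm", "video/webm", "audio/ogg",
}
# Assumption: the documented 25 MB limit is interpreted as 25 * 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 25 * 1024 * 1024


def is_allowed_format(filename: str, content_type: Optional[str]) -> bool:
    """Return True if either the extension or the Content-Type is allowed."""
    extension = os.path.splitext(filename)[1].lower()
    # Drop any parameters such as "; codecs=opus" before comparing.
    mime = (content_type or "").split(";")[0].strip().lower()
    return extension in ALLOWED_EXTENSIONS or mime in ALLOWED_MIME_TYPES


def preflight_error(filename: str, content_type: Optional[str], size_bytes: int) -> Optional[str]:
    """Return the error code the server would probably send, or None if the file looks fine."""
    if not filename:
        return "BAD_REQUEST"
    if size_bytes > MAX_FILE_SIZE_BYTES:
        return "AUDIO_FILE_TOO_LARGE"
    if not is_allowed_format(filename, content_type):
        return "INVALID_AUDIO_FILE"
    return None
```

A browser recording named blob with Content-Type audio/webm passes this check because only one of the two signals has to match.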

Response — 201 Created

{
  "job_id": "d4a1c2b3-0001-4f2e-8abc-111122223333",
  "status": "pending",
  "message": "Audio uploaded successfully. Transcription is in progress."
}

| Field | Type | Description |
| --- | --- | --- |
| job_id | UUID string | ID of the created CaptureJob. Poll with GET /jobs/{job_id}. |
| status | string | Always pending on creation. |
| message | string | Human-readable confirmation. |

Errors

| Status | Code | Cause |
| --- | --- | --- |
| 400 | BAD_REQUEST | file field missing or filename is empty. |
| 400 | INVALID_AUDIO_FILE | Unsupported format or audio duration exceeds 60 minutes. |
| 401 | UNAUTHORIZED | Missing or invalid JWT. |
| 402 | SUBSCRIPTION_REQUIRED | Organisation subscription inactive or trial expired. |
| 413 | AUDIO_FILE_TOO_LARGE | File exceeds 25 MB. |

curl -X POST https://api.knora.io/api/v1/capture/voice/upload \
  -H "Authorization: Bearer <token>" \
  -F "file=@/path/to/recording.mp3"
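
For callers that are not using curl, the same upload request can be assembled with the Python standard library. This is a sketch under stated assumptions: build_multipart and build_upload_request are hypothetical helper names, the host is taken from the curl example, and the filename is assumed not to contain double quotes.

```python
import urllib.request
import uuid
from typing import Tuple

# Host and base path as they appear in the curl example.
API_BASE = "https://api.knora.io/api/v1/capture/voice"


def build_multipart(filename: str, data: bytes, content_type: str) -> Tuple[bytes, str]:
    """Encode one file part named "file" and return (body, Content-Type header value)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def build_upload_request(token: str, filename: str, data: bytes,
                         content_type: str) -> urllib.request.Request:
    """Build (but do not send) the POST /upload request."""
    body, header_value = build_multipart(filename, data, content_type)
    return urllib.request.Request(
        f"{API_BASE}/upload",
        data=body,
        method="POST",
        headers={"Authorization": f"Bearer {token}", "Content-Type": header_value},
    )
```

Passing the returned object to urllib.request.urlopen sends the request; the response body is the JSON document shown under Response above.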

GET /capture/voice/jobs

Auth: JWT required · Plan: Any

Returns a paginated list of voice capture jobs belonging to the authenticated user within their organisation.

Query parameters

| Parameter | Type | Default | Max | Description |
| --- | --- | --- | --- | --- |
| page | integer | 1 | | Page number (1-based). |
| per_page | integer | 20 | 100 | Results per page. Values above 100 are silently clamped. |

Response — 200 OK

{
  "jobs": [
    {
      "id": "d4a1c2b3-0001-4f2e-8abc-111122223333",
      "org_id": "aaaabbbb-0000-4000-8000-ccccddddeeee",
      "type": "voice",
      "status": "completed",
      "source_filename": "meeting-notes.mp3",
      "file_size": 4194304,
      "mime_type": "audio/mpeg",
      "created_by": "ffff0000-1111-4222-8333-444455556666",
      "created_at": "2026-05-30T09:00:00Z",
      "updated_at": "2026-05-30T09:01:22Z",
      "completed_at": "2026-05-30T09:01:22Z",
      "error_message": null,
      "result_entry_id": "11112222-3333-4444-8555-666677778888",
      "metadata_json": "{\"detected_language\": \"en\", \"duration_seconds\": 187.4}"
    }
  ],
  "total": 42,
  "page": 1,
  "per_page": 20
}

Envelope fields

| Field | Type | Description |
| --- | --- | --- |
| jobs | array | List of CaptureJob objects (see below). |
| total | integer | Total number of jobs across all pages. |
| page | integer | Current page as requested. |
| per_page | integer | Page size applied (clamped to 100). |
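
The envelope fields are enough to walk every page without guessing when to stop. The sketch below is illustrative only: iter_jobs and fetch_page are hypothetical names, and fetch_page stands in for whatever HTTP call the client makes to GET /jobs. It relies on the per_page value echoed by the server, since a requested value above 100 is clamped.

```python
import math
from typing import Any, Callable, Dict, Iterator

MAX_PER_PAGE = 100  # documented ceiling; larger requests are clamped by the server


def effective_per_page(requested: int) -> int:
    """Page size the server will apply for a requested per_page value."""
    return min(requested, MAX_PER_PAGE)


def page_count(total: int, per_page: int) -> int:
    """Number of pages needed to list `total` jobs."""
    return math.ceil(total / per_page) if total > 0 else 0


def iter_jobs(fetch_page: Callable[[int, int], Dict[str, Any]],
              per_page: int = 20) -> Iterator[Dict[str, Any]]:
    """Yield every job, requesting pages 1, 2, ... until the last page is reached."""
    page = 1
    while True:
        envelope = fetch_page(page, per_page)
        yield from envelope["jobs"]
        # Use the per_page the server actually applied, not the one requested.
        if page >= page_count(envelope["total"], envelope["per_page"]):
            break
        page += 1
```

With the sample response above (total 42, per_page 20), page_count returns 3.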

CaptureJob fields

| Field | Type | Nullable | Description |
| --- | --- | --- | --- |
| id | UUID string | No | Job UUID. |
| org_id | UUID string | No | Organisation this job belongs to. |
| type | string | No | Always "voice" for this module. |
| status | string | No | pending, processing, completed, or failed. |
| source_filename | string | No | Original filename as uploaded. |
| file_size | integer | No | File size in bytes. |
| mime_type | string | No | MIME type detected at upload time. |
| created_by | UUID string | No | UUID of the user who uploaded the file. |
| created_at | ISO 8601 string | No | Timestamp when the job was created. |
| updated_at | ISO 8601 string | No | Timestamp of the last status change. |
| completed_at | ISO 8601 string | Yes | Timestamp when transcription finished. null until completed. |
| error_message | string | Yes | Error description if status is failed. |
| result_entry_id | UUID string | Yes | UUID of the resulting KnowledgeEntry. null until completed. |
| metadata_json | JSON string | Yes | Serialised Whisper output: detected_language (ISO 639-1) and duration_seconds. |
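
Because metadata_json is a JSON document serialised into a string, it needs a second decoding step after the response body itself has been parsed. A minimal sketch, where job_metadata is a hypothetical helper name:

```python
import json
from typing import Any, Dict


def job_metadata(job: Dict[str, Any]) -> Dict[str, Any]:
    """Decode the nested metadata_json string; return {} while it is still null."""
    raw = job.get("metadata_json")
    return json.loads(raw) if raw else {}
```

Applied to the sample job above, the result contains detected_language "en" and duration_seconds 187.4.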

Job status lifecycle

pending → processing → completed
                     → failed

Errors

| Status | Code | Cause |
| --- | --- | --- |
| 401 | UNAUTHORIZED | Missing or invalid JWT. |
| 402 | SUBSCRIPTION_REQUIRED | Organisation subscription inactive or trial expired. |

curl "https://api.knora.io/api/v1/capture/voice/jobs?page=1&per_page=20" \
  -H "Authorization: Bearer <token>"

GET /capture/voice/jobs/{job_id}

Auth: JWT required · Plan: Any

Retrieve the current status and result of a single voice capture job. Poll this endpoint after uploading to determine when transcription has finished.

Path parameters

| Parameter | Type | Description |
| --- | --- | --- |
| job_id | UUID string | The job_id returned by the upload endpoint. |

Response — 200 OK

Returns a single CaptureJob object. See the field table in GET /jobs for field descriptions.

{
  "id": "d4a1c2b3-0001-4f2e-8abc-111122223333",
  "org_id": "aaaabbbb-0000-4000-8000-ccccddddeeee",
  "type": "voice",
  "status": "completed",
  "source_filename": "meeting-notes.mp3",
  "file_size": 4194304,
  "mime_type": "audio/mpeg",
  "created_by": "ffff0000-1111-4222-8333-444455556666",
  "created_at": "2026-05-30T09:00:00Z",
  "updated_at": "2026-05-30T09:01:22Z",
  "completed_at": "2026-05-30T09:01:22Z",
  "error_message": null,
  "result_entry_id": "11112222-3333-4444-8555-666677778888",
  "metadata_json": "{\"detected_language\": \"en\", \"duration_seconds\": 187.4}"
}

Errors

| Status | Code | Cause |
| --- | --- | --- |
| 400 | BAD_REQUEST | job_id is not a valid UUID. |
| 401 | UNAUTHORIZED | Missing or invalid JWT. |
| 402 | SUBSCRIPTION_REQUIRED | Organisation subscription inactive or trial expired. |
| 404 | VOICE_JOB_NOT_FOUND | No job with this ID for this user and organisation. Cross-tenant access is silently treated as not found. |

curl "https://api.knora.io/api/v1/capture/voice/jobs/d4a1c2b3-0001-4f2e-8abc-111122223333" \
  -H "Authorization: Bearer <token>"

POST /capture/voice/transcribe

Auth: JWT required · Plan: Any

Synchronously transcribe an audio file via Whisper and return the raw transcript text. Unlike the main upload flow, this endpoint does not create a CaptureJob or KnowledgeEntry. The temporary file is deleted from storage immediately after transcription.

Intended for ephemeral use cases such as voice answers in interview flows, where the transcript is embedded into another object rather than stored as a standalone knowledge entry.

Request: multipart/form-data

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| file | binary | Yes | Audio file. Field name must be exactly file. |

Accepted formats and size/duration limits are identical to POST /upload.

Response — 200 OK

{
  "transcript": "Hello, this is a voice note about the Q2 planning session..."
}

| Field | Type | Description |
| --- | --- | --- |
| transcript | string | Raw text returned by Whisper. Language is auto-detected; no language metadata is included in this response. |

Errors

| Status | Code | Cause |
| --- | --- | --- |
| 400 | BAD_REQUEST | file field missing or filename is empty. |
| 400 | INVALID_AUDIO_FILE | Unsupported format or audio exceeds 60-minute duration limit. |
| 401 | UNAUTHORIZED | Missing or invalid JWT. |
| 402 | SUBSCRIPTION_REQUIRED | Organisation subscription inactive or trial expired. |
| 413 | AUDIO_FILE_TOO_LARGE | File exceeds 25 MB. |
| 500 | TRANSCRIPTION_FAILED | Whisper API call failed after retries (e.g. GROQ_API_KEY not configured, or upstream service error). |

curl -X POST https://api.knora.io/api/v1/capture/voice/transcribe \
  -H "Authorization: Bearer <token>" \
  -F "file=@/path/to/voice-answer.webm"

Polling pattern

After calling POST /upload, poll GET /jobs/{job_id} until status is completed or failed. A reasonable interval is 2–5 seconds.

POST /upload  →  job_id
GET /jobs/{job_id}   (status: pending)
  ↓  wait 2–5s
GET /jobs/{job_id}   (status: processing)
  ↓  wait 2–5s
GET /jobs/{job_id}   (status: completed)
result_entry_id  →  fetch KnowledgeEntry

If status is failed, the error_message field describes the reason.
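
The polling pattern above can be sketched as a small loop. This is one possible client-side shape, not a prescribed one: wait_for_job and fetch_job are hypothetical names, fetch_job stands in for the GET /jobs/{job_id} call, and the overall timeout is an added assumption that the API does not define.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"completed", "failed"}


def wait_for_job(fetch_job: Callable[[str], Dict[str, Any]],
                 job_id: str,
                 interval_seconds: float = 3.0,    # within the suggested 2-5 s range
                 timeout_seconds: float = 600.0,   # assumption: not defined by the API
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> Dict[str, Any]:
    """Poll until the job is completed or failed, then return the final job object."""
    deadline = clock() + timeout_seconds
    while True:
        job = fetch_job(job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} still {job['status']} after {timeout_seconds}s")
        sleep(interval_seconds)
```

The caller then branches on the returned status: read result_entry_id when it is completed, or error_message when it is failed.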

Transcription backend

Audio is transcribed using Groq Whisper large-v3 with automatic language detection. The detected language code (ISO 639-1) and audio duration in seconds are stored in metadata_json on the CaptureJob.

| Whisper language code | KnowledgeLanguage |
| --- | --- |
| en | en |
| ar | ar |
| anything else | mixed |
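
The mapping in the table reduces to a single membership test. A minimal sketch, where knowledge_language is a hypothetical function name and exact, case-sensitive matching of the code is an assumption:

```python
from typing import Optional

# Whisper codes that map to a KnowledgeLanguage of the same name.
PASSTHROUGH_LANGUAGES = {"en", "ar"}


def knowledge_language(whisper_code: Optional[str]) -> str:
    """Map a Whisper language code to a KnowledgeLanguage value."""
    return whisper_code if whisper_code in PASSTHROUGH_LANGUAGES else "mixed"
```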

The Whisper API call is retried up to 3 times with exponential backoff on transient errors (rate limits, connection errors, 5xx responses). Other 4xx responses from the upstream API are not retried.
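
The retry policy described above could look roughly like the following. This is a sketch, not the service's code: UpstreamError, is_transient, and call_with_retries are hypothetical names, rate limiting is assumed to surface as HTTP 429, "up to 3 times" is read as three retries after the first attempt, and the 1-second base delay is an assumption.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class UpstreamError(Exception):
    """Hypothetical error carrying the upstream HTTP status code."""

    def __init__(self, status_code: Optional[int] = None):
        super().__init__(f"upstream error (status={status_code})")
        self.status_code = status_code


def is_transient(error: Exception) -> bool:
    """Rate limits (assumed 429), connection errors, and 5xx responses are retryable."""
    if isinstance(error, ConnectionError):
        return True
    if isinstance(error, UpstreamError) and error.status_code is not None:
        return error.status_code == 429 or error.status_code >= 500
    return False


def call_with_retries(call: Callable[[], T],
                      max_retries: int = 3,       # assumption: retries after the first attempt
                      base_delay: float = 1.0,    # assumption: not stated in the docs
                      sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `call`, retrying transient failures with delays of base_delay * 2**n."""
    attempt = 0
    while True:
        try:
            return call()
        except Exception as error:
            # Give up on non-transient errors or once the retry budget is spent.
            if attempt >= max_retries or not is_transient(error):
                raise
            sleep(base_delay * (2 ** attempt))
            attempt += 1
```

Under these assumptions a persistently failing upstream is called four times in total, with waits of 1, 2, and 4 seconds between attempts.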