Voice Gallery API

The Voice Gallery API stores and manages the AI voice models available for text-to-speech generation.

BASE API ENDPOINT

/api/v1/voice-gallery

Epic 1 Scope

In Epic 1, the Voice Gallery serves as a read-only voice selection interface for users:

  • Users browse published voices via GET /api/v1/voice-gallery?isPublished=true
  • Users select a voice for their TTS submission
  • Voice management (Create, Update, Delete) is done via API endpoints only (no web UI)

User Flow: Browse Voices → Select Voice → Submit TTS Request → Get Audio Link


Entity Schema

interface VoiceGalleryEntity {
id: string; // UUID
name: string; // Voice name (e.g., "Rachel", "Adam")
voiceCharacteristic: string; // "Male", "Female", "Accent" - Named 'voiceCharacteristic' to avoid conflicts with reserved keywords
orderNumber: number; // Display order (for sorting voices in UI)
samples: string[]; // Array of S3 URLs (uploaded as files, stored as URLs)
voiceIsAI: boolean; // Not used in first draft (future feature)
isPublished: boolean; // Whether voice is visible to users
createdAt: string; // ISO 8601 timestamp
updatedAt: string; // ISO 8601 timestamp
createdBy: {
// Identity user who created the voice
userId: string; // User ID from Microsoft Identity
userName: string; // User display name
};
updatedBy: {
// Identity user who last updated
userId: string; // User ID from Microsoft Identity
userName: string; // User display name
};
}

Notes:

  • Sample files are uploaded directly to S3 through pre-signed URLs; Create/Update requests then carry the resulting S3 URLs, which are stored in the database.
  • createdBy and updatedBy are populated from the authenticated Microsoft Identity user making the request.
  • On GET requests, both user ID and display name are returned for audit trail and user reference.
  • Items are sorted by orderNumber (ascending). If multiple items have the same orderNumber, they are sorted alphabetically by name.
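
The ordering rule in the last note can be sketched as a comparator. This is a minimal illustration, not the API's implementation: the names SortableVoice, compareVoices and sortVoices are hypothetical, and the case-insensitive name comparison is an assumption, since the exact collation is not specified here.

```typescript
// Minimal shape needed for ordering; the full entity has more fields.
interface SortableVoice {
  name: string;
  orderNumber: number;
}

// Order by orderNumber ascending, then alphabetically by name for ties.
function compareVoices(a: SortableVoice, b: SortableVoice): number {
  if (a.orderNumber !== b.orderNumber) {
    return a.orderNumber - b.orderNumber;
  }
  // Assumption: case-insensitive English collation.
  return a.name.localeCompare(b.name, "en", { sensitivity: "base" });
}

// Returns a sorted copy and leaves the input array untouched.
function sortVoices<T extends SortableVoice>(voices: T[]): T[] {
  return [...voices].sort(compareVoices);
}
```

A client normally does not need to re-sort, since the list endpoint already returns items in this order; the comparator is mainly useful after merging or editing items locally.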

Entity Example

{
"id": "550e8400-e29b-41d4-a716-446655440000",
"name": "Rachel",
"voiceCharacteristic": "Female",
"orderNumber": 1,
"samples": [
"https://micdots-audio.s3.amazonaws.com/voice-samples/rachel-sample-1.mp3",
"https://micdots-audio.s3.amazonaws.com/voice-samples/rachel-sample-2.mp3"
],
"voiceIsAI": true,
"isPublished": true,
"createdAt": "2024-01-15T10:30:00Z",
"updatedAt": "2024-01-20T14:45:00Z",
"createdBy": {
"userId": "admin-user-id-123",
"userName": "Admin User"
},
"updatedBy": {
"userId": "admin-user-id-123",
"userName": "Admin User"
}
}

Request Body Fields Reference

The following fields are used in Create (POST) and Update (PUT) operations:

| Field Name | Type | Description | POST | PUT |
|---|---|---|---|---|
| name | string | Voice name | Required | Optional |
| voiceCharacteristic | string | Voice category: "Male", "Female", or "Accent" | Required | Optional |
| orderNumber | number | Display order for sorting (lower numbers appear first) | Required | Optional |
| isPublished | boolean | Publication status (default: false) | Optional | Optional |
| samples | string[] | Array of S3 URLs for voice samples (min: 1, max: 5) | Required | Optional |

Recommended Approach

Use only ONE sample URL per voice for simplicity in MVP 1. Multiple samples are supported, but using more than one is better deferred to MVP 2.

Sample Upload Process:

  1. Request pre-signed URL via POST /api/v1/voice-gallery/upload-url
  2. Upload file directly to S3 using the pre-signed URL
  3. Use the returned fileUrl in the samples array when creating/updating voice

CRUD Endpoints

Get All Voices (with Pagination)

Endpoint: GET /api/v1/voice-gallery

Authentication: Optional (public endpoint)

Authorization Behavior:

  • Anonymous/Unauthenticated users: Can only view published voices (isPublished=true). Any isPublished value in the request is ignored and treated as true.
  • Authenticated admin users: Can also view unpublished voices by setting isPublished=false. If the parameter is omitted, the default (true) applies.

Query Parameters:

  • page (number, optional) - Page number (default: 1)
  • size (number, optional) - Items per page (default: 10, max: 100)
  • isPublished (boolean, optional) - Filter by published status (default: true for all users, forced to true for anonymous users)

Sorting:

  • Results are automatically sorted by orderNumber (ascending)
  • If multiple voices have the same orderNumber, they are sorted alphabetically by name

Request Example (Anonymous):

GET /api/v1/voice-gallery?page=1&size=10

Request Example (Authenticated Admin):

GET /api/v1/voice-gallery?page=1&size=10&isPublished=false
Authorization: Bearer {access-token}

Response: 200 OK

{
"success": true,
"data": {
"items": [
{
"id": "550e8400-e29b-41d4-a716-446655440000",
"name": "Rachel",
"voiceCharacteristic": "Female",
"orderNumber": 1,
"samples": [
"https://micdots-audio.s3.amazonaws.com/voice-samples/rachel-sample-1.mp3",
"https://micdots-audio.s3.amazonaws.com/voice-samples/rachel-sample-2.mp3"
],
"voiceIsAI": true,
"isPublished": true,
"createdAt": "2024-01-15T10:30:00Z",
"updatedAt": "2024-01-20T14:45:00Z",
"createdBy": {
"userId": "admin-user-id-123",
"userName": "Admin User"
},
"updatedBy": {
"userId": "admin-user-id-123",
"userName": "Admin User"
}
},
{
"id": "660e8400-e29b-41d4-a716-446655440001",
"name": "Adam",
"voiceCharacteristic": "Male",
"orderNumber": 2,
"samples": [
"https://micdots-audio.s3.amazonaws.com/voice-samples/adam-sample-1.mp3"
],
"voiceIsAI": true,
"isPublished": true,
"createdAt": "2024-01-15T11:00:00Z",
"updatedAt": "2024-01-15T11:00:00Z",
"createdBy": {
"userId": "admin-user-id-123",
"userName": "Admin User"
},
"updatedBy": {
"userId": "admin-user-id-123",
"userName": "Admin User"
}
}
],
"pagination": {
"page": 1,
"size": 10,
"totalItems": 8,
"totalPages": 1,
"hasNextPage": false,
"hasPreviousPage": false
}
}
}
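
The pagination block in the response above can be derived from page, size and totalItems. The sketch below is one plausible way to compute it; buildPagination is a hypothetical helper, and reporting totalPages as 0 for an empty result is an assumption.

```typescript
interface Pagination {
  page: number;
  size: number;
  totalItems: number;
  totalPages: number;
  hasNextPage: boolean;
  hasPreviousPage: boolean;
}

// Derives the pagination metadata from the requested page and the total count.
function buildPagination(page: number, size: number, totalItems: number): Pagination {
  const totalPages = Math.ceil(totalItems / size);
  return {
    page,
    size,
    totalItems,
    totalPages,
    hasNextPage: page < totalPages,
    hasPreviousPage: page > 1,
  };
}
```

With page=1, size=10 and 8 items this gives totalPages 1 and both flags false, matching the example response.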

Get Voice by ID

Endpoint: GET /api/v1/voice-gallery/{id}

Authentication: Optional (public endpoint)

Authorization Behavior:

  • Anonymous/Unauthenticated users: Can only retrieve published voices (isPublished=true). Returns 404 if the voice is unpublished.
  • Authenticated admin users: Can retrieve any voice (published or unpublished).

Path Parameters:

  • id (string, required) - Voice UUID

Request Example (Anonymous):

GET /api/v1/voice-gallery/550e8400-e29b-41d4-a716-446655440000

Request Example (Authenticated Admin):

GET /api/v1/voice-gallery/550e8400-e29b-41d4-a716-446655440000
Authorization: Bearer {access-token}

Response: 200 OK

{
"success": true,
"data": {
"id": "550e8400-e29b-41d4-a716-446655440000",
"name": "Rachel",
"voiceCharacteristic": "Female",
"orderNumber": 1,
"samples": [
"https://micdots-audio.s3.amazonaws.com/voice-samples/rachel-sample-1.mp3",
"https://micdots-audio.s3.amazonaws.com/voice-samples/rachel-sample-2.mp3"
],
"voiceIsAI": true,
"isPublished": true,
"createdAt": "2024-01-15T10:30:00Z",
"updatedAt": "2024-01-20T14:45:00Z",
"createdBy": {
"userId": "admin-user-id-123",
"userName": "Admin User"
},
"updatedBy": {
"userId": "admin-user-id-123",
"userName": "Admin User"
}
}
}

Error Response: 404 Not Found

Returned when:

  • Voice ID does not exist
  • Voice is unpublished and accessed by an anonymous/unauthenticated user

{
"success": false,
"error": {
"code": "VOICE_NOT_FOUND",
"message": "Voice with ID '550e8400-e29b-41d4-a716-446655440000' not found."
}
}
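
The two 404 cases listed above collapse into a single check: a voice that is missing and a voice that is hidden from the caller look the same from outside. A minimal sketch, assuming the caller's admin status has already been resolved from the token (resolveVoice and the Lookup type are hypothetical names):

```typescript
interface VoiceVisibility {
  id: string;
  isPublished: boolean;
}

type Lookup<T> =
  | { status: 200; voice: T }
  | { status: 404; code: "VOICE_NOT_FOUND" };

// Unpublished voices are reported as "not found" to non-admin callers,
// so the response does not reveal that the ID exists.
function resolveVoice<T extends VoiceVisibility>(
  voice: T | undefined,
  isAdmin: boolean
): Lookup<T> {
  if (voice === undefined || (!voice.isPublished && !isAdmin)) {
    return { status: 404, code: "VOICE_NOT_FOUND" };
  }
  return { status: 200, voice };
}
```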

File Upload Strategy: S3 Pre-Signed URLs

Voice sample files are uploaded directly to S3 using pre-signed URLs for better performance and scalability.

Why Pre-Signed URLs?

Benefits:

  • Direct to S3: Files upload directly to S3, bypassing backend
  • Faster uploads: No backend bottleneck
  • Scalable: Backend doesn't handle large file streams
  • Secure: Pre-signed URLs expire after 15 minutes
  • Progress tracking: Frontend can show upload progress

Upload Flow:

  1. Request pre-signed URL from backend
  2. Upload file directly to S3 using the pre-signed URL
  3. Create/update voice with the S3 URL

Get Pre-Signed Upload URL

Endpoint: POST /api/v1/voice-gallery/upload-url

Description: Generates a pre-signed URL for uploading voice sample files directly to S3.

Headers:

Authorization: Bearer ACCESS_TOKEN
Content-Type: application/json

Request Body:

{
"fileName": "rachel-sample-1.mp3",
"fileType": "audio/mpeg",
"fileSize": 2048576
}

Request Body Fields:

  • fileName (string, required) - Original filename with extension
  • fileType (string, required) - MIME type (must be audio/mpeg for MP3)
  • fileSize (number, required) - File size in bytes (max 5MB = 5242880 bytes)

Response: 200 OK

{
"success": true,
"data": {
"uploadUrl": "https://micdots-audio.s3.amazonaws.com/voice-samples/550e8400-sample-1.mp3?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=...",
"fileUrl": "https://micdots-audio.s3.amazonaws.com/voice-samples/550e8400-sample-1.mp3",
"fileKey": "voice-samples/550e8400-sample-1.mp3",
"expiresIn": 900
},
"message": "Pre-signed URL generated successfully. Upload expires in 15 minutes."
}

Response Fields:

  • uploadUrl - Pre-signed URL for uploading (use this with PUT request)
  • fileUrl - Final S3 URL after upload completes (use this when creating voice)
  • fileKey - S3 object key
  • expiresIn - Seconds until URL expires (900 = 15 minutes)
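
Because uploadUrl is only valid for expiresIn seconds, a client may want to check the remaining time before starting an upload. The sketch below assumes the client records the time at which it received the response; isUploadUrlExpired and the safety margin are illustrative choices, not part of the API.

```typescript
// Absolute expiry time in milliseconds since the epoch.
function uploadUrlExpiresAt(issuedAtMs: number, expiresInSeconds: number): number {
  return issuedAtMs + expiresInSeconds * 1000;
}

// True when the URL is expired, or will expire within the safety margin.
function isUploadUrlExpired(
  issuedAtMs: number,
  expiresInSeconds: number,
  nowMs: number = Date.now(),
  safetyMarginMs: number = 30000
): boolean {
  return nowMs + safetyMarginMs >= uploadUrlExpiresAt(issuedAtMs, expiresInSeconds);
}
```

If the check returns true, requesting a fresh pre-signed URL is simpler than retrying a failed upload.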

Error Response: 400 Bad Request (Invalid file type)

{
"success": false,
"error": {
"code": "INVALID_FILE_TYPE",
"message": "Only MP3 files are allowed. Received: audio/wav"
}
}

Error Response: 400 Bad Request (File too large)

{
"success": false,
"error": {
"code": "FILE_TOO_LARGE",
"message": "File size 7340032 bytes exceeds maximum of 5242880 bytes (5MB)"
}
}
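
The two error cases above can be checked on the client before calling the endpoint, which saves a round trip. This is a sketch that mirrors the documented limits; validateUploadRequest is a hypothetical helper, and the server remains the source of truth.

```typescript
const MAX_SAMPLE_BYTES = 5 * 1024 * 1024; // 5242880 bytes (5 MB)
const ALLOWED_MIME_TYPE = "audio/mpeg";

interface UploadUrlRequest {
  fileName: string;
  fileType: string;
  fileSize: number;
}

interface ApiError {
  code: string;
  message: string;
}

// Returns the error the API is documented to return, or null if the request looks valid.
function validateUploadRequest(req: UploadUrlRequest): ApiError | null {
  if (req.fileType !== ALLOWED_MIME_TYPE) {
    return {
      code: "INVALID_FILE_TYPE",
      message: `Only MP3 files are allowed. Received: ${req.fileType}`,
    };
  }
  if (req.fileSize > MAX_SAMPLE_BYTES) {
    return {
      code: "FILE_TOO_LARGE",
      message: `File size ${req.fileSize} bytes exceeds maximum of ${MAX_SAMPLE_BYTES} bytes (5MB)`,
    };
  }
  return null;
}
```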

Upload File to S3 (Client-Side)

After receiving the pre-signed URL, upload the file directly to S3:

Request: PUT {uploadUrl}

Headers:

Content-Type: audio/mpeg

Body: Binary audio file data

Example (JavaScript/TypeScript):

// Step 1: Get pre-signed URL from backend
const response = await fetch("/api/v1/voice-gallery/upload-url", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${accessToken}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    fileName: file.name,
    fileType: file.type,
    fileSize: file.size,
  }),
});
if (!response.ok) {
  throw new Error(`Failed to get upload URL: ${response.status}`);
}

const { data } = await response.json();
const { uploadUrl, fileUrl } = data;

// Step 2: Upload file directly to S3 (no Authorization header; the URL is pre-signed)
const uploadResponse = await fetch(uploadUrl, {
  method: "PUT",
  headers: {
    "Content-Type": file.type,
  },
  body: file,
});
if (!uploadResponse.ok) {
  throw new Error(`S3 upload failed: ${uploadResponse.status}`);
}

// Step 3: Use fileUrl when creating voice
await fetch("/api/v1/voice-gallery", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${accessToken}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    name: "Rachel",
    voiceCharacteristic: "Female",
    orderNumber: 1,
    isPublished: true,
    samples: [fileUrl], // Use the fileUrl from step 1
  }),
});

Upload with Progress Tracking:

// Using XMLHttpRequest for progress tracking
const uploadWithProgress = (
  uploadUrl: string,
  file: File,
  onProgress: (percent: number) => void
) => {
  return new Promise((resolve, reject) => {
    const xhr = new XMLHttpRequest();

    xhr.upload.addEventListener("progress", (e) => {
      if (e.lengthComputable) {
        const percentComplete = (e.loaded / e.total) * 100;
        onProgress(percentComplete);
      }
    });

    xhr.addEventListener("load", () => {
      if (xhr.status === 200) {
        resolve(xhr.response);
      } else {
        reject(new Error(`Upload failed: ${xhr.status}`));
      }
    });

    xhr.addEventListener("error", () => reject(new Error("Upload failed")));

    xhr.open("PUT", uploadUrl);
    xhr.setRequestHeader("Content-Type", file.type);
    xhr.send(file);
  });
};

// Usage
await uploadWithProgress(uploadUrl, file, (percent) => {
  console.log(`Upload progress: ${percent.toFixed(2)}%`);
});

Insert a New Voice

MVP 1 Feature

This endpoint is available in MVP 1 for creating voices via API (Postman, Insomnia, cURL).

Endpoint: POST /api/v1/voice-gallery

Headers:

Authorization: Bearer ACCESS_TOKEN
Content-Type: application/json

Request Body (JSON):

{
"name": "Daniel",
"voiceCharacteristic": "Male",
"orderNumber": 3,
"isPublished": false,
"samples": [
"https://micdots-audio.s3.amazonaws.com/voice-samples/550e8400-sample-1.mp3"
]
}

Request Body Fields:

  • name (string, required) - Voice name
  • voiceCharacteristic (string, required) - "Male", "Female", or "Accent"
  • orderNumber (number, required) - Display order
  • isPublished (boolean, optional) - Visibility status (default: false)
  • samples (string[], required) - Array of S3 URLs (at least 1, max 5)

Processing Flow:

  1. Validate request fields
  2. Verify S3 URLs are accessible
  3. Automatically populate createdBy and updatedBy from JWT token
  4. Create voice gallery entry with S3 URLs
  5. Return created voice entity

Note: createdBy and updatedBy are NOT sent in the request. The API automatically extracts the user ID and user name from the authenticated user's JWT token and populates these fields.

Response: 201 Created

{
"success": true,
"data": {
"id": "770e8400-e29b-41d4-a716-446655440002",
"name": "Daniel",
"voiceCharacteristic": "Male",
"orderNumber": 3,
"samples": [
"https://micdots-audio.s3.amazonaws.com/voice-samples/550e8400-sample-1.mp3"
],
"voiceIsAI": true,
"isPublished": false,
"createdAt": "2024-01-22T09:15:00Z",
"updatedAt": "2024-01-22T09:15:00Z",
"createdBy": {
"userId": "user-id-456",
"userName": "John Doe"
},
"updatedBy": {
"userId": "user-id-456",
"userName": "John Doe"
}
}
}
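
The createdBy and updatedBy values in the response above come from the caller's token rather than the request body. A rough sketch of that mapping follows. The claim names are assumptions: oid and name are standard claims in Microsoft identity platform tokens, but which claims this API actually reads is not stated here, and auditUserFromClaims and stampOnCreate are hypothetical helpers.

```typescript
interface AuditUser {
  userId: string;
  userName: string;
}

// Assumption: claims come from a token whose signature was already verified.
interface TokenClaims {
  oid?: string;
  sub?: string;
  name?: string;
  preferred_username?: string;
}

// Maps token claims to the audit shape used by createdBy/updatedBy.
function auditUserFromClaims(claims: TokenClaims): AuditUser {
  const userId = claims.oid ?? claims.sub;
  if (!userId) {
    throw new Error("Token has no user identifier claim");
  }
  return { userId, userName: claims.name ?? claims.preferred_username ?? userId };
}

// On create, both audit fields refer to the same user.
function stampOnCreate<T extends object>(body: T, user: AuditUser) {
  return { ...body, createdBy: user, updatedBy: user };
}
```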

Error Response: 400 Bad Request

{
"success": false,
"error": {
"code": "VALIDATION_ERROR",
"message": "Validation failed",
"fields": {
"name": ["Name is required"],
"voiceCharacteristic": [
"voiceCharacteristic must be 'Male', 'Female', or 'Accent'"
],
"samples": ["At least one sample URL is required"]
}
}
}

Error Response: 400 Bad Request (Invalid S3 URL)

{
"success": false,
"error": {
"code": "INVALID_S3_URL",
"message": "One or more S3 URLs are invalid or inaccessible",
"invalidUrls": ["https://invalid-bucket.s3.amazonaws.com/file.mp3"]
}
}
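
Part of the INVALID_S3_URL check can be done without touching S3, by confirming that each URL points at the expected bucket and folder. The sketch below covers only that structural part, with the host and prefix taken from the examples in this document; verifying that the object actually exists would need a request to S3 and is not shown. isExpectedSampleUrl and findInvalidUrls are hypothetical names.

```typescript
const SAMPLE_HOST = "micdots-audio.s3.amazonaws.com";
const SAMPLE_PREFIX = "/voice-samples/";

// Structural check only: HTTPS, expected bucket host, expected folder, .mp3 extension.
function isExpectedSampleUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return false; // not a parseable URL
  }
  return (
    url.protocol === "https:" &&
    url.hostname === SAMPLE_HOST &&
    url.pathname.startsWith(SAMPLE_PREFIX) &&
    url.pathname.toLowerCase().endsWith(".mp3")
  );
}

// Collects the offending URLs, in the shape of the invalidUrls field above.
function findInvalidUrls(samples: string[]): string[] {
  return samples.filter((sample) => !isExpectedSampleUrl(sample));
}
```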

Update Voice

Endpoint: PUT /api/v1/voice-gallery/{id}

Path Parameters:

  • id (string, required) - Voice UUID

Headers:

Authorization: Bearer ACCESS_TOKEN
Content-Type: application/json

Request Body (JSON):

{
"name": "Rachel (Updated)",
"voiceCharacteristic": "Female",
"orderNumber": 10,
"isPublished": true,
"samples": [
"https://micdots-audio.s3.amazonaws.com/voice-samples/rachel-sample-1-updated.mp3",
"https://micdots-audio.s3.amazonaws.com/voice-samples/rachel-sample-2-updated.mp3"
]
}

Request Body Fields (all optional):

  • name (string, optional) - Voice name
  • voiceCharacteristic (string, optional) - "Male", "Female", or "Accent"
  • orderNumber (number, optional) - Display order
  • isPublished (boolean, optional) - Visibility status
  • samples (string[], optional) - Array of S3 URLs (replaces existing samples)

Processing Flow:

  1. Validate request fields
  2. If samples provided, verify S3 URLs are accessible
  3. If samples provided, replace existing samples with new URLs
  4. Automatically update updatedBy from JWT token
  5. Update voice gallery entry
  6. Return updated voice entity

Note: updatedBy is NOT sent in the request. The API automatically extracts the user ID and user name from the authenticated user's JWT token and updates this field. createdBy remains unchanged.

Important: Providing samples will replace all existing samples. To add samples while keeping existing ones, retrieve current samples first, then send all samples (existing + new) in the update request.
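
Since an update replaces the whole samples array, adding a sample means sending the existing URLs together with the new one. A small sketch of that merge, assuming the current samples were fetched with GET /api/v1/voice-gallery/{id} first (mergeSamples is a hypothetical helper):

```typescript
const MAX_SAMPLES = 5;

// Combines existing and new sample URLs, dropping duplicates and keeping order.
function mergeSamples(existing: string[], added: string[]): string[] {
  const merged = existing
    .concat(added)
    .filter((url, index, all) => all.indexOf(url) === index);
  if (merged.length > MAX_SAMPLES) {
    throw new Error(`A voice can have at most ${MAX_SAMPLES} samples, got ${merged.length}`);
  }
  return merged;
}
```

The returned array is what goes into the samples field of the PUT request body.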

Response: 200 OK

{
"success": true,
"data": {
"id": "550e8400-e29b-41d4-a716-446655440000",
"name": "Rachel (Updated)",
"voiceCharacteristic": "Female",
"orderNumber": 10,
"samples": [
"https://micdots-audio.s3.amazonaws.com/voice-samples/rachel-sample-1-updated.mp3",
"https://micdots-audio.s3.amazonaws.com/voice-samples/rachel-sample-2-updated.mp3"
],
"voiceIsAI": true,
"isPublished": true,
"createdAt": "2024-01-15T10:30:00Z",
"updatedAt": "2024-01-22T16:20:00Z",
"createdBy": {
"userId": "admin-user-id-123",
"userName": "Admin User"
},
"updatedBy": {
"userId": "user-id-456",
"userName": "John Doe"
}
}
}

Error Response: 404 Not Found

{
"success": false,
"error": {
"code": "VOICE_NOT_FOUND",
"message": "Voice with ID '550e8400-e29b-41d4-a716-446655440000' not found."
}
}

Delete Voice

Endpoint: DELETE /api/v1/voice-gallery/{id}

Path Parameters:

  • id (string, required) - Voice UUID

Headers:

Authorization: Bearer {access-token}

Request Example:

DELETE /api/v1/voice-gallery/550e8400-e29b-41d4-a716-446655440000
Authorization: Bearer {access-token}

Response: 200 OK

{
"success": true,
"message": "Voice deleted successfully",
"data": {
"id": "550e8400-e29b-41d4-a716-446655440000",
"deletedAt": "2024-01-22T17:30:00Z"
}
}

Error Response: 404 Not Found

{
"success": false,
"error": {
"code": "VOICE_NOT_FOUND",
"message": "Voice with ID '550e8400-e29b-41d4-a716-446655440000' not found."
}
}

Error Response: 409 Conflict

{
"success": false,
"error": {
"code": "VOICE_IN_USE",
"message": "Cannot delete voice that is currently in use by existing audio generations.",
"affectedGenerations": 42
}
}
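
The three delete outcomes above share one envelope, with success acting as a discriminant. A client-side sketch of handling them, where the types and describeDeleteResult are illustrative names and the user-facing strings are made up:

```typescript
interface DeleteSuccess {
  success: true;
  message: string;
  data: { id: string; deletedAt: string };
}

interface DeleteFailure {
  success: false;
  error: { code: string; message: string; affectedGenerations?: number };
}

type DeleteResponse = DeleteSuccess | DeleteFailure;

// Turns a parsed delete response into a short human-readable summary.
function describeDeleteResult(res: DeleteResponse): string {
  if (res.success) {
    return `Deleted ${res.data.id} at ${res.data.deletedAt}`;
  }
  switch (res.error.code) {
    case "VOICE_NOT_FOUND":
      return "Voice not found";
    case "VOICE_IN_USE":
      return `Voice is in use by ${res.error.affectedGenerations ?? "some"} generation(s)`;
    default:
      return res.error.message;
  }
}
```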

Validation Rules

Name

  • Required: Yes
  • Type: String
  • Min Length: 2 characters
  • Max Length: 100 characters
  • Pattern: Letters, numbers, spaces, hyphens, and parentheses only

Voice Characteristic

  • Field Name: voiceCharacteristic (named to avoid conflicts with reserved keywords)
  • Required: Yes
  • Type: String
  • Min Length: 3 characters
  • Max Length: 100 characters
  • Allowed Values: "Male", "Female", "Accent"
  • Case Sensitive: Yes
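
The Name and Voice Characteristic rules above translate into a small validator. This sketch makes two assumptions worth flagging: "letters" is read as ASCII letters only, and the error strings are invented for illustration. validateName and isVoiceCharacteristic are hypothetical names.

```typescript
const VOICE_CHARACTERISTICS = ["Male", "Female", "Accent"] as const;
type VoiceCharacteristic = (typeof VOICE_CHARACTERISTICS)[number];

// Case-sensitive membership check: "male" is rejected, "Male" is accepted.
function isVoiceCharacteristic(value: string): value is VoiceCharacteristic {
  return (VOICE_CHARACTERISTICS as readonly string[]).indexOf(value) !== -1;
}

// Assumption: ASCII letters and digits, plus spaces, hyphens and parentheses.
const NAME_PATTERN = /^[A-Za-z0-9 ()-]+$/;

// Returns a list of problems; an empty list means the name is valid.
function validateName(name: string): string[] {
  const errors: string[] = [];
  if (name.length < 2) {
    errors.push("Name must be at least 2 characters");
  }
  if (name.length > 100) {
    errors.push("Name must be at most 100 characters");
  }
  if (name.length > 0 && !NAME_PATTERN.test(name)) {
    errors.push("Name contains unsupported characters");
  }
  return errors;
}
```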

Samples (File Upload)

  • Required: Yes (at least one sample URL)
  • Type: Array of S3 URLs (string[]), each one the fileUrl returned by the upload-url endpoint
  • Min Items: 1 sample URL
  • Max Items: 5 sample URLs
  • File Type: MP3 only (.mp3 extension, audio/mpeg)
  • Max File Size: 5 MB per file
  • Storage: Files are uploaded to S3 by the client through pre-signed URLs; only the resulting URLs are stored in the samples array

IsPublished

  • Required: No
  • Type: Boolean
  • Default: false

Dependencies

AWS S3 Storage

Voice sample audio files are stored in AWS S3.

S3 Bucket Configuration:

  • Bucket Name: micdots-audio (shared with TTS submissions)
  • Folder: /voice-samples (voice gallery audio samples)
  • Submissions Folder: /submissions (TTS request audio files - see TTS Submission API)
  • Region: us-east-1
  • Access: Private (signed URLs for access)
  • File Format: MP3
  • Max File Size: 5 MB per sample
  • File Naming Convention: {voice-name}-sample-{number}.mp3

Single Bucket Architecture

The same S3 bucket (micdots-audio) is used for both voice gallery samples and user submissions, organized in separate folders:

  • /voice-samples - Voice gallery audio samples (this endpoint)
  • /submissions - User TTS request audio files (TTS Submission API)

This approach simplifies infrastructure management while maintaining clear separation between different types of audio files.

Sample Upload Process:

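// Note: uploadToS3 and createVoice are application-level helpers that are
// not defined in this document; this is an outline, not a complete program.

// 1. Upload sample to S3
const s3Url = await uploadToS3({
  bucket: "micdots-audio",
  folder: "voice-samples",
  key: `${voiceName.toLowerCase()}-sample-${index + 1}.mp3`,
  file: audioFile,
  contentType: "audio/mpeg",
});

// 2. Add URL to samples array
samples.push(s3Url);

// 3. Create voice gallery entry (orderNumber is required on create)
await createVoice({
  name: voiceName,
  voiceCharacteristic: voiceCharacteristic,
  orderNumber: orderNumber,
  samples: samples,
  isPublished: false,
});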

Integration Flow


Testing Endpoints

Using cURL

Get All Voices:

curl -X GET "http://localhost:5000/api/v1/voice-gallery?page=1&size=10" \
-H "Authorization: Bearer {your-access-token}"

Get Voice by ID:

curl -X GET "http://localhost:5000/api/v1/voice-gallery/550e8400-e29b-41d4-a716-446655440000" \
-H "Authorization: Bearer {your-access-token}"

Get Pre-Signed URL for Upload:

curl -X POST "http://localhost:5000/api/v1/voice-gallery/upload-url" \
-H "Authorization: Bearer {your-access-token}" \
-H "Content-Type: application/json" \
-d '{
"fileName": "daniel-sample-1.mp3",
"fileType": "audio/mpeg",
"fileSize": 2048576
}'

Upload File to S3 (using pre-signed URL from previous response):

curl -X PUT "{upload-url-from-previous-response}" \
-H "Content-Type: audio/mpeg" \
--data-binary "@/path/to/daniel-sample-1.mp3"

Create New Voice (with S3 URL from upload):

curl -X POST "http://localhost:5000/api/v1/voice-gallery" \
-H "Authorization: Bearer {your-access-token}" \
-H "Content-Type: application/json" \
-d '{
"name": "Daniel",
"voiceCharacteristic": "Male",
"orderNumber": 3,
"isPublished": false,
"samples": ["https://micdots-audio.s3.amazonaws.com/voice-samples/550e8400-sample-1.mp3"]
}'

Update Voice:

curl -X PUT "http://localhost:5000/api/v1/voice-gallery/550e8400-e29b-41d4-a716-446655440000" \
-H "Authorization: Bearer {your-access-token}" \
-H "Content-Type: application/json" \
-d '{
"name": "Rachel (Updated)",
"voiceCharacteristic": "Female",
"samples": ["https://micdots-audio.s3.amazonaws.com/voice-samples/rachel-sample-1.mp3"],
"isPublished": true
}'

Delete Voice:

curl -X DELETE "http://localhost:5000/api/v1/voice-gallery/550e8400-e29b-41d4-a716-446655440000" \
-H "Authorization: Bearer {your-access-token}"

Security Considerations

Authorization

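  • Write endpoints (create, update, delete, and upload-url) require authentication; the GET endpoints are public and accept an optional access token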
  • Only admin users can create, update, or delete voices
  • Regular users can only read published voices

Input Validation

  • Sanitize all text inputs to prevent XSS attacks
  • Validate S3 URLs to ensure they point to the correct bucket
  • Limit file upload sizes to prevent abuse

Rate Limiting

  • Create: 10 requests per hour per user
  • Read: 100 requests per minute per user
  • Update: 20 requests per hour per user
  • Delete: 5 requests per hour per user
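
The per-user limits above could be enforced with a simple fixed-window counter. The sketch below is one possible approach under stated assumptions: the document does not say which algorithm the API uses, the state is kept in memory (a real deployment would likely need a shared store), and FixedWindowLimiter is a hypothetical class.

```typescript
type Operation = "create" | "read" | "update" | "delete";

const HOUR_MS = 60 * 60 * 1000;
const MINUTE_MS = 60 * 1000;

// Limits copied from the list above.
const LIMITS: Record<Operation, { max: number; windowMs: number }> = {
  create: { max: 10, windowMs: HOUR_MS },
  read: { max: 100, windowMs: MINUTE_MS },
  update: { max: 20, windowMs: HOUR_MS },
  delete: { max: 5, windowMs: HOUR_MS },
};

class FixedWindowLimiter {
  private windows = new Map<string, { start: number; count: number }>();

  // Returns true if the request is allowed and counts it against the window.
  allow(userId: string, op: Operation, nowMs: number = Date.now()): boolean {
    const { max, windowMs } = LIMITS[op];
    const key = `${userId}:${op}`;
    const current = this.windows.get(key);
    if (!current || nowMs - current.start >= windowMs) {
      // First request, or the previous window has ended: start a new one.
      this.windows.set(key, { start: nowMs, count: 1 });
      return true;
    }
    if (current.count >= max) {
      return false;
    }
    current.count += 1;
    return true;
  }
}
```

A fixed window is easy to reason about but allows short bursts at window boundaries; a sliding window or token bucket would smooth that out at the cost of more state.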


Future Enhancements (Not in MVP)

Future Features

The following features are planned for future releases but are NOT included in Epic 1 MVP:

  • Voice preview directly in the UI
  • Voice categories/tags for better organization
  • Voice search and filtering by characteristics
  • Voice popularity metrics
  • User ratings and reviews for voices
  • A/B testing different voice samples
  • Voice cloning functionality
  • Custom voice uploads from users