Audio Commands

The lc tool supports speech-to-text (transcription) and text-to-speech (TTS), and can attach audio files to chat prompts.

Audio Transcription

Convert audio files to text using the transcribe command.

Basic Usage

# Transcribe a single audio file
lc transcribe audio.wav

# Transcribe multiple audio files
lc transcribe file1.mp3 file2.wav file3.flac

# Use alias
lc tr recording.mp3

Options

-m, --model <MODEL> - Specify the transcription model (default: whisper-1)
-p, --provider <PROVIDER> - Specify the provider (default: openai)
-l, --language <LANG> - Specify the audio language as an ISO 639-1 code (e.g., en, es, fr)
--prompt <TEXT> - Provide context to guide the transcription
-f, --format <FORMAT> - Output format: text, json, srt, vtt (default: text)
-t, --temperature <TEMP> - Sampling temperature (0-1, default: 0)

Supported Audio Formats

MP3, MP4, MPEG, MPGA, M4A
WAV
WebM
OGG
FLAC
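A quick pre-flight check can catch an unsupported file before it is uploaded. The following is a minimal sketch, assuming the format is judged by file extension alone; is_supported_audio is a hypothetical helper, not part of lc, and lc's own validation may differ.

```shell
# Hypothetical helper (not part of lc): succeed if the file extension
# appears in the supported-format list above.
is_supported_audio() {
  # Lowercase the extension so "TALK.MP3" matches as well.
  ext=$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')
  case "$ext" in
    mp3|mp4|mpeg|mpga|m4a|wav|webm|ogg|flac) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `is_supported_audio talk.aiff || echo "convert talk.aiff first"` prints the warning, because AIFF is not in the list.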

Examples

# Transcribe with language hint
lc transcribe interview.mp3 --language en

# Get JSON output with timestamps
lc transcribe podcast.wav --format json

# Provide context for better accuracy
lc transcribe medical_recording.mp3 --prompt "Medical consultation discussing symptoms"

# Use a different provider
lc transcribe audio.wav --provider custom-provider

Text-to-Speech (TTS)

Convert text to speech using the tts command.

Basic Usage

# Generate speech from text
lc tts "Hello, this is a test of text to speech"

# Save to a specific file
lc tts "Welcome to our service" --output welcome.mp3

# Read from a file
lc tts --file script.txt --output narration.mp3

Options

-m, --model <MODEL> - TTS model: tts-1, tts-1-hd (default: tts-1)
-p, --provider <PROVIDER> - Specify the provider (default: openai)
-v, --voice <VOICE> - Voice selection: alloy, echo, fable, onyx, nova, shimmer (default: alloy)
-o, --output <FILE> - Output file path (default: speech_TIMESTAMP.mp3)
-f, --format <FORMAT> - Audio format: mp3, opus, aac, flac, wav, pcm (default: mp3)
-s, --speed <SPEED> - Speech speed: 0.25 to 4.0 (default: 1.0)
--file <FILE> - Read text from a file instead of command line
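In scripts that take the speed from user input, it can help to check the value against the documented 0.25 to 4.0 range before calling lc tts. This is a sketch only; valid_speed is a hypothetical helper, and lc may report out-of-range values in its own way.

```shell
# Hypothetical helper (not part of lc): succeed if $1 is a plain decimal
# number inside the documented --speed range of 0.25 to 4.0.
valid_speed() {
  # Reject empty input, a lone dot, non-numeric characters, and multiple dots.
  case "$1" in
    ''|.|*[!0-9.]*|*.*.*) return 1 ;;
  esac
  # awk handles the floating-point comparison that plain sh cannot.
  awk -v s="$1" 'BEGIN { exit !(s + 0 >= 0.25 && s + 0 <= 4.0) }'
}
```

A script could then guard the call with `valid_speed "$speed" && lc tts "Quick instructions" --speed "$speed"`.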

Voice Options

alloy - Neutral and balanced
echo - Warm and conversational
fable - Expressive and dynamic
onyx - Deep and authoritative
nova - Friendly and upbeat
shimmer - Soft and pleasant

Examples

# Use HD model for better quality
lc tts "Important announcement" --model tts-1-hd

# Generate with different voice
lc tts "Welcome message" --voice nova --output welcome.mp3

# Adjust speech speed
lc tts "Quick instructions" --speed 1.5

# Generate in different format
lc tts "Audio book chapter" --format flac --output chapter1.flac

# Read from file with specific voice
lc tts --file presentation.txt --voice onyx --output presentation.mp3

Audio Attachments in Chat

You can attach audio files to chat prompts for context-aware conversations.

Basic Usage

# Ask about audio content
lc "What is being discussed in this recording?" --audio meeting.mp3

# Multiple audio files
lc "Summarize these interviews" --audio interview1.mp3 --audio interview2.wav

# Combine with other attachments
lc "Analyze this presentation" --audio narration.mp3 --image slides.png

How It Works

Audio files are automatically transcribed using the Whisper model
Transcriptions are added to the chat context
The LLM processes both your prompt and the transcribed content
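The three steps above can be pictured as a plain text-assembly step. The sketch below is an illustration only: build_context is a hypothetical helper, and the "[Transcript of ...]" label is an assumption, since the exact way lc adds transcriptions to the chat context is not documented here.

```shell
# Hypothetical illustration (not lc's actual implementation): print a prompt
# followed by each transcript, roughly what the model receives as context.
# Usage: build_context "prompt" transcript1.txt [transcript2.txt ...]
build_context() {
  prompt=$1
  shift
  printf '%s\n' "$prompt"
  for transcript in "$@"; do
    # Label each transcript with its source file (label format is assumed).
    printf '\n[Transcript of %s]\n' "$transcript"
    cat "$transcript"
  done
}
```

Given transcripts saved earlier with lc transcribe, `build_context "Summarize these interviews" interview1.txt interview2.txt` prints the prompt followed by both labeled transcripts.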

Examples

# Meeting transcription and summary
lc "Provide meeting minutes for this recording" --audio meeting_recording.mp3 -m gpt-4o

# Language translation
lc "Translate this Spanish audio to English" --audio spanish_audio.wav

# Content analysis
lc "What are the key points discussed?" --audio podcast_episode.mp3

# Multiple audio analysis
lc "Compare the topics discussed in these recordings" \
  --audio episode1.mp3 \
  --audio episode2.mp3 \
  -m claude-3-opus-20240229

Configuration

Setting Default Audio Provider

# Set default provider for audio commands
lc config set audio-provider openai

# Set default transcription model
lc config set audio-model whisper-1

# Set default TTS model
lc config set tts-model tts-1-hd

# Set default TTS voice
lc config set tts-voice nova

Provider Configuration

Audio endpoints can be configured in provider TOML files:

[provider]
name = "custom-audio"
base_url = "https://api.custom.com/v1"

# Audio transcription endpoint
audio_path = "/audio/transcriptions"

# Text-to-speech endpoint
speech_path = "/audio/speech"

# Optional: Custom templates for request/response transformation
[audio_templates]
whisper-1 = "custom_whisper_template.tera"

[speech_templates]
tts-1 = "custom_tts_template.tera"

Tips and Best Practices

Audio Quality

Use high-quality audio files for better transcription accuracy
WAV and FLAC formats preserve quality better than compressed formats
For TTS, use the tts-1-hd model for production-quality output

Performance

Transcription time depends on audio length and quality
Batch process multiple files by passing them to a single lc transcribe call
Consider using JSON format for transcriptions if you need timestamps

Language Support

Whisper supports 50+ languages for transcription
Specify a language hint (--language) for non-English audio
TTS voices are optimized for English but support multiple languages

Cost Optimization

tts-1 is faster and cheaper than tts-1-hd
Compress audio files before transcription to reduce upload time
Use appropriate audio formats (MP3 for speech, WAV for music)

Error Handling

Common issues and solutions:

# Unsupported format error
# Solution: Convert to supported format first
ffmpeg -i audio.aiff -acodec libmp3lame audio.mp3
lc transcribe audio.mp3

# Large file error
# Solution: Split or compress the audio (OpenAI's transcription endpoint rejects uploads over 25 MB)
ffmpeg -i large_file.wav -t 600 -ac 1 -b:a 64k part1.mp3  # First 10 minutes, mono, 64 kbps
lc transcribe part1.mp3

# API rate limits
# Solution: Add delays between requests or batch process
lc transcribe *.mp3 --delay 1
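The --delay flag is not listed among the transcribe options above. Where it is unavailable, a plain shell loop with sleep gives a similar effect. This is a minimal sketch; transcribe_with_delay, LC_BIN, and DELAY_SECONDS are hypothetical names introduced for this example.

```shell
# Transcribe files one at a time, pausing after each request.
# LC_BIN (default: lc) and DELAY_SECONDS (default: 1) are hypothetical
# variables used only by this sketch.
transcribe_with_delay() {
  for audio in "$@"; do
    # Stop at the first failure so an error is not buried in later output.
    "${LC_BIN:-lc}" transcribe "$audio" || return 1
    sleep "${DELAY_SECONDS:-1}"
  done
}
```

Calling `transcribe_with_delay *.mp3` processes every MP3 in the current directory with a one-second pause after each file.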

Integration Examples

Podcast Workflow

# Transcribe podcast
lc transcribe podcast.mp3 --format srt --output podcast.srt

# Generate summary and save it for the next step
lc "Summarize this podcast transcript" --file podcast.srt > summary.txt

# Create promotional audio
lc tts --file summary.txt --voice nova --output promo.mp3
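The SRT file from the first step carries cue numbers and timestamps that the summary step does not need. The sketch below strips them, assuming the standard SubRip layout (cue number, timestamp line, caption text, blank line); srt_to_text is a hypothetical helper, not part of lc.

```shell
# Hypothetical helper (not part of lc): print only the caption text of an
# SRT file. Caveat: a caption line made up solely of digits is also dropped.
srt_to_text() {
  awk '
    { sub(/\r$/, "") }      # tolerate CRLF line endings
    /^[0-9]+$/ { next }     # cue number
    /-->/      { next }     # timestamp line
    /^$/       { next }     # blank separator
    { print }
  ' "$1"
}
```

Running `srt_to_text podcast.srt > podcast.txt` produces a plain transcript that can be passed to the summary step in place of the SRT file.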

Meeting Assistant

# Transcribe meeting
lc transcribe meeting.m4a --output meeting.txt

# Extract action items and save them for the audio summary
lc "List all action items from this meeting" --file meeting.txt > summary.txt

# Generate audio summary
lc tts "Meeting summary: $(cat summary.txt)" --output meeting_summary.mp3

Language Learning

# Transcribe foreign language audio
lc transcribe spanish_lesson.mp3 --language es

# Get translation
lc "Translate this to English and explain grammar" --audio spanish_lesson.mp3

# Generate pronunciation guide
lc tts "Hola, ¿cómo estás?" --voice shimmer --speed 0.8
