
Audio Commands

The lc tool supports audio in three ways: speech-to-text (transcription) with the transcribe command, text-to-speech (TTS) with the tts command, and audio attachments in chat prompts.

Audio Transcription

Convert audio files to text using the transcribe command.

Basic Usage

# Transcribe a single audio file
lc transcribe audio.wav

# Transcribe multiple audio files
lc transcribe file1.mp3 file2.wav file3.flac

# Use the tr alias
lc tr recording.mp3

Options

  • -m, --model <MODEL> - Specify the transcription model (default: whisper-1)
  • -p, --provider <PROVIDER> - Specify the provider (default: openai)
  • -l, --language <LANG> - Specify the audio language (e.g., en, es, fr)
  • --prompt <TEXT> - Provide context to guide the transcription
  • -f, --format <FORMAT> - Output format: text, json, srt, vtt (default: text)
  • -t, --temperature <TEMP> - Sampling temperature (0-1, default: 0)

Supported Audio Formats

  • MP3, MP4, MPEG, MPGA, M4A
  • WAV
  • WebM
  • OGG
  • FLAC
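
A script can check the extension before uploading, so an unsupported file is caught locally. The following is a minimal sketch: is_supported_audio is a hypothetical helper, not part of lc, and it only looks at the file extension from the list above rather than the actual file contents.

```shell
# Return 0 if the file extension is in the supported list, 1 otherwise.
# is_supported_audio is a hypothetical helper name, not an lc command.
is_supported_audio() {
  local ext="${1##*.}"
  # Lowercase the extension so "Talk.MP3" matches as well.
  ext="$(printf '%s' "$ext" | tr '[:upper:]' '[:lower:]')"
  case "$ext" in
    mp3|mp4|mpeg|mpga|m4a|wav|webm|ogg|flac) return 0 ;;
    *) return 1 ;;
  esac
}
```

It could be used as a guard, for example: is_supported_audio audio.aiff || echo "convert audio.aiff first".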

Examples

# Transcribe with language hint
lc transcribe interview.mp3 --language en

# Get JSON output with timestamps
lc transcribe podcast.wav --format json

# Provide context for better accuracy
lc transcribe medical_recording.mp3 --prompt "Medical consultation discussing symptoms"

# Use a different provider
lc transcribe audio.wav --provider custom-provider
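
When each recording needs its own transcript file, the options above can be wrapped in a small loop. This sketch assumes that lc transcribe prints the transcript to standard output, which this page does not state explicitly; transcribe_each is a hypothetical helper name.

```shell
# Transcribe each file into a sibling transcript named after the format.
# Assumption: lc transcribe prints the transcript to standard output.
# Usage: transcribe_each FORMAT FILE...
transcribe_each() {
  local fmt="$1" f ext out status=0
  shift
  # Map the --format value to a file extension (text -> .txt).
  case "$fmt" in
    text) ext="txt" ;;
    json|srt|vtt) ext="$fmt" ;;
    *) echo "unknown format: $fmt" >&2; return 2 ;;
  esac
  for f in "$@"; do
    out="${f%.*}.${ext}"
    if ! lc transcribe "$f" --format "$fmt" > "$out"; then
      echo "transcription failed: $f" >&2
      rm -f "$out"   # do not leave a partial transcript behind
      status=1
    fi
  done
  return "$status"
}
```

For example, transcribe_each srt episode1.mp3 episode2.mp3 would write episode1.srt and episode2.srt next to the audio files.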

Text-to-Speech (TTS)

Convert text to speech using the tts command.

Basic Usage

# Generate speech from text
lc tts "Hello, this is a test of text to speech"

# Save to a specific file
lc tts "Welcome to our service" --output welcome.mp3

# Read from a file
lc tts --file script.txt --output narration.mp3

Options

  • -m, --model <MODEL> - TTS model: tts-1, tts-1-hd (default: tts-1)
  • -p, --provider <PROVIDER> - Specify the provider (default: openai)
  • -v, --voice <VOICE> - Voice selection: alloy, echo, fable, onyx, nova, shimmer (default: alloy)
  • -o, --output <FILE> - Output file path (default: speech_TIMESTAMP.mp3)
  • -f, --format <FORMAT> - Audio format: mp3, opus, aac, flac, wav, pcm (default: mp3)
  • -s, --speed <SPEED> - Speech speed: 0.25 to 4.0 (default: 1.0)
  • --file <FILE> - Read text from a file instead of command line

Voice Options

  • alloy - Neutral and balanced
  • echo - Warm and conversational
  • fable - Expressive and dynamic
  • onyx - Deep and authoritative
  • nova - Friendly and upbeat
  • shimmer - Soft and pleasant
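
Scripts that take the voice and speed from user input can validate them against the documented values before calling lc tts. This is a sketch only: valid_tts_voice and valid_tts_speed are hypothetical helpers, and the accepted values are copied from the lists above, so a provider other than openai may accept a different set.

```shell
# Return 0 if the voice is one of the six documented voices.
valid_tts_voice() {
  case "$1" in
    alloy|echo|fable|onyx|nova|shimmer) return 0 ;;
    *) return 1 ;;
  esac
}

# Return 0 if the speed is a plain decimal number between 0.25 and 4.0.
valid_tts_speed() {
  # awk does the floating-point comparison; exit status 0 means "in range".
  awk -v s="$1" 'BEGIN {
    ok = (s ~ /^[0-9]+(\.[0-9]+)?$/) && (s + 0 >= 0.25) && (s + 0 <= 4.0)
    exit !ok
  }'
}
```

A call such as valid_tts_voice "$voice" && valid_tts_speed "$speed" && lc tts "$text" --voice "$voice" --speed "$speed" then only reaches the API with values from the documented ranges.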

Examples

# Use HD model for better quality
lc tts "Important announcement" --model tts-1-hd

# Generate with a different voice
lc tts "Welcome message" --voice nova --output welcome.mp3

# Adjust speech speed
lc tts "Quick instructions" --speed 1.5

# Generate in a different format
lc tts "Audio book chapter" --format flac --output chapter1.flac

# Read from file with specific voice
lc tts --file presentation.txt --voice onyx --output presentation.mp3

Audio Attachments in Chat

Attach audio files to a chat prompt with the --audio flag. The files are transcribed first, so the model answers with their content as context.

Basic Usage

# Ask about audio content
lc "What is being discussed in this recording?" --audio meeting.mp3

# Multiple audio files
lc "Summarize these interviews" --audio interview1.mp3 --audio interview2.wav

# Combine with other attachments
lc "Analyze this presentation" --audio narration.mp3 --image slides.png

How It Works

  1. Audio files are automatically transcribed using the Whisper model
  2. Transcriptions are added to the chat context
  3. The LLM processes both your prompt and the transcribed content
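
The three steps above can be approximated by hand with the documented commands. This is an illustration of the idea, not how lc implements --audio internally: ask_about_audio is a hypothetical helper, the transcript labels are invented for readability, and the sketch assumes lc transcribe prints the transcript to standard output.

```shell
# Approximate --audio manually: transcribe, add to context, then prompt.
# Usage: ask_about_audio "PROMPT" FILE...
ask_about_audio() {
  local prompt="$1" context="" f text
  shift
  for f in "$@"; do
    # Step 1: transcribe the file (assumed to print to standard output).
    text="$(lc transcribe "$f")" || return 1
    # Step 2: append the transcript to the context under a label.
    context="${context}
[Transcript of ${f}]
${text}
"
  done
  # Step 3: send the prompt together with the transcribed content.
  lc "${prompt}
${context}"
}
```

For example, ask_about_audio "Summarize these interviews" interview1.mp3 interview2.wav sends one prompt that contains both transcripts.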

Examples

# Meeting transcription and summary
lc "Provide meeting minutes for this recording" --audio meeting_recording.mp3 -m gpt-4o

# Language translation
lc "Translate this Spanish audio to English" --audio spanish_audio.wav

# Content analysis
lc "What are the key points discussed?" --audio podcast_episode.mp3

# Multiple audio analysis
lc "Compare the topics discussed in these recordings" \
  --audio episode1.mp3 \
  --audio episode2.mp3 \
  -m claude-3-opus-20240229

Configuration

Setting Default Audio Provider

# Set default provider for audio commands
lc config set audio-provider openai

# Set default transcription model
lc config set audio-model whisper-1

# Set default TTS model
lc config set tts-model tts-1-hd

# Set default TTS voice
lc config set tts-voice nova

Provider Configuration

Audio endpoints can be configured in provider TOML files:

[provider]
name = "custom-audio"
base_url = "https://api.custom.com/v1"

# Audio transcription endpoint
audio_path = "/audio/transcriptions"

# Text-to-speech endpoint
speech_path = "/audio/speech"

# Optional: Custom templates for request/response transformation
[audio_templates]
whisper-1 = "custom_whisper_template.tera"

[speech_templates]
tts-1 = "custom_tts_template.tera"
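
Before pointing audio commands at a custom provider, it can help to confirm that its TOML file defines both endpoint keys shown above. The sketch below is a plain text check with grep, not a TOML parser, and has_audio_endpoints is a hypothetical helper; where lc stores provider files is not covered here, so the path is passed in by the caller.

```shell
# Return 0 if the provider file sets both audio_path and speech_path.
# Usage: has_audio_endpoints PROVIDER_FILE
has_audio_endpoints() {
  local file="$1" key missing=0
  for key in audio_path speech_path; do
    # Match "key = ..." at the start of a line; commented-out keys do not count.
    if ! grep -Eq "^[[:space:]]*${key}[[:space:]]*=" "$file"; then
      echo "missing ${key} in ${file}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```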

Tips and Best Practices

Audio Quality

  • Use high-quality audio files for better transcription accuracy
  • WAV and FLAC are lossless, so they preserve quality better than lossy formats such as MP3
  • For TTS, use the tts-1-hd model for production-quality output

Performance

  • Transcription time depends on audio length and quality
  • Batch process multiple files when possible
  • Consider using JSON format for transcriptions if you need timestamps

Language Support

  • Whisper supports 50+ languages for transcription
  • Always specify a language hint (--language) for non-English audio
  • TTS voices are optimized for English but support multiple languages

Cost Optimization

  • tts-1 is faster and cheaper than tts-1-hd
  • Compress audio files before transcription to reduce upload time
  • Use appropriate audio formats (MP3 for speech, WAV for music)

Error Handling

Common issues and solutions:

# Unsupported format error
# Solution: Convert to supported format first
ffmpeg -i audio.aiff -acodec libmp3lame audio.mp3
lc transcribe audio.mp3

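# Large file error (for example, OpenAI's transcription endpoint caps uploads at 25 MB)
# Solution: Split and compress the audio
ffmpeg -i large_file.wav -t 1800 -ac 1 -b:a 64k part1.mp3 # First 30 minutes as mono 64 kbps MP3 (about 14 MB)
lc transcribe part1.mp3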
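
A size check before uploading avoids a round trip that ends in a large file error. This sketch assumes a 25 MB cap, which is OpenAI's documented limit for its transcription endpoint; other providers may differ, so the limit can be overridden. fits_upload_limit is a hypothetical helper name.

```shell
# Assumption: 25 MB is the upload cap (OpenAI's documented limit).
MAX_UPLOAD_BYTES=$((25 * 1024 * 1024))

# Return 0 if the file fits, 1 if it is too large, 2 if it does not exist.
# Usage: fits_upload_limit FILE [LIMIT_BYTES]
fits_upload_limit() {
  local file="$1" limit="${2:-$MAX_UPLOAD_BYTES}" size
  [ -f "$file" ] || return 2
  # wc -c may pad its output with spaces on some systems, so strip them.
  size="$(wc -c < "$file" | tr -d '[:space:]')"
  [ "$size" -le "$limit" ]
}
```

For example: fits_upload_limit large_file.wav || echo "split or compress large_file.wav first".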

# API rate limits
# Solution: Add delays between requests or batch process
lc transcribe *.mp3 --delay 1
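
The --delay flag is not listed among the transcribe options earlier on this page, so if your lc version rejects it, a plain shell loop gives the same pacing. This is a minimal sketch; transcribe_with_delay is a hypothetical helper that relies only on lc transcribe FILE and the standard sleep command.

```shell
# Transcribe files one at a time, pausing between requests.
# Usage: transcribe_with_delay SECONDS FILE...
transcribe_with_delay() {
  local delay="$1" f first=1 status=0
  shift
  for f in "$@"; do
    # Pause before every request except the first.
    if [ "$first" -eq 0 ]; then
      sleep "$delay"
    fi
    first=0
    if ! lc transcribe "$f"; then
      echo "transcription failed: $f" >&2
      status=1   # remember the failure but keep going
    fi
  done
  return "$status"
}
```

For example, transcribe_with_delay 1 *.mp3 sends one request per file with a one-second pause in between, and returns non-zero if any file failed.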

Integration Examples

Podcast Workflow

# Transcribe podcast
lc transcribe podcast.mp3 --format srt --output podcast.srt

# Generate summary and save it for the next step
lc "Summarize this podcast transcript" --file podcast.srt > summary.txt

# Create promotional audio
lc tts --file summary.txt --voice nova --output promo.mp3

Meeting Assistant

# Transcribe meeting
lc transcribe meeting.m4a --output meeting.txt

# Extract action items and save them for the audio summary
lc "List all action items from this meeting" --file meeting.txt > summary.txt

# Generate audio summary
lc tts "Meeting summary: $(cat summary.txt)" --output meeting_summary.mp3

Language Learning

# Transcribe foreign language audio
lc transcribe spanish_lesson.mp3 --language es

# Get translation
lc "Translate this to English and explain grammar" --audio spanish_lesson.mp3

# Generate pronunciation guide
lc tts "Hola, ¿cómo estás?" --voice shimmer --speed 0.8