Transform Your Content Into Anything

Pick a target: any audio, video, or document file. AutoShow converts it to text, then combines that text with prompts to generate text, audio, image, music, and video assets.


Figure: Generated content package with written, audio, image, video, and music assets. The example, "Launch Kit: AI product walkthrough", is built from Podcast.mp4, Deck.pdf, a product page, and brand notes, and includes chapters, a narrated voiceover, a thumbnail, a short clip, social posts, theme music, and an FAQ for Blog, YouTube, Newsletter, and LMS channels.

Choose a Target

Your target is processed into text through transcription or document extraction. The text is then combined with prompts to generate text, audio, image, music, and video outputs.

Audio / Video
Upload MP3/MP4 files from your computer, paste podcast feed or episode URLs, use direct public MP3/MP4 URLs, or paste approved streaming URLs from YouTube, Vimeo, Twitch, Dailymotion, TikTok, Instagram, Facebook, X/Twitter, or SoundCloud.
Document
Upload PDF, EPUB, PPTX, DOCX, XLSX, TXT, PNG, JPEG, or TIFF files, or scrape URL/HTML pages. Extract text with one of 7 services: Local Text Extraction, Mistral OCR, GLM-OCR, OpenAI, Google Gemini, Grok, or Kimi.

AI-Powered Media Generation

Go beyond transcription. Generate narrated audio, cover images, original music, and video clips from your content using the latest generative AI models.

Text-to-Speech
Convert summaries to narrated audio using one of 6 text-to-speech services: OpenAI TTS, ElevenLabs TTS, Deepgram Aura TTS, Gemini TTS, Grok TTS, or Groq Orpheus TTS. Choose from multiple voices and output formats.
AI Image Generation
Create cover art, thumbnails, and promotional images with one of 3 image services: ChatGPT Image, Gemini Image, or Grok Image. Generate up to 6 images per job with customizable dimensions and aspect ratios.
Music Generation
Generate original theme music with AI-written lyrics. Choose from 7 genres: Pop, Rock, Rap, Country, Folk, Jazz, Electronic. Powered by Eleven Music, MiniMax Music, or Gemini Music.
Video Generation
Create up to 2 videos per job (explainer clips, highlights, intros, outros, or social media videos) with one of 2 video services: Gemini Veo or Grok Video. All prompts include safety filtering.

4-Step Processing Pipeline

Content flows through a configurable pipeline. Each step can be customized with different providers and models. Optional steps are skipped if not enabled.

01 Resolve Sources
Uploaded files, direct URLs, source search results, and approved streaming links are fetched and normalized before text extraction.
- Audio / Video: podcast feed/episode URLs, YouTube/Twitch/TikTok URLs, MP3/MP4 files from your computer, and direct public MP3/MP4 URLs
- Documents / Images: PDF, EPUB, PPTX, DOCX, PNG, JPEG, and TIFF files, normalized before extraction
- URLs: URL/HTML pages, feeds, and supported source links, resolved for scraping
02 Extract Text
Every source becomes reusable text through the matching extraction path.
- Audio / Video → Transcribe: podcast feed/episode URLs, YouTube/Twitch/TikTok links, MP3/MP4 uploads, and direct public media URLs
- Documents / Images → Parse Documents: PDF, EPUB, PPTX, DOCX, PNG, JPEG, and TIFF files via text extraction
- URLs → Scrape: URL/HTML pages, feeds, and crawlable links
03 Write Text + Narration
Dynamic prompts are assembled, the LLM generates structured outputs, and narration can be synthesized in the same stage.
- Written Outputs: structured LLM content assembled from prompts and source text (summaries, chapters, FAQ, social, marketing, and learning content)
- Text-to-Speech: optional narrated audio synthesized from the generated text
04 Generate Media
Optional downstream steps turn the generated text into visual and audio assets.
- Image: cover art, thumbnails, and visual assets
- Video: explainers, highlights, intros, outros, and social clips
- Music: original theme music with optional AI-written lyrics
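The four steps above can be sketched as a simple sequential runner. This is a minimal illustration, not AutoShow's implementation: `PipelineConfig`, `Job`, and the stage functions are hypothetical names, and each stage is stubbed. It shows the key behavior described here, namely that steps 1 to 3 always run while narration and the step 4 media outputs are skipped unless enabled.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineConfig:
    # Hypothetical toggles for the optional steps; names are illustrative only.
    tts: bool = False
    image: bool = False
    video: bool = False
    music: bool = False


@dataclass
class Job:
    source: str
    text: str = ""
    outputs: dict[str, str] = field(default_factory=dict)
    steps_run: list[str] = field(default_factory=list)


def resolve_sources(job: Job) -> None:
    # Step 1 (stub): fetch and normalize the upload or URL.
    job.steps_run.append("resolve")


def extract_text(job: Job) -> None:
    # Step 2 (stub): transcribe, parse, or scrape into plain text.
    job.text = f"text of {job.source}"
    job.steps_run.append("extract")


def write_text(job: Job, cfg: PipelineConfig) -> None:
    # Step 3: written outputs always run; narration only if enabled.
    job.outputs["summary"] = f"summary of {job.text}"
    job.steps_run.append("write")
    if cfg.tts:
        job.outputs["narration"] = "narration audio"
        job.steps_run.append("tts")


def generate_media(job: Job, cfg: PipelineConfig) -> None:
    # Step 4: each media type is skipped unless enabled.
    for kind, enabled in (("image", cfg.image), ("video", cfg.video), ("music", cfg.music)):
        if enabled:
            job.outputs[kind] = f"{kind} asset"
            job.steps_run.append(kind)


def run_pipeline(source: str, cfg: PipelineConfig) -> Job:
    job = Job(source)
    resolve_sources(job)
    extract_text(job)
    write_text(job, cfg)
    generate_media(job, cfg)
    return job


job = run_pipeline("episode.mp3", PipelineConfig(tts=True, image=True))
print(job.steps_run)  # ['resolve', 'extract', 'write', 'tts', 'image']
```

Keeping each step as a separate function mirrors the idea that every stage can be swapped for a different provider or model without touching the others.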

Technical Specifications

Built for reliability and scale with enterprise-grade infrastructure.

Transcription Services
12 services: Groq Whisper, DeepInfra Whisper, AssemblyAI, Deepgram Nova 3, Grok STT, Soniox, Mistral Voxtral Mini, ScrapeCreators, Supadata, HappyScribe, deAPI, and Gladia.
LLM Providers
5 providers: OpenAI, Claude, Google Gemini, Grok, and Groq.
Text-to-Speech Services
6 services: OpenAI TTS, ElevenLabs TTS, Deepgram Aura TTS, Gemini TTS, Grok TTS, and Groq Orpheus TTS.
Image Generation
3 services: ChatGPT Image, Gemini Image, and Grok Image. Supports square, landscape, portrait, and provider-specific aspect ratios.
Music Generation
3 services: Eleven Music (music_v1), MiniMax Music (music-2.6), and Gemini Music (lyria-3-pro-preview). 7 genres available: Pop, Rock, Rap, Country, Folk, Jazz, and Electronic.
Video Generation
2 services: Gemini Veo and Grok Video. Durations, resolutions, and aspect ratios vary by provider.
Cloud Storage
Generated files are saved under the configured artifact output root. When S3-compatible storage is configured, uploads and generated media can also use bucket-backed URLs.
Document Processing
7 services: Local Text Extraction, Mistral OCR, GLM-OCR, OpenAI, Google Gemini, Grok, and Kimi. Supports PDF, EPUB, DOCX, PPTX, XLSX, PNG, JPEG, TIFF, and TXT uploads, plus URL/HTML page scraping.

Frequently Asked Questions

What input formats are supported?

AutoShow accepts podcast feed and episode URLs; approved streaming URLs for YouTube, Vimeo, Twitch, Dailymotion, TikTok, Instagram, Facebook, X/Twitter, and SoundCloud; direct public MP3/MP4 media URLs; URL/HTML pages; uploaded audio and video files from your computer; and uploaded document and image files. Supported uploads include audio files (MP3, WAV, M4A, FLAC, OGG, AAC, WMA, MPEG/MPGA), video files (MP4, MOV, AVI, MKV, WEBM, WMV, FLV, M4V), documents (PDF, EPUB, DOCX, PPTX, XLSX, TXT), and images (PNG, JPG/JPEG, TIFF/TIF).

Which transcription service should I use?

For speaker identification, use AssemblyAI, Deepgram Nova 3, Grok STT, Mistral Voxtral Mini, or Soniox. For fast transcription of local files or direct media URLs when you don't need speaker labels, use Groq Whisper or DeepInfra Whisper. For streaming URLs, use ScrapeCreators, Supadata, HappyScribe, deAPI, or Gladia.
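The guidance above can be expressed as a small lookup. This is a hypothetical helper, not part of AutoShow; the service lists come straight from this answer, and checking streaming URLs first is an assumption, since the answer does not say which rule wins when a streaming URL also needs speaker labels.

```python
# Service groups as listed in this FAQ answer.
SPEAKER_LABEL_SERVICES = ["AssemblyAI", "Deepgram Nova 3", "Grok STT", "Mistral Voxtral Mini", "Soniox"]
FAST_WHISPER_SERVICES = ["Groq Whisper", "DeepInfra Whisper"]
STREAMING_URL_SERVICES = ["ScrapeCreators", "Supadata", "HappyScribe", "deAPI", "Gladia"]


def suggest_services(is_streaming_url: bool, needs_speaker_labels: bool) -> list[str]:
    """Hypothetical helper: suggest transcription services for a source."""
    # Assumption: streaming URLs take precedence over the speaker-label preference.
    if is_streaming_url:
        return list(STREAMING_URL_SERVICES)
    if needs_speaker_labels:
        return list(SPEAKER_LABEL_SERVICES)
    # Local files or direct media URLs without speaker labels.
    return list(FAST_WHISPER_SERVICES)


print(suggest_services(is_streaming_url=False, needs_speaker_labels=False))
# ['Groq Whisper', 'DeepInfra Whisper']
```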

How long can my content be?

Uploads and remote sources are subject to configured size limits: uploads default to 1 GB, and remote documents default to 300 MB. Long audio is split into segments when needed: 10-minute chunks for Whisper transcription of local or direct media without speaker labels (non-diarized), and 30-minute fallback chunks for other long audio.
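To see how the chunk sizes play out, here is a minimal sketch that computes segment boundaries, assuming audio is cut into fixed-length pieces with a shorter final piece. `chunk_boundaries` is a hypothetical name, and the real segmenter may choose split points differently; only the 10- and 30-minute sizes come from this answer.

```python
import math

# Chunk sizes from this FAQ answer; the function below is a hypothetical sketch.
WHISPER_CHUNK_S = 10 * 60    # non-diarized local/direct Whisper transcription
FALLBACK_CHUNK_S = 30 * 60   # other long audio


def chunk_boundaries(duration_s: float, non_diarized_whisper: bool) -> list[tuple[float, float]]:
    """Return (start, end) offsets in seconds, assuming fixed-length splits."""
    size = WHISPER_CHUNK_S if non_diarized_whisper else FALLBACK_CHUNK_S
    if duration_s <= size:
        # Short enough that no segmentation is needed.
        return [(0.0, float(duration_s))]
    count = math.ceil(duration_s / size)
    return [
        (float(i * size), float(min((i + 1) * size, duration_s)))
        for i in range(count)
    ]


# A 95-minute episode (5,700 s):
print(len(chunk_boundaries(5700, True)))   # 10 (the last chunk is 5 minutes)
print(len(chunk_boundaries(5700, False)))  # 4
```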

What LLM providers are available?

5 LLM providers are configured: OpenAI, Claude, Google Gemini, Grok, and Groq. Structured JSON output uses retry and fallback logic when a selected provider fails.
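One common way to implement "retry and fallback" is to retry each provider a fixed number of times and then move on to the next one. The sketch below assumes exactly that; the `Provider` callable interface, `generate_structured`, and the stub providers are hypothetical, and real code would typically add backoff delays and schema validation.

```python
import json
from typing import Callable

# Hypothetical interface: a provider takes a prompt and returns raw model text.
Provider = Callable[[str], str]


def generate_structured(
    prompt: str,
    providers: list[tuple[str, Provider]],
    attempts_per_provider: int = 2,
) -> tuple[str, dict]:
    """Try each provider in order, retrying before falling back to the next."""
    errors: list[str] = []
    for name, call in providers:
        for attempt in range(1, attempts_per_provider + 1):
            try:
                data = json.loads(call(prompt))  # raises ValueError on invalid JSON
                if not isinstance(data, dict):
                    raise ValueError("expected a JSON object")
                return name, data
            except Exception as exc:  # this sketch treats API errors and bad output alike
                errors.append(f"{name} attempt {attempt}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))


def flaky(prompt: str) -> str:
    return "Sure! Here is your summary."  # stub that never returns JSON


def reliable(prompt: str) -> str:
    return '{"summary": "Launch recap"}'  # stub that returns valid JSON


print(generate_structured("Summarize", [("OpenAI", flaky), ("Claude", reliable)]))
# ('Claude', {'summary': 'Launch recap'})
```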

How does video generation work?

First, an LLM generates a detailed scene description based on your content. Then, the scene is rendered using Gemini Veo or Grok Video. Video types include explainer, highlight, intro, outro, and social clips.

Where are generated files stored?

Generated files are saved under the configured artifact output root. If S3 storage is configured with Railway Storage Buckets or another S3-compatible service, uploads and generated media can also be stored in the bucket with URL-backed access.

What music genres are available?

AutoShow supports 7 genres: Pop, Rock, Rap, Country, Folk, Jazz, and Electronic. An LLM first writes original, copyright-safe lyrics tailored to your content, then the music is composed.

Start Processing Today

Transform your content with AI transcription, summarization, and generation.

Usage-based pricing: no subscriptions or hidden fees
