Transform Your Content Into Anything

Pick a target: any audio, video, or document file. AutoShow converts it to text, then combines that text with prompts to generate text, audio, image, music, and video assets.

AutoShow runtime pipeline
  1. Supported Inputs

    Start with common audio, video, document, or image files, or a web URL.

    • Audio / Video: Podcast, YouTube, Twitch, TikTok, MP3, MP4
    • Documents / Images: PDF, EPUB, PPTX, DOCX, PNG, JPEG, TIFF
    • URLs: HTML, Feed, Page
  2. Extract Text

    Turn the original source into text you can use.

    • Transcribe: audio/video with timestamps
    • Parse Documents: PDF, Office files, images, text
    • Scrape URLs: web pages and crawlable links
  3. Write Text + Narration

    Create useful written content and optional spoken narration from the text.

    • Summaries
    • Chapters
    • FAQ
    • Social
    • Marketing
    • Learning
    • Text-to-speech narration: optional spoken audio output
  4. Optional Media Assets

    Use the same text to create images, videos, or music.

    • Image: covers + thumbnails
    • Video: clips + explainers
    • Music: themes + lyrics

Choose a Target

Your target is processed into text through transcription or document extraction. The text is then combined with prompts to generate text, audio, image, music, and video outputs.

  • Audio / Video

    Upload MP3/MP4 files from your computer, import from Google Drive, paste podcast feed or episode URLs, use direct public MP3/MP4 URLs, or paste approved streaming URLs from YouTube, Vimeo, Twitch, Dailymotion, TikTok, Instagram, Facebook, X/Twitter, or SoundCloud.

  • Document

    Upload PDF, EPUB, PPTX, DOCX, XLSX, TXT, PNG, JPEG, or TIFF files, import from Google Drive, or scrape URL/HTML pages. Extract with 7 services: Local EPUB Text, Mistral OCR, GLM-OCR, OpenAI, Google Gemini, Grok, or Kimi.

AI-Powered Media Generation

Go beyond transcription. Generate narrated audio, cover images, original music, and video clips from your content using the latest generative AI models.

  • Text-to-Speech

    Convert summaries to narrated audio using 6 text-to-speech services: OpenAI TTS, ElevenLabs TTS, Deepgram Aura TTS, Gemini TTS, Grok TTS, or Groq Orpheus TTS. Choose from multiple voices and output formats.

  • AI Image Generation

    Create cover art, thumbnails, and promotional images with 3 image services: ChatGPT Image, Gemini Image, or Grok Image. Generate 1-3 images per job with customizable dimensions and aspect ratios.

  • Music Generation

    Generate original theme music with AI-written lyrics. Choose from 7 genres: Pop, Rock, Rap, Country, Folk, Jazz, Electronic. Powered by Eleven Music, MiniMax Music, or Gemini Music.

  • Video Generation

    Create explainer clips, highlights, intros, outros, and social media videos with 2 video services: Gemini Veo or Grok Video. All prompts include safety filtering.

4-Step Processing Pipeline

Content flows through a configurable pipeline. Each step can be customized with different providers and models. Optional steps are skipped if not enabled.

  1. Resolve Sources

    Uploaded files, Google Drive imports, direct URLs, and approved streaming links are fetched and normalized before text extraction.

    • Audio / Video: podcast feed/episode URLs, YouTube/Twitch/TikTok URLs, MP3/MP4 files from your computer, and direct public MP3/MP4 URLs
    • Documents / Images: PDF, EPUB, PPTX, DOCX, PNG, JPEG, and TIFF files are normalized before extraction
    • URLs: URL/HTML pages, feeds, and supported source links are resolved for scraping
  2. Extract Text

    Every source becomes reusable text through the matching extraction path.

    • Audio / Video (Transcribe): podcast feed/episode URLs, YouTube/Twitch/TikTok links, MP3/MP4 uploads, and direct public media URLs
    • Documents / Images (Parse Documents): text extraction from PDF, EPUB, PPTX, DOCX, PNG, JPEG, TIFF, and text files
    • URLs (Scrape): URL/HTML pages, feeds, and crawlable links
  3. Write Text + Narration

    Dynamic prompts are assembled, the LLM generates structured outputs, and narration can be synthesized in the same stage.

    • Written Outputs: structured LLM content assembled from prompts and source text (Summaries, Chapters, FAQ, Social, Marketing, Learning)
    • Text-to-Speech: optional narrated audio synthesized from the generated text
  4. Generate Media

    Optional downstream media turns the generated text into visual and audio assets.

    • Image: cover art, thumbnails, and visual assets
    • Video: explainers, highlights, intros, outros, and social clips
    • Music: original theme music with optional AI-written lyrics
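The four stages above can be pictured as a short function in which the optional steps run only when enabled. This is a minimal sketch under assumed names: `run_pipeline`, `Stage`, and the stage callables are hypothetical placeholders, not AutoShow's actual API, and real stages would call the configured providers instead of plain functions.

```python
from typing import Callable, Optional

# Hypothetical stage signature: takes text (or a source reference) and returns text.
Stage = Callable[[str], str]

def run_pipeline(source: str,
                 resolve: Stage,
                 extract: Stage,
                 write: Stage,
                 narrate: Optional[Stage] = None,
                 media: Optional[dict[str, Stage]] = None) -> dict[str, str]:
    """Run the four stages in order; optional stages are skipped when not provided."""
    outputs: dict[str, str] = {}
    normalized = resolve(source)   # Step 1: fetch and normalize the source
    text = extract(normalized)     # Step 2: transcribe, parse, or scrape into text
    written = write(text)          # Step 3: LLM writes structured output
    outputs["written"] = written
    if narrate is not None:        # Step 3 (optional): text-to-speech narration
        outputs["narration"] = narrate(written)
    for kind, generate in (media or {}).items():  # Step 4 (optional): image/video/music
        outputs[kind] = generate(written)
    return outputs
```

Passing only `resolve`, `extract`, and `write` yields just the written output; adding `narrate` or entries in `media` (for example `{"image": ...}`) switches on the optional steps without changing the core flow.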

Technical Specifications

Built for reliability and scale with enterprise-grade infrastructure.

  • Transcription Services

    12 services: Groq Whisper, DeepInfra Whisper, Gladia, AssemblyAI, Deepgram Nova 3, Grok STT, Soniox, Mistral Voxtral Mini, YouTube Captions, ScrapeCreators, deAPI, or Supadata.

  • LLM Providers

    5 providers: OpenAI, Claude, Google Gemini, Grok, and Groq.

  • Text-to-Speech Services

    6 services: OpenAI TTS, ElevenLabs TTS, Deepgram Aura TTS, Gemini TTS, Grok TTS, or Groq Orpheus TTS.

  • Image Generation

    3 services: ChatGPT Image, Gemini Image, or Grok Image. Supports square, landscape, portrait, and provider-specific aspect ratios.

  • Music Generation

    3 services: Eleven Music (music_v1), MiniMax Music (music-2.6), or Gemini Music (lyria-3-pro-preview). 7 genres available: Pop, Rock, Rap, Country, Folk, Jazz, and Electronic.

  • Video Generation

    2 services: Gemini Veo or Grok Video. Durations, resolutions, and aspect ratios vary by provider.

  • Cloud Storage

    Generated files are saved under the configured artifact output root. When S3-compatible storage is configured, uploads and generated media can also use bucket-backed URLs.

  • Document Processing

    7 services: Local EPUB Text, Mistral OCR, GLM-OCR, OpenAI, Google Gemini, Grok, or Kimi. Supports PDF, EPUB, DOCX, PPTX, XLSX, PNG, JPEG, TIFF, and TXT uploads, plus URL/HTML page scraping.

Frequently Asked Questions

What input formats are supported?

AutoShow accepts podcast feed and episode URLs; approved streaming URLs for YouTube, Vimeo, Twitch, Dailymotion, TikTok, Instagram, Facebook, X/Twitter, and SoundCloud; direct public MP3/MP4 media URLs; URL/HTML pages; uploaded audio and video files from your computer; uploaded document and image files; and Google Drive imports when configured. Supported uploads include audio files (MP3, WAV, M4A, FLAC, OGG, AAC, WMA, MPEG/MPGA), video files (MP4, MOV, AVI, MKV, WEBM, WMV, FLV, M4V), documents (PDF, EPUB, DOCX, PPTX, XLSX, TXT), and images (PNG, JPG/JPEG, TIFF/TIF).
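The upload formats listed above fall into two extraction paths: transcription for audio and video, and document parsing for documents and images. The sketch below routes an uploaded filename by extension, assuming routing is purely extension-based (the real service may also inspect MIME types or file contents); `route_upload` and the returned labels are hypothetical.

```python
from pathlib import PurePath

# Extension sets copied from the FAQ answer above.
AUDIO = {"mp3", "wav", "m4a", "flac", "ogg", "aac", "wma", "mpeg", "mpga"}
VIDEO = {"mp4", "mov", "avi", "mkv", "webm", "wmv", "flv", "m4v"}
DOCUMENTS = {"pdf", "epub", "docx", "pptx", "xlsx", "txt"}
IMAGES = {"png", "jpg", "jpeg", "tiff", "tif"}

def route_upload(filename: str) -> str:
    """Return a hypothetical extraction-path label for an uploaded file.

    Assumption: only the file extension is checked (case-insensitively).
    """
    ext = PurePath(filename).suffix.lower().lstrip(".")
    if ext in AUDIO or ext in VIDEO:
        return "transcribe"
    if ext in DOCUMENTS or ext in IMAGES:
        return "parse_document"
    raise ValueError(f"unsupported upload type: {filename!r}")
```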

Which transcription service should I use?

For speaker identification, use Gladia, AssemblyAI, Deepgram Nova 3, Grok STT, Mistral Voxtral Mini, or Soniox. For fast transcription of local files or direct media URLs when you don't need speaker labels, use Groq Whisper or DeepInfra Whisper. For streaming URLs, use YouTube Captions when available, or ScrapeCreators, Gladia, deAPI, or Supadata.
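The guidance above amounts to a small decision table. This sketch only restates that guidance; `recommend_transcription` and its parameters are hypothetical, and the order of services within each group is not a ranking.

```python
# Groupings restate the FAQ answer above; order within a group is not a ranking.
SPEAKER_ID = ["Gladia", "AssemblyAI", "Deepgram Nova 3", "Grok STT",
              "Mistral Voxtral Mini", "Soniox"]
FAST_NO_SPEAKERS = ["Groq Whisper", "DeepInfra Whisper"]
STREAMING = ["ScrapeCreators", "Gladia", "deAPI", "Supadata"]

def recommend_transcription(streaming_url: bool, need_speakers: bool,
                            has_youtube_captions: bool = False) -> list[str]:
    """Return candidate services for a source (hypothetical helper, not an AutoShow API)."""
    if streaming_url:
        # The FAQ does not split streaming URLs by speaker needs, so need_speakers
        # is not consulted here. YouTube Captions comes first when captions exist.
        return (["YouTube Captions"] if has_youtube_captions else []) + STREAMING
    if need_speakers:
        return list(SPEAKER_ID)
    return list(FAST_NO_SPEAKERS)
```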

How long can my content be?

Uploads and remote sources are subject to configured size limits: uploads default to 1 GB, and remote documents default to 300 MB. Long audio is split into segments when needed: 10-minute chunks for Whisper transcription of local files or direct URLs without speaker labels (non-diarized), and 30-minute fallback chunks for other long audio.
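The segmentation rule above is simple arithmetic. The sketch below assumes fixed-length, non-overlapping chunks with a shorter final chunk; the actual splitter may differ (for example by adding overlap or cutting at silences), and `chunk_bounds` is a hypothetical helper.

```python
import math

WHISPER_CHUNK_S = 10 * 60   # non-diarized local/direct Whisper transcription
FALLBACK_CHUNK_S = 30 * 60  # fallback chunk size for other long audio

def chunk_bounds(duration_s: float, whisper_no_diarization: bool) -> list[tuple[float, float]]:
    """Split [0, duration_s) into consecutive (start, end) chunks in seconds.

    Assumption: fixed-size chunks with no overlap; the last chunk may be shorter.
    """
    size = WHISPER_CHUNK_S if whisper_no_diarization else FALLBACK_CHUNK_S
    count = math.ceil(duration_s / size)
    return [(i * size, min((i + 1) * size, duration_s)) for i in range(count)]
```

For a 95-minute (5,700-second) episode, this yields 10 chunks on the Whisper path and 4 chunks on the fallback path. In both cases the final chunk runs from 5,400 s to 5,700 s.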

What LLM providers are available?

Five LLM providers are available: OpenAI, Claude, Google Gemini, Grok, and Groq. Structured JSON output uses retry and fallback logic when the selected provider fails.
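Retry-then-fallback for structured JSON can be sketched as follows. The provider callables, the attempt count, and the fallback order are assumptions for illustration. The FAQ does not say how many retries AutoShow makes or which provider it falls back to, and `generate_json` is a hypothetical name.

```python
import json
from typing import Callable, Sequence

# Hypothetical provider signature: takes a prompt, returns raw model text.
Provider = Callable[[str], str]

def generate_json(prompt: str, providers: Sequence[tuple[str, Provider]],
                  attempts_per_provider: int = 2) -> tuple[str, dict]:
    """Try each provider in order, retrying on errors or invalid JSON.

    Returns (provider_name, parsed_object). The retry count is an assumed default.
    """
    errors: list[str] = []
    for name, call in providers:
        for attempt in range(1, attempts_per_provider + 1):
            try:
                data = json.loads(call(prompt))
                if isinstance(data, dict):
                    return name, data
                errors.append(f"{name}#{attempt}: expected a JSON object")
            except Exception as exc:  # provider error or json.JSONDecodeError
                errors.append(f"{name}#{attempt}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Collecting every error message before raising keeps the failure report useful when all providers in the chain have been exhausted.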

How does video generation work?

First, an LLM generates a detailed scene description based on your content. Then, the scene is rendered using Gemini Veo or Grok Video. Video types include explainer, highlight, intro, outro, and social clips.

Where are generated files stored?

Generated files are saved under the configured artifact output root. If S3 storage is configured with Railway Storage Buckets or another S3-compatible service, uploads and generated media can also be stored in the bucket with URL-backed access.

What music genres are available?

AutoShow supports 7 genres: Pop, Rock, Rap, Country, Folk, Jazz, and Electronic. An LLM first writes original, copyright-safe lyrics tailored to your content, then the music is composed.
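The two-step flow (lyrics first, then composition) can be sketched with a genre check against the seven listed genres. `write_lyrics` and `compose` are hypothetical callables standing in for the LLM and the music service; they are not the real API of AutoShow or of any music provider.

```python
from typing import Callable

GENRES = ("Pop", "Rock", "Rap", "Country", "Folk", "Jazz", "Electronic")

def make_theme(content: str, genre: str,
               write_lyrics: Callable[[str, str], str],
               compose: Callable[[str, str], bytes]) -> tuple[str, bytes]:
    """Step 1: an LLM writes lyrics for the content. Step 2: a music service composes the track.

    Both callables are hypothetical stand-ins; genre matching is case-insensitive.
    """
    match = next((g for g in GENRES if g.lower() == genre.lower()), None)
    if match is None:
        raise ValueError(f"unsupported genre {genre!r}; choose one of {', '.join(GENRES)}")
    lyrics = write_lyrics(content, match)
    audio = compose(lyrics, match)
    return lyrics, audio
```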

Start Processing Today

Transform your content with AI transcription, summarization, and generation.

Usage-based pricing: no subscriptions or hidden fees