AI Speech-to-Text & Text-to-Speech, built for Europe.

German HQ. 100% EU data residency.

Transcribe audio and generate speech on European GPUs.
Open-source models with custom voice cloning.
Your voice data never leaves the EU.

Create free account (5 min/month free)
// models + pricing

Speech Models

We run the Qwen3 ASR and TTS model families for speech recognition and synthesis. Multilingual, open weights, and optimized for production workloads. Custom voice cloning included.

All models run on NVIDIA Blackwell-generation or newer GPUs. A free tier is included for every model.


Speech-to-Text
Qwen3-ASR-0.6B (coming soon)
Fast, lightweight transcription. Ideal for high-throughput or real-time workloads.
Parameters: 0.6B
Task: Speech-to-Text
Languages: Multilingual
Pricing: 0.006 € / minute
Qwen3-ASR-1.7B (coming soon)
Higher accuracy for complex audio. Best for meetings, calls, and noisy environments.
Parameters: 1.7B
Task: Speech-to-Text
Languages: Multilingual
Pricing: 0.01 € / minute

Text-to-Speech
Qwen3-TTS-0.6B (coming soon)
Fast speech synthesis with natural intonation. Great for notifications and short content.
Parameters: 0.6B
Task: Text-to-Speech
Custom Voice: Yes
Pricing: 0.015 € / 1K characters
Qwen3-TTS-1.7B (coming soon)
Premium quality synthesis. Expressive, natural speech for audiobooks, assistants, and customer-facing content.
Parameters: 1.7B
Task: Text-to-Speech
Custom Voice: Yes
Pricing: 0.020 € / 1K characters
Free tier
5 min of transcription and synthesis per month. No credit card required.
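
As a rough illustration of how the list prices above add up, the sketch below multiplies usage by the per-minute and per-1K-character rates. The helper names asr_cost and tts_cost are made up for this example, and the free tier is deliberately ignored because the page does not say how it is applied to a bill.

```shell
# Rough cost estimates from the list prices (free tier not considered).

# asr_cost MINUTES PRICE_PER_MINUTE -> prints cost in euros
asr_cost() {
  awk -v m="$1" -v p="$2" 'BEGIN { printf "%.2f\n", m * p }'
}

# tts_cost CHARACTERS PRICE_PER_1K_CHARS -> prints cost in euros
tts_cost() {
  awk -v c="$1" -v p="$2" 'BEGIN { printf "%.2f\n", c / 1000 * p }'
}

asr_cost 600 0.01       # 600 minutes on Qwen3-ASR-1.7B   -> 6.00
tts_cost 250000 0.020   # 250,000 chars on Qwen3-TTS-1.7B -> 5.00
```

Treat the result as a ballpark figure only; actual invoices may round or aggregate usage differently.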
Custom Voice Cloning
Create a synthetic voice that sounds like a specific speaker. Provide a short audio reference and the TTS model will generate new speech in that voice. Ideal for brand voices, virtual assistants, or personalized content.
Works with both TTS models. No fine-tuning required. Included at no extra cost in every TTS API call.
  1. Upload a short audio sample (10+ seconds recommended)
  2. Reference the voice in your TTS API calls
  3. Generate speech in that voice from any text
All voice data stays on EU infrastructure. No voice data is stored after processing unless you explicitly create a saved voice profile.
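
Steps 2 and 3 might look roughly like the sketch below, which adds a voice field to the speech request. The page only states that a custom voice is selected via the voice parameter; the value "my-brand-voice", the helper names, and the exact payload layout are assumptions for illustration. The upload step is omitted because its endpoint is not documented here.

```shell
# Escape backslashes and double quotes for use inside a JSON string.
# Newlines and control characters are not handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# tts_payload MODEL VOICE TEXT -> prints the JSON request body
tts_payload() {
  printf '{"model":"%s","voice":"%s","input":"%s"}' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")"
}

# speak_with_voice VOICE TEXT OUTPUT_FILE
# Assumes NODION_API_KEY is set in the environment.
speak_with_voice() {
  curl -sS --fail https://api.nodion.ai/v1/audio/speech \
    -H "Authorization: Bearer $NODION_API_KEY" \
    -H "Content-Type: application/json" \
    -d "$(tts_payload "qwen/qwen3-tts-1.7b" "$1" "$2")" \
    --output "$3"
}

# Example (performs a real API call; the voice name is hypothetical):
# speak_with_voice "my-brand-voice" "Welcome back." welcome.mp3
```

Building the body in a separate function keeps the quoting in one place and makes the payload easy to inspect before sending it.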
// what you can build

Use Cases

Speech APIs enable a wide range of applications, from transcription pipelines to voice-enabled products.

Meeting & Call Transcription
Transcribe meetings, calls, and interviews in real time or from recordings. Multilingual support for European teams working across languages.
Voice Assistants & Chatbots
Combine speech-to-text and text-to-speech for fully voice-enabled AI assistants. Process user speech, generate a response, and speak it back.
Content Narration
Turn articles, documentation, or e-learning content into natural-sounding audio. Use custom voices for consistent brand identity across all content.
Accessibility
Make your application accessible with text-to-speech for visually impaired users and speech-to-text for hearing-impaired users. GDPR-compliant by default.
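
The voice assistant flow described above (process user speech, generate a response, speak it back) could be wired together along these lines. This is a sketch under assumptions: response_format=text is borrowed from the OpenAI Audio API and not confirmed on this page, generate_reply is a hypothetical placeholder for your own LLM or business logic, and the function names are invented for the example.

```shell
# One voice-assistant turn: speech in, speech out.
# Assumes NODION_API_KEY is set in the environment.
API="https://api.nodion.ai/v1"

# Escape backslashes and double quotes for use inside a JSON string.
# Newlines and control characters are not handled in this sketch.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# transcribe AUDIO_FILE -> prints the transcript
transcribe() {
  curl -sS --fail "$API/audio/transcriptions" \
    -H "Authorization: Bearer $NODION_API_KEY" \
    -F "file=@$1" \
    -F model=qwen/qwen3-asr-1.7b \
    -F response_format=text
}

# generate_reply TEXT -> prints the answer (placeholder logic only)
generate_reply() {
  printf 'You said: %s' "$1"
}

# speak TEXT OUTPUT_FILE -> writes synthesized audio to OUTPUT_FILE
speak() {
  curl -sS --fail "$API/audio/speech" \
    -H "Authorization: Bearer $NODION_API_KEY" \
    -H "Content-Type: application/json" \
    -d "{\"model\":\"qwen/qwen3-tts-1.7b\",\"input\":\"$(json_escape "$1")\"}" \
    --output "$2"
}

# voice_turn INPUT_AUDIO OUTPUT_AUDIO -> runs all three steps in order
voice_turn() {
  vt_text="$(transcribe "$1")" || return 1
  vt_reply="$(generate_reply "$vt_text")" || return 1
  speak "$vt_reply" "$2"
}

# Example (performs real API calls):
# voice_turn question.wav answer.mp3
```

Each step is its own function, so any of them can be swapped out or stubbed during testing without touching the others.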
// for teams that need more
Need more? The Business Plan covers all Nodion.ai products: Inference, Embeddings, Images, Speech, and more. 500 €/month, dedicated GPU capacity, 99.5% SLA.
View Business Plan →
// getting started

API Documentation

The Speech API follows the OpenAI Audio API format. Use the same endpoints and SDKs you already know.

# Base URL
https://api.nodion.ai/v1

Speech-to-Text

# Transcribe audio
curl https://api.nodion.ai/v1/audio/transcriptions \
  -H "Authorization: Bearer $NODION_API_KEY" \
  -F file=@meeting.mp3 \
  -F model=qwen/qwen3-asr-1.7b

Text-to-Speech

# Generate speech
curl https://api.nodion.ai/v1/audio/speech \
  -H "Authorization: Bearer $NODION_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3-tts-1.7b",
    "input": "Welcome to Nodion, your European AI platform."
  }' \
  --output speech.mp3

Supported endpoints: /v1/audio/transcriptions and /v1/audio/speech. Select a custom voice with the voice parameter. Multiple audio formats are supported.

// why this matters
GDPR-native. Voice recordings are personal data under GDPR and can count as biometric data when used to identify a speaker. Our infrastructure ensures voice data never leaves the EU. No transatlantic transfers. No adequacy decision risks.
Nordic green energy. GPU clusters in Sweden and Finland run on renewable energy. Cold climate means natural cooling, lower energy waste, smaller footprint.
No US dependency. German company. EU servers. Open-source models. Full stack sovereignty without hyperscaler lock-in.
Open-source only. Every model we serve is fully open. You can inspect the weights, understand the architecture, audit the outputs.
OpenAI-compatible API. Drop-in replacement. Change your base URL and you're processing speech on sovereign European infrastructure.
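
For the drop-in claim above, one way to switch the base URL without touching application code is through environment variables. The official OpenAI Python and Node SDKs read OPENAI_BASE_URL and OPENAI_API_KEY at startup; that these SDKs work unchanged against this API is the page's own claim and has not been verified here.

```shell
# Point OpenAI SDK clients at the Nodion endpoint.
# Assumes NODION_API_KEY already holds your key.
export OPENAI_BASE_URL="https://api.nodion.ai/v1"
export OPENAI_API_KEY="$NODION_API_KEY"
```

Clients that take the base URL as a constructor argument can be given the same value directly instead.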

Ready to start?

5 minutes of transcription and synthesis per month. No credit card required.
