GPU Compute

Stackpad lets you run AI and ML workloads on European GPU infrastructure. Add a GPU service from the service catalog, pick a model, and it’s running — no CUDA setup, no drivers, no DevOps.

Available GPU templates

| Template | Use case | Example models |
| --- | --- | --- |
| vLLM Inference | Run any HuggingFace language model | Llama, Mistral, Qwen, Phi |
| Text Embeddings | Generate vector embeddings for RAG and search | BGE, E5, GTE |
| Whisper | Speech-to-text transcription | OpenAI Whisper |

Adding a GPU service

  1. Open your project and click Add Service
  2. Search for the GPU template you want (e.g. “vLLM”)
  3. Configure the model and GPU type
  4. Click Add

Stackpad deploys the GPU service and automatically wires it to your other services. Your web app gets the inference URL and API key injected as environment variables.
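
Because the connection details arrive as environment variables, it can help to read them once at startup and fail fast if they are missing. The following is a minimal sketch, not a Stackpad API: `readInferenceConfig` is a hypothetical helper, and the variable names `INFERENCE_URL` and `INFERENCE_API_KEY` are the ones used in this guide's vLLM examples.

```javascript
// Sketch: read the injected connection details once and fail fast if absent.
// readInferenceConfig is a hypothetical helper, not part of any Stackpad SDK.
function readInferenceConfig(env = process.env) {
  const url = env.INFERENCE_URL;
  const apiKey = env.INFERENCE_API_KEY;

  const missing = [];
  if (!url) missing.push('INFERENCE_URL');
  if (!apiKey) missing.push('INFERENCE_API_KEY');
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }

  // Strip trailing slashes so that appending '/v1/...' never yields '//'.
  return { baseURL: url.replace(/\/+$/, ''), apiKey };
}
```

Calling `readInferenceConfig()` during application startup surfaces a misconfigured service immediately instead of at the first inference request.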

Connecting from your application

When you add a GPU service, Stackpad auto-injects connection details into your web services:

// Call your vLLM inference endpoint
const response = await fetch(process.env.INFERENCE_URL + '/v1/chat/completions', {
  method: 'POST',
  headers: {
    'Content-Type': 'application/json',
    'Authorization': `Bearer ${process.env.INFERENCE_API_KEY}`,
  },
  body: JSON.stringify({
    model: 'your-model-name',
    messages: [{ role: 'user', content: 'Hello!' }],
  }),
});
const data = await response.json();
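
The example above stops at the parsed JSON. As a rough sketch of the next step, the helper below pulls the assistant's text out of the response and reports HTTP failures. It assumes the OpenAI-style response shape (`choices[0].message.content`, and `error.message` on failure); `extractReply` is a hypothetical name introduced for illustration.

```javascript
// Sketch: turn a chat-completions HTTP result into the assistant's text.
// Assumes the OpenAI-style shape: { choices: [{ message: { content } }] }.
function extractReply(status, body) {
  if (status < 200 || status >= 300) {
    // OpenAI-style errors usually carry { error: { message } };
    // fall back to the bare status code when that field is absent.
    const detail = body && body.error && body.error.message;
    throw new Error(`Inference request failed (${status})${detail ? ': ' + detail : ''}`);
  }

  const choice = body && Array.isArray(body.choices) ? body.choices[0] : undefined;
  if (!choice || !choice.message || typeof choice.message.content !== 'string') {
    throw new Error('Unexpected response shape: no choices[0].message.content');
  }
  return choice.message.content;
}
```

With the variables from the example above, usage would look like `const reply = extractReply(response.status, data);`.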

The API is OpenAI-compatible, so you can use the OpenAI SDK:

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: process.env.INFERENCE_URL + '/v1',
  apiKey: process.env.INFERENCE_API_KEY,
});

const completion = await client.chat.completions.create({
  model: 'your-model-name',
  messages: [{ role: 'user', content: 'Hello!' }],
});

Text embeddings

const response = await fetch(process.env.EMBEDDINGS_URL + '/embed', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    inputs: 'Text to embed',
  }),
});
const embeddings = await response.json();
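
For the RAG and search use case, embeddings are typically compared by cosine similarity. The sketch below ranks documents against a query vector. It works on plain arrays of numbers, because this guide does not specify the exact response format of `/embed`; `cosineSimilarity` and `rankBySimilarity` are hypothetical helpers, and how you extract each vector from the response is an assumption to check against your service.

```javascript
// Sketch: cosine similarity between two embedding vectors (plain number arrays).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length');
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction; treat its similarity as 0.
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank documents ({ id, vector }) from most to least similar to the query.
function rankBySimilarity(queryVector, documents) {
  return documents
    .map((doc) => ({ id: doc.id, score: cosineSimilarity(queryVector, doc.vector) }))
    .sort((x, y) => y.score - x.score);
}
```

For larger collections, a vector database would usually replace this in-memory ranking; the sketch only illustrates the comparison itself.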

Whisper (speech-to-text)

// audioFile is a File or Blob containing the audio to transcribe
const formData = new FormData();
formData.append('file', audioFile);

const response = await fetch(process.env.WHISPER_URL + '/v1/audio/transcriptions', {
  method: 'POST',
  body: formData,
});
const transcription = await response.json();
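
On the server there is often no browser `File` object, only raw bytes read from disk or an upload. A minimal sketch of building the same multipart body in that case follows. It assumes Node.js 18 or later (global `Blob` and `FormData`); `buildTranscriptionForm` is a hypothetical helper, and the default MIME type is only an illustration.

```javascript
// Sketch: build the multipart body for a transcription request from raw bytes.
// Assumes Node.js 18+ where Blob and FormData are globals.
function buildTranscriptionForm(audioBytes, filename, mimeType = 'audio/wav') {
  const blob = new Blob([audioBytes], { type: mimeType });
  const formData = new FormData();
  // The third argument sets the filename sent with the multipart part.
  formData.append('file', blob, filename);
  return formData;
}
```

The result can be passed as `body` exactly like `formData` in the example above. Leave the `Content-Type` header unset so that `fetch` can add the multipart boundary itself.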

Pricing

GPU services are billed at an hourly rate, and you pay only while the GPU is running. The hourly cost is displayed upfront in the service catalog when you select a GPU type.

There are no idle charges when your GPU service is paused.
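
To get a feel for what pausing saves, the arithmetic can be sketched as running time multiplied by the hourly rate. The example below is an estimate only: the rate is a hypothetical placeholder (use the price shown in the service catalog), and it assumes cost scales proportionally with running time, since the billing granularity is not stated here. `estimateCost` is a hypothetical helper.

```javascript
// Sketch: estimate GPU cost from the intervals during which the service ran.
// Each interval is { start, end } as Date objects or millisecond timestamps.
// hourlyRate is a placeholder; take the real value from the service catalog.
function estimateCost(runningIntervals, hourlyRate) {
  const msPerHour = 60 * 60 * 1000;
  const totalMs = runningIntervals.reduce(
    (sum, { start, end }) => sum + (end - start),
    0
  );
  const hours = totalMs / msPerHour;
  // Paused time is simply not included in runningIntervals, so it costs nothing.
  return { hours, cost: hours * hourlyRate };
}
```

For instance, a service that ran for two hours in the morning and thirty minutes in the afternoon counts as 2.5 running hours, regardless of how long it sat paused in between.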

Managing GPU services

Pause and resume

GPU services can be paused to stop billing when not in use:

  • Pause: stops the GPU container and stops billing
  • Resume: restarts the GPU container and resumes billing

Manage this from the service detail page.

Live progress

GPU deployments show live progress logs in the dashboard. Large model images can take several minutes to pull, and the progress indicator shows which stage the deployment has reached.
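
Since a deployment or resume can take several minutes, application code may want to wait until the endpoint answers before sending real traffic. The following is a generic polling sketch, not a Stackpad feature: `waitUntilReady` is a hypothetical helper, and the readiness check is supplied by the caller.

```javascript
// Sketch: poll a caller-supplied readiness check until it succeeds or attempts run out.
// check is an async function returning true when the service is ready.
async function waitUntilReady(check, { attempts = 30, delayMs = 10000 } = {}) {
  for (let i = 0; i < attempts; i++) {
    try {
      if (await check()) return true;
    } catch (err) {
      // Connection errors are expected while the container is still starting.
    }
    // Wait between attempts, but not after the final one.
    if (i < attempts - 1) {
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}
```

The `check` function could, for example, issue a GET request to `process.env.INFERENCE_URL + '/v1/models'` and return `response.ok`. That endpoint is part of the OpenAI API; whether your deployment serves it is an assumption to verify.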

Infrastructure

All GPU workloads run on European infrastructure via Verda serverless containers. Your models and data stay in the EU.

What’s next?