POST /v2/serverless
{
  "id": "ep_abc123",
  "name": "my-inference",
  "workers": {
    "min": 0,
    "max": 5
  },
  "scaling": {
    "value": 4,
    "idleTimeout": 5
  },
  "dataCenterIds": [
    "US-TX-3"
  ],
  "networkVolumes": [
    "vol_abc"
  ],
  "timeout": 300000,
  "createdAt": "2026-03-13T20:00:00Z",
  "image": "runpod/pytorch:2.8.0-py3.11-cuda12.8.1",
  "args": "",
  "disk": 50,
  "ports": [
    "8888/http",
    "22/tcp"
  ],
  "env": {
    "JUPYTER_PASSWORD": "hunter2"
  },
  "registry": null,
  "gpu": {
    "pools": [
      "ADA_24"
    ],
    "count": 1
  },
  "cpu": {
    "id": "cpu5c",
    "vcpuCount": 4,
    "memory": 16
  }
}

Authorizations

Authorization
string
header
required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
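
As a minimal sketch of the header format above, the following builds (but does not send) an authenticated request with the Python standard library. `BASE_URL` and `build_create_request` are hypothetical names; this page does not state the API host, so the URL is a placeholder.

```python
import json
import urllib.request

# Hypothetical placeholder: the API host is not given on this page.
BASE_URL = "https://api.example.com"


def build_create_request(token: str, body: dict) -> urllib.request.Request:
    """Build, without sending, a POST /v2/serverless request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v2/serverless",
        data=json.dumps(body).encode("utf-8"),
        headers={
            # Header of the form "Bearer <token>", as described above.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_create_request("YOUR_TOKEN", {"name": "my-inference"})
```

Passing `req` to `urllib.request.urlopen` would send it once `BASE_URL` points at the real host.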

Body

application/json

Reusable container configuration shared across templates, pods, and serverless endpoints. Adding a field here automatically propagates to all three resources.

image
string
required

Docker image reference

Example:

"runpod/pytorch:2.8.0-py3.11-cuda12.8.1"

name
string
required
Minimum string length: 1
Example:

"my-inference"

gpu
object
required
args
string

Arguments passed to the container entrypoint

Example:

""

disk
integer

Container disk in GB (ephemeral, wiped on restart)

Required range: x >= 1
Example:

50

ports
string[]

Exposed ports, formatted as port/protocol

Example:
["8888/http", "22/tcp"]
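
The port/protocol format can be checked client-side before sending. This is a sketch only: `parse_port` is a hypothetical helper, the 1 to 65535 range is a general TCP/UDP assumption, and the set of protocols the API accepts is not listed on this page, so none is enforced here.

```python
def parse_port(spec: str) -> tuple[int, str]:
    """Split a 'port/protocol' string such as '8888/http' into its parts."""
    port_text, sep, protocol = spec.partition("/")
    if not sep or not port_text.isdigit() or not protocol:
        raise ValueError(f"expected port/protocol, got {spec!r}")
    port = int(port_text)
    # Assumption: standard port range; the page does not state a range.
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range in {spec!r}")
    return port, protocol


parsed = [parse_port(p) for p in ["8888/http", "22/tcp"]]
```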
env
object

Environment variables as key-value pairs

Example:
{ "JUPYTER_PASSWORD": "hunter2" }
registry
string | null

Container registry credential ID (for private images)

Example:

null

workers
object
scaling
object
dataCenterIds
string[]

Preferred data centers for placement. Omit or pass an empty array to let the scheduler choose.

networkVolumes
string[]
timeout
integer
default:300000

Per-request execution timeout in milliseconds

flashboot
enum<string>
default:OFF

FlashBoot cold-start acceleration mode.

  • OFF — disabled
  • FLASHBOOT — enabled
  • PRIORITY_FLASHBOOT — enabled with priority capacity
Available options: OFF, FLASHBOOT, PRIORITY_FLASHBOOT
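
The constraints listed for the request body (required fields, minimum `name` length, `disk` range, `flashboot` options) can be checked locally before the call. The sketch below is illustrative: `validate_create_body` is a hypothetical helper, not part of any SDK, and the `gpu` object shape is assumed to match the response example on this page because the request shape is not expanded here.

```python
FLASHBOOT_MODES = {"OFF", "FLASHBOOT", "PRIORITY_FLASHBOOT"}
REQUIRED_FIELDS = ("image", "name", "gpu")


def validate_create_body(body: dict) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    errors = []
    for field in REQUIRED_FIELDS:
        if field not in body:
            errors.append(f"{field} is required")
    name = body.get("name")
    if "name" in body and (not isinstance(name, str) or len(name) < 1):
        errors.append("name must be a string of length >= 1")
    disk = body.get("disk")
    if "disk" in body and (type(disk) is not int or disk < 1):
        errors.append("disk must be an integer >= 1")
    # flashboot is optional; the documented default is OFF.
    if body.get("flashboot", "OFF") not in FLASHBOOT_MODES:
        errors.append("flashboot must be one of OFF, FLASHBOOT, PRIORITY_FLASHBOOT")
    return errors


body = {
    "name": "my-inference",
    "image": "runpod/pytorch:2.8.0-py3.11-cuda12.8.1",
    # Assumption: gpu uses the same shape as the response example on this page.
    "gpu": {"pools": ["ADA_24"], "count": 1},
    "disk": 50,
    "flashboot": "FLASHBOOT",
}
problems = validate_create_body(body)
```

The server remains the authority on validation; a local check like this only catches obvious mistakes early.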

Response

Created

Reusable container configuration shared across templates, pods, and serverless endpoints. Adding a field here automatically propagates to all three resources.

id
string
required
Example:

"ep_abc123"

name
string
required
Example:

"my-inference"

workers
object
required
scaling
object
required
dataCenterIds
string[]
required
Example:
["US-TX-3"]
networkVolumes
string[]
required
Example:
["vol_abc"]
timeout
integer
required

Per-request execution timeout in milliseconds

Example:

300000

flashboot
enum<string>
required

FlashBoot cold-start acceleration mode.

  • OFF — disabled
  • FLASHBOOT — enabled
  • PRIORITY_FLASHBOOT — enabled with priority capacity
Available options: OFF, FLASHBOOT, PRIORITY_FLASHBOOT
createdAt
string<date-time>
required
Example:

"2026-03-13T20:00:00Z"

image
string

Docker image reference

Example:

"runpod/pytorch:2.8.0-py3.11-cuda12.8.1"

args
string

Arguments passed to the container entrypoint

Example:

""

disk
integer

Container disk in GB (ephemeral, wiped on restart)

Required range: x >= 1
Example:

50

ports
string[]

Exposed ports, formatted as port/protocol

Example:
["8888/http", "22/tcp"]
env
object

Environment variables as key-value pairs

Example:
{ "JUPYTER_PASSWORD": "hunter2" }
registry
string | null

Container registry credential ID (for private images)

Example:

null

gpu
object | null
cpu
object | null

Read-only. Present for CPU serverless endpoints; CPU create/update is not yet supported.
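
A short sketch of reading the response fields described above, using a trimmed copy of the example response from this page. The variable names are illustrative; only the field names and values come from the page.

```python
import json
from datetime import datetime, timedelta, timezone

# Trimmed copy of the example response shown on this page.
response_text = """
{
  "id": "ep_abc123",
  "name": "my-inference",
  "timeout": 300000,
  "createdAt": "2026-03-13T20:00:00Z",
  "gpu": {"pools": ["ADA_24"], "count": 1},
  "cpu": null
}
"""

endpoint = json.loads(response_text)

# timeout is documented as milliseconds.
timeout = timedelta(milliseconds=endpoint["timeout"])

# Replace the trailing "Z" so fromisoformat also works before Python 3.11.
created_at = datetime.fromisoformat(endpoint["createdAt"].replace("Z", "+00:00"))

# gpu and cpu are documented as "object | null", so either may be None.
gpu = endpoint.get("gpu")
cpu = endpoint.get("cpu")
```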