Create a serverless endpoint - Runpod Documentation

{
  "id": "ep_abc123",
  "name": "my-inference",
  "workers": { "min": 0, "max": 5 },
  "scaling": { "value": 4, "idleTimeout": 5 },
  "dataCenterIds": ["US-TX-3"],
  "networkVolumes": ["vol_abc"],
  "timeout": 300000,
  "createdAt": "2026-03-13T20:00:00Z",
  "image": "runpod/pytorch:2.8.0-py3.11-cuda12.8.1",
  "args": "",
  "disk": 50,
  "ports": ["8888/http", "22/tcp"],
  "env": { "JUPYTER_PASSWORD": "hunter2" },
  "registry": null,
  "gpu": { "pools": ["ADA_24"], "count": 1 },
  "cpu": { "id": "cpu5c", "vcpuCount": 4, "memory": 16 }
}

Authorizations

Authorization

string

header

required

Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
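As a minimal sketch of the header described above, the helper below builds the Authorization value in the Bearer <token> form. The function name build_headers is hypothetical, and the Content-Type entry is an assumption based on the application/json body type listed for this request.

```python
from typing import Dict


def build_headers(token: str) -> Dict[str, str]:
    """Return request headers that use Bearer authentication."""
    token = token.strip()
    if not token:
        raise ValueError("an auth token is required")
    return {
        # Form taken from the docs: "Bearer <token>"
        "Authorization": f"Bearer {token}",
        # Assumption: the JSON body is sent with this content type
        "Content-Type": "application/json",
    }
```

Rejecting an empty token locally avoids sending a request that cannot be authenticated.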

Body

application/json

Reusable container configuration shared across templates, pods, and serverless endpoints. Adding a field here automatically propagates to all three resources.

image

string

required

Docker image reference

Example:

"runpod/pytorch:2.8.0-py3.11-cuda12.8.1"

name

string

required

Minimum string length: 1

Example:

"my-inference"

gpu

object

required

args

string

Arguments passed to the container entrypoint

Example:

""

disk

integer

Container disk in GB (ephemeral, wiped on restart)

Required range: x >= 1

Example:

50

ports

string[]

Exposed ports, formatted as port/protocol

Example:

["8888/http", "22/tcp"]
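The port/protocol format can be checked on the client before a request is sent. The sketch below is illustrative only: parse_port is a hypothetical helper, and the 1-65535 range check is an assumption based on standard TCP/UDP port numbering, not something this page states.

```python
from typing import Tuple


def parse_port(spec: str) -> Tuple[int, str]:
    """Split a "port/protocol" string such as "8888/http" into its parts."""
    port_text, sep, protocol = spec.partition("/")
    if not sep or not protocol:
        raise ValueError(f"expected port/protocol, got {spec!r}")
    if not (port_text.isascii() and port_text.isdigit()):
        raise ValueError(f"port must be a number, got {port_text!r}")
    port = int(port_text)
    # Assumption: standard port range; the docs do not list a range.
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port, protocol


parsed = [parse_port(p) for p in ["8888/http", "22/tcp"]]
```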

env

object

Environment variables as key-value pairs

Example:

{ "JUPYTER_PASSWORD": "hunter2" }

registry

string | null

Container registry credential ID (for private images)

Example:

null

workers

object

scaling

object

dataCenterIds

string[]

Preferred data centers for placement. Omit or pass an empty array to let the scheduler choose.

networkVolumes

string[]

timeout

integer

default: 300000

Per-request execution timeout in milliseconds

flashboot

enum<string>

default: OFF

FlashBoot cold-start acceleration mode.

OFF — disabled
FLASHBOOT — enabled
PRIORITY_FLASHBOOT — enabled with priority capacity

Available options: OFF, FLASHBOOT, PRIORITY_FLASHBOOT
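The body fields above can be assembled and checked locally before sending. The following is a sketch under stated assumptions rather than an official client: build_endpoint_body is a hypothetical helper, the gpu object shape (pools, count) is copied from the example response on this page because the child attributes are not listed here, and no request URL is shown because this page does not give one.

```python
import json
from typing import Any, Dict, List, Optional

# Options listed for the flashboot field.
FLASHBOOT_MODES = ("OFF", "FLASHBOOT", "PRIORITY_FLASHBOOT")


def build_endpoint_body(
    name: str,
    image: str,
    gpu: Dict[str, Any],
    disk: Optional[int] = None,
    ports: Optional[List[str]] = None,
    env: Optional[Dict[str, str]] = None,
    flashboot: Optional[str] = None,
    timeout: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a JSON-serializable request body, enforcing the documented limits."""
    if len(name) < 1:
        raise ValueError("name must be at least 1 character long")
    if not image:
        raise ValueError("image is required")
    if disk is not None and disk < 1:
        raise ValueError("disk must be >= 1 (GB)")
    if flashboot is not None and flashboot not in FLASHBOOT_MODES:
        raise ValueError(f"flashboot must be one of {FLASHBOOT_MODES}")

    body: Dict[str, Any] = {"name": name, "image": image, "gpu": gpu}
    optional = {
        "disk": disk,
        "ports": ports,
        "env": env,
        "flashboot": flashboot,
        "timeout": timeout,  # milliseconds
    }
    # Omit unset fields so that server-side defaults apply.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body


body = build_endpoint_body(
    name="my-inference",
    image="runpod/pytorch:2.8.0-py3.11-cuda12.8.1",
    gpu={"pools": ["ADA_24"], "count": 1},  # assumed shape, from the example response
    disk=50,
    flashboot="FLASHBOOT",
)
print(json.dumps(body, indent=2))
```

Leaving optional fields out of the body, instead of sending null, keeps the documented defaults (for example timeout 300000 and flashboot OFF) in effect.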

Response

Created

id

string

required

Example:

"ep_abc123"

name

string

required

Example:

"my-inference"

workers

object

required

scaling

object

required

dataCenterIds

string[]

required

Example:

["US-TX-3"]

networkVolumes

string[]

required

Example:

["vol_abc"]

timeout

integer

required

Per-request execution timeout in milliseconds

Example:

300000

flashboot

enum<string>

required

FlashBoot cold-start acceleration mode.

OFF — disabled
FLASHBOOT — enabled
PRIORITY_FLASHBOOT — enabled with priority capacity

Available options: OFF, FLASHBOOT, PRIORITY_FLASHBOOT

createdAt

string<date-time>

required

Example:

"2026-03-13T20:00:00Z"
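Two response fields carry units that are easy to misread: timeout is in milliseconds, and createdAt is a date-time string with a trailing Z for UTC. The sketch below converts both to native Python types. The helper name read_timing is hypothetical, and the input dict stands in for a parsed response body.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Tuple


def read_timing(endpoint: Dict[str, Any]) -> Tuple[datetime, timedelta]:
    """Return (createdAt as an aware datetime, timeout as a timedelta)."""
    created_text = endpoint["createdAt"]
    # datetime.fromisoformat only accepts a trailing "Z" on Python 3.11+,
    # so rewrite it as an explicit UTC offset for older versions.
    if created_text.endswith("Z"):
        created_text = created_text[:-1] + "+00:00"
    created_at = datetime.fromisoformat(created_text)
    # timeout is documented as milliseconds.
    timeout = timedelta(milliseconds=endpoint["timeout"])
    return created_at, timeout


created_at, timeout = read_timing(
    {"createdAt": "2026-03-13T20:00:00Z", "timeout": 300000}
)
```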

image

string

Docker image reference

Example:

"runpod/pytorch:2.8.0-py3.11-cuda12.8.1"

args

string

Arguments passed to the container entrypoint

Example:

""

disk

integer

Container disk in GB (ephemeral, wiped on restart)

Required range: x >= 1

Example:

50

ports

string[]

Exposed ports, formatted as port/protocol

Example:

["8888/http", "22/tcp"]

env

object

Environment variables as key-value pairs

Example:

{ "JUPYTER_PASSWORD": "hunter2" }

registry

string | null

Container registry credential ID (for private images)

Example:

null

gpu

object | null

cpu

object | null

Read-only. Present for CPU serverless endpoints; CPU create/update is not yet supported.
