Create a serverless endpoint
Creates a serverless endpoint. Specify gpu for compute; CPU serverless
endpoints are read-only, and CPU create/update is not yet supported.
ContainerConfig fields can be spread from a template response.
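As a rough illustration of spreading ContainerConfig fields from a template response into a create body, the sketch below copies only the container-related keys and then layers endpoint-specific fields on top. The key names (`image`, `args`, `disk_gb`, `ports`, `env`, `registry_credential_id`, `name`) and the helper `body_from_template` are hypothetical placeholders; this page does not list the actual field names, so substitute the ones from the schema.

```python
# Hypothetical ContainerConfig key names: this reference does not spell them
# out, so replace them with the real field names from the API schema.
CONTAINER_CONFIG_KEYS = (
    "image",
    "args",
    "disk_gb",
    "ports",
    "env",
    "registry_credential_id",
)


def body_from_template(template: dict, **endpoint_fields) -> dict:
    """Build a create-endpoint body by spreading container fields from a template.

    Only keys listed in CONTAINER_CONFIG_KEYS are copied, so template-only
    fields (such as the template's own id) do not leak into the request.
    Keyword arguments are applied last and override template values.
    """
    container_config = {
        key: template[key] for key in CONTAINER_CONFIG_KEYS if key in template
    }
    return {**container_config, **endpoint_fields}


# Example template response (shape assumed for illustration only).
template = {
    "id": "tpl_example",
    "image": "runpod/pytorch:2.8.0-py3.11-cuda12.8.1",
    "ports": ["8888/http", "22/tcp"],
    "env": {"JUPYTER_PASSWORD": "hunter2"},
}
body = body_from_template(template, name="my-inference")
```

Filtering by an explicit key list, rather than spreading the whole response, keeps read-only or template-specific fields out of the request body.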
Authorizations
Bearer authentication header of the form Bearer <token>, where <token> is your auth token.
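A minimal sketch of building that header in Python, assuming the token is sent in the standard HTTP `Authorization` header. The helper name `auth_headers` is illustrative, not part of any SDK.

```python
def auth_headers(token: str) -> dict[str, str]:
    """Return request headers carrying a Bearer token."""
    token = token.strip()
    if not token:
        raise ValueError("auth token must not be empty")
    # Form described above: "Bearer <token>"
    return {"Authorization": f"Bearer {token}"}
```

The returned dictionary can be passed as the headers argument of whichever HTTP client you use.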
Body
Reusable container configuration shared across templates, pods, and serverless endpoints. Adding a field here automatically propagates to all three resources.
Docker image reference
"runpod/pytorch:2.8.0-py3.11-cuda12.8.1"
"my-inference"
Arguments passed to the container entrypoint
""
Container disk in GB (ephemeral, wiped on restart)
x >= 150
Exposed ports, formatted as port/protocol
["8888/http", "22/tcp"]
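The port/protocol format can be checked on the client before sending a request. The sketch below is one possible validator; `parse_port` is a hypothetical helper, and it deliberately does not restrict the protocol name, since this page shows `http` and `tcp` only as examples rather than as an exhaustive list.

```python
def parse_port(spec: str) -> tuple[int, str]:
    """Split a "port/protocol" string into (port, protocol).

    Raises ValueError when the separator is missing, the port is not a
    decimal number in the range 1..65535, or the protocol is empty.
    """
    port_text, sep, protocol = spec.partition("/")
    if not sep or not protocol:
        raise ValueError(f"expected port/protocol, got {spec!r}")
    if not (port_text.isascii() and port_text.isdigit()):
        raise ValueError(f"port must be a decimal number, got {port_text!r}")
    port = int(port_text)
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    return port, protocol


parsed = [parse_port(p) for p in ["8888/http", "22/tcp"]]
```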
Environment variables as key-value pairs
{ "JUPYTER_PASSWORD": "hunter2" }
Container registry credential ID (for private images)
null
Preferred data centers for placement. Omit or pass an empty array to let the scheduler choose.
FlashBoot cold-start acceleration mode.
OFF: disabled
FLASHBOOT: enabled
PRIORITY_FLASHBOOT: enabled with priority capacity
OFF, FLASHBOOT, PRIORITY_FLASHBOOT
Response
Created
Reusable container configuration shared across templates, pods, and serverless endpoints. Adding a field here automatically propagates to all three resources.
"ep_abc123"
"my-inference"
["US-TX-3"]
["vol_abc"]
Per-request execution timeout in milliseconds
300000
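Because the timeout is expressed in milliseconds, it is easy to be off by a factor of 1000. The example value 300000 corresponds to five minutes; the sketch below converts a `timedelta` to the integer form, with `timeout_ms` being a hypothetical helper name.

```python
from datetime import timedelta


def timeout_ms(duration: timedelta) -> int:
    """Convert a duration to whole milliseconds (fractions are floored)."""
    if duration <= timedelta(0):
        raise ValueError("timeout must be positive")
    # Floor division of two timedeltas yields an int.
    return duration // timedelta(milliseconds=1)


five_minutes = timeout_ms(timedelta(minutes=5))
```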
FlashBoot cold-start acceleration mode.
OFF: disabled
FLASHBOOT: enabled
PRIORITY_FLASHBOOT: enabled with priority capacity
OFF, FLASHBOOT, PRIORITY_FLASHBOOT
"2026-03-13T20:00:00Z"
Docker image reference
"runpod/pytorch:2.8.0-py3.11-cuda12.8.1"
Arguments passed to the container entrypoint
""
Container disk in GB (ephemeral, wiped on restart)
x >= 150
Exposed ports, formatted as port/protocol
["8888/http", "22/tcp"]
Environment variables as key-value pairs
{ "JUPYTER_PASSWORD": "hunter2" }
Container registry credential ID (for private images)
null
Read-only. Present for CPU serverless endpoints; CPU create/update is not yet supported.