Deploy Next.js on ECS / EKS (Containers)
Deploy a Next.js 15 App Router application as a Docker container on AWS ECS (Fargate) or Kubernetes (EKS). This guide walks through containerizing your app with output: "standalone", pushing to ECR, orchestrating with ECS or EKS, auto-scaling, and solving the problems Vercel handled invisibly -- ISR cache sharing, image optimization, preview deployments, and zero-downtime rollouts.
Recipe
Quick-reference recipe card -- copy-paste ready.
Dockerfile (production multi-stage):
FROM node:20-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build
FROM node:20-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
ENV HOSTNAME="0.0.0.0"
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/public ./public
EXPOSE 3000
USER node
CMD ["node", "server.js"]
Deploy to ECS (Fargate):
docker build -t myapp .
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789.dkr.ecr.us-east-1.amazonaws.com
docker tag myapp:latest 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:latest
docker push 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:latest
aws ecs update-service --cluster my-cluster --service my-service --force-new-deployment
Deploy to EKS (Kubernetes):
docker build -t myapp .
docker tag myapp:latest 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:latest
docker push 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:latest
kubectl set image deployment/myapp myapp=123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:latest
Note: pointing `kubectl set image` at the same `:latest` tag does not change the Deployment spec, so no rollout happens. Push a unique tag per build (as the CI/CD section does with the commit SHA) or run `kubectl rollout restart deployment/myapp`.
When to reach for this: Your team needs AWS-native infrastructure, compliance requires self-hosting, you need fine-grained control over networking and scaling, or you are already running ECS/EKS for other services.
Working Example
A complete, production-ready deployment walkthrough from Dockerfile to running service.
1. The Production Dockerfile
Multi-stage build with three stages: deps installs dependencies into a layer that stays cached until the lockfile changes, builder compiles the Next.js app, and runner copies just the standalone output into a minimal image.
Prerequisite -- set output: "standalone" in your Next.js config:
// next.config.ts
import type { NextConfig } from "next";
const nextConfig: NextConfig = {
  output: "standalone",
};
export default nextConfig;
Complete Dockerfile:
# -----------------------------------------------------------
# Stage 1: deps -- install dependencies into a cached layer
# -----------------------------------------------------------
FROM node:20-alpine AS deps
RUN apk add --no-cache libc6-compat
WORKDIR /app
# Copy lockfile first so this layer is cached unless deps change
COPY package.json package-lock.json ./
RUN npm ci
# -----------------------------------------------------------
# Stage 2: builder -- build the Next.js application
# -----------------------------------------------------------
FROM node:20-alpine AS builder
WORKDIR /app
# Reuse node_modules from deps (includes devDependencies for the build);
# only files selected by standalone tracing reach the final image anyway
COPY --from=deps /app/node_modules ./node_modules
COPY package.json package-lock.json ./
COPY . .
# Build-time env vars (NEXT_PUBLIC_*) are baked in here
# ARG NEXT_PUBLIC_API_URL
# ENV NEXT_PUBLIC_API_URL=$NEXT_PUBLIC_API_URL
RUN npm run build
# -----------------------------------------------------------
# Stage 3: runner -- minimal production image
# -----------------------------------------------------------
FROM node:20-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
# CRITICAL: standalone binds to localhost by default.
# Containers must bind to 0.0.0.0 to accept traffic from
# the Docker network / ALB / Kubernetes service.
ENV HOSTNAME="0.0.0.0"
ENV PORT=3000
# Install sharp for next/image optimization in production
RUN npm install --prefix /app sharp
# Don't run as root
RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 nextjs
# Copy the standalone server and static assets
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/public ./public
# Set correct ownership
RUN chown -R nextjs:nodejs /app
USER nextjs
EXPOSE 3000
# standalone output produces server.js -- this replaces `next start`
CMD ["node", "server.js"]
.dockerignore -- keep the build context small:
.git
node_modules
.next
.env*
*.md
.github
.vscode
coverage
2. Build and Push to ECR
Create an ECR repository (one-time), then build and push your image:
# Create ECR repository (one-time)
aws ecr create-repository \
--repository-name myapp \
--region us-east-1
# Build the Docker image
docker build -t myapp .
# Authenticate Docker with ECR
aws ecr get-login-password --region us-east-1 \
| docker login --username AWS --password-stdin \
123456789.dkr.ecr.us-east-1.amazonaws.com
# Tag for ECR
docker tag myapp:latest \
123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:latest
# Push to ECR
docker push \
123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:latest
3. ECS Fargate Deployment
Task Definition
{
  "family": "myapp",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::123456789:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789:role/ecsTaskRole",
  "containerDefinitions": [
    {
      "name": "myapp",
      "image": "123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:latest",
      "portMappings": [
        {
          "containerPort": 3000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        { "name": "NODE_ENV", "value": "production" },
        { "name": "APP_VERSION", "value": "1.0.0" },
        { "name": "NODE_OPTIONS", "value": "--max-old-space-size=768" }
      ],
      "secrets": [
        {
          "name": "DATABASE_URL",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789:secret:myapp/DATABASE_URL"
        }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "wget -qO- http://localhost:3000/api/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/myapp",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "essential": true
    }
  ]
}
The health check uses busybox `wget` rather than `curl` because `node:20-alpine` does not ship curl -- a curl-based check would fail on every container.
Service Configuration
# Create the ECS service with ALB
aws ecs create-service \
--cluster my-cluster \
--service-name myapp \
--task-definition myapp:1 \
--desired-count 2 \
--launch-type FARGATE \
--network-configuration "awsvpcConfiguration={
subnets=[subnet-abc123,subnet-def456],
securityGroups=[sg-abc123],
assignPublicIp=ENABLED
}" \
--load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/myapp/abc123,containerName=myapp,containerPort=3000" \
--deployment-configuration "minimumHealthyPercent=100,maximumPercent=200"
Auto-Scaling Policy
# Register scalable target
aws application-autoscaling register-scalable-target \
--service-namespace ecs \
--resource-id service/my-cluster/myapp \
--scalable-dimension ecs:service:DesiredCount \
--min-capacity 2 \
--max-capacity 10
# Target tracking on CPU utilization
aws application-autoscaling put-scaling-policy \
--service-namespace ecs \
--resource-id service/my-cluster/myapp \
--scalable-dimension ecs:service:DesiredCount \
--policy-name cpu-tracking \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration '{
"TargetValue": 70.0,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "ECSServiceAverageCPUUtilization"
},
"ScaleOutCooldown": 60,
"ScaleInCooldown": 120
}'
ALB Configuration
Configure the Application Load Balancer target group health check:
# Configure ALB target group health check
aws elbv2 modify-target-group \
--target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/myapp/abc123 \
--health-check-path /api/health \
--health-check-interval-seconds 30 \
--healthy-threshold-count 2 \
--unhealthy-threshold-count 3 \
--health-check-timeout-seconds 5
# Enable stickiness (useful for ISR if not using shared cache)
aws elbv2 modify-target-group-attributes \
--target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/myapp/abc123 \
--attributes Key=stickiness.enabled,Value=true Key=stickiness.type,Value=lb_cookie Key=stickiness.lb_cookie.duration_seconds,Value=3600
4. EKS / Kubernetes Deployment
Deployment Manifest
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0 # Zero-downtime: never remove a pod before a new one is ready
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:latest
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: "production"
            - name: NODE_OPTIONS
              value: "--max-old-space-size=768"
          envFrom:
            - configMapRef:
                name: myapp-config
            - secretRef:
                name: myapp-secrets
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "1024Mi"
          readinessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 30
            failureThreshold: 3
      terminationGracePeriodSeconds: 30
Service Manifest
# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  type: ClusterIP
  selector:
    app: myapp
  ports:
    - port: 80
      targetPort: 3000
      protocol: TCP
Ingress Manifest (AWS ALB Ingress Controller)
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    # An HTTP listener is required for the ssl-redirect to 443 to work
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456789:certificate/abc-123
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    alb.ingress.kubernetes.io/healthcheck-path: /api/health
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: "30"
    alb.ingress.kubernetes.io/target-group-attributes: deregistration_delay.timeout_seconds=30
spec:
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp
                port:
                  number: 80
  tls:
    - hosts:
        - myapp.example.com
HorizontalPodAutoscaler
# k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 120
      policies:
        - type: Percent
          value: 25
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Pods
          value: 2
          periodSeconds: 60
5. Environment Variables
ECS: Task Definition environment and secrets
Plain values go in environment (visible in the console). Sensitive values go in secrets -- pulled from AWS Secrets Manager or SSM Parameter Store at container start:
{
  "environment": [
    { "name": "NEXT_PUBLIC_APP_NAME", "value": "MyApp" },
    { "name": "LOG_LEVEL", "value": "info" }
  ],
  "secrets": [
    {
      "name": "DATABASE_URL",
      "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789:secret:myapp/db-url"
    },
    {
      "name": "AUTH_SECRET",
      "valueFrom": "arn:aws:ssm:us-east-1:123456789:parameter/myapp/auth-secret"
    }
  ]
}
EKS: ConfigMap and Secret
# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
data:
  LOG_LEVEL: "info"
  CACHE_TTL: "3600"
---
# k8s/secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: myapp-secrets
type: Opaque
stringData:
  DATABASE_URL: "postgresql://user:pass@host:5432/db"
  AUTH_SECRET: "super-secret-value"
`NEXT_PUBLIC_` variables are baked in at build time. They are inlined into the JavaScript bundle during `next build`. Changing them in your task definition or ConfigMap has no effect -- the client bundle already contains the old value. If you need different values per environment (staging vs. production), you must either build a separate Docker image per environment or use runtime injection (a `<script>` tag that sets `window.__ENV` and a helper that reads from it).
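The runtime-injection approach can be sketched as a small helper. This is illustrative, not a Next.js API -- the `serializeRuntimeEnv` and `publicEnv` names and the `window.__ENV` convention are assumptions:

```typescript
// Hypothetical runtime-env helper. Render the returned string inside a
// <script> tag in the root layout; only allow-listed keys are exposed.
export function serializeRuntimeEnv(
  keys: string[],
  env: Record<string, string | undefined> = process.env
): string {
  const picked: Record<string, string> = {};
  for (const key of keys) {
    const value = env[key];
    if (value !== undefined) picked[key] = value; // skip unset vars
  }
  return `window.__ENV = ${JSON.stringify(picked)};`;
}

// Reader for client code; falls back to process.env during SSR
export function publicEnv(key: string): string | undefined {
  const w = (globalThis as any).window;
  if (w?.__ENV) return w.__ENV[key];
  return process.env[key];
}
```

The `<script>` tag must be rendered before your bundle executes so `window.__ENV` exists when client components read it.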
6. Health Check Endpoint
Create a route handler that both ECS health checks and Kubernetes probes can hit:
// app/api/health/route.ts
import { NextResponse } from "next/server";
export const dynamic = "force-dynamic";
export function GET() {
  return NextResponse.json({
    status: "ok",
    timestamp: new Date().toISOString(),
    version: process.env.APP_VERSION ?? "unknown",
    uptime: process.uptime(),
  });
}
Readiness vs. liveness probes (Kubernetes):
- Readiness probe -- "Is this pod ready to receive traffic?" Fails during startup or temporary overload. Kubernetes removes the pod from the Service endpoints but does not restart it.
- Liveness probe -- "Is this pod alive?" Fails if the process is deadlocked or hung. Kubernetes kills and restarts the pod.
Both can hit /api/health, but in production you might make the liveness probe simpler (just return 200) and the readiness probe more thorough (check database connectivity).
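That split might look like the sketch below. `checkDatabase` is a placeholder you would wire to your own connection pool; the handler uses the standard `Response` (global in Node 18+), so it drops into a route file such as `app/api/ready/route.ts`:

```typescript
// Sketch of a stricter readiness check for a separate readiness route.
// checkDatabase() is a hypothetical helper -- replace it with a real
// probe, e.g. a `SELECT 1` through your connection pool.
export const dynamic = "force-dynamic";

async function checkDatabase(): Promise<boolean> {
  // Placeholder: always healthy. Wire this to your database client.
  return true;
}

export async function GET(): Promise<Response> {
  const dbOk = await checkDatabase();
  // Returning 503 takes the pod out of Service endpoints without
  // restarting it -- exactly what a readiness probe should trigger.
  return Response.json(
    { status: dbOk ? "ready" : "degraded", checks: { database: dbOk } },
    { status: dbOk ? 200 : 503 }
  );
}
```

Point the readiness probe at this route and keep the liveness probe on the simpler `/api/health`.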
7. CI/CD Pipeline
A GitHub Actions workflow that builds, pushes to ECR, and deploys to ECS:
# .github/workflows/deploy.yml
name: Deploy to ECS
on:
  push:
    branches: [main]
env:
  AWS_REGION: us-east-1
  ECR_REPOSITORY: myapp
  ECS_CLUSTER: my-cluster
  ECS_SERVICE: myapp
  ECS_TASK_DEFINITION: .aws/task-definition.json
jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789:role/github-actions-deploy
          aws-region: ${{ env.AWS_REGION }}
      - name: Login to ECR
        id: ecr-login
        uses: aws-actions/amazon-ecr-login@v2
      - name: Build, tag, and push image
        id: build-image
        env:
          ECR_REGISTRY: ${{ steps.ecr-login.outputs.registry }}
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
          echo "image=$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG" >> $GITHUB_OUTPUT
      - name: Update ECS task definition
        id: task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: ${{ env.ECS_TASK_DEFINITION }}
          container-name: myapp
          image: ${{ steps.build-image.outputs.image }}
      - name: Deploy to ECS
        uses: aws-actions/amazon-ecs-deploy-task-definition@v2
        with:
          task-definition: ${{ steps.task-def.outputs.task-definition }}
          service: ${{ env.ECS_SERVICE }}
          cluster: ${{ env.ECS_CLUSTER }}
          wait-for-service-stability: true
For EKS, replace the deploy step:
- name: Deploy to EKS
  run: |
    aws eks update-kubeconfig --name my-cluster --region $AWS_REGION
    kubectl set image deployment/myapp \
      myapp=${{ steps.build-image.outputs.image }}
    kubectl rollout status deployment/myapp --timeout=300s
8. Logging
ECS: CloudWatch Logs via awslogs driver
The task definition's logConfiguration (shown in Step 3) sends all container stdout/stderr to CloudWatch. Every console.log in a Server Component, Route Handler, or Server Action appears in the log group /ecs/myapp.
EKS: stdout to a log aggregator
Kubernetes captures container stdout/stderr. Install FluentBit as a DaemonSet to ship logs to CloudWatch, Datadog, or your preferred platform:
# Simplified FluentBit output config for CloudWatch
[OUTPUT]
    Name              cloudwatch_logs
    Match             *
    region            us-east-1
    log_group_name    /eks/myapp
    log_stream_prefix pod-
    auto_create_group On
Where do Next.js logs go?
- `console.log` in Server Components, Route Handlers, Server Actions, and Middleware goes to container stdout (the server process).
- `console.log` in Client Components goes to the browser DevTools.
- For production, consider `pino` for structured JSON logging -- CloudWatch and Datadog parse structured logs far more effectively than plain text.
// lib/logger.ts
import pino from "pino";
export const logger = pino({
  level: process.env.LOG_LEVEL ?? "info",
});
Deep Dive
Vercel vs. Containers: What Changes
| Concern | Vercel (managed) | ECS / EKS (self-hosted) |
|---|---|---|
| CDN / Edge caching | Built-in global CDN | ALB + CloudFront in front of ECS/EKS |
| ISR revalidation | Shared cache across all edges | Each container has its own .next/cache -- needs shared cache (Redis, S3) |
| Preview deployments | Automatic per-PR URLs | Separate ECS service per PR branch, or feature-flagged single deployment |
| Auto-scaling | Automatic, serverless scaling | ECS auto-scaling policies or Kubernetes HPA |
| Image optimization | Edge-optimized next/image | next/image uses container CPU; offload to CloudFront + Lambda@Edge or Imgproxy |
| Build & deploy speed | Optimized build pipeline | Docker layer caching, ECR cache, BuildKit |
| Rollbacks | One-click in dashboard | ECS: redeploy previous task definition revision. EKS: kubectl rollout undo |
| HTTPS | Automatic TLS | ALB handles TLS termination; ACM provides free certificates |
| Environment variables | Dashboard UI + encrypted | Task definition env / Kubernetes Secrets + Secrets Manager |
| Middleware | Runs at the edge (V8 isolate) | Runs inside the container Node.js runtime, not at the edge |
ISR in Containers -- The Cache Problem
This is the single biggest surprise for teams migrating from Vercel.
On Vercel, ISR "just works" because Vercel manages a globally shared cache. In a container deployment, each container has its own .next/cache on an ephemeral filesystem. When container A revalidates a page, containers B and C still serve the stale version until they independently revalidate.
Solutions, from simplest to most robust:
- Sticky sessions on the ALB -- Route each user to the same container via a cookie. Simple to configure, but defeats the purpose of load balancing and creates hot spots.
- Shared EFS mount for `.next/cache` -- On ECS Fargate, mount an EFS volume at `.next/cache`. All containers share the same filesystem. Adds ~1-5ms latency per cache read but is operationally simple.
- Custom cache handler (recommended) -- Point the ISR cache to Redis or S3 using the Next.js `cacheHandler` configuration:
// next.config.ts
import type { NextConfig } from "next";
const nextConfig: NextConfig = {
  output: "standalone",
  cacheHandler: require.resolve("./cache-handler.mjs"),
  cacheMaxMemorySize: 0, // Disable in-memory cache, use external only
};
export default nextConfig;
// cache-handler.mjs
import { createClient } from "redis";
const client = createClient({ url: process.env.REDIS_URL });
await client.connect();
export default class CacheHandler {
  async get(key) {
    const data = await client.get(key);
    return data ? JSON.parse(data) : null;
  }
  async set(key, data, ctx) {
    const ttl = ctx.revalidate ?? 60;
    await client.set(key, JSON.stringify(data), { EX: ttl });
  }
  async revalidateTag(tags) {
    // Implement tag-based revalidation by scanning keys
    for (const tag of [tags].flat()) {
      const keys = await client.keys(`*:tag:${tag}:*`);
      if (keys.length > 0) {
        await client.del(keys);
      }
    }
  }
}
- Accept stale-while-revalidate per container -- If slight inconsistency is tolerable (each container revalidates independently within the ISR window), you can skip shared caching entirely. The page will be at most `revalidate` seconds stale on any given container.
The standalone Output Mode
Without output: "standalone", your Docker image must include the entire node_modules/ directory -- easily 500MB+ for a typical Next.js app. With standalone mode, Next.js traces the exact files needed by the server and copies them into .next/standalone/, producing a self-contained directory with its own server.js entrypoint.
What standalone includes:
- `server.js` -- a minimal Node.js server (replaces `next start`)
- A pruned `node_modules/` with only the packages the server needs at runtime
- Your compiled server-side code
What standalone does NOT include (you must copy them separately):
- `.next/static/` -- client-side JS/CSS bundles (served by the Node.js server or a CDN)
- `public/` -- static assets
That is why the Dockerfile has these two extra COPY lines:
COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/public ./public
Result: A production image of ~100-150MB instead of 500MB+.
Image Size Optimization
| Technique | Impact |
|---|---|
| Multi-stage build (3 stages) | Only the runner stage ends up in the final image |
| `.dockerignore` | Excludes `.git`, `node_modules`, `.env*`, `*.md` from the build context |
| Alpine base image (`node:20-alpine`) | ~50MB base vs. ~350MB for `node:20` (Debian) |
| Dependencies isolated in a `deps` stage | Install layer is cached; only standalone-traced files reach the final image |
| `output: "standalone"` | Traced, minimal `node_modules` (~30MB vs. 300MB+) |
| Layer ordering | Dependencies first (cached), source code last (changes frequently) |
Zero-Downtime Deployments
ECS Fargate:
Set minimumHealthyPercent: 100 and maximumPercent: 200 in the deployment configuration. During a deploy, ECS starts new tasks (up to 2x desired count) and waits for them to pass health checks before draining the old tasks.
Deploy timeline:
t=0 [old-1] [old-2] ← 2 tasks running
t=30s [old-1] [old-2] [new-1] [new-2] ← 4 tasks, new ones starting
t=90s [old-1] [old-2] [new-1✓] [new-2✓] ← new tasks pass health check
t=120s [new-1✓] [new-2✓] ← old tasks drained, done
EKS (Kubernetes):
Set maxSurge: 1 and maxUnavailable: 0 in the rolling update strategy. Kubernetes creates one new pod, waits for its readiness probe to pass, then terminates one old pod. Repeat until all pods are updated.
Both platforms: Configure ALB deregistration delay (connection draining) to allow in-flight requests to complete before the old container is stopped. A value of 30 seconds works for most Next.js apps.
Gotchas
Common pitfalls when running Next.js in containers. Each one has bitten at least one team migrating from Vercel.
- ISR cache is per-container. The number-one surprise for Vercel migrants. Each container independently revalidates ISR pages. Without a shared cache (Redis, S3, EFS), users hitting different containers see inconsistent versions of the same page. See the "ISR in Containers" deep dive above.
- `NEXT_PUBLIC_` vars are baked in at `docker build` time. These variables are inlined into the client-side JavaScript bundle during the build. Changing them in your ECS task definition or Kubernetes ConfigMap has zero effect -- the bundle already contains the old values. Either build separate images per environment or inject values at runtime via a `<script>` tag.
- Container health checks must use the right port. ECS health checks hit `localhost:3000` inside the container. If you change the `PORT` environment variable, update the health check command to match: `curl -f http://localhost:${PORT}/api/health`. Note that `node:20-alpine` does not ship `curl` -- use busybox `wget -qO- http://localhost:${PORT}/api/health` in Alpine-based images.
- Forgetting `HOSTNAME=0.0.0.0`. The Next.js standalone server binds to `localhost` (127.0.0.1) by default. Inside a container, that means it only accepts connections from inside the container itself. The ALB or Kubernetes service cannot reach it. Set `HOSTNAME="0.0.0.0"` so the server listens on all network interfaces.
- Ephemeral filesystem. Fargate containers have no persistent disk. File uploads stored to the local filesystem, ISR cache files, and temporary files are all lost when the container restarts or is replaced during a deploy. Use S3 for file storage, EFS for a shared filesystem, or an external cache for ISR.
- Cold starts on Fargate. Pulling a 200MB Docker image on Fargate takes 10-30 seconds. Combine that with Node.js startup time and you get noticeable cold start latency. Keep images small (standalone + Alpine = ~100MB). Use ECR in the same region as your Fargate cluster. Consider provisioned capacity for latency-sensitive services.
- `sharp` not installed for image optimization. `next/image` requires the `sharp` package for production image optimization. The standalone output does not always include it. Explicitly install sharp in the runner stage of your Dockerfile (`RUN npm install sharp`) or set `NEXT_SHARP_PATH` to point to an installed copy.
- Not setting resource limits. A Next.js build can consume 2GB+ of RAM, and even the runtime can spike under load. Without memory limits in your task definition or pod spec, one runaway process can starve other containers on the same host. Set `NODE_OPTIONS=--max-old-space-size=1536` for a 2GB container to leave headroom for the OS.
- Log output is unstructured by default. `console.log` produces plain text. CloudWatch, Datadog, and other log aggregators parse structured JSON far more effectively. Use `pino` or a similar structured logger in Server Components and Route Handlers to get searchable, filterable logs with log levels, request IDs, and timing.
Alternatives
| Alternative | Use When | Don't Use When |
|---|---|---|
| Vercel | You want zero ops and instant deploys | Company requires self-hosting or AWS-only |
| EC2 + PM2 (standalone) | Simple single-server deployment, low traffic | You need auto-scaling or container orchestration |
| AWS App Runner | You want Fargate simplicity without task definitions | You need fine-grained networking or sidecar containers |
| Coolify / Railway | You want a PaaS with container support | Enterprise compliance requires AWS-native services |
| Static export | Fully static site behind a CDN | You use SSR, ISR, middleware, or Server Actions |
FAQs
Does ISR work in containers?
Yes, ISR works in containers -- revalidate timers fire and pages regenerate on demand. The catch is that each container has its own cache. Without a shared cache backend (Redis, S3, or EFS), different containers serve different versions of the same ISR page. For most apps, the simplest fix is a Redis-backed custom cache handler. See the "ISR in Containers" deep dive above.
How do I handle preview deployments without Vercel?
Two common approaches: (1) Deploy a separate ECS service or Kubernetes namespace per PR branch, each with its own ALB target group and a subdomain like pr-123.preview.example.com. (2) Use a single staging environment with feature flags -- the PR toggles a flag, and the staging deploy shows the new code to testers. Approach 1 gives true isolation but costs more. Approach 2 is cheaper but requires a feature flag system.
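For approach 2, the flag check can be as small as an env-driven helper. This is a sketch -- the `FEATURE_FLAGS` variable and `isEnabled` name are illustrative, not part of any flag product:

```typescript
// lib/flags.ts (sketch): comma-separated flags in a FEATURE_FLAGS env var,
// e.g. FEATURE_FLAGS="new-checkout,beta-nav" set only on the staging deploy.
export function isEnabled(
  flag: string,
  raw: string | undefined = process.env.FEATURE_FLAGS
): boolean {
  return (raw ?? "")
    .split(",")
    .map((f) => f.trim())
    .includes(flag);
}
```

A Server Component can then branch on `isEnabled("new-checkout")`; dedicated flag services add per-user targeting on top of this same idea.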
What about image optimization with next/image?
next/image works in containers -- it uses the sharp library to resize and optimize images on the fly. The trade-off is that optimization uses container CPU. For high-traffic sites, offload image optimization to CloudFront with Lambda@Edge, or use a dedicated image proxy like Imgproxy. You can also pre-optimize images at build time using next/image with loader set to a custom function.
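A custom loader is a function mapping `src`, `width`, and `quality` to a URL on your image service; you point Next.js at it with `images.loader: "custom"` and `images.loaderFile` in the config. The host and query parameters below are assumptions -- match them to your Imgproxy or CloudFront setup:

```typescript
// image-loader.ts (sketch): rewrite next/image requests to an external
// image proxy. images.example.com and the query params are illustrative.
interface LoaderProps {
  src: string;
  width: number;
  quality?: number;
}

export default function imageLoader({ src, width, quality }: LoaderProps): string {
  const params = new URLSearchParams({
    url: src,
    w: String(width),
    q: String(quality ?? 75), // next/image defaults quality to 75
  });
  return `https://images.example.com/resize?${params.toString()}`;
}
```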
How do I roll back a bad deployment?
ECS: Every deployment creates a new task definition revision. To roll back, update the service to use the previous revision: aws ecs update-service --cluster my-cluster --service myapp --task-definition myapp:42 (where 42 is the previous revision number). EKS: kubectl rollout undo deployment/myapp. Both approaches are near-instant because the previous Docker image is already cached in ECR.
How do I achieve zero-downtime deploys?
ECS: Set minimumHealthyPercent: 100 and maximumPercent: 200 in the service deployment configuration. ECS starts new tasks alongside old ones, waits for health checks to pass, then drains old tasks. EKS: Set maxSurge: 1 and maxUnavailable: 0 in the Deployment rolling update strategy. Enable ALB deregistration delay (30s) on both platforms so in-flight requests complete before old containers stop.
What is the difference between ECS and EKS?
ECS (Elastic Container Service) is AWS-native container orchestration. You define tasks and services. Fargate mode is serverless -- no EC2 instances to manage. It is simpler to learn and operate. EKS (Elastic Kubernetes Service) runs standard Kubernetes. You get the full Kubernetes ecosystem (Helm, Istio, ArgoCD, etc.) and portability across clouds. EKS is more complex but more flexible. Choose ECS if you are AWS-only and want simplicity. Choose EKS if you need Kubernetes features, multi-cloud portability, or your team already knows Kubernetes.
Do I need Kubernetes?
No. For most Next.js deployments, ECS Fargate is simpler and sufficient. You get auto-scaling, rolling deployments, health checks, and ALB integration without learning Kubernetes. Choose EKS only if your organization already uses Kubernetes, you need its ecosystem (service mesh, GitOps, custom operators), or you want cloud portability.
How much does running containers on AWS cost compared to Vercel?
It depends on scale. A minimal ECS Fargate setup (2 tasks, 0.5 vCPU, 1GB RAM each) costs roughly $30-50/month. Add ALB ($20/month + data transfer) and ECR ($1-5/month). Total: ~$50-75/month for a small app. Vercel Pro is $20/month per seat but can spike with high traffic (bandwidth overages, function invocations). At high scale, containers are usually cheaper. At low scale, Vercel is cheaper and far less operational work.
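The Fargate portion of that estimate is simple arithmetic. The sketch below uses approximate us-east-1 on-demand rates -- verify against the current AWS pricing page before relying on them:

```typescript
// Back-of-envelope Fargate cost. Rates are approximate us-east-1
// on-demand prices and will drift -- treat them as placeholders.
const VCPU_HOUR = 0.04048; // $ per vCPU-hour (approx.)
const GB_HOUR = 0.004445; // $ per GB-hour of memory (approx.)
const HOURS_PER_MONTH = 730;

function fargateMonthlyCost(tasks: number, vcpu: number, memoryGb: number): number {
  const perTaskHour = vcpu * VCPU_HOUR + memoryGb * GB_HOUR;
  return tasks * perTaskHour * HOURS_PER_MONTH;
}

// 2 tasks x 0.5 vCPU x 1GB ≈ $36/month, before the ALB and data transfer
console.log(fargateMonthlyCost(2, 0.5, 1).toFixed(2));
```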
How do I handle WebSockets or long-lived connections?
ALB supports WebSocket connections natively. Set the idle timeout on the ALB to match your longest expected connection (default 60s, max 4000s). For ECS, ensure your task's security group allows the traffic. For EKS, the ALB Ingress Controller supports WebSocket by default. Note that sticky sessions may be needed if your WebSocket server maintains in-memory state.
Can I use middleware in a container deployment?
Yes, middleware runs inside the container's Node.js runtime on every request. On Vercel, middleware runs at the edge in a V8 isolate with a limited API surface. In a container, middleware runs in full Node.js, so you get access to all Node.js APIs. The trade-off is latency -- Vercel edge middleware runs closer to the user, while container middleware runs in the container's region. For global latency, put CloudFront in front of the ALB.
How do I set up a custom domain with HTTPS?
Request a free TLS certificate from AWS Certificate Manager (ACM) for your domain. Attach it to the ALB listener on port 443. Create a CNAME or alias DNS record pointing your domain to the ALB's DNS name. The ALB terminates TLS -- traffic between the ALB and your containers is HTTP on port 3000 inside the VPC, which is fine for most use cases.
What is the best way to handle database connections in containers?
Each container process opens its own database connection pool. With auto-scaling, you can easily exhaust database connections. Use a connection pooler like PgBouncer (for PostgreSQL) or RDS Proxy (managed by AWS). Set your pool size conservatively -- for a 2-container deployment with max_connections: 20 each, that is 40 connections total. Monitor connection count in CloudWatch and scale the database before you hit the limit.
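The budgeting in that last sentence generalizes to a small formula (names are illustrative): size each container's pool for the autoscaler's maximum replica count, not the current one, and hold back a few connections for migrations and admin tools:

```typescript
// Sketch: split the database's connection limit across the maximum
// number of containers the autoscaler may run.
function poolSizePerContainer(
  dbMaxConnections: number,
  maxContainers: number, // use HPA maxReplicas / ECS max-capacity here
  reserved = 5 // connections held back for admin tools and migrations
): number {
  const budget = dbMaxConnections - reserved;
  return Math.max(1, Math.floor(budget / maxContainers));
}

// e.g. RDS max_connections=100, maxReplicas=10 -> pool of 9 per container
```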
Related
- Deployment -- Overview of Next.js deployment options
- Deploy on EC2 / Standalone -- Non-containerized deployment
- Environment Variables -- Managing env vars in Next.js