React SME Cookbook

Deploy Next.js on ECS / EKS (Containers)

Deploy a Next.js 15 App Router application as a Docker container on AWS ECS (Fargate) or Kubernetes (EKS). This guide walks through containerizing your app with output: "standalone", pushing to ECR, orchestrating with ECS or EKS, auto-scaling, and solving the problems Vercel handled invisibly -- ISR cache sharing, image optimization, preview deployments, and zero-downtime rollouts.

Recipe

Quick-reference recipe card -- copy-paste ready.

Dockerfile (production multi-stage):

FROM node:20-alpine AS builder
WORKDIR /app
COPY package.json package-lock.json ./
RUN npm ci
COPY . .
RUN npm run build
 
FROM node:20-alpine AS runner
WORKDIR /app
ENV NODE_ENV=production
ENV HOSTNAME="0.0.0.0"
COPY --from=builder --chown=node:node /app/.next/standalone ./
COPY --from=builder --chown=node:node /app/.next/static ./.next/static
COPY --from=builder --chown=node:node /app/public ./public
EXPOSE 3000
USER node
CMD ["node", "server.js"]

Deploy to ECS (Fargate):

docker build -t myapp .
aws ecr get-login-password --region us-east-1 | docker login --username AWS --password-stdin 123456789.dkr.ecr.us-east-1.amazonaws.com
docker tag myapp:latest 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:latest
docker push 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:latest
aws ecs update-service --cluster my-cluster --service my-service --force-new-deployment

Deploy to EKS (Kubernetes):

docker build -t myapp .
docker tag myapp:latest 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:latest
docker push 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:latest
kubectl set image deployment/myapp myapp=123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:latest

When to reach for this: Your team needs AWS-native infrastructure, compliance requires self-hosting, you need fine-grained control over networking and scaling, or you are already running ECS/EKS for other services.

Working Example

A complete, production-ready deployment walkthrough from Dockerfile to running service.

1. The Production Dockerfile

Multi-stage build with three stages: deps installs dependencies in a layer that stays cached until the lockfile changes, builder compiles the Next.js app, and runner copies just the standalone output into a minimal image.

Prerequisite -- set output: "standalone" in your Next.js config:

// next.config.ts
import type { NextConfig } from "next";
 
const nextConfig: NextConfig = {
  output: "standalone",
};
 
export default nextConfig;

Complete Dockerfile:

# -----------------------------------------------------------
# Stage 1: deps -- install dependencies
# -----------------------------------------------------------
FROM node:20-alpine AS deps
RUN apk add --no-cache libc6-compat
WORKDIR /app
 
# Copy lockfile first so this layer is cached unless deps change
COPY package.json package-lock.json ./
RUN npm ci
 
# -----------------------------------------------------------
# Stage 2: builder -- build the Next.js application
# -----------------------------------------------------------
FROM node:20-alpine AS builder
WORKDIR /app
 
# Reuse ALL node_modules (including devDependencies) from the deps stage
COPY --from=deps /app/node_modules ./node_modules
COPY . .
 
# Build-time env vars (NEXT_PUBLIC_*) are baked in here
# ARG NEXT_PUBLIC_API_URL
# ENV NEXT_PUBLIC_API_URL=$NEXT_PUBLIC_API_URL
 
RUN npm run build
 
# -----------------------------------------------------------
# Stage 3: runner -- minimal production image
# -----------------------------------------------------------
FROM node:20-alpine AS runner
WORKDIR /app
 
ENV NODE_ENV=production
# CRITICAL: standalone binds to localhost by default.
# Containers must bind to 0.0.0.0 to accept traffic from
# the Docker network / ALB / Kubernetes service.
ENV HOSTNAME="0.0.0.0"
ENV PORT=3000
 
# Don't run as root
RUN addgroup --system --gid 1001 nodejs
RUN adduser --system --uid 1001 nextjs
 
# Copy the standalone server and static assets
COPY --from=builder /app/.next/standalone ./
COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/public ./public
 
# Install sharp for next/image optimization -- the standalone
# trace does not always include it
RUN npm install sharp
 
# Set correct ownership (the server writes to .next/cache at runtime)
RUN chown -R nextjs:nodejs /app
 
USER nextjs
 
EXPOSE 3000
 
# standalone output produces server.js -- this replaces `next start`
CMD ["node", "server.js"]

.dockerignore -- keep the build context small:

.git
node_modules
.next
.env*
*.md
.github
.vscode
coverage

2. Build and Push to ECR

Create an ECR repository (one-time), then build and push your image:

# Create ECR repository (one-time)
aws ecr create-repository \
  --repository-name myapp \
  --region us-east-1
 
# Build the Docker image
docker build -t myapp .
 
# Authenticate Docker with ECR
aws ecr get-login-password --region us-east-1 \
  | docker login --username AWS --password-stdin \
    123456789.dkr.ecr.us-east-1.amazonaws.com
 
# Tag for ECR
docker tag myapp:latest \
  123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:latest
 
# Push to ECR
docker push \
  123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:latest

3. ECS Fargate Deployment

Task Definition

{
  "family": "myapp",
  "networkMode": "awsvpc",
  "requiresCompatibilities": ["FARGATE"],
  "cpu": "512",
  "memory": "1024",
  "executionRoleArn": "arn:aws:iam::123456789:role/ecsTaskExecutionRole",
  "taskRoleArn": "arn:aws:iam::123456789:role/ecsTaskRole",
  "containerDefinitions": [
    {
      "name": "myapp",
      "image": "123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:latest",
      "portMappings": [
        {
          "containerPort": 3000,
          "protocol": "tcp"
        }
      ],
      "environment": [
        { "name": "NODE_ENV", "value": "production" },
        { "name": "APP_VERSION", "value": "1.0.0" },
        { "name": "NODE_OPTIONS", "value": "--max-old-space-size=768" }
      ],
      "secrets": [
        {
          "name": "DATABASE_URL",
          "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789:secret:myapp/DATABASE_URL"
        }
      ],
      "healthCheck": {
        "command": ["CMD-SHELL", "curl -f http://localhost:3000/api/health || exit 1"],
        "interval": 30,
        "timeout": 5,
        "retries": 3,
        "startPeriod": 60
      },
      "logConfiguration": {
        "logDriver": "awslogs",
        "options": {
          "awslogs-group": "/ecs/myapp",
          "awslogs-region": "us-east-1",
          "awslogs-stream-prefix": "ecs"
        }
      },
      "essential": true
    }
  ]
}

Service Configuration

# Create the ECS service with ALB
aws ecs create-service \
  --cluster my-cluster \
  --service-name myapp \
  --task-definition myapp:1 \
  --desired-count 2 \
  --launch-type FARGATE \
  --network-configuration "awsvpcConfiguration={
    subnets=[subnet-abc123,subnet-def456],
    securityGroups=[sg-abc123],
    assignPublicIp=ENABLED
  }" \
  --load-balancers "targetGroupArn=arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/myapp/abc123,containerName=myapp,containerPort=3000" \
  --deployment-configuration "minimumHealthyPercent=100,maximumPercent=200"

Auto-Scaling Policy

# Register scalable target
aws application-autoscaling register-scalable-target \
  --service-namespace ecs \
  --resource-id service/my-cluster/myapp \
  --scalable-dimension ecs:service:DesiredCount \
  --min-capacity 2 \
  --max-capacity 10
 
# Target tracking on CPU utilization
aws application-autoscaling put-scaling-policy \
  --service-namespace ecs \
  --resource-id service/my-cluster/myapp \
  --scalable-dimension ecs:service:DesiredCount \
  --policy-name cpu-tracking \
  --policy-type TargetTrackingScaling \
  --target-tracking-scaling-policy-configuration '{
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
    },
    "ScaleOutCooldown": 60,
    "ScaleInCooldown": 120
  }'

ALB Configuration

Configure the Application Load Balancer target group health check:

# Configure ALB target group health check
aws elbv2 modify-target-group \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/myapp/abc123 \
  --health-check-path /api/health \
  --health-check-interval-seconds 30 \
  --healthy-threshold-count 2 \
  --unhealthy-threshold-count 3 \
  --health-check-timeout-seconds 5
 
# Enable stickiness (useful for ISR if not using shared cache)
aws elbv2 modify-target-group-attributes \
  --target-group-arn arn:aws:elasticloadbalancing:us-east-1:123456789:targetgroup/myapp/abc123 \
  --attributes Key=stickiness.enabled,Value=true Key=stickiness.type,Value=lb_cookie Key=stickiness.lb_cookie.duration_seconds,Value=3600

4. EKS / Kubernetes Deployment

Deployment Manifest

# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
  labels:
    app: myapp
spec:
  replicas: 2
  selector:
    matchLabels:
      app: myapp
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0  # Zero-downtime: never remove a pod before a new one is ready
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: 123456789.dkr.ecr.us-east-1.amazonaws.com/myapp:latest
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: "production"
            - name: NODE_OPTIONS
              value: "--max-old-space-size=768"
          envFrom:
            - configMapRef:
                name: myapp-config
            - secretRef:
                name: myapp-secrets
          resources:
            requests:
              cpu: "250m"
              memory: "512Mi"
            limits:
              cpu: "1000m"
              memory: "1024Mi"
          readinessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 10
            periodSeconds: 10
            failureThreshold: 3
          livenessProbe:
            httpGet:
              path: /api/health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 30
            failureThreshold: 3
      terminationGracePeriodSeconds: 30

Service Manifest

# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  type: ClusterIP
  selector:
    app: myapp
  ports:
    - port: 80
      targetPort: 3000
      protocol: TCP

Ingress Manifest (AWS Load Balancer Controller)

# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: myapp
  annotations:
    alb.ingress.kubernetes.io/scheme: internet-facing
    alb.ingress.kubernetes.io/target-type: ip
    # ssl-redirect needs an HTTP listener to redirect from
    alb.ingress.kubernetes.io/listen-ports: '[{"HTTP": 80}, {"HTTPS": 443}]'
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:123456789:certificate/abc-123
    alb.ingress.kubernetes.io/ssl-redirect: "443"
    alb.ingress.kubernetes.io/healthcheck-path: /api/health
    alb.ingress.kubernetes.io/healthcheck-interval-seconds: "30"
    alb.ingress.kubernetes.io/target-group-attributes: deregistration_delay.timeout_seconds=30
spec:
  # ingressClassName replaces the deprecated kubernetes.io/ingress.class annotation
  ingressClassName: alb
  rules:
    - host: myapp.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp
                port:
                  number: 80
  tls:
    - hosts:
        - myapp.example.com

HorizontalPodAutoscaler

# k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: myapp
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 120
      policies:
        - type: Percent
          value: 25
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 30
      policies:
        - type: Pods
          value: 2
          periodSeconds: 60

5. Environment Variables

ECS: Task Definition environment and secrets

Plain values go in environment (visible in the console). Sensitive values go in secrets -- pulled from AWS Secrets Manager or SSM Parameter Store at container start:

{
  "environment": [
    { "name": "NEXT_PUBLIC_APP_NAME", "value": "MyApp" },
    { "name": "LOG_LEVEL", "value": "info" }
  ],
  "secrets": [
    {
      "name": "DATABASE_URL",
      "valueFrom": "arn:aws:secretsmanager:us-east-1:123456789:secret:myapp/db-url"
    },
    {
      "name": "AUTH_SECRET",
      "valueFrom": "arn:aws:ssm:us-east-1:123456789:parameter/myapp/auth-secret"
    }
  ]
}

EKS: ConfigMap and Secret

# k8s/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
data:
  LOG_LEVEL: "info"
  CACHE_TTL: "3600"
 
---
# k8s/secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: myapp-secrets
type: Opaque
stringData:
  DATABASE_URL: "postgresql://user:pass@host:5432/db"
  AUTH_SECRET: "super-secret-value"

NEXT_PUBLIC_ variables are baked at build time. These are inlined into the JavaScript bundle during next build. Changing them in your task definition or ConfigMap has no effect -- the client bundle already contains the old value. If you need different values per environment (staging vs. production), you must either build a separate Docker image per environment or use runtime injection (a <script> tag that sets window.__ENV and a helper that reads from it).
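
The runtime-injection approach can be sketched as follows. This is illustrative, not a Next.js API: the `__ENV` global, the allow-list, and both helpers are names invented for this example.

```typescript
// Hypothetical runtime-env helpers. The server renders publicEnvScript()
// into a <script> tag in the root layout; client code reads values back
// through getPublicEnv() instead of process.env.NEXT_PUBLIC_*.
const PUBLIC_KEYS = ["API_URL", "CDN_HOST"] as const;
type PublicKey = (typeof PUBLIC_KEYS)[number];

export function publicEnvScript(env: Record<string, string | undefined>): string {
  const publicEnv: Record<string, string> = {};
  // Allow-list: never serialize the whole environment into the page
  for (const key of PUBLIC_KEYS) {
    const value = env[key];
    if (value !== undefined) publicEnv[key] = value;
  }
  // JSON.stringify keeps the payload a safe literal inside the <script> tag
  return `window.__ENV = ${JSON.stringify(publicEnv)};`;
}

export function getPublicEnv(key: PublicKey): string | undefined {
  // In the browser, the injected values win; on the server (or when
  // injection is absent) fall back to process.env.
  const g = globalThis as { __ENV?: Record<string, string> };
  if (g.__ENV && g.__ENV[key] !== undefined) return g.__ENV[key];
  return process.env[key];
}
```

In the root layout, render the script with dangerouslySetInnerHTML before the app bundle loads so the values exist when client code first reads them.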

6. Health Check Endpoint

Create a route handler that both ECS health checks and Kubernetes probes can hit:

// app/api/health/route.ts
import { NextResponse } from "next/server";
 
export const dynamic = "force-dynamic";
 
export function GET() {
  return NextResponse.json({
    status: "ok",
    timestamp: new Date().toISOString(),
    version: process.env.APP_VERSION ?? "unknown",
    uptime: process.uptime(),
  });
}

Readiness vs. liveness probes (Kubernetes):

  • Readiness probe -- "Is this pod ready to receive traffic?" Fails during startup or temporary overload. Kubernetes removes the pod from the Service endpoints but does not restart it.
  • Liveness probe -- "Is this pod alive?" Fails if the process is deadlocked or hung. Kubernetes kills and restarts the pod.

Both can hit /api/health, but in production you might make the liveness probe simpler (just return 200) and the readiness probe more thorough (check database connectivity).
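
One way to sketch that split is a framework-agnostic helper the readiness route can call (the `Probe` type and `readiness` function are illustrative, not a Next.js API):

```typescript
// Hypothetical readiness logic: each probe checks one dependency, and the
// route handler serves the returned status code and body.
type Probe = () => Promise<boolean>;

export async function readiness(
  probes: Record<string, Probe>
): Promise<{ status: number; body: Record<string, string> }> {
  const body: Record<string, string> = {};
  let ready = true;
  for (const [name, probe] of Object.entries(probes)) {
    // A probe that throws counts as a failure
    const ok = await probe().catch(() => false);
    body[name] = ok ? "ok" : "fail";
    if (!ok) ready = false;
  }
  // 503 tells Kubernetes to pull the pod from Service endpoints without
  // restarting it; the liveness route can stay a plain 200.
  return { status: ready ? 200 : 503, body };
}
```

A hypothetical /api/ready Route Handler would call readiness({ db: pingDatabase }) and return NextResponse.json(body, { status }), while /api/health stays trivial for liveness.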

7. CI/CD Pipeline

A GitHub Actions workflow that builds, pushes to ECR, and deploys to ECS:

# .github/workflows/deploy.yml
name: Deploy to ECS
 
on:
  push:
    branches: [main]
 
env:
  AWS_REGION: us-east-1
  ECR_REPOSITORY: myapp
  ECS_CLUSTER: my-cluster
  ECS_SERVICE: myapp
  ECS_TASK_DEFINITION: .aws/task-definition.json
 
jobs:
  deploy:
    runs-on: ubuntu-latest
    permissions:
      id-token: write
      contents: read
 
    steps:
      - name: Checkout
        uses: actions/checkout@v4
 
      - name: Configure AWS credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::123456789:role/github-actions-deploy
          aws-region: ${{ env.AWS_REGION }}
 
      - name: Login to ECR
        id: ecr-login
        uses: aws-actions/amazon-ecr-login@v2
 
      - name: Build, tag, and push image
        id: build-image
        env:
          ECR_REGISTRY: ${{ steps.ecr-login.outputs.registry }}
          IMAGE_TAG: ${{ github.sha }}
        run: |
          docker build -t $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG .
          docker push $ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG
          echo "image=$ECR_REGISTRY/$ECR_REPOSITORY:$IMAGE_TAG" >> $GITHUB_OUTPUT
 
      - name: Update ECS task definition
        id: task-def
        uses: aws-actions/amazon-ecs-render-task-definition@v1
        with:
          task-definition: ${{ env.ECS_TASK_DEFINITION }}
          container-name: myapp
          image: ${{ steps.build-image.outputs.image }}
 
      - name: Deploy to ECS
        uses: aws-actions/amazon-ecs-deploy-task-definition@v2
        with:
          task-definition: ${{ steps.task-def.outputs.task-definition }}
          service: ${{ env.ECS_SERVICE }}
          cluster: ${{ env.ECS_CLUSTER }}
          wait-for-service-stability: true

For EKS, replace the deploy step:

      - name: Deploy to EKS
        run: |
          aws eks update-kubeconfig --name my-cluster --region $AWS_REGION
          kubectl set image deployment/myapp \
            myapp=${{ steps.build-image.outputs.image }}
          kubectl rollout status deployment/myapp --timeout=300s

8. Logging

ECS: CloudWatch Logs via awslogs driver

The task definition's logConfiguration (shown in Step 3) sends all container stdout/stderr to CloudWatch. Every console.log in a Server Component, Route Handler, or Server Action appears in the log group /ecs/myapp.

EKS: stdout to a log aggregator

Kubernetes captures container stdout/stderr. Install Fluent Bit as a DaemonSet to ship logs to CloudWatch, Datadog, or your preferred platform:

# Simplified FluentBit output config for CloudWatch
[OUTPUT]
    Name              cloudwatch_logs
    Match             *
    region            us-east-1
    log_group_name    /eks/myapp
    log_stream_prefix pod-
    auto_create_group On

Where do Next.js logs go?

  • console.log in Server Components, Route Handlers, Server Actions, and Middleware all go to container stdout (the server process).
  • console.log in Client Components goes to the browser DevTools.
  • For production, consider pino for structured JSON logging -- CloudWatch and Datadog parse structured logs far more effectively than plain text.

// lib/logger.ts
import pino from "pino";
 
export const logger = pino({
  level: process.env.LOG_LEVEL ?? "info",
});
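
To make "structured" concrete: each entry is one JSON object per line, which CloudWatch Logs Insights can filter on any field. This stdlib-only sketch (the `logLine` helper is invented for illustration) shows the shape pino emits; pino adds levels, child loggers, and redaction on top:

```typescript
// One JSON object per line -- the shape a structured logger emits.
export function logLine(
  level: "info" | "warn" | "error",
  msg: string,
  fields: Record<string, unknown> = {}
): string {
  // Insertion order puts level/time/msg first, then caller-supplied fields
  return JSON.stringify({ level, time: new Date().toISOString(), msg, ...fields });
}
```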

Deep Dive

Vercel vs. Containers: What Changes

| Concern | Vercel (managed) | ECS / EKS (self-hosted) |
| --- | --- | --- |
| CDN / edge caching | Built-in global CDN | ALB + CloudFront in front of ECS/EKS |
| ISR revalidation | Shared cache across all edges | Each container has its own .next/cache -- needs a shared cache (Redis, S3) |
| Preview deployments | Automatic per-PR URLs | Separate ECS service per PR branch, or feature-flagged single deployment |
| Auto-scaling | Automatic, serverless scaling | ECS auto-scaling policies or Kubernetes HPA |
| Image optimization | Edge-optimized next/image | next/image uses container CPU; offload to CloudFront + Lambda@Edge or Imgproxy |
| Build & deploy speed | Optimized build pipeline | Docker layer caching, ECR cache, BuildKit |
| Rollbacks | One-click in dashboard | ECS: redeploy previous task definition revision. EKS: kubectl rollout undo |
| HTTPS | Automatic TLS | ALB handles TLS termination; ACM provides free certificates |
| Environment variables | Dashboard UI + encrypted | Task definition env / Kubernetes Secrets + Secrets Manager |
| Middleware | Runs at the edge (V8 isolate) | Runs inside the container Node.js runtime, not at the edge |

ISR in Containers -- The Cache Problem

This is the single biggest surprise for teams migrating from Vercel.

On Vercel, ISR "just works" because Vercel manages a globally shared cache. In a container deployment, each container has its own .next/cache on an ephemeral filesystem. When container A revalidates a page, containers B and C still serve the stale version until they independently revalidate.

Solutions, from simplest to most robust:

  1. Sticky sessions on the ALB -- Route each user to the same container via a cookie. Simple to configure, but defeats the purpose of load balancing and creates hot spots.

  2. Shared EFS mount for .next/cache -- On ECS Fargate, mount an EFS volume at .next/cache. All containers share the same filesystem. Adds ~1-5ms latency per cache read but is operationally simple.

  3. Custom cache handler (recommended) -- Point the ISR cache to Redis or S3 using the Next.js cacheHandler configuration:

// next.config.ts
import type { NextConfig } from "next";
 
const nextConfig: NextConfig = {
  output: "standalone",
  cacheHandler: require.resolve("./cache-handler.mjs"),
  cacheMaxMemorySize: 0, // Disable in-memory cache, use external only
};
 
export default nextConfig;

// cache-handler.mjs
import { createClient } from "redis";
 
const client = createClient({ url: process.env.REDIS_URL });
await client.connect();
 
export default class CacheHandler {
  async get(key) {
    const data = await client.get(key);
    return data ? JSON.parse(data) : null;
  }
 
  async set(key, data, ctx) {
    // ctx.revalidate may be false for non-ISR entries
    const ttl = typeof ctx.revalidate === "number" ? ctx.revalidate : 60;
    await client.set(key, JSON.stringify(data), { EX: ttl });
    // Index this key under each of its tags so revalidateTag can find it later
    for (const tag of ctx.tags ?? []) {
      await client.sAdd(`tag:${tag}`, key);
    }
  }
 
  async revalidateTag(tags) {
    // Delete every cached entry indexed under the given tag(s)
    for (const tag of [tags].flat()) {
      const keys = await client.sMembers(`tag:${tag}`);
      if (keys.length > 0) {
        await client.del(keys);
      }
      await client.del(`tag:${tag}`);
    }
  }
}
  4. Accept stale-while-revalidate per container -- If slight inconsistency is tolerable (each container revalidates independently within the ISR window), you can skip shared caching entirely. The page will be at most revalidate seconds stale on any given container.

The standalone Output Mode

Without output: "standalone", your Docker image must include the entire node_modules/ directory -- easily 500MB+ for a typical Next.js app. With standalone mode, Next.js traces the exact files needed by the server and copies them into .next/standalone/, producing a self-contained directory with its own server.js entrypoint.

What standalone includes:

  • server.js -- a minimal Node.js server (replaces next start)
  • A pruned node_modules/ with only the packages the server needs at runtime
  • Your compiled server-side code

What standalone does NOT include (you must copy them separately):

  • .next/static/ -- client-side JS/CSS bundles (served by the Node.js server or a CDN)
  • public/ -- static assets

That is why the Dockerfile has these two extra COPY lines:

COPY --from=builder /app/.next/static ./.next/static
COPY --from=builder /app/public ./public

Result: A production image of ~100-150MB instead of 500MB+.

Image Size Optimization

| Technique | Impact |
| --- | --- |
| Multi-stage build (3 stages) | Only the runner stage ends up in the final image |
| .dockerignore | Excludes .git, node_modules, .env*, *.md from the build context |
| Alpine base image (node:20-alpine) | ~50MB base vs. ~350MB for node:20 (Debian) |
| output: "standalone" | Traced, minimal node_modules (~30MB vs. 300MB+); no devDependencies in the final image |
| Layer ordering | Dependencies first (cached), source code last (changes frequently) |

Zero-Downtime Deployments

ECS Fargate:

Set minimumHealthyPercent: 100 and maximumPercent: 200 in the deployment configuration. During a deploy, ECS starts new tasks (up to 2x desired count) and waits for them to pass health checks before draining the old tasks.

Deploy timeline:
  t=0    [old-1] [old-2]                     ← 2 tasks running
  t=30s  [old-1] [old-2] [new-1] [new-2]    ← 4 tasks, new ones starting
  t=90s  [old-1] [old-2] [new-1✓] [new-2✓]  ← new tasks pass health check
  t=120s                  [new-1✓] [new-2✓]  ← old tasks drained, done

EKS (Kubernetes):

Set maxSurge: 1 and maxUnavailable: 0 in the rolling update strategy. Kubernetes creates one new pod, waits for its readiness probe to pass, then terminates one old pod. Repeat until all pods are updated.

Both platforms: Configure ALB deregistration delay (connection draining) to allow in-flight requests to complete before the old container is stopped. A value of 30 seconds works for most Next.js apps.

Gotchas

Common pitfalls when running Next.js in containers. Each one has bitten at least one team migrating from Vercel.

  1. ISR cache is per-container. The number-one surprise for Vercel migrants. Each container independently revalidates ISR pages. Without a shared cache (Redis, S3, EFS), users hitting different containers see inconsistent versions of the same page. See the "ISR in Containers" deep dive above.

  2. NEXT_PUBLIC_ vars are baked at docker build time. These variables are inlined into the client-side JavaScript bundle during the build. Changing them in your ECS task definition or Kubernetes ConfigMap has zero effect -- the bundle already contains the old values. Either build separate images per environment or inject values at runtime via a <script> tag.

  3. Container health checks must use the right port and an installed binary. ECS container health checks run inside the container, and node:20-alpine ships BusyBox wget but not curl -- use wget -qO- http://localhost:3000/api/health || exit 1. If you change the PORT environment variable, update the health check command to match.

  4. Forgetting HOSTNAME=0.0.0.0. The Next.js standalone server binds to localhost (127.0.0.1) by default. Inside a container, that means it only accepts connections from inside the container itself. The ALB or Kubernetes service cannot reach it. Set HOSTNAME="0.0.0.0" so the server listens on all network interfaces.

  5. Ephemeral filesystem. Fargate containers have no persistent disk. File uploads stored to the local filesystem, ISR cache files, and temporary files are all lost when the container restarts or is replaced during a deploy. Use S3 for file storage, EFS for shared filesystem, or an external cache for ISR.

  6. Cold starts on Fargate. Pulling a 200MB Docker image on Fargate takes 10-30 seconds. Combine that with Node.js startup time and you get noticeable cold start latency. Keep images small (standalone + Alpine = ~100MB). Use ECR in the same region as your Fargate cluster. Consider provisioned capacity for latency-sensitive services.

  7. sharp not installed for image optimization. next/image requires the sharp package for production image optimization. The standalone output does not always include it. Explicitly install sharp in the runner stage of your Dockerfile (RUN npm install sharp) or set NEXT_SHARP_PATH to point to an installed copy.

  8. Not setting resource limits. A Next.js build can consume 2GB+ of RAM, and even the runtime can spike under load. Without memory limits in your task definition or pod spec, one runaway process can starve other containers on the same host. Set NODE_OPTIONS=--max-old-space-size=1536 for a 2GB container to leave headroom for the OS.

  9. Log output is unstructured by default. console.log produces plain text. CloudWatch, Datadog, and other log aggregators parse structured JSON far more effectively. Use pino or a similar structured logger in Server Components and Route Handlers to get searchable, filterable logs with log levels, request IDs, and timing.

Alternatives

| Alternative | Use When | Don't Use When |
| --- | --- | --- |
| Vercel | You want zero ops and instant deploys | Company requires self-hosting or AWS-only |
| EC2 + PM2 (standalone) | Simple single-server deployment, low traffic | You need auto-scaling or container orchestration |
| AWS App Runner | You want Fargate simplicity without task definitions | You need fine-grained networking or sidecar containers |
| Coolify / Railway | You want a PaaS with container support | Enterprise compliance requires AWS-native services |
| Static export | Fully static site behind a CDN | You use SSR, ISR, middleware, or Server Actions |

FAQs

Does ISR work in containers?

Yes, ISR works in containers -- revalidate timers fire and pages regenerate on demand. The catch is that each container has its own cache. Without a shared cache backend (Redis, S3, or EFS), different containers serve different versions of the same ISR page. For most apps, the simplest fix is a Redis-backed custom cache handler. See the "ISR in Containers" deep dive above.

How do I handle preview deployments without Vercel?

Two common approaches: (1) Deploy a separate ECS service or Kubernetes namespace per PR branch, each with its own ALB target group and a subdomain like pr-123.preview.example.com. (2) Use a single staging environment with feature flags -- the PR toggles a flag, and the staging deploy shows the new code to testers. Approach 1 gives true isolation but costs more. Approach 2 is cheaper but requires a feature flag system.

What about image optimization with next/image?

next/image works in containers -- it uses the sharp library to resize and optimize images on the fly. The trade-off is that optimization uses container CPU. For high-traffic sites, offload image optimization to CloudFront with Lambda@Edge, or use a dedicated image proxy like Imgproxy. You can also pre-optimize images at build time using next/image with loader set to a custom function.

How do I roll back a bad deployment?

ECS: Every deployment creates a new task definition revision. To roll back, update the service to use the previous revision: aws ecs update-service --cluster my-cluster --service myapp --task-definition myapp:42 (where 42 is the previous revision number). EKS: kubectl rollout undo deployment/myapp. Both approaches are near-instant because the previous Docker image is already cached in ECR.

How do I achieve zero-downtime deploys?

ECS: Set minimumHealthyPercent: 100 and maximumPercent: 200 in the service deployment configuration. ECS starts new tasks alongside old ones, waits for health checks to pass, then drains old tasks. EKS: Set maxSurge: 1 and maxUnavailable: 0 in the Deployment rolling update strategy. Enable ALB deregistration delay (30s) on both platforms so in-flight requests complete before old containers stop.

What is the difference between ECS and EKS?

ECS (Elastic Container Service) is AWS-native container orchestration. You define tasks and services. Fargate mode is serverless -- no EC2 instances to manage. It is simpler to learn and operate. EKS (Elastic Kubernetes Service) runs standard Kubernetes. You get the full Kubernetes ecosystem (Helm, Istio, ArgoCD, etc.) and portability across clouds. EKS is more complex but more flexible. Choose ECS if you are AWS-only and want simplicity. Choose EKS if you need Kubernetes features, multi-cloud portability, or your team already knows Kubernetes.

Do I need Kubernetes?

No. For most Next.js deployments, ECS Fargate is simpler and sufficient. You get auto-scaling, rolling deployments, health checks, and ALB integration without learning Kubernetes. Choose EKS only if your organization already uses Kubernetes, you need its ecosystem (service mesh, GitOps, custom operators), or you want cloud portability.

How much does running containers on AWS cost compared to Vercel?

It depends on scale. A minimal ECS Fargate setup (2 tasks, 0.5 vCPU, 1GB RAM each) costs roughly $30-50/month. Add ALB ($20/month + data transfer) and ECR ($1-5/month). Total: ~$50-75/month for a small app. Vercel Pro is $20/month per seat but can spike with high traffic (bandwidth overages, function invocations). At high scale, containers are usually cheaper. At low scale, Vercel is cheaper and far less operational work.
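
As a back-of-envelope check on those numbers, the arithmetic is simple. The rates below are assumed us-east-1 on-demand Fargate prices (roughly $0.04048 per vCPU-hour and $0.004445 per GB-hour); verify against the current AWS pricing page before budgeting:

```typescript
// Rough monthly Fargate compute cost -- rates are assumed, not authoritative.
const VCPU_HOUR = 0.04048;
const GB_HOUR = 0.004445;
const HOURS_PER_MONTH = 730;

export function fargateMonthlyCost(tasks: number, vcpu: number, memoryGb: number): number {
  const perTaskHour = vcpu * VCPU_HOUR + memoryGb * GB_HOUR;
  return tasks * perTaskHour * HOURS_PER_MONTH;
}

// 2 tasks x 0.5 vCPU x 1GB lands around $36/month, before the ALB and ECR
```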

How do I handle WebSockets or long-lived connections?

ALB supports WebSocket connections natively. Set the idle timeout on the ALB to match your longest expected connection (default 60s, max 4000s). For ECS, ensure your task's security group allows the traffic. For EKS, the ALB Ingress Controller supports WebSocket by default. Note that sticky sessions may be needed if your WebSocket server maintains in-memory state.

Can I use middleware in a container deployment?

Yes, middleware runs inside the container's Node.js runtime on every request. On Vercel, middleware runs at the edge in a V8 isolate with a limited API surface. In a container, middleware runs in full Node.js, so you get access to all Node.js APIs. The trade-off is latency -- Vercel edge middleware runs closer to the user, while container middleware runs in the container's region. For global latency, put CloudFront in front of the ALB.

How do I set up a custom domain with HTTPS?

Request a free TLS certificate from AWS Certificate Manager (ACM) for your domain. Attach it to the ALB listener on port 443. Create a CNAME or alias DNS record pointing your domain to the ALB's DNS name. The ALB terminates TLS -- traffic between the ALB and your containers is HTTP on port 3000 inside the VPC, which is fine for most use cases.

What is the best way to handle database connections in containers?

Each container process opens its own database connection pool. With auto-scaling, you can easily exhaust database connections. Use a connection pooler like PgBouncer (for PostgreSQL) or RDS Proxy (managed by AWS). Set your pool size conservatively -- for a 2-container deployment with max_connections: 20 each, that is 40 connections total. Monitor connection count in CloudWatch and scale the database before you hit the limit.