1 Executive Summary

IntakeIQ is an AI-powered dental intake automation platform deployed on Amazon Web Services (AWS). The infrastructure is purpose-built for healthcare workloads, with HIPAA compliance as a first-class architectural requirement rather than an afterthought.

Why AWS

AWS provides the most comprehensive suite of HIPAA-eligible services in the cloud market. Every service used in this stack is covered under the AWS Business Associate Addendum (BAA), which is a legal prerequisite for processing Protected Health Information (PHI).

HIPAA Compliance Posture

| Requirement | Implementation | Status |
|---|---|---|
| Business Associate Agreement | AWS BAA signed via AWS Artifact | Active |
| Encryption at Rest | AES-256 via AWS KMS (all data stores) | Active |
| Encryption in Transit | TLS 1.3 enforced on ALB; S3 TLS-only policy | Active |
| Audit Logging | CloudWatch Logs with 7-year (2,557-day) retention | Active |
| Access Control | IAM roles with least privilege; security group chain | Active |
| Backup & Recovery | 35-day automated RDS snapshots; S3 versioning | Active |
| Network Isolation | Database in private subnets; no public access | Active |

Infrastructure at a Glance

| Attribute | Value |
|---|---|
| AWS Region | us-east-1 |
| Provisioning | CloudFormation (Infrastructure as Code) |
| Application Runtime | ECS Fargate (serverless containers) |
| Database | RDS PostgreSQL 16.4 |
| Monthly Cost (Year 1) | ~$30–35 |
| Monthly Cost (Year 2+) | ~$50–55 |
| Deploy Time | ~15–20 minutes |

2 Architecture Overview

The stack follows a three-tier architecture: load balancer, application, and database. All traffic enters through a single HTTPS endpoint on the Application Load Balancer, flows to a containerized Next.js application on ECS Fargate, and persists in an encrypted PostgreSQL database on RDS.

PATIENT BROWSER
      |
      | HTTPS (TLS 1.3)
      v
+-------------------------------+
|    Route 53 / DNS CNAME       |  Domain resolution
+-------------------------------+
      |
      v
+-------------------------------+
|  Application Load Balancer    |  Public Subnets 1 & 2
|  Port 443 (HTTPS)             |  SSL Policy: TLS13-1-2-2021-06
|  Port 80 → 301 Redirect       |  ACM Certificate (free)
+-------------------------------+
      |
      | Port 3000 (HTTP, internal)
      v
+-------------------------------+
|  ECS Fargate Task             |  Public Subnet (SG: ALB-only)
|  0.25 vCPU / 0.5 GB RAM       |
|  Next.js 14 + Prisma ORM      |
|  Docker (node:20-alpine)      |
+------+--------+-------+-------+
       |        |       |
       v        v       v
  +--------+ +-----+ +----------+
  |  RDS   | | S3  | |CloudWatch|
  |Postgres| |Cards| |   Logs   |
  +--------+ +-----+ +----------+
  Private    KMS      KMS Encrypted
  Subnets    Encrypted 7-yr Retention
  KMS Enc.

SECURITY GROUP CHAIN:
Internet → ALB-SG (443/80) → ECS-SG (3000, from ALB only) → RDS-SG (5432, from ECS only)

Data Flow

  1. Patient opens intake form in their browser (HTTPS via ALB).
  2. ALB terminates TLS and forwards the request to the ECS Fargate container on port 3000.
  3. The Next.js application processes the form, encrypts PHI fields using the KMS-backed encryption key, and writes to PostgreSQL via Prisma ORM.
  4. Insurance card images are uploaded directly to S3 with KMS server-side encryption.
  5. All application events are streamed to CloudWatch Logs with 7-year retention.
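The PHI field encryption in step 3 can be sketched as follows. This is an illustrative implementation only: it assumes AES-256-GCM with a per-field random IV, and the `encryptField`/`decryptField` names are hypothetical. In production the 32-byte key would come from KMS (`GenerateDataKey`) or the `PHI_ENCRYPTION_KEY` SSM parameter, never from source code.

```typescript
import { createCipheriv, createDecipheriv, randomBytes } from "node:crypto";

// Encrypt a single PHI field. Output packs IV + auth tag + ciphertext
// into one base64 string suitable for storing in a text column.
export function encryptField(plaintext: string, key: Buffer): string {
  const iv = randomBytes(12); // unique IV per field; never reuse with the same key
  const cipher = createCipheriv("aes-256-gcm", key, iv);
  const ct = Buffer.concat([cipher.update(plaintext, "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag(); // GCM integrity tag (16 bytes)
  return Buffer.concat([iv, tag, ct]).toString("base64");
}

// Reverse the packing and decrypt; throws if the payload was tampered with.
export function decryptField(payload: string, key: Buffer): string {
  const buf = Buffer.from(payload, "base64");
  const iv = buf.subarray(0, 12);
  const tag = buf.subarray(12, 28);
  const ct = buf.subarray(28);
  const decipher = createDecipheriv("aes-256-gcm", key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ct), decipher.final()]).toString("utf8");
}
```

GCM is used here (rather than CBC) because it authenticates the ciphertext, so a modified database row fails loudly at decrypt time instead of silently yielding garbage.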

3 AWS Services Used

| Service | Purpose | Why Needed | Monthly Cost |
|---|---|---|---|
| VPC | Isolated virtual network | Network boundary for HIPAA; isolates RDS in private subnets | Free |
| ECS Fargate | Serverless container runtime | Runs the IntakeIQ Docker container without managing servers | ~$9 |
| ECR | Docker image registry | Stores the application container image | ~$1 |
| RDS PostgreSQL | Managed relational database | Stores patient records, intake sessions, practice data | Free (Yr 1) / ~$13 |
| ALB | Application Load Balancer | HTTPS termination, TLS 1.3, health checks, HTTP→HTTPS redirect | ~$16 |
| ACM | SSL/TLS certificates | Free managed certificate for the application domain | Free |
| S3 | Object storage | Encrypted insurance card image storage with lifecycle management | ~$0.50 |
| KMS | Encryption key management | AES-256 customer-managed key for all PHI encryption (RDS, S3, Logs) | ~$1 |
| CloudWatch Logs | Centralized logging | HIPAA-required audit logging with 7-year retention, KMS encrypted | ~$2–3 |
| SSM Parameter Store | Secrets management | Securely stores DATABASE_URL, NEXTAUTH_SECRET, PHI_ENCRYPTION_KEY | Free |
| IAM | Identity & access management | Least-privilege roles for ECS execution, task, and RDS monitoring | Free |
| CloudFormation | Infrastructure as Code | Reproducible, auditable stack provisioning and teardown | Free |

4 Cost Breakdown

| Period | Monthly Cost | Notes |
|---|---|---|
| Year 1 (Free Tier) | ~$30–35/mo | RDS free for 12 months |
| Year 2+ | ~$50–55/mo | RDS db.t4g.micro adds ~$13 |

Year 1 Breakdown (New AWS Account)

| Service | Description | Monthly Cost |
|---|---|---|
| ECS Fargate | 0.25 vCPU + 0.5 GB RAM, 1 task 24/7 | ~$9 |
| RDS PostgreSQL | db.t4g.micro, 20 GB gp3 | Free |
| ALB | Application Load Balancer + LCU hours | ~$16 |
| S3 | Insurance card storage (low volume) | ~$0.50 |
| KMS | 1 customer-managed key | ~$1 |
| CloudWatch | Log ingestion + storage | ~$2 |
| ECR | Docker image storage | ~$1 |
| Total | | ~$30–35 |

Cost Optimizations Applied

| Optimization | Savings | Trade-off |
|---|---|---|
| NAT Gateway removed | $32/month | Fargate runs in public subnets with security groups restricting access to the ALB only. Database remains in private subnets. No material security reduction. |
| Container Insights disabled | ~$3/month | CloudWatch Logs still active for application logs. Advanced container metrics (CPU/memory dashboards) unavailable until enabled. |
| Single Fargate task | ~$9/month | No redundancy. Acceptable for early-stage; scale to 2 tasks when needed. |
| Single-AZ RDS | ~$13/month | No Multi-AZ failover. 35-day backup retention mitigates data loss risk. |

Scaling Cost Projections

| Growth Stage | Upgrade | Cost Impact |
|---|---|---|
| 10+ dental practices | RDS db.t4g.small (2 vCPU, 2 GB) | +$15/month |
| Stricter compliance needs | Add NAT Gateway, move Fargate to private subnets | +$32/month |
| High traffic / uptime SLA | Add second Fargate task | +$9/month |
| 1 year of stable usage | RDS Reserved Instance (1-year term) | −30% on RDS |
| 50+ practices | RDS db.t4g.medium + Multi-AZ | ~$80/month total for RDS |

Budget Headroom: With a $150–200/month budget ceiling, the current stack leaves significant room for all projected scaling upgrades before exceeding budget.

5 Network Architecture

VPC Layout

| Subnet | CIDR | AZ | Type | Resources |
|---|---|---|---|---|
| Public Subnet 1 | 10.0.1.0/24 | us-east-1a | Public | ALB, ECS Fargate |
| Public Subnet 2 | 10.0.2.0/24 | us-east-1b | Public | ALB |
| Private Subnet 1 | 10.0.10.0/24 | us-east-1a | Private | RDS |
| Private Subnet 2 | 10.0.11.0/24 | us-east-1b | Private | RDS (subnet group requirement) |
VPC: 10.0.0.0/16
+------------------------------------------------------+
|                                                      |
|  PUBLIC SUBNETS (Internet Gateway attached)          |
|  +------------------+  +------------------+          |
|  | 10.0.1.0/24      |  | 10.0.2.0/24      |          |
|  | us-east-1a       |  | us-east-1b       |          |
|  |                  |  |                  |          |
|  | ALB Endpoint     |  | ALB Endpoint     |          |
|  | ECS Fargate      |  |                  |          |
|  +------------------+  +------------------+          |
|                                                      |
|  PRIVATE SUBNETS (No internet route)                 |
|  +------------------+  +------------------+          |
|  | 10.0.10.0/24     |  | 10.0.11.0/24     |          |
|  | us-east-1a       |  | us-east-1b       |          |
|  |                  |  |                  |          |
|  | RDS PostgreSQL   |  | (RDS failover)   |          |
|  +------------------+  +------------------+          |
|                                                      |
+------------------------------------------------------+

Security Groups

| Security Group | Inbound Rule | Source | Protects |
|---|---|---|---|
| intakeiq-alb-sg | TCP 443, TCP 80 | 0.0.0.0/0 (internet) | Application Load Balancer |
| intakeiq-ecs-sg | TCP 3000 | intakeiq-alb-sg only | ECS Fargate tasks |
| intakeiq-rds-sg | TCP 5432 | intakeiq-ecs-sg only | RDS PostgreSQL |
Why is Fargate in public subnets? This is a deliberate cost optimization. A NAT Gateway costs $32/month just to exist. By placing Fargate in public subnets with AssignPublicIp: ENABLED, the container can pull images from ECR and communicate with AWS services without a NAT Gateway. Security is maintained because the ECS security group only accepts inbound traffic from the ALB security group on port 3000. No other traffic can reach the container.
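In CloudFormation, chaining is expressed by referencing the upstream security group as the ingress source instead of a CIDR range. The fragment below is a sketch; the logical resource names (ECSSecurityGroup, ALBSecurityGroup, VPC) are illustrative and may differ from the actual template.

```yaml
ECSSecurityGroup:
  Type: AWS::EC2::SecurityGroup
  Properties:
    GroupDescription: Allow app traffic from the ALB only
    VpcId: !Ref VPC
    SecurityGroupIngress:
      - IpProtocol: tcp
        FromPort: 3000
        ToPort: 3000
        SourceSecurityGroupId: !Ref ALBSecurityGroup  # the ALB SG, not a CIDR

RDSSecurityGroup:
  Type: AWS::EC2::SecurityGroup
  Properties:
    GroupDescription: Allow PostgreSQL from ECS tasks only
    VpcId: !Ref VPC
    SecurityGroupIngress:
      - IpProtocol: tcp
        FromPort: 5432
        ToPort: 5432
        SourceSecurityGroupId: !Ref ECSSecurityGroup
```

Because the source is a security group rather than an IP range, the rule keeps working even as Fargate tasks are replaced and receive new IPs.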

6 Security Model

Defense in Depth

| Layer | Control | Details |
|---|---|---|
| Edge | TLS 1.3 | ALB SSL policy ELBSecurityPolicy-TLS13-1-2-2021-06; HTTP auto-redirects to HTTPS (301) |
| Network | Security Group Chain | ALB → ECS → RDS; each layer accepts traffic only from the previous layer, so no lateral movement is possible |
| Network | Private Subnets | RDS has no internet route and cannot be accessed from outside the VPC |
| Network | Invalid Header Rejection | ALB drops requests with malformed HTTP headers |
| Application | Non-root Container | Docker runs as the nextjs user (UID 1001), not root |
| Application | PHI Field Encryption | Application-level AES-256 encryption on PHI fields using a KMS-backed key |
| Data | RDS Encryption at Rest | AES-256 via customer-managed KMS key |
| Data | S3 Encryption | SSE-KMS with BucketKey enabled; TLS-only access policy |
| Data | Log Encryption | CloudWatch Logs encrypted with KMS |
| Secrets | SSM Parameter Store | DATABASE_URL, NEXTAUTH_SECRET, PHI_ENCRYPTION_KEY stored as SSM parameters and injected at runtime |
| Access | IAM Least Privilege | ECS Task Role has only S3, KMS, and CloudWatch permissions; Execution Role has only ECR pull and SSM read |

KMS Key Policy

The customer-managed KMS key (alias/intakeiq-phi-key) is used for:

  • RDS storage encryption
  • S3 object encryption (insurance cards)
  • CloudWatch log group encryption
  • Application-level PHI field encryption (Encrypt/Decrypt/GenerateDataKey)

Key policy grants kms:* to the root account and kms:Decrypt, kms:Encrypt, kms:GenerateDataKey to the ECS Task Role only.
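A key policy with that shape looks roughly like the following. This is a sketch, not the deployed policy: the account ID and role name are placeholders, and statement Sids are illustrative.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "RootAccountAdmin",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::<account-id>:root" },
      "Action": "kms:*",
      "Resource": "*"
    },
    {
      "Sid": "ECSTaskRoleUseOnly",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::<account-id>:role/<ecs-task-role>" },
      "Action": ["kms:Encrypt", "kms:Decrypt", "kms:GenerateDataKey"],
      "Resource": "*"
    }
  ]
}
```

The root-account statement is required so the key never becomes unmanageable; the second statement is what actually scopes PHI encryption to the application.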

No Public Database Access The RDS instance has no public IP and resides in private subnets with no internet gateway route. The only way to access the database is through the ECS container via aws ecs execute-command. Direct connections from developer laptops are intentionally blocked.

7 Database

| Attribute | Value |
|---|---|
| Engine | PostgreSQL 16.4 |
| Instance Class | db.t4g.micro (2 vCPU, 1 GB RAM); Free Tier eligible for 12 months |
| Instance Identifier | intakeiq-db |
| Storage | 20 GB gp3 (SSD) with auto-scaling up to 100 GB |
| Encryption | AES-256 via customer-managed KMS key |
| Multi-AZ | Disabled (cost optimization; enable at scale) |
| Backup Retention | 35 days (automated daily snapshots) |
| Backup Window | 03:00–04:00 UTC |
| Maintenance Window | Sunday 04:00–05:00 UTC |
| Deletion Protection | Enabled |
| Deletion Policy | Snapshot (final snapshot created on stack deletion) |
| Performance Insights | Enabled (7-day retention, free tier) |
| Enhanced Monitoring | Enabled (60-second granularity) |
| Network | Private subnets only; ECS-only security group access on port 5432 |
| Master Username | intakeiq |
| Database Name | intakeiq |
| ORM | Prisma (schema-driven migrations) |

Connection String

Stored in SSM Parameter Store at /intakeiq/DATABASE_URL:

postgresql://intakeiq:<password>@<rds-endpoint>:5432/intakeiq
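One practical gotcha when assembling this string: a password containing URL-reserved characters (@, /, :, #) must be percent-encoded, or the connection URL parses incorrectly. A small sketch (the `buildDatabaseUrl` helper is hypothetical, not part of the codebase):

```typescript
// Build a PostgreSQL connection URL with the password safely percent-encoded.
export function buildDatabaseUrl(
  user: string,
  password: string,
  host: string,
  db: string
): string {
  return `postgresql://${user}:${encodeURIComponent(password)}@${host}:5432/${db}`;
}
```

For example, the password `p@ss/w0rd` is stored in the URL as `p%40ss%2Fw0rd`.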

Storage Auto-Scaling

RDS is configured with MaxAllocatedStorage: 100. When the database reaches 90% of its allocated storage, AWS will automatically increase it in 10 GB increments up to 100 GB, with no downtime. gp3 storage provides 3,000 IOPS baseline regardless of volume size.

8 Container Platform

ECS Fargate Configuration

| Attribute | Value |
|---|---|
| Cluster | intakeiq-cluster |
| Service | intakeiq-service |
| Task CPU | 256 units (0.25 vCPU) |
| Task Memory | 512 MB |
| Desired Count | 1 |
| Container Port | 3000 |
| Container Insights | Disabled (cost optimization) |
| Public IP | Enabled (replaces NAT Gateway) |

Docker Multi-Stage Build

The Dockerfile uses a three-stage build process to minimize image size and attack surface:

| Stage | Purpose | Base |
|---|---|---|
| deps | Install production dependencies (npm ci --omit=dev) | node:20-alpine |
| builder | Generate Prisma client, build Next.js (npm run build) | node:20-alpine |
| runner | Production runtime with standalone output only | node:20-alpine |

The final image contains only the Next.js standalone output, static assets, and Prisma client. No dev dependencies, no source code, no build tooling.

Security

  • Non-root user: nextjs (UID 1001, GID 1001)
  • Alpine Linux base (minimal attack surface)
  • Next.js telemetry disabled (NEXT_TELEMETRY_DISABLED=1)

Health Check

# Container-level health check (ECS Task Definition)
Command: CMD-SHELL wget --no-verbose --tries=1 --spider http://localhost:3000/ || exit 1
Interval: 30s | Timeout: 5s | Retries: 3 | Start Period: 60s

# ALB Target Group health check
Path: /
Interval: 30s | Timeout: 5s | Healthy threshold: 2 | Unhealthy threshold: 3

Zero-Downtime Deployments

The ECS Service is configured with:

  • MinimumHealthyPercent: 100 — the existing task stays running
  • MaximumPercent: 200 — a new task starts alongside the old one

During deployment, ECS launches a new task with the updated image, waits for it to pass health checks, registers it with the ALB target group, then drains and stops the old task. Zero requests are dropped.

9 Storage

S3 Bucket — Insurance Card Uploads

| Attribute | Value |
|---|---|
| Bucket Name | intakeiq-insurance-cards-{AccountId} |
| Encryption | SSE-KMS with customer-managed key; BucketKey enabled for cost efficiency |
| Versioning | Enabled (required for HIPAA; enables recovery of overwritten/deleted files) |
| Public Access | All public access blocked (4-layer block) |
| TLS Requirement | Bucket policy denies all non-HTTPS requests |
| Lifecycle | Transition to S3 Standard-IA after 90 days (saves ~40% on storage) |

Public Access Block

BlockPublicAcls:       true
BlockPublicPolicy:     true
IgnorePublicAcls:      true
RestrictPublicBuckets: true

TLS-Only Bucket Policy

A bucket policy explicitly denies all S3 operations when aws:SecureTransport is false. This ensures insurance card images are never transmitted in plaintext, even within the AWS network.
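A deny-on-insecure-transport policy typically takes the following shape. This is a sketch using the bucket name pattern from the table above with a placeholder account ID, not a dump of the deployed policy.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyInsecureTransport",
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:*",
      "Resource": [
        "arn:aws:s3:::intakeiq-insurance-cards-<account-id>",
        "arn:aws:s3:::intakeiq-insurance-cards-<account-id>/*"
      ],
      "Condition": { "Bool": { "aws:SecureTransport": "false" } }
    }
  ]
}
```

Note the explicit Deny: it overrides any Allow elsewhere, so even a principal with broad S3 permissions cannot fetch objects over plain HTTP.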

Access Control

Only the ECS Task Role has S3 permissions, limited to:

  • s3:PutObject, s3:GetObject, s3:DeleteObject on bucket objects
  • s3:ListBucket on the bucket itself

No other IAM principal (including the deployment user) has default access to patient insurance cards.
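The task-role policy scoping those permissions looks roughly like this. A sketch with a placeholder account ID; note that object actions attach to `/*` (objects) while s3:ListBucket attaches to the bucket ARN itself, a distinction that commonly trips people up.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
      "Resource": "arn:aws:s3:::intakeiq-insurance-cards-<account-id>/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::intakeiq-insurance-cards-<account-id>"
    }
  ]
}
```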

10 Logging & Monitoring

CloudWatch Logs

| Attribute | Value |
|---|---|
| Log Group | /ecs/intakeiq |
| Retention | 2,557 days (7 years; HIPAA minimum) |
| Encryption | KMS-encrypted with customer-managed key |
| Stream Prefix | ecs/ |
| Log Driver | awslogs (built into Fargate) |
| Cost | ~$0.50/GB ingested + ~$0.03/GB-month stored |
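Using those per-GB rates, a rough monthly estimate can be computed directly. The helper below is illustrative; the rates are the approximate figures quoted above, not an official price sheet.

```typescript
// Rough monthly CloudWatch Logs cost: ingestion charge on new data
// plus storage charge on the cumulative retained volume.
export function estimateLogCost(ingestedGb: number, storedGb: number): number {
  const INGEST_PER_GB = 0.5; // ~$0.50/GB ingested (approximate)
  const STORE_PER_GB = 0.03; // ~$0.03/GB-month stored (approximate)
  return ingestedGb * INGEST_PER_GB + storedGb * STORE_PER_GB;
}
```

For example, 2 GB ingested in a month with 24 GB retained comes to about $1.72, which is consistent with the ~$2–3 CloudWatch line item in the cost breakdown. Note that with 7-year retention the stored volume (and the storage charge) grows steadily over time.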

How to Tail Logs

# Live tail (last 30 minutes)
aws logs tail /ecs/intakeiq --since 30m --region us-east-1

# Follow mode (real-time streaming)
aws logs tail /ecs/intakeiq --follow --region us-east-1

# Search for errors (GNU date shown; on macOS use: date -v-1H +%s000)
aws logs filter-log-events \
  --log-group-name /ecs/intakeiq \
  --filter-pattern "ERROR" \
  --start-time $(date -d '1 hour ago' +%s000) \
  --region us-east-1

Cost Monitoring

AWS Budgets should be configured to alert before spending exceeds expectations:

  1. Navigate to AWS Console → Billing & Cost Management → Budgets.
  2. Click Create a budget and select Cost budget (Recommended).
  3. Set the monthly budget amount (recommended: $75).
  4. Configure email alerts at the 80% and 100% thresholds.
  5. Enable Cost Explorer for daily per-service breakdowns.

RDS Monitoring

  • Performance Insights — enabled with 7-day free retention. View query performance, wait events, and top SQL.
  • Enhanced Monitoring — 60-second granularity OS metrics (CPU, memory, disk I/O, network).

11 Deployment Guide

Prerequisites

  • AWS account with credit card on file
  • AWS CLI installed and configured (brew install awscli && aws configure)
  • Docker Desktop installed and running
  • Domain name ready (e.g., app.intakeiq.com)

Step-by-Step Deployment

Sign the AWS BAA

Navigate to AWS Console → AWS Artifact → Agreements. Accept the AWS Business Associate Addendum (BAA). This is free and legally required before any PHI touches AWS infrastructure.

Switch Prisma to PostgreSQL

Edit app/prisma/schema.prisma:

// Change provider from sqlite to postgresql
datasource db {
  provider = "postgresql"
  url      = env("DATABASE_URL")
}

Then install dependencies:

cd app && npm install

Run the Deploy Script

chmod +x deployment/deploy.sh
cd deployment
./deploy.sh

The script will prompt for:

  • Database password — minimum 12 characters, store securely
  • Domain name — e.g., app.intakeiq.com

It will create an ECR repository, build and push the Docker image, and deploy the full CloudFormation stack (~15–20 minutes).

Validate the SSL Certificate

After CloudFormation completes, go to AWS Console → Certificate Manager. Find the pending certificate, copy the CNAME record, and add it to your DNS provider. Validation takes 5–30 minutes.

Point Your Domain

# Get the ALB DNS name
aws cloudformation describe-stacks --stack-name intakeiq-stack \
  --query 'Stacks[0].Outputs[?OutputKey==`ALBDNSName`].OutputValue' \
  --output text

Create a CNAME record in your DNS provider pointing your subdomain to this ALB address.

Update Secrets

# Generate real secrets
openssl rand -base64 32   # NEXTAUTH_SECRET
openssl rand -hex 32      # PHI_ENCRYPTION_KEY

Update the values in AWS Console → Systems Manager → Parameter Store:

  • /intakeiq/NEXTAUTH_SECRET
  • /intakeiq/PHI_ENCRYPTION_KEY

Restart ECS to pick up new values:

aws ecs update-service \
  --cluster intakeiq-cluster \
  --service intakeiq-service \
  --force-new-deployment \
  --region us-east-1

Run Database Migrations

# Enable ECS Exec (one-time)
aws ecs update-service \
  --cluster intakeiq-cluster \
  --service intakeiq-service \
  --enable-execute-command \
  --region us-east-1

# Get the running task ID
TASK_ID=$(aws ecs list-tasks \
  --cluster intakeiq-cluster \
  --service-name intakeiq-service \
  --query 'taskArns[0]' \
  --output text \
  --region us-east-1)

# Open an interactive shell in the container (ECS Exec)
aws ecs execute-command \
  --cluster intakeiq-cluster \
  --task $TASK_ID \
  --container intakeiq \
  --interactive \
  --command "/bin/sh" \
  --region us-east-1

# Inside the container:
npx prisma migrate deploy

12 Operations Runbook

Deploy an Application Update

# 1. Authenticate with ECR
AWS_ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
ECR_URI="${AWS_ACCOUNT_ID}.dkr.ecr.us-east-1.amazonaws.com/intakeiq"

aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin \
  ${AWS_ACCOUNT_ID}.dkr.ecr.us-east-1.amazonaws.com

# 2. Build and push
docker build -t intakeiq ../app
docker tag intakeiq:latest ${ECR_URI}:latest
docker push ${ECR_URI}:latest

# 3. Force new deployment (zero-downtime)
aws ecs update-service \
  --cluster intakeiq-cluster \
  --service intakeiq-service \
  --force-new-deployment \
  --region us-east-1

Shell into the Running Container (ECS Exec)

TASK_ID=$(aws ecs list-tasks \
  --cluster intakeiq-cluster \
  --service-name intakeiq-service \
  --query 'taskArns[0]' --output text --region us-east-1)

aws ecs execute-command \
  --cluster intakeiq-cluster \
  --task $TASK_ID \
  --container intakeiq \
  --interactive \
  --command "/bin/sh" \
  --region us-east-1

Run Database Migrations

# Open a shell in the container first (see above), then:
npx prisma migrate deploy

Restart the Service

aws ecs update-service \
  --cluster intakeiq-cluster \
  --service intakeiq-service \
  --force-new-deployment \
  --region us-east-1

Check Service Status

# Service overview
aws ecs describe-services \
  --cluster intakeiq-cluster \
  --services intakeiq-service \
  --query 'services[0].{Status:status,Running:runningCount,Desired:desiredCount,Deployments:deployments[*].{Status:status,Running:runningCount,Desired:desiredCount}}' \
  --region us-east-1

# Recent logs
aws logs tail /ecs/intakeiq --since 30m --region us-east-1

# CloudFormation stack outputs
aws cloudformation describe-stacks \
  --stack-name intakeiq-stack \
  --query 'Stacks[0].Outputs' \
  --region us-east-1

Update an SSM Secret

aws ssm put-parameter \
  --name "/intakeiq/NEXTAUTH_SECRET" \
  --value "your-new-secret-value" \
  --type SecureString \
  --overwrite \
  --region us-east-1

# Restart to pick up new value
aws ecs update-service \
  --cluster intakeiq-cluster \
  --service intakeiq-service \
  --force-new-deployment \
  --region us-east-1

13 Disaster Recovery

| Component | Protection Mechanism | RPO | RTO |
|---|---|---|---|
| Database (RDS) | Automated daily snapshots with 35-day retention; DeletionPolicy: Snapshot; DeletionProtection: true | 24 hours (last snapshot) | ~15 minutes (restore from snapshot) |
| File Storage (S3) | Versioning enabled; can recover overwritten or deleted objects | 0 (versioned) | Immediate |
| Application | Docker image stored in ECR; source code in Git; ECS auto-recovers failed tasks | 0 (image in registry) | ~2 minutes (new task launch) |
| Infrastructure | CloudFormation template (Infrastructure as Code); entire stack rebuildable from YAML | 0 (template in Git) | ~20 minutes (full redeploy) |
| Secrets | SSM Parameter Store (managed by AWS; regionally durable) | 0 | Immediate |
| Logs | CloudWatch with 7-year retention and KMS encryption | 0 | Immediate |

Recovery Scenarios

Scenario: Database corruption or accidental deletion

# List available snapshots
aws rds describe-db-snapshots \
  --db-instance-identifier intakeiq-db \
  --query 'DBSnapshots[*].{ID:DBSnapshotIdentifier,Time:SnapshotCreateTime,Status:Status}' \
  --region us-east-1

# Restore from a snapshot (creates a new RDS instance)
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier intakeiq-db-restored \
  --db-snapshot-identifier <snapshot-id> \
  --region us-east-1

Scenario: Complete infrastructure loss

  1. Check out the CloudFormation template from Git
  2. Re-run the deploy script (deployment/deploy.sh)
  3. Restore the RDS database from the most recent snapshot
  4. Update SSM parameters with the correct secrets
  5. Point the domain CNAME to the new ALB

RDS Deletion Safeguards: Two layers protect against accidental database deletion. DeletionProtection: true prevents the instance from being deleted via the API or Console, and DeletionPolicy: Snapshot ensures a final snapshot is created if the CloudFormation stack is deleted.

14 Scaling Guide

When and How to Scale

| Trigger | What to Upgrade | How | Cost Impact |
|---|---|---|---|
| Database CPU consistently >70% | RDS instance class | Modify instance to db.t4g.small (2 vCPU, 2 GB) via Console or CLI | +$15/month |
| Need high availability / uptime SLA | ECS desired count | Set DesiredCount: 2 on the ECS service | +$9/month |
| Compliance audit requires private Fargate | Add NAT Gateway | Add a NAT Gateway to CloudFormation and move Fargate to private subnets | +$32/month |
| Stable 1+ year usage pattern | RDS Reserved Instance | Purchase a 1-year RI via Console (same instance class) | −30% on RDS |
| 50+ practices, high concurrency | Larger RDS + Multi-AZ | Upgrade to db.t4g.medium with Multi-AZ enabled | ~$80/month for RDS |
| Application needs more CPU/memory | Fargate task size | Update the Task Definition to 0.5 vCPU / 1 GB | ~$18/month (from ~$9) |

Scaling ECS Tasks (Quick)

aws ecs update-service \
  --cluster intakeiq-cluster \
  --service intakeiq-service \
  --desired-count 2 \
  --region us-east-1

Scaling RDS (Quick)

# Modify instance class; with --apply-immediately this causes a brief
# outage (typically a few minutes) now, rather than waiting for the maintenance window
aws rds modify-db-instance \
  --db-instance-identifier intakeiq-db \
  --db-instance-class db.t4g.small \
  --apply-immediately \
  --region us-east-1

Storage scales automatically: RDS is configured with MaxAllocatedStorage: 100, so storage auto-expands from the initial 20 GB with zero downtime when utilization exceeds 90%. S3 has no storage limits.

15 Troubleshooting

CloudFormation stack creation failed

Symptom: Stack status is CREATE_FAILED or ROLLBACK_COMPLETE.
# Find the first failure event
aws cloudformation describe-stack-events \
  --stack-name intakeiq-stack \
  --region us-east-1 \
  --query 'StackEvents[?ResourceStatus==`CREATE_FAILED`].{Resource:LogicalResourceId,Reason:ResourceStatusReason}'

Common causes: insufficient IAM permissions, invalid DB password (too short), region limits, or service quotas.

ECS tasks keep restarting

Symptom: Running count oscillates; tasks stop shortly after starting.
# Check recent application logs
aws logs tail /ecs/intakeiq --since 30m --region us-east-1

# Check stopped task exit reason
aws ecs describe-tasks \
  --cluster intakeiq-cluster \
  --tasks $(aws ecs list-tasks --cluster intakeiq-cluster --desired-status STOPPED --query 'taskArns[0]' --output text --region us-east-1) \
  --region us-east-1 \
  --query 'tasks[0].{StopCode:stopCode,StoppedReason:stoppedReason,ExitCode:containers[0].exitCode}'

Common causes: missing/invalid DATABASE_URL, Prisma migration not run, secrets still set to CHANGE_ME_AFTER_DEPLOY, container health check failing.

SSL certificate stuck in "Pending validation"

Symptom: Certificate status remains Pending validation for more than 30 minutes.

Fix: Verify that the CNAME record in your DNS provider matches exactly what ACM shows (including the trailing dot, if your provider requires it). ACM issues a randomly named validation record, so copy the name and value from the ACM console rather than typing them by hand. DNS propagation can take up to 30 minutes. Use dig to verify:

dig CNAME <validation-record-name-from-acm>

Cannot connect to the database

Symptom: Connection refused or timeout when trying to reach RDS.

By design, RDS is in private subnets with no public access. You cannot connect from your laptop. To access the database, use ECS Exec to shell into the running container, then use psql or Prisma from inside.

Application returns 502 Bad Gateway

Common causes:

  • ECS task hasn't finished starting (wait 60s for the start period)
  • Application crashed — check CloudWatch logs
  • Health check path (/) is returning non-200 status
  • Security group misconfigured — verify ECS SG allows port 3000 from ALB SG

Out of memory errors

The container has 512 MB RAM. If the application needs more:

# Update the task definition to 1 GB
# Edit cloudformation.yml: change Memory from '512' to '1024'
# Then redeploy the CloudFormation stack
aws cloudformation update-stack \
  --stack-name intakeiq-stack \
  --template-body file://cloudformation.yml \
  --capabilities CAPABILITY_NAMED_IAM \
  --parameters ... \
  --region us-east-1

Quick Diagnostic Checklist

| Check | Command |
|---|---|
| Is the task running? | aws ecs list-tasks --cluster intakeiq-cluster --region us-east-1 |
| What do the logs say? | aws logs tail /ecs/intakeiq --since 30m --region us-east-1 |
| Is the ALB healthy? | aws elbv2 describe-target-health --target-group-arn <tg-arn> |
| Stack outputs? | aws cloudformation describe-stacks --stack-name intakeiq-stack --query 'Stacks[0].Outputs' |
| RDS status? | aws rds describe-db-instances --db-instance-identifier intakeiq-db --query 'DBInstances[0].DBInstanceStatus' |