# Building Cloud-Native Backend Systems
Cloud-native architecture represents a fundamental shift in how we build and deploy applications. This guide explores the essential patterns and practices for creating scalable, resilient backend systems in the cloud.
## What is Cloud-Native?
Cloud-native applications are designed to take full advantage of cloud computing models. They're built with containers, microservices, and managed services, enabling rapid development and deployment.
## Key Principles
- Containerization: Package applications in containers
- Microservices: Build loosely coupled services
- DevOps: Automate development and operations
- Continuous Delivery: Deploy frequently and reliably
- Observability: Monitor and debug effectively
## Containerization with Docker

### Multi-stage Dockerfile

```dockerfile
# Multi-stage build for a Node.js application
FROM node:18-alpine AS builder
WORKDIR /app

# Install all dependencies (dev dependencies are needed for the build step)
COPY package*.json ./
RUN npm ci

# Copy source code and build
COPY . .
RUN npm run build

# Drop dev dependencies before copying node_modules into the production image
RUN npm prune --omit=dev && npm cache clean --force

# Production stage
FROM node:18-alpine AS production

# Create a non-root user
RUN addgroup -g 1001 -S nodejs && adduser -S nodejs -u 1001 -G nodejs

WORKDIR /app

# Copy the built application and production dependencies
COPY --from=builder --chown=nodejs:nodejs /app/dist ./dist
COPY --from=builder --chown=nodejs:nodejs /app/node_modules ./node_modules
COPY --from=builder --chown=nodejs:nodejs /app/package.json ./package.json

USER nodejs
EXPOSE 3000
CMD ["npm", "start"]
```
### Docker Compose for Development

```yaml
# docker-compose.yml
version: '3.8'

services:
  app:
    build: .
    ports:
      - "3000:3000"
    environment:
      - NODE_ENV=development
      - DATABASE_URL=postgresql://user:password@postgres:5432/appdb
      - REDIS_URL=redis://redis:6379
    depends_on:
      - postgres
      - redis
    volumes:
      - .:/app
      - /app/node_modules

  postgres:
    image: postgres:15-alpine
    environment:
      - POSTGRES_DB=appdb
      - POSTGRES_USER=user
      - POSTGRES_PASSWORD=password
    volumes:
      - postgres_data:/var/lib/postgresql/data
    ports:
      - "5432:5432"

  redis:
    image: redis:7-alpine
    ports:
      - "6379:6379"
    volumes:
      - redis_data:/data

volumes:
  postgres_data:
  redis_data:
```
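The compose file injects all configuration through environment variables, in 12-factor style. On the application side, a small module can centralize those reads and fail fast when something required is missing; a sketch (`config.js` is a hypothetical module; variable names match the compose file, defaults are for local development):

```javascript
// config.js — read configuration from the environment (12-factor style)
function requireEnv(name, fallback) {
  const value = process.env[name] ?? fallback;
  if (value === undefined) {
    // Fail fast at startup rather than at first use
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

const config = {
  env: process.env.NODE_ENV || 'development',
  port: parseInt(process.env.PORT || '3000', 10),
  databaseUrl: requireEnv('DATABASE_URL', 'postgresql://user:password@localhost:5432/appdb'),
  redisUrl: requireEnv('REDIS_URL', 'redis://localhost:6379'),
};

module.exports = config;
```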
## Kubernetes Deployment

### Deployment Manifest

```yaml
# k8s/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-api
  labels:
    app: backend-api
spec:
  replicas: 3
  selector:
    matchLabels:
      app: backend-api
  template:
    metadata:
      labels:
        app: backend-api
    spec:
      containers:
        - name: backend-api
          image: backend-api:latest
          ports:
            - containerPort: 3000
          env:
            - name: NODE_ENV
              value: "production"
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: app-secrets
                  key: database-url
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          livenessProbe:
            httpGet:
              path: /health
              port: 3000
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 3000
            initialDelaySeconds: 5
            periodSeconds: 5
```
### Service and Ingress

```yaml
# k8s/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: backend-api-service
spec:
  selector:
    app: backend-api
  ports:
    - protocol: TCP
      port: 80
      targetPort: 3000
  type: ClusterIP
---
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: backend-api-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  tls:
    - hosts:
        - api.example.com
      secretName: api-tls
  rules:
    - host: api.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: backend-api-service
                port:
                  number: 80
```
## Infrastructure as Code

### Terraform Configuration

```hcl
# terraform/main.tf
provider "aws" {
  region = var.aws_region
}

# VPC
resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_hostnames = true
  enable_dns_support   = true

  tags = {
    Name = "${var.project_name}-vpc"
  }
}

# Internet Gateway
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "${var.project_name}-igw"
  }
}

# EKS Cluster
resource "aws_eks_cluster" "main" {
  name     = "${var.project_name}-cluster"
  role_arn = aws_iam_role.eks_cluster.arn
  version  = "1.28"

  vpc_config {
    subnet_ids = aws_subnet.private[*].id
  }

  depends_on = [
    aws_iam_role_policy_attachment.eks_cluster_AmazonEKSClusterPolicy,
  ]
}

# RDS Database
resource "aws_db_instance" "main" {
  identifier     = "${var.project_name}-db"
  engine         = "postgres"
  engine_version = "15.4"
  instance_class = "db.t3.micro"

  allocated_storage = 20
  storage_type      = "gp2"

  db_name  = var.database_name
  username = var.database_username
  password = var.database_password

  vpc_security_group_ids = [aws_security_group.rds.id]
  db_subnet_group_name   = aws_db_subnet_group.main.name

  backup_retention_period = 7
  backup_window           = "03:00-04:00"
  maintenance_window      = "sun:04:00-sun:05:00"

  skip_final_snapshot = true
}
```
### Helm Charts

```yaml
# helm/backend-api/values.yaml
replicaCount: 3

image:
  repository: backend-api
  tag: latest
  pullPolicy: IfNotPresent

service:
  type: ClusterIP
  port: 80
  targetPort: 3000

ingress:
  enabled: true
  className: "nginx"
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
  hosts:
    - host: api.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: api-tls
      hosts:
        - api.example.com

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80
```
## Observability

### Prometheus Monitoring

```yaml
# k8s/prometheus.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
    scrape_configs:
      - job_name: 'backend-api'
        static_configs:
          - targets: ['backend-api-service:80']
        metrics_path: /metrics
        scrape_interval: 5s
```
### Application Metrics

```javascript
// metrics.js
const promClient = require('prom-client');

// Create a registry and collect default Node.js process metrics
const register = new promClient.Registry();
promClient.collectDefaultMetrics({ register });

// Custom metrics
const httpRequestDuration = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7, 10]
});

const httpRequestTotal = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code']
});

const activeConnections = new promClient.Gauge({
  name: 'active_connections',
  help: 'Number of active connections'
});

register.registerMetric(httpRequestDuration);
register.registerMetric(httpRequestTotal);
register.registerMetric(activeConnections);

// Express middleware that records duration and count per request
const metricsMiddleware = (req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    const labels = {
      method: req.method,
      route: req.route?.path || req.path,
      status_code: res.statusCode
    };
    httpRequestDuration.observe(labels, duration);
    httpRequestTotal.inc(labels);
  });
  next();
};

// Handler for the metrics endpoint; mount it in your server setup:
//   app.get('/metrics', metricsHandler);
const metricsHandler = async (req, res) => {
  res.set('Content-Type', register.contentType);
  res.end(await register.metrics());
};

module.exports = { register, metricsMiddleware, metricsHandler, activeConnections };
```
### Distributed Tracing

```javascript
// tracing.js — must be loaded before any instrumented modules
const { NodeSDK } = require('@opentelemetry/sdk-node');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');
const { JaegerExporter } = require('@opentelemetry/exporter-jaeger');

const sdk = new NodeSDK({
  instrumentations: [getNodeAutoInstrumentations()],
  traceExporter: new JaegerExporter({
    endpoint: process.env.JAEGER_ENDPOINT || 'http://jaeger:14268/api/traces',
  }),
});

sdk.start();

// Custom span — in a route handler (app is your Express instance,
// created after tracing has been initialized)
const { trace, SpanStatusCode } = require('@opentelemetry/api');
const tracer = trace.getTracer('backend-api');

app.get('/api/users/:id', async (req, res) => {
  const span = tracer.startSpan('get-user');
  try {
    span.setAttributes({
      'user.id': req.params.id,
      'http.method': req.method,
      'http.url': req.url
    });
    const user = await getUserById(req.params.id);
    span.setStatus({ code: SpanStatusCode.OK });
    res.json(user);
  } catch (error) {
    span.setStatus({ code: SpanStatusCode.ERROR, message: error.message });
    span.recordException(error);
    res.status(500).json({ error: 'Internal server error' });
  } finally {
    span.end();
  }
});
```
## CI/CD Pipeline

### GitHub Actions

```yaml
# .github/workflows/deploy.yml
name: Deploy to Kubernetes

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Node.js
        uses: actions/setup-node@v3
        with:
          node-version: '18'
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run tests
        run: npm test
      - name: Run linting
        run: npm run lint

  build-and-deploy:
    needs: test
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v3
      - name: Set up Docker Buildx
        uses: docker/setup-buildx-action@v2
      - name: Log in to Container Registry
        uses: docker/login-action@v2
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v4
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          # the raw full-SHA tag is what the deploy step below references
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=sha,prefix={{branch}}-
            type=raw,value=${{ github.sha }}
      - name: Build and push Docker image
        uses: docker/build-push-action@v4
        with:
          context: .
          push: true
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
      - name: Deploy to Kubernetes
        uses: azure/k8s-deploy@v1
        with:
          manifests: |
            k8s/deployment.yaml
            k8s/service.yaml
            k8s/ingress.yaml
          images: |
            ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
          kubectl-version: 'latest'
```
## Security Best Practices

### Pod Security

PodSecurityPolicy was removed in Kubernetes 1.25, so on a current cluster (such as the EKS 1.28 cluster above) use the built-in Pod Security Admission controller plus a per-pod `securityContext` instead:

```yaml
# k8s/pod-security.yaml
# Enforce the "restricted" Pod Security Standard on the namespace
apiVersion: v1
kind: Namespace
metadata:
  name: backend
  labels:
    pod-security.kubernetes.io/enforce: restricted
    pod-security.kubernetes.io/warn: restricted
---
# Pod- and container-level securityContext in k8s/deployment.yaml
# (other Deployment fields omitted): non-root, no privilege escalation,
# all capabilities dropped
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend-api
spec:
  template:
    spec:
      securityContext:
        runAsNonRoot: true
        seccompProfile:
          type: RuntimeDefault
      containers:
        - name: backend-api
          securityContext:
            allowPrivilegeEscalation: false
            readOnlyRootFilesystem: true
            capabilities:
              drop: ["ALL"]
```
### Network Policies

```yaml
# k8s/network-policy.yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-api-netpol
spec:
  podSelector:
    matchLabels:
      app: backend-api
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: ingress-nginx
      ports:
        - protocol: TCP
          port: 3000
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              name: database
      ports:
        - protocol: TCP
          port: 5432
    # Allow DNS resolution
    - to: []
      ports:
        - protocol: TCP
          port: 53
        - protocol: UDP
          port: 53
```
## Cost Optimization

### Horizontal Pod Autoscaler

```yaml
# k8s/hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-api-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend-api
  minReplicas: 3
  maxReplicas: 20
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
```
### Vertical Pod Autoscaler

```yaml
# k8s/vpa.yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: backend-api-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend-api
  updatePolicy:
    updateMode: "Off"  # recommendations only
  resourcePolicy:
    containerPolicies:
      - containerName: backend-api
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 1000m
          memory: 1Gi
```

Note: avoid running VPA in `Auto` mode alongside an HPA that scales on the same CPU and memory metrics, as the two controllers fight over the same signals. `Off` mode still surfaces right-sizing recommendations you can apply to the requests and limits manually.
## Conclusion
Cloud-native backend development requires a shift in mindset and tooling. By embracing containers, Kubernetes, and modern DevOps practices, you can build systems that are scalable, resilient, and cost-effective.
Key takeaways:
- Start with containerization
- Use Kubernetes for orchestration
- Implement comprehensive observability
- Automate everything with CI/CD
- Focus on security from the start
- Optimize for cost and performance
Remember: Cloud-native is not just about technology—it's about building systems that can evolve and scale with your business needs.