
DevOps Fundamentals: CI/CD, Docker, Kubernetes, Automation, Monitoring & Infrastructure as Code


schutzgeist


This post is a comprehensive introduction to the fundamentals of DevOps, covering CI/CD, Docker, Kubernetes, automation, monitoring, and Infrastructure as Code, with practical examples.

In a Nutshell

DevOps is a culture and methodology that brings software development (Dev) and IT operations (Ops) together in order to automate and accelerate the software delivery pipeline.

Compact Technical Description

DevOps is an approach that bridges the gap between development and operations through automation, collaboration, and continuous improvement.

Core components:

Continuous Integration/Continuous Deployment (CI/CD)

  • Version Control: Git, GitHub, GitLab, Bitbucket
  • Build Automation: Jenkins, GitHub Actions, GitLab CI
  • Testing: Unit Tests, Integration Tests, E2E Tests
  • Deployment: Automated Rollouts, Blue/Green, Canary
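The deployment strategies above (blue/green, canary) all hinge on an automated promotion decision. A minimal, illustrative sketch of such a check; the function name and thresholds are made up for demonstration and not taken from any specific tool:

```python
def promote_canary(canary_errors: int, canary_requests: int,
                   baseline_error_rate: float, tolerance: float = 0.01) -> bool:
    """Promote the canary only if its error rate does not exceed the
    stable baseline by more than `tolerance` (illustrative values)."""
    if canary_requests == 0:
        return False  # no traffic observed yet, so do not promote
    canary_error_rate = canary_errors / canary_requests
    return canary_error_rate <= baseline_error_rate + tolerance

# e.g. 3 errors in 1000 requests vs. a 0.5% baseline -> promote
print(promote_canary(3, 1000, 0.005))  # True
```

Real rollout controllers (e.g. Argo Rollouts, Flagger) apply the same idea against live metrics before shifting more traffic.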

Containerization

  • Docker: container platform for application isolation
  • Docker Compose: multi-container applications
  • Container Registry: Docker Hub, Harbor, AWS ECR
  • Image Optimization: multi-stage builds, layer caching

Orchestration

  • Kubernetes: container orchestration platform
  • Core objects: Pods, Deployments, Services, Ingress
  • Configuration: ConfigMaps, Secrets, Helm charts
  • Scaling: Horizontal Pod Autoscaling, Cluster Autoscaling
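The Horizontal Pod Autoscaler mentioned above works from a simple control formula, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), clamped between minReplicas and maxReplicas. A small sketch of that calculation:

```python
import math

def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float,
                     min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Kubernetes HPA scaling formula, clamped to the configured bounds."""
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    return max(min_replicas, min(max_replicas, desired))

# 3 pods at 90% CPU with a 70% target -> ceil(3 * 90 / 70) = 4 replicas
print(desired_replicas(3, 90, 70, min_replicas=3, max_replicas=10))  # 4
```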

Infrastructure as Code (IaC)

  • Terraform: Multi-Cloud Infrastructure Provisioning
  • Ansible: Configuration Management
  • CloudFormation: AWS-native IaC
  • Pulumi: infrastructure defined in general-purpose programming languages

Monitoring & Observability

  • Metrics: Prometheus, Grafana, InfluxDB
  • Logging: ELK Stack, Fluentd, Loki
  • Tracing: Jaeger, Zipkin, OpenTelemetry
  • APM: Application Performance Monitoring
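Latency metrics in systems like Prometheus are usually reported as percentiles (p95, p99). The idea can be sketched with a simplified nearest-rank percentile; real metric backends use histogram buckets and interpolation instead:

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

latencies_ms = [12, 15, 11, 250, 14, 13, 16, 12, 11, 18]
print(percentile(latencies_ms, 95))  # 250: one slow outlier dominates p95
```

This is why percentiles, rather than averages, are the standard way to track tail latency.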

Exam-Relevant Key Points

  • DevOps: culture and methodology spanning software development and operations
  • CI/CD: Continuous Integration and Continuous Deployment
  • Docker: container platform for application isolation
  • Kubernetes: container orchestration platform
  • Infrastructure as Code: automated infrastructure management
  • Monitoring: observing systems and applications
  • Automation: automating recurring tasks
  • GitOps: Git-based operations workflows
  • IHK-relevant: modern DevOps practices and tools

Core Components

  1. Version Control: Git workflows, branching strategies
  2. CI/CD Pipeline: build, test, deploy, monitor
  3. Containerization: Docker, container images, registries
  4. Orchestration: Kubernetes, services, scaling
  5. IaC: Terraform, Ansible, configuration management
  6. Monitoring: metrics, logging, tracing
  7. Security: scanning, compliance, secret management
  8. Collaboration: team workflows, communication

Practical Examples

1. CI/CD Pipeline with GitHub Actions

# .github/workflows/ci-cd.yml
name: CI/CD Pipeline

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]
  release:
    types: [ published ]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
  NODE_VERSION: '18'
  PYTHON_VERSION: '3.11'

jobs:
  # Code Quality and Security
  quality:
    name: Code Quality & Security
    runs-on: ubuntu-latest
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
      with:
        fetch-depth: 0
    
    - name: Setup Node.js
      uses: actions/setup-node@v4
      with:
        node-version: ${{ env.NODE_VERSION }}
        cache: 'npm'
    
    - name: Setup Python
      uses: actions/setup-python@v4
      with:
        python-version: ${{ env.PYTHON_VERSION }}
        cache: 'pip'
    
    - name: Install dependencies
      run: |
        npm ci
        pip install -r requirements.txt
        pip install -r requirements-dev.txt
    
    - name: Run ESLint
      run: npm run lint
    
    - name: Run Prettier check
      run: npm run format:check
    
    - name: Run Python linting
      run: |
        flake8 src/
        black --check src/
        isort --check-only src/
    
    - name: Run security scan
      run: |
        npm audit --audit-level moderate
        safety check
    
    - name: Run SonarCloud scan
      uses: SonarSource/sonarcloud-github-action@master
      env:
        GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}

  # Testing
  test:
    name: Test Suite
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [16, 18, 20]
        python-version: [3.9, 3.11, 3.12]
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
    
    - name: Setup Node.js ${{ matrix.node-version }}
      uses: actions/setup-node@v4
      with:
        node-version: ${{ matrix.node-version }}
        cache: 'npm'
    
    - name: Setup Python ${{ matrix.python-version }}
      uses: actions/setup-python@v4
      with:
        python-version: ${{ matrix.python-version }}
        cache: 'pip'
    
    - name: Install dependencies
      run: |
        npm ci
        pip install -r requirements.txt
        pip install -r requirements-test.txt
    
    - name: Run unit tests
      run: |
        npm run test:unit
        pytest tests/unit/ -v --cov=src --cov-report=xml
    
    - name: Run integration tests
      run: |
        npm run test:integration
        pytest tests/integration/ -v
    
    - name: Upload coverage to Codecov
      uses: codecov/codecov-action@v3
      with:
        file: ./coverage.xml
        flags: unittests
        name: codecov-umbrella

  # Build and Test Docker Image
  build:
    name: Build Docker Image
    runs-on: ubuntu-latest
    needs: [quality, test]
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
    
    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v3
    
    - name: Log in to Container Registry
      uses: docker/login-action@v3
      with:
        registry: ${{ env.REGISTRY }}
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
    
    - name: Extract metadata
      id: meta
      uses: docker/metadata-action@v5
      with:
        images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
        tags: |
          type=ref,event=branch
          type=ref,event=pr
          type=sha,format=long,prefix=
          type=raw,value=latest,enable={{is_default_branch}}
    
    - name: Build and push Docker image
      uses: docker/build-push-action@v5
      with:
        context: .
        platforms: linux/amd64,linux/arm64
        push: true
        tags: ${{ steps.meta.outputs.tags }}
        labels: ${{ steps.meta.outputs.labels }}
        cache-from: type=gha
        cache-to: type=gha,mode=max
    
    - name: Run container security scan
      uses: aquasecurity/trivy-action@master
      with:
        image-ref: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}:${{ github.sha }}
        format: 'sarif'
        output: 'trivy-results.sarif'
    
    - name: Upload Trivy scan results to GitHub Security tab
      uses: github/codeql-action/upload-sarif@v2
      with:
        sarif_file: 'trivy-results.sarif'

  # Deploy to Staging
  deploy-staging:
    name: Deploy to Staging
    runs-on: ubuntu-latest
    needs: build
    if: github.ref == 'refs/heads/develop'
    environment: staging
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
    
    - name: Setup kubectl
      uses: azure/setup-kubectl@v3
      with:
        version: 'v1.28.0'
    
    - name: Configure kubectl
      run: |
        echo "${{ secrets.KUBE_CONFIG_STAGING }}" | base64 -d > kubeconfig
        export KUBECONFIG=kubeconfig
    
    - name: Deploy to Kubernetes
      run: |
        export KUBECONFIG=kubeconfig
        helm upgrade --install app-staging ./helm/app \
          --namespace staging \
          --create-namespace \
          --set image.tag=${{ github.sha }} \
          --set environment=staging \
          --values helm/values-staging.yaml
    
    - name: Run smoke tests
      run: |
        export KUBECONFIG=kubeconfig
        kubectl wait --for=condition=ready pod -l app=app-staging -n staging --timeout=300s
        npm run test:smoke -- --env=staging
    
    - name: Run integration tests against staging
      run: |
        npm run test:integration -- --env=staging

  # Deploy to Production
  deploy-production:
    name: Deploy to Production
    runs-on: ubuntu-latest
    needs: build
    if: github.event_name == 'release'
    environment: production
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
    
    - name: Setup kubectl
      uses: azure/setup-kubectl@v3
      with:
        version: 'v1.28.0'
    
    - name: Configure kubectl
      run: |
        echo "${{ secrets.KUBE_CONFIG_PRODUCTION }}" | base64 -d > kubeconfig
        export KUBECONFIG=kubeconfig
    
    - name: Deploy to Kubernetes (Blue/Green)
      run: |
        export KUBECONFIG=kubeconfig
        
        # Deploy to green environment
        helm upgrade --install app-green ./helm/app \
          --namespace production \
          --set image.tag=${{ github.sha }} \
          --set environment=production \
          --set deployment.color=green \
          --values helm/values-production.yaml
        
        # Wait for green deployment to be ready
        kubectl wait --for=condition=ready pod -l app=app-green,color=green -n production --timeout=600s
        
        # Run health checks
        npm run test:health -- --env=production-green
        
        # Switch traffic to green
        kubectl patch service app-production -n production -p '{"spec":{"selector":{"color":"green"}}}'
        
        # Wait for traffic switch
        sleep 30
        
        # Run final tests
        npm run test:smoke -- --env=production
    
    - name: Cleanup blue environment
      run: |
        export KUBECONFIG=kubeconfig
        helm uninstall app-blue -n production || true
        kubectl delete deployment app-blue -n production || true
    
    - name: Notify deployment
      uses: 8398a7/action-slack@v3
      with:
        status: ${{ job.status }}
        channel: '#deployments'
        webhook_url: ${{ secrets.SLACK_WEBHOOK }}
      if: always()

  # Performance Testing
  performance:
    name: Performance Testing
    runs-on: ubuntu-latest
    needs: deploy-staging
    if: github.ref == 'refs/heads/develop'
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
    
    - name: Setup k6
      run: |
        sudo gpg -k
        sudo gpg --no-default-keyring --keyring /usr/share/keyrings/k6-archive-keyring.gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys C5AD17C747E3415A3642D57D77C6C491D6AC1D69
        echo "deb [signed-by=/usr/share/keyrings/k6-archive-keyring.gpg] https://dl.k6.io/deb stable main" | sudo tee /etc/apt/sources.list.d/k6.list
        sudo apt-get update
        sudo apt-get install k6
    
    - name: Run performance tests
      run: |
        k6 run --out json=performance-results.json tests/performance/load-test.js
    
    - name: Upload performance results
      uses: actions/upload-artifact@v3
      with:
        name: performance-results
        path: performance-results.json
    
    - name: Analyze performance
      run: |
        npm run analyze:performance -- performance-results.json

  # Documentation
  docs:
    name: Build Documentation
    runs-on: ubuntu-latest
    needs: test
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
    
    - name: Setup Node.js
      uses: actions/setup-node@v4
      with:
        node-version: ${{ env.NODE_VERSION }}
        cache: 'npm'
    
    - name: Install dependencies
      run: npm ci
    
    - name: Build documentation
      run: |
        npm run docs:build
        npm run docs:generate-api
    
    - name: Deploy to GitHub Pages
      uses: peaceiris/actions-gh-pages@v3
      if: github.ref == 'refs/heads/main'
      with:
        github_token: ${{ secrets.GITHUB_TOKEN }}
        publish_dir: ./docs/build

# Workflow for dependency updates
# (separate file: .github/workflows/dependency-updates.yml -- a workflow
# file can only contain one workflow)
name: Dependency Updates

on:
  schedule:
    - cron: '0 2 * * 1'  # Every Monday at 2 AM
  workflow_dispatch:

jobs:
  update-dependencies:
    name: Update Dependencies
    runs-on: ubuntu-latest
    
    steps:
    - name: Checkout code
      uses: actions/checkout@v4
      with:
        token: ${{ secrets.GITHUB_TOKEN }}
    
    - name: Setup Node.js
      uses: actions/setup-node@v4
      with:
        node-version: '18'
        cache: 'npm'
    
    - name: Setup Python
      uses: actions/setup-python@v4
      with:
        python-version: '3.11'
        cache: 'pip'
    
    - name: Update Node.js dependencies
      run: |
        npm update
        npm audit fix
    
    - name: Update Python dependencies
      run: |
        pip install pip-tools
        pip-compile requirements.in
        pip-compile requirements-dev.in
    
    - name: Run tests
      run: |
        npm ci
        npm run test
        pip install -r requirements.txt
        pytest tests/
    
    - name: Create Pull Request
      uses: peter-evans/create-pull-request@v5
      with:
        token: ${{ secrets.GITHUB_TOKEN }}
        commit-message: 'chore: update dependencies'
        title: 'chore: update dependencies'
        body: |
          Automated dependency update
          
          - Updated Node.js dependencies
          - Updated Python dependencies
          
          Please review the changes and ensure all tests pass.
        branch: chore/update-dependencies
        delete-branch: true

2. Docker Multi-Stage Build with Best Practices

# Multi-stage Dockerfile for production-ready application
# Stage 1: Build stage
FROM node:18-alpine AS builder

# Set build arguments
ARG NODE_ENV=production
ARG APP_VERSION=1.0.0

# Set environment variables
ENV NODE_ENV=$NODE_ENV
ENV APP_VERSION=$APP_VERSION

# Install build dependencies (py3-pip is needed for the pip install below;
# the Node.js Alpine image does not ship pip)
RUN apk add --no-cache \
    python3 \
    py3-pip \
    make \
    g++ \
    git

# Create app directory
WORKDIR /app

# Copy package files
COPY package*.json ./
COPY requirements.txt ./

# Install Node.js dependencies including devDependencies -- the build and
# test steps below need them; the runtime stage copies only what it requires
RUN npm ci && npm cache clean --force

# Install Python dependencies (newer Alpine releases mark the system Python
# as externally managed, hence --break-system-packages)
RUN pip install --no-cache-dir --break-system-packages -r requirements.txt

# Copy source code
COPY . .

# Run build and tests
RUN npm run build
RUN npm run test

# Stage 2: Runtime stage
FROM python:3.11-slim AS runtime

# Set runtime arguments
ARG APP_USER=appuser
ARG APP_UID=1001
ARG APP_GID=1001

# Set environment variables
ENV NODE_ENV=production
ENV PYTHONUNBUFFERED=1
ENV PYTHONDONTWRITEBYTECODE=1
ENV APP_PORT=3000

# Install runtime dependencies (Node.js and npm are required because the
# final CMD runs "npm start" on a Python base image)
RUN apt-get update && apt-get install -y \
    curl \
    ca-certificates \
    nodejs \
    npm \
    && rm -rf /var/lib/apt/lists/*

# Create non-root user
RUN groupadd -g $APP_GID $APP_USER && \
    useradd -m -u $APP_UID -g $APP_GID -s /bin/bash $APP_USER

# Create app directory
WORKDIR /app

# Copy built application from builder stage (package.json is needed for "npm start")
COPY --from=builder --chown=$APP_USER:$APP_GID /app/package*.json ./
COPY --from=builder --chown=$APP_USER:$APP_GID /app/dist ./dist
COPY --from=builder --chown=$APP_USER:$APP_GID /app/node_modules ./node_modules
COPY --from=builder --chown=$APP_USER:$APP_GID /app/requirements.txt ./

# Install Python production dependencies
RUN pip install --no-cache-dir -r requirements.txt

# Copy configuration files
COPY --chown=$APP_USER:$APP_GID config/ ./config/
COPY --chown=$APP_USER:$APP_GID scripts/ ./scripts/

# Set permissions
RUN chmod +x scripts/*.sh

# Switch to non-root user
USER $APP_USER

# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:$APP_PORT/health || exit 1

# Expose port
EXPOSE $APP_PORT

# Set entrypoint
ENTRYPOINT ["./scripts/entrypoint.sh"]

# Default command
CMD ["npm", "start"]

# Stage 3: Development stage
FROM runtime AS development

# Override environment for development
ENV NODE_ENV=development

# Installing packages requires root (the runtime stage ends as $APP_USER)
USER root

# Install development OS packages
RUN apt-get update && apt-get install -y \
    git \
    vim \
    && rm -rf /var/lib/apt/lists/*

# Install Python development tools
RUN pip install --no-cache-dir pytest pytest-cov black flake8

# Install Node.js devDependencies as well
RUN npm install

# Drop back to the unprivileged app user
USER $APP_USER

# Override command for development
CMD ["npm", "run", "dev"]

# Stage 4: Testing stage
FROM builder AS testing

# Install test dependencies (safety is needed for the scan below)
RUN npm install --no-save
RUN pip install --no-cache-dir --break-system-packages pytest pytest-cov safety

# Run comprehensive tests
RUN npm run test:coverage
RUN pytest tests/ --cov=src --cov-report=xml

# Security scanning
RUN npm audit --audit-level high
RUN safety check

# Stage 5: Security scanning stage
FROM builder AS security

# Install security scanning tools
RUN npm install -g audit-ci
RUN pip install --no-cache-dir --break-system-packages safety bandit

# Run security scans (reports are written into /app)
RUN audit-ci --moderate
RUN safety check --json > safety-report.json
RUN bandit -r src/ -f json -o bandit-report.json

# The reports remain in /app of this stage (a stage cannot COPY from itself);
# extract them by building with --target security and copying the files out,
# e.g. via "docker create" followed by "docker cp"

3. Kubernetes Deployment with Helm and GitOps

# helm/app/Chart.yaml
apiVersion: v2
name: app
description: A Helm chart for deploying the application
type: application
version: 1.0.0
appVersion: "1.0.0"
home: https://github.com/organization/app
sources:
  - https://github.com/organization/app
maintainers:
  - name: DevOps Team
    email: devops@organization.com
keywords:
  - web
  - application
  - devops
annotations:
  category: WebApplication

# helm/app/values.yaml
# Default values for the application
replicaCount: 3

image:
  repository: ghcr.io/organization/app
  pullPolicy: IfNotPresent
  tag: "latest"

nameOverride: ""
fullnameOverride: ""

serviceAccount:
  create: true
  annotations: {}
  name: ""

podAnnotations: {}

podSecurityContext:
  fsGroup: 1001

securityContext:
  allowPrivilegeEscalation: false
  runAsNonRoot: true
  runAsUser: 1001
  runAsGroup: 1001
  capabilities:
    drop:
    - ALL
  readOnlyRootFilesystem: true

service:
  type: ClusterIP
  port: 80
  targetPort: 3000

ingress:
  enabled: true
  className: "nginx"
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
    nginx.ingress.kubernetes.io/rate-limit: "100"
    nginx.ingress.kubernetes.io/rate-limit-window: "1m"
  hosts:
    - host: app.example.com
      paths:
        - path: /
          pathType: Prefix
  tls:
    - secretName: app-tls
      hosts:
        - app.example.com

resources:
  limits:
    cpu: 500m
    memory: 512Mi
  requests:
    cpu: 250m
    memory: 256Mi

autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 70
  targetMemoryUtilizationPercentage: 80

nodeSelector: {}

tolerations: []

affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchExpressions:
          - key: app.kubernetes.io/name
            operator: In
            values:
            - app
        topologyKey: kubernetes.io/hostname

config:
  environment: production
  logLevel: info
  database:
    host: postgres.example.com
    port: 5432
    name: app_prod
  redis:
    host: redis.example.com
    port: 6379
  monitoring:
    enabled: true
    port: 9090

secrets:
  databasePassword: ""
  jwtSecret: ""
  apiKeys: ""

# helm/app/templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "app.fullname" . }}
  labels:
    {{- include "app.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "app.selectorLabels" . | nindent 6 }}
  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/configmap.yaml") . | sha256sum }}
        checksum/secret: {{ include (print $.Template.BasePath "/secret.yaml") . | sha256sum }}
        {{- with .Values.podAnnotations }}
        {{- toYaml . | nindent 8 }}
        {{- end }}
      labels:
        {{- include "app.selectorLabels" . | nindent 8 }}
    spec:
      {{- with .Values.imagePullSecrets }}
      imagePullSecrets:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      serviceAccountName: {{ include "app.serviceAccountName" . }}
      securityContext:
        {{- toYaml .Values.podSecurityContext | nindent 8 }}
      initContainers:
        - name: wait-for-db
          image: postgres:15-alpine
          command:
            - sh
            - -c
            - |
              until pg_isready -h {{ .Values.config.database.host }} -p {{ .Values.config.database.port }}; do
                echo "Waiting for database..."
                sleep 2
              done
        - name: migrate-db
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          command:
            - npm
            - run
            - migrate
          envFrom:
            - configMapRef:
                name: {{ include "app.fullname" . }}
            - secretRef:
                name: {{ include "app.fullname" . }}
      containers:
        - name: {{ .Chart.Name }}
          securityContext:
            {{- toYaml .Values.securityContext | nindent 12 }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          imagePullPolicy: {{ .Values.image.pullPolicy }}
          ports:
            - name: http
              containerPort: {{ .Values.service.targetPort }}
              protocol: TCP
            - name: metrics
              containerPort: {{ .Values.config.monitoring.port }}
              protocol: TCP
          livenessProbe:
            httpGet:
              path: /health
              port: http
            initialDelaySeconds: 30
            periodSeconds: 10
            timeoutSeconds: 5
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: http
            initialDelaySeconds: 5
            periodSeconds: 5
            timeoutSeconds: 3
            failureThreshold: 3
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          envFrom:
            - configMapRef:
                name: {{ include "app.fullname" . }}
            - secretRef:
                name: {{ include "app.fullname" . }}
          volumeMounts:
            - name: tmp
              mountPath: /tmp
            - name: config
              mountPath: /app/config
              readOnly: true
        - name: log-shipper
          image: fluent/fluent-bit:2.0
          resources:
            limits:
              cpu: 100m
              memory: 128Mi
            requests:
              cpu: 50m
              memory: 64Mi
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            - name: fluent-bit-config
              mountPath: /fluent-bit/etc/
      volumes:
        - name: tmp
          emptyDir: {}
        - name: config
          configMap:
            name: {{ include "app.fullname" . }}
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluent-bit-config
          configMap:
            name: fluent-bit-config
      {{- with .Values.nodeSelector }}
      nodeSelector:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.affinity }}
      affinity:
        {{- toYaml . | nindent 8 }}
      {{- end }}
      {{- with .Values.tolerations }}
      tolerations:
        {{- toYaml . | nindent 8 }}
      {{- end }}

# helm/app/templates/hpa.yaml
{{- if .Values.autoscaling.enabled }}
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: {{ include "app.fullname" . }}
  labels:
    {{- include "app.labels" . | nindent 4 }}
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ include "app.fullname" . }}
  minReplicas: {{ .Values.autoscaling.minReplicas }}
  maxReplicas: {{ .Values.autoscaling.maxReplicas }}
  metrics:
    {{- if .Values.autoscaling.targetCPUUtilizationPercentage }}
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetCPUUtilizationPercentage }}
    {{- end }}
    {{- if .Values.autoscaling.targetMemoryUtilizationPercentage }}
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: {{ .Values.autoscaling.targetMemoryUtilizationPercentage }}
    {{- end }}
{{- end }}

# helm/app/templates/monitoring.yaml
{{- if .Values.config.monitoring.enabled }}
apiVersion: v1
kind: Service
metadata:
  name: {{ include "app.fullname" . }}-metrics
  labels:
    {{- include "app.labels" . | nindent 4 }}
spec:
  type: ClusterIP
  ports:
    - port: {{ .Values.config.monitoring.port }}
      targetPort: metrics
      protocol: TCP
      name: metrics
  selector:
    {{- include "app.selectorLabels" . | nindent 4 }}

---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: {{ include "app.fullname" . }}
  labels:
    {{- include "app.labels" . | nindent 4 }}
spec:
  selector:
    matchLabels:
      {{- include "app.selectorLabels" . | nindent 6 }}
  endpoints:
    - port: metrics
      interval: 30s
      path: /metrics
{{- end }}

# GitOps Application Manifest (ArgoCD)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: app-production
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://github.com/organization/app-helm
    targetRevision: HEAD
    path: helm/app
    helm:
      valueFiles:
        - values-production.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: production
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
    retry:
      limit: 5
      backoff:
        duration: 5s
        factor: 2
        maxDuration: 3m
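The retry policy in the ArgoCD manifest above produces an exponentially growing wait between sync attempts, capped at maxDuration. The resulting schedule can be computed directly (a small sketch mirroring the duration/factor/maxDuration semantics):

```python
def backoff_schedule(duration_s: float, factor: float,
                     max_duration_s: float, limit: int) -> list[float]:
    """Per-attempt wait times for an exponential backoff with a cap."""
    return [min(duration_s * factor ** i, max_duration_s) for i in range(limit)]

# duration=5s, factor=2, maxDuration=3m (180s), limit=5 as in the manifest
print(backoff_schedule(5, 2, 180, 5))  # [5, 10, 20, 40, 80]
```

With a higher limit the cap would kick in: the seventh attempt would wait 180s instead of 320s.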

4. Terraform Infrastructure as Code

# terraform/main.tf
provider "aws" {
  region = var.aws_region
  
  default_tags {
    tags = {
      Environment = var.environment
      Project     = var.project_name
      ManagedBy   = "terraform"
    }
  }
}

# Terraform backend configuration
# Note: backend blocks cannot interpolate variables -- bucket and table
# names must be literal values (here derived from project_name "my-app")
terraform {
  backend "s3" {
    bucket         = "terraform-state-my-app"
    key            = "infrastructure/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks-my-app"
  }
  
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = "~> 2.0"
    }
    
    helm = {
      source  = "hashicorp/helm"
      version = "~> 2.0"
    }
    
    random = {
      source  = "hashicorp/random"
      version = "~> 3.0"
    }
    
    null = {
      source  = "hashicorp/null"
      version = "~> 3.0"
    }
  }
}

# terraform/variables.tf
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-east-1"
}

variable "environment" {
  description = "Environment name"
  type        = string
  default     = "production"
}

variable "project_name" {
  description = "Project name"
  type        = string
  default     = "my-app"
}

variable "vpc_cidr" {
  description = "CIDR block for VPC"
  type        = string
  default     = "10.0.0.0/16"
}

variable "availability_zones" {
  description = "List of availability zones"
  type        = list(string)
  default     = ["us-east-1a", "us-east-1b", "us-east-1c"]
}

variable "cluster_name" {
  description = "EKS cluster name"
  type        = string
  default     = "my-app-cluster"
}

variable "cluster_version" {
  description = "EKS cluster version"
  type        = string
  default     = "1.28"
}

variable "node_groups" {
  description = "EKS node groups configuration"
  type = map(object({
    instance_type = string
    min_size      = number
    max_size      = number
    desired_size  = number
    disk_size     = number
  }))
  
  default = {
    general = {
      instance_type = "t3.medium"
      min_size      = 3
      max_size      = 10
      desired_size  = 3
      disk_size     = 50
    }
    
    compute = {
      instance_type = "c5.large"
      min_size      = 2
      max_size      = 5
      desired_size  = 2
      disk_size     = 100
    }
  }
}

# terraform/vpc.tf
# VPC Configuration
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true
  
  tags = {
    Name = "${var.project_name}-vpc"
  }
}

# Internet Gateway
resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
  
  tags = {
    Name = "${var.project_name}-igw"
  }
}

# Public Subnets
resource "aws_subnet" "public" {
  count = length(var.availability_zones)
  
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true
  
  tags = {
    Name = "${var.project_name}-public-${count.index}"
    Type = "Public"
  }
}

# Private Subnets
resource "aws_subnet" "private" {
  count = length(var.availability_zones)
  
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 3)
  availability_zone = var.availability_zones[count.index]
  
  tags = {
    Name = "${var.project_name}-private-${count.index}"
    Type = "Private"
  }
}

# Database Subnets
resource "aws_subnet" "database" {
  count = length(var.availability_zones)
  
  vpc_id            = aws_vpc.main.id
  cidr_block        = cidrsubnet(var.vpc_cidr, 8, count.index + 6)
  availability_zone = var.availability_zones[count.index]
  
  tags = {
    Name = "${var.project_name}-database-${count.index}"
    Type = "Database"
  }
}
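The cidrsubnet(var.vpc_cidr, 8, index) calls above carve /24 networks out of the /16, with offsets 0-2 (public), 3-5 (private), and 6-8 (database) keeping the three subnet tiers disjoint. The same arithmetic, reproduced with Python's standard ipaddress module as a quick check:

```python
import ipaddress

def cidr_subnet(prefix: str, newbits: int, netnum: int) -> str:
    """Equivalent of Terraform's cidrsubnet(): take the netnum-th
    subnet after extending the prefix by newbits."""
    network = ipaddress.ip_network(prefix)
    return str(list(network.subnets(prefixlen_diff=newbits))[netnum])

# public subnets use offsets 0..2, private 3..5, database 6..8
print(cidr_subnet("10.0.0.0/16", 8, 0))  # 10.0.0.0/24
print(cidr_subnet("10.0.0.0/16", 8, 3))  # 10.0.3.0/24
print(cidr_subnet("10.0.0.0/16", 8, 6))  # 10.0.6.0/24
```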

# Route Tables
resource "aws_route_table" "public" {
  vpc_id = aws_vpc.main.id
  
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.main.id
  }
  
  tags = {
    Name = "${var.project_name}-public-rt"
  }
}

resource "aws_route_table_association" "public" {
  count = length(aws_subnet.public)
  
  subnet_id      = aws_subnet.public[count.index].id
  route_table_id = aws_route_table.public.id
}

# EKS Cluster
resource "aws_eks_cluster" "main" {
  name     = var.cluster_name
  role_arn = aws_iam_role.eks_cluster.arn
  version  = var.cluster_version
  
  vpc_config {
    subnet_ids = concat(
      aws_subnet.public[*].id,
      aws_subnet.private[*].id
    )
    
    endpoint_public_access  = true
    endpoint_private_access = true
    
    public_access_cidrs = ["0.0.0.0/0"]
  }
  
  depends_on = [
    aws_iam_role_policy_attachment.eks_cluster_policy,
  ]
  
  tags = {
    Name = var.cluster_name
  }
}

# EKS Node Groups
resource "aws_eks_node_group" "main" {
  for_each = var.node_groups
  
  cluster_name    = aws_eks_cluster.main.name
  node_group_name = each.key
  node_role_arn   = aws_iam_role.eks_node.arn
  
  subnet_ids = aws_subnet.private[*].id
  
  scaling_config {
    desired_size = each.value.desired_size
    max_size     = each.value.max_size
    min_size     = each.value.min_size
  }
  
  instance_types = [each.value.instance_type]
  disk_size      = each.value.disk_size
  
  remote_access {
    ec2_ssh_key               = aws_key_pair.main.key_name
    source_security_group_ids = [aws_security_group.eks_nodes.id]
  }
  
  depends_on = [
    aws_iam_role_policy_attachment.eks_worker_node_policy,
    aws_iam_role_policy_attachment.eks_cni_policy,
    aws_iam_role_policy_attachment.eks_container_registry_policy,
  ]
  
  tags = {
    Name = "${var.cluster_name}-${each.key}"
    Type = each.key
  }
}

# IAM Roles
resource "aws_iam_role" "eks_cluster" {
  name = "${var.project_name}-eks-cluster-role"
  
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "eks.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "eks_cluster_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
  role       = aws_iam_role.eks_cluster.name
}

resource "aws_iam_role" "eks_node" {
  name = "${var.project_name}-eks-node-role"
  
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Action = "sts:AssumeRole"
        Effect = "Allow"
        Principal = {
          Service = "ec2.amazonaws.com"
        }
      }
    ]
  })
}

resource "aws_iam_role_policy_attachment" "eks_worker_node_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"
  role       = aws_iam_role.eks_node.name
}

resource "aws_iam_role_policy_attachment" "eks_cni_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
  role       = aws_iam_role.eks_node.name
}

resource "aws_iam_role_policy_attachment" "eks_container_registry_policy" {
  policy_arn = "arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"
  role       = aws_iam_role.eks_node.name
}

# Security Groups
resource "aws_security_group" "eks_cluster" {
  name        = "${var.project_name}-eks-cluster-sg"
  description = "Security group for EKS cluster"
  vpc_id      = aws_vpc.main.id
  
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
  
  tags = {
    Name = "${var.project_name}-eks-cluster-sg"
  }
}

resource "aws_security_group" "eks_nodes" {
  name        = "${var.project_name}-eks-nodes-sg"
  description = "Security group for EKS nodes"
  vpc_id      = aws_vpc.main.id
  
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
  
  tags = {
    Name = "${var.project_name}-eks-nodes-sg"
  }
}

# RDS Database
resource "aws_db_subnet_group" "main" {
  name       = "${var.project_name}-db-subnet-group"
  subnet_ids = aws_subnet.database[*].id
  
  tags = {
    Name = "${var.project_name}-db-subnet-group"
  }
}

resource "aws_security_group" "rds" {
  name        = "${var.project_name}-rds-sg"
  description = "Security group for RDS database"
  vpc_id      = aws_vpc.main.id
  
  ingress {
    from_port       = 5432
    to_port         = 5432
    protocol        = "tcp"
    security_groups = [aws_security_group.eks_nodes.id]
  }
  
  tags = {
    Name = "${var.project_name}-rds-sg"
  }
}

resource "aws_db_instance" "postgres" {
  identifier = "${var.project_name}-postgres"
  
  engine         = "postgres"
  engine_version = "15.4"
  instance_class = "db.t3.medium"
  
  allocated_storage     = 100
  max_allocated_storage = 1000
  storage_type          = "gp2"
  storage_encrypted     = true
  
  db_name  = "app"
  username = "app_user"
  password = random_password.db_password.result
  
  db_subnet_group_name   = aws_db_subnet_group.main.name
  vpc_security_group_ids = [aws_security_group.rds.id]
  
  backup_retention_period = 7
  backup_window           = "03:00-04:00"
  maintenance_window      = "sun:04:00-sun:05:00"
  
  skip_final_snapshot       = false
  final_snapshot_identifier = "${var.project_name}-postgres-final-snapshot"
  
  deletion_protection = true
  
  tags = {
    Name = "${var.project_name}-postgres"
  }
}

# Redis ElastiCache
resource "aws_elasticache_subnet_group" "main" {
  name       = "${var.project_name}-cache-subnet-group"
  subnet_ids = aws_subnet.private[*].id
  
  tags = {
    Name = "${var.project_name}-cache-subnet-group"
  }
}

resource "aws_security_group" "redis" {
  name        = "${var.project_name}-redis-sg"
  description = "Security group for Redis"
  vpc_id      = aws_vpc.main.id
  
  ingress {
    from_port       = 6379
    to_port         = 6379
    protocol        = "tcp"
    security_groups = [aws_security_group.eks_nodes.id]
  }
  
  tags = {
    Name = "${var.project_name}-redis-sg"
  }
}

resource "aws_elasticache_replication_group" "redis" {
  replication_group_id       = "${var.project_name}-redis"
  description                 = "Redis cluster for ${var.project_name}"
  
  node_type                   = "cache.t3.micro"
  port                        = 6379
  parameter_group_name        = "default.redis7"
  
  num_cache_clusters         = 2
  automatic_failover_enabled = true
  multi_az_enabled          = true
  
  subnet_group_name  = aws_elasticache_subnet_group.main.name
  security_group_ids = [aws_security_group.redis.id]
  
  at_rest_encryption_enabled = true
  transit_encryption_enabled = true
  auth_token                 = random_password.redis_auth_token.result
  
  snapshot_retention_limit = 7
  snapshot_window         = "05:00-06:00"
  maintenance_window      = "sun:06:00-sun:07:00"
  
  tags = {
    Name = "${var.project_name}-redis"
  }
}

# S3 Buckets
resource "aws_s3_bucket" "app_storage" {
  bucket = "${var.project_name}-storage-${random_string.bucket_suffix.result}"
  
  tags = {
    Name = "${var.project_name}-storage"
  }
}

resource "aws_s3_bucket_versioning" "app_storage" {
  bucket = aws_s3_bucket.app_storage.id
  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_encryption" "app_storage" {
  bucket = aws_s3_bucket.app_storage.id
  
  server_side_encryption_configuration {
    rule {
      apply_server_side_encryption_by_default {
        sse_algorithm = "AES256"
      }
    }
  }
}

resource "aws_s3_bucket_public_access_block" "app_storage" {
  bucket = aws_s3_bucket.app_storage.id
  
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Random resources
resource "random_password" "db_password" {
  length           = 32
  special          = true
  override_special = "!#$%&*()-_=+[]{}<>:?"
}

resource "random_password" "redis_auth_token" {
  length           = 64
  special          = true
  override_special = "!#$%&*()-_=+[]{}<>:?"
}

resource "random_string" "bucket_suffix" {
  length  = 8
  special = false
  upper   = false
}

# Outputs
output "cluster_name" {
  description = "EKS cluster name"
  value       = aws_eks_cluster.main.name
}

output "cluster_endpoint" {
  description = "EKS cluster endpoint"
  value       = aws_eks_cluster.main.endpoint
}

output "cluster_certificate_authority_data" {
  description = "EKS cluster certificate authority data"
  value       = aws_eks_cluster.main.certificate_authority[0].data
}

output "database_endpoint" {
  description = "RDS database endpoint"
  value       = aws_db_instance.postgres.endpoint
  sensitive   = true
}

output "redis_endpoint" {
  description = "Redis endpoint"
  value       = aws_elasticache_replication_group.redis.primary_endpoint_address
  sensitive   = true
}

output "storage_bucket" {
  description = "S3 storage bucket name"
  value       = aws_s3_bucket.app_storage.bucket
}

5. Monitoring with Prometheus and Grafana

# monitoring/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alert_rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
          - alertmanager:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'kubernetes-apiservers'
    kubernetes_sd_configs:
      - role: endpoints
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https

  - job_name: 'kubernetes-nodes'
    kubernetes_sd_configs:
      - role: node
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)

  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name

  - job_name: 'kubernetes-services'
    kubernetes_sd_configs:
      - role: service
    relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_probe]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
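
The kubernetes-pods job above only scrapes pods that opt in via annotations. A minimal pod manifest showing the annotations those relabel rules match — the name, image, and port here are placeholders:

```yaml
# sample-pod.yml (illustrative sketch)
apiVersion: v1
kind: Pod
metadata:
  name: sample-app                  # hypothetical name
  annotations:
    prometheus.io/scrape: "true"    # matched by the "keep" rule
    prometheus.io/path: "/metrics"  # rewritten into __metrics_path__
    prometheus.io/port: "8080"      # rewritten into __address__
spec:
  containers:
    - name: app
      image: sample-app:latest      # hypothetical image
      ports:
        - containerPort: 8080
```

Without the `prometheus.io/scrape: "true"` annotation, the `keep` action drops the pod from the target list entirely.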

# monitoring/alert_rules.yml
groups:
  - name: kubernetes-apps
    rules:
      - alert: KubernetesPodCrashLooping
        expr: rate(kube_pod_container_status_restarts_total[15m]) > 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Pod {{ $labels.pod }} is crash looping"
          description: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} is crash looping."

      - alert: KubernetesPodNotReady
        expr: kube_pod_status_ready{condition="true"} == 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Pod {{ $labels.pod }} is not ready"
          description: "Pod {{ $labels.pod }} in namespace {{ $labels.namespace }} is not ready."

      - alert: KubernetesNodeNotReady
        expr: kube_node_status_condition{condition="Ready",status="true"} == 0
        for: 10m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.node }} is not ready"
          description: "Node {{ $labels.node }} has been not ready for more than 10 minutes."

  - name: application
    rules:
      - alert: HighErrorRate
        expr: rate(http_requests_total{status=~"5.."}[5m]) / rate(http_requests_total[5m]) > 0.1
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value | humanizePercentage }} for {{ $labels.job }}."

      - alert: HighResponseTime
        expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High response time detected"
          description: "95th percentile response time is {{ $value }}s for {{ $labels.job }}."

      - alert: LowThroughput
        expr: rate(http_requests_total[5m]) < 10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Low throughput detected"
          description: "Request rate is {{ $value }} requests/second for {{ $labels.job }}."

  - name: infrastructure
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by(instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value }}% on {{ $labels.instance }}."

      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 85
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"
          description: "Memory usage is {{ $value }}% on {{ $labels.instance }}."

      - alert: DiskSpaceLow
        expr: (node_filesystem_avail_bytes / node_filesystem_size_bytes) * 100 < 10
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Disk space low on {{ $labels.instance }}"
          description: "Disk space is {{ $value }}% available on {{ $labels.device }}."

# grafana/dashboards/app-dashboard.json
{
  "dashboard": {
    "id": null,
    "title": "Application Dashboard",
    "tags": ["app", "production"],
    "timezone": "browser",
    "panels": [
      {
        "id": 1,
        "title": "Request Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(http_requests_total[5m])",
            "legendFormat": "{{method}} {{status}}"
          }
        ],
        "yAxes": [
          {
            "label": "Requests/sec"
          }
        ]
      },
      {
        "id": 2,
        "title": "Response Time",
        "type": "graph",
        "targets": [
          {
            "expr": "histogram_quantile(0.50, rate(http_request_duration_seconds_bucket[5m]))",
            "legendFormat": "50th percentile"
          },
          {
            "expr": "histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))",
            "legendFormat": "95th percentile"
          },
          {
            "expr": "histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))",
            "legendFormat": "99th percentile"
          }
        ],
        "yAxes": [
          {
            "label": "Seconds"
          }
        ]
      },
      {
        "id": 3,
        "title": "Error Rate",
        "type": "graph",
        "targets": [
          {
            "expr": "rate(http_requests_total{status=~\"5..\"}[5m]) / rate(http_requests_total[5m])",
            "legendFormat": "Error Rate"
          }
        ],
        "yAxes": [
          {
            "label": "Percentage",
            "max": 1,
            "min": 0
          }
        ]
      },
      {
        "id": 4,
        "title": "Application Status",
        "type": "stat",
        "targets": [
          {
            "expr": "up{job=\"app\"}",
            "legendFormat": "Application Status"
          }
        ],
        "fieldConfig": {
          "defaults": {
            "mappings": [
              {
                "options": {
                  "0": {
                    "text": "DOWN",
                    "color": "red"
                  },
                  "1": {
                    "text": "UP",
                    "color": "green"
                  }
                },
                "type": "value"
              }
            ]
          }
        }
      }
    ],
    "time": {
      "from": "now-1h",
      "to": "now"
    },
    "refresh": "5s"
  }
}

DevOps Pipeline Architecture

CI/CD Pipeline Stages

graph TD
    A[Code Commit] --> B[Build Stage]
    B --> C[Test Stage]
    C --> D[Security Scan]
    D --> E[Package Stage]
    E --> F[Deploy Staging]
    F --> G[Integration Tests]
    G --> H[Approve Production]
    H --> I[Deploy Production]
    I --> J[Monitoring]
    J --> K[Rollback if needed]
    
    A1[Git Push] --> A
    B1[Docker Build] --> B
    C1[Unit Tests] --> C
    C2[Integration Tests] --> C
    D1[Vulnerability Scan] --> D
    E1[Image Registry] --> E
    F1[Kubernetes Deploy] --> F
    G1[E2E Tests] --> G
    H1[Manual Approval] --> H
    I1[Blue/Green Deploy] --> I
    J1[Prometheus/Grafana] --> J
    K1[Automated Rollback] --> K
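
The stages above map naturally onto CI jobs. A minimal, hypothetical GitHub Actions workflow sketching the build, test, and staging-deploy stages — the image name `sample-app` and the run commands are placeholders:

```yaml
# .github/workflows/pipeline.yml (illustrative sketch)
name: ci-cd
on:
  push:
    branches: [main]

jobs:
  build-test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Build image             # Build Stage
        run: docker build -t sample-app:${{ github.sha }} .
      - name: Unit tests              # Test Stage
        run: docker run --rm sample-app:${{ github.sha }} npm test

  deploy-staging:
    needs: build-test                 # runs only after build-test succeeds
    runs-on: ubuntu-latest
    steps:
      - name: Deploy to staging       # Deploy Staging
        run: echo "kubectl set image deployment/sample-app app=sample-app:${{ github.sha }}"
```

In a real pipeline, the security-scan, approval, and production-deploy stages would follow as additional jobs gated by `needs` and environment protection rules.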

Containerization Comparison

Container Runtimes

Runtime      Language  Security  Performance  Typical use
Docker       Go        Medium    Good         General purpose
containerd   Go        High      Very good    Production
CRI-O        Go        High      Good         Kubernetes
Podman       Go        High      Good         Daemonless

Orchestration Platforms

Platform      Complexity  Scalability  Cloud-native  Typical use
Kubernetes    High        Very high    Yes           Enterprise
Docker Swarm  Low         Medium       Partially     Small/medium setups
OpenShift     High        Very high    Yes           Enterprise
Nomad         Medium      High         Yes           Multi-cloud

Infrastructure as Code Tools

Terraform vs. CloudFormation vs. Pulumi

Tool            Language  Multi-cloud  State management  Typical use
Terraform       HCL       Yes          Own state         Multi-cloud
CloudFormation  YAML      No           AWS-managed       AWS only
Pulumi          Various   Yes          Own state         Programmable IaC
Ansible         YAML      Yes          Stateless         Configuration

IaC Best Practices

  • Modularization: small, reusable modules
  • Versioning: Git-based version control
  • Testing: automated testing of infrastructure changes
  • Documentation: automatically generated documentation
  • Security: security scanning and compliance checks
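
Modularization in practice: a hypothetical root module composing reusable child modules. The module source paths, variable names, and the `private_subnet_ids` output are assumptions for illustration — they are not part of the configuration shown earlier:

```hcl
# main.tf – illustrative sketch of module reuse
module "network" {
  source = "./modules/vpc"   # hypothetical local module

  project_name       = var.project_name
  vpc_cidr           = "10.0.0.0/16"
  availability_zones = ["eu-central-1a", "eu-central-1b"]
}

module "cluster" {
  source = "./modules/eks"   # hypothetical local module

  cluster_name = var.cluster_name
  # wire the network module's output into the cluster module
  subnet_ids   = module.network.private_subnet_ids
}
```

Splitting the monolithic configuration above into modules like these keeps each concern versioned, testable, and reusable across environments.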

Monitoring and Observability

Observability Pillars

Pillar   Tools                    Data type      Typical use
Metrics  Prometheus, InfluxDB     Numeric data   Performance
Logs     ELK Stack, Loki          Textual data   Troubleshooting
Traces   Jaeger, Zipkin           Request flows  Distributed systems
Events   CloudWatch, EventBridge  State changes  Audit trail

Alerting Strategies

  • Threshold-based: static limit values
  • Anomaly detection: automatic detection of unusual behavior
  • Predictive: forecasting problems before they occur
  • Business metrics: alerts on business-relevant KPIs

Advantages and Disadvantages

Advantages of DevOps

  • Faster delivery: accelerated software development
  • Higher quality: automated tests and quality assurance
  • Better collaboration: integration of Dev and Ops
  • Scalability: automated scaling of infrastructure
  • Reliability: consistent, repeatable deployments

Disadvantages

  • Complexity: high initial complexity
  • Cost: investment in tools and training
  • Cultural change: requires organizational change
  • Learning curve: steep learning curve for teams
  • Tool overload: many different tools to master

Common Exam Questions

  1. What is the difference between CI and CD? CI (Continuous Integration) automates building and testing code; CD (Continuous Deployment) automates deploying it to production.

  2. Explain containerization with Docker! Docker isolates applications in containers together with all their dependencies, guaranteeing consistent environments across different systems.

  3. When do you use Kubernetes vs. Docker Swarm? Kubernetes for complex, scalable applications in enterprise environments; Docker Swarm for simpler setups and small to medium-sized organizations.

  4. What is Infrastructure as Code? Infrastructure as Code is the practice of defining and managing infrastructure through code, enabling automation and versioning.

Key Sources

  1. https://docs.docker.com/
  2. https://kubernetes.io/docs/
  3. https://www.terraform.io/docs/
  4. https://prometheus.io/docs/