Computer Vision Fundamentals: Image Processing, Object Detection, OpenCV, TensorFlow & PyTorch
This post is a comprehensive introduction to the fundamentals of computer vision, covering image processing, object detection, OpenCV, TensorFlow, and PyTorch, with practical examples.
In a Nutshell
Computer vision is a field of artificial intelligence that enables computers to interpret and understand visual information from images and videos.
Compact Technical Description
Computer vision is the automatic analysis and interpretation of visual data by computers; it combines image processing, pattern recognition, and machine learning.
Core components:
Image Processing
- Image filters: Gaussian filter, median filter, edge detection
- Color spaces: RGB, HSV, grayscale, YUV
- Geometric transformations: scaling, rotation, translation
- Histograms: histogram equalization, binarization
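The histogram equalization mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration of the idea (mapping intensities so the cumulative distribution becomes roughly linear); the `equalize_histogram` helper and the tiny 2×2 test image are illustrative, not from any library:

```python
import numpy as np

def equalize_histogram(gray):
    """Histogram equalization for an 8-bit grayscale array."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    # Stretch the cumulative distribution to cover the full [0, 255] range
    cdf_min = cdf[cdf > 0].min()
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255),
                  0, 255).astype(np.uint8)
    # Remap every pixel through the lookup table
    return lut[gray]

img = np.array([[50, 50], [51, 52]], dtype=np.uint8)
print(equalize_histogram(img))
```

A narrow intensity range (50 to 52 here) gets spread across the whole 0 to 255 range, which is exactly what `cv2.equalizeHist` does internally.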
Feature Extraction
- Edge detection: Canny, Sobel, Laplacian
- Corner detection: Harris, Shi-Tomasi, FAST
- Descriptors: SIFT, SURF, ORB, HOG
- Segmentation: Watershed, GrabCut, k-means
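Sobel edge detection from the list above reduces to convolving the image with two 3×3 kernels and combining the horizontal and vertical responses. This naive NumPy sketch (the `sobel_magnitude` helper is hypothetical and trades speed for clarity) shows the principle:

```python
import numpy as np

def sobel_magnitude(gray):
    """Gradient magnitude via 3x3 Sobel kernels (naive loop version)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T  # vertical kernel is the transpose
    g = gray.astype(float)
    h, w = g.shape
    out = np.zeros_like(g)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = g[y - 1:y + 2, x - 1:x + 2]
            gx = np.sum(kx * patch)  # horizontal gradient
            gy = np.sum(ky * patch)  # vertical gradient
            out[y, x] = np.hypot(gx, gy)
    return out

# A vertical step edge produces a strong horizontal gradient response
img = np.array([[0, 0, 255, 255]] * 4)
print(sobel_magnitude(img))
```

`cv2.Sobel` computes the same derivatives with separable filtering and proper border handling; this loop exists only to make the kernel arithmetic visible.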
Deep Learning for CV
- Convolutional neural networks: CNN architectures
- Transfer learning: VGG, ResNet, EfficientNet
- Object detection: YOLO, SSD, R-CNN
- Semantic segmentation: U-Net, Mask R-CNN
Frameworks and Libraries
- OpenCV: Open Source Computer Vision Library
- TensorFlow: deep learning framework
- PyTorch: dynamic deep learning framework
- Keras: high-level neural network API
Exam-Relevant Key Points
- Computer vision: automatic image analysis and interpretation
- Image processing: filters, transformations, histograms
- Feature extraction: edges, corners, descriptors, segmentation
- Deep learning: CNNs, transfer learning, object detection
- OpenCV: open-source computer vision library
- TensorFlow: deep learning framework by Google
- PyTorch: dynamic deep learning framework
- Object detection: YOLO, SSD, R-CNN, face detection
- IHK-relevant: modern image processing and AI applications
Core Components
- Image acquisition: cameras, sensors, data formats
- Preprocessing: filtering, normalization, augmentation
- Feature extraction: edges, corners, textures, shapes
- Pattern recognition: classification, clustering, segmentation
- Deep learning: CNNs, transfer learning, fine-tuning
- Object detection: detection, tracking, localization
- Applications: face recognition, OCR, autonomous driving
- Evaluation: metrics, validation, performance
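For the evaluation point above, the standard metric in object detection is Intersection over Union (IoU): the overlap area of predicted and ground-truth boxes divided by the area of their union. A minimal sketch with corner-format boxes (the `iou` helper is illustrative):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Width/height clamped to zero when the boxes do not overlap
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

A detection typically counts as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5; the same measure also drives non-maximum suppression.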
Practical Examples
1. Image Processing with OpenCV and Python
import cv2
import numpy as np
import matplotlib.pyplot as plt
class ImageProcessor:
    def __init__(self, image_path=None):
        """Initialize image processor"""
        self.image = None
        self.processed_image = None
        self.gray_image = None
        if image_path:
            self.load_image(image_path)

    def load_image(self, image_path):
        """Load image from file"""
        self.image = cv2.imread(image_path)
        if self.image is None:
            raise ValueError(f"Could not load image from {image_path}")
        # Convert BGR to RGB for matplotlib
        self.image = cv2.cvtColor(self.image, cv2.COLOR_BGR2RGB)
        self.gray_image = cv2.cvtColor(self.image, cv2.COLOR_RGB2GRAY)
        return self.image

    def resize_image(self, width, height, interpolation=cv2.INTER_LINEAR):
        """Resize image to specified dimensions"""
        if self.image is None:
            raise ValueError("No image loaded")
        self.processed_image = cv2.resize(self.image, (width, height), interpolation=interpolation)
        return self.processed_image

    def rotate_image(self, angle, center=None, scale=1.0):
        """Rotate image by specified angle"""
        if self.image is None:
            raise ValueError("No image loaded")
        h, w = self.image.shape[:2]
        if center is None:
            center = (w // 2, h // 2)
        # Get rotation matrix
        rotation_matrix = cv2.getRotationMatrix2D(center, angle, scale)
        # Apply rotation
        self.processed_image = cv2.warpAffine(self.image, rotation_matrix, (w, h))
        return self.processed_image

    def apply_gaussian_blur(self, kernel_size=(5, 5), sigma=0):
        """Apply Gaussian blur to reduce noise"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        self.processed_image = cv2.GaussianBlur(self.gray_image, kernel_size, sigma)
        return self.processed_image

    def apply_median_filter(self, kernel_size=5):
        """Apply median filter for noise reduction"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        self.processed_image = cv2.medianBlur(self.gray_image, kernel_size)
        return self.processed_image

    def detect_edges_canny(self, low_threshold=50, high_threshold=150):
        """Detect edges using Canny edge detection"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        # Apply Gaussian blur first
        blurred = cv2.GaussianBlur(self.gray_image, (5, 5), 0)
        # Canny edge detection
        edges = cv2.Canny(blurred, low_threshold, high_threshold)
        self.processed_image = edges
        return edges
    def detect_edges_sobel(self, ksize=3):
        """Detect edges using Sobel operator"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        # Sobel derivatives
        grad_x = cv2.Sobel(self.gray_image, cv2.CV_64F, 1, 0, ksize=ksize)
        grad_y = cv2.Sobel(self.gray_image, cv2.CV_64F, 0, 1, ksize=ksize)
        # Convert to absolute values
        abs_grad_x = cv2.convertScaleAbs(grad_x)
        abs_grad_y = cv2.convertScaleAbs(grad_y)
        # Combine gradients
        edges = cv2.addWeighted(abs_grad_x, 0.5, abs_grad_y, 0.5, 0)
        self.processed_image = edges
        return edges

    def detect_corners_harris(self, block_size=2, ksize=3, k=0.04, threshold=0.01):
        """Detect corners using Harris corner detection"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        # Harris corner response
        response = cv2.cornerHarris(self.gray_image, block_size, ksize, k)
        # Threshold the raw response before normalizing for display
        corner_mask = response > threshold * response.max()
        corners = cv2.normalize(response, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)
        # Mark detected corners in the output image
        output = self.image.copy()
        output[corner_mask] = [255, 0, 0]  # Red corners
        self.processed_image = output
        return corners, output

    def apply_histogram_equalization(self):
        """Apply histogram equalization to improve contrast"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        equalized = cv2.equalizeHist(self.gray_image)
        self.processed_image = equalized
        return equalized

    def adaptive_threshold(self, max_value=255, adaptive_method=cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                           threshold_type=cv2.THRESH_BINARY, block_size=11, C=2):
        """Apply adaptive thresholding"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        # Apply adaptive threshold
        thresholded = cv2.adaptiveThreshold(
            self.gray_image, max_value, adaptive_method, threshold_type, block_size, C
        )
        self.processed_image = thresholded
        return thresholded

    def morphological_operations(self, operation='opening', kernel_size=5, kernel_shape=cv2.MORPH_ELLIPSE):
        """Apply morphological operations"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        # Create structuring element
        kernel = cv2.getStructuringElement(kernel_shape, (kernel_size, kernel_size))
        # Apply operation
        if operation == 'erosion':
            result = cv2.erode(self.gray_image, kernel, iterations=1)
        elif operation == 'dilation':
            result = cv2.dilate(self.gray_image, kernel, iterations=1)
        elif operation == 'opening':
            result = cv2.morphologyEx(self.gray_image, cv2.MORPH_OPEN, kernel)
        elif operation == 'closing':
            result = cv2.morphologyEx(self.gray_image, cv2.MORPH_CLOSE, kernel)
        else:
            raise ValueError(f"Unknown operation: {operation}")
        self.processed_image = result
        return result
    def watershed_segmentation(self):
        """Apply watershed segmentation"""
        if self.image is None:
            raise ValueError("No image loaded")
        # Convert to grayscale
        gray = cv2.cvtColor(self.image, cv2.COLOR_RGB2GRAY)
        # Apply threshold
        _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        # Noise removal
        kernel = np.ones((3, 3), np.uint8)
        opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=2)
        # Sure background area
        sure_bg = cv2.dilate(opening, kernel, iterations=3)
        # Sure foreground area
        dist_transform = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
        _, sure_fg = cv2.threshold(dist_transform, 0.7 * dist_transform.max(), 255, 0)
        sure_fg = np.uint8(sure_fg)
        # Unknown region
        unknown = cv2.subtract(sure_bg, sure_fg)
        # Label markers
        _, markers = cv2.connectedComponents(sure_fg)
        markers = markers + 1
        markers[unknown == 255] = 0
        # Apply watershed
        markers = cv2.watershed(self.image, markers)
        # Create output
        output = self.image.copy()
        output[markers == -1] = [255, 0, 0]  # Watershed boundaries in red
        self.processed_image = output
        return markers, output

    def contour_detection(self, mode=cv2.RETR_EXTERNAL, method=cv2.CHAIN_APPROX_SIMPLE):
        """Detect and draw contours"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        # findContours expects a binary image, so threshold first
        _, binary = cv2.threshold(self.gray_image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        contours, hierarchy = cv2.findContours(binary, mode, method)
        # Draw contours
        output = self.image.copy()
        cv2.drawContours(output, contours, -1, (0, 255, 0), 2)
        self.processed_image = output
        return contours, hierarchy, output

    def template_matching(self, template_path, method=cv2.TM_CCOEFF_NORMED):
        """Template matching"""
        if self.image is None:
            raise ValueError("No image loaded")
        # Load template
        template = cv2.imread(template_path)
        if template is None:
            raise ValueError(f"Could not load template from {template_path}")
        template = cv2.cvtColor(template, cv2.COLOR_BGR2RGB)
        # Convert template to grayscale
        template_gray = cv2.cvtColor(template, cv2.COLOR_RGB2GRAY)
        # Template matching
        result = cv2.matchTemplate(self.gray_image, template_gray, method)
        # Find best match
        min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
        # Draw rectangle around match
        h, w = template_gray.shape
        top_left = max_loc
        bottom_right = (top_left[0] + w, top_left[1] + h)
        output = self.image.copy()
        cv2.rectangle(output, top_left, bottom_right, (255, 0, 0), 2)
        self.processed_image = output
        return result, output

    def face_detection(self, cascade_path=None):
        """Face detection using Haar cascades"""
        if self.image is None:
            raise ValueError("No image loaded")
        # Load the face cascade shipped with OpenCV
        if cascade_path is None:
            cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
        face_cascade = cv2.CascadeClassifier(cascade_path)
        # Detect faces
        faces = face_cascade.detectMultiScale(
            self.gray_image, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30)
        )
        # Draw rectangles around faces
        output = self.image.copy()
        for (x, y, w, h) in faces:
            cv2.rectangle(output, (x, y), (x + w, y + h), (255, 0, 0), 2)
            cv2.putText(output, 'Face', (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (255, 0, 0), 2)
        self.processed_image = output
        return faces, output
    def optical_flow(self, prev_frame, current_frame):
        """Calculate dense optical flow between frames (Farneback)"""
        if prev_frame is None or current_frame is None:
            raise ValueError("Both frames are required")
        # Convert to grayscale
        prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_RGB2GRAY)
        curr_gray = cv2.cvtColor(current_frame, cv2.COLOR_RGB2GRAY)
        # Dense optical flow; the sparse cv2.calcOpticalFlowPyrLK would
        # additionally need points to track (e.g. from cv2.goodFeaturesToTrack)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        return flow
    def feature_extraction_sift(self):
        """Extract SIFT features"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        # Initialize SIFT detector
        sift = cv2.SIFT_create()
        # Detect keypoints and compute descriptors
        keypoints, descriptors = sift.detectAndCompute(self.gray_image, None)
        # Draw keypoints
        output = cv2.drawKeypoints(self.image, keypoints, None, (0, 255, 0),
                                   flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
        self.processed_image = output
        return keypoints, descriptors, output

    def feature_extraction_orb(self):
        """Extract ORB features"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        # Initialize ORB detector
        orb = cv2.ORB_create()
        # Detect keypoints and compute descriptors
        keypoints, descriptors = orb.detectAndCompute(self.gray_image, None)
        # Draw keypoints
        output = cv2.drawKeypoints(self.image, keypoints, None, (0, 255, 0), flags=0)
        self.processed_image = output
        return keypoints, descriptors, output

    def feature_matching(self, image1, image2, method='sift', ratio_threshold=0.75):
        """Match features between two images"""
        # Convert to grayscale
        gray1 = cv2.cvtColor(image1, cv2.COLOR_RGB2GRAY)
        gray2 = cv2.cvtColor(image2, cv2.COLOR_RGB2GRAY)
        # Choose feature detector
        if method == 'sift':
            detector = cv2.SIFT_create()
        elif method == 'orb':
            detector = cv2.ORB_create()
        else:
            raise ValueError(f"Unknown method: {method}")
        # Detect keypoints and compute descriptors
        kp1, des1 = detector.detectAndCompute(gray1, None)
        kp2, des2 = detector.detectAndCompute(gray2, None)
        # Feature matching
        if method == 'sift':
            # k-NN matching with Lowe's ratio test (requires crossCheck=False)
            matcher = cv2.BFMatcher(cv2.NORM_L2)
            knn_matches = matcher.knnMatch(des1, des2, k=2)
            matches = [m for m, n in knn_matches if m.distance < ratio_threshold * n.distance]
        else:  # ORB uses Hamming distance
            matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
            matches = matcher.match(des1, des2)
        # Sort matches by distance
        matches = sorted(matches, key=lambda x: x.distance)
        # Draw matches
        output = cv2.drawMatches(image1, kp1, image2, kp2, matches[:50], None,
                                 flags=cv2.DRAW_MATCHES_FLAGS_NOT_DRAW_SINGLE_POINTS)
        return matches, output

    def histogram_analysis(self):
        """Analyze image histogram"""
        if self.image is None:
            raise ValueError("No image loaded")
        # Calculate histograms for each channel (image is stored as RGB)
        hist_r = cv2.calcHist([self.image], [0], None, [256], [0, 256])
        hist_g = cv2.calcHist([self.image], [1], None, [256], [0, 256])
        hist_b = cv2.calcHist([self.image], [2], None, [256], [0, 256])
        return hist_r, hist_g, hist_b

    def noise_detection(self):
        """Estimate noise/blur via Laplacian variance"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        # Low variance indicates a blurred (or very smooth) image
        laplacian_var = cv2.Laplacian(self.gray_image, cv2.CV_64F).var()
        return laplacian_var
    def image_quality_assessment(self):
        """Assess image quality"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        # Calculate various quality metrics
        metrics = {}
        # Sharpness (Laplacian variance)
        metrics['sharpness'] = cv2.Laplacian(self.gray_image, cv2.CV_64F).var()
        # Contrast (standard deviation)
        metrics['contrast'] = self.gray_image.std()
        # Brightness (mean intensity)
        metrics['brightness'] = self.gray_image.mean()
        # Noise level (estimated)
        blurred = cv2.GaussianBlur(self.gray_image, (5, 5), 0)
        metrics['noise'] = np.mean(np.abs(self.gray_image.astype(float) - blurred.astype(float)))
        return metrics

    def save_image(self, output_path):
        """Save processed image"""
        if self.processed_image is None:
            raise ValueError("No processed image to save")
        # Convert RGB to BGR for OpenCV
        if len(self.processed_image.shape) == 3:
            output_image = cv2.cvtColor(self.processed_image, cv2.COLOR_RGB2BGR)
        else:
            output_image = self.processed_image
        cv2.imwrite(output_path, output_image)

    def display_images(self, titles=None):
        """Display original and processed images"""
        if self.image is None:
            raise ValueError("No image loaded")
        fig, axes = plt.subplots(1, 2, figsize=(12, 6))
        # Display original image
        axes[0].imshow(self.image)
        axes[0].set_title('Original Image')
        axes[0].axis('off')
        # Display processed image
        if self.processed_image is not None:
            if len(self.processed_image.shape) == 2:
                axes[1].imshow(self.processed_image, cmap='gray')
            else:
                axes[1].imshow(self.processed_image)
        else:
            axes[1].imshow(self.image)
        if titles and len(titles) == 2:
            axes[1].set_title(titles[1])
        else:
            axes[1].set_title('Processed Image')
        axes[1].axis('off')
        plt.tight_layout()
        plt.show()

# Usage example
if __name__ == "__main__":
    # Initialize processor
    processor = ImageProcessor("sample_image.jpg")
    # Apply various processing techniques
    edges = processor.detect_edges_canny()
    corners, corner_image = processor.detect_corners_harris()
    equalized = processor.apply_histogram_equalization()
    faces, face_image = processor.face_detection()
    # Display results
    processor.display_images()
2. Deep Learning with TensorFlow for Computer Vision
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, applications
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
class TensorFlowVisionModel:
    def __init__(self, input_shape=(224, 224, 3), num_classes=10):
        """Initialize TensorFlow Vision Model"""
        self.input_shape = input_shape
        self.num_classes = num_classes
        self.model = None
        self.history = None

    def build_cnn_model(self, conv_layers=(32, 64, 128), dense_layers=(512, 256), dropout_rate=0.5):
        """Build a custom CNN model"""
        model = models.Sequential()
        # Input layer
        model.add(layers.Input(shape=self.input_shape))
        # Convolutional blocks
        for filters in conv_layers:
            model.add(layers.Conv2D(filters, (3, 3), activation='relu', padding='same'))
            model.add(layers.BatchNormalization())
            model.add(layers.MaxPooling2D((2, 2)))
            model.add(layers.Dropout(0.25))
        # Flatten layer
        model.add(layers.Flatten())
        # Dense layers
        for units in dense_layers:
            model.add(layers.Dense(units, activation='relu'))
            model.add(layers.BatchNormalization())
            model.add(layers.Dropout(dropout_rate))
        # Output layer
        model.add(layers.Dense(self.num_classes, activation='softmax'))
        self.model = model
        return model
    def build_resnet_model(self, pretrained=True, fine_tune_at=100):
        """Build ResNet model with transfer learning"""
        if pretrained:
            # Load pretrained ResNet50
            base_model = applications.ResNet50(
                weights='imagenet',
                include_top=False,
                input_shape=self.input_shape
            )
            # Freeze base model layers
            base_model.trainable = False
            # Fine-tune last few layers
            for layer in base_model.layers[fine_tune_at:]:
                layer.trainable = True
        else:
            # Build ResNet from scratch
            base_model = applications.ResNet50(
                weights=None,
                include_top=False,
                input_shape=self.input_shape
            )
        # Add custom classification head
        inputs = keras.Input(shape=self.input_shape)
        x = base_model(inputs, training=False)
        x = layers.GlobalAveragePooling2D()(x)
        x = layers.Dropout(0.2)(x)
        x = layers.Dense(256, activation='relu')(x)
        x = layers.Dropout(0.2)(x)
        outputs = layers.Dense(self.num_classes, activation='softmax')(x)
        model = keras.Model(inputs, outputs)
        self.model = model
        return model
    def build_vgg_model(self, pretrained=True):
        """Build VGG model with transfer learning"""
        if pretrained:
            # Load pretrained VGG16
            base_model = applications.VGG16(
                weights='imagenet',
                include_top=False,
                input_shape=self.input_shape
            )
            # Freeze base model
            base_model.trainable = False
        else:
            # Build VGG from scratch
            base_model = applications.VGG16(
                weights=None,
                include_top=False,
                input_shape=self.input_shape
            )
        # Add custom classification head
        inputs = keras.Input(shape=self.input_shape)
        x = base_model(inputs, training=False)
        x = layers.Flatten()(x)
        x = layers.Dense(512, activation='relu')(x)
        x = layers.Dropout(0.5)(x)
        x = layers.Dense(256, activation='relu')(x)
        x = layers.Dropout(0.5)(x)
        outputs = layers.Dense(self.num_classes, activation='softmax')(x)
        model = keras.Model(inputs, outputs)
        self.model = model
        return model

    def build_efficientnet_model(self, pretrained=True):
        """Build EfficientNet model with transfer learning"""
        if pretrained:
            # Load pretrained EfficientNetB0
            base_model = applications.EfficientNetB0(
                weights='imagenet',
                include_top=False,
                input_shape=self.input_shape
            )
            # Freeze base model
            base_model.trainable = False
        else:
            # Build EfficientNet from scratch
            base_model = applications.EfficientNetB0(
                weights=None,
                include_top=False,
                input_shape=self.input_shape
            )
        # Add custom classification head
        inputs = keras.Input(shape=self.input_shape)
        x = base_model(inputs, training=False)
        x = layers.GlobalAveragePooling2D()(x)
        x = layers.Dropout(0.2)(x)
        x = layers.Dense(128, activation='relu')(x)
        x = layers.Dropout(0.2)(x)
        outputs = layers.Dense(self.num_classes, activation='softmax')(x)
        model = keras.Model(inputs, outputs)
        self.model = model
        return model
    def compile_model(self, optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy']):
        """Compile the model"""
        if self.model is None:
            raise ValueError("Model not built yet")
        self.model.compile(
            optimizer=optimizer,
            loss=loss,
            metrics=metrics
        )

    def train_model(self, train_data, val_data, epochs=10, callbacks=None):
        """Train the model"""
        if self.model is None:
            raise ValueError("Model not built yet")
        # Default callbacks
        if callbacks is None:
            callbacks = [
                keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True),
                keras.callbacks.ReduceLROnPlateau(factor=0.2, patience=3),
                keras.callbacks.ModelCheckpoint('best_model.h5', save_best_only=True)
            ]
        # Train model; the batch size comes from the data generators,
        # so batch_size must not be passed to fit() here
        self.history = self.model.fit(
            train_data,
            validation_data=val_data,
            epochs=epochs,
            callbacks=callbacks
        )
        return self.history

    def evaluate_model(self, test_data):
        """Evaluate the model"""
        if self.model is None:
            raise ValueError("Model not built yet")
        results = self.model.evaluate(test_data)
        return dict(zip(self.model.metrics_names, results))

    def predict(self, data):
        """Make predictions"""
        if self.model is None:
            raise ValueError("Model not built yet")
        predictions = self.model.predict(data)
        return predictions
    def plot_training_history(self):
        """Plot training history"""
        if self.history is None:
            raise ValueError("Model not trained yet")
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
        # Plot training & validation accuracy
        ax1.plot(self.history.history['accuracy'], label='Training Accuracy')
        ax1.plot(self.history.history['val_accuracy'], label='Validation Accuracy')
        ax1.set_title('Model Accuracy')
        ax1.set_xlabel('Epoch')
        ax1.set_ylabel('Accuracy')
        ax1.legend()
        ax1.grid(True)
        # Plot training & validation loss
        ax2.plot(self.history.history['loss'], label='Training Loss')
        ax2.plot(self.history.history['val_loss'], label='Validation Loss')
        ax2.set_title('Model Loss')
        ax2.set_xlabel('Epoch')
        ax2.set_ylabel('Loss')
        ax2.legend()
        ax2.grid(True)
        plt.tight_layout()
        plt.show()

    def plot_confusion_matrix(self, test_data, class_names=None):
        """Plot confusion matrix"""
        if self.model is None:
            raise ValueError("Model not built yet")
        # Get predictions
        y_pred = np.argmax(self.model.predict(test_data), axis=1)
        y_true = test_data.classes
        # Create confusion matrix
        cm = confusion_matrix(y_true, y_pred)
        # Plot
        plt.figure(figsize=(10, 8))
        sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                    xticklabels=class_names, yticklabels=class_names)
        plt.title('Confusion Matrix')
        plt.xlabel('Predicted')
        plt.ylabel('True')
        plt.show()

    def save_model(self, filepath):
        """Save the model"""
        if self.model is None:
            raise ValueError("Model not built yet")
        self.model.save(filepath)

    def load_model(self, filepath):
        """Load a saved model"""
        self.model = keras.models.load_model(filepath)
        return self.model
# Data augmentation and preprocessing
class DataAugmentation:
    def __init__(self, target_size=(224, 224), batch_size=32):
        """Initialize data augmentation"""
        self.target_size = target_size
        self.batch_size = batch_size

    def create_train_generator(self, train_dir, validation_split=0.2):
        """Create training data generator with augmentation"""
        train_datagen = keras.preprocessing.image.ImageDataGenerator(
            rescale=1./255,
            rotation_range=20,
            width_shift_range=0.2,
            height_shift_range=0.2,
            horizontal_flip=True,
            vertical_flip=False,
            zoom_range=0.2,
            shear_range=0.2,
            fill_mode='nearest',
            validation_split=validation_split
        )
        train_generator = train_datagen.flow_from_directory(
            train_dir,
            target_size=self.target_size,
            batch_size=self.batch_size,
            class_mode='sparse',
            subset='training'
        )
        validation_generator = train_datagen.flow_from_directory(
            train_dir,
            target_size=self.target_size,
            batch_size=self.batch_size,
            class_mode='sparse',
            subset='validation'
        )
        return train_generator, validation_generator

    def create_test_generator(self, test_dir):
        """Create test data generator"""
        test_datagen = keras.preprocessing.image.ImageDataGenerator(rescale=1./255)
        test_generator = test_datagen.flow_from_directory(
            test_dir,
            target_size=self.target_size,
            batch_size=self.batch_size,
            class_mode='sparse',
            shuffle=False
        )
        return test_generator
# Object Detection with TensorFlow
class ObjectDetectionModel:
    def __init__(self, model_type='yolo'):
        """Initialize object detection model"""
        self.model_type = model_type
        self.model = None

    def build_yolo_model(self, input_shape=(416, 416, 3), num_classes=20, anchors=None):
        """Build YOLO model (simplified)"""
        if anchors is None:
            # Default YOLO anchors
            anchors = np.array([
                [(10, 13), (16, 30), (33, 23)],
                [(30, 61), (62, 45), (59, 119)],
                [(116, 90), (156, 198), (373, 326)]
            ])
        # Build YOLO architecture (simplified)
        inputs = keras.Input(shape=input_shape)
        # Feature extraction (simplified backbone)
        x = layers.Conv2D(32, (3, 3), padding='same', activation='relu')(inputs)
        x = layers.MaxPooling2D((2, 2))(x)
        x = layers.Conv2D(64, (3, 3), padding='same', activation='relu')(x)
        x = layers.MaxPooling2D((2, 2))(x)
        x = layers.Conv2D(128, (3, 3), padding='same', activation='relu')(x)
        x = layers.MaxPooling2D((2, 2))(x)
        # Detection heads (simplified)
        output1 = layers.Conv2D(len(anchors[0]) * (num_classes + 5), (1, 1), activation='sigmoid')(x)
        output2 = layers.Conv2D(len(anchors[1]) * (num_classes + 5), (1, 1), activation='sigmoid')(x)
        output3 = layers.Conv2D(len(anchors[2]) * (num_classes + 5), (1, 1), activation='sigmoid')(x)
        model = keras.Model(inputs, [output1, output2, output3])
        self.model = model
        return model
    def build_ssd_model(self, input_shape=(300, 300, 3), num_classes=20):
        """Build SSD model (simplified)"""
        inputs = keras.Input(shape=input_shape)
        # Base network (simplified VGG-like)
        x = layers.Conv2D(64, (3, 3), padding='same', activation='relu')(inputs)
        x = layers.Conv2D(64, (3, 3), padding='same', activation='relu')(x)
        x = layers.MaxPooling2D((2, 2))(x)
        x = layers.Conv2D(128, (3, 3), padding='same', activation='relu')(x)
        x = layers.Conv2D(128, (3, 3), padding='same', activation='relu')(x)
        x = layers.MaxPooling2D((2, 2))(x)
        # Additional layers for multi-scale feature extraction
        x = layers.Conv2D(256, (3, 3), padding='same', activation='relu')(x)
        x = layers.Conv2D(256, (3, 3), padding='same', activation='relu')(x)
        x = layers.MaxPooling2D((2, 2))(x)
        x = layers.Conv2D(512, (3, 3), padding='same', activation='relu')(x)
        x = layers.Conv2D(512, (3, 3), padding='same', activation='relu')(x)
        x = layers.Conv2D(512, (3, 3), padding='same', activation='relu')(x)
        x = layers.MaxPooling2D((2, 2))(x)
        # Detection heads (simplified)
        detections = []
        for i in range(6):  # 6 detection layers
            detection = layers.Conv2D(num_classes + 12, (3, 3), padding='same', activation='sigmoid')(x)
            detections.append(detection)
        model = keras.Model(inputs, detections)
        self.model = model
        return model

    def compile_model(self, optimizer='adam', loss='mse', metrics=['accuracy']):
        """Compile the object detection model"""
        if self.model is None:
            raise ValueError("Model not built yet")
        self.model.compile(optimizer=optimizer, loss=loss, metrics=metrics)
    def predict_boxes(self, image, confidence_threshold=0.5, nms_threshold=0.4):
        """Predict bounding boxes (simplified post-processing)"""
        if self.model is None:
            raise ValueError("Model not built yet")
        # Preprocess image
        image = tf.expand_dims(image, axis=0)
        # Make prediction
        predictions = self.model.predict(image)
        # Decode raw predictions into boxes, scores, and classes.
        # A full implementation would apply anchor offsets here and drop
        # detections below confidence_threshold; this simplified version
        # starts from empty tensors so the NMS step below runs as-is.
        boxes = tf.zeros((0, 4))
        scores = tf.zeros((0,))
        classes = tf.zeros((0,), dtype=tf.int32)
        # Apply Non-Maximum Suppression
        indices = tf.image.non_max_suppression(
            boxes, scores, max_output_size=100, iou_threshold=nms_threshold
        )
        # Filter results
        final_boxes = tf.gather(boxes, indices)
        final_scores = tf.gather(scores, indices)
        final_classes = tf.gather(classes, indices)
        return final_boxes, final_scores, final_classes
# Semantic Segmentation Model
class SemanticSegmentationModel:
    def __init__(self, input_shape=(256, 256, 3), num_classes=10):
        """Initialize semantic segmentation model"""
        self.input_shape = input_shape
        self.num_classes = num_classes
        self.model = None

    def build_unet_model(self):
        """Build U-Net model for semantic segmentation"""
        inputs = keras.Input(shape=self.input_shape)
        # Encoder
        c1 = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(inputs)
        c1 = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(c1)
        p1 = layers.MaxPooling2D((2, 2))(c1)
        c2 = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(p1)
        c2 = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(c2)
        p2 = layers.MaxPooling2D((2, 2))(c2)
        c3 = layers.Conv2D(256, (3, 3), activation='relu', padding='same')(p2)
        c3 = layers.Conv2D(256, (3, 3), activation='relu', padding='same')(c3)
        p3 = layers.MaxPooling2D((2, 2))(c3)
        # Bridge
        c4 = layers.Conv2D(512, (3, 3), activation='relu', padding='same')(p3)
        c4 = layers.Conv2D(512, (3, 3), activation='relu', padding='same')(c4)
        # Decoder with skip connections
        u5 = layers.Conv2DTranspose(256, (2, 2), strides=(2, 2), padding='same')(c4)
        u5 = layers.concatenate([u5, c3])
        c5 = layers.Conv2D(256, (3, 3), activation='relu', padding='same')(u5)
        c5 = layers.Conv2D(256, (3, 3), activation='relu', padding='same')(c5)
        u6 = layers.Conv2DTranspose(128, (2, 2), strides=(2, 2), padding='same')(c5)
        u6 = layers.concatenate([u6, c2])
        c6 = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(u6)
        c6 = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(c6)
        u7 = layers.Conv2DTranspose(64, (2, 2), strides=(2, 2), padding='same')(c6)
        u7 = layers.concatenate([u7, c1])
        c7 = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(u7)
        c7 = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(c7)
        # Output
        outputs = layers.Conv2D(self.num_classes, (1, 1), activation='softmax')(c7)
        model = keras.Model(inputs, outputs)
        self.model = model
        return model

    def compile_model(self, optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy']):
        """Compile the segmentation model"""
        if self.model is None:
            raise ValueError("Model not built yet")
        self.model.compile(optimizer=optimizer, loss=loss, metrics=metrics)

    def predict_segmentation(self, image):
        """Predict segmentation mask"""
        if self.model is None:
            raise ValueError("Model not built yet")
        # Preprocess image
        image = tf.expand_dims(image, axis=0)
        # Make prediction
        prediction = self.model.predict(image)
        # Get class with highest probability for each pixel
        segmentation_mask = tf.argmax(prediction, axis=-1)
        return segmentation_mask[0]
# Usage example
if __name__ == "__main__":
    # Initialize model
    model = TensorFlowVisionModel(input_shape=(224, 224, 3), num_classes=10)
    # Build model
    model.build_resnet_model(pretrained=True)
    # Compile model
    model.compile_model(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    # Create data generators
    data_aug = DataAugmentation(target_size=(224, 224), batch_size=32)
    train_gen, val_gen = data_aug.create_train_generator('data/train')
    test_gen = data_aug.create_test_generator('data/test')
    # Train model
    history = model.train_model(train_gen, val_gen, epochs=20)
    # Evaluate model
    results = model.evaluate_model(test_gen)
    print(f"Test accuracy: {results['accuracy']:.4f}")
    # Plot results
    model.plot_training_history()
    model.plot_confusion_matrix(test_gen)
3. PyTorch Computer Vision Implementation
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset
import torchvision
import torchvision.transforms as transforms
from torchvision import models
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
class PyTorchVisionModel(nn.Module):
    def __init__(self, num_classes=10, input_channels=3):
        """Initialize PyTorch Vision Model"""
        super(PyTorchVisionModel, self).__init__()
        self.num_classes = num_classes
        self.input_channels = input_channels
        # Build model architecture
        self.features = self._build_features()
        self.classifier = self._build_classifier()

    def _build_features(self):
        """Build feature extraction layers"""
        return nn.Sequential(
            # First convolutional block
            nn.Conv2d(self.input_channels, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Second convolutional block
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Third convolutional block
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Fourth convolutional block
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

    def _build_classifier(self):
        """Build classification layers"""
        return nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(512 * 14 * 14, 1024),  # Assuming input size 224x224
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(1024, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, self.num_classes)
        )

    def forward(self, x):
        """Forward pass"""
        x = self.features(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x
class ResNetModel(nn.Module):
    def __init__(self, num_classes=10, pretrained=True, fine_tune=False):
        """Initialize ResNet model with transfer learning"""
        super(ResNetModel, self).__init__()
        # Load (optionally pretrained) ResNet
        self.backbone = models.resnet50(pretrained=pretrained)
        # Modify final layer
        num_features = self.backbone.fc.in_features
        self.backbone.fc = nn.Linear(num_features, num_classes)
        # Freeze layers if not fine-tuning
        if not fine_tune:
            for param in self.backbone.parameters():
                param.requires_grad = False
            # Only train the final layer
            for param in self.backbone.fc.parameters():
                param.requires_grad = True

    def forward(self, x):
        """Forward pass"""
        return self.backbone(x)

class VGGModel(nn.Module):
    def __init__(self, num_classes=10, pretrained=True, fine_tune=False):
        """Initialize VGG model with transfer learning"""
        super(VGGModel, self).__init__()
        # Load (optionally pretrained) VGG
        self.backbone = models.vgg16(pretrained=pretrained)
        # Modify classifier
        num_features = self.backbone.classifier[6].in_features
        self.backbone.classifier[6] = nn.Linear(num_features, num_classes)
        # Freeze feature layers if not fine-tuning
        if not fine_tune:
            for param in self.backbone.features.parameters():
                param.requires_grad = False
            # Only train the classifier
            for param in self.backbone.classifier.parameters():
                param.requires_grad = True

    def forward(self, x):
        """Forward pass"""
        return self.backbone(x)

class EfficientNetModel(nn.Module):
    def __init__(self, num_classes=10, pretrained=True, fine_tune=False):
        """Initialize EfficientNet model with transfer learning"""
        super(EfficientNetModel, self).__init__()
        # Load (optionally pretrained) EfficientNet
        self.backbone = models.efficientnet_b0(pretrained=pretrained)
        # Modify classifier
        num_features = self.backbone.classifier[1].in_features
        self.backbone.classifier[1] = nn.Linear(num_features, num_classes)
        # Freeze layers if not fine-tuning
        if not fine_tune:
            for param in self.backbone.parameters():
                param.requires_grad = False
            # Only train the classifier
            for param in self.backbone.classifier.parameters():
                param.requires_grad = True

    def forward(self, x):
        """Forward pass"""
        return self.backbone(x)
import os
from PIL import Image

class CustomDataset(Dataset):
    """Custom dataset for image classification"""
    def __init__(self, data_dir, transform=None):
        self.data_dir = data_dir
        self.transform = transform
        self.images = []
        self.labels = []
        self.class_to_idx = {}
        # Load dataset
        self._load_dataset()

    def _load_dataset(self):
        """Load images and labels from directory"""
        # Only directories count as classes (skip stray files)
        classes = sorted(d for d in os.listdir(self.data_dir)
                         if os.path.isdir(os.path.join(self.data_dir, d)))
        self.class_to_idx = {cls: idx for idx, cls in enumerate(classes)}
        for class_name in classes:
            class_dir = os.path.join(self.data_dir, class_name)
            class_idx = self.class_to_idx[class_name]
            for img_name in os.listdir(class_dir):
                if img_name.lower().endswith(('.png', '.jpg', '.jpeg')):
                    img_path = os.path.join(class_dir, img_name)
                    self.images.append(img_path)
                    self.labels.append(class_idx)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img_path = self.images[idx]
        label = self.labels[idx]
        # Load image
        image = Image.open(img_path).convert('RGB')
        # Apply transforms
        if self.transform:
            image = self.transform(image)
        return image, label
class Trainer:
    """Training class for PyTorch models"""
    def __init__(self, model, device='cuda' if torch.cuda.is_available() else 'cpu'):
        self.model = model.to(device)
        self.device = device
        self.train_losses = []
        self.train_accuracies = []
        self.val_losses = []
        self.val_accuracies = []

    def train(self, train_loader, val_loader, epochs=10, learning_rate=0.001):
        """Train the model"""
        criterion = nn.CrossEntropyLoss()
        optimizer = optim.Adam(self.model.parameters(), lr=learning_rate)
        scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)
        for epoch in range(epochs):
            # Training phase
            self.model.train()
            train_loss = 0.0
            train_correct = 0
            train_total = 0
            for batch_idx, (data, target) in enumerate(train_loader):
                data, target = data.to(self.device), target.to(self.device)
                optimizer.zero_grad()
                output = self.model(data)
                loss = criterion(output, target)
                loss.backward()
                optimizer.step()
                train_loss += loss.item()
                _, predicted = torch.max(output.data, 1)
                train_total += target.size(0)
                train_correct += (predicted == target).sum().item()
            # Validation phase
            self.model.eval()
            val_loss = 0.0
            val_correct = 0
            val_total = 0
            with torch.no_grad():
                for data, target in val_loader:
                    data, target = data.to(self.device), target.to(self.device)
                    output = self.model(data)
                    loss = criterion(output, target)
                    val_loss += loss.item()
                    _, predicted = torch.max(output.data, 1)
                    val_total += target.size(0)
                    val_correct += (predicted == target).sum().item()
            # Calculate metrics
            train_accuracy = 100 * train_correct / train_total
            val_accuracy = 100 * val_correct / val_total
            # Store metrics
            self.train_losses.append(train_loss / len(train_loader))
            self.train_accuracies.append(train_accuracy)
            self.val_losses.append(val_loss / len(val_loader))
            self.val_accuracies.append(val_accuracy)
            # Print progress
            print(f'Epoch {epoch+1}/{epochs}:')
            print(f'Train Loss: {train_loss/len(train_loader):.4f}, Train Acc: {train_accuracy:.2f}%')
            print(f'Val Loss: {val_loss/len(val_loader):.4f}, Val Acc: {val_accuracy:.2f}%')
            print('-' * 50)
            scheduler.step()

    def evaluate(self, test_loader):
        """Evaluate the model"""
        self.model.eval()
        test_loss = 0.0
        test_correct = 0
        test_total = 0
        all_predictions = []
        all_targets = []
        criterion = nn.CrossEntropyLoss()
        with torch.no_grad():
            for data, target in test_loader:
                data, target = data.to(self.device), target.to(self.device)
                output = self.model(data)
                loss = criterion(output, target)
                test_loss += loss.item()
                _, predicted = torch.max(output.data, 1)
                test_total += target.size(0)
                test_correct += (predicted == target).sum().item()
                all_predictions.extend(predicted.cpu().numpy())
                all_targets.extend(target.cpu().numpy())
        test_accuracy = 100 * test_correct / test_total
        print(f'Test Loss: {test_loss/len(test_loader):.4f}')
        print(f'Test Accuracy: {test_accuracy:.2f}%')
        return test_accuracy, all_predictions, all_targets

    def plot_training_history(self):
        """Plot training history"""
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
        # Plot training & validation loss
        ax1.plot(self.train_losses, label='Training Loss')
        ax1.plot(self.val_losses, label='Validation Loss')
        ax1.set_title('Model Loss')
        ax1.set_xlabel('Epoch')
        ax1.set_ylabel('Loss')
        ax1.legend()
        ax1.grid(True)
        # Plot training & validation accuracy
        ax2.plot(self.train_accuracies, label='Training Accuracy')
        ax2.plot(self.val_accuracies, label='Validation Accuracy')
        ax2.set_title('Model Accuracy')
        ax2.set_xlabel('Epoch')
        ax2.set_ylabel('Accuracy (%)')
        ax2.legend()
        ax2.grid(True)
        plt.tight_layout()
        plt.show()

    def plot_confusion_matrix(self, predictions, targets, class_names=None):
        """Plot confusion matrix"""
        cm = confusion_matrix(targets, predictions)
        plt.figure(figsize=(10, 8))
        sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                    xticklabels=class_names, yticklabels=class_names)
        plt.title('Confusion Matrix')
        plt.xlabel('Predicted')
        plt.ylabel('True')
        plt.show()

    def save_model(self, filepath):
        """Save the model"""
        torch.save(self.model.state_dict(), filepath)

    def load_model(self, filepath):
        """Load a saved model"""
        self.model.load_state_dict(torch.load(filepath))
        return self.model
class ObjectDetectionModel(nn.Module):
    """Object Detection Model using PyTorch"""
    def __init__(self, num_classes=20, backbone='resnet50'):
        super(ObjectDetectionModel, self).__init__()
        self.num_classes = num_classes
        self.backbone_name = backbone
        # Build backbone; the feature-map depth depends on the backbone
        if backbone == 'resnet50':
            self.backbone = models.resnet50(pretrained=True)
            self.backbone = nn.Sequential(*list(self.backbone.children())[:-2])
            feature_channels = 2048  # ResNet50 final feature map depth
        elif backbone == 'vgg16':
            self.backbone = models.vgg16(pretrained=True)
            self.backbone = nn.Sequential(*list(self.backbone.features.children()))
            feature_channels = 512  # VGG16 final feature map depth
        else:
            raise ValueError(f"Unsupported backbone: {backbone}")
        # Detection head (simplified)
        self.classifier = nn.Conv2d(feature_channels, num_classes * 5, 3, padding=1)  # 5 = 4 bbox + 1 confidence

    def forward(self, x):
        """Forward pass"""
        features = self.backbone(x)
        detections = self.classifier(features)
        return detections
class SemanticSegmentationModel(nn.Module):
    """Semantic Segmentation Model using PyTorch"""
    def __init__(self, num_classes=10):
        super(SemanticSegmentationModel, self).__init__()
        self.num_classes = num_classes
        # Encoder
        self.encoder1 = self._conv_block(3, 64)
        self.encoder2 = self._conv_block(64, 128)
        self.encoder3 = self._conv_block(128, 256)
        self.encoder4 = self._conv_block(256, 512)
        # Bridge
        self.bridge = self._conv_block(512, 1024)
        # Decoder
        self.decoder4 = self._conv_block(1024 + 512, 512)
        self.decoder3 = self._conv_block(512 + 256, 256)
        self.decoder2 = self._conv_block(256 + 128, 128)
        self.decoder1 = self._conv_block(128 + 64, 64)
        # Output
        self.output = nn.Conv2d(64, num_classes, 1)
        # Pooling and upsampling
        self.pool = nn.MaxPool2d(2)
        self.upsample = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)

    def _conv_block(self, in_channels, out_channels):
        """Convolutional block"""
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        """Forward pass"""
        # Encoder
        enc1 = self.encoder1(x)
        enc2 = self.encoder2(self.pool(enc1))
        enc3 = self.encoder3(self.pool(enc2))
        enc4 = self.encoder4(self.pool(enc3))
        # Bridge
        bridge = self.bridge(self.pool(enc4))
        # Decoder with skip connections
        dec4 = self.decoder4(torch.cat([self.upsample(bridge), enc4], dim=1))
        dec3 = self.decoder3(torch.cat([self.upsample(dec4), enc3], dim=1))
        dec2 = self.decoder2(torch.cat([self.upsample(dec3), enc2], dim=1))
        dec1 = self.decoder1(torch.cat([self.upsample(dec2), enc1], dim=1))
        # Output
        output = self.output(dec1)
        return output
# Data augmentation and preprocessing
def get_transforms(image_size=224):
    """Get data transforms"""
    train_transform = transforms.Compose([
        transforms.Resize((image_size, image_size)),
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(10),
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
        transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    val_transform = transforms.Compose([
        transforms.Resize((image_size, image_size)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    return train_transform, val_transform
# Usage example
if __name__ == "__main__":
    # Set device
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"Using device: {device}")
    # Get transforms
    train_transform, val_transform = get_transforms(image_size=224)
    # Create datasets
    train_dataset = CustomDataset('data/train', transform=train_transform)
    val_dataset = CustomDataset('data/val', transform=val_transform)
    test_dataset = CustomDataset('data/test', transform=val_transform)
    # Create data loaders
    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
    test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
    # Create model
    model = ResNetModel(num_classes=10, pretrained=True, fine_tune=False)
    # Create trainer
    trainer = Trainer(model, device=device)
    # Train model
    trainer.train(train_loader, val_loader, epochs=20, learning_rate=0.001)
    # Evaluate model
    test_accuracy, predictions, targets = trainer.evaluate(test_loader)
    # Plot results
    trainer.plot_training_history()
    trainer.plot_confusion_matrix(predictions, targets, class_names=list(train_dataset.class_to_idx.keys()))
    # Save model
    trainer.save_model('best_model.pth')
Computer Vision Architekturen
CNN Architektur Übersicht
graph TD
A[Input Image] --> B[Conv Layer 1]
B --> C[ReLU Activation]
C --> D[Max Pooling]
D --> E[Conv Layer 2]
E --> F[ReLU Activation]
F --> G[Max Pooling]
G --> H[Conv Layer 3]
H --> I[ReLU Activation]
I --> J[Global Average Pooling]
J --> K[Fully Connected]
K --> L[Output Layer]
A1[224x224x3] --> B
B1[224x224x64] --> C
C1[224x224x64] --> D
D1[112x112x64] --> E
E1[112x112x128] --> F
F1[112x112x128] --> G
G1[56x56x128] --> H
H1[56x56x256] --> I
I1[56x56x256] --> J
J1[256] --> K
K1[512] --> L
L1[10] --> M[Predictions]
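Die im Diagramm annotierten Feature-Map-Größen lassen sich direkt nachrechnen: Jede 3x3-Faltung mit Padding 1 erhält die räumliche Auflösung, jedes 2x2-Max-Pooling mit Stride 2 halbiert Höhe und Breite, und das Global Average Pooling reduziert die letzte Feature-Map auf einen Vektor der Kanallänge. Eine kleine Rechenskizze (reine Python-Arithmetik, die Blockstruktur folgt dem Diagramm):

```python
def cnn_shapes(input_size=224):
    """Reproduziert die Feature-Map-Größen aus dem Diagramm."""
    shapes = [(input_size, input_size, 3)]          # Eingabebild
    size = input_size
    # (Kanalzahl, folgt ein Pooling?) je Conv-Block laut Diagramm
    for channels, pooled in [(64, True), (128, True), (256, False)]:
        shapes.append((size, size, channels))       # 3x3-Conv, Padding 1: Auflösung unverändert
        if pooled:
            size //= 2                              # 2x2-MaxPooling, Stride 2: halbiert
            shapes.append((size, size, channels))
    shapes.append((shapes[-1][2],))                 # Global Average Pooling -> Kanalvektor
    return shapes

for shape in cnn_shapes():
    print(shape)
```

Das Ergebnis enthält genau die Annotationen aus dem Diagramm: 224x224x64 nach der ersten Faltung, 112x112x64 nach dem ersten Pooling, und so weiter bis zum 256er-Vektor vor den Fully-Connected-Schichten.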
Deep Learning Modelle Vergleich
CNN Architekturen
| Modell | Parameter | Top-1 Accuracy | Größe | Anwendung |
|---|---|---|---|---|
| VGG16 | 138M | 71.5% | 528 MB | Klassifikation |
| ResNet50 | 25.6M | 76.1% | 98 MB | Transfer Learning |
| EfficientNet-B0 | 5.3M | 77.1% | 20 MB | Mobile |
| MobileNetV2 | 3.5M | 71.8% | 14 MB | Edge Devices |
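Die Spalte "Größe" folgt direkt aus der Parameterzahl: Bei float32-Gewichten belegt jeder Parameter 4 Byte. Eine kurze Plausibilitätsrechnung (die exakten Parameterzahlen entsprechen den torchvision-Implementierungen für 1000 ImageNet-Klassen; kleine Abweichungen zur Tabelle sind Rundungseffekte):

```python
# Parameterzahlen der torchvision-Modelle (ImageNet, 1000 Klassen)
params = {
    "VGG16": 138_357_544,
    "ResNet50": 25_557_032,
    "EfficientNet-B0": 5_288_548,
    "MobileNetV2": 3_504_872,
}

def model_size_mb(num_params, bytes_per_param=4):
    """Speicherbedarf der Gewichte in MiB (float32: 4 Byte pro Parameter)."""
    return num_params * bytes_per_param / (1024 ** 2)

for name, n in params.items():
    print(f"{name}: {n / 1e6:.1f}M Parameter, ca. {model_size_mb(n):.0f} MB")
```

VGG16 landet so bei rund 528 MB, ResNet50 bei knapp 98 MB, also genau bei den Tabellenwerten, was zeigt, dass die Modellgröße praktisch vollständig von der Parameterzahl bestimmt wird.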
Object Detection Modelle
| Modell | mAP | FPS | Backbone | Anwendung |
|---|---|---|---|---|
| YOLOv5 | 56.8% | 140 | CSPDarknet | Real-time |
| SSD | 46.5% | 46 | VGG16 | Balanced |
| Faster R-CNN | 42.0% | 7 | ResNet101 | Accuracy |
| RetinaNet | 39.1% | 15 | ResNet50 | Focal Loss |
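Die mAP-Werte in der Tabelle basieren auf der Intersection over Union (IoU) zwischen vorhergesagter und wahrer Bounding Box: Eine Detektion zählt erst ab einem IoU-Schwellwert (typisch 0.5) als Treffer. Zur Einordnung eine minimale IoU-Implementierung (Boxen hier im Format (x1, y1, x2, y2)):

```python
def iou(box_a, box_b):
    """Intersection over Union zweier Boxen im Format (x1, y1, x2, y2)."""
    # Schnittrechteck bestimmen
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    # Vereinigungsfläche = Summe der Einzelflächen minus Schnittfläche
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Beispiel: zwei teilweise überlappende Boxen
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```

Identische Boxen liefern 1.0, disjunkte Boxen 0.0; die mAP mittelt dann die Precision über alle Klassen und Recall-Stufen bei festem IoU-Schwellwert.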
Bildverarbeitungstechniken
Filter und Transformationen
# Advanced image processing techniques
import cv2
import numpy as np

class AdvancedImageProcessing:
    def gaussian_blur(self, image, kernel_size=(5, 5), sigma=1.0):
        """Apply Gaussian blur"""
        return cv2.GaussianBlur(image, kernel_size, sigma)

    def bilateral_filter(self, image, d=9, sigma_color=75, sigma_space=75):
        """Apply bilateral filter for edge-preserving smoothing"""
        return cv2.bilateralFilter(image, d, sigma_color, sigma_space)

    def morphological_gradient(self, image, kernel_size=3):
        """Apply morphological gradient"""
        kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernel_size, kernel_size))
        return cv2.morphologyEx(image, cv2.MORPH_GRADIENT, kernel)

    def adaptive_histogram_equalization(self, image, clip_limit=2.0, tile_grid_size=(8, 8)):
        """Apply CLAHE (Contrast Limited Adaptive Histogram Equalization)"""
        clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid_size)
        return clahe.apply(image)

    def fourier_transform(self, image):
        """Apply Fourier Transform and return the magnitude spectrum"""
        # Convert to grayscale if needed
        if len(image.shape) == 3:
            gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
        else:
            gray = image
        # Apply FFT
        f_transform = np.fft.fft2(gray)
        f_shift = np.fft.fftshift(f_transform)
        # Get magnitude spectrum (+1 avoids log(0))
        magnitude_spectrum = 20 * np.log(np.abs(f_shift) + 1)
        return magnitude_spectrum

    def edge_detection_laplacian(self, image):
        """Apply Laplacian edge detection"""
        return cv2.Laplacian(image, cv2.CV_64F)

    def hough_lines(self, image, rho=1, theta=np.pi/180, threshold=100):
        """Detect lines using Hough Transform"""
        edges = cv2.Canny(image, 50, 150, apertureSize=3)
        lines = cv2.HoughLines(edges, rho, theta, threshold)
        return lines

    def hough_circles(self, image, dp=1, min_dist=20, param1=50, param2=30, min_radius=0, max_radius=0):
        """Detect circles using Hough Transform"""
        gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY) if len(image.shape) == 3 else image
        circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp, min_dist,
                                   param1=param1, param2=param2,
                                   minRadius=min_radius, maxRadius=max_radius)
        return circles
Vorteile und Nachteile
Vorteile von Computer Vision
- Automatisierung: Manuelle Bildanalyse wird überflüssig
- Skalierbarkeit: Große Datenmengen können verarbeitet werden
- Konsistenz: Gleichbleibende Qualität der Analyse
- 24/7 Betrieb: Kontinuierliche Verarbeitung möglich
- Kosteneffizienz: Reduzierung manueller Arbeit
Nachteile
- Datenabhängigkeit: Hoher Bedarf an Trainingsdaten
- Rechenintensität: Benötigt leistungsfähige Hardware
- Interpretierbarkeit: Black-Box-Problem bei Deep Learning
- Bias: Kann Vorurteile aus Trainingsdaten lernen
- Komplexität: Implementierung ist komplex
Häufige Prüfungsfragen
- Was ist der Unterschied zwischen Bildverarbeitung und Computer Vision? Bildverarbeitung konzentriert sich auf die Verbesserung von Bildern, Computer Vision auf die Interpretation und das Verständnis visueller Daten.
- Erklären Sie die Funktionsweise von Convolutional Neural Networks! CNNs verwenden Faltungsschichten zur Feature-Extraktion, gefolgt von Pooling-Schichten zur Dimensionsreduktion und Fully-Connected-Schichten zur Klassifikation.
- Wann verwendet man welche CNN-Architektur? VGG für einfache Klassifikation, ResNet für Transfer Learning, EfficientNet für mobile Anwendungen, MobileNet für Edge Devices.
- Was ist der Zweck von Data Augmentation? Data Augmentation erzeugt künstlich zusätzliche Trainingsdaten durch Transformationen wie Rotation, Skalierung und Farbänderungen und verbessert so die Generalisierung.
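Die in der letzten Antwort genannten Transformationen lassen sich schon mit reinem NumPy skizzieren (eine vereinfachte Illustration des Prinzips; in der Praxis übernehmen das torchvision.transforms oder die Keras-Generatoren):

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(image):
    """Einfache Data Augmentation: Spiegeln, Rotation, Helligkeit."""
    if rng.random() < 0.5:
        image = np.fliplr(image)                   # horizontale Spiegelung
    image = np.rot90(image, k=rng.integers(0, 4))  # Rotation um 0/90/180/270 Grad
    factor = rng.uniform(0.8, 1.2)                 # zufällige Helligkeitsänderung
    image = np.clip(image * factor, 0.0, 1.0)      # Wertebereich [0, 1] erhalten
    return image

original = rng.random((32, 32, 3))                 # Beispielbild, Werte in [0, 1]
augmented = augment(original)
print(original.shape, augmented.shape)
```

Jeder Aufruf liefert eine andere Variante desselben Bildes; das Label bleibt dabei unverändert, wodurch der effektive Trainingsdatensatz wächst, ohne neue Bilder aufnehmen zu müssen.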