Computer Vision Fundamentals: Image Processing, Object Detection, OpenCV, TensorFlow & PyTorch
This post is a comprehensive introduction to the fundamentals of computer vision, covering image processing, object detection, OpenCV, TensorFlow, and PyTorch, with practical examples.
In a Nutshell
Computer vision is a field of artificial intelligence that enables computers to interpret and understand visual information from images and videos.
Compact Technical Description
Computer vision is the automatic analysis and interpretation of visual data by computers; it combines image processing, pattern recognition, and machine learning.
Core components:
Image Processing
- Image filters: Gaussian filter, median filter, edge detection
- Color spaces: RGB, HSV, grayscale, YUV
- Geometric transformations: scaling, rotation, translation
- Histograms: histogram equalization, binarization
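The histogram equalization mentioned above can be sketched in a few lines of NumPy. This is a minimal illustration of the idea (mapping intensities so the cumulative distribution becomes roughly linear); the `equalize_histogram` helper and the tiny 2×2 test image are illustrative, not from any library:

```python
import numpy as np

def equalize_histogram(gray):
    """Histogram equalization for an 8-bit grayscale array."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    # Stretch the cumulative distribution to cover the full [0, 255] range
    cdf_min = cdf[cdf > 0].min()
    lut = np.clip(np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255),
                  0, 255).astype(np.uint8)
    # Remap every pixel through the lookup table
    return lut[gray]

img = np.array([[50, 50], [51, 52]], dtype=np.uint8)
print(equalize_histogram(img))
```

A narrow intensity range (50 to 52 here) gets spread across the whole 0 to 255 range, which is exactly what `cv2.equalizeHist` does internally.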
Feature Extraction
- Edge detection: Canny, Sobel, Laplacian
- Corner detection: Harris, Shi-Tomasi, FAST
- Descriptors: SIFT, SURF, ORB, HOG
- Segmentation: Watershed, GrabCut, k-means
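Sobel edge detection from the list above reduces to convolving the image with two 3×3 kernels and combining the horizontal and vertical responses. This naive NumPy sketch (the `sobel_magnitude` helper is hypothetical and trades speed for clarity) shows the principle:

```python
import numpy as np

def sobel_magnitude(gray):
    """Gradient magnitude via 3x3 Sobel kernels (naive loop version)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T  # vertical kernel is the transpose
    g = gray.astype(float)
    h, w = g.shape
    out = np.zeros_like(g)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = g[y - 1:y + 2, x - 1:x + 2]
            gx = np.sum(kx * patch)  # horizontal gradient
            gy = np.sum(ky * patch)  # vertical gradient
            out[y, x] = np.hypot(gx, gy)
    return out

# A vertical step edge produces a strong horizontal gradient response
img = np.array([[0, 0, 255, 255]] * 4)
print(sobel_magnitude(img))
```

`cv2.Sobel` computes the same derivatives with separable filtering and proper border handling; this loop exists only to make the kernel arithmetic visible.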
Deep Learning for CV
- Convolutional neural networks: CNN architectures
- Transfer learning: VGG, ResNet, EfficientNet
- Object detection: YOLO, SSD, R-CNN
- Semantic segmentation: U-Net, Mask R-CNN
Frameworks and Libraries
- OpenCV: Open Source Computer Vision Library
- TensorFlow: deep learning framework
- PyTorch: dynamic deep learning framework
- Keras: high-level neural network API
Exam-Relevant Key Points
- Computer vision: automatic image analysis and interpretation
- Image processing: filters, transformations, histograms
- Feature extraction: edges, corners, descriptors, segmentation
- Deep learning: CNNs, transfer learning, object detection
- OpenCV: open-source computer vision library
- TensorFlow: deep learning framework by Google
- PyTorch: dynamic deep learning framework
- Object detection: YOLO, SSD, R-CNN, face detection
- IHK-relevant: modern image processing and AI applications
Core Components
- Image acquisition: cameras, sensors, data formats
- Preprocessing: filtering, normalization, augmentation
- Feature extraction: edges, corners, textures, shapes
- Pattern recognition: classification, clustering, segmentation
- Deep learning: CNNs, transfer learning, fine-tuning
- Object detection: detection, tracking, localization
- Applications: face recognition, OCR, autonomous driving
- Evaluation: metrics, validation, performance
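For the evaluation point above, the standard metric in object detection is Intersection over Union (IoU): the overlap area of predicted and ground-truth boxes divided by the area of their union. A minimal sketch with corner-format boxes (the `iou` helper is illustrative):

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    # Width/height clamped to zero when the boxes do not overlap
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))
```

A detection typically counts as correct when its IoU with a ground-truth box exceeds a threshold such as 0.5; the same measure also drives non-maximum suppression.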
Practical Examples
1. Image Processing with OpenCV and Python
import cv2
import numpy as np
import matplotlib.pyplot as plt
class ImageProcessor:
    def __init__(self, image_path=None):
        """Initialize image processor"""
        self.image = None
        self.processed_image = None
        self.gray_image = None
        if image_path:
            self.load_image(image_path)

    def load_image(self, image_path):
        """Load image from file"""
        self.image = cv2.imread(image_path)
        if self.image is None:
            raise ValueError(f"Could not load image from {image_path}")
        # Convert BGR to RGB for matplotlib
        self.image = cv2.cvtColor(self.image, cv2.COLOR_BGR2RGB)
        self.gray_image = cv2.cvtColor(self.image, cv2.COLOR_RGB2GRAY)
        return self.image

    def resize_image(self, width, height, interpolation=cv2.INTER_LINEAR):
        """Resize image to specified dimensions"""
        if self.image is None:
            raise ValueError("No image loaded")
        self.processed_image = cv2.resize(self.image, (width, height), interpolation=interpolation)
        return self.processed_image

    def rotate_image(self, angle, center=None, scale=1.0):
        """Rotate image by specified angle"""
        if self.image is None:
            raise ValueError("No image loaded")
        h, w = self.image.shape[:2]
        if center is None:
            center = (w // 2, h // 2)
        # Get rotation matrix
        rotation_matrix = cv2.getRotationMatrix2D(center, angle, scale)
        # Apply rotation
        self.processed_image = cv2.warpAffine(self.image, rotation_matrix, (w, h))
        return self.processed_image

    def apply_gaussian_blur(self, kernel_size=(5, 5), sigma=0):
        """Apply Gaussian blur to reduce noise"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        self.processed_image = cv2.GaussianBlur(self.gray_image, kernel_size, sigma)
        return self.processed_image

    def apply_median_filter(self, kernel_size=5):
        """Apply median filter for noise reduction"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        self.processed_image = cv2.medianBlur(self.gray_image, kernel_size)
        return self.processed_image

    def detect_edges_canny(self, low_threshold=50, high_threshold=150):
        """Detect edges using Canny edge detection"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        # Apply Gaussian blur first
        blurred = cv2.GaussianBlur(self.gray_image, (5, 5), 0)
        # Canny edge detection
        edges = cv2.Canny(blurred, low_threshold, high_threshold)
        self.processed_image = edges
        return edges
    def detect_edges_sobel(self, ksize=3):
        """Detect edges using Sobel operator"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        # Sobel derivatives
        grad_x = cv2.Sobel(self.gray_image, cv2.CV_64F, 1, 0, ksize=ksize)
        grad_y = cv2.Sobel(self.gray_image, cv2.CV_64F, 0, 1, ksize=ksize)
        # Convert to absolute values
        abs_grad_x = cv2.convertScaleAbs(grad_x)
        abs_grad_y = cv2.convertScaleAbs(grad_y)
        # Combine gradients
        edges = cv2.addWeighted(abs_grad_x, 0.5, abs_grad_y, 0.5, 0)
        self.processed_image = edges
        return edges

    def detect_corners_harris(self, block_size=2, ksize=3, k=0.04, threshold=0.01):
        """Detect corners using Harris corner detection"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        # Harris corner response
        response = cv2.cornerHarris(self.gray_image, block_size, ksize, k)
        # Threshold the raw response before normalizing for display
        corner_mask = response > threshold * response.max()
        corners = cv2.normalize(response, None, 0, 255, cv2.NORM_MINMAX, cv2.CV_8U)
        # Mark detected corners in the output image
        output = self.image.copy()
        output[corner_mask] = [255, 0, 0]  # Red corners
        self.processed_image = output
        return corners, output

    def apply_histogram_equalization(self):
        """Apply histogram equalization to improve contrast"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        equalized = cv2.equalizeHist(self.gray_image)
        self.processed_image = equalized
        return equalized

    def adaptive_threshold(self, max_value=255, adaptive_method=cv2.ADAPTIVE_THRESH_GAUSSIAN_C,
                           threshold_type=cv2.THRESH_BINARY, block_size=11, C=2):
        """Apply adaptive thresholding"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        # Apply adaptive threshold
        thresholded = cv2.adaptiveThreshold(
            self.gray_image, max_value, adaptive_method, threshold_type, block_size, C
        )
        self.processed_image = thresholded
        return thresholded

    def morphological_operations(self, operation='opening', kernel_size=5, kernel_shape=cv2.MORPH_ELLIPSE):
        """Apply morphological operations"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        # Create structuring element
        kernel = cv2.getStructuringElement(kernel_shape, (kernel_size, kernel_size))
        # Apply operation
        if operation == 'erosion':
            result = cv2.erode(self.gray_image, kernel, iterations=1)
        elif operation == 'dilation':
            result = cv2.dilate(self.gray_image, kernel, iterations=1)
        elif operation == 'opening':
            result = cv2.morphologyEx(self.gray_image, cv2.MORPH_OPEN, kernel)
        elif operation == 'closing':
            result = cv2.morphologyEx(self.gray_image, cv2.MORPH_CLOSE, kernel)
        else:
            raise ValueError(f"Unknown operation: {operation}")
        self.processed_image = result
        return result
    def watershed_segmentation(self):
        """Apply watershed segmentation"""
        if self.image is None:
            raise ValueError("No image loaded")
        # Convert to grayscale
        gray = cv2.cvtColor(self.image, cv2.COLOR_RGB2GRAY)
        # Apply threshold
        _, thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        # Noise removal
        kernel = np.ones((3, 3), np.uint8)
        opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, kernel, iterations=2)
        # Sure background area
        sure_bg = cv2.dilate(opening, kernel, iterations=3)
        # Sure foreground area
        dist_transform = cv2.distanceTransform(opening, cv2.DIST_L2, 5)
        _, sure_fg = cv2.threshold(dist_transform, 0.7 * dist_transform.max(), 255, 0)
        sure_fg = np.uint8(sure_fg)
        # Unknown region
        unknown = cv2.subtract(sure_bg, sure_fg)
        # Label markers
        _, markers = cv2.connectedComponents(sure_fg)
        markers = markers + 1
        markers[unknown == 255] = 0
        # Apply watershed
        markers = cv2.watershed(self.image, markers)
        # Create output
        output = self.image.copy()
        output[markers == -1] = [255, 0, 0]  # Watershed boundaries in red
        self.processed_image = output
        return markers, output

    def contour_detection(self, mode=cv2.RETR_EXTERNAL, method=cv2.CHAIN_APPROX_SIMPLE):
        """Detect and draw contours"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        # findContours expects a binary image, so threshold first
        _, binary = cv2.threshold(self.gray_image, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        contours, hierarchy = cv2.findContours(binary, mode, method)
        # Draw contours
        output = self.image.copy()
        cv2.drawContours(output, contours, -1, (0, 255, 0), 2)
        self.processed_image = output
        return contours, hierarchy, output

    def template_matching(self, template_path, method=cv2.TM_CCOEFF_NORMED):
        """Template matching"""
        if self.image is None:
            raise ValueError("No image loaded")
        # Load template
        template = cv2.imread(template_path)
        if template is None:
            raise ValueError(f"Could not load template from {template_path}")
        template = cv2.cvtColor(template, cv2.COLOR_BGR2RGB)
        # Convert template to grayscale
        template_gray = cv2.cvtColor(template, cv2.COLOR_RGB2GRAY)
        # Template matching
        result = cv2.matchTemplate(self.gray_image, template_gray, method)
        # Find best match
        min_val, max_val, min_loc, max_loc = cv2.minMaxLoc(result)
        # Draw rectangle around match
        h, w = template_gray.shape
        top_left = max_loc
        bottom_right = (top_left[0] + w, top_left[1] + h)
        output = self.image.copy()
        cv2.rectangle(output, top_left, bottom_right, (255, 0, 0), 2)
        self.processed_image = output
        return result, output

    def face_detection(self, cascade_path=None):
        """Face detection using Haar cascades"""
        if self.image is None:
            raise ValueError("No image loaded")
        # Load the face cascade shipped with OpenCV
        if cascade_path is None:
            cascade_path = cv2.data.haarcascades + 'haarcascade_frontalface_default.xml'
        face_cascade = cv2.CascadeClassifier(cascade_path)
        # Detect faces
        faces = face_cascade.detectMultiScale(
            self.gray_image, scaleFactor=1.1, minNeighbors=5, minSize=(30, 30)
        )
        # Draw rectangles around faces
        output = self.image.copy()
        for (x, y, w, h) in faces:
            cv2.rectangle(output, (x, y), (x + w, y + h), (255, 0, 0), 2)
            cv2.putText(output, 'Face', (x, y - 10), cv2.FONT_HERSHEY_SIMPLEX, 0.9, (255, 0, 0), 2)
        self.processed_image = output
        return faces, output
    def optical_flow(self, prev_frame, current_frame):
        """Calculate dense optical flow between frames (Farneback)"""
        if prev_frame is None or current_frame is None:
            raise ValueError("Both frames are required")
        # Convert to grayscale
        prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_RGB2GRAY)
        curr_gray = cv2.cvtColor(current_frame, cv2.COLOR_RGB2GRAY)
        # Dense optical flow; the sparse cv2.calcOpticalFlowPyrLK would
        # additionally need points to track (e.g. from cv2.goodFeaturesToTrack)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        return flow
    def feature_extraction_sift(self):
        """Extract SIFT features"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        # Initialize SIFT detector
        sift = cv2.SIFT_create()
        # Detect keypoints and compute descriptors
        keypoints, descriptors = sift.detectAndCompute(self.gray_image, None)
        # Draw keypoints
        output = cv2.drawKeypoints(self.image, keypoints, None, (0, 255, 0),
                                   flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
        self.processed_image = output
        return keypoints, descriptors, output

    def feature_extraction_orb(self):
        """Extract ORB features"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        # Initialize ORB detector
        orb = cv2.ORB_create()
        # Detect keypoints and compute descriptors
        keypoints, descriptors = orb.detectAndCompute(self.gray_image, None)
        # Draw keypoints
        output = cv2.drawKeypoints(self.image, keypoints, None, (0, 255, 0), flags=0)
        self.processed_image = output
        return keypoints, descriptors, output

    def feature_matching(self, image1, image2, method='sift', ratio_threshold=0.75):
        """Match features between two images"""
        # Convert to grayscale
        gray1 = cv2.cvtColor(image1, cv2.COLOR_RGB2GRAY)
        gray2 = cv2.cvtColor(image2, cv2.COLOR_RGB2GRAY)
        # Choose feature detector
        if method == 'sift':
            detector = cv2.SIFT_create()
        elif method == 'orb':
            detector = cv2.ORB_create()
        else:
            raise ValueError(f"Unknown method: {method}")
        # Detect keypoints and compute descriptors
        kp1, des1 = detector.detectAndCompute(gray1, None)
        kp2, des2 = detector.detectAndCompute(gray2, None)
        # Feature matching
        if method == 'sift':
            # k-NN matching with Lowe's ratio test (requires crossCheck=False)
            matcher = cv2.BFMatcher(cv2.NORM_L2)
            knn_matches = matcher.knnMatch(des1, des2, k=2)
            matches = [m for m, n in knn_matches if m.distance < ratio_threshold * n.distance]
        else:  # ORB uses Hamming distance
            matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
            matches = matcher.match(des1, des2)
        # Sort matches by distance
        matches = sorted(matches, key=lambda x: x.distance)
        # Draw matches
        output = cv2.drawMatches(image1, kp1, image2, kp2, matches[:50], None,
                                 flags=cv2.DRAW_MATCHES_FLAGS_NOT_DRAW_SINGLE_POINTS)
        return matches, output

    def histogram_analysis(self):
        """Analyze image histogram"""
        if self.image is None:
            raise ValueError("No image loaded")
        # Calculate histograms for each channel (image is stored as RGB)
        hist_r = cv2.calcHist([self.image], [0], None, [256], [0, 256])
        hist_g = cv2.calcHist([self.image], [1], None, [256], [0, 256])
        hist_b = cv2.calcHist([self.image], [2], None, [256], [0, 256])
        return hist_r, hist_g, hist_b

    def noise_detection(self):
        """Estimate noise/blur via Laplacian variance"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        # Low variance indicates a blurred (or very smooth) image
        laplacian_var = cv2.Laplacian(self.gray_image, cv2.CV_64F).var()
        return laplacian_var
    def image_quality_assessment(self):
        """Assess image quality"""
        if self.gray_image is None:
            raise ValueError("No image loaded")
        # Calculate various quality metrics
        metrics = {}
        # Sharpness (Laplacian variance)
        metrics['sharpness'] = cv2.Laplacian(self.gray_image, cv2.CV_64F).var()
        # Contrast (standard deviation)
        metrics['contrast'] = self.gray_image.std()
        # Brightness (mean intensity)
        metrics['brightness'] = self.gray_image.mean()
        # Noise level (estimated)
        blurred = cv2.GaussianBlur(self.gray_image, (5, 5), 0)
        metrics['noise'] = np.mean(np.abs(self.gray_image.astype(float) - blurred.astype(float)))
        return metrics

    def save_image(self, output_path):
        """Save processed image"""
        if self.processed_image is None:
            raise ValueError("No processed image to save")
        # Convert RGB to BGR for OpenCV
        if len(self.processed_image.shape) == 3:
            output_image = cv2.cvtColor(self.processed_image, cv2.COLOR_RGB2BGR)
        else:
            output_image = self.processed_image
        cv2.imwrite(output_path, output_image)

    def display_images(self, titles=None):
        """Display original and processed images"""
        if self.image is None:
            raise ValueError("No image loaded")
        fig, axes = plt.subplots(1, 2, figsize=(12, 6))
        # Display original image
        axes[0].imshow(self.image)
        axes[0].set_title('Original Image')
        axes[0].axis('off')
        # Display processed image
        if self.processed_image is not None:
            if len(self.processed_image.shape) == 2:
                axes[1].imshow(self.processed_image, cmap='gray')
            else:
                axes[1].imshow(self.processed_image)
        else:
            axes[1].imshow(self.image)
        if titles and len(titles) == 2:
            axes[1].set_title(titles[1])
        else:
            axes[1].set_title('Processed Image')
        axes[1].axis('off')
        plt.tight_layout()
        plt.show()

# Usage example
if __name__ == "__main__":
    # Initialize processor
    processor = ImageProcessor("sample_image.jpg")
    # Apply various processing techniques
    edges = processor.detect_edges_canny()
    corners, corner_image = processor.detect_corners_harris()
    equalized = processor.apply_histogram_equalization()
    faces, face_image = processor.face_detection()
    # Display results
    processor.display_images()
2. Deep Learning with TensorFlow for Computer Vision
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers, models, applications
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
class TensorFlowVisionModel:
    def __init__(self, input_shape=(224, 224, 3), num_classes=10):
        """Initialize TensorFlow Vision Model"""
        self.input_shape = input_shape
        self.num_classes = num_classes
        self.model = None
        self.history = None

    def build_cnn_model(self, conv_layers=(32, 64, 128), dense_layers=(512, 256), dropout_rate=0.5):
        """Build a custom CNN model"""
        model = models.Sequential()
        # Input layer
        model.add(layers.Input(shape=self.input_shape))
        # Convolutional blocks
        for filters in conv_layers:
            model.add(layers.Conv2D(filters, (3, 3), activation='relu', padding='same'))
            model.add(layers.BatchNormalization())
            model.add(layers.MaxPooling2D((2, 2)))
            model.add(layers.Dropout(0.25))
        # Flatten layer
        model.add(layers.Flatten())
        # Dense layers
        for units in dense_layers:
            model.add(layers.Dense(units, activation='relu'))
            model.add(layers.BatchNormalization())
            model.add(layers.Dropout(dropout_rate))
        # Output layer
        model.add(layers.Dense(self.num_classes, activation='softmax'))
        self.model = model
        return model
    def build_resnet_model(self, pretrained=True, fine_tune_at=100):
        """Build ResNet model with transfer learning"""
        if pretrained:
            # Load pretrained ResNet50
            base_model = applications.ResNet50(
                weights='imagenet',
                include_top=False,
                input_shape=self.input_shape
            )
            # Freeze base model layers
            base_model.trainable = False
            # Fine-tune last few layers
            for layer in base_model.layers[fine_tune_at:]:
                layer.trainable = True
        else:
            # Build ResNet from scratch
            base_model = applications.ResNet50(
                weights=None,
                include_top=False,
                input_shape=self.input_shape
            )
        # Add custom classification head
        inputs = keras.Input(shape=self.input_shape)
        x = base_model(inputs, training=False)
        x = layers.GlobalAveragePooling2D()(x)
        x = layers.Dropout(0.2)(x)
        x = layers.Dense(256, activation='relu')(x)
        x = layers.Dropout(0.2)(x)
        outputs = layers.Dense(self.num_classes, activation='softmax')(x)
        model = keras.Model(inputs, outputs)
        self.model = model
        return model
    def build_vgg_model(self, pretrained=True):
        """Build VGG model with transfer learning"""
        if pretrained:
            # Load pretrained VGG16
            base_model = applications.VGG16(
                weights='imagenet',
                include_top=False,
                input_shape=self.input_shape
            )
            # Freeze base model
            base_model.trainable = False
        else:
            # Build VGG from scratch
            base_model = applications.VGG16(
                weights=None,
                include_top=False,
                input_shape=self.input_shape
            )
        # Add custom classification head
        inputs = keras.Input(shape=self.input_shape)
        x = base_model(inputs, training=False)
        x = layers.Flatten()(x)
        x = layers.Dense(512, activation='relu')(x)
        x = layers.Dropout(0.5)(x)
        x = layers.Dense(256, activation='relu')(x)
        x = layers.Dropout(0.5)(x)
        outputs = layers.Dense(self.num_classes, activation='softmax')(x)
        model = keras.Model(inputs, outputs)
        self.model = model
        return model

    def build_efficientnet_model(self, pretrained=True):
        """Build EfficientNet model with transfer learning"""
        if pretrained:
            # Load pretrained EfficientNetB0
            base_model = applications.EfficientNetB0(
                weights='imagenet',
                include_top=False,
                input_shape=self.input_shape
            )
            # Freeze base model
            base_model.trainable = False
        else:
            # Build EfficientNet from scratch
            base_model = applications.EfficientNetB0(
                weights=None,
                include_top=False,
                input_shape=self.input_shape
            )
        # Add custom classification head
        inputs = keras.Input(shape=self.input_shape)
        x = base_model(inputs, training=False)
        x = layers.GlobalAveragePooling2D()(x)
        x = layers.Dropout(0.2)(x)
        x = layers.Dense(128, activation='relu')(x)
        x = layers.Dropout(0.2)(x)
        outputs = layers.Dense(self.num_classes, activation='softmax')(x)
        model = keras.Model(inputs, outputs)
        self.model = model
        return model
    def compile_model(self, optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy']):
        """Compile the model"""
        if self.model is None:
            raise ValueError("Model not built yet")
        self.model.compile(
            optimizer=optimizer,
            loss=loss,
            metrics=metrics
        )

    def train_model(self, train_data, val_data, epochs=10, callbacks=None):
        """Train the model"""
        if self.model is None:
            raise ValueError("Model not built yet")
        # Default callbacks
        if callbacks is None:
            callbacks = [
                keras.callbacks.EarlyStopping(patience=5, restore_best_weights=True),
                keras.callbacks.ReduceLROnPlateau(factor=0.2, patience=3),
                keras.callbacks.ModelCheckpoint('best_model.h5', save_best_only=True)
            ]
        # Train model; the batch size comes from the data generators,
        # so batch_size must not be passed to fit() here
        self.history = self.model.fit(
            train_data,
            validation_data=val_data,
            epochs=epochs,
            callbacks=callbacks
        )
        return self.history

    def evaluate_model(self, test_data):
        """Evaluate the model"""
        if self.model is None:
            raise ValueError("Model not built yet")
        results = self.model.evaluate(test_data)
        return dict(zip(self.model.metrics_names, results))

    def predict(self, data):
        """Make predictions"""
        if self.model is None:
            raise ValueError("Model not built yet")
        predictions = self.model.predict(data)
        return predictions
    def plot_training_history(self):
        """Plot training history"""
        if self.history is None:
            raise ValueError("Model not trained yet")
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
        # Plot training & validation accuracy
        ax1.plot(self.history.history['accuracy'], label='Training Accuracy')
        ax1.plot(self.history.history['val_accuracy'], label='Validation Accuracy')
        ax1.set_title('Model Accuracy')
        ax1.set_xlabel('Epoch')
        ax1.set_ylabel('Accuracy')
        ax1.legend()
        ax1.grid(True)
        # Plot training & validation loss
        ax2.plot(self.history.history['loss'], label='Training Loss')
        ax2.plot(self.history.history['val_loss'], label='Validation Loss')
        ax2.set_title('Model Loss')
        ax2.set_xlabel('Epoch')
        ax2.set_ylabel('Loss')
        ax2.legend()
        ax2.grid(True)
        plt.tight_layout()
        plt.show()

    def plot_confusion_matrix(self, test_data, class_names=None):
        """Plot confusion matrix"""
        if self.model is None:
            raise ValueError("Model not built yet")
        # Get predictions
        y_pred = np.argmax(self.model.predict(test_data), axis=1)
        y_true = test_data.classes
        # Create confusion matrix
        cm = confusion_matrix(y_true, y_pred)
        # Plot
        plt.figure(figsize=(10, 8))
        sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                    xticklabels=class_names, yticklabels=class_names)
        plt.title('Confusion Matrix')
        plt.xlabel('Predicted')
        plt.ylabel('True')
        plt.show()

    def save_model(self, filepath):
        """Save the model"""
        if self.model is None:
            raise ValueError("Model not built yet")
        self.model.save(filepath)

    def load_model(self, filepath):
        """Load a saved model"""
        self.model = keras.models.load_model(filepath)
        return self.model
# Data augmentation and preprocessing
class DataAugmentation:
    def __init__(self, target_size=(224, 224), batch_size=32):
        """Initialize data augmentation"""
        self.target_size = target_size
        self.batch_size = batch_size

    def create_train_generator(self, train_dir, validation_split=0.2):
        """Create training data generator with augmentation"""
        train_datagen = keras.preprocessing.image.ImageDataGenerator(
            rescale=1./255,
            rotation_range=20,
            width_shift_range=0.2,
            height_shift_range=0.2,
            horizontal_flip=True,
            vertical_flip=False,
            zoom_range=0.2,
            shear_range=0.2,
            fill_mode='nearest',
            validation_split=validation_split
        )
        train_generator = train_datagen.flow_from_directory(
            train_dir,
            target_size=self.target_size,
            batch_size=self.batch_size,
            class_mode='sparse',
            subset='training'
        )
        validation_generator = train_datagen.flow_from_directory(
            train_dir,
            target_size=self.target_size,
            batch_size=self.batch_size,
            class_mode='sparse',
            subset='validation'
        )
        return train_generator, validation_generator

    def create_test_generator(self, test_dir):
        """Create test data generator"""
        test_datagen = keras.preprocessing.image.ImageDataGenerator(rescale=1./255)
        test_generator = test_datagen.flow_from_directory(
            test_dir,
            target_size=self.target_size,
            batch_size=self.batch_size,
            class_mode='sparse',
            shuffle=False
        )
        return test_generator
# Object Detection with TensorFlow
class ObjectDetectionModel:
    def __init__(self, model_type='yolo'):
        """Initialize object detection model"""
        self.model_type = model_type
        self.model = None

    def build_yolo_model(self, input_shape=(416, 416, 3), num_classes=20, anchors=None):
        """Build YOLO model (simplified)"""
        if anchors is None:
            # Default YOLO anchors
            anchors = np.array([
                [(10, 13), (16, 30), (33, 23)],
                [(30, 61), (62, 45), (59, 119)],
                [(116, 90), (156, 198), (373, 326)]
            ])
        # Build YOLO architecture (simplified)
        inputs = keras.Input(shape=input_shape)
        # Feature extraction (simplified backbone)
        x = layers.Conv2D(32, (3, 3), padding='same', activation='relu')(inputs)
        x = layers.MaxPooling2D((2, 2))(x)
        x = layers.Conv2D(64, (3, 3), padding='same', activation='relu')(x)
        x = layers.MaxPooling2D((2, 2))(x)
        x = layers.Conv2D(128, (3, 3), padding='same', activation='relu')(x)
        x = layers.MaxPooling2D((2, 2))(x)
        # Detection heads (simplified)
        output1 = layers.Conv2D(len(anchors[0]) * (num_classes + 5), (1, 1), activation='sigmoid')(x)
        output2 = layers.Conv2D(len(anchors[1]) * (num_classes + 5), (1, 1), activation='sigmoid')(x)
        output3 = layers.Conv2D(len(anchors[2]) * (num_classes + 5), (1, 1), activation='sigmoid')(x)
        model = keras.Model(inputs, [output1, output2, output3])
        self.model = model
        return model
    def build_ssd_model(self, input_shape=(300, 300, 3), num_classes=20):
        """Build SSD model (simplified)"""
        inputs = keras.Input(shape=input_shape)
        # Base network (simplified VGG-like)
        x = layers.Conv2D(64, (3, 3), padding='same', activation='relu')(inputs)
        x = layers.Conv2D(64, (3, 3), padding='same', activation='relu')(x)
        x = layers.MaxPooling2D((2, 2))(x)
        x = layers.Conv2D(128, (3, 3), padding='same', activation='relu')(x)
        x = layers.Conv2D(128, (3, 3), padding='same', activation='relu')(x)
        x = layers.MaxPooling2D((2, 2))(x)
        # Additional layers for multi-scale feature extraction
        x = layers.Conv2D(256, (3, 3), padding='same', activation='relu')(x)
        x = layers.Conv2D(256, (3, 3), padding='same', activation='relu')(x)
        x = layers.MaxPooling2D((2, 2))(x)
        x = layers.Conv2D(512, (3, 3), padding='same', activation='relu')(x)
        x = layers.Conv2D(512, (3, 3), padding='same', activation='relu')(x)
        x = layers.Conv2D(512, (3, 3), padding='same', activation='relu')(x)
        x = layers.MaxPooling2D((2, 2))(x)
        # Detection heads (simplified)
        detections = []
        for i in range(6):  # 6 detection layers
            detection = layers.Conv2D(num_classes + 12, (3, 3), padding='same', activation='sigmoid')(x)
            detections.append(detection)
        model = keras.Model(inputs, detections)
        self.model = model
        return model

    def compile_model(self, optimizer='adam', loss='mse', metrics=['accuracy']):
        """Compile the object detection model"""
        if self.model is None:
            raise ValueError("Model not built yet")
        self.model.compile(optimizer=optimizer, loss=loss, metrics=metrics)
    def predict_boxes(self, image, confidence_threshold=0.5, nms_threshold=0.4):
        """Predict bounding boxes (simplified post-processing)"""
        if self.model is None:
            raise ValueError("Model not built yet")
        # Preprocess image
        image = tf.expand_dims(image, axis=0)
        # Make prediction
        predictions = self.model.predict(image)
        # Decode raw predictions into boxes, scores, and classes.
        # A full implementation would apply anchor offsets here and drop
        # detections below confidence_threshold; this simplified version
        # starts from empty tensors so the NMS step below runs as-is.
        boxes = tf.zeros((0, 4))
        scores = tf.zeros((0,))
        classes = tf.zeros((0,), dtype=tf.int32)
        # Apply Non-Maximum Suppression
        indices = tf.image.non_max_suppression(
            boxes, scores, max_output_size=100, iou_threshold=nms_threshold
        )
        # Filter results
        final_boxes = tf.gather(boxes, indices)
        final_scores = tf.gather(scores, indices)
        final_classes = tf.gather(classes, indices)
        return final_boxes, final_scores, final_classes
# Semantic Segmentation Model
class SemanticSegmentationModel:
    def __init__(self, input_shape=(256, 256, 3), num_classes=10):
        """Initialize semantic segmentation model"""
        self.input_shape = input_shape
        self.num_classes = num_classes
        self.model = None

    def build_unet_model(self):
        """Build U-Net model for semantic segmentation"""
        inputs = keras.Input(shape=self.input_shape)
        # Encoder
        c1 = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(inputs)
        c1 = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(c1)
        p1 = layers.MaxPooling2D((2, 2))(c1)
        c2 = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(p1)
        c2 = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(c2)
        p2 = layers.MaxPooling2D((2, 2))(c2)
        c3 = layers.Conv2D(256, (3, 3), activation='relu', padding='same')(p2)
        c3 = layers.Conv2D(256, (3, 3), activation='relu', padding='same')(c3)
        p3 = layers.MaxPooling2D((2, 2))(c3)
        # Bridge
        c4 = layers.Conv2D(512, (3, 3), activation='relu', padding='same')(p3)
        c4 = layers.Conv2D(512, (3, 3), activation='relu', padding='same')(c4)
        # Decoder with skip connections
        u5 = layers.Conv2DTranspose(256, (2, 2), strides=(2, 2), padding='same')(c4)
        u5 = layers.concatenate([u5, c3])
        c5 = layers.Conv2D(256, (3, 3), activation='relu', padding='same')(u5)
        c5 = layers.Conv2D(256, (3, 3), activation='relu', padding='same')(c5)
        u6 = layers.Conv2DTranspose(128, (2, 2), strides=(2, 2), padding='same')(c5)
        u6 = layers.concatenate([u6, c2])
        c6 = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(u6)
        c6 = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(c6)
        u7 = layers.Conv2DTranspose(64, (2, 2), strides=(2, 2), padding='same')(c6)
        u7 = layers.concatenate([u7, c1])
        c7 = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(u7)
        c7 = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(c7)
        # Output
        outputs = layers.Conv2D(self.num_classes, (1, 1), activation='softmax')(c7)
        model = keras.Model(inputs, outputs)
        self.model = model
        return model

    def compile_model(self, optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy']):
        """Compile the segmentation model"""
        if self.model is None:
            raise ValueError("Model not built yet")
        self.model.compile(optimizer=optimizer, loss=loss, metrics=metrics)

    def predict_segmentation(self, image):
        """Predict segmentation mask"""
        if self.model is None:
            raise ValueError("Model not built yet")
        # Preprocess image
        image = tf.expand_dims(image, axis=0)
        # Make prediction
        prediction = self.model.predict(image)
        # Get class with highest probability for each pixel
        segmentation_mask = tf.argmax(prediction, axis=-1)
        return segmentation_mask[0]
# Usage example
if __name__ == "__main__":
    # Initialize model
    model = TensorFlowVisionModel(input_shape=(224, 224, 3), num_classes=10)
    # Build model
    model.build_resnet_model(pretrained=True)
    # Compile model
    model.compile_model(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
    # Create data generators
    data_aug = DataAugmentation(target_size=(224, 224), batch_size=32)
    train_gen, val_gen = data_aug.create_train_generator('data/train')
    test_gen = data_aug.create_test_generator('data/test')
    # Train model
    history = model.train_model(train_gen, val_gen, epochs=20)
    # Evaluate model
    results = model.evaluate_model(test_gen)
    print(f"Test accuracy: {results['accuracy']:.4f}")
    # Plot results
    model.plot_training_history()
    model.plot_confusion_matrix(test_gen)
3. PyTorch Computer Vision Implementation
import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import DataLoader, Dataset
import torchvision
import torchvision.transforms as transforms
from torchvision import models
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import classification_report, confusion_matrix
import seaborn as sns
class PyTorchVisionModel(nn.Module):
    def __init__(self, num_classes=10, input_channels=3):
        """Initialize PyTorch Vision Model"""
        super(PyTorchVisionModel, self).__init__()
        self.num_classes = num_classes
        self.input_channels = input_channels
        # Build model architecture
        self.features = self._build_features()
        self.classifier = self._build_classifier()

    def _build_features(self):
        """Build feature extraction layers"""
        return nn.Sequential(
            # First convolutional block
            nn.Conv2d(self.input_channels, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Second convolutional block
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.BatchNorm2d(128),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Third convolutional block
            nn.Conv2d(128, 256, kernel_size=3, padding=1),
            nn.BatchNorm2d(256),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            # Fourth convolutional block
            nn.Conv2d(256, 512, kernel_size=3, padding=1),
            nn.BatchNorm2d(512),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )

    def _build_classifier(self):
        """Build classification layers"""
        return nn.Sequential(
            nn.Dropout(0.5),
            nn.Linear(512 * 14 * 14, 1024),  # Assuming input size 224x224
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(1024, 512),
            nn.ReLU(inplace=True),
            nn.Linear(512, self.num_classes)
        )

    def forward(self, x):
        """Forward pass"""
        x = self.features(x)
        x = torch.flatten(x, 1)
        x = self.classifier(x)
        return x
class ResNetModel(nn.Module):
    def __init__(self, num_classes=10, pretrained=True, fine_tune=False):
        """Initialize ResNet model with transfer learning"""
        super(ResNetModel, self).__init__()
        # Load (optionally pretrained) ResNet
        self.backbone = models.resnet50(pretrained=pretrained)
        # Modify final layer
        num_features = self.backbone.fc.in_features
        self.backbone.fc = nn.Linear(num_features, num_classes)
        # Freeze layers if not fine-tuning
        if not fine_tune:
            for param in self.backbone.parameters():
                param.requires_grad = False
            # Only train the final layer
            for param in self.backbone.fc.parameters():
                param.requires_grad = True

    def forward(self, x):
        """Forward pass"""
        return self.backbone(x)

class VGGModel(nn.Module):
    def __init__(self, num_classes=10, pretrained=True, fine_tune=False):
        """Initialize VGG model with transfer learning"""
        super(VGGModel, self).__init__()
        # Load (optionally pretrained) VGG
        self.backbone = models.vgg16(pretrained=pretrained)
        # Modify classifier
        num_features = self.backbone.classifier[6].in_features
        self.backbone.classifier[6] = nn.Linear(num_features, num_classes)
        # Freeze feature layers if not fine-tuning
        if not fine_tune:
            for param in self.backbone.features.parameters():
                param.requires_grad = False
            # Only train the classifier
            for param in self.backbone.classifier.parameters():
                param.requires_grad = True

    def forward(self, x):
        """Forward pass"""
        return self.backbone(x)

class EfficientNetModel(nn.Module):
    def __init__(self, num_classes=10, pretrained=True, fine_tune=False):
        """Initialize EfficientNet model with transfer learning"""
        super(EfficientNetModel, self).__init__()
        # Load (optionally pretrained) EfficientNet
        self.backbone = models.efficientnet_b0(pretrained=pretrained)
        # Modify classifier
        num_features = self.backbone.classifier[1].in_features
        self.backbone.classifier[1] = nn.Linear(num_features, num_classes)
        # Freeze layers if not fine-tuning
        if not fine_tune:
            for param in self.backbone.parameters():
                param.requires_grad = False
            # Only train the classifier
            for param in self.backbone.classifier.parameters():
                param.requires_grad = True

    def forward(self, x):
        """Forward pass"""
        return self.backbone(x)
import os
from PIL import Image

class CustomDataset(Dataset):
    """Custom dataset for image classification"""
    def __init__(self, data_dir, transform=None):
        self.data_dir = data_dir
        self.transform = transform
        self.images = []
        self.labels = []
        self.class_to_idx = {}
        # Load dataset
        self._load_dataset()

    def _load_dataset(self):
        """Load images and labels from directory"""
        # Only directories count as classes (skip stray files)
        classes = sorted(d for d in os.listdir(self.data_dir)
                         if os.path.isdir(os.path.join(self.data_dir, d)))
        self.class_to_idx = {cls: idx for idx, cls in enumerate(classes)}
        for class_name in classes:
            class_dir = os.path.join(self.data_dir, class_name)
            class_idx = self.class_to_idx[class_name]
            for img_name in os.listdir(class_dir):
                if img_name.lower().endswith(('.png', '.jpg', '.jpeg')):
                    img_path = os.path.join(class_dir, img_name)
                    self.images.append(img_path)
                    self.labels.append(class_idx)

    def __len__(self):
        return len(self.images)

    def __getitem__(self, idx):
        img_path = self.images[idx]
        label = self.labels[idx]
        # Load image
        image = Image.open(img_path).convert('RGB')
        # Apply transforms
        if self.transform:
            image = self.transform(image)
        return image, label
class Trainer:
    """Training class for PyTorch models"""
    def __init__(self, model, device='cuda' if torch.cuda.is_available() else 'cpu'):
        self.model = model.to(device)
        self.device = device
        self.train_losses = []
        self.train_accuracies = []
        self.val_losses = []
        self.val_accuracies = []

    def train(self, train_loader, val_loader, epochs=10, learning_rate=0.001):
        """Train the model"""
        criterion = nn.CrossEntropyLoss()
        optimizer = optim.Adam(self.model.parameters(), lr=learning_rate)
        scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=7, gamma=0.1)
        for epoch in range(epochs):
            # Training phase
            self.model.train()
            train_loss = 0.0
            train_correct = 0
            train_total = 0
            for batch_idx, (data, target) in enumerate(train_loader):
                data, target = data.to(self.device), target.to(self.device)
                optimizer.zero_grad()
                output = self.model(data)
                loss = criterion(output, target)
                loss.backward()
                optimizer.step()
                train_loss += loss.item()
                _, predicted = torch.max(output.data, 1)
                train_total += target.size(0)
                train_correct += (predicted == target).sum().item()
            # Validation phase
            self.model.eval()
            val_loss = 0.0
            val_correct = 0
            val_total = 0
            with torch.no_grad():
                for data, target in val_loader:
                    data, target = data.to(self.device), target.to(self.device)
                    output = self.model(data)
                    loss = criterion(output, target)
                    val_loss += loss.item()
                    _, predicted = torch.max(output.data, 1)
                    val_total += target.size(0)
                    val_correct += (predicted == target).sum().item()
            # Calculate metrics
            train_accuracy = 100 * train_correct / train_total
            val_accuracy = 100 * val_correct / val_total
            # Store metrics
            self.train_losses.append(train_loss / len(train_loader))
            self.train_accuracies.append(train_accuracy)
            self.val_losses.append(val_loss / len(val_loader))
            self.val_accuracies.append(val_accuracy)
            # Print progress
            print(f'Epoch {epoch+1}/{epochs}:')
            print(f'Train Loss: {train_loss/len(train_loader):.4f}, Train Acc: {train_accuracy:.2f}%')
            print(f'Val Loss: {val_loss/len(val_loader):.4f}, Val Acc: {val_accuracy:.2f}%')
            print('-' * 50)
            scheduler.step()

    def evaluate(self, test_loader):
        """Evaluate the model"""
        self.model.eval()
        test_loss = 0.0
        test_correct = 0
        test_total = 0
        all_predictions = []
        all_targets = []
        criterion = nn.CrossEntropyLoss()
        with torch.no_grad():
            for data, target in test_loader:
                data, target = data.to(self.device), target.to(self.device)
                output = self.model(data)
                loss = criterion(output, target)
                test_loss += loss.item()
                _, predicted = torch.max(output.data, 1)
                test_total += target.size(0)
                test_correct += (predicted == target).sum().item()
                all_predictions.extend(predicted.cpu().numpy())
                all_targets.extend(target.cpu().numpy())
        test_accuracy = 100 * test_correct / test_total
        print(f'Test Loss: {test_loss/len(test_loader):.4f}')
        print(f'Test Accuracy: {test_accuracy:.2f}%')
        return test_accuracy, all_predictions, all_targets

    def plot_training_history(self):
        """Plot training history"""
        fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
        # Plot training & validation loss
        ax1.plot(self.train_losses, label='Training Loss')
        ax1.plot(self.val_losses, label='Validation Loss')
        ax1.set_title('Model Loss')
        ax1.set_xlabel('Epoch')
        ax1.set_ylabel('Loss')
        ax1.legend()
        ax1.grid(True)
        # Plot training & validation accuracy
        ax2.plot(self.train_accuracies, label='Training Accuracy')
        ax2.plot(self.val_accuracies, label='Validation Accuracy')
        ax2.set_title('Model Accuracy')
        ax2.set_xlabel('Epoch')
        ax2.set_ylabel('Accuracy (%)')
        ax2.legend()
        ax2.grid(True)
        plt.tight_layout()
        plt.show()

    def plot_confusion_matrix(self, predictions, targets, class_names=None):
        """Plot confusion matrix"""
        cm = confusion_matrix(targets, predictions)
        plt.figure(figsize=(10, 8))
        sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                    xticklabels=class_names, yticklabels=class_names)
        plt.title('Confusion Matrix')
        plt.xlabel('Predicted')
        plt.ylabel('True')
        plt.show()

    def save_model(self, filepath):
        """Save the model"""
        torch.save(self.model.state_dict(), filepath)

    def load_model(self, filepath):
        """Load a saved model"""
        self.model.load_state_dict(torch.load(filepath))
        return self.model
class ObjectDetectionModel(nn.Module):
    """Object Detection Model using PyTorch"""
    def __init__(self, num_classes=20, backbone='resnet50'):
        super(ObjectDetectionModel, self).__init__()
        self.num_classes = num_classes
        self.backbone_name = backbone
        # Build backbone; the feature-map depth depends on the backbone
        if backbone == 'resnet50':
            self.backbone = models.resnet50(pretrained=True)
            self.backbone = nn.Sequential(*list(self.backbone.children())[:-2])
            feature_channels = 2048  # ResNet50 final feature map depth
        elif backbone == 'vgg16':
            self.backbone = models.vgg16(pretrained=True)
            self.backbone = nn.Sequential(*list(self.backbone.features.children()))
            feature_channels = 512  # VGG16 final feature map depth
        else:
            raise ValueError(f"Unsupported backbone: {backbone}")
        # Detection head (simplified)
        self.classifier = nn.Conv2d(feature_channels, num_classes * 5, 3, padding=1)  # 5 = 4 bbox + 1 confidence

    def forward(self, x):
        """Forward pass"""
        features = self.backbone(x)
        detections = self.classifier(features)
        return detections
class SemanticSegmentationModel(nn.Module):
    """Semantic Segmentation Model using PyTorch"""
    def __init__(self, num_classes=10):
        super(SemanticSegmentationModel, self).__init__()
        self.num_classes = num_classes
        # Encoder
        self.encoder1 = self._conv_block(3, 64)
        self.encoder2 = self._conv_block(64, 128)
        self.encoder3 = self._conv_block(128, 256)
        self.encoder4 = self._conv_block(256, 512)
        # Bridge
        self.bridge = self._conv_block(512, 1024)
        # Decoder
        self.decoder4 = self._conv_block(1024 + 512, 512)
        self.decoder3 = self._conv_block(512 + 256, 256)
        self.decoder2 = self._conv_block(256 + 128, 128)
        self.decoder1 = self._conv_block(128 + 64, 64)
        # Output
        self.output = nn.Conv2d(64, num_classes, 1)
        # Pooling and upsampling
        self.pool = nn.MaxPool2d(2)
        self.upsample = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True)

    def _conv_block(self, in_channels, out_channels):
        """Convolutional block"""
        return nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(out_channels, out_channels, 3, padding=1),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(inplace=True)
        )

    def forward(self, x):
        """Forward pass"""
        # Encoder
        enc1 = self.encoder1(x)
        enc2 = self.encoder2(self.pool(enc1))
        enc3 = self.encoder3(self.pool(enc2))
        enc4 = self.encoder4(self.pool(enc3))
        # Bridge
        bridge = self.bridge(self.pool(enc4))
        # Decoder with skip connections
        dec4 = self.decoder4(torch.cat([self.upsample(bridge), enc4], dim=1))
        dec3 = self.decoder3(torch.cat([self.upsample(dec4), enc3], dim=1))
        dec2 = self.decoder2(torch.cat([self.upsample(dec3), enc2], dim=1))
        dec1 = self.decoder1(torch.cat([self.upsample(dec2), enc1], dim=1))
        # Output
        output = self.output(dec1)
        return output
# Data augmentation and preprocessing
def get_transforms(image_size=224):
    """Get data transforms"""
    train_transform = transforms.Compose([
        transforms.Resize((image_size, image_size)),
        transforms.RandomHorizontalFlip(),
        transforms.RandomRotation(10),
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2, hue=0.1),
        transforms.RandomAffine(degrees=0, translate=(0.1, 0.1)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    val_transform = transforms.Compose([
        transforms.Resize((image_size, image_size)),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
    ])
    return train_transform, val_transform
# Usage example
if __name__ == "__main__":
    # Set device
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"Using device: {device}")
    # Get transforms
    train_transform, val_transform = get_transforms(image_size=224)
    # Create datasets
    train_dataset = CustomDataset('data/train', transform=train_transform)
    val_dataset = CustomDataset('data/val', transform=val_transform)
    test_dataset = CustomDataset('data/test', transform=val_transform)
    # Create data loaders
    train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
    val_loader = DataLoader(val_dataset, batch_size=32, shuffle=False)
    test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
    # Create model
    model = ResNetModel(num_classes=10, pretrained=True, fine_tune=False)
    # Create trainer
    trainer = Trainer(model, device=device)
    # Train model
    trainer.train(train_loader, val_loader, epochs=20, learning_rate=0.001)
    # Evaluate model
    test_accuracy, predictions, targets = trainer.evaluate(test_loader)
    # Plot results
    trainer.plot_training_history()
    trainer.plot_confusion_matrix(predictions, targets, class_names=list(train_dataset.class_to_idx.keys()))
    # Save model
    trainer.save_model('best_model.pth')
Computer Vision Architekturen
CNN Architektur Übersicht
graph TD
A[Input Image] --> B[Conv Layer 1]
B --> C[ReLU Activation]
C --> D[Max Pooling]
D --> E[Conv Layer 2]
E --> F[ReLU Activation]
F --> G[Max Pooling]
G --> H[Conv Layer 3]
H --> I[ReLU Activation]
I --> J[Global Average Pooling]
J --> K[Fully Connected]
K --> L[Output Layer]
A1[224x224x3] --> B
B1[224x224x64] --> C
C1[224x224x64] --> D
D1[112x112x64] --> E
E1[112x112x128] --> F
F1[112x112x128] --> G
G1[56x56x128] --> H
H1[56x56x256] --> I
I1[56x56x256] --> J
J1[256] --> K
K1[512] --> L
L1[10] --> M[Predictions]
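Die im Diagramm annotierten Feature-Map-Größen lassen sich direkt nachrechnen: Jede 3x3-Faltung mit Padding 1 erhält die räumliche Auflösung, jedes 2x2-Max-Pooling mit Stride 2 halbiert Höhe und Breite, und das Global Average Pooling reduziert die letzte Feature-Map auf einen Vektor der Kanallänge. Eine kleine Rechenskizze (reine Python-Arithmetik, die Blockstruktur folgt dem Diagramm):

```python
def cnn_shapes(input_size=224):
    """Reproduziert die Feature-Map-Größen aus dem Diagramm."""
    shapes = [(input_size, input_size, 3)]          # Eingabebild
    size = input_size
    # (Kanalzahl, folgt ein Pooling?) je Conv-Block laut Diagramm
    for channels, pooled in [(64, True), (128, True), (256, False)]:
        shapes.append((size, size, channels))       # 3x3-Conv, Padding 1: Auflösung unverändert
        if pooled:
            size //= 2                              # 2x2-MaxPooling, Stride 2: halbiert
            shapes.append((size, size, channels))
    shapes.append((shapes[-1][2],))                 # Global Average Pooling -> Kanalvektor
    return shapes

for shape in cnn_shapes():
    print(shape)
```

Das Ergebnis enthält genau die Annotationen aus dem Diagramm: 224x224x64 nach der ersten Faltung, 112x112x64 nach dem ersten Pooling, und so weiter bis zum 256er-Vektor vor den Fully-Connected-Schichten.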
Deep Learning Modelle Vergleich
CNN Architekturen
| Modell | Parameter | Top-1 Accuracy | Größe | Anwendung |
|---|---|---|---|---|
| VGG16 | 138M | 71.5% | 528 MB | Klassifikation |
| ResNet50 | 25.6M | 76.1% | 98 MB | Transfer Learning |
| EfficientNet-B0 | 5.3M | 77.1% | 20 MB | Mobile |
| MobileNetV2 | 3.5M | 71.8% | 14 MB | Edge Devices |
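Die Spalte "Größe" folgt direkt aus der Parameterzahl: Bei float32-Gewichten belegt jeder Parameter 4 Byte. Eine kurze Plausibilitätsrechnung (die exakten Parameterzahlen entsprechen den torchvision-Implementierungen für 1000 ImageNet-Klassen; kleine Abweichungen zur Tabelle sind Rundungseffekte):

```python
# Parameterzahlen der torchvision-Modelle (ImageNet, 1000 Klassen)
params = {
    "VGG16": 138_357_544,
    "ResNet50": 25_557_032,
    "EfficientNet-B0": 5_288_548,
    "MobileNetV2": 3_504_872,
}

def model_size_mb(num_params, bytes_per_param=4):
    """Speicherbedarf der Gewichte in MiB (float32: 4 Byte pro Parameter)."""
    return num_params * bytes_per_param / (1024 ** 2)

for name, n in params.items():
    print(f"{name}: {n / 1e6:.1f}M Parameter, ca. {model_size_mb(n):.0f} MB")
```

VGG16 landet so bei rund 528 MB, ResNet50 bei knapp 98 MB, also genau bei den Tabellenwerten, was zeigt, dass die Modellgröße praktisch vollständig von der Parameterzahl bestimmt wird.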
Object Detection Modelle
| Modell | mAP | FPS | Backbone | Anwendung |
|---|---|---|---|---|
| YOLOv5 | 56.8% | 140 | CSPDarknet | Real-time |
| SSD | 46.5% | 46 | VGG16 | Balanced |
| Faster R-CNN | 42.0% | 7 | ResNet101 | Accuracy |
| RetinaNet | 39.1% | 15 | ResNet50 | Focal Loss |
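Die mAP-Werte in der Tabelle basieren auf der Intersection over Union (IoU) zwischen vorhergesagter und wahrer Bounding Box: Eine Detektion zählt erst ab einem IoU-Schwellwert (typisch 0.5) als Treffer. Zur Einordnung eine minimale IoU-Implementierung (Boxen hier im Format (x1, y1, x2, y2)):

```python
def iou(box_a, box_b):
    """Intersection over Union zweier Boxen im Format (x1, y1, x2, y2)."""
    # Schnittrechteck bestimmen
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    # Vereinigungsfläche = Summe der Einzelflächen minus Schnittfläche
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Beispiel: zwei teilweise überlappende Boxen
print(iou((0, 0, 10, 10), (5, 5, 15, 15)))  # 25 / 175 ≈ 0.143
```

Identische Boxen liefern 1.0, disjunkte Boxen 0.0; die mAP mittelt dann die Precision über alle Klassen und Recall-Stufen bei festem IoU-Schwellwert.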
Bildverarbeitungstechniken
Filter und Transformationen
# Advanced image processing techniques
import cv2
import numpy as np

class AdvancedImageProcessing:
    def gaussian_blur(self, image, kernel_size=(5, 5), sigma=1.0):
        """Apply Gaussian blur"""
        return cv2.GaussianBlur(image, kernel_size, sigma)

    def bilateral_filter(self, image, d=9, sigma_color=75, sigma_space=75):
        """Apply bilateral filter for edge-preserving smoothing"""
        return cv2.bilateralFilter(image, d, sigma_color, sigma_space)

    def morphological_gradient(self, image, kernel_size=3):
        """Apply morphological gradient"""
        kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (kernel_size, kernel_size))
        return cv2.morphologyEx(image, cv2.MORPH_GRADIENT, kernel)

    def adaptive_histogram_equalization(self, image, clip_limit=2.0, tile_grid_size=(8, 8)):
        """Apply CLAHE (Contrast Limited Adaptive Histogram Equalization)"""
        clahe = cv2.createCLAHE(clipLimit=clip_limit, tileGridSize=tile_grid_size)
        return clahe.apply(image)

    def fourier_transform(self, image):
        """Apply Fourier Transform and return the magnitude spectrum"""
        # Convert to grayscale if needed
        if len(image.shape) == 3:
            gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
        else:
            gray = image
        # Apply FFT
        f_transform = np.fft.fft2(gray)
        f_shift = np.fft.fftshift(f_transform)
        # Get magnitude spectrum (+1 avoids log(0))
        magnitude_spectrum = 20 * np.log(np.abs(f_shift) + 1)
        return magnitude_spectrum

    def edge_detection_laplacian(self, image):
        """Apply Laplacian edge detection"""
        return cv2.Laplacian(image, cv2.CV_64F)

    def hough_lines(self, image, rho=1, theta=np.pi/180, threshold=100):
        """Detect lines using Hough Transform"""
        edges = cv2.Canny(image, 50, 150, apertureSize=3)
        lines = cv2.HoughLines(edges, rho, theta, threshold)
        return lines

    def hough_circles(self, image, dp=1, min_dist=20, param1=50, param2=30, min_radius=0, max_radius=0):
        """Detect circles using Hough Transform"""
        gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY) if len(image.shape) == 3 else image
        circles = cv2.HoughCircles(gray, cv2.HOUGH_GRADIENT, dp, min_dist,
                                   param1=param1, param2=param2,
                                   minRadius=min_radius, maxRadius=max_radius)
        return circles
Vorteile und Nachteile
Vorteile von Computer Vision
- Automatisierung: Manuelle Bildanalyse wird überflüssig
- Skalierbarkeit: Große Datenmengen können verarbeitet werden
- Konsistenz: Gleichbleibende Qualität der Analyse
- 24/7 Betrieb: Kontinuierliche Verarbeitung möglich
- Kosteneffizienz: Reduzierung manueller Arbeit
Nachteile
- Datenabhängigkeit: Hoher Bedarf an Trainingsdaten
- Rechenintensität: Benötigt leistungsfähige Hardware
- Interpretierbarkeit: Black-Box-Problem bei Deep Learning
- Bias: Kann Vorurteile aus Trainingsdaten lernen
- Komplexität: Implementierung ist komplex
Häufige Prüfungsfragen
- Was ist der Unterschied zwischen Bildverarbeitung und Computer Vision? Bildverarbeitung konzentriert sich auf die Verbesserung von Bildern, Computer Vision auf die Interpretation und das Verständnis visueller Daten.
- Erklären Sie die Funktionsweise von Convolutional Neural Networks! CNNs verwenden Faltungsschichten zur Feature-Extraktion, gefolgt von Pooling-Schichten zur Dimensionsreduktion und Fully-Connected-Schichten zur Klassifikation.
- Wann verwendet man welche CNN-Architektur? VGG für einfache Klassifikation, ResNet für Transfer Learning, EfficientNet für mobile Anwendungen, MobileNet für Edge Devices.
- Was ist der Zweck von Data Augmentation? Data Augmentation erzeugt künstlich zusätzliche Trainingsdaten durch Transformationen wie Rotation, Skalierung und Farbänderungen und verbessert so die Generalisierung.
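Die in der letzten Antwort genannten Transformationen lassen sich schon mit reinem NumPy skizzieren (eine vereinfachte Illustration des Prinzips; in der Praxis übernehmen das torchvision.transforms oder die Keras-Generatoren):

```python
import numpy as np

rng = np.random.default_rng(42)

def augment(image):
    """Einfache Data Augmentation: Spiegeln, Rotation, Helligkeit."""
    if rng.random() < 0.5:
        image = np.fliplr(image)                   # horizontale Spiegelung
    image = np.rot90(image, k=rng.integers(0, 4))  # Rotation um 0/90/180/270 Grad
    factor = rng.uniform(0.8, 1.2)                 # zufällige Helligkeitsänderung
    image = np.clip(image * factor, 0.0, 1.0)      # Wertebereich [0, 1] erhalten
    return image

original = rng.random((32, 32, 3))                 # Beispielbild, Werte in [0, 1]
augmented = augment(original)
print(original.shape, augmented.shape)
```

Jeder Aufruf liefert eine andere Variante desselben Bildes; das Label bleibt dabei unverändert, wodurch der effektive Trainingsdatensatz wächst, ohne neue Bilder aufnehmen zu müssen.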