IRC-Coding

Machine Learning Fundamentals: Supervised, Unsupervised & Reinforcement Learning with Algorithms

Machine learning fundamentals covering supervised, unsupervised, and reinforcement learning: algorithms, concepts, applications, and hands-on examples in Python and Java.

schutzgeist


This post is a comprehensive introduction to the fundamentals of machine learning, covering supervised, unsupervised, and reinforcement learning with algorithms and practical examples.

In a Nutshell

Machine learning enables computers to learn from data. Supervised learning learns from labeled data, unsupervised learning finds patterns in unlabeled data, and reinforcement learning learns through rewards.

Technical Overview

Machine learning is a subfield of artificial intelligence in which algorithms learn from data without being explicitly programmed.

Learning categories:

Supervised Learning

  • Concept: learning from labeled training data
  • Goal: making predictions for new, unseen data
  • Types: classification (discrete values), regression (continuous values)
  • Algorithms: Linear Regression, Decision Trees, Random Forest, SVM, Neural Networks
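SVM appears in this algorithm list but not in the demos further down; a minimal classification sketch with scikit-learn's SVC on synthetic data (all parameter values here are illustrative, not a recommendation):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary classification problem
X, y = make_classification(n_samples=200, n_features=4, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# SVMs are sensitive to feature scale, so scaling goes into the pipeline
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
model.fit(X_train, y_train)

accuracy = model.score(X_test, y_test)
print(f"SVM test accuracy: {accuracy:.3f}")
```

The pipeline keeps the scaler fitted only on training data, which avoids leaking test-set statistics into training.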

Unsupervised Learning

  • Concept: learning without labeled data
  • Goal: discovering structures and patterns in the data
  • Types: clustering, dimensionality reduction, association
  • Algorithms: K-Means, Hierarchical Clustering, PCA, Apriori
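Apriori is the one algorithm in this list that the demos below do not cover; its core step, counting frequent itemsets by support, can be sketched in plain Python on a made-up basket dataset:

```python
from itertools import combinations

# Hypothetical shopping baskets
transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]
min_support = 0.6  # itemset must appear in at least 60% of baskets

def support(itemset):
    """Fraction of transactions containing the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

items = sorted({item for t in transactions for item in t})
frequent = {}
for size in (1, 2):  # Apriori would grow sizes until no itemset survives
    for combo in combinations(items, size):
        s = support(set(combo))
        if s >= min_support:
            frequent[combo] = s

for itemset, s in sorted(frequent.items()):
    print(itemset, f"support={s:.2f}")
```

Real Apriori prunes candidates using the frequent itemsets of the previous size; this sketch brute-forces sizes 1 and 2 for clarity.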

Reinforcement Learning

  • Concept: learning through interaction with an environment
  • Goal: maximizing cumulative reward
  • Types: model-based, model-free, multi-agent
  • Algorithms: Q-Learning, Deep Q-Networks, Policy Gradients
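The reward-driven loop described above is easiest to see in the simplest model-free setting, a multi-armed bandit with an epsilon-greedy agent (the payout probabilities are invented for this sketch):

```python
import random

random.seed(0)

# Hypothetical 3-armed bandit: each arm pays 1 with a fixed probability
arm_probs = [0.2, 0.5, 0.8]
q_values = [0.0, 0.0, 0.0]  # estimated value of each arm
counts = [0, 0, 0]
epsilon = 0.1  # exploration rate

for step in range(2000):
    if random.random() < epsilon:
        arm = random.randrange(3)  # explore a random arm
    else:
        arm = max(range(3), key=lambda a: q_values[a])  # exploit best estimate
    reward = 1 if random.random() < arm_probs[arm] else 0
    counts[arm] += 1
    # Incremental mean: Q <- Q + (r - Q) / n
    q_values[arm] += (reward - q_values[arm]) / counts[arm]

print("Estimated arm values:", [round(q, 2) for q in q_values])
```

After enough steps the estimate for the best arm dominates and the agent exploits it almost exclusively, which is the cumulative-reward maximization described above in miniature.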

Exam-Relevant Key Points

  • Machine learning: automatic learning from data
  • Supervised learning: learning from labeled data (classification, regression)
  • Unsupervised learning: learning without labels (clustering, pattern recognition)
  • Reinforcement learning: learning through rewards (agent, environment, actions)
  • Training/testing: splitting data for model validation
  • Overfitting/underfitting: model fitting problems
  • Feature engineering: data preparation and transformation
  • IHK-relevant: modern AI technologies and applications
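Feature engineering from the list above typically means scaling numeric columns and encoding categorical ones; a minimal sketch on made-up data:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical raw data: one numeric and one categorical column
df = pd.DataFrame({
    "age": [25, 32, 47, 51],
    "contract": ["monthly", "yearly", "monthly", "two_year"],
})

# Scale the numeric feature, one-hot encode the categorical one
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(), ["contract"]),
])
X = preprocess.fit_transform(df)
print(X.shape)  # 4 rows: 1 scaled column + 3 one-hot columns
```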

Core Components

  1. Data: training, validation, and test data
  2. Features: input variables and attributes
  3. Models: mathematical functions and algorithms
  4. Training: fitting the model parameters
  5. Evaluation: performance measurement and validation
  6. Prediction: predictions for new data
  7. Optimization: hyperparameter tuning
  8. Deployment: integration into production systems
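Components 1 through 7 can be strung together in a few lines of scikit-learn; deployment (8) is out of scope here, and the data is a synthetic stand-in:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split

# 1-2: data and features (synthetic stand-in)
X, y = make_classification(n_samples=300, n_features=6, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 3-4 and 7: model, training, and hyperparameter tuning in one grid search
grid = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid={"n_estimators": [50, 100], "max_depth": [3, None]},
    cv=3,
)
grid.fit(X_train, y_train)

# 5: evaluation on held-out data
test_accuracy = grid.score(X_test, y_test)
print(f"Best params: {grid.best_params_}, test accuracy: {test_accuracy:.3f}")

# 6: prediction for new data
prediction = grid.predict(X_test[:1])
```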

Practical Examples

1. Supervised Learning with Python

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, mean_squared_error, classification_report
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification, make_regression

# Supervised Learning Demo
class SupervisedLearningDemo:
    
    def __init__(self):
        self.models = {}
        self.results = {}
    
    # Linear regression
    def linear_regression_demo(self):
        print("=== Linear Regression Demo ===")
        
        # Create synthetic data
        np.random.seed(42)
        X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)
        
        # Split the data
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
        
        # Train the model
        model = LinearRegression()
        model.fit(X_train, y_train)
        
        # Predictions
        y_train_pred = model.predict(X_train)
        y_test_pred = model.predict(X_test)
        
        # Evaluation
        train_mse = mean_squared_error(y_train, y_train_pred)
        test_mse = mean_squared_error(y_test, y_test_pred)
        
        print(f"Training MSE: {train_mse:.2f}")
        print(f"Test MSE: {test_mse:.2f}")
        print(f"Coefficient: {model.coef_[0]:.2f}")
        print(f"Intercept: {model.intercept_:.2f}")
        
        # Store the results
        self.models['linear_regression'] = model
        self.results['linear_regression'] = {
            'train_mse': train_mse,
            'test_mse': test_mse,
            'r2_score': model.score(X_test, y_test)
        }
        
        return X_train, X_test, y_train, y_test, y_test_pred
    
    # Logistic regression (classification)
    def logistic_regression_demo(self):
        print("\n=== Logistic Regression Demo ===")
        
        # Create classification data
        X, y = make_classification(n_samples=200, n_features=2, n_redundant=0, 
                                  n_informative=2, random_state=42, n_clusters_per_class=1)
        
        # Split the data
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
        
        # Scale the features
        scaler = StandardScaler()
        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.transform(X_test)
        
        # Train the model
        model = LogisticRegression(random_state=42)
        model.fit(X_train_scaled, y_train)
        
        # Predictions
        y_train_pred = model.predict(X_train_scaled)
        y_test_pred = model.predict(X_test_scaled)
        
        # Evaluation
        train_accuracy = accuracy_score(y_train, y_train_pred)
        test_accuracy = accuracy_score(y_test, y_test_pred)
        
        print(f"Training Accuracy: {train_accuracy:.3f}")
        print(f"Test Accuracy: {test_accuracy:.3f}")
        print("Test Classification Report:")
        print(classification_report(y_test, y_test_pred))
        
        # Store the results
        self.models['logistic_regression'] = model
        self.results['logistic_regression'] = {
            'train_accuracy': train_accuracy,
            'test_accuracy': test_accuracy
        }
        
        return X_train_scaled, X_test_scaled, y_train, y_test, y_test_pred
    
    # Decision Tree Classifier
    def decision_tree_demo(self):
        print("\n=== Decision Tree Demo ===")
        
        # More complex classification data
        X, y = make_classification(n_samples=300, n_features=4, n_redundant=1, 
                                  n_informative=3, random_state=42, n_clusters_per_class=2)
        
        # Split the data
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
        
        # Decision trees with different depths
        depths = [3, 5, 10, None]
        
        for depth in depths:
            model = DecisionTreeClassifier(max_depth=depth, random_state=42)
            model.fit(X_train, y_train)
            
            # Predictions
            y_train_pred = model.predict(X_train)
            y_test_pred = model.predict(X_test)
            
            # Evaluation
            train_accuracy = accuracy_score(y_train, y_train_pred)
            test_accuracy = accuracy_score(y_test, y_test_pred)
            
            print(f"Max Depth {depth if depth else 'None'}:")
            print(f"  Training Accuracy: {train_accuracy:.3f}")
            print(f"  Test Accuracy: {test_accuracy:.3f}")
            
            # Detect overfitting
            overfitting = train_accuracy - test_accuracy
            if overfitting > 0.1:
                print(f"  ⚠️  Overfitting detected (diff: {overfitting:.3f})")
        
        # Store the best model
        best_model = DecisionTreeClassifier(max_depth=5, random_state=42)
        best_model.fit(X_train, y_train)
        self.models['decision_tree'] = best_model
        
        return X_train, X_test, y_train, y_test
    
    # Random Forest
    def random_forest_demo(self):
        print("\n=== Random Forest Demo ===")
        
        # Higher-dimensional data
        X, y = make_classification(n_samples=500, n_features=10, n_redundant=3, 
                                  n_informative=7, random_state=42, n_clusters_per_class=2)
        
        # Split the data
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
        
        # Random forests with varying numbers of trees
        n_estimators_list = [10, 50, 100, 200]
        
        best_accuracy = 0
        best_model = None
        
        for n_estimators in n_estimators_list:
            model = RandomForestClassifier(n_estimators=n_estimators, random_state=42)
            model.fit(X_train, y_train)
            
            # Predictions
            y_test_pred = model.predict(X_test)
            accuracy = accuracy_score(y_test, y_test_pred)
            
            print(f"Trees: {n_estimators}, Test Accuracy: {accuracy:.3f}")
            
            if accuracy > best_accuracy:
                best_accuracy = accuracy
                best_model = model
        
        print(f"\nBest Random Forest Accuracy: {best_accuracy:.3f}")
        
        # Feature Importance
        feature_importance = best_model.feature_importances_
        print("Top 5 Feature Importances:")
        for i, importance in sorted(enumerate(feature_importance), key=lambda x: x[1], reverse=True)[:5]:
            print(f"  Feature {i}: {importance:.3f}")
        
        self.models['random_forest'] = best_model
        self.results['random_forest'] = {'test_accuracy': best_accuracy}
        
        return X_train, X_test, y_train, y_test
    
    # Model Comparison
    def compare_models(self):
        print("\n=== Model Comparison ===")
        
        # Comparison table
        comparison_data = []
        
        for model_name, results in self.results.items():
            if 'test_accuracy' in results:
                comparison_data.append({
                    'Model': model_name,
                    'Test Accuracy': f"{results['test_accuracy']:.3f}"
                })
            elif 'test_mse' in results:
                comparison_data.append({
                    'Model': model_name,
                    'Test MSE': f"{results['test_mse']:.2f}",
                    'R² Score': f"{results['r2_score']:.3f}"
                })
        
        df = pd.DataFrame(comparison_data)
        print(df.to_string(index=False))
        
        return df

# Run the demo
def supervised_learning_demo():
    demo = SupervisedLearningDemo()
    
    # Linear regression
    X_lr_train, X_lr_test, y_lr_train, y_lr_test, y_lr_pred = demo.linear_regression_demo()
    
    # Logistic regression
    X_log_train, X_log_test, y_log_train, y_log_test, y_log_pred = demo.logistic_regression_demo()
    
    # Decision Tree
    X_dt_train, X_dt_test, y_dt_train, y_dt_test = demo.decision_tree_demo()
    
    # Random Forest
    X_rf_train, X_rf_test, y_rf_train, y_rf_test = demo.random_forest_demo()
    
    # Compare the models
    comparison = demo.compare_models()
    
    return demo, comparison

if __name__ == "__main__":
    demo, comparison = supervised_learning_demo()

2. Unsupervised Learning with Python

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
from sklearn.datasets import make_blobs, make_moons, load_iris

# Unsupervised Learning Demo
class UnsupervisedLearningDemo:
    
    def __init__(self):
        self.models = {}
        self.results = {}
    
    # K-Means Clustering
    def kmeans_demo(self):
        print("=== K-Means Clustering Demo ===")
        
        # Synthetic cluster data
        X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)
        
        # Scale the features
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)
        
        # K-Means with different numbers of clusters
        cluster_range = range(2, 8)
        silhouette_scores = []
        inertias = []
        
        for k in cluster_range:
            kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
            cluster_labels = kmeans.fit_predict(X_scaled)
            
            # Silhouette Score
            silhouette_avg = silhouette_score(X_scaled, cluster_labels)
            silhouette_scores.append(silhouette_avg)
            
            # Inertia (Within-cluster sum of squares)
            inertias.append(kmeans.inertia_)
            
            print(f"K={k}: Silhouette Score={silhouette_avg:.3f}, Inertia={kmeans.inertia_:.1f}")
        
        # Pick the optimal K based on the silhouette score
        optimal_k = cluster_range[np.argmax(silhouette_scores)]
        print(f"\nOptimal K based on Silhouette: {optimal_k}")
        
        # Final K-Means with the optimal K
        final_kmeans = KMeans(n_clusters=optimal_k, random_state=42, n_init=10)
        final_labels = final_kmeans.fit_predict(X_scaled)
        
        # Store the results
        self.models['kmeans'] = final_kmeans
        self.results['kmeans'] = {
            'optimal_k': optimal_k,
            'silhouette_score': max(silhouette_scores),
            'inertia': final_kmeans.inertia_
        }
        
        return X_scaled, final_labels, y_true
    
    # DBSCAN Clustering
    def dbscan_demo(self):
        print("\n=== DBSCAN Clustering Demo ===")
        
        # Non-spherical data
        X, y_true = make_moons(n_samples=200, noise=0.1, random_state=42)
        
        # Scale the features
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)
        
        # DBSCAN with different eps values
        eps_values = [0.2, 0.3, 0.4, 0.5]
        min_samples = 5
        
        for eps in eps_values:
            dbscan = DBSCAN(eps=eps, min_samples=min_samples)
            cluster_labels = dbscan.fit_predict(X_scaled)
            
            # Number of clusters (ignoring noise)
            n_clusters = len(set(cluster_labels)) - (1 if -1 in cluster_labels else 0)
            n_noise = list(cluster_labels).count(-1)
            
            if n_clusters > 1:
                silhouette_avg = silhouette_score(X_scaled, cluster_labels)
            else:
                silhouette_avg = -1
            
            print(f"eps={eps}: Clusters={n_clusters}, Noise={n_noise}, Silhouette={silhouette_avg:.3f}")
        
        # Best DBSCAN
        best_dbscan = DBSCAN(eps=0.3, min_samples=min_samples)
        best_labels = best_dbscan.fit_predict(X_scaled)
        
        self.models['dbscan'] = best_dbscan
        self.results['dbscan'] = {
            'n_clusters': len(set(best_labels)) - (1 if -1 in best_labels else 0),
            'n_noise': list(best_labels).count(-1)
        }
        
        return X_scaled, best_labels, y_true
    
    # Hierarchical Clustering
    def hierarchical_clustering_demo(self):
        print("\n=== Hierarchical Clustering Demo ===")
        
        # Iris Dataset
        iris = load_iris()
        X = iris.data
        y_true = iris.target
        
        # Scale the features
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)
        
        # Agglomerative clustering with different linkage methods
        linkage_methods = ['ward', 'complete', 'average', 'single']
        
        for linkage in linkage_methods:
            clustering = AgglomerativeClustering(n_clusters=3, linkage=linkage)
            cluster_labels = clustering.fit_predict(X_scaled)
            
            silhouette_avg = silhouette_score(X_scaled, cluster_labels)
            
            print(f"Linkage={linkage}: Silhouette Score={silhouette_avg:.3f}")
        
        # Best linkage
        best_clustering = AgglomerativeClustering(n_clusters=3, linkage='ward')
        best_labels = best_clustering.fit_predict(X_scaled)
        
        self.models['hierarchical'] = best_clustering
        self.results['hierarchical'] = {
            'silhouette_score': silhouette_score(X_scaled, best_labels)
        }
        
        return X_scaled, best_labels, y_true
    
    # PCA (Principal Component Analysis)
    def pca_demo(self):
        print("\n=== PCA Demo ===")
        
        # High-dimensional data
        np.random.seed(42)
        X = np.random.randn(100, 10)
        
        # Introduce correlations
        X[:, 1] = X[:, 0] * 0.8 + np.random.randn(100) * 0.2
        X[:, 2] = X[:, 0] * 0.6 + np.random.randn(100) * 0.4
        X[:, 3] = X[:, 1] * 0.7 + np.random.randn(100) * 0.3
        
        # Scale the features
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)
        
        # PCA with different numbers of components
        n_components_range = range(2, 11)
        explained_variances = []
        
        for n in n_components_range:
            pca = PCA(n_components=n)
            X_pca = pca.fit_transform(X_scaled)
            
            total_explained_variance = np.sum(pca.explained_variance_ratio_)
            explained_variances.append(total_explained_variance)
            
            print(f"Components={n}: Explained Variance={total_explained_variance:.3f}")
        
        # Optimal number of components for 95% explained variance
        optimal_components = next(n for n, var in zip(n_components_range, explained_variances) 
                               if var >= 0.95)
        print(f"\nOptimal components for 95% variance: {optimal_components}")
        
        # Final PCA
        final_pca = PCA(n_components=optimal_components)
        X_pca_final = final_pca.fit_transform(X_scaled)
        
        # Feature Contributions
        print("Top contributing features for first component:")
        feature_contributions = np.abs(final_pca.components_[0])
        top_features = np.argsort(feature_contributions)[-3:][::-1]
        
        for i, feature_idx in enumerate(top_features):
            print(f"  Feature {feature_idx}: {feature_contributions[feature_idx]:.3f}")
        
        self.models['pca'] = final_pca
        self.results['pca'] = {
            'optimal_components': optimal_components,
            'explained_variance': np.sum(final_pca.explained_variance_ratio_)
        }
        
        return X_scaled, X_pca_final
    
    # t-SNE for visualization
    def tsne_demo(self):
        print("\n=== t-SNE Demo ===")
        
        # Iris dataset for visualization
        iris = load_iris()
        X = iris.data
        y = iris.target
        
        # Scale the features
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)
        
        # t-SNE with different perplexity values
        perplexity_values = [5, 15, 30, 50]
        
        for perplexity in perplexity_values:
            tsne = TSNE(n_components=2, perplexity=perplexity, random_state=42)
            X_tsne = tsne.fit_transform(X_scaled)
            
            print(f"Perplexity={perplexity}: KL Divergence={tsne.kl_divergence_:.3f}")
        
        # Best t-SNE
        best_tsne = TSNE(n_components=2, perplexity=30, random_state=42)
        X_tsne_final = best_tsne.fit_transform(X_scaled)
        
        self.models['tsne'] = best_tsne
        
        return X_scaled, X_tsne_final, y
    
    # Clustering Evaluation
    def evaluate_clustering(self, X, labels, true_labels=None):
        print("\n=== Clustering Evaluation ===")
        
        # Silhouette Score
        if len(set(labels)) > 1:
            silhouette_avg = silhouette_score(X, labels)
            print(f"Silhouette Score: {silhouette_avg:.3f}")
        else:
            print("Silhouette Score: N/A (only one cluster)")
        
        # Cluster statistics
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        n_noise = list(labels).count(-1)
        
        print(f"Number of clusters: {n_clusters}")
        print(f"Number of noise points: {n_noise}")
        
        # Cluster sizes
        if n_clusters > 0:
            cluster_sizes = [np.sum(labels == i) for i in range(n_clusters)]
            print(f"Cluster sizes: {cluster_sizes}")
            print(f"Average cluster size: {np.mean(cluster_sizes):.1f}")
        
        return {
            'silhouette_score': silhouette_avg if len(set(labels)) > 1 else None,
            'n_clusters': n_clusters,
            'n_noise': n_noise
        }
    
    # Model Comparison
    def compare_clustering_models(self):
        print("\n=== Clustering Models Comparison ===")
        
        comparison_data = []
        
        for model_name, results in self.results.items():
            if 'optimal_k' in results:
                comparison_data.append({
                    'Model': model_name,
                    'Optimal K': results['optimal_k'],
                    'Silhouette Score': f"{results['silhouette_score']:.3f}"
                })
            elif 'silhouette_score' in results:
                comparison_data.append({
                    'Model': model_name,
                    'Silhouette Score': f"{results['silhouette_score']:.3f}"
                })
        
        df = pd.DataFrame(comparison_data)
        print(df.to_string(index=False))
        
        return df

# Run the demo
def unsupervised_learning_demo():
    demo = UnsupervisedLearningDemo()
    
    # K-Means
    X_km, labels_km, true_km = demo.kmeans_demo()
    demo.evaluate_clustering(X_km, labels_km, true_km)
    
    # DBSCAN
    X_db, labels_db, true_db = demo.dbscan_demo()
    demo.evaluate_clustering(X_db, labels_db, true_db)
    
    # Hierarchical Clustering
    X_hc, labels_hc, true_hc = demo.hierarchical_clustering_demo()
    demo.evaluate_clustering(X_hc, labels_hc, true_hc)
    
    # PCA
    X_pca, X_pca_transformed = demo.pca_demo()
    
    # t-SNE
    X_tsne, X_tsne_transformed, y_tsne = demo.tsne_demo()
    
    # Compare the models
    comparison = demo.compare_clustering_models()
    
    return demo, comparison

if __name__ == "__main__":
    demo, comparison = unsupervised_learning_demo()

3. Reinforcement Learning with Python

import numpy as np
import random
import matplotlib.pyplot as plt
from collections import defaultdict

# Reinforcement Learning Demo
class ReinforcementLearningDemo:
    
    def __init__(self):
        self.environments = {}
        self.agents = {}
        self.results = {}
    
    # Grid World Environment
    class GridWorld:
        def __init__(self, width=4, height=4):
            self.width = width
            self.height = height
            self.state = (0, 0)  # start position
            self.goal = (width-1, height-1)  # goal position
            self.obstacles = [(1, 1), (2, 2)]  # obstacles
            self.terminal_states = [self.goal]
            
        def reset(self):
            self.state = (0, 0)
            return self.state
        
        def step(self, action):
            x, y = self.state
            
            # Execute the action
            if action == 0:  # up
                new_state = (x, max(0, y - 1))
            elif action == 1:  # down
                new_state = (x, min(self.height - 1, y + 1))
            elif action == 2:  # left
                new_state = (max(0, x - 1), y)
            elif action == 3:  # right
                new_state = (min(self.width - 1, x + 1), y)
            else:
                new_state = self.state
            
            # Check for obstacles
            if new_state in self.obstacles:
                new_state = self.state
            
            # Compute the reward
            if new_state == self.goal:
                reward = 10
                done = True
            else:
                reward = -1  # small penalty for each step
                done = False
            
            self.state = new_state
            return new_state, reward, done
        
        def get_valid_actions(self):
            return [0, 1, 2, 3]  # up, down, left, right
        
        def render(self):
            grid = np.zeros((self.height, self.width))
            
            # Mark obstacles
            for obs in self.obstacles:
                grid[obs[1], obs[0]] = -1
            
            # Mark the goal
            grid[self.goal[1], self.goal[0]] = 10
            
            # Mark the current position
            grid[self.state[1], self.state[0]] = 1
            
            print("Grid World:")
            print(grid)
            print(f"Position: {self.state}, Goal: {self.goal}")
    
    # Q-Learning Agent
    class QLearningAgent:
        def __init__(self, state_space_size, action_space_size, learning_rate=0.1, 
                     discount_factor=0.9, epsilon=0.1):
            self.state_space_size = state_space_size
            self.action_space_size = action_space_size
            self.learning_rate = learning_rate
            self.discount_factor = discount_factor
            self.epsilon = epsilon
            
            # Initialize the Q-table
            self.q_table = defaultdict(lambda: np.zeros(action_space_size))
            
        def get_state_index(self, state):
            # Convert 2D coordinates to a 1D index
            x, y = state
            return y * 4 + x
        
        def choose_action(self, state, valid_actions):
            state_idx = self.get_state_index(state)
            
            # Epsilon-greedy strategy
            if random.random() < self.epsilon:
                return random.choice(valid_actions)
            else:
                q_values = self.q_table[state_idx]
                valid_q_values = [q_values[action] for action in valid_actions]
                max_q = max(valid_q_values)
                # Break ties between equal Q-values randomly
                best_actions = [action for action in valid_actions 
                              if q_values[action] == max_q]
                return random.choice(best_actions)
        
        def update_q_value(self, state, action, reward, next_state, valid_next_actions):
            state_idx = self.get_state_index(state)
            next_state_idx = self.get_state_index(next_state)
            
            # Update the Q-value
            current_q = self.q_table[state_idx][action]
            
            if len(valid_next_actions) > 0:
                max_next_q = max([self.q_table[next_state_idx][a] for a in valid_next_actions])
            else:
                max_next_q = 0
            
            new_q = current_q + self.learning_rate * (
                reward + self.discount_factor * max_next_q - current_q
            )
            
            self.q_table[state_idx][action] = new_q
        
        def get_policy(self):
            policy = {}
            for state_idx in self.q_table.keys():
                y = state_idx // 4
                x = state_idx % 4
                state = (x, y)
                
                valid_actions = [0, 1, 2, 3]  # all actions are valid
                q_values = self.q_table[state_idx]
                best_action = np.argmax(q_values)
                
                policy[state] = best_action
            
            return policy
    
    # Q-Learning Demo
    def q_learning_demo(self):
        print("=== Q-Learning Demo ===")
        
        # Create the environment and agent
        env = self.GridWorld(width=4, height=4)
        agent = self.QLearningAgent(state_space_size=16, action_space_size=4)
        
        # Training parameters
        episodes = 1000
        max_steps_per_episode = 100
        
        # Training
        episode_rewards = []
        
        for episode in range(episodes):
            state = env.reset()
            total_reward = 0
            done = False
            steps = 0
            
            while not done and steps < max_steps_per_episode:
                valid_actions = env.get_valid_actions()
                action = agent.choose_action(state, valid_actions)
                
                next_state, reward, done = env.step(action)
                valid_next_actions = env.get_valid_actions()
                
                # Update the Q-value
                agent.update_q_value(state, action, reward, next_state, valid_next_actions)
                
                state = next_state
                total_reward += reward
                steps += 1
            
            episode_rewards.append(total_reward)
            
            if episode % 100 == 0:
                avg_reward = np.mean(episode_rewards[-100:])
                print(f"Episode {episode}: Average Reward (last 100): {avg_reward:.2f}")
        
        # Analyze the results
        final_policy = agent.get_policy()
        
        print(f"\nFinal Policy:")
        for state, action in final_policy.items():
            action_names = {0: 'Up', 1: 'Down', 2: 'Left', 3: 'Right'}
            print(f"State {state}: {action_names[action]}")
        
        # Display the Q-table
        print(f"\nQ-Table (selected states):")
        for state_idx in [0, 5, 10, 15]:  # states along the diagonal
            y = state_idx // 4
            x = state_idx % 4
            state = (x, y)
            q_values = agent.q_table[state_idx]
            print(f"State {state}: {q_values}")
        
        self.environments['gridworld'] = env
        self.agents['qlearning'] = agent
        self.results['qlearning'] = {
            'episodes': episodes,
            'final_avg_reward': np.mean(episode_rewards[-100:]),
            'q_table_size': len(agent.q_table)
        }
        
        return episode_rewards
    
    # Simple CartPole-like environment
    class CartPoleSimple:
        def __init__(self):
            self.angle = 0  # pole angle
            self.angular_velocity = 0  # angular velocity
            self.gravity = 9.8
            self.pole_length = 1.0
            self.dt = 0.1
            
        def reset(self):
            self.angle = random.uniform(-0.1, 0.1)
            self.angular_velocity = 0
            return self.get_state()
        
        def get_state(self):
            return (self.angle, self.angular_velocity)
        
        def step(self, action):
            # Actions: 0 = left, 1 = right
            force = -10 if action == 0 else 10
            
            # Physics update (simplified)
            angular_acceleration = (self.gravity / self.pole_length) * np.sin(self.angle) + force
            
            self.angular_velocity += angular_acceleration * self.dt
            self.angle += self.angular_velocity * self.dt
            
            # Reward and done condition
            if abs(self.angle) > np.pi / 4:  # pole falls over
                reward = -10
                done = True
            else:
                reward = 1  # reward for staying balanced
                done = False
            
            return self.get_state(), reward, done
        
        def render(self):
            print(f"Angle: {self.angle:.3f} rad ({np.degrees(self.angle):.1f}°), "
                  f"Angular Velocity: {self.angular_velocity:.3f}")
    
    # Policy gradient agent (simplified)
    class PolicyGradientAgent:
        def __init__(self, state_dim=2, action_dim=2, learning_rate=0.01):
            self.state_dim = state_dim
            self.action_dim = action_dim
            self.learning_rate = learning_rate
            
            # Simple linear policy
            self.weights = np.random.randn(state_dim, action_dim) * 0.1
            
        def get_action_probabilities(self, state):
            # Softmax over a linear combination of the state
            logits = np.dot(state, self.weights)
            exp_logits = np.exp(logits - np.max(logits))
            return exp_logits / np.sum(exp_logits)
        
        def choose_action(self, state):
            action_probs = self.get_action_probabilities(state)
            return np.random.choice(self.action_dim, p=action_probs)
        
        def update_policy(self, states, actions, rewards):
            # Simplified REINFORCE-style update: uses the immediate
            # reward instead of the discounted return-to-go
            for state, action, reward in zip(states, actions, rewards):
                action_probs = self.get_action_probabilities(state)
                
                # Gradient of log pi(a|s) for a softmax policy
                grad = np.zeros_like(self.weights)
                for a in range(self.action_dim):
                    if a == action:
                        grad[:, a] = state * (1 - action_probs[a])
                    else:
                        grad[:, a] = -state * action_probs[a]
                
                # Gradient ascent step
                self.weights += self.learning_rate * reward * grad
    
    # Policy Gradient Demo
    def policy_gradient_demo(self):
        print("\n=== Policy Gradient Demo ===")
        
        env = self.CartPoleSimple()
        agent = self.PolicyGradientAgent()
        
        episodes = 500
        episode_rewards = []
        
        for episode in range(episodes):
            state = env.reset()
            states, actions, rewards = [], [], []
            total_reward = 0
            done = False
            steps = 0
            max_steps = 100
            
            while not done and steps < max_steps:
                action = agent.choose_action(state)
                next_state, reward, done = env.step(action)
                
                states.append(state)
                actions.append(action)
                rewards.append(reward)
                
                state = next_state
                total_reward += reward
                steps += 1
            
            # Policy update
            agent.update_policy(states, actions, rewards)
            episode_rewards.append(total_reward)
            
            if episode % 50 == 0:
                avg_reward = np.mean(episode_rewards[-50:])
                print(f"Episode {episode}: Average Reward (last 50): {avg_reward:.2f}")
        
        # Final evaluation with a greedy policy
        print("\nFinal Evaluation:")
        state = env.reset()
        for step in range(20):
            action_probs = agent.get_action_probabilities(state)
            action = np.argmax(action_probs)
            state, reward, done = env.step(action)
            env.render()
            
            if done:
                print("Episode finished!")
                break
        
        self.environments['cartpole'] = env
        self.agents['policy_gradient'] = agent
        self.results['policy_gradient'] = {
            'episodes': episodes,
            'final_avg_reward': np.mean(episode_rewards[-50:])
        }
        
        return episode_rewards
    
    # Model Comparison
    def compare_rl_models(self):
        print("\n=== Reinforcement Learning Models Comparison ===")
        
        comparison_data = []
        
        for model_name, results in self.results.items():
            comparison_data.append({
                'Model': model_name,
                'Episodes': results['episodes'],
                'Final Avg Reward': f"{results['final_avg_reward']:.2f}"
            })
        
        df = pd.DataFrame(comparison_data)
        print(df.to_string(index=False))
        
        return df

# Run the demo
def reinforcement_learning_demo():
    demo = ReinforcementLearningDemo()
    
    # Q-Learning
    q_rewards = demo.q_learning_demo()
    
    # Policy Gradient
    pg_rewards = demo.policy_gradient_demo()
    
    # Compare models
    comparison = demo.compare_rl_models()
    
    # Visualize rewards
    plt.figure(figsize=(12, 4))
    
    plt.subplot(1, 2, 1)
    plt.plot(q_rewards)
    plt.title('Q-Learning Rewards')
    plt.xlabel('Episode')
    plt.ylabel('Total Reward')
    
    plt.subplot(1, 2, 2)
    plt.plot(pg_rewards)
    plt.title('Policy Gradient Rewards')
    plt.xlabel('Episode')
    plt.ylabel('Total Reward')
    
    plt.tight_layout()
    plt.show()
    
    return demo, comparison

if __name__ == "__main__":
    demo, comparison = reinforcement_learning_demo()

Machine Learning Types Overview

| Type          | Data        | Goal            | Examples                             | Algorithms                        |
|---------------|-------------|-----------------|--------------------------------------|-----------------------------------|
| Supervised    | Labeled     | Prediction      | Classification, Regression           | Linear Regression, Decision Trees |
| Unsupervised  | Unlabeled   | Find patterns   | Clustering, Dimensionality Reduction | K-Means, PCA                      |
| Reinforcement | Environment | Maximize reward | Game playing, Robotics               | Q-Learning, Policy Gradients      |

Algorithm Comparison

Supervised Learning

| Algorithm           | Type           | Complexity | Advantages          | Disadvantages             |
|---------------------|----------------|------------|---------------------|---------------------------|
| Linear Regression   | Regression     | O(n)       | Interpretable       | Only linear relationships |
| Logistic Regression | Classification | O(n)       | Fast, interpretable | Linearity                 |
| Decision Trees      | Both           | O(n log n) | Interpretable       | Overfitting               |
| Random Forest       | Both           | O(n log n) | Robust, accurate    | Complex                   |
| SVM                 | Both           | O(n²)      | High accuracy       | Scales poorly             |

Unsupervised Learning

| Algorithm | Type           | Complexity | Advantages         | Disadvantages           |
|-----------|----------------|------------|--------------------|-------------------------|
| K-Means   | Clustering     | O(n·k·i)   | Fast               | Only spherical clusters |
| DBSCAN    | Clustering     | O(n log n) | Arbitrary shapes   | Parameter-sensitive     |
| PCA       | Dimensionality | O(n·d²)    | Reduces dimensions | Linearity               |
| t-SNE     | Visualization  | O(n²)      | Non-linear         | Slow                    |

Reinforcement Learning

| Algorithm        | Type       | Complexity | Advantages | Disadvantages   |
|------------------|------------|------------|------------|-----------------|
| Q-Learning       | Model-free | O(s·a)     | Simple     | Discrete spaces |
| Deep Q-Network   | Model-free | O(n)       | Continuous | Unstable        |
| Policy Gradients | Model-free | O(n)       | Stochastic | High variance   |

ML Workflow

1. Data Collection

# Identify data sources
# Ensure data quality
# Consider ethics and data privacy

2. Data Preparation

# Cleaning: handle missing values
# Feature engineering: create new features
# Scaling: normalization/standardization
# Splitting: train/validation/test
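The preparation steps above can be sketched with numpy alone; the dataset, its size, and the 80/20 split ratio are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy dataset: 100 samples, 3 features (values are made up)
X = rng.normal(loc=5.0, scale=2.0, size=(100, 3))
y = rng.integers(0, 2, size=100)

# Cleaning: replace missing values (NaN) with the column mean
X[0, 1] = np.nan
col_means = np.nanmean(X, axis=0)
X = np.where(np.isnan(X), col_means, X)

# Scaling: standardize to zero mean and unit variance
X = (X - X.mean(axis=0)) / X.std(axis=0)

# Splitting: 80% train, 20% test via a shuffled index
idx = rng.permutation(len(X))
split = int(0.8 * len(X))
X_train, X_test = X[idx[:split]], X[idx[split:]]
y_train, y_test = y[idx[:split]], y[idx[split:]]

print(X_train.shape, X_test.shape)  # (80, 3) (20, 3)
```

In practice, libraries such as scikit-learn provide these steps ready-made; the point here is only to make each step concrete.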

3. Model Selection

# Identify the problem type
# Build a baseline model
# Test multiple algorithms
# Optimize hyperparameters

4. Training

# Use cross-validation
# Avoid overfitting
# Implement early stopping
# Monitor metrics
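Cross-validation from the training step can be sketched in plain numpy. The least-squares linear model, fold count, and synthetic data below are illustrative assumptions:

```python
import numpy as np

def k_fold_cv_mse(X, y, k=5, seed=0):
    """k-fold cross-validation MSE for a least-squares linear model."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, k)
    scores = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit y = Xw by least squares on the k-1 training folds
        w, *_ = np.linalg.lstsq(X[train_idx], y[train_idx], rcond=None)
        pred = X[test_idx] @ w
        scores.append(np.mean((y[test_idx] - pred) ** 2))
    return float(np.mean(scores))

# Synthetic linear data: the CV error should land near the noise level
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=200)
print(f"5-fold CV MSE: {k_fold_cv_mse(X, y):.4f}")
```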

5. Evaluation

# Measure performance
# Analyze errors
# Test robustness
# Assess business value

Evaluation Metrics

Classification

  • Accuracy: correct predictions / total predictions
  • Precision: True Positives / (TP + FP)
  • Recall: True Positives / (TP + FN)
  • F1 score: harmonic mean of precision and recall
  • ROC-AUC: area under the ROC curve
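These formulas map directly to code. A minimal sketch for the binary case, with made-up example labels:

```python
import numpy as np

def binary_classification_metrics(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))  # true positives
    fp = np.sum((y_true == 0) & (y_pred == 1))  # false positives
    fn = np.sum((y_true == 1) & (y_pred == 0))  # false negatives
    tn = np.sum((y_true == 0) & (y_pred == 0))  # true negatives
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# TP=2, FP=1, FN=1, TN=2 for this toy example
acc, prec, rec, f1 = binary_classification_metrics(
    y_true=[1, 1, 1, 0, 0, 0],
    y_pred=[1, 1, 0, 1, 0, 0],
)
print(f"Accuracy={acc:.2f} Precision={prec:.2f} Recall={rec:.2f} F1={f1:.2f}")
# Accuracy=0.67 Precision=0.67 Recall=0.67 F1=0.67
```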

Regression

  • MSE: Mean Squared Error
  • RMSE: Root Mean Squared Error
  • MAE: Mean Absolute Error
  • R²: coefficient of determination
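All four regression metrics fit in a few lines of numpy; the sample values are made up:

```python
import numpy as np

y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.0, 2.0, 3.0, 5.0])

mse = np.mean((y_true - y_pred) ** 2)           # Mean Squared Error
rmse = np.sqrt(mse)                             # Root Mean Squared Error
mae = np.mean(np.abs(y_true - y_pred))          # Mean Absolute Error
ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
r2 = 1 - ss_res / ss_tot                        # coefficient of determination

print(f"MSE={mse:.2f} RMSE={rmse:.2f} MAE={mae:.2f} R²={r2:.2f}")
# MSE=0.25 RMSE=0.50 MAE=0.25 R²=0.80
```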

Clustering

  • Silhouette score: cluster quality
  • Davies-Bouldin index: cluster separation
  • Calinski-Harabasz index: ratio of between- to within-cluster dispersion
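The silhouette score can be computed from scratch for intuition: for each point, compare its mean intra-cluster distance a with the smallest mean distance b to another cluster. This is a sketch for small datasets (clusters of size ≥ 2); the example points are made up:

```python
import numpy as np

def silhouette_score_simple(X, labels):
    """Mean silhouette coefficient s = (b - a) / max(a, b)."""
    X, labels = np.asarray(X, float), np.asarray(labels)
    # Pairwise Euclidean distance matrix
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        same[i] = False  # exclude the point itself
        a = dists[i][same].mean()  # mean intra-cluster distance
        # Smallest mean distance to any other cluster
        b = min(dists[i][labels == c].mean()
                for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Two well-separated clusters -> score close to +1
X = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels = [0, 0, 1, 1]
print(f"Silhouette: {silhouette_score_simple(X, labels):.3f}")
```

scikit-learn's `silhouette_score` offers the production version of this metric.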

Overfitting vs Underfitting

Overfitting

  • Symptoms: high training accuracy, low test accuracy
  • Causes: model too complex, too little data
  • Solutions: regularization, more data, a simpler model

Underfitting

  • Symptoms: low accuracy on both datasets
  • Causes: model too simple, too few features
  • Solutions: a more complex model, feature engineering
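Both symptoms can be demonstrated by fitting polynomials of different degree to noisy data; the degrees, noise level, and sample sizes below are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of a quadratic function
x_train = np.linspace(-1, 1, 20)
y_train = x_train**2 + rng.normal(scale=0.1, size=20)
x_test = np.linspace(-1, 1, 50)
y_test = x_test**2  # noise-free ground truth

def fit_and_errors(degree):
    coeffs = np.polyfit(x_train, y_train, degree)
    train_mse = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    return train_mse, test_mse

results = {d: fit_and_errors(d) for d in (1, 2, 9)}
for degree, (train_mse, test_mse) in results.items():
    print(f"degree={degree}  train MSE={train_mse:.4f}  test MSE={test_mse:.4f}")
# Degree 1 underfits (both errors high); degree 9 typically overfits:
# very low training error while the test error stops improving or grows.
```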

Feature Engineering

Techniques

# Polynomial features
# Interaction terms
# Binning/discretization
# Log transformation
# One-hot encoding
# Target encoding
# Feature selection
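Most of these techniques are one-liners in pandas. A sketch on a made-up feature table (column names and values are illustrative):

```python
import numpy as np
import pandas as pd

# Toy feature table (values are made up)
df = pd.DataFrame({
    "size_sqm": [40, 75, 120],
    "price": [100_000, 250_000, 800_000],
    "city": ["Berlin", "Munich", "Berlin"],
})

# Polynomial feature
df["size_sqm_sq"] = df["size_sqm"] ** 2

# Interaction term
df["size_x_price"] = df["size_sqm"] * df["price"]

# Binning/discretization
df["size_bin"] = pd.cut(df["size_sqm"], bins=[0, 50, 100, 200],
                        labels=["small", "medium", "large"])

# Log transformation (log1p handles zeros safely)
df["log_price"] = np.log1p(df["price"])

# One-hot encoding
df = pd.get_dummies(df, columns=["city"], prefix="city")

print(df.columns.tolist())
```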

Automation

# AutoML tools
# Feature importance analysis
# Recursive feature elimination
# Genetic algorithms

Advantages and Disadvantages

Advantages of Machine Learning

  • Automation: reduce manual work
  • Pattern recognition: discover complex relationships
  • Scalability: process large amounts of data
  • Adaptivity: adjust to new data

Disadvantages

  • Data dependence: result quality depends on the input data
  • Complexity: black-box problem
  • Compute cost: training can be expensive
  • Ethics: watch for bias and fairness

Common Exam Questions

  1. What is the difference between supervised and unsupervised learning? Supervised learning uses labeled data to make predictions; unsupervised learning finds patterns in unlabeled data.

  2. Explain overfitting and how to avoid it. Overfitting is an overly close fit to the training data. It can be avoided with regularization, more data, and cross-validation.

  3. When do you use reinforcement learning? When an agent should learn, through interaction with an environment, to maximize cumulative reward.

  4. What is the difference between classification and regression? Classification predicts discrete classes; regression predicts continuous values.

Key Sources

  1. https://scikit-learn.org/stable/
  2. https://www.coursera.org/learn/machine-learning
  3. https://www.deeplearning.ai/