Machine Learning Fundamentals: Supervised, Unsupervised & Reinforcement Learning with Algorithms
This post is a comprehensive introduction to the fundamentals of machine learning, covering supervised, unsupervised, and reinforcement learning with algorithms and practical examples.
In a Nutshell
Machine learning enables computers to learn from data. Supervised learning learns from labeled data, unsupervised learning finds patterns in unlabeled data, and reinforcement learning learns through rewards.
Compact Technical Description
Machine learning is a subfield of artificial intelligence in which algorithms learn from data without being explicitly programmed.
Learning categories:
Supervised Learning
- Concept: learning from labeled training data
- Goal: predictions for new, unseen data
- Types: classification (discrete values), regression (continuous values)
- Algorithms: Linear Regression, Decision Trees, Random Forest, SVM, Neural Networks
Unsupervised Learning
- Concept: learning without labeled data
- Goal: discovering structure and patterns in data
- Types: clustering, dimensionality reduction, association
- Algorithms: K-Means, Hierarchical Clustering, PCA, Apriori
Reinforcement Learning
- Concept: learning through interaction with an environment
- Goal: maximizing cumulative reward
- Types: model-based, model-free, multi-agent
- Algorithms: Q-Learning, Deep Q-Networks, Policy Gradients
Key Points for the Exam
- Machine learning: automatic learning from data
- Supervised learning: learning with labeled data (classification, regression)
- Unsupervised learning: learning without labels (clustering, pattern discovery)
- Reinforcement learning: learning through rewards (agent, environment, actions)
- Training/testing: splitting the data for model validation
- Overfitting/underfitting: fitting a model too closely or too loosely to the data
- Feature engineering: data preparation and transformation
- IHK-relevant: modern AI technologies and applications
Core Components
- Data: training, validation, and test sets
- Features: input variables and attributes
- Models: mathematical functions and algorithms
- Training: fitting the model parameters
- Evaluation: measuring and validating performance
- Prediction: making predictions for new data
- Optimization: hyperparameter tuning
- Deployment: integration into production systems (see the minimal sketch after this list)
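These components map directly onto a typical scikit-learn workflow. A minimal end-to-end sketch, assuming the built-in Iris dataset; the parameter grid is purely illustrative:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

# Data: features X and labels y
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Optimization: small hyperparameter search with cross-validation (grid is illustrative)
search = GridSearchCV(RandomForestClassifier(random_state=42),
                      param_grid={'n_estimators': [50, 100], 'max_depth': [3, None]}, cv=5)
search.fit(X_train, y_train)  # Training

# Evaluation and prediction on unseen data
y_pred = search.predict(X_test)
print(f"Best params: {search.best_params_}")
print(f"Test accuracy: {accuracy_score(y_test, y_pred):.3f}")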
Practical Examples
1. Supervised Learning with Python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, mean_squared_error, classification_report
from sklearn.preprocessing import StandardScaler
from sklearn.datasets import make_classification, make_regression
# Supervised Learning Demo
class SupervisedLearningDemo:
    def __init__(self):
        self.models = {}
        self.results = {}

    # Linear regression
    def linear_regression_demo(self):
        print("=== Linear Regression Demo ===")
        # Create synthetic data
        np.random.seed(42)
        X, y = make_regression(n_samples=100, n_features=1, noise=10, random_state=42)
        # Split the data
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
        # Train the model
        model = LinearRegression()
        model.fit(X_train, y_train)
        # Predict
        y_train_pred = model.predict(X_train)
        y_test_pred = model.predict(X_test)
        # Evaluate
        train_mse = mean_squared_error(y_train, y_train_pred)
        test_mse = mean_squared_error(y_test, y_test_pred)
        print(f"Training MSE: {train_mse:.2f}")
        print(f"Test MSE: {test_mse:.2f}")
        print(f"Coefficient: {model.coef_[0]:.2f}")
        print(f"Intercept: {model.intercept_:.2f}")
        # Store results
        self.models['linear_regression'] = model
        self.results['linear_regression'] = {
            'train_mse': train_mse,
            'test_mse': test_mse,
            'r2_score': model.score(X_test, y_test)
        }
        return X_train, X_test, y_train, y_test, y_test_pred
    # Logistic regression (classification)
    def logistic_regression_demo(self):
        print("\n=== Logistic Regression Demo ===")
        # Create classification data
        X, y = make_classification(n_samples=200, n_features=2, n_redundant=0,
                                   n_informative=2, random_state=42, n_clusters_per_class=1)
        # Split the data
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
        # Scale the features
        scaler = StandardScaler()
        X_train_scaled = scaler.fit_transform(X_train)
        X_test_scaled = scaler.transform(X_test)
        # Train the model
        model = LogisticRegression(random_state=42)
        model.fit(X_train_scaled, y_train)
        # Predict
        y_train_pred = model.predict(X_train_scaled)
        y_test_pred = model.predict(X_test_scaled)
        # Evaluate
        train_accuracy = accuracy_score(y_train, y_train_pred)
        test_accuracy = accuracy_score(y_test, y_test_pred)
        print(f"Training Accuracy: {train_accuracy:.3f}")
        print(f"Test Accuracy: {test_accuracy:.3f}")
        print("Test Classification Report:")
        print(classification_report(y_test, y_test_pred))
        # Store results
        self.models['logistic_regression'] = model
        self.results['logistic_regression'] = {
            'train_accuracy': train_accuracy,
            'test_accuracy': test_accuracy
        }
        return X_train_scaled, X_test_scaled, y_train, y_test, y_test_pred
    # Decision tree classifier
    def decision_tree_demo(self):
        print("\n=== Decision Tree Demo ===")
        # More complex classification data
        X, y = make_classification(n_samples=300, n_features=4, n_redundant=1,
                                   n_informative=3, random_state=42, n_clusters_per_class=2)
        # Split the data
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
        # Decision trees with different depths
        depths = [3, 5, 10, None]
        for depth in depths:
            model = DecisionTreeClassifier(max_depth=depth, random_state=42)
            model.fit(X_train, y_train)
            # Predict
            y_train_pred = model.predict(X_train)
            y_test_pred = model.predict(X_test)
            # Evaluate
            train_accuracy = accuracy_score(y_train, y_train_pred)
            test_accuracy = accuracy_score(y_test, y_test_pred)
            print(f"Max Depth {depth if depth else 'None'}:")
            print(f"  Training Accuracy: {train_accuracy:.3f}")
            print(f"  Test Accuracy: {test_accuracy:.3f}")
            # Detect overfitting via the train/test gap
            overfitting = train_accuracy - test_accuracy
            if overfitting > 0.1:
                print(f"  ⚠️ Overfitting detected (diff: {overfitting:.3f})")
        # Store the best model
        best_model = DecisionTreeClassifier(max_depth=5, random_state=42)
        best_model.fit(X_train, y_train)
        self.models['decision_tree'] = best_model
        return X_train, X_test, y_train, y_test
    # Random forest
    def random_forest_demo(self):
        print("\n=== Random Forest Demo ===")
        # High-dimensional data
        X, y = make_classification(n_samples=500, n_features=10, n_redundant=3,
                                   n_informative=7, random_state=42, n_clusters_per_class=2)
        # Split the data
        X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
        # Random forests with different numbers of trees
        n_estimators_list = [10, 50, 100, 200]
        best_accuracy = 0
        best_model = None
        for n_estimators in n_estimators_list:
            model = RandomForestClassifier(n_estimators=n_estimators, random_state=42)
            model.fit(X_train, y_train)
            # Predict and evaluate
            y_test_pred = model.predict(X_test)
            accuracy = accuracy_score(y_test, y_test_pred)
            print(f"Trees: {n_estimators}, Test Accuracy: {accuracy:.3f}")
            if accuracy > best_accuracy:
                best_accuracy = accuracy
                best_model = model
        print(f"\nBest Random Forest Accuracy: {best_accuracy:.3f}")
        # Feature importance
        feature_importance = best_model.feature_importances_
        print("Top 5 Feature Importances:")
        for i, importance in sorted(enumerate(feature_importance), key=lambda x: x[1], reverse=True)[:5]:
            print(f"  Feature {i}: {importance:.3f}")
        self.models['random_forest'] = best_model
        self.results['random_forest'] = {'test_accuracy': best_accuracy}
        return X_train, X_test, y_train, y_test
    # Model comparison
    def compare_models(self):
        print("\n=== Model Comparison ===")
        # Build a comparison table
        comparison_data = []
        for model_name, results in self.results.items():
            if 'test_accuracy' in results:
                comparison_data.append({
                    'Model': model_name,
                    'Test Accuracy': f"{results['test_accuracy']:.3f}"
                })
            elif 'test_mse' in results:
                comparison_data.append({
                    'Model': model_name,
                    'Test MSE': f"{results['test_mse']:.2f}",
                    'R² Score': f"{results['r2_score']:.3f}"
                })
        df = pd.DataFrame(comparison_data)
        print(df.to_string(index=False))
        return df
# Run the demo
def supervised_learning_demo():
    demo = SupervisedLearningDemo()
    # Linear regression
    X_lr_train, X_lr_test, y_lr_train, y_lr_test, y_lr_pred = demo.linear_regression_demo()
    # Logistic regression
    X_log_train, X_log_test, y_log_train, y_log_test, y_log_pred = demo.logistic_regression_demo()
    # Decision tree
    X_dt_train, X_dt_test, y_dt_train, y_dt_test = demo.decision_tree_demo()
    # Random forest
    X_rf_train, X_rf_test, y_rf_train, y_rf_test = demo.random_forest_demo()
    # Compare the models
    comparison = demo.compare_models()
    return demo, comparison

if __name__ == "__main__":
    demo, comparison = supervised_learning_demo()
2. Unsupervised Learning with Python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans, DBSCAN, AgglomerativeClustering
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import silhouette_score
from sklearn.datasets import make_blobs, make_moons, load_iris
# Unsupervised Learning Demo
class UnsupervisedLearningDemo:
    def __init__(self):
        self.models = {}
        self.results = {}

    # K-Means clustering
    def kmeans_demo(self):
        print("=== K-Means Clustering Demo ===")
        # Synthetic cluster data
        X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.8, random_state=42)
        # Scale the features
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)
        # K-Means with different numbers of clusters
        cluster_range = range(2, 8)
        silhouette_scores = []
        inertias = []
        for k in cluster_range:
            kmeans = KMeans(n_clusters=k, random_state=42, n_init=10)
            cluster_labels = kmeans.fit_predict(X_scaled)
            # Silhouette score
            silhouette_avg = silhouette_score(X_scaled, cluster_labels)
            silhouette_scores.append(silhouette_avg)
            # Inertia (within-cluster sum of squares)
            inertias.append(kmeans.inertia_)
            print(f"K={k}: Silhouette Score={silhouette_avg:.3f}, Inertia={kmeans.inertia_:.1f}")
        # Pick the optimal K based on the silhouette score
        optimal_k = cluster_range[np.argmax(silhouette_scores)]
        print(f"\nOptimal K based on Silhouette: {optimal_k}")
        # Final K-Means with the optimal K
        final_kmeans = KMeans(n_clusters=optimal_k, random_state=42, n_init=10)
        final_labels = final_kmeans.fit_predict(X_scaled)
        # Store results
        self.models['kmeans'] = final_kmeans
        self.results['kmeans'] = {
            'optimal_k': optimal_k,
            'silhouette_score': max(silhouette_scores),
            'inertia': final_kmeans.inertia_
        }
        return X_scaled, final_labels, y_true
    # DBSCAN clustering
    def dbscan_demo(self):
        print("\n=== DBSCAN Clustering Demo ===")
        # Non-spherical data
        X, y_true = make_moons(n_samples=200, noise=0.1, random_state=42)
        # Scale the features
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)
        # DBSCAN with different eps values
        eps_values = [0.2, 0.3, 0.4, 0.5]
        min_samples = 5
        for eps in eps_values:
            dbscan = DBSCAN(eps=eps, min_samples=min_samples)
            cluster_labels = dbscan.fit_predict(X_scaled)
            # Number of clusters (noise points, labeled -1, are ignored)
            n_clusters = len(set(cluster_labels)) - (1 if -1 in cluster_labels else 0)
            n_noise = list(cluster_labels).count(-1)
            if n_clusters > 1:
                silhouette_avg = silhouette_score(X_scaled, cluster_labels)
            else:
                silhouette_avg = -1
            print(f"eps={eps}: Clusters={n_clusters}, Noise={n_noise}, Silhouette={silhouette_avg:.3f}")
        # Best DBSCAN configuration
        best_dbscan = DBSCAN(eps=0.3, min_samples=min_samples)
        best_labels = best_dbscan.fit_predict(X_scaled)
        self.models['dbscan'] = best_dbscan
        self.results['dbscan'] = {
            'n_clusters': len(set(best_labels)) - (1 if -1 in best_labels else 0),
            'n_noise': list(best_labels).count(-1)
        }
        return X_scaled, best_labels, y_true
    # Hierarchical clustering
    def hierarchical_clustering_demo(self):
        print("\n=== Hierarchical Clustering Demo ===")
        # Iris dataset
        iris = load_iris()
        X = iris.data
        y_true = iris.target
        # Scale the features
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)
        # Agglomerative clustering with different linkage methods
        linkage_methods = ['ward', 'complete', 'average', 'single']
        for linkage in linkage_methods:
            clustering = AgglomerativeClustering(n_clusters=3, linkage=linkage)
            cluster_labels = clustering.fit_predict(X_scaled)
            silhouette_avg = silhouette_score(X_scaled, cluster_labels)
            print(f"Linkage={linkage}: Silhouette Score={silhouette_avg:.3f}")
        # Best linkage
        best_clustering = AgglomerativeClustering(n_clusters=3, linkage='ward')
        best_labels = best_clustering.fit_predict(X_scaled)
        self.models['hierarchical'] = best_clustering
        self.results['hierarchical'] = {
            'silhouette_score': silhouette_score(X_scaled, best_labels)
        }
        return X_scaled, best_labels, y_true
    # PCA (Principal Component Analysis)
    def pca_demo(self):
        print("\n=== PCA Demo ===")
        # High-dimensional data
        np.random.seed(42)
        X = np.random.randn(100, 10)
        # Introduce correlations between features
        X[:, 1] = X[:, 0] * 0.8 + np.random.randn(100) * 0.2
        X[:, 2] = X[:, 0] * 0.6 + np.random.randn(100) * 0.4
        X[:, 3] = X[:, 1] * 0.7 + np.random.randn(100) * 0.3
        # Scale the features
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)
        # PCA with different numbers of components
        n_components_range = range(2, 11)
        explained_variances = []
        for n in n_components_range:
            pca = PCA(n_components=n)
            X_pca = pca.fit_transform(X_scaled)
            total_explained_variance = np.sum(pca.explained_variance_ratio_)
            explained_variances.append(total_explained_variance)
            print(f"Components={n}: Explained Variance={total_explained_variance:.3f}")
        # Optimal number of components for 95% explained variance
        optimal_components = next(n for n, var in zip(n_components_range, explained_variances)
                                  if var >= 0.95)
        print(f"\nOptimal components for 95% variance: {optimal_components}")
        # Final PCA
        final_pca = PCA(n_components=optimal_components)
        X_pca_final = final_pca.fit_transform(X_scaled)
        # Feature contributions to the first principal component
        print("Top contributing features for first component:")
        feature_contributions = np.abs(final_pca.components_[0])
        top_features = np.argsort(feature_contributions)[-3:][::-1]
        for i, feature_idx in enumerate(top_features):
            print(f"  Feature {feature_idx}: {feature_contributions[feature_idx]:.3f}")
        self.models['pca'] = final_pca
        self.results['pca'] = {
            'optimal_components': optimal_components,
            'explained_variance': np.sum(final_pca.explained_variance_ratio_)
        }
        return X_scaled, X_pca_final
    # t-SNE for visualization
    def tsne_demo(self):
        print("\n=== t-SNE Demo ===")
        # Iris dataset for visualization
        iris = load_iris()
        X = iris.data
        y = iris.target
        # Scale the features
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)
        # t-SNE with different perplexity values
        perplexity_values = [5, 15, 30, 50]
        for perplexity in perplexity_values:
            tsne = TSNE(n_components=2, perplexity=perplexity, random_state=42)
            X_tsne = tsne.fit_transform(X_scaled)
            print(f"Perplexity={perplexity}: KL Divergence={tsne.kl_divergence_:.3f}")
        # Best t-SNE configuration
        best_tsne = TSNE(n_components=2, perplexity=30, random_state=42)
        X_tsne_final = best_tsne.fit_transform(X_scaled)
        self.models['tsne'] = best_tsne
        return X_scaled, X_tsne_final, y
    # Clustering evaluation
    def evaluate_clustering(self, X, labels, true_labels=None):
        print("\n=== Clustering Evaluation ===")
        # Silhouette score (undefined for a single cluster)
        silhouette_avg = None
        if len(set(labels)) > 1:
            silhouette_avg = silhouette_score(X, labels)
            print(f"Silhouette Score: {silhouette_avg:.3f}")
        else:
            print("Silhouette Score: N/A (only one cluster)")
        # Cluster statistics
        n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
        n_noise = list(labels).count(-1)
        print(f"Number of clusters: {n_clusters}")
        print(f"Number of noise points: {n_noise}")
        # Cluster sizes
        if n_clusters > 0:
            cluster_sizes = [np.sum(labels == i) for i in range(n_clusters)]
            print(f"Cluster sizes: {cluster_sizes}")
            print(f"Average cluster size: {np.mean(cluster_sizes):.1f}")
        return {
            'silhouette_score': silhouette_avg,
            'n_clusters': n_clusters,
            'n_noise': n_noise
        }
    # Model comparison
    def compare_clustering_models(self):
        print("\n=== Clustering Models Comparison ===")
        comparison_data = []
        # Check 'optimal_k' first so K-Means (which stores both keys) shows its K
        for model_name, results in self.results.items():
            if 'optimal_k' in results:
                comparison_data.append({
                    'Model': model_name,
                    'Optimal K': results['optimal_k'],
                    'Silhouette Score': f"{results['silhouette_score']:.3f}"
                })
            elif 'silhouette_score' in results:
                comparison_data.append({
                    'Model': model_name,
                    'Silhouette Score': f"{results['silhouette_score']:.3f}"
                })
        df = pd.DataFrame(comparison_data)
        print(df.to_string(index=False))
        return df
# Run the demo
def unsupervised_learning_demo():
    demo = UnsupervisedLearningDemo()
    # K-Means
    X_km, labels_km, true_km = demo.kmeans_demo()
    demo.evaluate_clustering(X_km, labels_km, true_km)
    # DBSCAN
    X_db, labels_db, true_db = demo.dbscan_demo()
    demo.evaluate_clustering(X_db, labels_db, true_db)
    # Hierarchical clustering
    X_hc, labels_hc, true_hc = demo.hierarchical_clustering_demo()
    demo.evaluate_clustering(X_hc, labels_hc, true_hc)
    # PCA
    X_pca, X_pca_transformed = demo.pca_demo()
    # t-SNE
    X_tsne, X_tsne_transformed, y_tsne = demo.tsne_demo()
    # Compare the models
    comparison = demo.compare_clustering_models()
    return demo, comparison

if __name__ == "__main__":
    demo, comparison = unsupervised_learning_demo()
3. Reinforcement Learning with Python
import numpy as np
import pandas as pd
import random
import matplotlib.pyplot as plt
from collections import defaultdict
# Reinforcement Learning Demo
class ReinforcementLearningDemo:
    def __init__(self):
        self.environments = {}
        self.agents = {}
        self.results = {}
    # Grid world environment
    class GridWorld:
        def __init__(self, width=4, height=4):
            self.width = width
            self.height = height
            self.state = (0, 0)  # Start position
            self.goal = (width - 1, height - 1)  # Goal position
            self.obstacles = [(1, 1), (2, 2)]  # Obstacles
            self.terminal_states = [self.goal]

        def reset(self):
            self.state = (0, 0)
            return self.state

        def step(self, action):
            x, y = self.state
            # Apply the action
            if action == 0:  # Up
                new_state = (x, max(0, y - 1))
            elif action == 1:  # Down
                new_state = (x, min(self.height - 1, y + 1))
            elif action == 2:  # Left
                new_state = (max(0, x - 1), y)
            elif action == 3:  # Right
                new_state = (min(self.width - 1, x + 1), y)
            else:
                new_state = self.state
            # Check for obstacles
            if new_state in self.obstacles:
                new_state = self.state
            # Compute the reward
            if new_state == self.goal:
                reward = 10
                done = True
            else:
                reward = -1  # Small penalty for every step
                done = False
            self.state = new_state
            return new_state, reward, done

        def get_valid_actions(self):
            return [0, 1, 2, 3]  # Up, down, left, right

        def render(self):
            grid = np.zeros((self.height, self.width))
            # Mark obstacles
            for obs in self.obstacles:
                grid[obs[1], obs[0]] = -1
            # Mark the goal
            grid[self.goal[1], self.goal[0]] = 10
            # Mark the current position
            grid[self.state[1], self.state[0]] = 1
            print("Grid World:")
            print(grid)
            print(f"Position: {self.state}, Goal: {self.goal}")
    # Q-Learning agent
    class QLearningAgent:
        def __init__(self, state_space_size, action_space_size, learning_rate=0.1,
                     discount_factor=0.9, epsilon=0.1):
            self.state_space_size = state_space_size
            self.action_space_size = action_space_size
            self.learning_rate = learning_rate
            self.discount_factor = discount_factor
            self.epsilon = epsilon
            # Initialize the Q-table
            self.q_table = defaultdict(lambda: np.zeros(action_space_size))

        def get_state_index(self, state):
            # Convert 2D coordinates to a 1D index (assumes a 4-wide grid)
            x, y = state
            return y * 4 + x

        def choose_action(self, state, valid_actions):
            state_idx = self.get_state_index(state)
            # Epsilon-greedy strategy
            if random.random() < self.epsilon:
                return random.choice(valid_actions)
            else:
                q_values = self.q_table[state_idx]
                valid_q_values = [q_values[action] for action in valid_actions]
                max_q = max(valid_q_values)
                # Break ties between equal Q-values randomly
                best_actions = [action for action in valid_actions
                                if q_values[action] == max_q]
                return random.choice(best_actions)

        def update_q_value(self, state, action, reward, next_state, valid_next_actions):
            state_idx = self.get_state_index(state)
            next_state_idx = self.get_state_index(next_state)
            # Q-learning update: Q(s,a) += lr * (r + gamma * max_a' Q(s',a') - Q(s,a))
            current_q = self.q_table[state_idx][action]
            if len(valid_next_actions) > 0:
                max_next_q = max([self.q_table[next_state_idx][a] for a in valid_next_actions])
            else:
                max_next_q = 0
            new_q = current_q + self.learning_rate * (
                reward + self.discount_factor * max_next_q - current_q
            )
            self.q_table[state_idx][action] = new_q

        def get_policy(self):
            policy = {}
            for state_idx in self.q_table.keys():
                y = state_idx // 4
                x = state_idx % 4
                state = (x, y)
                q_values = self.q_table[state_idx]
                best_action = np.argmax(q_values)
                policy[state] = best_action
            return policy
    # Q-Learning demo
    def q_learning_demo(self):
        print("=== Q-Learning Demo ===")
        # Create the environment and the agent
        env = self.GridWorld(width=4, height=4)
        agent = self.QLearningAgent(state_space_size=16, action_space_size=4)
        # Training parameters
        episodes = 1000
        max_steps_per_episode = 100
        # Training loop
        episode_rewards = []
        for episode in range(episodes):
            state = env.reset()
            total_reward = 0
            done = False
            steps = 0
            while not done and steps < max_steps_per_episode:
                valid_actions = env.get_valid_actions()
                action = agent.choose_action(state, valid_actions)
                next_state, reward, done = env.step(action)
                valid_next_actions = env.get_valid_actions()
                # Update the Q-value
                agent.update_q_value(state, action, reward, next_state, valid_next_actions)
                state = next_state
                total_reward += reward
                steps += 1
            episode_rewards.append(total_reward)
            if episode % 100 == 0:
                avg_reward = np.mean(episode_rewards[-100:])
                print(f"Episode {episode}: Average Reward (last 100): {avg_reward:.2f}")
        # Analyze the results
        final_policy = agent.get_policy()
        print("\nFinal Policy:")
        action_names = {0: 'Up', 1: 'Down', 2: 'Left', 3: 'Right'}
        for state, action in final_policy.items():
            print(f"State {state}: {action_names[action]}")
        # Show part of the Q-table
        print("\nQ-Table (selected states):")
        for state_idx in [0, 5, 10, 15]:  # Diagonal states (0,0) to (3,3)
            y = state_idx // 4
            x = state_idx % 4
            state = (x, y)
            q_values = agent.q_table[state_idx]
            print(f"State {state}: {q_values}")
        self.environments['gridworld'] = env
        self.agents['qlearning'] = agent
        self.results['qlearning'] = {
            'episodes': episodes,
            'final_avg_reward': np.mean(episode_rewards[-100:]),
            'q_table_size': len(agent.q_table)
        }
        return episode_rewards
    # Simple CartPole-like environment
    class CartPoleSimple:
        def __init__(self):
            self.angle = 0  # Pole angle
            self.angular_velocity = 0  # Angular velocity
            self.gravity = 9.8
            self.pole_length = 1.0
            self.dt = 0.1

        def reset(self):
            self.angle = random.uniform(-0.1, 0.1)
            self.angular_velocity = 0
            return self.get_state()

        def get_state(self):
            return (self.angle, self.angular_velocity)

        def step(self, action):
            # Actions: 0 = push left, 1 = push right
            force = -10 if action == 0 else 10
            # Physics update (heavily simplified)
            angular_acceleration = (self.gravity / self.pole_length) * np.sin(self.angle) + force
            self.angular_velocity += angular_acceleration * self.dt
            self.angle += self.angular_velocity * self.dt
            # Reward and termination condition
            if abs(self.angle) > np.pi / 4:  # Pole has fallen over
                reward = -10
                done = True
            else:
                reward = 1  # Reward for keeping balance
                done = False
            return self.get_state(), reward, done

        def render(self):
            print(f"Angle: {self.angle:.3f} rad ({np.degrees(self.angle):.1f}°), "
                  f"Angular Velocity: {self.angular_velocity:.3f}")
    # Policy gradient agent (simplified)
    class PolicyGradientAgent:
        def __init__(self, state_dim=2, action_dim=2, learning_rate=0.01):
            self.state_dim = state_dim
            self.action_dim = action_dim
            self.learning_rate = learning_rate
            # Simple linear policy
            self.weights = np.random.randn(state_dim, action_dim) * 0.1

        def get_action_probabilities(self, state):
            # Softmax over a linear combination of the state features
            logits = np.dot(state, self.weights)
            exp_logits = np.exp(logits - np.max(logits))
            return exp_logits / np.sum(exp_logits)

        def choose_action(self, state):
            action_probs = self.get_action_probabilities(state)
            return np.random.choice(self.action_dim, p=action_probs)

        def update_policy(self, states, actions, rewards):
            # Simplified policy gradient update: uses the raw per-step reward
            # instead of the discounted return
            for state, action, reward in zip(states, actions, rewards):
                state = np.asarray(state, dtype=float)  # states arrive as tuples
                action_probs = self.get_action_probabilities(state)
                # Gradient of log pi(a|s) for the linear softmax policy
                grad = np.zeros_like(self.weights)
                for a in range(self.action_dim):
                    if a == action:
                        grad[:, a] = state * (1 - action_probs[a])
                    else:
                        grad[:, a] = -state * action_probs[a]
                # Gradient ascent step
                self.weights += self.learning_rate * reward * grad
    # Policy gradient demo
    def policy_gradient_demo(self):
        print("\n=== Policy Gradient Demo ===")
        env = self.CartPoleSimple()
        agent = self.PolicyGradientAgent()
        episodes = 500
        episode_rewards = []
        for episode in range(episodes):
            state = env.reset()
            states, actions, rewards = [], [], []
            total_reward = 0
            done = False
            steps = 0
            max_steps = 100
            while not done and steps < max_steps:
                action = agent.choose_action(state)
                next_state, reward, done = env.step(action)
                states.append(state)
                actions.append(action)
                rewards.append(reward)
                state = next_state
                total_reward += reward
                steps += 1
            # Update the policy
            agent.update_policy(states, actions, rewards)
            episode_rewards.append(total_reward)
            if episode % 50 == 0:
                avg_reward = np.mean(episode_rewards[-50:])
                print(f"Episode {episode}: Average Reward (last 50): {avg_reward:.2f}")
        # Final evaluation with the greedy policy
        print("\nFinal Evaluation:")
        state = env.reset()
        for step in range(20):
            action_probs = agent.get_action_probabilities(state)
            action = np.argmax(action_probs)
            state, reward, done = env.step(action)
            env.render()
            if done:
                print("Episode finished!")
                break
        self.environments['cartpole'] = env
        self.agents['policy_gradient'] = agent
        self.results['policy_gradient'] = {
            'episodes': episodes,
            'final_avg_reward': np.mean(episode_rewards[-50:])
        }
        return episode_rewards
    # Model comparison
    def compare_rl_models(self):
        print("\n=== Reinforcement Learning Models Comparison ===")
        comparison_data = []
        for model_name, results in self.results.items():
            comparison_data.append({
                'Model': model_name,
                'Episodes': results['episodes'],
                'Final Avg Reward': f"{results['final_avg_reward']:.2f}"
            })
        df = pd.DataFrame(comparison_data)
        print(df.to_string(index=False))
        return df
# Run the demo
def reinforcement_learning_demo():
    demo = ReinforcementLearningDemo()
    # Q-Learning
    q_rewards = demo.q_learning_demo()
    # Policy gradient
    pg_rewards = demo.policy_gradient_demo()
    # Compare the models
    comparison = demo.compare_rl_models()
    # Plot the rewards
    plt.figure(figsize=(12, 4))
    plt.subplot(1, 2, 1)
    plt.plot(q_rewards)
    plt.title('Q-Learning Rewards')
    plt.xlabel('Episode')
    plt.ylabel('Total Reward')
    plt.subplot(1, 2, 2)
    plt.plot(pg_rewards)
    plt.title('Policy Gradient Rewards')
    plt.xlabel('Episode')
    plt.ylabel('Total Reward')
    plt.tight_layout()
    plt.show()
    return demo, comparison

if __name__ == "__main__":
    demo, comparison = reinforcement_learning_demo()
Machine Learning Types Overview
| Type | Data | Goal | Examples | Algorithms |
|---|---|---|---|---|
| Supervised | Labeled | Prediction | Classification, regression | Linear Regression, Decision Trees |
| Unsupervised | Unlabeled | Find patterns | Clustering, dimensionality reduction | K-Means, PCA |
| Reinforcement | Environment interaction | Maximize cumulative reward | Game playing, robotics | Q-Learning, Policy Gradients |
Algorithm Comparison
Supervised Learning
| Algorithm | Type | Complexity | Advantages | Disadvantages |
|---|---|---|---|---|
| Linear Regression | Regression | O(n) | Interpretable | Only linear relationships |
| Logistic Regression | Classification | O(n) | Fast, interpretable | Assumes linearity |
| Decision Trees | Both | O(n log n) | Interpretable | Prone to overfitting |
| Random Forest | Both | O(n log n) | Robust, accurate | Complex, less interpretable |
| SVM | Both | O(n²)–O(n³) | High accuracy | Scales poorly |
Unsupervised Learning
| Algorithm | Type | Complexity | Advantages | Disadvantages |
|---|---|---|---|---|
| K-Means | Clustering | O(n·k·i) | Fast | Only spherical clusters |
| DBSCAN | Clustering | O(n log n) | Arbitrary cluster shapes | Sensitive to parameters |
| PCA | Dimensionality reduction | O(n·d²) | Reduces dimensionality | Linear only |
| t-SNE | Visualization | O(n²) | Captures non-linear structure | Slow |
Reinforcement Learning
| Algorithm | Type | Complexity | Advantages | Disadvantages |
|---|---|---|---|---|
| Q-Learning | Model-free | O(|S|·|A|) | Simple | Discrete state spaces only |
| Deep Q-Network | Model-free | O(n) | Handles continuous states | Training can be unstable |
| Policy Gradients | Model-free | O(n) | Learns stochastic policies | High variance |
ML Workflow
1. Data Collection
# Identify data sources
# Ensure data quality
# Respect ethics and data protection
2. Data Preparation
# Cleaning: handle missing values
# Feature engineering: create new features
# Scaling: normalization/standardization
# Splitting: train/validation/test
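A minimal sketch of this step, assuming a small pandas DataFrame with made-up column names (age, income, target):
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

# Hypothetical example data; column names are illustrative
df = pd.DataFrame({'age': [25, None, 47, 31], 'income': [3200, 4100, None, 2900], 'target': [0, 1, 1, 0]})
X, y = df[['age', 'income']], df['target']

# Cleaning: fill missing values with the column mean
X = pd.DataFrame(SimpleImputer(strategy='mean').fit_transform(X), columns=X.columns)

# Splitting: hold out a test set before fitting the scaler
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Scaling: fit on the training data only, then transform both splits
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)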
3. Model Selection
# Identify the problem type
# Build a baseline model
# Test several algorithms
# Optimize hyperparameters
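The baseline idea in code: scikit-learn's DummyClassifier gives the score any real model has to beat (the data here is synthetic):
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=300, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Baseline: always predict the most frequent class
baseline = DummyClassifier(strategy='most_frequent').fit(X_train, y_train)
print(f"Baseline accuracy: {baseline.score(X_test, y_test):.3f}")

# A real model only adds value if it clearly beats the baseline
model = LogisticRegression().fit(X_train, y_train)
print(f"Model accuracy:    {model.score(X_test, y_test):.3f}")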
4. Training
# Use cross-validation
# Avoid overfitting
# Implement early stopping
# Monitor metrics
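Cross-validation in a few lines, on synthetic data; cv=5 is a common default choice, not a fixed rule:
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
import numpy as np

X, y = make_classification(n_samples=500, random_state=42)

# 5-fold cross-validation: every sample is used for validation exactly once
scores = cross_val_score(RandomForestClassifier(random_state=42), X, y, cv=5)
print(f"Fold accuracies: {np.round(scores, 3)}")
print(f"Mean: {scores.mean():.3f} +/- {scores.std():.3f}")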
5. Evaluation
# Measure performance
# Analyze errors
# Test robustness
# Assess business value
Evaluation Metrics
Classification
- Accuracy: correct predictions / total predictions
- Precision: true positives / (TP + FP)
- Recall: true positives / (TP + FN)
- F1-Score: harmonic mean of precision and recall
- ROC-AUC: area under the ROC curve
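All of these are available in sklearn.metrics; a small sketch with hypothetical labels and scores:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, roc_auc_score

# Hypothetical labels and scores, just to show the API
y_true  = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred  = [1, 0, 1, 0, 0, 1, 1, 0]
y_score = [0.9, 0.2, 0.8, 0.4, 0.3, 0.7, 0.6, 0.1]  # predicted probabilities for class 1

print(f"Accuracy:  {accuracy_score(y_true, y_pred):.3f}")
print(f"Precision: {precision_score(y_true, y_pred):.3f}")  # TP / (TP + FP)
print(f"Recall:    {recall_score(y_true, y_pred):.3f}")     # TP / (TP + FN)
print(f"F1-Score:  {f1_score(y_true, y_pred):.3f}")
print(f"ROC-AUC:   {roc_auc_score(y_true, y_score):.3f}")   # uses scores, not hard labels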
Regression
- MSE: mean squared error
- RMSE: root mean squared error
- MAE: mean absolute error
- R²: coefficient of determination
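The same pattern for regression, again with hypothetical values:
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Hypothetical regression targets and predictions
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.8, 5.4, 2.0, 6.5]

mse = mean_squared_error(y_true, y_pred)
print(f"MSE:  {mse:.3f}")
print(f"RMSE: {np.sqrt(mse):.3f}")  # same unit as the target
print(f"MAE:  {mean_absolute_error(y_true, y_pred):.3f}")
print(f"R²:   {r2_score(y_true, y_pred):.3f}")  # 1.0 = perfect fit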
Clustering
- Silhouette Score: cluster quality (higher is better)
- Davies-Bouldin Index: cluster separation (lower is better)
- Calinski-Harabasz Index: ratio of between-cluster to within-cluster dispersion (higher is better)
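All three scores live in sklearn.metrics; a quick sketch on synthetic blobs:
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score, davies_bouldin_score, calinski_harabasz_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
labels = KMeans(n_clusters=4, random_state=42, n_init=10).fit_predict(X)

print(f"Silhouette Score:        {silhouette_score(X, labels):.3f}")      # higher is better
print(f"Davies-Bouldin Index:    {davies_bouldin_score(X, labels):.3f}")  # lower is better
print(f"Calinski-Harabasz Index: {calinski_harabasz_score(X, labels):.1f}")  # higher is better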
Overfitting vs Underfitting
Overfitting
- Symptoms: high training accuracy, low test accuracy
- Causes: model too complex, too little data
- Remedies: regularization, more data, a simpler model
Underfitting
- Symptoms: low accuracy on both training and test data
- Causes: model too simple, too few features
- Remedies: a more complex model, feature engineering
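Both effects can be made visible by varying model capacity. A sketch using polynomial regression on noisy sine data (the degrees are chosen for illustration): degree 1 underfits, degree 15 overfits.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Noisy sine data: a straight line is too simple, a degree-15 polynomial too flexible
rng = np.random.RandomState(42)
X = rng.uniform(0, 1, size=(60, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(scale=0.2, size=60)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for degree in [1, 4, 15]:
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    # A large train/test gap signals overfitting; high error on both signals underfitting
    print(f"Degree {degree:2d}: Train MSE={train_mse:.3f}, Test MSE={test_mse:.3f}")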
Feature Engineering
Techniques
# Polynomial Features
# Interaction Terms
# Binning/Discretization
# Log-Transformation
# One-Hot Encoding
# Target Encoding
# Feature Selection
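A few of these techniques in code, on a hypothetical housing DataFrame (column names are made up for illustration):
import pandas as pd
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical example data; column names are illustrative
df = pd.DataFrame({'area': [40, 75, 120], 'price': [150000, 310000, 520000],
                   'city': ['Berlin', 'Hamburg', 'Berlin']})

# Polynomial features and interaction terms from numeric columns
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(df[['area', 'price']])
print(poly.get_feature_names_out())  # e.g. 'area', 'price', 'area^2', 'area price', ...

# Log transform to compress a skewed value range
df['log_price'] = np.log(df['price'])

# Binning / discretization
df['size_class'] = pd.cut(df['area'], bins=[0, 60, 100, np.inf], labels=['small', 'medium', 'large'])

# One-hot encoding for categorical features
print(pd.get_dummies(df['city'], prefix='city'))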
Automation
# AutoML Tools
# Feature Importance Analysis
# Recursive Feature Elimination
# Genetic Algorithms
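As one concrete example, recursive feature elimination from scikit-learn (synthetic data, parameters illustrative):
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

# 10 features, only 4 of which are informative
X, y = make_classification(n_samples=300, n_features=10, n_informative=4, random_state=42)

# Recursive feature elimination: repeatedly drop the weakest feature
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=4)
rfe.fit(X, y)
print("Selected features:", [i for i, keep in enumerate(rfe.support_) if keep])
print("Feature ranking:  ", list(rfe.ranking_))  # 1 = selected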
Advantages and Disadvantages
Advantages of Machine Learning
- Automation: reduces manual work
- Pattern recognition: finds complex relationships
- Scalability: processes large volumes of data
- Adaptivity: adjusts to new data
Disadvantages
- Data dependency: result quality depends on data quality
- Complexity: black-box problem
- Computational cost: training can be expensive
- Ethics: bias and fairness must be considered
Common Exam Questions
- What is the difference between supervised and unsupervised learning? Supervised learning uses labeled data to make predictions; unsupervised learning finds patterns in unlabeled data.
- Explain overfitting and how to avoid it. Overfitting means a model fits the training data too closely. It can be mitigated with regularization, more data, and cross-validation.
- When do you use reinforcement learning? When an agent should learn, through interaction with an environment, to maximize its cumulative reward.
- What is the difference between classification and regression? Classification predicts discrete classes; regression predicts continuous values.
Key Sources
- https://scikit-learn.org/stable/
- https://www.coursera.org/learn/machine-learning
- https://www.deeplearning.ai/