Group Project

Tabula Rasa — Sisyphus Reinforcement Learning

Python, PPO, Stable-Baselines3, Gymnasium, PyTorch, Box2D, Pygame

Overview

ML2 course project at INN. Although this was nominally a group project, I did all of the technical implementation independently. The project trains a reinforcement learning agent with PPO (Proximal Policy Optimization) across three transfer learning phases: 10M steps learning to walk in BipedalWalker-v3, 20M steps learning to push a boulder in a custom SisyphusWalker environment, and 30M steps learning to push the boulder up a progressively steepening slope. The project is inspired by Camus's interpretation of the Sisyphus myth: the agent endlessly struggles uphill, yet the training continues. The custom environment was built from scratch on Box2D physics and includes boulder dynamics, a custom reward system, and an exponential slope curve that starts gentle and steepens towards the top.
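The three-phase schedule described above can be sketched roughly as follows. This is an illustration, not the project's actual training script: the step budgets and the BipedalWalker-v3 and SisyphusWalker-v0 ids come from this page, while the phase names, the `run_phases` helper, the `models` directory, and the default PPO hyperparameters are assumptions. It also assumes the custom environment is already registered with Gymnasium and that all phases share the same observation and action spaces, since `PPO.load` rejects an environment whose spaces differ from the saved model.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Phase:
    name: str       # hypothetical label, used for the saved model file
    env_id: str     # Gymnasium environment id trained on in this phase
    timesteps: int  # step budget for this phase


# Step budgets are taken from the overview. How the slope is switched on
# for phase 3 is not stated there, so the same env id is reused here.
PHASES = [
    Phase("walk", "BipedalWalker-v3", 10_000_000),
    Phase("push", "SisyphusWalker-v0", 20_000_000),
    Phase("slope", "SisyphusWalker-v0", 30_000_000),
]


def run_phases(phases=PHASES, n_envs=32, model_dir="models"):
    """Train each phase in turn, starting from the previous phase's weights."""
    # Imported lazily so the schedule can be inspected without SB3 installed.
    from stable_baselines3 import PPO
    from stable_baselines3.common.env_util import make_vec_env

    out = Path(model_dir)
    out.mkdir(parents=True, exist_ok=True)
    previous = None
    model = None
    for phase in phases:
        env = make_vec_env(phase.env_id, n_envs=n_envs)
        if previous is None:
            # Phase 1 starts from a fresh policy (default hyperparameters,
            # not the tuned values used in the project).
            model = PPO("MlpPolicy", env, verbose=1)
        else:
            # Later phases reload the previous weights on the new environment.
            model = PPO.load(previous, env=env)
        # Keep the global step counter running across phases.
        model.learn(total_timesteps=phase.timesteps, reset_num_timesteps=False)
        previous = out / phase.name
        model.save(previous)
        env.close()
    return model
```

Calling `run_phases()` would run all 60M steps in sequence; passing a shortened list of `Phase` objects is a cheap way to smoke-test the hand-off between phases.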

My Contributions

Originated the Sisyphus concept and pitched it to the group
Trained the BipedalWalker agent from scratch over 10M steps with extensive hyperparameter tuning
Built the custom SisyphusWalker Gymnasium environment (sisyphus_env.py) with boulder physics, custom reward system, and Box2D rendering
Implemented transfer learning across all three training phases
Implemented the slope from scratch: an exponential curve, with friction (2.0) and boulder density (0.5) tuned for agent training
Fixed rendering flicker by switching from manual Pygame to BipedalWalker's drawlist system
Produced the project trailer from scratch (filmed twice), including glitch effects and custom AI voiceover: a motivational Bane (Batman) voice via Fish.audio for narration, and a Rene Morgan voice for the system reboot sequences
Wrote technical documentation, data flow diagram, README, and interim presentation
Managed the group as team leader: GitHub setup, Discord, contract, meeting coordination
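One way to picture the exponential slope mentioned in the list above is as a height profile sampled into terrain vertices. The sketch below is only an assumption about the shape: the friction (2.0) and boulder density (0.5) values come from this page, but the function names, the slope length, the peak height, and the steepness constant `k` are hypothetical, and the actual curve in sisyphus_env.py may be defined differently.

```python
import math

# Values reported on this page; in a Box2D environment they would be passed
# to the fixture definitions of the terrain and the boulder.
SLOPE_FRICTION = 2.0
BOULDER_DENSITY = 0.5


def slope_height(x, length=100.0, peak=20.0, k=3.0):
    """Terrain height at horizontal position x (hypothetical parameters).

    The curve is normalised so that height is 0 at x = 0 and `peak` at
    x = `length`; a larger `k` makes the climb steeper towards the top.
    """
    t = min(max(x / length, 0.0), 1.0)  # clamp progress to [0, 1]
    return peak * math.expm1(k * t) / math.expm1(k)


def slope_vertices(n_points=50, length=100.0, peak=20.0, k=3.0):
    """Sample the curve into (x, y) vertices for a chain of edge fixtures."""
    xs = [length * i / (n_points - 1) for i in range(n_points)]
    return [(x, slope_height(x, length, peak, k)) for x in xs]
```

Because the curve is convex, each segment between consecutive vertices rises more than the one before it, which matches the "starts gentle and gets steeper" behaviour described in the overview.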

Tech Breakdown

Python, Stable-Baselines3 PPO, Farama Gymnasium (BipedalWalker-v3 + custom SisyphusWalker-v0), PyTorch (backend), Box2D (physics), Pygame (rendering), transfer learning across 3 phases, custom reward shaping, CheckpointCallback for model snapshots every 100k steps, TensorBoard logging, 32 parallel environments via make_vec_env
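A detail worth spelling out for this setup: Stable-Baselines3's `CheckpointCallback` counts calls to the vectorized `env.step()`, and each call advances all parallel environments at once. With 32 environments, a snapshot every 100k total steps therefore means a `save_freq` of 100,000 / 32 = 3,125. The sketch below shows that arithmetic alongside a minimal training setup; the helper names, directories, and `name_prefix` are assumptions rather than the project's actual configuration, and PPO uses default hyperparameters here.

```python
def checkpoint_save_freq(target_steps: int, n_envs: int) -> int:
    """Convert a total-timestep interval into per-callback-call units."""
    return max(target_steps // n_envs, 1)


def build_training(env_id="BipedalWalker-v3", n_envs=32,
                   log_dir="logs", ckpt_dir="checkpoints"):
    """Create a PPO model plus a checkpoint callback (hypothetical helper)."""
    # Imported lazily so the helper above works without SB3 installed.
    from stable_baselines3 import PPO
    from stable_baselines3.common.callbacks import CheckpointCallback
    from stable_baselines3.common.env_util import make_vec_env

    env = make_vec_env(env_id, n_envs=n_envs)
    callback = CheckpointCallback(
        save_freq=checkpoint_save_freq(100_000, n_envs),  # 3125 for 32 envs
        save_path=ckpt_dir,
        name_prefix="ppo",  # assumed prefix for the snapshot files
    )
    # tensorboard_log enables the TensorBoard logging mentioned above.
    model = PPO("MlpPolicy", env, tensorboard_log=log_dir, verbose=1)
    return model, callback
```

A run would then look like `model, callback = build_training()` followed by `model.learn(total_timesteps=10_000_000, callback=callback)`.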
