Program
Wednesday 17.09
Abstract
Reinforcement learning (RL) with primitive actions often leads to inefficient exploration and brittle behaviors. Extended action representations, such as motion primitives (MPs), offer a more structured approach: they encode trajectories with a concise set of parameters, naturally yielding smooth behaviors and enabling exploration in parameter space rather than in raw action space. This parametrization allows black-box RL algorithms to adapt MP parameters to diverse contexts and initial states, providing a pathway toward versatile skill acquisition. However, standard MP-based approaches result in open-loop policies; to address this, we extend them with online replanning of MP trajectories and off-policy learning strategies that exploit single-time-step information. Building on this foundation, we introduce a novel algorithm for skill discovery with MPs that leverages maximum entropy RL and mixture-of-experts models to autonomously acquire diverse, reusable skills. Finally, we present diffusion policies as a more expressive policy class for maximum entropy RL, and highlight their advantages in stability, flexibility, and scalability in complex domains. Together, these contributions demonstrate how extended action representations and advanced policy models can advance the efficiency and versatility of RL.
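For intuition, the following is a minimal sketch (Python/NumPy, not the contributed algorithms) of the idea underlying MP-based RL: a whole trajectory is described by a handful of basis-function weights, and a black-box search explores in that parameter space rather than in the raw per-step action space. The function `env_rollout`, the basis count, and the cross-entropy-style update are illustrative assumptions only.

```python
# Sketch: motion-primitive parameterization + black-box parameter-space search.
import numpy as np

def rbf_trajectory(w, T=50, width=0.02):
    """Map MP parameters w (one weight per basis) to a smooth T-step trajectory."""
    t = np.linspace(0, 1, T)[:, None]            # time grid
    c = np.linspace(0, 1, len(w))[None, :]       # basis centres
    phi = np.exp(-(t - c) ** 2 / (2 * width))    # radial basis features
    phi /= phi.sum(axis=1, keepdims=True)        # normalise
    return phi @ w                               # 1-D trajectory

def episode_return(w, env_rollout):
    """Black-box evaluation: execute the trajectory and sum the rewards."""
    return env_rollout(rbf_trajectory(w))

def train(env_rollout, n_basis=8, pop=64, elites=8, iters=100, seed=0):
    """Cross-entropy-style search over MP parameters (open-loop, episodic)."""
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(n_basis), np.ones(n_basis)
    for _ in range(iters):
        W = mu + sigma * rng.standard_normal((pop, n_basis))     # sample parameters
        R = np.array([episode_return(w, env_rollout) for w in W])
        best = W[np.argsort(R)[-elites:]]                        # keep elites
        mu, sigma = best.mean(axis=0), best.std(axis=0) + 1e-3   # refit search dist.
    return mu
```

Note that this toy search evaluates entire open-loop episodes; the replanning and off-policy extensions described in the abstract are precisely what replaces such episode-level evaluation with updates that also exploit per-step information.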
Thursday 18.09
Abstract
As reinforcement learners, humans and other animals are excellent at improving their otherwise miserable lot in life. This is often described in terms of optimizing utility. However, understanding utility in a non-circular manner is surprisingly difficult. I will discuss an example of this complexity that has important psychological and neural resonance: the distinct concepts of 'liking' and 'wanting'. The former characterizes an immediate hedonic experience; the latter, the motivational force associated with that experience. How could it be that we, or an agent, could 'want' something that we do not 'like', or 'like' something that we would not be willing to exert any effort to acquire? I will suggest a framework for answering these questions through the medium of potential-based shaping, in which 'liking' provides immediate, but preliminary and ultimately cancellable, information about the true, long-run worth of outcomes.
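As background on the shaping framework mentioned above, the standard potential-based shaping identity (a sketch in the usual discounted setting, with a potential function Φ over states) makes the "preliminary but cancellable" nature of the liking signal concrete:

```latex
% Shaped reward with potential \Phi (standard potential-based shaping):
r'(s, a, s') \;=\; r(s, a, s') \;+\; \gamma\,\Phi(s') \;-\; \Phi(s)
% Along any trajectory s_0, s_1, \dots the shaping terms telescope:
\sum_{t \ge 0} \gamma^{t}\bigl(\gamma\,\Phi(s_{t+1}) - \Phi(s_t)\bigr) \;=\; -\,\Phi(s_0)
```

Shaped and unshaped returns thus differ only by the policy-independent constant Φ(s₀): an immediate 'liking' signal can steer learning early on, yet cancels in the long run, leaving what is ultimately 'wanted' (the optimal policy) unchanged.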
Friday 19.09
Abstract
Modeling complex environments and realistic human behaviors within them is a key goal of artificial intelligence research. Progress towards this goal has exciting potential for applications in video games, from new tools that empower game developers to realize new creative visions, to new kinds of immersive player experiences. This talk covers recent research advances from the Game Intelligence team at Microsoft Research towards scalable machine learning architectures that effectively model human gameplay, and our vision of how these innovations could empower creatives in the future.
Poster Sessions
Poster Session A (Wednesday Morning)
- Probably Correct Optimal Stable Matching for Two-Sided Market Under Uncertainty (Andreas Athanasopoulos, Anne-Marie George, Christos Dimitrakakis)
- When Lower-Order Terms Dominate: Adaptive Expert Algorithms for Heavy-Tailed Losses (Antoine Moulin, Emmanuel Esposito, Dirk van der Hoeven)
- Linear Bandits with Non-i.i.d. Noise (Baptiste Abélès, Eugenio Clerico, Hamish Flynn, Gergely Neu)
- CaRL: Learning Scalable Planning Policies with Simple Rewards (Bernhard Jaeger, Daniel Dauner, Jens Beißwenger, Simon Gerstenecker, Kashyap Chitta, Andreas Geiger)
- Online Episodic Convex Reinforcement Learning (Bianca Marin Moreno, Khaled Eldowa, Pierre Gaillard, Margaux Brégère, Nadia Oudjane)
- PB²: Preference Space Exploration via Population-Based Methods in Preference-Based Reinforcement Learning (Brahim Driss, Alex Davey, Riad Akrour)
- CrossQ+WN: Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalization (Daniel Palenicek, Florian Vogt, Joe Watson, Jan Peters)
- Reinforcement learning with non-ergodic reward increments: robustness via ergodicity transformations (Dominik Baumann, Erfaun Noorani, James Price, Ole Peters, Colm Connaughton, Thomas B. Schön)
- Conservative Value Priors: A Bayesian Path to Offline Reinforcement Learning (Filippo Valdettaro, Yingzhen Li, Aldo A. Faisal)
- Multi-Objective Utility Actor Critic with Utility Critic for Nonlinear Utility Function (Gao Peng, Eric Pauwels, Hendrik Baier)
- Exploiting Curvature in Online Convex Optimization with Delayed Feedback (Hao Qiu, Emmanuel Esposito, Mengxiao Zhang)
- Average-Reward Soft Actor-Critic (Jacob Adamczyk, Volodymyr Makarenko, Stas Tiomkin, Rahul V Kulkarni)
- Hadamax Encoding: Elevating Performance in Model-Free Atari (Jacob Eeuwe Kooi, Zhao Yang, Vincent Francois-Lavet)
- Pink Noise LQR: How does Colored Noise affect the Optimal Policy in RL? (Jakob Hollenstein, Marko Zaric, Samuele Tosatto, Justus Piater)
- Shared dynamic model aligned hypernetworks for contextual reinforcement learning (Jan Benad, Frank Röder, Martin V. Butz, Manfred Eppe)
- Curriculum Reinforcement Learning for Complex Reward Functions (Kilian Freitag, Kristian Ceder, Rita Laezza, Knut Åkesson, Morteza Haghir Chehreghani)
- Provably Efficient Long-Horizon Exploration in Monte Carlo Tree Search through State Occupancy Regularization (Liam Schramm)
- Test-time Offline Reinforcement Learning on Goal-related Experience (Marco Bagatella, Mert Albaba, Jonas Hübotter, Georg Martius, Andreas Krause)
- Budgeted Improving Bandits (Matilde Tullii, Nadav Merlis, Vianney Perchet)
- On the Convergence of Single-Timescale Actor-Critic (Navdeep Kumar, Priyank Agrawal, Giorgia Ramponi, Kfir Yehuda Levy, Shie Mannor)
- A Markov Decision Process for Variable Selection in Branch & Bound (Paul Strang, Zacharie Ales, Côme Bissuel, Safia Kedad-Sidhoum, Olivier Juan, Emmanuel Rachelson)
- Studying Exploration in RL: An Optimal Transport Analysis of Occupancy Measure Trajectories (Reabetswe M. Nkhumise, Tony J. Prescott, Debabrota Basu, Aditya Gilra)
- Latent Inference for Effective Multi-Agent Reinforcement Learning under Partial Observability (Salma Kharrat, Fares Fourati, Marco Canini, Mohamed-Slim Alouini, Vaneet Aggarwal)
- Distances for Markov chains from sample streams (Sergio Calo, Anders Jonsson, Gergely Neu, Ludovic Schwartz, Javier Segovia-Aguas)
- Sample, Predict, then Proceed: Self-Verification Sampling for Tool Use of LLMs (Shangmin Guo, Omar Darwiche Domingues, Raphaël Avalos, Aaron Courville, Florian Strub)
- IL-SOAR: Imitation Learning with soft optimistic actor critic (Stefano Viel, Luca Viano, Volkan Cevher)
- Bridging the Performance Gap Between Target-Free and Target-Based Reinforcement Learning With Iterated Q-Learning (Théo Vincent, Yogesh Tripathi, Tim Faust, Yaniv Oren, Jan Peters, Carlo D’Eramo)
- Active Perception for Tactile Sensing: A Task-Agnostic Attention-Based Approach (Tim Schneider, Cristiana de Farias, Roberto Calandra, Liming Chen, Jan Peters)
- MEAL: A Benchmark for Continual Multi-Agent Reinforcement Learning (Tristan Tomilin, Luka van den Boogaard, Samuel Garcin, Bram Grooten, Meng Fang, Mykola Pechenizkiy)
- Convergence and Sample Complexity of First-Order Methods for Agnostic Reinforcement Learning (Uri Sherman, Tomer Koren, Yishay Mansour)
Poster Session B (Wednesday Afternoon)
- Best of Both Worlds: Regret Minimization versus Minimax Play (Adrian Müller, Jon Schneider, Stratis Skoulakis, Luca Viano, Volkan Cevher)
- Behind the Myth of Exploration in Policy Gradients (Adrien Bolland, Gaspard Lambrechts, Damien Ernst)
- Learning Compact Regular Decision Processes using Priors and Cascades (Ahana Deb, Anders Jonsson, Alessandro Ronca, Mohammad Sadegh Talebi)
- Wasserstein-Barycenter Consensus for Cooperative Multi-Agent Reinforcement Learning (Ali Baheri)
- Locally Differentially Private Thresholding Bandits (Annalisa Barbara, Joseph Lazzaro, Ciara Pike-Burke)
- Optimistically Optimistic Exploration for Provably Efficient Infinite-Horizon Reinforcement and Imitation Learning (Antoine Moulin, Gergely Neu, Luca Viano)
- Inverse Q-Learning Done Right: Offline Imitation Learning in $Q^\pi$-Realizable MDPs (Antoine Moulin, Gergely Neu, Luca Viano)
- Unifying (Federated) (Private) High-Dimensional Bandits via ADMM (Apurv Shukla, Debabrota Basu)
- Language Models For PDDL Planning: Generating Sound and Programmatic Policies (Dillon Ze Chen, Johannes Zenn, Tristan Cinquin)
- Taming Adversarial Constraints in Constrained MDPs (Francesco Emanuele Stradi, Anna Lunghi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti)
- Constraint-Aware Diffusion Guidance for Imitation Learning (Hao Ma, Sabrina Bodmer, Andrea Carron, Melanie Zeilinger, Michael Muehlebach)
- On the Variance of Temporal Difference Learning and its Reduction Using Control Variates (Hsiao-Ru Pan, Bernhard Schölkopf)
- ConceptACT: Integrating High-Level Semantic Concepts into Transformer-Based Imitation Learning (Jakob Karalus)
- On the Effect of Regularization in Policy Mirror Descent (Jan Felix Kleuker, Thomas M. Moerland, Aske Plaat)
- Performance Prediction In Reinforcement Learning: The Bad And The Ugly (Julian Dierkes, Theresa Eimer, Marius Lindauer, Holger Hoos)
- Co-Design and Control of a Biomimetic Snake Robot and its Contact Surfaces with Reinforcement Learning (Liza Darwesh, Riccardo Pretto, Shivam Chaubey, Kevin Sebastian Luck)
- Sparse Optimistic Information Directed Sampling (Ludovic Schwartz, Hamish Flynn, Gergely Neu)
- Fine-tuning Behavioral Cloning Policies with Preference-Based Reinforcement Learning (Mael Macuglia, Paul Friedrich, Giorgia Ramponi)
- Revisiting Proximal Policy Optimization (Mahdi Kallel, Jose-Luis Holgado-Alvarez, Samuele Tosatto, Carlo D’Eramo)
- Synthesizing Depowdering Trajectories for Robot Arms Using Deep Reinforcement Learning (Maximilian Maurer, Simon Seefeldt, Jan R. Seyler, Shahram Eivazi)
- Market Making without Regret (Nicolò Cesa-Bianchi, Tommaso Cesari, Roberto Colomboni, Luigi Foscari, Vinayak Pathak)
- Targeted Poisoning of Reinforcement Learning Agents (Omran Shahbazi Gholiabad, Mohammad Mahmoody)
- DIME: Diffusion-Based Maximum Entropy Reinforcement Learning (Onur Celik, Zechu Li, Denis Blessing, Ge Li, Daniel Palenicek, Jan Peters, Georgia Chalvatzaki, Gerhard Neumann)
- ShiQ: Bringing back Bellman to LLMs (Pierre Clavier, Nathan Grinsztajn, Raphaël Avalos, Yannis Flet-Berliac, Irem Ergun, Omar Darwiche Domingues, Eugene Tarassov, Olivier Pietquin, Pierre Harvey Richemond, Florian Strub, Matthieu Geist)
- Utilising the Parameter-Performance Relationship for Efficient Multi-Objective Reinforcement Learning (Qiyue Xia, J. Michael Herrmann)
- Deep Reinforcement Learning Agents are not even close to Human Intelligence (Quentin Delfosse, Jannis Blüml, Fabian Tatai, Théo Vincent, Bjarne Gregori, Elisabeth Dillies, Jan Peters, Constantin A. Rothkopf, Kristian Kersting)
- Learning to Explore in Diverse Reward Settings via Temporal-Difference-Error Maximization (Sebastian Griesbach, Carlo D’Eramo)
- Barycenter Policy Design for Multiple Policy Evaluation (Simon Weissmann, Till Freihaut, Claire Vernade, Giorgia Ramponi, Leif Döring)
- A Theoretical Perspective on Sequential Decision Making with Preference Feedback (Simone Drago, Marco Mussi, Alberto Maria Metelli)
- FraPPE: Fast and Efficient Preference-based Pure Exploration (Udvas Das, Apurv Shukla, Debabrota Basu)
- Unsupervised Action-Policy Quantization via Maximum Entropy Mixture Policies with Minimum Entropy Components (Yamen Habib, Dmytro Grytskyy, Rubén Moreno-Bote)
- EVarEst: Error-Variance penalized Estimation for Deep Reinforcement Learning (Yann Berthelot, Timothée Mathieu, Riad Akrour, Philippe Preux)
- Stackelberg Learning from Human Feedback: Preference Optimization as a Sequential Game (Barna Pásztor, Thomas Kleine Buening, Andreas Krause)
- SENSEI: Semantic Exploration Guided by Foundation Models to Learn Versatile World Models (Cansu Sancaktar, Christian Gumbsch, Andrii Zadaianchuk, Pavel Kolev, Georg Martius)
Poster Session C (Thursday Morning)
- Mighty: A Comprehensive Tool for studying Generalization, Meta-RL and AutoRL (Aditya Mohan, Theresa Eimer, Carolin Benjamins, Marius Lindauer, André Biedenkapp)
- The Batch Complexity of Bandit Pure Exploration (Adrienne Tuynman, Rémy Degenne)
- Sparsity-Driven Plasticity in Multi-Task Reinforcement Learning (Aleksandar Todorov, Juan Cardenas-Cartagena, Rafael F. Cunha, Marco Zullich, Matthia Sabatelli)
- Convergence of regularized agent-state based Q-learning in POMDPs (Amit Sinha, Matthieu Geist, Aditya Mahajan)
- Offline Goal-Conditioned Reinforcement Learning with Projective Quasimetric Planning (Anthony Kobanda, Waris Radji, Mathieu Petitbois, Odalric-Ambrym Maillard, Rémy Portelas)
- Closing the gap between SVRG and TD-SVRG with Gradient Splitting (Arsenii Mustafin, Alex Olshevsky, Ioannis Paschalidis)
- MDP Geometry, Normalization and Reward Balancing Solvers (Arsenii Mustafin, Alex Olshevsky, Ioannis Paschalidis, Aleksei Pakharev)
- $K$-Level Policy Gradients for Multi-Agent Reinforcement Learning (Aryaman Reddi, Gabriele Tiboni, Jan Peters, Carlo D’Eramo)
- On Rollouts in Model-Based Reinforcement Learning (Bernd Frauenknecht, Devdutt Subhasish, Friedrich Solowjow, Sebastian Trimpe)
- Generalization with a SPARC: Single-Phase Adaptation for Reinforcement Learning in Contextual Environments (Bram Grooten, Patrick MacAlpine, Kaushik Subramanian, Peter R. Wurman, Peter Stone)
- MaDi: Learning to Mask Distractions for Generalization in Visual Deep Reinforcement Learning (Bram Grooten, Tristan Tomilin, Gautham Vasan, Matthew E. Taylor, A. Rupam Mahmood, Meng Fang, Mykola Pechenizkiy, Decebal Constantin Mocanu)
- Does Stochastic Gradient really succeed for bandits? (Dorian Baudry, Emmeran Johnson, Simon Vary, Ciara Pike-Burke, Patrick Rebeschini)
- Optimizing the coalition gain in Online Auctions with Greedy Structured Bandits (Dorian Baudry, Hugo Richard, Maria Cherifa, Clément Calauzènes, Vianney Perchet)
- How to craft a deep reinforcement learning policy for wind farm flow control (Elie Kadoche, Pascal Bianchi, Florence Carton, Philippe Ciblat, Damien Ernst)
- Object Empowerment-Driven Tool Selection for Exploration in Reinforcement Learning (Faizan Rasheed, Kenzo Clauw, Nicola Catenacci Volpi, Daniel Polani)
- A Theoretical Justification for Asymmetric Actor-Critic Algorithms (Gaspard Lambrechts, Damien Ernst, Aditya Mahajan)
- Efficient Prior Selection in Gaussian Process Bandits with Thompson Sampling (Jack Sandberg, Morteza Haghir Chehreghani)
- Upside Down Reinforcement Learning with Policy Generators (Jacopo Di Ventura, Dylan R. Ashley, Vincent Herrmann, Francesco Faccio, Jürgen Schmidhuber)
- COGENT: Co-design of Robots with GFlowNets (Kishan Reddy Nagiredla, Arun Kumar A V, Thommen George Karimpanal, Kevin Sebastian Luck, Santu Rana)
- Bandit Optimal Transport (Lorenzo Croissant)
- On Evaluating Policies for Robust POMDPs (Merlijn Krale, Eline M. Bovy, Maris F. L. Galesloot, Thiago D. Simão, Nils Jansen)
- Understanding Exploration in Bandits with Switching Constraints: A Batched Approach in Fixed-Confidence Pure Exploration (Newton Mwai, Milad Malekipirbazari, Fredrik D. Johansson)
- High Effort, Low Gain: Fundamental Limits of Active Learning for Linear Dynamical Systems (Nicolas Chatzikiriakos, Kevin Jamieson, Andrea Iannelli)
- Epistemically-guided forward-backward exploration (Núria Armengol Urpí, Marin Vlastelica, Georg Martius, Stelian Coros)
- The Number of Trials Matters in Infinite-Horizon General-Utility Markov Decision Processes (Pedro Pinto Santos, Alberto Sardinha, Francisco S. Melo)
- Self-Supervised Multisensory Pretraining for Contact-Rich Robot Reinforcement Learning (Rickmer Krohn, Vignesh Prasad, Gabriele Tiboni, Georgia Chalvatzaki)
- Learning Reward Structure with Subtasks in Reinforcement Learning (Shuai Han, Mehdi Dastani, Shihan Wang)
- Geometry-aware RL for Manipulation of Varying Shapes and Deformable Objects (Tai Hoang, Huy Le, Philipp Becker, Vien Anh Ngo, Gerhard Neumann)
- Learning Equilibria from Data: Provably Efficient Multi-Agent Imitation Learning (Till Freihaut, Luca Viano, Volkan Cevher, Matthieu Geist, Giorgia Ramponi)
- Value Improved Actor Critic Algorithms (Yaniv Oren, Moritz Akiya Zanger, Pascal R. Van der Vaart, Mustafa Mert Çelikok, Wendelin Boehmer, Matthijs T. J. Spaan)
- Model-free Low-rank Reinforcement Learning via Leveraged Entry-wise Matrix Estimation (Yassir Jedra, Alexandre Proutiere, Stefan Stojanovic)
- AREPO: Uncertainty-Aware Robot Ensemble Learning Under Extreme Partial Observability (Yurui Du, Louis Hanut, Herman Bruyninckx, Renaud Detry)
Poster Session D (Thursday Afternoon)
- Overcoming Non-stationary Dynamics with Evidential Proximal Policy Optimization (Abdullah Akgül, Gulcin Baykal, Manuel Haussmann, Melih Kandemir)
- Safe exploration in reproducing kernel Hilbert spaces (Abdullah Tokmak, Kiran G. Krishnan, Thomas B. Schön, Dominik Baumann)
- A Novel Self-Normalized Bernstein-Like Dimension-Free Inequality and Regret Bounds for Generalized Kernelized Bandits (Alberto Maria Metelli, Simone Drago, Marco Mussi)
- Efficient Risk-sensitive Planning via Entropic Risk Measures (Alexandre Marthe, Samuel Bounan, Aurélien Garivier, Claire Vernade)
- Posterior Sampling using Prior-Data Fitted Networks for Optimizing Complex AutoML Pipelines (Amir Rezaei Balef, Katharina Eggensperger)
- Learning Equilibria in Matching Games with Bandit Feedback (Andreas Athanasopoulos, Christos Dimitrakakis)
- Deep Actor-Critics with Tight Risk Certificates (Bahareh Tasdighi, Manuel Haussmann, Yi-Shan Wu, Andres R Masegosa, Melih Kandemir)
- Data, Auxiliary Losses, or Normalization Layers for Plasticity? A case study with PPO on Atari (Daniil Pyatko, Andrea Miele, Skander Moalla, Caglar Gulcehre)
- Reinforcement Learning vs Optimal Control: Sparse Nonlinear Dynamical Systems Between Theory and Practice (Davide Maran, Gianmarco Tedeschi, Enea Gusmeroli, Marcello Restelli)
- Gym4ReaL: A Benchmark Suite for Evaluating Reinforcement Learning in Realistic Domains (Davide Salaorni, Vincenzo De Paola, Samuele Delpero, Giovanni Dispoto, Paolo Bonetti, Alessio Russo, Giuseppe Calcagno, Francesco Trovò, Matteo Papini, Alberto Maria Metelli, Marco Mussi, Marcello Restelli)
- Two-Player Zero-Sum Games with Bandit Feedback (Elif Yılmaz, Christos Dimitrakakis)
- Trading-off Reward Maximization and Stability in Sequential Decision Making (Federico Corso, Marco Mussi, Alberto Maria Metelli)
- Entropic Risk Optimization in Discounted MDPs: Sample Complexity Bounds with a Generative Model (Oliver Mortensen, Mohammad Sadegh Talebi)
- Policy Optimization for CMDPs with Bandit Feedback: Learning Stochastic and Adversarial Constraints (Francesco Emanuele Stradi, Anna Lunghi, Matteo Castiglioni, Alberto Marchesi, Nicola Gatti)
- Enhancing de novo Drug Design by Incorporating Diversity into Reinforcement Learning (Hampus Gummesson Svensson, Christian Tyrchan, Ola Engkvist, Morteza Haghir Chehreghani)
- Fair Contracts in Principal-Agent Games with Heterogeneous Types (Jakub Tłuczek, Victor Villin, Christos Dimitrakakis)
- TE-RoboNet: Transfer Enhanced RoboNet for Sample-Efficient Generation of Robot Co-Designs (Kishan Reddy Nagiredla, Arun Kumar A V, Kevin Sebastian Luck, Thommen George Karimpanal, Santu Rana)
- Learning the Minimum Action Distance (Lorenzo Steccanella, Joshua Benjamin Evans, Özgür Şimşek, Anders Jonsson)
- Optimal Best Arm Identification under Differential Privacy (Marc Jourdan, Achraf Azize)
- Non-rectangular Robust MDPs with Normed Uncertainty Sets (Navdeep Kumar, Adarsh Gupta, Maxence Mohamed Elfatihi, Giorgia Ramponi, Kfir Yehuda Levy, Shie Mannor)
- Partially Observable Reinforcement Learning with Memory Traces (Onno Eberhard, Michael Muehlebach, Claire Vernade)
- AlphaZero Neural Scaling and Zipf’s Law: a Tale of Board Games and Power Laws (Oren Neumann, Claudius Gros)
- Learning Collusion in Episodic, Inventory-Constrained Markets (Paul Friedrich, Barna Pásztor, Giorgia Ramponi)
- Interpretable Reinforcement Learning via Meta-Policy Guidance (Raban Emunds, Jannis Blüml, Quentin Delfosse, Kristian Kersting)
- Flow Density Control: Generative Optimization Beyond Entropy-Regularized Fine-Tuning (Riccardo De Santi, Marin Vlastelica, Ya-Ping Hsieh, Zebang Shen, Niao He, Andreas Krause)
- One Does Not Simply Estimate State: Comparing Model-based and Model-free Reinforcement Learning on the Partially Observable MordorHike Benchmark (Sai Prasanna, André Biedenkapp, Raghu Rajan)
- ACING: Actor-Critic for Instruction Learning in Black-Box LLMs (Salma Kharrat, Fares Fourati, Marco Canini)
- Non-Stationary Lipschitz Bandits (Solenne Gaucher, Claire Vernade, Nicolas Nguyen)
- Leveraging priors on distribution functions for multi arm bandits (Sumit Vashishtha, Odalric-Ambrym Maillard)
- Learning Abstract World Models with a Group-Structured Latent Space (Thomas Delliaux, Nguyen-Khanh Vu, Vincent Francois-Lavet, Elise van der Pol, Emmanuel Rachelson)
- Inference of Intrinsic Rewards and Fairness in Multi-Agent Systems (Victor Villin, Christos Dimitrakakis)
- Learning Robust Representations for World Models without Reward Signals (Zeqiang Zhang, Fabian Wurzberger, Sebastian Gottwald, Daniel Alexander Braun)
Poster Session E (Friday Morning)
- Improving Reward-Based Hindsight Credit Assignment (Aditya A. Ramesh, Jiamin He, Jürgen Schmidhuber, Martha White)
- Off-Policy Maximum Entropy RL with Future State and Action Visitation Measures (Adrien Bolland, Gaspard Lambrechts, Damien Ernst)
- It is All Connected: Multi-Task Reinforcement Learning via Mode Connectivity (Ahmed Hendawy, Henrik Metternich, Jan Peters, Gabriele Tiboni, Carlo D’Eramo)
- StaQ it! Growing neural networks for Policy Mirror Descent (Alena Shilova, Alex Davey, Brahim Driss, Riad Akrour)
- Contrastive Representations for Combinatorial Reasoning (Alicja Ziarko, Michał Bortkiewicz, Michał Zawalski, Benjamin Eysenbach, Piotr Miłoś)
- An Open-Loop Baseline for Reinforcement Learning Locomotion Tasks (Antonin Raffin, Olivier Sigaud, Jens Kober, Alin Albu-Schaeffer, João Silvério, Freek Stulp)
- Addressing Rotational Learning Dynamics in Multi-Agent Reinforcement Learning (Baraah A. M. Sidahmed, Tatjana Chavdarova)
- Convergence Guarantees of Model-free Policy Gradient Methods for LQR with Stochastic Data (Bowen Song, Andrea Iannelli)
- The challenge of hidden gifts in multi-agent reinforcement learning (Dane Malenfant, Blake Aaron Richards)
- Informed Asymmetric Actor-Critic: Theoretical Insights and Open Questions (Daniel Ebi, Gaspard Lambrechts, Damien Ernst, Klemens Böhm)
- Stochastic Shortest Path with Sparse Adversarial Costs (Emmeran Johnson, Alberto Rumi, Ciara Pike-Burke, Patrick Rebeschini)
- Online Optimization of Closed-Loop Control Systems (Hao Ma, Melanie Zeilinger, Michael Muehlebach)
- Interpretable Rule Learning for Reactive Activity Recognition in Event-Driven RL Environments (Ivelina Stoyanova, Nicolas Museux, Sao Mai Nguyen, David Filliat)
- Exploiting Model Errors for Exploration in Model-Based Reinforcement Learning (Jared Swift, Matteo Leonetti)
- Embedding Safety into RL: A New Take on Trust Region Methods (Johannes Müller, Nikola Milosevic, Nico Scherf)
- Missingness-MDPs: Bridging the Theory of Missing Data and POMDPs (Joshua Wendland, Markel Zubia, Roman Andriushchenko, Maris F. L. Galesloot, Milan Ceska, Henrik von Kleist, Thiago D. Simão, Maximilian Weininger, Nils Jansen)
- Chargax: A JAX Accelerated EV Charging Simulator (Koen Ponse, Jan Felix Kleuker, Aske Plaat, Thomas M. Moerland)
- Modular Recurrence in Contextual MDPs for Universal Morphology Control (Laurens Engwegen, Daan Brinks, Wendelin Boehmer)
- Bellman Diffusion Models for Offline Reinforcement Learning (Liam Schramm, Abdeslam Boularias)
- rfPG: Robust Finite-Memory Policy Gradients for Hidden-Model POMDPs (Maris F. L. Galesloot, Roman Andriushchenko, Milan Ceska, Sebastian Junges, Nils Jansen)
- The Sample Complexity of Online Reinforcement Learning: A Multi-model Perspective (Michael Muehlebach, Zhiyu He, Michael I. Jordan)
- Gait in Eight: Efficient On-Robot Learning for Omnidirectional Quadruped Locomotion (Nico Bohlinger, Jonathan Kinzel, Daniel Palenicek, Łukasz Antczak, Jan Peters)
- Long-Horizon Planning with Predictable Skills (Nico Gürtler, Georg Martius)
- Cluster Agnostic Network Lasso Bandits (Sofien Dhouib, Steven Bilaj, Behzad Nourani-Koliji, Setareh Maghsudi)
- Exploring a Graph-based Approach to Reinforcement Learning for Sepsis Treatment (Taisiya Khakharova, Lucas Sakizloglou, Leen Lambers)
- On Feasible Rewards in Multi-Agent Inverse Reinforcement Learning (Till Freihaut, Giorgia Ramponi)
- HASARD: A Benchmark for Vision-Based Safe Reinforcement Learning in Embodied Agents (Tristan Tomilin, Meng Fang, Mykola Pechenizkiy)
- Information-Based Exploration via Random Features (Waris Radji, Odalric-Ambrym Maillard)
- Mental Modelling of Reinforcement Learning Agents by Language Models (Wenhao Lu, Xufeng Zhao, Josua Spisak, Jae Hee Lee, Stefan Wermter)
- Learning Multimodal Behaviors from Scratch with Diffusion Policy Gradient (Zechu Li, Rickmer Krohn, Tao Chen, Anurag Ajay, Pulkit Agrawal, Georgia Chalvatzaki)
- Strategyproof Reinforcement Learning from Human Feedback (Thomas Kleine Buening, Jiarui Gan, Debmalya Mandal, Marta Kwiatkowska)
- Parameter-Free Dynamic Regret for Unconstrained Linear Bandits (Alberto Rumi, Andrew Jacobsen, Nicolò Cesa-Bianchi, Fabio Vitale)
- Sparse Nonparametric Contextual Bandits (Hamish Flynn, Julia Olkhovskaya, Paul Rognon-Vael)