On-Policy Optimization of ANFIS Policies Using Proximal Policy Optimization
Abstract
We present a reinforcement learning method for training neuro-fuzzy controllers using Proximal Policy Optimization (PPO). Unlike prior approaches that used Deep Q-Networks (DQN) with Adaptive Neuro-Fuzzy Inference Systems (ANFIS), our PPO-based framework leverages a stable on-policy actor-critic setup. Evaluated on the CartPole-v1 environment across multiple seeds, PPO-trained fuzzy agents consistently achieved the maximum return of 500 with zero variance after 20000 updates, outperforming ANFIS-DQN baselines in both stability and convergence speed. This highlights PPO's potential for training explainable neuro-fuzzy agents in reinforcement learning tasks.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.