A method for the online construction of the set of states of a Markov decision process using answer set programming

Authors: FERREIRA, L. A.; Reinaldo Bianchi; SANTOS, P. E.; DE MANTARAS, R. L.
Dates: 2022-01-12, 2022-01-12, 2018-06-28
Citation: FERREIRA, L. A.; BIANCHI, R.; SANTOS, P. E.; DE MANTARAS, R. L. A method for the online construction of the set of states of a Markov decision process using answer set programming. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), v. 10868 LNAI, p. 3-15, Jun. 2018.
ISSN: 1611-3349
URL: https://repositorio.fei.edu.br/handle/FEI/3807
Rights: © 2018, Springer International Publishing AG, part of Springer Nature.
Abstract: Non-stationary domains, which change in unpredicted ways, are a challenge for agents searching for optimal policies in sequential decision-making problems. This paper presents a combination of Markov Decision Processes (MDP) with Answer Set Programming (ASP), named Online ASP for MDP (oASP(MDP)), a method capable of constructing the set of domain states while the agent interacts with a changing environment. oASP(MDP) updates previously obtained policies, learnt by means of Reinforcement Learning (RL), using rules that represent the domain changes observed by the agent. These rules represent a set of domain constraints that are processed as ASP programs, reducing the search space. Results show that oASP(MDP) is capable of finding solutions for problems in non-stationary domains without interfering with the action-value function approximation process.
Access: Restricted access
Type: Conference paper
DOI: 10.1007/978-3-319-92058-0_1
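The idea summarized in the abstract (growing the state set while the agent interacts, and using observed constraints to prune the actions considered, without discarding learnt action values) can be illustrated with a minimal sketch. This is not the authors' oASP(MDP) implementation: plain Python sets stand in for the ASP program, tabular Q-learning stands in for the RL component, and every name here (`OnlineStateQLearner`, `observe_state`, `add_constraint`, `allowed`, `choose`, `update`) is hypothetical.

```python
import random


class OnlineStateQLearner:
    """Tabular Q-learning over a state set that grows as states are observed.

    Assumption: Python sets replace the ASP program of the paper; this only
    illustrates the general idea, not the published method.
    """

    def __init__(self, actions, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.actions = list(actions)
        self.alpha = alpha        # learning rate
        self.gamma = gamma        # discount factor
        self.epsilon = epsilon    # exploration probability
        self.rng = random.Random(seed)
        self.states = set()       # states discovered so far
        self.forbidden = {}       # state -> actions ruled out by observed constraints
        self.q = {}               # (state, action) -> estimated value

    def observe_state(self, state):
        # Add a newly seen state and initialise its action values to zero.
        if state not in self.states:
            self.states.add(state)
            for action in self.actions:
                self.q[(state, action)] = 0.0

    def add_constraint(self, state, action):
        # Record an observed domain change; learnt Q-values are left untouched.
        self.observe_state(state)
        self.forbidden.setdefault(state, set()).add(action)

    def allowed(self, state):
        blocked = self.forbidden.get(state, set())
        return [a for a in self.actions if a not in blocked]

    def choose(self, state):
        # Epsilon-greedy choice restricted to the actions still allowed.
        self.observe_state(state)
        candidates = self.allowed(state) or self.actions
        if self.rng.random() < self.epsilon:
            return self.rng.choice(candidates)
        return max(candidates, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        # Standard Q-learning update; the max ranges over allowed actions only.
        self.observe_state(state)
        self.observe_state(next_state)
        future = max(
            (self.q[(next_state, a)] for a in self.allowed(next_state)),
            default=0.0,
        )
        target = reward + self.gamma * future
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])


agent = OnlineStateQLearner(actions=["left", "right"])
agent.update(0, "right", 1.0, 1)    # both states are discovered on the fly
agent.add_constraint(0, "right")    # the environment changed: "right" is now blocked in state 0
next_action = agent.choose(0)
```

In this sketch a constraint only narrows the candidate actions; the stored value for the blocked pair is kept, so it is available again if the constraint is later lifted.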