A method for the online construction of the set of states of a Markov decision process using answer set programming
Production type
Conference paper
Publication date
2018-06-28
Journal
Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Citations in Scopus
2
Authors
FERREIRA, L. A.
Reinaldo Bianchi
SANTOS, P. E.
DE MANTARAS, R. L.
Abstract
© 2018, Springer International Publishing AG, part of Springer Nature. Non-stationary domains, which change in unpredicted ways, are a challenge for agents searching for optimal policies in sequential decision-making problems. This paper presents a combination of Markov Decision Processes (MDP) with Answer Set Programming (ASP), named Online ASP for MDP (oASP(MDP)), which is a method capable of constructing the set of domain states while the agent interacts with a changing environment. oASP(MDP) updates previously obtained policies, learnt by means of Reinforcement Learning (RL), using rules that represent the domain changes observed by the agent. These rules represent a set of domain constraints that are processed as ASP programs, reducing the search space. Results show that oASP(MDP) is capable of finding solutions for problems in non-stationary domains without interfering with the action-value function approximation process.
Citation
FERREIRA, L. A.; BIANCHI, R.; SANTOS, P. E.; DE MANTARAS, R. L. A method for the online construction of the set of states of a Markov decision process using answer set programming. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), v. 10868 LNAI, p. 3-15, Jun. 2018.
Scopus subjects
Answer set programming; Changing environment; Domain constraint; Finding solutions; Markov Decision Processes; Optimal policies; Sequential decision making; Value function approximation