online-pomdp-planning
A partially observable Markov decision process (POMDP [kaelbling_planning_1998]) is a mathematical framework for defining reinforcement learning (RL) in environments with hidden state. To solve the RL problem means to come up with a policy, a mapping from the past observations of the environment to an action.
Online planning is the family of methods that assumes access to (a simulator of) the dynamics and infers what action to take during execution. For this it requires the belief, a probability distribution over the current state. The planner takes a belief over the current state of the environment and a simulator, and spits out its favorite action.
- kaelbling_planning_1998
Kaelbling, Leslie Pack, Michael L. Littman, and Anthony R. Cassandra. “Planning and acting in partially observable stochastic domains.” Artificial Intelligence 101.1-2 (1998): 99-134.
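The execution-time loop described above can be sketched roughly as follows. This is a minimal illustration, not code from this library: `run_episode`, `env_step`, and `update_belief` are hypothetical names, and the sketch assumes that the planner returns an action together with some run-time information and that the caller is responsible for maintaining the belief between steps.

```python
from typing import Any, Callable, Dict, Hashable, Tuple

# Illustrative aliases; assumptions for this sketch only.
State = Any
Action = Hashable
Observation = Hashable
Belief = Callable[[], State]  # calling the belief samples a state
Planner = Callable[[Belief], Tuple[Action, Dict[str, Any]]]


def run_episode(
    planner: Planner,
    belief: Belief,
    env_step: Callable[[Action], Tuple[Observation, float, bool]],
    update_belief: Callable[[Belief, Action, Observation], Belief],
    horizon: int,
) -> float:
    """Plan, act, observe, and update the belief for at most `horizon` steps."""
    total_reward = 0.0
    for _ in range(horizon):
        # Planning happens online: one action is chosen per time step.
        action, _info = planner(belief)
        # Hypothetical environment: returns (observation, reward, done).
        observation, reward, done = env_step(action)
        total_reward += reward
        if done:
            break
        # The belief is refreshed with what was just done and seen.
        belief = update_belief(belief, action, observation)
    return total_reward
```

The point of the sketch is the division of labour: the planner only ever sees the current belief, while acting in the environment and updating the belief happen outside of it.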
This library implements a set of these methods:
Concretely, this package provides factory functions to construct a Planner. A planner is a function that is called with a Belief, and returns an Action.
Planner.__call__(belief)
The main functionality this package offers: a method that takes in a belief and returns an action.
- Parameters: belief (Belief)
- Return type: Tuple[Hashable, Dict[str, Any]]
- Returns: the chosen action and run-time information
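To make the call signature concrete, here is a minimal sketch of a factory function that builds something planner-shaped. It is not one of the methods this library implements: `make_one_step_planner`, `tiger_simulator`, and the `(state, action) -> (next_state, reward)` simulator signature are all assumptions made up for illustration, and the "planning" is a naive one-step lookahead that averages immediate rewards over states sampled from the belief.

```python
from typing import Any, Callable, Dict, Hashable, Sequence, Tuple

# Illustrative aliases; assumptions for this sketch only.
State = Any
Action = Hashable
Belief = Callable[[], State]
Planner = Callable[[Belief], Tuple[Action, Dict[str, Any]]]
# Hypothetical simulator signature: (state, action) -> (next_state, reward).
Simulator = Callable[[State, Action], Tuple[State, float]]


def make_one_step_planner(
    actions: Sequence[Action], simulator: Simulator, num_samples: int
) -> Planner:
    """Build a planner that picks the action with the best mean immediate reward."""

    def plan(belief: Belief) -> Tuple[Action, Dict[str, Any]]:
        totals = {action: 0.0 for action in actions}
        for _ in range(num_samples):
            state = belief()  # the belief is only ever used to sample states
            for action in actions:
                _next_state, reward = simulator(state, action)
                totals[action] += reward
        means = {action: total / num_samples for action, total in totals.items()}
        best = max(actions, key=lambda action: means[action])
        # Mirror the documented return type: the action plus run-time information.
        return best, {"mean_rewards": means, "num_samples": num_samples}

    return plan


def tiger_simulator(state: State, action: Action) -> Tuple[State, float]:
    """Toy dynamics: opening the tiger's door is costly, listening is cheap."""
    if action == "listen":
        return state, -1.0
    opened_tiger_door = (action == "open-left") == (state == "tiger-left")
    return state, -100.0 if opened_tiger_door else 10.0


planner = make_one_step_planner(
    ["listen", "open-left", "open-right"], tiger_simulator, num_samples=10
)
# A belief that is certain the tiger is behind the left door.
action, info = planner(lambda: "tiger-left")
print(action)  # prints "open-right"
```

The returned dictionary plays the role of the run-time information: here it carries the estimated mean rewards, which is handy when debugging why an action was chosen.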
Types
I am unreasonably terrified of dynamically typed languages and have gone to extremes to define as many types as possible. Most of these are for internal use, but you will come across some as a user of this library. Most of these types will have no actual meaning, in particular:
Action | The abstract type representing actions; required to be hashable
Observation | The abstract type representing observations; required to be hashable
State | The abstract type for a state; no particular protocol is expected

These are domain specific and unimportant. All that is required is that the Action and Observation are hashable. The State is not used by the library code whatsoever.
A notable exception is the Belief, which is assumed to be a callable that produces states. This reflects our assumption that the belief is a way of sampling states.
Belief.__call__()
Required implementation of belief: the ability to sample states.
- Return type: Any
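One simple way to satisfy this requirement is a particle-style belief: keep a list of states and sample uniformly from it. The sketch below is an assumption for illustration, not part of this library; `make_particle_belief` is a hypothetical helper, and any zero-argument callable that returns a state should work just as well.

```python
import random
from typing import Any, Callable, Optional, Sequence

# Illustrative aliases; assumptions for this sketch only.
State = Any
Belief = Callable[[], State]


def make_particle_belief(
    particles: Sequence[State], rng: Optional[random.Random] = None
) -> Belief:
    """Return a belief that samples uniformly from a fixed set of particles."""
    if not particles:
        raise ValueError("a belief needs at least one particle")
    rng = rng or random.Random()
    stored = list(particles)  # copy, so later changes by the caller have no effect

    def sample() -> State:
        return rng.choice(stored)

    return sample


# Two of the three particles say the tiger is left, so that state is twice as likely.
belief = make_particle_belief(
    ["tiger-left", "tiger-left", "tiger-right"], random.Random(0)
)
state = belief()
```

Duplicated particles are how such a belief expresses probability mass: a state that appears more often in the list is sampled proportionally more often.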