online-pomdp-planning

A partially observable Markov decision process (POMDP [kaelbling_planning_1998]) is a mathematical framework for defining reinforcement learning (RL) in environments with hidden state. Solving the RL problem means finding a policy: a mapping from the past observations of the environment to an action.

Online planning is the family of methods that assumes access to (a simulator of) the dynamics and decides which action to take during execution. To do so it requires a belief, a probability distribution over the current state. The planner takes the current belief over the state of the environment and a simulator, and spits out its favorite action.

kaelbling_planning_1998

Kaelbling, Leslie Pack, Michael L. Littman, and Anthony R. Cassandra. “Planning and acting in partially observable stochastic domains.” Artificial Intelligence 101.1-2 (1998): 99-134.

This library implements a set of these methods.

Concretely, this package provides factory functions to construct a Planner. A planner is a function that is called with a Belief and returns an Action (together with some run-time information).

Planner.__call__(belief)

The main functionality this package offers: a method that takes in a belief and returns an action, along with run-time information

Parameters

belief (Belief) – the current belief, a callable that samples states

Return type

Tuple[Hashable, Dict[str, Any]]

Returns

the chosen action and run-time information
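To make the shape of a planner concrete, here is a minimal sketch that does not use this library at all. The factory `create_one_step_planner`, the simulator signature `(state, action) -> (next_state, reward)`, and the toy problem are all hypothetical and chosen for illustration only; the one-step Monte-Carlo estimate is a stand-in technique, not one of the methods this package implements. What it does mirror is the interface described above: a factory returns a callable that takes a belief and returns the chosen action plus a dictionary of run-time information.

```python
import random
from typing import Any, Callable, Dict, Hashable, List, Tuple

# Illustrative aliases only; the library defines its own types.
State = Any
Action = Hashable
Belief = Callable[[], State]
# Hypothetical simulator signature: (state, action) -> (next_state, reward)
Simulator = Callable[[State, Action], Tuple[State, float]]
Planner = Callable[[Belief], Tuple[Action, Dict[str, Any]]]


def create_one_step_planner(
    sim: Simulator, actions: List[Action], num_samples: int = 100
) -> Planner:
    """Hypothetical factory: builds a planner that scores each action by
    its average immediate reward over states sampled from the belief."""

    def plan(belief: Belief) -> Tuple[Action, Dict[str, Any]]:
        values: Dict[Action, float] = {}
        for action in actions:
            total = 0.0
            for _ in range(num_samples):
                # Sample a state from the belief and simulate one step.
                _, reward = sim(belief(), action)
                total += reward
            values[action] = total / num_samples
        best = max(values, key=values.get)
        # Return the action and some run-time information.
        return best, {"q_values": values}

    return plan


# Toy problem (assumption): reward 1 when the action matches the hidden state.
def toy_sim(state: State, action: Action) -> Tuple[State, float]:
    return state, 1.0 if action == state else 0.0


def toy_belief() -> State:
    # The hidden state is "left" with probability 0.8.
    return "left" if random.random() < 0.8 else "right"


random.seed(0)  # only to make the sketch reproducible
planner = create_one_step_planner(toy_sim, ["left", "right"], num_samples=500)
action, info = planner(toy_belief)
```

The caller only ever sees `planner(belief)`; the simulator and any tuning parameters are fixed when the planner is constructed.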

Types

I am unreasonably terrified of dynamically typed languages and have gone to extremes to define as many types as possible. Most of these are for internal use, but you will come across some as a user of this library. Most of them carry no actual meaning, in particular:

online_pomdp_planning.types.Action

The abstract type representing actions; required to be hashable

online_pomdp_planning.types.Observation

The abstract type representing observations; required to be hashable

online_pomdp_planning.types.State

The abstract type for a state; no particular protocol is expected

These three types are domain specific and unimportant to the library. All that is required is that Action and Observation are hashable. The State is not used by the library code whatsoever.
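As a quick illustration of the hashability requirement, the sketch below shows a few values that would qualify as actions or observations and one that would not. The `Move` class is a hypothetical domain type, not part of this library, and the use as dictionary keys is an assumption about why hashability matters, shown only as a typical example.

```python
from dataclasses import dataclass
from typing import Hashable


@dataclass(frozen=True)
class Move:
    """Hypothetical domain-specific action; frozen dataclasses are hashable."""

    dx: int
    dy: int


# Strings, integers, tuples and frozen dataclasses are all hashable.
candidates = ["listen", 3, (0, 1), Move(1, 0)]
assert all(isinstance(c, Hashable) for c in candidates)

# Hashable values can serve as dictionary keys, e.g. for per-action statistics.
visit_counts = {Move(1, 0): 0, "listen": 0}
visit_counts[Move(1, 0)] += 1  # equal by value, so this updates the same entry

# A list is mutable and therefore not hashable.
try:
    hash([0, 1])
    list_is_hashable = True
except TypeError:
    list_is_hashable = False
```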

A notable exception is the Belief, which is assumed to be a callable that produces states. This represents the assumption that a belief is a way of sampling states.

Belief.__call__()

Required implementation of belief: the ability to sample states

Return type

Any
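Since a belief only needs to be callable and return a state, one simple way to build one is from a list of particles. The sketch below is an assumption about how a user might do this, not code from the library; `particle_belief` and the string states are hypothetical.

```python
import random
from collections import Counter
from typing import Any, Callable, List

# Illustrative alias: a belief is anything that can be called to sample a state.
Belief = Callable[[], Any]


def particle_belief(particles: List[Any], rng: random.Random) -> Belief:
    """Hypothetical helper: wraps a list of states into a sampling function."""

    def sample() -> Any:
        # Each call draws one particle uniformly at random.
        return rng.choice(particles)

    return sample


rng = random.Random(0)  # seeded only to make the sketch reproducible
# 80% of the particles say "left", so samples should mostly be "left".
belief = particle_belief(["left"] * 8 + ["right"] * 2, rng)
samples = Counter(belief() for _ in range(1000))
```

Because the belief is just a zero-argument callable, a plain function, a closure like the one above, or an object defining `__call__` would all fit the description.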