MinigridSafe
MinigridSafe is an extension of the Minigrid library, aimed at reinforcement learning researchers who focus on safety-critical aspects.
Installation
Download the current version of MinigridSafe from GitHub:
git clone https://github.com/PrangerStefan/MinigridSafe
Assuming that you are using a virtual environment for Python (as described in the installation guide for tempestpy), you can install MinigridSafe with:
pip install -e MinigridSafe
Safety-Critical Aspects
MinigridSafe is built on top of the popular Farama Minigrid library. To foster research on safety-critical aspects of reinforcement learning, we extend Minigrid's feature set with probabilistic behaviour and additional, potentially adversarial, actors.
Uncertain Movement
MinigridSafe makes it possible to add probabilistic movement for agents to a grid world environment. This is based on different types of “slippery” tiles. If the agent moves onto a slippery tile, its next movement is subject to a probabilistic update that depends on the direction in which the agent is moving and the “tilt” of the tile. The desired movement happens only with probability pa.
In the example above, the slippery tiles are tilted towards the north. If the agent moves along the direction of the tilt, it will be uniformly diverted onto an adjacent tile. Moving orthogonally to the tilt potentially moves the agent in the direction of the tilt. If the agent wants to move in the opposite direction, it will fail with a probability of 1 - pa.
This code snippet shows how a block of slippery tiles of length 3 can be added in the _gen_grid method of an environment class:
def _gen_grid(self, width, height):
    ...
    slippery_north = SlipperyNorth(
        probability_intended=self.probability_intended,
        probability_turn_intended=self.probability_turn_intended,
    )
    self.grid.horz_wall(2, 1, 3, slippery_north)
    ...
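The movement rule described above can be illustrated with a small, library-independent sketch. It is not MinigridSafe's implementation: the function name `sample_slippery_move`, the direction encoding, and in particular the fallback outcomes (diagonal diversion when moving with the tilt, sliding one tile in the tilt direction when moving orthogonally, staying put when moving against it) are assumptions chosen to match the prose.

```python
import random

# Grid offsets; y grows downwards as in Minigrid, so north is (0, -1).
OFFSETS = {"north": (0, -1), "east": (1, 0), "south": (0, 1), "west": (-1, 0)}


def sample_slippery_move(pos, intended, tilt, p_intended, rng):
    """Sample the agent's next position on a tile tilted towards `tilt`.

    With probability `p_intended` the intended move happens. The fallback
    outcomes below are assumptions made for illustration only.
    """
    x, y = pos
    dx, dy = OFFSETS[intended]
    tx, ty = OFFSETS[tilt]

    if rng.random() < p_intended:
        return (x + dx, y + dy)

    if (dx, dy) == (tx, ty):
        # Moving with the tilt: diverted uniformly to one of the two tiles
        # beside the intended target (assumed reading of "adjacent tile").
        side = rng.choice([-1, 1])
        return (x + dx + side * dy, y + dy + side * dx)

    if (dx, dy) == (-tx, -ty):
        # Moving against the tilt: the move fails and the agent stays put.
        return (x, y)

    # Moving orthogonally to the tilt: the agent slides in the tilt direction.
    return (x + tx, y + ty)


# Example: try to move south on a north-tilted tile with pa = 0.8.
rng = random.Random(0)
next_pos = sample_slippery_move((3, 3), "south", "north", 0.8, rng)
```

Passing the random number generator explicitly keeps the sketch reproducible, which is convenient when comparing learned policies across seeds.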
Faulty Actions
A second source of uncertainty is faults in the agent's action execution.
MinigridSafe implements this by adapting the step method.
Faults can occur multiple times, independently of whether a fault occurred in the previous step.
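The idea of independent per-step faults can be sketched without any Minigrid dependency. The text does not state what a faulty step does, so this sketch assumes that a fault repeats the previously executed action. `FaultyStepEnv`, `fault_probability`, and the stand-in `EchoEnv` are hypothetical names and not part of MinigridSafe's API.

```python
import random


class FaultyStepEnv:
    """Wrap an object with a `step(action)` method and inject action faults."""

    def __init__(self, env, fault_probability, seed=None):
        self.env = env
        self.fault_probability = fault_probability
        self.rng = random.Random(seed)
        self.previous_action = None

    def step(self, action):
        # Every step draws independently, so faults may occur back to back.
        faulty = (
            self.previous_action is not None
            and self.rng.random() < self.fault_probability
        )
        # Assumption: a fault repeats the previously executed action.
        executed = self.previous_action if faulty else action
        self.previous_action = executed
        return self.env.step(executed)


class EchoEnv:
    """Stand-in environment that returns the action it actually executed."""

    def step(self, action):
        return action


env = FaultyStepEnv(EchoEnv(), fault_probability=0.1, seed=0)
executed = [env.step(a) for a in ["forward", "left", "forward", "right"]]
```

Comparing `executed` with the requested actions shows at which steps a fault was injected.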
Additional Actors
MinigridSafe allows you to add multiple actors to an environment. The behaviour of these actors is defined via lists of tasks that they try to accomplish.
The base class AdversaryEnv implements a modified version of the step method to handle the behaviour of the additional actors.
Any environment that adds actors needs to be derived from this environment class:
class AdversaryEnv(MiniGridEnv)
Currently, the task list of an additional actor can consist of the following tasks:
- FollowAgent. The actor follows the specified actor for the specified duration. A duration of 0 means that it follows the specified actor indefinitely. It will try to crash into the other actor if possible.
- GoTo. The actor follows the shortest path (computed via A*) to the specified destination. If no path to the destination exists, the actor stays idle and tries to compute a path again in the next time step.
- PickUp. The actor picks up the specified object at the specified position if its inventory is empty.
- Place. The actor places the specified object from its inventory at the specified position.
- Noop. The actor stays at its current position for a given number of time steps or until the end of the episode.
- Random. The actor behaves randomly for the rest of the episode.
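The GoTo behaviour, including the retry when no path exists, can be sketched with a plain A* search on a 4-connected grid. This is an illustrative sketch rather than MinigridSafe's planner: the names `astar` and `goto_step` and the representation of obstacles as a set of blocked cells are assumptions.

```python
import heapq


def astar(start, goal, blocked, width, height):
    """Return a shortest 4-connected path from start to goal, or None."""

    def heuristic(cell):
        # Manhattan distance never overestimates on a 4-connected grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(heuristic(start), 0, start)]
    came_from = {start: None}
    cost = {start: 0}
    while frontier:
        _, g, current = heapq.heappop(frontier)
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        if g > cost[current]:
            continue  # stale queue entry, a cheaper route was already found
        x, y = current
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height):
                continue
            if nxt in blocked:
                continue
            new_cost = g + 1
            if new_cost < cost.get(nxt, float("inf")):
                cost[nxt] = new_cost
                came_from[nxt] = current
                heapq.heappush(frontier, (new_cost + heuristic(nxt), new_cost, nxt))
    return None


def goto_step(position, goal, blocked, width, height):
    """Advance one cell towards goal; stay idle if no path exists right now."""
    path = astar(position, goal, blocked, width, height)
    if path is None or len(path) < 2:
        return position  # idle; the path is recomputed on the next call
    return path[1]
```

Because `goto_step` replans on every call, an actor that is blocked in one time step automatically resumes moving once the obstacle disappears.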
In the example environment, a yellow actor continuously moves the yellow ball between the position depicted in the image and position (2, 1). By doing so, it blocks the red agent from reaching the green goal square.
...
self.put_obj(Goal(), *goal_pos)
self.put_obj(Ball("yellow"), *ball_pos)
ball = self.grid.get(*ball_pos)
self.grid.horz_wall(2, height - 3)
tasks = [GoTo((width - 4, height - 2)),
         PickUpObject(ball_pos, ball),
         GoTo((1, 1)),
         PlaceObject((2, 1), ball),
         DoNothing(duration=2),
         PickUpObject((2, 1), ball),
         GoTo((width - 4, height - 2)),
         PlaceObject(ball_pos, ball)]
yellow_adv = self.add_adversary(1, 1, "yellow", direction=1, tasks=tasks, repeating=True)
...
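How a repeating task list might be processed can be sketched as a simple cycle over tasks. The class `TaskRunner` and the representation of tasks as `(name, duration)` pairs are hypothetical; they only illustrate the effect of `repeating=True`, and MinigridSafe's actual scheduling may differ.

```python
class TaskRunner:
    """Step through a list of (name, duration) tasks, optionally in a loop."""

    def __init__(self, tasks, repeating=False):
        self.tasks = list(tasks)
        self.repeating = repeating
        self.index = 0    # position of the active task in the list
        self.elapsed = 0  # time steps already spent on the active task

    def current_task(self):
        """Return the active task name, or None once the list is exhausted."""
        if self.index >= len(self.tasks):
            return None
        return self.tasks[self.index][0]

    def tick(self):
        """Spend one time step on the active task and return its name."""
        name = self.current_task()
        if name is None:
            return None
        self.elapsed += 1
        if self.elapsed >= self.tasks[self.index][1]:
            # Task finished: move on, wrapping around if the list repeats.
            self.elapsed = 0
            self.index += 1
            if self.repeating and self.index == len(self.tasks):
                self.index = 0
        return name


runner = TaskRunner([("GoTo", 2), ("DoNothing", 1)], repeating=True)
schedule = [runner.tick() for _ in range(5)]
```

With `repeating=False` the runner returns `None` once every task has been completed, which corresponds to an actor that has nothing left to do.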
Including New Environments
All of the additional features can be implemented directly in the _gen_grid method of a MiniGridEnv.
We recommend giving this short tutorial a read.
It gives a nice overview of how new environments can easily be added to Minigrid and MinigridSafe.
After creating your own environment, be sure to add it here and here in order to register it.