MinigridSafe


Minigrid^Safe is an extension of the Minigrid library. It is aimed at reinforcement learning researchers whose work focuses on safety-critical aspects.

Installation

Download the current version of Minigrid^Safe from GitHub:

git clone https://github.com/PrangerStefan/MinigridSafe

Assuming that you are using a Python virtual environment (as described in the installation guide for tempestpy), install Minigrid^Safe with:

pip install -e MinigridSafe

Safety-Critical Aspects

Minigrid^Safe is built on top of the popular Farama Minigrid library. In order to foster research of safety-critical aspects in reinforcement learning, we extend the feature set of Minigrid by adding probabilistic behaviour and additional, potentially adversarial, actors.

Uncertain Movement

Minigrid^Safe lets you add probabilistic movement for agents to a grid world environment. This is based on different types of “slippery” tiles. If the agent moves onto a slippery tile, its next movement is subject to a probabilistic update that depends on the direction in which the agent is moving and on the “tilt” of the tile. The desired movement happens only with probability p_a.

In the example above, the slippery tiles are tilted towards the north. If the agent moves along the direction of the tilt, it will be uniformly diverted onto an adjacent tile. Moving orthogonally to the tilt of the tile potentially moves the agent in the direction of the tilt. If the agent wants to move in the opposite direction, it will fail with probability 1 - p_a.

This code snippet shows how a block of slippery tiles of length 3 can be added in the _gen_grid method of an environment class:

def _gen_grid(self, width, height):
    ...
    # Tile tilted towards the north; the closing parenthesis was missing.
    slippery_north = SlipperyNorth(
        probability_intended=self.probability_intended,
        probability_turn_intended=self.probability_turn_intended,
    )
    self.grid.horz_wall(2, 1, 3, slippery_north)
    ...
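
The movement rules described above can be sketched as a small sampling function. This is only an illustration and not the code used by Minigrid^Safe: the helper name sample_slippery_move is hypothetical, and the exact outcomes on a slip (staying in place when moving against the tilt, landing on one of the two tiles diagonally ahead when moving with the tilt, sliding in the tilt direction when moving orthogonally) are assumptions made for the sketch.

```python
import random

# Unit displacements (dx, dy); y grows downwards, as in Minigrid grids.
NORTH, EAST, SOUTH, WEST = (0, -1), (1, 0), (0, 1), (-1, 0)


def sample_slippery_move(intended, tilt, p_a, rng):
    """Return the displacement applied on a slippery tile.

    Hypothetical helper, not part of the Minigrid^Safe API.
    """
    if rng.random() < p_a:
        return intended  # the desired movement happens with probability p_a
    opposite = (-tilt[0], -tilt[1])
    if intended == opposite:
        return (0, 0)  # assumed: moving against the tilt fails, agent stays put
    if intended == tilt:
        # assumed: diverted uniformly onto one of the two tiles diagonally ahead
        side = (tilt[1], tilt[0])  # a direction orthogonal to the tilt
        sign = rng.choice((-1, 1))
        return (tilt[0] + sign * side[0], tilt[1] + sign * side[1])
    # assumed: when moving orthogonally, the agent slides in the tilt direction
    return tilt


if __name__ == "__main__":
    rng = random.Random(0)
    moves = [sample_slippery_move(EAST, NORTH, 0.8, rng) for _ in range(1000)]
    print("fraction of intended moves:", moves.count(EAST) / len(moves))
```

With p_a = 1 the tile behaves like a regular floor tile, and with p_a = 0 every move is diverted.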

Faulty Actions

An environment where the agent experiences faults in its execution.

A second interesting source of uncertainty is faults in the action execution of the agent. Minigrid^Safe implements this by adapting the step method. Faults can occur multiple times, independently of whether a fault occurred in the previous step.
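
As a rough illustration of independent per-step faults, the sketch below wraps the choice of the executed action. It is not the library's step method: the class FaultyActionSampler and its parameter fault_probability are hypothetical, and the fault model (a fault repeats the previously executed action) is an assumption made for the example.

```python
import random


class FaultyActionSampler:
    """Replace the chosen action by the previous one with a fixed probability.

    Hypothetical helper; the fault model (repeat the previous action) is assumed.
    """

    def __init__(self, fault_probability, seed=None):
        self.fault_probability = fault_probability
        self.rng = random.Random(seed)
        self.previous_action = None

    def apply(self, action):
        """Return the action that is actually executed in this step."""
        # Each step draws its own fault, regardless of earlier faults.
        faulty = (self.previous_action is not None
                  and self.rng.random() < self.fault_probability)
        executed = self.previous_action if faulty else action
        self.previous_action = executed
        return executed


if __name__ == "__main__":
    sampler = FaultyActionSampler(fault_probability=0.3, seed=0)
    print([sampler.apply(a) for a in [0, 1, 2, 2, 1, 0]])
```

An adapted step method could call such a sampler on the incoming action before passing the result on to the regular environment logic.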

Additional Actors

Minigrid^Safe allows you to add multiple actors to an environment. The behaviour of these actors is defined via lists of tasks that they try to accomplish.

The base class AdversaryEnv implements a modified version of the step method to handle the behaviour of the additional actors. Any environment that adds actors needs to be derived from this environment class:

class AdversaryEnv(MiniGridEnv):
    ...

Currently, the task list of an additional actor can consist of the following tasks:

FollowAgent. The actor follows the specified actor for the specified duration. A duration of 0 means that it follows the specified actor indefinitely. It will try to crash into the other actor if possible.
GoTo. The actor follows the shortest path (computed via A^*) to the specified destination. If no path to the destination exists, the actor stays idle and tries to compute a path again in the next time step.
PickUp. The actor picks up the specified object at the specified position if its inventory is empty.
Place. The actor places the specified object from its inventory at the specified position.
Noop. The actor stays at its current position for a given number of time steps or until the end of the episode.
Random. The actor behaves randomly for the rest of the episode.
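
The GoTo behaviour can be illustrated with a minimal A^* search on a 4-connected grid. This is a sketch under stated assumptions, not the implementation in Minigrid^Safe: the functions astar and goto_step, the blocked set of cells, and the unit step cost with a Manhattan-distance heuristic are all choices made for the example.

```python
import heapq


def astar(width, height, blocked, start, goal):
    """Shortest 4-connected path from start to goal, or None if unreachable."""

    def h(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]  # entries are (f, g, cell)
    came_from = {start: None}
    cost = {start: 0}
    while frontier:
        _, g, current = heapq.heappop(frontier)
        if current == goal:
            path = []
            while current is not None:
                path.append(current)
                current = came_from[current]
            return path[::-1]
        if g > cost[current]:
            continue  # stale queue entry, a cheaper route was found already
        x, y = current
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            inside = 0 <= nxt[0] < width and 0 <= nxt[1] < height
            if not inside or nxt in blocked:
                continue
            if nxt not in cost or g + 1 < cost[nxt]:
                cost[nxt] = g + 1
                came_from[nxt] = current
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt))
    return None


def goto_step(width, height, blocked, position, destination):
    """Next cell for a GoTo-like task; stay idle if no path exists."""
    path = astar(width, height, blocked, position, destination)
    if path is None or len(path) < 2:
        return position  # idle now, a new path is computed on the next call
    return path[1]
```

Recomputing the path on every call keeps the sketch simple and mirrors the retry behaviour described for GoTo: as soon as a blocking object is moved away, the next call finds a path again.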

An environment with an additional actor.

The example above shows a yellow actor continuously moving the yellow ball between the position depicted in the image and position (2,1). By doing so, it blocks the red agent from finishing its task of reaching the green goal square.

...
self.put_obj(Goal(), *goal_pos)
self.put_obj(Ball("yellow"), *ball_pos)
ball = self.grid.get(*ball_pos)

self.grid.horz_wall(2, height - 3)
tasks=[GoTo((width - 4, height - 2)),
       PickUpObject(ball_pos, ball),
       GoTo((1,1)),
       PlaceObject((2, 1), ball),
       DoNothing(duration=2),
       PickUpObject((2, 1), ball),
       GoTo((width - 4, height - 2)),
       PlaceObject(ball_pos, ball)]
yellow_adv = self.add_adversary(1, 1, "yellow", direction=1, tasks=tasks, repeating=True)
...
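
How a repeating task list might cycle can be sketched independently of the library. The class TaskList below is a hypothetical stand-in, not the Minigrid^Safe implementation: it assumes that every task simply lasts a fixed number of steps and that repeating=True restarts the list from the first task once the last one has finished.

```python
class TaskList:
    """Cycle through named tasks, each lasting a fixed number of steps.

    Hypothetical stand-in for an actor's task list; not the library's class.
    """

    def __init__(self, tasks, repeating=False):
        self.tasks = list(tasks)  # (name, duration) pairs
        self.repeating = repeating
        self.index = 0    # position of the active task
        self.elapsed = 0  # steps already spent on the active task

    def step(self):
        """Return the name of the task active in this time step, or None."""
        if self.index >= len(self.tasks):
            if not self.repeating or not self.tasks:
                return None  # all tasks done and no restart requested
            self.index = 0
        name, duration = self.tasks[self.index]
        self.elapsed += 1
        if self.elapsed >= duration:
            self.index += 1  # task finished, move on in the next step
            self.elapsed = 0
        return name


if __name__ == "__main__":
    tasks = TaskList([("goto", 2), ("pickup", 1)], repeating=True)
    print([tasks.step() for _ in range(6)])
```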

Including New Environments

All of the additional features can be implemented directly in the _gen_grid method of a MiniGridEnv. We recommend reading this short tutorial, which gives a good overview of how new environments can easily be added to Minigrid and Minigrid^Safe.

After creating your own environment, be sure to add it here and here in order to register it.