Docker
Tempest in Action
To try out tempestpy and MinigridSafe without installing them locally on your machine, we recommend “Tempest in Action”, available through this repository.
If you have Docker installed on your machine, run the following commands to build an image. If you do not have Docker installed yet, please follow the installation guide first.
git clone https://git.pranger.xyz/sp/Tempest_in_Action
cd Tempest_in_Action
sudo docker build -t tempest_in_action .
This will build the Docker image tempest_in_action.
The image contains all the necessary binaries to automate the workflow for shielded RL training in MinigridSafe environments, all with a convenient Jupyter notebook frontend.
Jupyter Notebooks
Start the container using the docker_run_jupyter.sh script.
Within the repository, we have prepared several notebooks for you to play around with.
HelloLavaGap
A very simple environment, showcasing the typical workflow. You can find the Python code of the notebook here.
The main method of this script declares a safety specification that states that the agent should not visit any lava state, a shield value used for training, and a shield comparison type of absolute.
After instantiating the MinigridSafe environment, shields for different values are created. Since this is a fully deterministic environment, only the shield with a safety threshold of 1.0 is of interest.
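Why only the threshold of 1.0 matters here can be illustrated with a minimal Python sketch. It assumes that, under the absolute comparison, an action is allowed when its safety probability is at least the shield value. The function name, action names, and probabilities are hypothetical and not part of tempestpy. In a deterministic environment every action is either safe (1.0) or unsafe (0.0), so every positive threshold yields the same set of allowed actions.

```python
def allowed_actions(safety_prob, threshold):
    """Return the actions whose safety probability meets an absolute threshold.

    Assumption: 'absolute' comparison means probability >= threshold.
    """
    return sorted(a for a, p in safety_prob.items() if p >= threshold)


# Hypothetical state directly in front of a lava tile in a deterministic grid:
# moving forward is certainly unsafe, turning is certainly safe.
deterministic_state = {"left": 1.0, "right": 1.0, "forward": 0.0}

# Shields for several positive thresholds all allow the same actions.
shields = {t: allowed_actions(deterministic_state, t) for t in (0.5, 0.9, 1.0)}

for threshold, actions in shields.items():
    print(threshold, actions)
```

Since all these shields coincide, it is enough to inspect the one with the strictest threshold.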
Next, the symbolic model and a visual representation of the shield are printed. A shield ensuring safety in this environment blocks the agent from entering a lava state:
A red triangle means that the agent is not allowed to move forward onto the adjacent tile. Note that since the model does not include actions that would make the agent move into a wall, the shield does not allow forward movement in these states.
You can find the MDP model of this simple environment, with annotations for unsafe states, in the repository.
SlipperyCliff
The slippery cliff environment models the task of reaching the goal by the shortest possible path while staying safe. When traversing the blue slippery tiles, which are tilted towards the lava, the agent moves to its desired adjacent tile only with a probability of 0.9. Otherwise the agent slips in the direction of the lava pool, depending on its cardinal direction.
A shield that ensures safety with a probability of 0.99 does not allow the agent to move too close to the lava. If the agent does slip into an undesired state, the shield only allows moving away from the lava.
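The numbers above can be made concrete with a small worked calculation in Python. This is a simplified sketch, not the computation Tempest performs on the full model: it assumes that slips on consecutive slippery tiles are independent and only asks how likely it is to cross a number of slippery tiles without slipping at all. The function and constant names are hypothetical.

```python
SLIP_FREE_PROB = 0.9     # chance to reach the desired tile (from the text)
SAFETY_THRESHOLD = 0.99  # shield value discussed above


def no_slip_probability(steps, p=SLIP_FREE_PROB):
    """Probability of taking `steps` moves on slippery tiles without slipping,
    assuming independent slips (an assumption of this sketch)."""
    return p ** steps


for steps in range(1, 4):
    prob = no_slip_probability(steps)
    # Compare against the threshold for the pessimistic case in which
    # a single slip would already end in the lava.
    print(steps, round(prob, 3), prob >= SAFETY_THRESHOLD)
```

Even a single step on a slippery tile succeeds with a probability below 0.99, which matches the observation that the shield keeps the agent at a distance where one slip alone cannot end in the lava.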
Faulty Actions
This last environment shows the effect of shielding when the agent's movement is affected by random faults. With a probability of 0.01, the last executed action is executed again, overwriting the action chosen by the agent.
Note that the omnidirectional agent can only move forward in cardinal directions. A shield that ensures safety with a probability of 0.999 will therefore block movement in the direction of any lava tile as soon as the agent is too close:
A shield that tries to ensure safety almost surely cannot be synthesized without effectively deadlocking the agent. Any movement towards a lava tile implies a non-zero probability of causing a safety violation.
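The contrast between the two thresholds can be sketched with a rough calculation in Python. The sketch assumes a crude model in which the lava is reached only if k faults occur in a row, each independently with a probability of 0.01, and it ignores everything else the actual shield synthesis takes into account. The function name and the bound of five consecutive faults are hypothetical.

```python
FAULT_PROB = 0.01  # probability that the last action is repeated (from the text)


def min_consecutive_faults(threshold, fault_prob=FAULT_PROB, max_k=5):
    """Smallest number k of consecutive faults whose probability fits into
    the allowed risk 1 - threshold, or None if no k up to max_k does."""
    allowed_risk = 1.0 - threshold
    for k in range(1, max_k + 1):
        if fault_prob ** k <= allowed_risk:
            return k
    return None


# With a threshold of 0.999, one fault (0.01) is too risky, two faults in a
# row (0.0001) are tolerated in this crude model.
print(min_consecutive_faults(0.999))

# With a threshold of 1.0 the allowed risk is zero, so no finite number of
# consecutive faults is tolerated.
print(min_consecutive_faults(1.0))
```

Under these assumptions, a threshold of 0.999 leaves room to move as long as more than one fault is needed to reach the lava, while a threshold of 1.0 leaves no admissible move towards it at all.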
Custom Environments in the Jupyter Notebook
Before the notebook server is started in the container, the MinigridSafe library is mounted from the local notebooks directory, enabling you to play around with existing environments and add your own custom environments.
A good starting point for this is the Playground notebook and the corresponding environment.
You can find the environment class file at environments/Minigrid/minigrid/envs/Playground.py.