Reinforcement Learning I: OpenAI Gym Environment

This tutorial will introduce you to FFAI’s implementation of the OpenAI Gym interface, which allows for easy integration of reinforcement learning algorithms.

You can run examples/gym.py to see a random agent play Blood Bowl through the FFAI Gym environment. The rendering is simplified for faster execution and looks like this:

[Figure: FFAI Gym GUI]

examples/gym.py demonstrates how you can run multiple instances of the environment in parallel. Note that the render() function doesn’t work across multiple processes; instead, a custom renderer is used in this example.

Agents receive numerical observations from the FFAI environment at every step and send back an action with an action type and, in some cases, a position. Along with the observations, the environment also sends a scalar reward value to the agent. Below, we describe the structure of these three components: observations, actions, and rewards.
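The exchange described above is the standard Gym loop: reset() returns a first observation, and each call to step() with an action returns an observation, a reward, a done flag and an info dictionary. The minimal sketch below runs that loop with a random agent. StubEnv is a hypothetical stand-in written only for this illustration (it is not part of FFAI) so that the loop can execute on its own; with FFAI you would pass a real environment instead, assuming it exposes the same reset()/step() interface.

```python
import random


class StubEnv:
    """Hypothetical stand-in for an FFAI Gym environment (illustration only).

    Mimics the Gym API used in this tutorial: reset() returns an observation
    dict and step(action) returns (obs, reward, done, info).
    """

    NUM_ACTION_TYPES = 4  # assumption: a tiny action space for the sketch

    def __init__(self, episode_length=10, seed=0):
        self.episode_length = episode_length
        self.rng = random.Random(seed)
        self.t = 0

    def _obs(self):
        # Mark a random subset of action types as available.
        mask = [self.rng.randint(0, 1) for _ in range(self.NUM_ACTION_TYPES)]
        mask[0] = 1  # ensure at least one action type is always available
        return {"available-action-types": mask}

    def reset(self):
        self.t = 0
        return self._obs()

    def step(self, action):
        self.t += 1
        done = self.t >= self.episode_length
        # Mimic a win/draw/loss reward that is only given at the end.
        reward = self.rng.choice([-1, 0, 1]) if done else 0
        return self._obs(), reward, done, {}


def run_random_episode(env, rng):
    """Play one episode, picking a uniformly random available action type each step."""
    obs = env.reset()
    total_reward, steps, done = 0, 0, False
    while not done:
        mask = obs["available-action-types"]
        available = [i for i, flag in enumerate(mask) if flag]
        action = {"action-type": rng.choice(available)}
        obs, reward, done, info = env.step(action)
        total_reward += reward
        steps += 1
    return total_reward, steps


total, steps = run_random_episode(StubEnv(episode_length=10), random.Random(1))
print(total, steps)
```

A learning agent replaces the random choice with a policy that maps the observation to an action; the surrounding loop stays the same.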

Observations

An observation object is a dictionary containing four different parts:

  1. ‘board’: a list of two-dimensional feature layers describing the board state.
  2. ‘state’: a vector of normalized values (e.g. turn number, half, scores, etc.) describing the non-spatial game state.
  3. ‘procedures’: a one-hot vector describing which procedure the game is currently in.
  4. ‘available-action-types’: a binary vector describing which action types are currently available (several can be available at once).
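RL models often process the spatial part ('board') and the non-spatial parts separately, but a simple baseline is to flatten everything into one vector. The sketch below does that under stated assumptions: each board layer is a 2-D list of rows, and each part may be either a sequence or a dict of named values (dict order is preserved in Python 3.7+). flatten_observation is a hypothetical helper, not an FFAI function, and the toy observation is far smaller than a real one.

```python
def _as_sequence(part):
    """Return the entries of an observation part, whether it is a dict or a sequence."""
    return list(part.values()) if isinstance(part, dict) else list(part)


def flatten_observation(obs):
    """Concatenate all four observation parts into one flat list of floats.

    Assumes each board layer is a 2-D list of rows and that 'state',
    'procedures' and 'available-action-types' hold scalar values.
    """
    flat = []
    for layer in _as_sequence(obs["board"]):
        for row in layer:
            flat.extend(float(v) for v in row)
    for key in ("state", "procedures", "available-action-types"):
        flat.extend(float(v) for v in _as_sequence(obs[key]))
    return flat


# Toy observation: two 2x3 layers plus short vectors (real sizes are larger).
obs = {
    "board": [[[0, 1, 0], [0, 0, 0]], [[1, 1, 1], [0, 0, 0]]],
    "state": {"half": 0.5, "round": 0.125},
    "procedures": [0, 1, 0],
    "available-action-types": [1, 0, 1, 1],
}
vec = flatten_observation(obs)
print(len(vec))  # 2*6 + 2 + 3 + 4 = 21
```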

Observation: ‘board’

The default feature layers in obs[‘board’] are:

  1. OccupiedLayer()
  2. OwnPlayerLayer()
  3. OppPlayerLayer()
  4. OwnTackleZoneLayer()
  5. OppTackleZoneLayer()
  6. UpLayer()
  7. StunnedLayer()
  8. UsedLayer()
  9. AvailablePlayerLayer()
  10. AvailablePositionLayer()
  11. RollProbabilityLayer()
  12. BlockDiceLayer()
  13. ActivePlayerLayer()
  14. TargetPlayerLayer()
  15. MALayer()
  16. STLayer()
  17. AGLayer()
  18. AVLayer()
  19. MovemenLeftLayer()
  20. BallLayer()
  21. OwnHalfLayer()
  22. OwnTouchdownLayer()
  23. OppTouchdownLayer()
  24. SkillLayer(Skill.BLOCK)
  25. SkillLayer(Skill.DODGE)
  26. SkillLayer(Skill.SURE_HANDS)
  27. SkillLayer(Skill.CATCH)
  28. SkillLayer(Skill.PASS)

A layer is a 2-D array of scalars in [0, 1] with the size of the board, including crowd padding. Some layers have binary values, e.g. indicating whether a square is occupied by a player (OccupiedLayer()), by a standing player (UpLayer()), or by a player with the Block skill (SkillLayer(Skill.BLOCK)). Other layers contain normalized values, such as OwnTackleZoneLayer(), which holds the number of friendly tackle zones covering a square divided by 8, or MALayer(), where the values equal the movement allowance of players divided by 10.
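To make the normalization concrete, the sketch below computes a layer in the spirit of OwnTackleZoneLayer() on a toy grid: each square gets the number of adjacent friendly players exerting a tackle zone, divided by 8 (the most neighbours a square can have). It is a simplified illustration, not FFAI's implementation; it assumes every listed player is standing and therefore exerts a tackle zone, and tackle_zone_layer is a hypothetical name.

```python
def tackle_zone_layer(height, width, players):
    """Count, for every square, the adjacent players exerting a tackle zone, divided by 8.

    `players` is a list of (x, y) positions of standing friendly players.
    A tackle zone covers the 8 squares surrounding a player, so a square can be
    covered by at most 8 players and every value lies in [0, 1].
    """
    out = [[0.0] * width for _ in range(height)]
    for px, py in players:
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if dx == 0 and dy == 0:
                    continue  # a player's own square is not in its tackle zone
                x, y = px + dx, py + dy
                if 0 <= x < width and 0 <= y < height:
                    out[y][x] += 1 / 8
    return out


layer = tackle_zone_layer(3, 4, [(1, 1), (2, 1)])
print(layer[1][1], layer[0][1])  # 0.125 0.25
```

Square (1, 1) is covered only by the player at (2, 1), while square (1, 0) is adjacent to both players. Note that the layer is indexed as layer[y][x], matching the custom layer example below.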

FFAI environments have the 28 layers listed above by default. Custom layers can, however, be implemented by subclassing FeatureLayer and implementing its produce() and name() methods:

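import numpy as np
from ffai.ai import FeatureLayer


class MyCustomLayer(FeatureLayer):
    """Marks squares occupied by players whose role costs more than 80000."""

    def produce(self, game):
        # One value per square of the arena, i.e. the board including crowd padding.
        out = np.zeros((game.arena.height, game.arena.width))
        board = game.state.pitch.board
        for y in range(len(board)):
            for x in range(len(board[0])):
                player = board[y][x]
                # 1.0 for an expensive player, 0.0 for an empty square or a cheaper player.
                out[y][x] = 1.0 if player is not None and player.role.cost > 80000 else 0.0
        return out

    def name(self):
        # Human-readable name of the layer.
        return "expensive players"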

Layers can then be added to an environment like this:

env.layers.append(MyCustomLayer())

To visualize the feature layers, use the feature_layers option when calling render():

env.render(feature_layers=True)

[Figure: FFAI Gym Feature Layers]

Observation: ‘state’

The ‘state’ part of the observation contains normalized values for the following 50 features:

  1. ‘half’
  2. ‘round’
  3. ‘is sweltering heat’
  4. ‘is very sunny’
  5. ‘is nice’
  6. ‘is pouring rain’
  7. ‘is blizzard’
  8. ‘is own turn’
  9. ‘is kicking first half’
  10. ‘is kicking this drive’
  11. ‘own reserves’
  12. ‘own kods’
  13. ‘own casualites’
  14. ‘opp reserves’
  15. ‘opp kods’
  16. ‘opp casualties’
  17. ‘own score’
  18. ‘own turns’
  19. ‘own starting rerolls’
  20. ‘own rerolls left’
  21. ‘own ass coaches’
  22. ‘own cheerleaders’
  23. ‘own bribes’
  24. ‘own babes’
  25. ‘own apothecary available’
  26. ‘own reroll available’
  27. ‘own fame’
  28. ‘opp score’
  29. ‘opp turns’
  30. ‘opp starting rerolls’
  31. ‘opp rerolls left’
  32. ‘opp ass coaches’
  33. ‘opp cheerleaders’
  34. ‘opp bribes’
  35. ‘opp babes’
  36. ‘opp apothecary available’
  37. ‘opp reroll available’
  38. ‘opp fame’
  39. ‘is blitz available’
  40. ‘is pass available’
  41. ‘is handoff available’
  42. ‘is foul available’
  43. ‘is blitz’
  44. ‘is quick snap’
  45. ‘is move action’
  46. ‘is block action’
  47. ‘is blitz action’
  48. ‘is pass action’
  49. ‘is handoff action’
  50. ‘is foul action’

Some values are boolean, either 0 or 1, while others are normalized.

Observation: ‘procedure’

The 19 procedures represented in the one-hot vector obs[‘procedure’] are:

  1. StartGame
  2. CoinTossFlip
  3. CoinTossKickReceive
  4. Setup
  5. PlaceBall
  6. HighKick
  7. Touchback
  8. Turn
  9. PlayerAction
  10. Block
  11. Push
  12. FollowUp
  13. Apothecary
  14. PassAction
  15. Catch
  16. Interception
  17. GFI
  18. Dodge
  19. Pickup
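A one-hot vector has a 1 at the index of the active procedure and 0 everywhere else. The sketch below encodes and decodes such a vector using the order of the list above; the PROCEDURES tuple and the one_hot and decode_one_hot helpers are illustrative assumptions, and FFAI's internal ordering may differ.

```python
PROCEDURES = (
    "StartGame", "CoinTossFlip", "CoinTossKickReceive", "Setup", "PlaceBall",
    "HighKick", "Touchback", "Turn", "PlayerAction", "Block", "Push",
    "FollowUp", "Apothecary", "PassAction", "Catch", "Interception", "GFI",
    "Dodge", "Pickup",
)


def one_hot(name, names=PROCEDURES):
    """Return a list with 1.0 at the position of `name` and 0.0 elsewhere."""
    vec = [0.0] * len(names)
    vec[names.index(name)] = 1.0  # raises ValueError for an unknown name
    return vec


def decode_one_hot(vec, names=PROCEDURES):
    """Inverse of one_hot: return the name at the position of the largest value."""
    return names[max(range(len(vec)), key=vec.__getitem__)]


v = one_hot("Block")
print(v.index(1.0), decode_one_hot(v))  # 9 Block
```

Decoding is handy while debugging, because a procedure name is easier to read than a vector of 19 numbers.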

Action Types

Actions consist of the action types listed below. Action types marked with <Position> also require an x- and a y-coordinate.

  1. ActionType.START_GAME
  2. ActionType.HEADS
  3. ActionType.TAILS
  4. ActionType.KICK
  5. ActionType.RECEIVE
  6. ActionType.END_PLAYER_TURN
  7. ActionType.USE_REROLL
  8. ActionType.DONT_USE_REROLL
  9. ActionType.END_TURN
  10. ActionType.STAND_UP
  11. ActionType.SELECT_ATTACKER_DOWN
  12. ActionType.SELECT_BOTH_DOWN
  13. ActionType.SELECT_PUSH
  14. ActionType.SELECT_DEFENDER_STUMBLES
  15. ActionType.SELECT_DEFENDER_DOWN
  16. ActionType.SELECT_NONE
  17. ActionType.PLACE_PLAYER<Position>
  18. ActionType.PLACE_BALL<Position>
  19. ActionType.PUSH<Position>
  20. ActionType.FOLLOW_UP<Position>
  21. ActionType.SELECT_PLAYER<Position> (position of the player)
  22. ActionType.MOVE<Position>
  23. ActionType.BLOCK<Position>
  24. ActionType.PASS<Position>
  25. ActionType.FOUL<Position>
  26. ActionType.HANDOFF<Position>
  27. ActionType.LEAP
  28. ActionType.START_MOVE<Position> (position of the player)
  29. ActionType.START_BLOCK<Position> (position of the player)
  30. ActionType.START_BLITZ<Position> (position of the player)
  31. ActionType.START_PASS<Position> (position of the player)
  32. ActionType.START_FOUL<Position> (position of the player)
  33. ActionType.START_HANDOFF<Position> (position of the player)
  34. ActionType.USE_SKILL
  35. ActionType.DONT_USE_SKILL
  36. ActionType.SETUP_FORMATION_WEDGE
  37. ActionType.SETUP_FORMATION_LINE
  38. ActionType.SETUP_FORMATION_SPREAD
  39. ActionType.SETUP_FORMATION_ZONE

Observation: ‘procedure’

The ‘procedure’ part of the observation contains a one-hot vector with 16 values representing which procedures the game is in:

  1. StartGame
  2. CoinTossFlip
  3. CoinTossKickReceive
  4. Setup
  5. PlaceBall
  6. HighKick
  7. Touchback
  8. Turn
  9. PlayerAction
  10. Block
  11. Push
  12. FollowUp
  13. Apothecary
  14. PassAction
  15. Interception
  16. Reroll

Observation: ‘available-action-types’

The ‘available-action-types’ part of the observation contains a binary vector describing which action types are currently available:

  1. ActionType.START_GAME
  2. ActionType.HEADS
  3. ActionType.TAILS
  4. ActionType.KICK
  5. ActionType.RECEIVE
  6. ActionType.END_PLAYER_TURN
  7. ActionType.USE_REROLL
  8. ActionType.DONT_USE_REROLL
  9. ActionType.END_TURN
  10. ActionType.STAND_UP
  11. ActionType.SELECT_ATTACKER_DOWN
  12. ActionType.SELECT_BOTH_DOWN
  13. ActionType.SELECT_PUSH
  14. ActionType.SELECT_DEFENDER_STUMBLES
  15. ActionType.SELECT_DEFENDER_DOWN
  16. ActionType.SELECT_NONE
  17. ActionType.PLACE_PLAYER <Position>
  18. ActionType.PLACE_BALL <Position>
  19. ActionType.PUSH <Position>
  20. ActionType.FOLLOW_UP <Position>
  21. ActionType.SELECT_PLAYER <Position>
  22. ActionType.MOVE <Position>
  23. ActionType.BLOCK <Position>
  24. ActionType.PASS <Position>
  25. ActionType.FOUL <Position>
  26. ActionType.HANDOFF <Position>
  27. ActionType.LEAP <Position>
  28. ActionType.STAB <Position>
  29. ActionType.START_MOVE <Position>
  30. ActionType.START_BLOCK <Position>
  31. ActionType.START_BLITZ <Position>
  32. ActionType.START_PASS <Position>
  33. ActionType.START_FOUL <Position>
  34. ActionType.START_HANDOFF <Position>
  35. ActionType.USE_SKILL
  36. ActionType.DONT_USE_SKILL
  37. ActionType.SETUP_FORMATION_WEDGE
  38. ActionType.SETUP_FORMATION_LINE
  39. ActionType.SETUP_FORMATION_SPREAD
  40. ActionType.SETUP_FORMATION_ZONE
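A policy network typically outputs a score (logit) for every action type. A common way to respect obs['available-action-types'] is action masking: unavailable action types get probability zero before an action is sampled. The sketch below applies a masked softmax over plain Python lists; masked_softmax and sample_action_type are hypothetical helpers, not FFAI functions.

```python
import math
import random


def masked_softmax(logits, mask):
    """Softmax over logits where entries with mask == 0 get probability 0."""
    available = [i for i, m in enumerate(mask) if m]
    if not available:
        raise ValueError("no action type is available")
    # Subtract the largest available logit for numerical stability.
    top = max(logits[i] for i in available)
    exps = [math.exp(logits[i] - top) if m else 0.0 for i, m in enumerate(mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action_type(logits, mask, rng=random):
    """Draw an action-type index from the masked distribution."""
    probs = masked_softmax(logits, mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


logits = [2.0, 5.0, 0.0, 1.0]
mask = [1, 0, 1, 1]  # action type 1 is unavailable despite its high score
probs = masked_softmax(logits, mask)
print(probs[1])  # 0.0
print(sample_action_type(logits, mask, random.Random(0)) in (0, 2, 3))  # True
```

Masking before the softmax, rather than discarding illegal actions after sampling, keeps the probabilities of the legal actions summing to one, which matters when the policy is trained with policy-gradient methods.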

Actions

To take an action, call the step function with an action dictionary containing an action type and, if needed, a position. Check the list above to see whether an action type requires a position. For example:

action = {
    'action-type': 26,
    'x': 8,
    'y': 6
}
obs, reward, done, info = env.step(action)

You can always check whether an action type is available using env.available_action_types(), and which positions are available for it using env.available_positions(action_type). The same information is available through obs['available-action-types'] and obs['board']['<action_type> positions'], where <action_type> could be, e.g., move.
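Positional action types need a target square. Assuming a positions layer is a 2-D array indexed as layer[y][x] (as in the custom layer example above) in which 1 marks a legal square, the sketch below collects the legal squares and builds an action dictionary in the format shown above. available_squares and make_action are hypothetical helpers, and the index 22 used for the move action type is an assumption; check FFAI's action type ordering before relying on specific numbers.

```python
import random


def available_squares(positions_layer):
    """Return the (x, y) coordinates of all squares marked 1 in a 2-D positions layer."""
    return [
        (x, y)
        for y, row in enumerate(positions_layer)
        for x, value in enumerate(row)
        if value == 1
    ]


def make_action(action_type, x=None, y=None):
    """Build an action dict; x and y are only included for positional action types."""
    action = {"action-type": action_type}
    if x is not None and y is not None:
        action["x"] = x
        action["y"] = y
    return action


# Toy 3x4 "move positions" layer with two legal target squares.
move_positions = [
    [0, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
]
squares = available_squares(move_positions)
x, y = random.Random(0).choice(squares)
action = make_action(22, x, y)  # 22: hypothetical index of the move action type
print(squares)  # [(1, 1), (3, 2)]
```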

Rewards and Info

The default reward function only rewards the outcome of the game: 1 for a win, 0 for a draw, and -1 for a loss. However, the info object returned by the step function contains useful information for reward shaping:

'cas_inflicted': {int},
'opp_cas_inflicted': {int},
'touchdowns': {int},
'opp_touchdowns': {int},
'half': {int},
'round': {int},
'ball_progression': {int}

These values are cumulative, e.g. ‘cas_inflicted’ is the total number of casualties inflicted by the team so far in the game. Another way to detect events is to inspect env.game.state.reports.
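Because the info values are cumulative, a shaped reward should be based on the change since the previous step; otherwise a touchdown scored early would be rewarded again on every later step. The sketch below adds weighted per-step differences to the environment's reward. The WEIGHTS values are arbitrary placeholders chosen for illustration, not values recommended by FFAI, and shaped_reward is a hypothetical helper.

```python
# Illustrative weights only; tune them for your own experiments.
WEIGHTS = {
    "touchdowns": 1.0,
    "opp_touchdowns": -1.0,
    "cas_inflicted": 0.1,
    "opp_cas_inflicted": -0.1,
    "ball_progression": 0.005,
}


def shaped_reward(reward, info, prev_info, weights=WEIGHTS):
    """Add weighted per-step changes of cumulative info counters to the env reward.

    `prev_info` is the info dict from the previous step, or None on the first step.
    Missing keys are treated as 0.
    """
    prev_info = prev_info or {}
    bonus = 0.0
    for key, weight in weights.items():
        delta = info.get(key, 0) - prev_info.get(key, 0)
        bonus += weight * delta
    return reward + bonus


prev = {"touchdowns": 0, "opp_touchdowns": 0, "cas_inflicted": 1}
curr = {"touchdowns": 1, "opp_touchdowns": 0, "cas_inflicted": 1}
print(shaped_reward(0, curr, prev))  # 1.0: one new touchdown
print(shaped_reward(0, curr, curr))  # 0.0: nothing changed since the last step
```

In a training loop, keep the info dict from the previous step and pass it as prev_info on the next call.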

Environments

FFAI comes with five environments of varying difficulty:

[Figure: A rendering of FFAI-3-v2.]

Explore the Observation Space

Try running examples/gym.py while debugging in your favorite IDE (e.g. PyCharm). Set a breakpoint on the line where the step function is called and inspect the obs object. If you run with rendering enabled, it is easier to analyze the values in the feature layers.

In the next tutorial, we will start developing a reinforcement learning agent.