It has been almost 25 years since a computer beat chess world champion Garry Kasparov at his own game. Ever since, computers have been winning decisive victories against human players. But how does one teach a computer to play at such a high level? One possibility is called Reinforcement Learning. And the good news is that it works not only for chess. We can also train computers to find the best strategies for optimising logistics operations.

How to train the system: Build Artificial Intelligence

The most common model used in Reinforcement Learning (RL) is based on the so-called Markov Decision Process (MDP). This model consists of

  • states (e.g. a position on a chess board)
  • actions (e.g. move a pawn from E2 to E4)
  • rewards (positive if the game is won, negative if the game is lost)
  • transition probabilities (if the next state is subject to uncertainty, e.g. in Backgammon, where dice are rolled).

First, the MDP with all possible states and their associated actions, transition probabilities and rewards is modelled. Then, the training of the algorithm can begin.
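To make the four ingredients more tangible, here is a minimal sketch of how such a model could be written down in Python. The tiny two-action game, the `Transition` class and the helper functions are purely illustrative assumptions and not part of any redPILOT product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    """One possible outcome of taking an action in a state."""
    next_state: str
    probability: float
    reward: float


# Hypothetical toy MDP: state -> action -> list of possible outcomes.
# "safe" keeps the game running, "risky" wins or loses with equal chance.
mdp = {
    "start": {
        "safe": [Transition("start", 1.0, 0.0)],
        "risky": [Transition("win", 0.5, 1.0), Transition("lose", 0.5, -1.0)],
    },
    "win": {},   # terminal state, no actions left
    "lose": {},  # terminal state, no actions left
}


def probabilities_are_valid(model):
    """Check that the outcome probabilities of every action sum to 1."""
    for actions in model.values():
        for transitions in actions.values():
            total = sum(t.probability for t in transitions)
            if abs(total - 1.0) > 1e-9:
                return False
    return True


def expected_reward(model, state, action):
    """Average immediate reward of an action, weighted by probability."""
    return sum(t.probability * t.reward for t in model[state][action])
```

In a deterministic game such as chess, every action would have exactly one outcome with probability 1. Games with dice, such as Backgammon, need several outcomes per action.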

Training consists of exploring the available actions. Most of the time, the algorithm follows the best path currently known from previous learning. Occasionally, a random step is taken to make the training more explorative. Each state/action pair is assigned a value based on its cumulative reward, which can be positive or negative. In essence, this value represents how good it is to take that action in that state. These values are stored, and later, when the algorithm encounters one of the states, it looks up the best action to take.
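One common way to implement this kind of training is tabular Q-learning with an epsilon-greedy policy. The text above does not name a specific algorithm, so treat the following as a sketch under that assumption. The toy environment (a walk along five cells with a reward at the last one) and the parameters `alpha`, `gamma` and `epsilon` are hypothetical choices for illustration.

```python
import random
from collections import defaultdict

ACTIONS = ("left", "right")
GOAL = 4  # hypothetical toy world: cells 0..4, reaching cell 4 pays +1


def step(state, action):
    """Move one cell left or right; the episode ends at the goal cell."""
    if action == "left":
        next_state = max(0, state - 1)
    else:
        next_state = min(GOAL, state + 1)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn a value for every (state, action) pair by trial and error."""
    rng = random.Random(seed)
    q = defaultdict(float)  # unseen pairs start with value 0
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < epsilon:
                # Occasional random step to keep exploring.
                action = rng.choice(ACTIONS)
            else:
                # Otherwise follow the best known action (ties broken randomly).
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice([a for a in ACTIONS if q[(state, a)] == best])
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # Nudge the stored value towards reward plus discounted future value.
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = next_state
    return q


def best_action(q, state):
    """Look up the action with the highest learned value."""
    return max(ACTIONS, key=lambda a: q[(state, a)])
```

After training, `best_action(train(), 0)` looks up the stored values for cell 0 and returns the action with the highest one, which is the lookup step described above.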

Reinforcement Learning can be applied to a wide variety of problems because the method itself is quite abstract. This also makes it interesting for redPILOT.

Challenges of in-day management

We are currently researching how RL can increase planning efficiency. A shift leader has a series of decisions to make in day-to-day business. For example, it may be necessary to reassign operators to other activities, or to send an operator home early to compensate for overtime. The shift leader has to make these decisions based on the current status of the system, which is represented by operator allocation, throughput and orders. However, whether a decision was right or wrong can only be evaluated at the end of the shift.
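The following sketch illustrates the two ideas in this paragraph: a state made up of operator allocation, throughput and orders, and a reward that only becomes known at the end of the shift. The `ShiftState` fields, the discount factor `gamma` and the helper `discounted_returns` are assumptions made for this example, not a description of an existing implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ShiftState:
    """Hypothetical snapshot of the system as seen by the shift leader."""
    operators_per_activity: Tuple[Tuple[str, int], ...]  # e.g. (("picking", 4),)
    throughput_per_hour: int
    open_orders: int


def discounted_returns(rewards: List[float], gamma: float = 0.9) -> List[float]:
    """Credit each decision with the (discounted) rewards that follow it.

    Walking backwards through the shift lets a reward that is only
    known at the end flow back to the earlier decisions.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    for i in range(len(rewards) - 1, -1, -1):
        running = rewards[i] + gamma * running
        returns[i] = running
    return returns
```

For a shift with three decisions and a single reward at the end, `discounted_returns([0.0, 0.0, 1.0], gamma=0.5)` yields `[0.25, 0.5, 1.0]`: the earlier a decision was made, the less of the final result is attributed to it.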

Optimise logistics operations

This is where Reinforcement Learning comes into play. A trained algorithm could assess the impact of a decision by comparing the current state and the available actions against the values it has learned. It would then recommend the action that promises the highest reward. In doing so, it can give managers concrete decision support for their daily challenges. The result should be a more efficient daily operation, because the actions recommended by the RL algorithm directly influence the ongoing work. This, in turn, should reduce overall cost.
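In its simplest form, such a recommendation could be a lookup in a table of learned values. The sketch below assumes a hypothetical state encoding, hypothetical action names and placeholder values. None of the numbers are real results.

```python
from typing import Dict, Optional, Tuple

# Hypothetical state encoding: (workload level, staffing level).
State = Tuple[str, str]

# Placeholder values standing in for what a trained algorithm might have learned.
learned_values: Dict[State, Dict[str, float]] = {
    ("high_backlog", "understaffed"): {
        "reschedule_operator": 0.8,
        "send_operator_home": -0.6,
        "no_change": -0.2,
    },
    ("low_backlog", "overstaffed"): {
        "reschedule_operator": 0.1,
        "send_operator_home": 0.7,
        "no_change": 0.0,
    },
}


def recommend(state: State) -> Optional[str]:
    """Return the action with the highest learned value for this state.

    Returns None for states that were never seen during training, so
    the decision stays with the shift leader in that case.
    """
    values = learned_values.get(state)
    if not values:
        return None
    return max(values, key=values.get)
```

Returning `None` for unknown states is a deliberate choice in this sketch: the algorithm only makes a recommendation where it has learned values to back it up.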