Category Archive: Deep Reinforcement Learning

Advantage Function in Deep Reinforcement Learning

Deep reinforcement learning involves building a deep learning model that approximates the function mapping input features to discounted future reward values, also called Q values. We have seen in this article how to obtain these Q values effectively and build a map from input features to the corresponding set of Q values.

This map from input features to all possible Q values at a given state gives the reinforcement learning agent an overall picture of the environment, which in turn helps the agent choose the optimal path.
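
To make the terms above concrete, here is a minimal sketch in plain Python. It is not taken from the full article. The function names are hypothetical, and the advantage is computed under one common assumption: that the value of a state is the largest Q value available in it (a greedy policy).

```python
def discounted_return(rewards, gamma):
    """Sum of future rewards, each discounted by gamma per time step."""
    total = 0.0
    # Work backwards so each step adds its reward to the discounted tail.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def greedy_action(q_values):
    """Index of the action with the highest Q value in this state."""
    return max(range(len(q_values)), key=lambda a: q_values[a])


def advantages(q_values):
    """Advantage A(s, a) = Q(s, a) - V(s), assuming V(s) = max_a Q(s, a)."""
    state_value = max(q_values)
    return [q - state_value for q in q_values]
```

Under this assumption the greedy action always has an advantage of zero and every other action has a negative advantage, which shows how much is lost by deviating from the best known choice.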

Read the rest of the article at Mindboard’s Medium channel.

Input Window Size for Deep Recurrent Reinforcement Learning

Deep Recurrent Reinforcement Learning uses a Recurrent Neural Network (RNN), such as a Long Short-Term Memory (LSTM) or Gated Recurrent Unit (GRU) based network, to learn a value function that maps environment states to action values. Recurrent Neural Networks are useful for modeling time-series data because the network maintains a memory and learns to retain useful information from the inputs of prior model inferences. Every time the model is called, the memory is updated based on the current inputs.
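
The memory update described above can be illustrated with a deliberately tiny recurrent cell. This is only a sketch under stated assumptions: a single scalar input, a single scalar memory, fixed hand-picked weights instead of learned ones, and a hypothetical class name. Real LSTM and GRU cells add gating on top of this basic idea.

```python
import math


class TinyRecurrentCell:
    """Scalar recurrent cell: the new memory depends on input and old memory."""

    def __init__(self, w_in=0.5, w_mem=0.5, bias=0.0):
        # Weights are fixed here for illustration; a real network learns them.
        self.w_in = w_in
        self.w_mem = w_mem
        self.bias = bias
        self.memory = 0.0

    def reset(self):
        """Clear the memory, for example at the start of a new episode."""
        self.memory = 0.0

    def step(self, x):
        """Update the memory from the current input and return it."""
        self.memory = math.tanh(
            self.w_in * x + self.w_mem * self.memory + self.bias
        )
        return self.memory
```

Calling `step` twice with the same input gives two different outputs, because the second call also sees the memory left behind by the first. Calling `reset` between the two calls makes them identical again.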

Read the rest of the article at Mindboard’s Medium channel.

Scaling Reward Values for Improved Deep Reinforcement Learning

Deep Reinforcement Learning involves using a neural network as a universal function approximator to learn a value function that maps state-action pairs to their expected future reward under a particular reward function. This can be done in many different ways. For example, a Monte Carlo based algorithm observes the total rewards that follow each state-action pair over a complete episode to build training data for the neural network. Alternatively, a Temporal Difference approach uses incremental rewards from single time-steps and bootstraps off predicted future rewards from the latest version of the value function model. Whichever approach is taken, the neural network must be fitted to the data efficiently in order to optimize the learning algorithm. Many factors determine a neural network's ability to fit training data. In this post we will examine how scaling our outputs can affect our rate of convergence.
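
The two ways of building training targets mentioned above, and the effect of scaling rewards on them, can be sketched as follows. This is an illustration rather than the code from the full post: the function names, the discount factor `gamma`, and the constant `scale` factor are all assumptions made for the example.

```python
def monte_carlo_targets(rewards, gamma):
    """Discounted return observed after each step of a complete episode."""
    targets = []
    running = 0.0
    # Walk the episode backwards, accumulating the discounted tail.
    for reward in reversed(rewards):
        running = reward + gamma * running
        targets.append(running)
    targets.reverse()
    return targets


def td_target(reward, next_value, gamma, done=False):
    """One-step target: reward plus discounted predicted future value."""
    if done:
        # No future value to bootstrap from after a terminal step.
        return reward
    return reward + gamma * next_value


def scale_rewards(rewards, scale):
    """Multiply every reward by a constant factor."""
    return [reward * scale for reward in rewards]
```

Because returns are linear in the rewards, multiplying every reward by a constant multiplies every Monte Carlo target by the same constant. The relative ranking of actions is unchanged, while the magnitude of the values the network has to fit shrinks or grows.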

Read the rest of the article at Mindboard’s Medium channel.

Q Matrix Update to Train Deep Recurrent Q Networks More Effectively

A Deep Recurrent Q Network, as discussed in a previous article, can be very helpful in building smart agents that remember what they learned in the distant past. This ability makes a Deep Recurrent Q Network a valuable function approximator for building AI agents in Deep Reinforcement Learning.

Read the rest of the article at Mindboard’s Medium channel.