Having completed CS 231N and CS 224N last month, today I started Berkeley’s Fall 2017 CS 294, which is available online. I have always regretted not being able to blog my learnings from CS 224N and CS 231N, and I plan to do that for sure if (and when) I go through their 2018 iterations. But I didn’t want to miss regular blogging for CS 294, especially because I am finding reinforcement learning to have very interesting applications, particularly in robotics. And after going through the Lecture 1 video, my interest in the field has increased even more; I have started to feel that RL is the real AI problem, as mentioned in one of the slides in the lecture.

The lecture begins with class logistics, prerequisites and other such details. The very first topic discussed is “What is Reinforcement Learning?”, with the famous ‘dog reward’ case. A reinforcement learning problem can be modelled as a learner-environment interaction in which the learner performs an action (or takes a decision) with respect to the environment, and receives a response (consequence) from the environment. This learner can be a robot or any other agent. The response from the environment normally falls into one of two categories: (i) an observation, (ii) a reward. The difference can be understood via the example of a robot: the camera images taken by the robot are observations, whereas a task-success measure is a reward. A minimal sketch of this interaction loop is below.
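To make the observation/reward distinction concrete, here is a minimal sketch of the learner-environment loop. I am using OpenAI Gym’s interface purely as an illustration (the lecture itself doesn’t prescribe any library), and the random “learner” is just a stand-in for a real agent:

```python
import gym

# A minimal learner-environment interaction loop (illustrative only).
# The "learner" here picks random actions; a real agent would choose
# actions based on the observations and rewards it receives.
env = gym.make("CartPole-v0")

obs = env.reset()        # observation: what the agent senses (here, cart/pole state)
total_reward = 0.0

for t in range(200):
    action = env.action_space.sample()           # the agent acts on the environment
    obs, reward, done, info = env.step(action)   # the environment responds with an
                                                  # observation and a reward
    total_reward += reward
    if done:
        break

print("Episode return:", total_reward)
```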

The discussion moves on to Deep Reinforcement Learning.

Deep learning: end-to-end training of expressive, multi-layer models.
Deep models are what allow reinforcement learning algorithms to solve complex problems end to end!

I particularly liked the part where end-to-end learning was explained, especially via the video game example (Slide 17). The example was: if we have to build a game bot, there is a pipeline of using the game API, extracting features, planning how to use them, and then finally acting through low-level bot controls. In an end-to-end process this whole task is done in one go, rather than in two stages of planning and then acting; the pipeline blocks are replaced by the “weight” blocks of a neural network, as in the rough sketch below.
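Here is a rough sketch of what “replacing pipeline blocks with weight blocks” could look like: a single network mapping raw game frames directly to action scores, written in TensorFlow 1.x style. The shapes, layer sizes and number of actions are made up purely for illustration:

```python
import tensorflow as tf

# End-to-end sketch: one network from raw pixels to action scores, replacing
# the hand-crafted pipeline (game API -> features -> planning -> low-level
# controls). All shapes and sizes here are hypothetical.
pixels = tf.placeholder(tf.float32, [None, 84, 84, 3])   # raw game frames

x = tf.layers.conv2d(pixels, filters=16, kernel_size=8, strides=4,
                     activation=tf.nn.relu)
x = tf.layers.conv2d(x, filters=32, kernel_size=4, strides=2,
                     activation=tf.nn.relu)
x = tf.layers.flatten(x)
x = tf.layers.dense(x, 256, activation=tf.nn.relu)
action_logits = tf.layers.dense(x, 6)   # scores for, say, 6 possible controls

# Each former pipeline stage is now just a block of trainable weights,
# and all of them are learned jointly from the final objective.
```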

Pipeline to Weight
Reference/Credits: http://rll.berkeley.edu/deeprlcourse/f17docs/lecture_1_introduction.pdf

And then comes a quote that I agree with a lot:

The reinforcement learning problem is the AI problem!

The session moves forward with which problems CAN and CAN’T be modelled as ‘sequential decision making’ problems.

Every supervised learning problem can be modelled as a reinforcement learning problem by treating the (negative) loss as the reward. But these problems usually become harder to solve that way, so it isn’t advisable in most cases. One case where it is beneficial, though, is machine translation, where the BLEU score is non-differentiable and therefore can’t be used directly as a training loss; a rough sketch of how a non-differentiable score can still drive training is below.
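As a concrete (and heavily simplified) illustration of this point, here is a REINFORCE-style surrogate loss in which a non-differentiable score such as sentence-level BLEU is treated as a reward that weights the log-probability of a sampled translation. The tiny “model”, the shapes and the names are all assumptions of mine, not code from the lecture:

```python
import tensorflow as tf

# Sketch: training with a non-differentiable reward (e.g. sentence-level BLEU)
# by weighting the log-probability of a sampled output with that reward.
# The stand-in "model" and all shapes below are hypothetical.
vocab_size, hidden = 1000, 64

inputs = tf.placeholder(tf.float32, [None, None, hidden])   # encoder states (stand-in)
sampled_ids = tf.placeholder(tf.int32, [None, None])        # tokens sampled from the model
reward = tf.placeholder(tf.float32, [None])                 # BLEU computed outside the graph

logits = tf.layers.dense(inputs, vocab_size)                # per-step output distribution

# log-probability of the sampled translation under the model
step_log_probs = -tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=sampled_ids, logits=logits)
seq_log_prob = tf.reduce_sum(step_log_probs, axis=1)

# maximize E[reward * log p(sample)]; BLEU itself is never differentiated
loss = -tf.reduce_mean(reward * seq_log_prob)
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)
```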

Various ways to learn a reward function were touched upon, such as learning from example (inverse reinforcement learning, e.g. learning from watching others do a particular thing), transferring skills between domains (transfer learning), and prediction (learning to predict and using predictions to act, i.e. imagining what might happen if a particular action is taken). Meta-learning, i.e. learning to learn and getting an idea of an efficient way to learn, was also mentioned. Other methods were imitation learning and inferring intentions. The example given for inferring intentions is something I might never be able to forget; see it at 39:56 of the lecture video. I find the example related to:

Child Innocence

Just as the child tried to infer intentions and innocently tried to help the bear, inferring intentions is another way supervision might be performed.

Finally, there were several quotes in the lecture that I found interesting:

We are accustomed to operating with rewards so sparse, that we experience them once or twice in the lifetime, if at all.

“This quote points to the difficulty of finding a reward function. A reward might not be something obvious, but a small thing that we aren’t accustomed to ‘experiencing’ as a reward.”

Flexibility is the hallmark of intelligence.

“I literally loved this statement.”

Instead of trying to produce a program to simulate the adult mind, why not rather try to produce one which simulates the child’s. - Alan Turing

“Well, a child indeed is the best learner. A child’s curiosity and eagerness to gain knowledge from every experience is something adults should also emulate.”

The first lecture filled me with thrill, excitement and motivation for the course. I plan to watch a lecture a day and complete the assignments on the same day. The assignment for this lecture was basically just to follow the TensorFlow MNIST tutorial available at https://www.tensorflow.org/get_started/mnist/pros. I had already done it before, and I did it again today after watching the lecture; a minimal version of the exercise is sketched below.
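For reference, here is a minimal softmax-regression version of that exercise in TensorFlow 1.x. The linked tutorial goes further and builds a small convolutional net; this is just a bare-bones sketch of the same workflow:

```python
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Minimal MNIST softmax regression, roughly along the lines of the tutorial.
mnist = input_data.read_data_sets("MNIST_data", one_hot=True)

x = tf.placeholder(tf.float32, [None, 784])     # flattened 28x28 images
y_ = tf.placeholder(tf.float32, [None, 10])     # one-hot labels

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))
logits = tf.matmul(x, W) + b

loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.5).minimize(loss)

correct = tf.equal(tf.argmax(logits, 1), tf.argmax(y_, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for _ in range(1000):
        batch_xs, batch_ys = mnist.train.next_batch(100)
        sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
    print("test accuracy:",
          sess.run(accuracy,
                   feed_dict={x: mnist.test.images, y_: mnist.test.labels}))
```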

The take-home image of the lecture for RL is: Reinforcement Learning
Reference/Credits: http://rll.berkeley.edu/deeprlcourse/f17docs/lecture_1_introduction.pdf

My Lecture Notes: Lecture Notes 1

Hoping to gain a lot of knowledge from the course and to share it with you all!

Thanks for reading. :)