Can we make it work without more data?
- DAgger addresses the problem of distributional drift, but it does so by collecting more expert-labeled data
- But what if our model is so good that it doesn’t drift?
- For that, the model needs to mimic the expert's behavior very accurately.
- But don’t overfit!
Why might our model not be good in the first place?
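- Non-Markovian behavior: even if the observation is Markovian, the human's action might depend on past observations, not only on the current one
- Multimodal behavior: for the same observation, the human might pick among several distinct actions (different modes of the action distribution)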
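A small sketch of why multimodal behavior is hard to fit with a single predicted action. The data is hypothetical (not from the lecture): for one fixed observation, the expert steers left half the time and right half the time, and the model is assumed to be trained with squared error.

```python
import statistics

# Hypothetical demonstrations for ONE observation (an assumption for
# illustration): the expert steers left (-1.0) or right (+1.0) equally often.
expert_actions = [-1.0] * 50 + [1.0] * 50


def mse(prediction, actions):
    """Mean squared error of a single predicted action against the demos."""
    return sum((a - prediction) ** 2 for a in actions) / len(actions)


# The single action that minimizes squared error is the sample mean.
best_single_action = statistics.fmean(expert_actions)

print("MSE-optimal action:", best_single_action)
print("its error:", mse(best_single_action, expert_actions))
print("error of committing to +1.0:", mse(1.0, expert_actions))
```

The squared-error-optimal action is the average of the two modes, which is an action the expert never takes. Committing to either mode scores worse under this loss, so a model trained this way is pushed toward the average.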
If the behavior is non-Markovian, you can condition the policy on the entire observation history instead of only the current image. Encode the image at each time step, then feed the sequence of encodings (current and previous time steps) through an RNN, so that the RNN state summarizes the history and the action is predicted from that state.
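A minimal sketch of a history-conditioned policy, assuming a plain Elman-style RNN written with the standard library only. The sizes, the untrained random weights, and the `encode_image` stand-in are all hypothetical; a real setup would use a learned image encoder and a trained recurrent network.

```python
import math
import random

random.seed(0)

# Hypothetical sizes, chosen only for illustration.
FEATURE_DIM, HIDDEN_DIM, ACTION_DIM = 4, 8, 2


def random_matrix(rows, cols, scale=0.1):
    """Untrained random weights; a real policy would learn these."""
    return [[random.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


def matvec(matrix, vector):
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


W_in = random_matrix(HIDDEN_DIM, FEATURE_DIM)   # features -> hidden
W_rec = random_matrix(HIDDEN_DIM, HIDDEN_DIM)   # hidden -> hidden
W_out = random_matrix(ACTION_DIM, HIDDEN_DIM)   # hidden -> action


def encode_image(image):
    """Stand-in for an image encoder: the 'image' is already a feature vector."""
    return list(image)


def policy(history):
    """Map the whole observation history o_1..o_t to an action a_t."""
    hidden = [0.0] * HIDDEN_DIM
    for image in history:
        features = encode_image(image)
        pre = [a + b for a, b in zip(matvec(W_in, features),
                                     matvec(W_rec, hidden))]
        hidden = [math.tanh(p) for p in pre]  # RNN state summarizes the past
    return matvec(W_out, hidden)


# Two histories that end in the SAME current observation.
history_a = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
history_b = [[0.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
print(policy(history_a))
print(policy(history_b))
```

Both histories end in the same observation, yet the predicted actions differ because the RNN state carries information from earlier time steps. A policy that sees only the current observation cannot make that distinction.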
uid: 202008311500 tags: #cs285