Can we make it work without more data?

DAgger addresses the problem of distributional drift.
But what if our model is so good that it doesn’t drift?
Then we need to mimic the expert’s behavior very accurately.
But don’t overfit!

Why might our model not be good in the first place?

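Even if the observation is Markovian, the human’s behavior might not be: the expert’s action can depend on earlier observations, not just the current one.
Human behavior might be multimodal: for the same observation, the expert may pick from several distinct modes of the action distribution.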
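
To make the multimodality point concrete, here is a minimal sketch under made-up assumptions: at one fixed observation the (hypothetical) expert steers left half the time and right the other half. A single deterministic prediction trained with squared error lands on the average of the two modes, which is an action the expert never takes. The two-component Gaussian mixture is only one illustrative way to represent both modes; the data, function names, and standard deviations are all assumptions.

```python
import math

# Hypothetical expert data for ONE observation (e.g. an obstacle straight ahead):
# the expert steers left (-1.0) half the time and right (+1.0) the other half.
expert_actions = [-1.0, 1.0] * 50


def mse(prediction, actions):
    """Mean squared error of a single deterministic action prediction."""
    return sum((a - prediction) ** 2 for a in actions) / len(actions)


# The constant prediction that minimizes squared error is the sample mean.
mean_action = sum(expert_actions) / len(expert_actions)


def mixture_log_likelihood(actions, means, weights, std):
    """Average log-likelihood of the actions under a 1-D Gaussian mixture."""
    norm = std * math.sqrt(2.0 * math.pi)
    total = 0.0
    for a in actions:
        density = sum(
            w * math.exp(-0.5 * ((a - m) / std) ** 2) / norm
            for m, w in zip(means, weights)
        )
        total += math.log(density)
    return total / len(actions)


# Best single Gaussian for this data: mean 0, standard deviation 1.
unimodal_ll = mixture_log_likelihood(expert_actions, [mean_action], [1.0], std=1.0)
# Two narrow components, one per mode (assumed std of 0.1).
bimodal_ll = mixture_log_likelihood(expert_actions, [-1.0, 1.0], [0.5, 0.5], std=0.1)

print("MSE-optimal action:", mean_action)
print("unimodal log-likelihood:", unimodal_ll)
print("bimodal log-likelihood:", bimodal_ll)
```

The averaged action sits exactly between the two modes, and the two-component model assigns the expert data a higher likelihood than the best single Gaussian.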

If the behavior is non-Markovian, you can condition the policy on the entire history instead of only the current observation: pass each image (the current one and those from previous time steps) through a shared encoder, feed the resulting sequence of features into an RNN, and predict the action from the RNN state.
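
The history-based policy described above can be sketched roughly as follows. This is a toy illustration in plain Python, not a trained model: the class `RecurrentPolicy`, its dimensions, the linear "encoder" standing in for an image encoder, and the random untrained weights are all assumptions. The only point is that the same current observation can produce different actions once the earlier observations differ.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng, scale=0.5):
    """Random weights; a real policy would learn these from expert data."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


class RecurrentPolicy:
    """Toy history-conditioned policy: shared encoder -> simple RNN -> action."""

    def __init__(self, obs_dim, feat_dim, hidden_dim, action_dim, seed=0):
        rng = random.Random(seed)
        self.encoder = random_matrix(feat_dim, obs_dim, rng)  # shared across time steps
        self.w_in = random_matrix(hidden_dim, feat_dim, rng)
        self.w_rec = random_matrix(hidden_dim, hidden_dim, rng)
        self.w_out = random_matrix(action_dim, hidden_dim, rng)
        self.hidden_dim = hidden_dim

    def step(self, observation, hidden):
        """Encode one observation, update the RNN state, and emit an action."""
        features = [math.tanh(f) for f in matvec(self.encoder, observation)]
        from_input = matvec(self.w_in, features)
        from_state = matvec(self.w_rec, hidden)
        new_hidden = [math.tanh(a + b) for a, b in zip(from_input, from_state)]
        return matvec(self.w_out, new_hidden), new_hidden

    def act_on_history(self, observations):
        """Run over the whole observation history; return the final action."""
        if not observations:
            raise ValueError("need at least one observation")
        hidden = [0.0] * self.hidden_dim
        for obs in observations:
            action, hidden = self.step(obs, hidden)
        return action


policy = RecurrentPolicy(obs_dim=4, feat_dim=3, hidden_dim=5, action_dim=2)
current = [0.2, -0.1, 0.4, 0.0]  # stand-in for a flattened image

# Same current observation, different pasts.
action_a = policy.act_on_history([[1.0, 0.0, 0.0, 0.0], current])
action_b = policy.act_on_history([[0.0, 0.0, 0.0, 1.0], current])
print(action_a)
print(action_b)
```

In practice the linear encoder would be replaced by a convolutional network and the simple recurrence by something like an LSTM, with all weights fit by behavioral cloning on expert trajectories.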

uid: 202008311500 tags: #cs285

Date

February 22, 2023
