Worker Assignment Race Condition investigation

  • Investigate how many times we get that kind of error.
  • Investigate how often we use airtable.default_shard_id_is_used in prod.
  • Draft a message to Keunwoo about the race condition discussed in the thread: there’s a ping request before the application/create request, but for some reason the getApplicationLiveShardConfigAsync method has “get default shard” behavior built in, rather than failing or returning some other message to its callers.
    • At a minimum, it would help to clean up the comments and document why this behavior happens and what we can do to try to solve it.
      • The current comment is out of date: the method isn’t only called in the create action path, and it isn’t only called from the web server either. See the task above.
    • It feels like we’re the owners of this code, so we should probably do something to try to clean it up.
  • Move the investigation that I sent to Keunwoo into its own doc.
  • Note that we also probably want to clean up the NEW_APPLICATIONS_LIVE_SHARD_ID admin flag.
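
To make the race concrete for the message to Keunwoo, here is a minimal sketch of the two lookup behaviors. Everything in it is hypothetical and written for illustration only: the in-memory map, the placeholder shard IDs, and the helper names resolveWithFallback and resolveStrict are assumptions, not the real implementation of getApplicationLiveShardConfigAsync.

```typescript
// Hypothetical sketch. None of these names come from the real codebase.
type ShardConfig = { shardId: string; usedDefault: boolean };

// Assumed placeholder for whatever the default shard ID actually is.
const DEFAULT_SHARD_ID = "default-shard";

// Behavior as described in the notes: a missing assignment silently
// resolves to the default shard instead of failing.
function resolveWithFallback(
  assignments: Map<string, string>,
  applicationId: string
): ShardConfig {
  const shardId = assignments.get(applicationId);
  if (shardId === undefined) {
    return { shardId: DEFAULT_SHARD_ID, usedDefault: true };
  }
  return { shardId, usedDefault: false };
}

// One possible alternative: fail loudly so the caller can retry or
// report that the application has no shard assignment yet.
function resolveStrict(
  assignments: Map<string, string>,
  applicationId: string
): ShardConfig {
  const shardId = assignments.get(applicationId);
  if (shardId === undefined) {
    throw new Error(`No live shard assignment for ${applicationId}`);
  }
  return { shardId, usedDefault: false };
}

// The race: the ping is handled before application/create has written
// the assignment, so the ping sees the default shard.
const assignments = new Map<string, string>();
const seenByPing = resolveWithFallback(assignments, "appExample");
assignments.set("appExample", "shard-7"); // create lands afterwards
const seenAfterCreate = resolveWithFallback(assignments, "appExample");
console.log(seenByPing.shardId, seenAfterCreate.shardId);
```

With the fallback version, the ping and the later requests can disagree about which shard the application lives on. The strict version surfaces the missing assignment to the caller instead, which is roughly the alternative the draft message should raise.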

#update 10-03-22: I probably should’ve made this a doc and shared it with the broader group rather than just sharing it with Keunwoo. If I were doing this today, that’s what I would do.

Created from: Airtable diary: 04-15-22 202204151410


uid: 202204151800 tags: #airtable #update


Date
February 22, 2023