Worker Assignment Race Condition investigation
- Investigate how many times we get that kind of error.
- Investigate how often we use `airtable.default_shard_id_is_used` in prod.
- https://elk-applogs.shadowbox.cloud/goto/9b4121754f676d0eac248c283b69e34b
- Not just in the web server: also in `createRows`,
- Draft a message to Keunwoo about the race condition discussed in the thread: there’s a ping request before the application/create request, but for some reason the `getApplicationLiveShardConfigAsync` method has “default” shard behavior built in, rather than failing or returning some other message to its callers.
- It’ll be helpful at least to clean up the comments, and document why this behavior happens and what we can do to try to solve it.
- The current comment is out of date: it’s not just called in the create action path. Not just web server either. See the task above.
- It feels like we’re the owners of this code, so we should probably do something to try to clean it up.
- Move the investigation that I sent to Keunwoo into its own doc.
- Note that we also probably want to clean up the `NEW_APPLICATIONS_LIVE_SHARD_ID` admin flag.
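To make the race in the draft-message task concrete, here is a minimal TypeScript sketch of my reading of it: a lookup that arrives before the create request has written the shard config silently gets the default shard, whereas an explicit lookup would tell the caller the config is missing. Everything here is a hypothetical stand-in (the in-memory map, `DEFAULT_SHARD_ID`, `getShardConfigWithFallback`, `getShardConfigExplicit`, the app and shard ids). It is not the real `getApplicationLiveShardConfigAsync` code, just the shape of the two behaviors.

```typescript
// Hypothetical types and names, for illustration only.
type ShardConfig = { applicationId: string; shardId: string };

type ShardLookup =
  | { kind: "found"; config: ShardConfig }
  | { kind: "missing"; applicationId: string };

// Assumed placeholder value; the real default shard id is not in these notes.
const DEFAULT_SHARD_ID = "default-shard";

// In-memory stand-in for wherever live shard configs are actually stored.
const liveShardConfigs = new Map<string, ShardConfig>();

// Fallback style: a missing config silently becomes the default shard,
// so the caller cannot tell "not created yet" from "lives on the default shard".
function getShardConfigWithFallback(applicationId: string): ShardConfig {
  return (
    liveShardConfigs.get(applicationId) ?? {
      applicationId,
      shardId: DEFAULT_SHARD_ID,
    }
  );
}

// Explicit style: the caller is told the config is missing and decides what to do.
function getShardConfigExplicit(applicationId: string): ShardLookup {
  const config = liveShardConfigs.get(applicationId);
  return config === undefined
    ? { kind: "missing", applicationId }
    : { kind: "found", config };
}

// Simulated race: a lookup (e.g. from the ping) runs before the create request
// has written the config for the new application.
const beforeCreate = getShardConfigWithFallback("app123");
const explicitBeforeCreate = getShardConfigExplicit("app123");

// The create request then assigns the application to its real shard.
liveShardConfigs.set("app123", { applicationId: "app123", shardId: "shard-7" });
const afterCreate = getShardConfigWithFallback("app123");

console.log(beforeCreate.shardId, explicitBeforeCreate.kind, afterCreate.shardId);
```

The point for the message: with the fallback style, the early caller and the later caller can disagree about which shard the application is on, and neither sees an error. Returning a "missing" result (or failing) would at least make the race visible to callers.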
#update 10-03-22: I probably should’ve made this a doc and shared with the broader group rather than just sharing with Keunwoo. If I was doing this today, that’s what I probably would’ve done.
Created from: Airtable diary: 04-15-22 202204151410
uid: 202204151800 tags: #airtable #update