Main upgrade follow-ups
- when the client opens up a new application, it tries to fetch some scaffolding elements
- refresh homepage after bringing back the main primary (but still falling back to the live traffic replica)
- seeing some crashes on web, worker service, SRS, etc. while the primary is upgrading:
- dashboard: https://app.datadoghq.com/dashboard/gxd-6k3-j8f/service-crashes?from_ts=1675486244533&to_ts=1675495606808&live=false
- web server: async gap issue, same as hook://slack/?workspace=Airtable&channel=incident-409-crud-request-processing-anomaly
- Fix PR I can use: https://github.com/Hyperbase/hyperbase/pull/56759
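A minimal sketch of how this kind of async gap lets a rejection escape the request's error handling and crash the process. The handler names here are hypothetical stand-ins, not the actual Hyperbase web server code:

```typescript
// Stand-in for a query that fails while the primary is upgrading.
async function flakyDbCall(): Promise<void> {
  throw new Error("live traffic replica query failed");
}

async function handleRequest(): Promise<void> {
  try {
    // BUG: missing await. Rejections only propagate through await, so this
    // try/catch never observes the failure; it surfaces later as an
    // unhandledRejection, outside the request's error handling.
    flakyDbCall();
  } catch (err) {
    // Never reached for the rejection above.
    console.error("handled inside the request scope", err);
  }
}

process.on("unhandledRejection", (reason) => {
  // Without a handler like this (or an equivalent domain/zone), recent Node
  // versions terminate the process here, which matches the web crash pattern.
  console.error("rejection escaped the request error handling:", reason);
});

void handleRequest();
```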
- worker child
- Known change hooks error related to adding formula columns to a very large table (LMDB-related): https://opensearch-applogs.shadowbox.cloud/_dashboards/goto/e86a40e1cd0284eed6d95a1e244ea898?security_tenant=global
- more LMDB related errors: we need to make these resilient in general. https://opensearch-applogs.shadowbox.cloud/_dashboards/goto/92395e874dbb14b295b0ad2fb4dbc14c?security_tenant=global
- change hooks callsite that isn’t resilient, but is currently marked as such.
- https://github.com/Hyperbase/hyperbase/blob/782e331c15af3683877fbcca62ee58e9f1ff02e7/worker_service/internal/change_hook_requester.tsx#L38-L40
- We just need to make this resilient.
- It’s in a fire-and-forget call, so errors get handled by the parent’s error domain handler (worker origin crud requester), which doesn’t do anything for safe-to-keep-process-alive errors: https://github.com/Hyperbase/hyperbase/blob/d71c7f6f0f905ee79a1ac6fa34e0ecd6afe41e44/worker_service/internal/external_table_sync/sync_with_external_data/sync_with_external_data_noop_sync_helpers.tsx#L91
- Instead of by the error handler callback (see the error-routing sketch below): https://github.com/Hyperbase/hyperbase/blob/d71c7f6f0f905ee79a1ac6fa34e0ecd6afe41e44/worker_service/internal/worker_origin_crud_requester.tsx#L2048-L2065
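To illustrate the routing problem, here is a sketch with hypothetical names (`requestChangeHook`, `SafeToKeepProcessAliveError`, etc. are stand-ins, not the real identifiers): in the fire-and-forget shape every rejection lands in the parent's domain-style handler, while the resilient shape catches at the callsite and hands safe errors to the error callback that was meant to handle them.

```typescript
class SafeToKeepProcessAliveError extends Error {}

// Stand-in for the parent's error domain handler: it only knows how to log
// and crash, so "safe" errors that bubble up here still take the worker down.
function parentDomainErrorHandler(err: unknown): void {
  console.error("worker origin crud requester domain handler:", err);
  process.exit(1);
}

// Stand-in for the change hook request at the non-resilient callsite.
async function requestChangeHook(): Promise<void> {
  throw new SafeToKeepProcessAliveError("change hook endpoint timed out");
}

// Current shape: fire-and-forget, so the rejection ends up in the parent's
// domain handler rather than the error callback.
function fireAndForgetChangeHook(): void {
  requestChangeHook().catch(parentDomainErrorHandler);
}

// Proposed shape: catch at the callsite, route safe errors to the intended
// callback, and only escalate everything else.
function resilientChangeHook(onError: (err: Error) => void): void {
  requestChangeHook().catch((err) => {
    if (err instanceof SafeToKeepProcessAliveError) {
      onError(err); // handled; the worker process stays alive
      return;
    }
    parentDomainErrorHandler(err);
  });
}
```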
- a transaction was canceled and the connection was terminated; after this, a query ran against the terminated connection: https://opensearch-applogs.shadowbox.cloud/_dashboards/goto/5636bd1b07cacbdc87519ba1cb1c6e7b?security_tenant=global
- bug: there should be an await in this line: https://github.com/Hyperbase/hyperbase/blob/d71c7f6f0f905ee79a1ac6fa34e0ecd6afe41e44/worker_service/json_serializers/app_json_to_db_serializer.tsx#L5868
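A sketch of why the missing await matters, using a hypothetical serializer shape rather than the real app_json_to_db_serializer code: without the await, failures skip the caller's error handling, and later statements can run against a connection that has since been terminated.

```typescript
// Hypothetical stand-in for the real DB write.
async function writeShardRow(row: unknown): Promise<void> {
  console.log("writing", row);
}

async function serializeAppJsonToDb(rows: unknown[]): Promise<void> {
  for (const row of rows) {
    // BUG (current shape): the returned promise is dropped, so a failed write
    // never reaches the caller's error handling, and whatever runs after
    // serialization proceeds before the write has actually landed.
    // writeShardRow(row);

    // FIX: await so errors propagate and write ordering is preserved.
    await writeShardRow(row);
  }
}
```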
- app json to db serializer shard assignments query needs to be made resilient (see the retry sketch below): https://opensearch-applogs.shadowbox.cloud/_dashboards/goto/2d50728bf4044b09ad8ff1fadfc5459f?security_tenant=global
- don’t understand where this thread pool call is coming from, but I know this improvement needs to be made regardless.
- my intuition is that this is because we are updating the sync status in a fire and forget.
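One way to make the shard assignments query resilient is a retry-with-backoff wrapper around the callsite. This is a generic sketch with assumed names and policy, not an existing Hyperbase helper:

```typescript
// Retry a transient-failure-prone query with exponential backoff.
async function withRetries<T>(
  runQuery: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 100,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await runQuery();
    } catch (err) {
      lastError = err;
      if (attempt === maxAttempts) {
        break;
      }
      // Back off before retrying, doubling the delay each attempt.
      await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
    }
  }
  throw lastError;
}

// Usage sketch (queryShardAssignments is hypothetical):
// const assignments = await withRetries(() => queryShardAssignments(applicationId));
```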
- updateUserContentState: write ENOBUFS.
- Connection lost
- Why is there a connection spike?
- So many slow live traffic replica queries that the ProxySQL multiplexer is forced to grab new connections (because existing connections are held up by the slow queries)
Created from: Journal entry: 02-06-23 202302060801
uid: 202302061338 tags: #inbox