Airtable formulas interview

  1. Have to define what formulas look like
    • How are formulas (specifically operators) defined
      • How are we delineating column names from other types of objects in the formula
      • ([A] + [B])
      • =SUM(A, B)
      • Whether we will allow both kinds of operators (prefix vs infix)
        • Design decision: allow both kinds of operators
    • What kinds of inputs/outputs can formulas have
      • Do we want to enforce types (raise an error if you’re applying an operation on forbidden types, or fail silently)
  2. How to actually evaluate formulas
    • Storing some internal mapping from operator names in Airtable to the programmatic operator being applied
      • SUM -> operator.add
    • =SUM([A], 2)
      • Parsing step:
        1. Extract all function compositions, with their functions and corresponding inputs, from the formula string
        from operator import add, mul
        from typing import Callable, List, NamedTuple, Union

        Literal = Union[int, float]

        class Function(NamedTuple):
            operator: Callable                        # e.g. operator.add for SUM
            inputs: List[Union["Function", Literal]]

        def evaluate(node):
            if not isinstance(node, Function):        # literals evaluate to themselves
                return node
            return node.operator(*[evaluate(i) for i in node.inputs])

        # =SUM(MULTIPLY(3, 3), 2) parses to:
        Function(add, [Function(mul, [3, 3]), 2])     # evaluate(...) returns 11
      • How will we extract useful information from this formula string
        • Extracting the function name
        • Detecting whether we’re referencing an existing column, or just using a literal
          • Substituting the value from the existing column before proceeding with the rest of the computation
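
A hypothetical parser sketch (assumed code, not from the interview notes: OPERATORS, TOKEN, parse, and parse_expr are illustrative names). It reuses the Function/evaluate definitions above and substitutes column references from the current row as it parses:

    import re
    from operator import add, mul

    OPERATORS = {"SUM": add, "MULTIPLY": mul}        # assumed name -> callable mapping
    TOKEN = re.compile(r"\[[^\]]+\]|[A-Za-z_]+|\d+(?:\.\d+)?|[(),]")

    def parse(formula, row):
        # row maps column names to their current values, e.g. {"A": 5}
        tokens = TOKEN.findall(formula.lstrip("="))
        node, _ = parse_expr(tokens, row)
        return node

    def parse_expr(tokens, row):
        tok, rest = tokens[0], tokens[1:]
        if tok.startswith("["):                      # column reference: substitute its value
            return row[tok[1:-1]], rest
        if tok in OPERATORS:                         # function call: NAME ( arg, arg, ... )
            rest = rest[1:]                          # drop the opening "("
            inputs = []
            while rest[0] != ")":
                arg, rest = parse_expr(rest, row)
                inputs.append(arg)
                if rest[0] == ",":
                    rest = rest[1:]
            return Function(OPERATORS[tok], inputs), rest[1:]
        return float(tok), rest                      # numeric literal

    # evaluate(parse("=SUM([A], 2)", {"A": 5})) -> 7.0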

Formula for Column A depends on Column B, and A also references C

Circular dependency - Column A depends on Column B, and the formula for Column B depends on Column A

Formula A: Sum(1, 2) + Sum(B, C)

which normalizes to the pure prefix form Sum(Sum(1, 2), Sum(B, C))

  • At some point in the evaluation (maybe at the innermost layer), we need to substitute in values from other columns
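
One way to handle both the ordering and the circular-dependency check (a sketch under the assumption that we precompute, per formula column, the set of columns its formula references; evaluation_order is an illustrative name):

    def evaluation_order(deps):
        # deps maps each column to the columns its formula references;
        # DFS with a "visiting" set detects cycles and yields a safe order
        # (every column comes after the columns it depends on)
        order, done, visiting = [], set(), set()

        def visit(col):
            if col in done:
                return
            if col in visiting:
                raise ValueError(f"circular dependency involving column {col}")
            visiting.add(col)
            for dep in deps.get(col, []):
                visit(dep)
            visiting.remove(col)
            done.add(col)
            order.append(col)

        for col in deps:
            visit(col)
        return order

    # A depends on B and C; B depends on C
    evaluation_order({"A": ["B", "C"], "B": ["C"]})   # -> ['C', 'B', 'A']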

  • uid: 202011181541 tags: #airtable #interviews


Print Out First Section

Asymptotics

  • O(1) -> O(log N) -> O(N) -> O(N log N) -> O(N^2) -> O(2^N) -> O(3^N) -> O(N^N)
  • all logarithms are Θ(log n) by the change-of-base rule (they differ only by a constant factor)
  • Asymptotic notation only cares about the highest-growing terms. For example, n^2 + n = Θ(n^2)
  • Asymptotic notation does not care about leading constants.
  • Any exponential with base >1 grows faster than any polynomial
  • The base of the exponential matters. For example, 3^n = O(4^n), but 3^n != Ω(4^n).


Divide and Conquer

These problems depend on breaking up a bigger problem into smaller problems that are completely independent of each other. Types of divide and conquer: “top down”, usually recursive; example: binary search.

def bin_search(arr, x):
    if len(arr) == 0:
        return False
    mid = len(arr) // 2                      # index of the middle element
    if arr[mid] == x:
        return True
    elif arr[mid] < x:                       # x can only be in the right half
        return bin_search(arr[mid + 1:], x)
    else:                                    # x can only be in the left half
        return bin_search(arr[:mid], x)

Do work that influences your split: break up the search space (binary search). Work is done as you go down the tree.

General structure of a recursive algorithm:

  • Split the problem up.
  • Make a recursive calls to subproblems of size n/b.
  • Combine the results (in time O(n^d)).
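
For reference, a, b, and d here are the master theorem parameters (the theorem itself isn't stated in these notes): if T(n) = a·T(n/b) + O(n^d), then T(n) = O(n^d) when d > log_b(a), T(n) = O(n^d · log n) when d = log_b(a), and T(n) = O(n^(log_b a)) when d < log_b(a).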

“Bottom up” example: merge sort (iterative). You need to do something with all the subproblems; however, there are many more possibilities when the problem is bigger. So, to alleviate work, break into small problems and combine the small problems in an intelligent fashion, usually using some property that you have created via your work (in merge sort, this invariant is that each subarray is sorted).

In this type, work is done as you come up the tree (you merge the sorted subproblems together).
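
A minimal bottom-up sketch in Python (assumed code, not from the notes; merge and merge_sort are my names): all of the real work happens in merge, on the way back up.

    def merge(left, right):
        # merge two sorted lists into one sorted list
        out, i, j = [], 0, 0
        while i < len(left) and j < len(right):
            if left[i] <= right[j]:
                out.append(left[i]); i += 1
            else:
                out.append(right[j]); j += 1
        return out + left[i:] + right[j:]

    def merge_sort(arr):
        # bottom up: start from sorted runs of size 1, repeatedly merge pairs of runs
        runs = [[x] for x in arr]
        while len(runs) > 1:
            runs = [merge(runs[k], runs[k + 1]) if k + 1 < len(runs) else runs[k]
                    for k in range(0, len(runs), 2)]
        return runs[0] if runs else []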

Proof techniques

  • Invariants and induction are pretty similar.

Loop invariant proofs. One possible scheme: prove an invariant is true for all iterations:

  1. Initialization: the invariant(s) is true prior to the first iteration of the loop.
  2. Maintenance: if the invariant is true for iteration n, it is true for iteration n+1.
  3. Termination: when the loop terminates, the invariant is true on the entire input.

For bubblesort, the invariant is: “At iteration i, the sub-array A[1..i] is sorted, and any element in A[i+1..A.size] is greater than or equal to any element in A[1..i].”
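
A Python sketch of the variant this invariant describes (each pass bubbles the minimum of the unsorted suffix to the front; the code itself is assumed, since the notes give only the invariant):

    def bubble_sort(A):
        n = len(A)
        for i in range(n):
            # invariant: A[:i] is sorted, and every element of A[i:]
            # is >= every element of A[:i]
            for j in range(n - 1, i, -1):    # bubble the minimum of A[i:] down to i
                if A[j] < A[j - 1]:
                    A[j], A[j - 1] = A[j - 1], A[j]
        return A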


uid: 202009022351 tags: #teaching


First Section Script

Hi everyone! It’s 2:10 PST so I think we’ll get started. Hope all of you are doing well, staying safe and sane. Welcome to CS 170 discussion! My name is Manan Khattar, and for a little bit about me,

Classes for asymptotic notation

1 < log2(n) < n < n log2(n) < n^2 < n^3 < … < 2^n < n! < n^n

Also have asymptotic tricks. In general, we can observe the following properties of O/Θ/Ω from this:

  • If d > c, then n^c = O(n^d), but n^c != Ω(n^d) (this is sort of saying that n^d grows strictly more than n^c).
  • Asymptotic notation only cares about the highest-growing terms. For example, n^2 + n = Θ(n^2).
  • Asymptotic notation does not care about leading constants. For example, 50n = Θ(n).
  • Any exponential with base > 1 grows more than any polynomial. The base of the exponential matters: for example, 3^n = O(4^n), but 3^n != Ω(4^n).
  • If d > c, then n^c log n = O(n^d).

Divide and Conquer problems:

Advice for these problems: figure out what the problem is asking, as that is usually what your function returns. Think about what information is really necessary in order to compute what your function returns.

Think about how to break up a bigger problem: either split so you can ignore some portion of the original search space, or do work so that useful invariants are introduced in the smaller problems and can be combined on the way up.

Proof techniques

  • Invariants and induction are pretty similar.

Loop invariant proofs. One possible scheme: prove an invariant is true for all iterations:

  1. Initialization: the invariant(s) is true prior to the first iteration of the loop.
  2. Maintenance: if the invariant is true for iteration n, it is true for iteration n+1.
  3. Termination: when the loop terminates, the invariant is true on the entire input.

For bubblesort, the invariant is: “At iteration i, the sub-array A[1..i] is sorted, and any element in A[i+1..A.size] is greater than or equal to any element in A[1..i].”

Created from: First Section Preparation 202008301313

  • Have a Zoom poll at the beginning and end (the beginning one about which topics people want me to emphasize, or “Manan, take the wheel”; the end one about my pace and how fast I was going through things)

3. Find the Valley

We’re using a top-down approach
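
The notes don't restate the problem, so assuming the standard version (the array strictly decreases to a single valley, then strictly increases; return the valley's index), a top-down binary-search sketch:

    def find_valley(arr):
        # assumes arr strictly decreases to a single valley, then strictly increases
        lo, hi = 0, len(arr) - 1
        while lo < hi:
            mid = (lo + hi) // 2
            if arr[mid] > arr[mid + 1]:      # still descending: valley is to the right
                lo = mid + 1
            else:                            # ascending from here: valley is at mid or left
                hi = mid
        return lo                            # index of the valley element

    # find_valley([5, 3, 1, 2, 4]) -> 2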

Things people could be confused about

  1. Asymptotic notation

  2. Recurrence relations

  3. Divide and conquer problems

  4. Algorithm design/proof techniques

    • For this one, use the merge sort pseudocode example

      uid: 202008311933 tags: #teaching


Establishing Vs. Executing

I’ve done more than enough of establishing routines and new techniques for myself this quarantine—I’ve had vast amounts of self-growth, and I now know so much more and process information totally differently to how I did before. At the same time, there comes a moment in life when you feel like you’re relying on these systems as a crutch, giving yourself small wins in the name of getting better at the system rather than confronting your shortcomings and facing up to the fact that you haven’t been executing at the level that you should be.

I think I had such a moment today (yesterday). I was feeling uninspired and unable to control my baser instincts (I ended up playing Hearthstone for something like 3 hours of my workday). I had put my phone in the cupboard for the whole day, so I told myself that was enough and that it would be fine to skimp on other things, like letting myself play Hearthstone (those other things were significantly more important than the physical act of putting my phone in the cupboard - the only reason the physical act is even relevant is that it signifies putting your distractions away for that time period).

I’m done giving excuses for myself. I’ve had 4 months to establish a system for myself - I’m very, very used to it now. If I’m not delivering on the things that I should be, it’s nobody’s fault but my own - system or not, sometimes you just need to do things, and I haven’t been doing the best job of that recently.

In summary, I’m not going to blame any kind of system when I shirk my commitments or when my productivity otherwise takes a dip - I just have to live within the framework I’ve created for myself and promise myself that I’m doing the absolute best I can. The rest isn’t worth considering.


uid: 202007221725 tags: #insights #living-well


Daily review 07-18-20

How did your goals turn out today?

Be organized.

Hmm. Wasn’t too organized specifically.

Do my best on all my commitments.

Be practically independent.

Turned down an opportunity to make bhindi, so I didn’t do everything I could have in terms of this. However, I really wanted to work at that moment, and wasn’t feeling too talkative.

Grow grow grow.

Actually built out a pretty good Mac app! Figured out Rumps, and also solidified my Python virtualenv setup (it’s simple, it’s crazy). I really really like it.

Develop my social network.

Be a good son/brother/boyfriend/friend

Played Decrypto with fam. Things got kind of intense, I think Trisha was annoyed at us being competitive? Or trying to show that we deserved to win? Don’t know why she was getting annoyed. Maybe she wasn’t in the best mood. I’m not sure.

Contribute to the world

I’m going to put the app out once it works properly (with the sync and stuff). I’m really excited for it.

Be more forward-thinking

I played around with the Things.sh script! I realized that I don’t really need it, and it was kind of weird with how it worked anyway. Also decided I’m not going to go with compline.

Be more disciplined

I woke up early! And got a good amount of work done early in the morning. Also took a really important note about not fucking around after meetings 202007172359 that I think is going to go a long way in increasing my productivity. Oh, also the app I made! That’s definitely going to increase my productivity as well.

Be physically healthy.

Didn’t do too much exercise today, but I’m going to play cricket tomorrow and should get a good workout in because of that.

Any other thoughts?


uid: 202007182338 tags: #journal


Database Report Check-In

At a high level, the underlying database behind the data platform should fulfill the following criteria:

  1. High performance - we want to have the ability to query multiple vendor codes (or multiple attributes in general) and get a timely response. Currently, in Tableau, such a query results in tens of seconds of delays, and often crashes the Tableau dashboard.
  2. Effective sorting and flexibility - the data needs to be sortable efficiently, whether that is through a user-provided sorting pattern or a predefined default one. The user needs to be able to play with the data and ultimately be presented with only relevant information.
  3. Data needs to be downloadable as an Excel sheet/CSV for further analysis.
  4. It should be as convenient as possible to migrate and upload the data from S3 and Redshift.
  5. The provided solution should be as cost-effective as possible, while not sacrificing performance quality.

At the most basic level, we want to be able to pass in a vendor code and time range, and show the relevant ASIN-level data, as can be seen in this Tableau dashboard: Signature ASIN Deep Dive

Based on these five main criteria, we present a series of analyses for each potential database tool, as well as the predicted database schema for each.

AWS DynamoDB

DynamoDB is a non-relational, low-latency, high-scale database. Data in DynamoDB is structured in the form of key-value pairs, which leads to massive throughput and scalability. However, special care must be taken to efficiently structure the data, because outside of partition/sort key queries, queries in DynamoDB can be expensive and slow. Therefore, if we pursue this option, we must design our schema specifically to make the most common and important queries as fast and as inexpensive as possible.
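
As a concrete illustration only (hypothetical table and attribute names, not a committed design), partitioning on vendor code with the report date as the sort key would make the core query of this platform, a vendor code plus a time range, a single-partition read:

    import boto3

    dynamodb = boto3.client("dynamodb")

    # hypothetical schema: partition key = vendor code, sort key = report date
    dynamodb.create_table(
        TableName="asin_metrics",
        KeySchema=[
            {"AttributeName": "vendor_code", "KeyType": "HASH"},
            {"AttributeName": "report_date", "KeyType": "RANGE"},
        ],
        AttributeDefinitions=[
            {"AttributeName": "vendor_code", "AttributeType": "S"},
            {"AttributeName": "report_date", "AttributeType": "S"},
        ],
        BillingMode="PAY_PER_REQUEST",
    )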

Because of the lack of advanced querying functionality in DynamoDB, there are some apparent concerns with using it for the data platform. Some of these include:

  1. Lack of flexibility. In DynamoDB, results can only be sorted by the range keys in indexes. That works until we decide later that we want to sort by another field, which would require adding a new index. The problem is that local indexes can only be added at table creation time! Global indexes can be added after the fact, but they are not free: we would have to pay the same cost as for another table. Therefore, DynamoDB would not be an effective solution if we want to dynamically query the dashboard based on arbitrary input fields.
  2. Difficulty with date-range parsing. While DynamoDB would allow us to filter based on a provided input date range (see this article https://medium.com/cloud-native-the-gathering/querying-dynamodb-by-date-range-899b751a6ef2 for more), it requires scanning across the entire GSI (global secondary index), adding additional time complexity and creating more overhead (see the query sketch after this list).
  3. Higher cost. In order to effectively query across the host of attributes that the Tableau dashboard currently supports, we would have to create at least 5-10 GSIs per table, which would have to be concurrently updated every time data was added to DynamoDB. This would incur both time and financial cost.
  4. DynamoDB also struggles when the data size grows beyond 1 GB: because the data can no longer fit in one partition (or one page), reads and writes become very slow.
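
Reusing the boto3 client from the sketch above, a vendor-scoped date-range query stays on a single partition under that hypothetical schema (the GSI-scan concern in item 2 arises when filtering by date across all vendor codes):

    # hypothetical names; reuses the "asin_metrics" schema sketched earlier
    resp = dynamodb.query(
        TableName="asin_metrics",
        KeyConditionExpression="vendor_code = :v AND report_date BETWEEN :s AND :e",
        ExpressionAttributeValues={
            ":v": {"S": "VENDOR123"},
            ":s": {"S": "2020-06-01"},
            ":e": {"S": "2020-07-01"},
        },
    )
    items = resp["Items"]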

Summary

In general, it is not the best idea to use DynamoDB:

  1. When multi-item or cross table transactions are required
  2. When complex queries and joins are required
  3. When real-time analytics on historic data is required

For our use case, we require both 2 and 3; therefore, DynamoDB does not appear to be the most effective solution.

As an aside, there exists an additional tool, Rockset, which presents an analytics-centric SQL layer on top of the No-SQL DynamoDB layer, letting us run SQL analytics on the DynamoDB data without any kind of ETL. This might be worth looking more into.

Aurora

Amazon Aurora (Aurora) is a fully managed relational database engine that’s compatible with MySQL and PostgreSQL.

Aurora includes a high-performance storage subsystem. Its MySQL- and PostgreSQL-compatible database engines are customized to take advantage of that fast distributed storage. The underlying storage grows automatically as needed, up to 64 tebibytes (TiB). Aurora also automates and standardizes database clustering and replication, which are typically among the most challenging aspects of database configuration and administration.

With Aurora, the resulting code would be slightly less scalable and would require some additional setup. In an RDBMS, you design for flexibility without worrying about implementation details or performance. Query optimization generally doesn’t affect schema design, but normalization is important. However, the resulting code is likely to be more maintainable, and we would feel much more confident in our ability to handle new query requirements in the future.

AWS Athena

Athena is useful for presenting analytics on top of S3 data. Athena uses Presto with ANSI SQL support and works with a variety of standard data formats, including CSV, JSON, ORC, Avro, and Parquet. Athena is ideal for quick, ad-hoc querying but it can also handle complex analysis, including large joins, window functions, and arrays.

Athena charges for the amount of data scanned during query execution: $5 per terabyte of data scanned, with scanned data rounded to the nearest 10 MB. There is no charge for DDL statements, managing partitions, or failed queries.

The core difference between Athena and Aurora is that Athena is a cold database, while Aurora is warm (less latency). Given that this database layer acts as a sort of cache over the underlying Redshift table, the warm option might be preferable.


uid: 202007090046 tags: #amazon

February 22, 2023