# Why We Need More Boutique Search Engines

source: https://sariazout.mirror.xyz/7gSSTJ96SEyvXeljymglO3zN4H6DCgVnrNZq8_2NX1A tags: #literature #relational-thinking #favorites #inspiration #blog-ideas #startup-ideas uid: 202301292238

NOTE: This has been the most inspirational product-idea post I’ve read in a long time. If this company decided to pivot to something other than hiring, or someone else decided to work on this problem, I’d join them in an instant. Maybe that means this is something I should be working on instead. Maybe someday when I decide I want to make my own startup. #self-reflection Update 02-20-23: Maybe the reason this captured my imagination so much is that it reflects some of my values? 202301292249. Specifically, “Reduce inefficiencies” 202301292317 and “Measure, track, and pace yourself” 202301292318.

  • There’s an emergence of tools like Notion, Airtable, and Readwise where people are aggregating content and resources, reviving the curated web. But at the moment these are mostly solo affairs, hidden in private or semi-private corners of the Internet, fragmented, poorly indexed, and unavailable for public use. We haven’t figured out how to make them multiplayer. (View Highlight)

  • With the advent of Google AdWords, it became profitable to put out shitty content that passed as informative and filled Google’s search engine results. In 1998, the first iteration of Google indexed 25 million pages. Today, Google’s index includes more than 100 trillion pages. AdWords was the catalyst for the explosion of content garbage manufactured for SEO purposes we see today. It’s also the reason why hidden gems, the kind of UGC content we discover on Twitter all the time, are less likely to show up highly ranked in search results. What started as a well-intentioned way to organize the world’s information has turned into a business focusing most of its resources on monetizing clicks to support advertisers rather than focusing on the search experience for people. (View Highlight)

  • I believe the opportunity in search is not to attack Google head-on with a massive, one size fits all horizontal aggregator, but instead to build boutique search engines that index, curate, and organize things in new ways (View Highlight)

  • Google is a great example of how the internet enabled scale and speed: every page on the web returned in an instant. But increasingly, we’re seeing this scale is at odds with a fundamental human need: relevance. Someone who wants to find the best freelance designer, or the best sushi restaurant, or the best NFT to buy will not find the answer on Google. (View Highlight)

  • When you monetize via ads, curation takes a backseat to featuring advertisers - there is just less digital real estate available to curate your own recommendations - so these platforms end up making ethically dubious design choices that generate massive trust gaps. (View Highlight)

  • A daily email with the top five Alibaba products feels fun and gimmicky as a side project, but it doesn’t help when you’re trying to find the best crib for your baby. Inevitably, you’ll want a way to search through a curator’s archives. (View Highlight)

  • The opportunity is in moving curated content feeds away from their never-ending-now orientation and towards more goal-oriented interfaces. People should be able to find whatever content they want on their terms and not be beholden to when the curator decides to publish. (View Highlight)

    • Note: I have felt this when I tried to use TikTok to find information about clubs in SF (since Google and Reddit had obviously failed me), and the posts I found were more focused on capturing the viewer’s attention than on making a conscious effort to enumerate all the options and address them exhaustively.
  • Applying Ben Evans’ framework, it becomes clear that while the vertical search players have become too large and need curation, the curation feeds have become too long to browse and require search and structured data. The solution is better search and better curation, all wrapped in a better business model - a combination I call boutique search engines. (View Highlight)

    • Note: Is this what are.na is?
  • Unlike vertical search aggregators, boutique search engines feel less like yellow pages, and more like texting your friends to ask for a recommendation. They have constrained supply, which is the foundation for their biggest moat - trust. Importantly, boutique search engines introduce new business models that don’t rely on advertising. (View Highlight)

  • There are tens of thousands of people sharing insights on a long-tail of topics, but their content is buried in the deep corners of the interwebs, found only by chance, and consumed in fleeting social media feeds that strip context and discourage reflection. (View Highlight)

    • Note: This is the reason I’ve been using Pinboard so much! It’s basically a curated search engine for information on the web

February 22, 2023

Building my own metadata library

Inspired by some of the insights in “The Magic of Small Databases” 202301292232, “Build your metadata library” 202301291444, and “Why We Need More Boutique Search Engines” 202301292238, I’ve decided to build my own metadata library using Datasette.

Note: Why can’t I use Airtable for this?

Airtable would work great for my purposes! Except for one glaring issue: it has a complete and utter lack of search, and I really want to see what workflows I could unlock once I have the power of actually functional search at my fingertips. (See 202301292246)

#update 02-20-23: It’s funny that I claimed that I wouldn’t do this on Airtable because it has bad search, and ended up coming to Airtable for this anyway. (See https://airtable.com/appeI4Fto3INw0Je2/tbliavWk8DfgNiCoL/viwbDNQJS6hax069i?blocks=hide). Some of the reasons:

  1. Airtable’s search is good enough once you learn how to use linked records to connect things that you actually care about. This isn’t even that hard to backfill with scripts, if necessary.
  2. I’m pretty committed to Airtable as a product anyway, given that I work there, so I don’t feel too bad about having my data ensconced within Airtable’s system. Worst case, I’ll export my data to PostgreSQL or something using https://uibakery.io/move-airtable-to-postgresql
  3. Airtable is very easy to set up, has great automations and an API, and already has a nice user interface. It can even serve as the backend of a website if I really wanted it to: https://mzrn.sh/2022/04/29/using-airtable-as-a-jekyll-website-database/

Some of the reasons why I might not want to do this in the future:

  1. “What we look for in a resume”: having my own Datasette setup looks really good as a resume project. Airtable is slightly less impressive.

Things I want to use Datasette for:

  1. Things 3 tasks
  2. Books I’ve read
  3. Workouts I’ve done
  4. Notes I’ve written each day
  5. Spotify playlists
  6. NYT spelling bee
  7. Beeminder
  8. Pinboard
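
As a starting point, here is a minimal sketch of what one of these tables could look like, using only Python’s stdlib sqlite3 (the table and column names are my own invention, not a settled schema):

```python
import sqlite3

# ":memory:" keeps this sketch self-contained; point it at a file like
# "metadata.db" instead, and `datasette metadata.db` will serve it in a browser
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE books (
        id          INTEGER PRIMARY KEY,
        title       TEXT NOT NULL,
        finished_on TEXT,   -- ISO dates sort correctly even as text
        rating      INTEGER
    )
""")
db.execute(
    "INSERT INTO books (title, finished_on, rating) VALUES (?, ?, ?)",
    ("The Art of Doing Science and Engineering", "2023-01-15", 5),
)
db.commit()

# structured queries like this are the "actually functional search" payoff
rows = db.execute(
    "SELECT title FROM books WHERE rating >= 4 ORDER BY finished_on"
).fetchall()
```

Each of the sources above (Things 3, Beeminder, Pinboard, …) would just become another table in the same file.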

uid: 202301292209 tags: #starred #relational-thinking #projects

February 22, 2023

# Unbundling Tools for Thought

source: https://ift.tt/yNa5Zkm tags: #literature #zettelkasten #notetaking #tools-for-thought uid: 202212271346

In practice 95% of the use cases can be naturally unbundled into disjoint apps, and the lack of centralization and cross-app hyperlinking has no real negative effects.

Journalling: 86% of the nodes in my personal wiki are journal entries. Mostly there’s no reason for them to be there, they are rarely linked to by anything.

Todo Lists: I used to write todo lists in the daily entries in my personal wiki. But this is very spartan: what about recurring tasks, due dates, reminders, etc.?

Learning: if you’re studying something, you can keep your notes in a TfT. This is one of the biggest use cases. But the problem is never note-taking, but reviewing notes. Over the years, I’ve found that long-form lecture notes are all but useless, not just because you must remember to review them on a schedule, but because spaced repetition can subsume every single lecture note. It takes practice and discipline to write good spaced repetition flashcards, but once you do, the long-form prose notes are themselves redundant.

And you could argue that I could have stayed in my personal wiki by implementing support for transclusion (to assemble all the fragments into one view) and improved the version control UI. But this advice can be applied equally to every domain I attack with a personal TfT and for which it is lacking: just write a plugin to do X. The work becomes infinite, the gains are imaginary. You end up with this rickety structure of plugin upon plugin sitting on top of your TfT, and UX typically suffers the death by a thousand cuts.

Process Notes: e.g. “how do I do X in Docker”. I often have cause to write notes like this and can never quite think of where to put them. But this can’t be a genuine use case for a tool for thought because there’s very little need to create links between process notes. So this is just a matter of finding somewhere to put them in the filesystem or in a note-taking app.

And most of what I see is junk. It’s never the Zettelkasten of the next Vannevar Bush, it’s always a setup with tens of plugins, a daily note three pages long that is subdivided into fifty subpages recording all the inane minutiae of life. This is a recipe for burnout.

I agree with this

Every node in your knowledge graph is a debt. Every link doubly so. The more you have, the more in the red you are. Every node that has utility—an interesting excerpt from a book, a pithy quote, a poem, a fiction fragment, a few sentences that are the seed of a future essay, a list of links that are the launching-off point of a project—is drowned in an ocean of banality. Most of our thoughts appear and pass away instantly, for good reason.

The more crap you have, the harder it is to find the actually interesting stuff. 202212271453

But the main drawback is: you don’t need it. The idea of having this giant graph where all your data is hyperlinked is cute, but in practice, it’s completely unnecessary. Things live in separate apps just fine. How often, truly, do you find yourself wanting to link a task in your todo list app to a file in Dropbox? And if you do manage to build this vast web of links: how often is each link actually followed?

If I needed to do this, I could do it (in the worst case) with Hook. Is having ideas connected even necessary? Response to #zettelkasten cynicism:

  1. Just because links aren’t often followed doesn’t mean that adding links is wrong. Adding a backlink to something means that the next time that you try to look up the thing that you added the backlink to, you have much more context about how you’ve previously thought about that thing or how it could be used.

The rest are “incidental reference” links: I’m writing a journal entry saying I’m working on project X, so I add a link to project X, out of some vague feeling of duty to link things. And it’s pointless.

I don’t agree with this. Just because it’s difficult to link things to relevant atomic concepts doesn’t mean it’s not helpful. I have noted before that I’ve done this.

The natural conclusion of most tools for thought is a relational database with rich text as a possible column type. So that’s essentially what I built: an object-oriented graph database on top of SQLite.

This is basically capacities.io 202212271351, and what I thought about in 202207051412
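
That shape is easy to picture concretely. A minimal sketch in Python’s stdlib sqlite3 (the table and column names are my own invention, not the author’s actual schema): an objects table with a rich-text column, plus a links table for the graph edges.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE objects (
        id    INTEGER PRIMARY KEY,
        kind  TEXT NOT NULL,   -- 'note', 'task', 'book', ...
        title TEXT,
        body  TEXT             -- the rich-text column, e.g. stored as Markdown
    );
    CREATE TABLE links (
        src INTEGER REFERENCES objects(id),
        dst INTEGER REFERENCES objects(id),
        rel TEXT               -- optional edge label
    );
""")
db.execute("INSERT INTO objects VALUES (1, 'note', 'rsync', '...')")
db.execute("INSERT INTO objects VALUES (2, 'note', 'checksums', '...')")
db.execute("INSERT INTO links VALUES (1, 2, 'references')")

# backlinks fall out for free: follow the edges in reverse
backlinks = db.execute(
    "SELECT o.title FROM links l JOIN objects o ON o.id = l.src WHERE l.dst = 2"
).fetchall()
```

The "relational database with rich text" framing is just these two tables plus whatever typed columns each kind of object needs.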

The “one graph database” is an unproductive, monistic obsession.

February 22, 2023

# Be less scared of overconfidence

source: https://ift.tt/ybHC0x9 tags: #literature #insights #selfgrowth #benkuhn uid: 202212101111

Ben Kuhn!

In both of these situations, I had some mental model of what was going on (“this epidemic is growing exponentially,” “this startup seems good”) based on the particulars of the situation, but instead of using my internal model to make a prediction, I threw away all my knowledge of the particulars and instead used a simple, easy-to-apply heuristic (“experts are usually right,” “markets are efficient”).

I frequently see people leaning heavily on this type of low-information heuristic to make important decisions for themselves, or to smack down overconfident-sounding ideas from other people.

  • This startup is growing incredibly fast and the founders are some of the most effective people I’ve ever met, but at their current VC valuation, the total comp is lower than my Big Tech job so I can’t justify the move.

  • I think I could have a big impact as an academic researcher, but most grad students end up depressed and don’t land a tenure-track position, so it’s not worth trying.

  • You’re going to start a company? Are you aware that 90% of startups fail? What makes you think you and your ragtag band of weirdos are the chosen ones?

  • Who are you to be sounding the alarm about a pandemic when every past alarm has been false and all the reputable, top-tier experts say not to worry?

These all place way too much weight on the low-info heuristic.

What’s worse, these low-info heuristics almost always push in the direction of being less ambitious, because the low-info view of any ambitious project is that it will fail (most projects run behind schedule, most startups fail, most investors underperform the market, etc.).

Why do people find low-info heuristics so compelling? A few potential reasons:

  • Many (most?) attempts to reason via specific details are wrong. Most people who think “I’m going to beat the market” don’t; most people who think “I know better than all the experts” are less Balaji Srinivasan and more Time Cube guy.

  • The reasoning and evidence backing up low-info heuristics is (relatively) legible and easily verifiable. If I claim “90% of startups fail,” I can often cite a study for support. Whereas if I claim “the markets aren’t freaking out enough about COVID,” I’d need to make a much more complicated argument to explain my reasoning.

  • It’s relatively straightforward to reason with low-info heuristics even when you’re not an expert in the domain. For something like a forecasting challenge, where forecasters need to make predictions across a wide range of topics and can’t possibly be an expert in all of them, this is very important.

  • Because it’s much more objective, reasoning via low-info heuristics gives you many fewer opportunities to fall prey to biases like optimism bias, motivated reasoning, the planning fallacy, etc.

Sometimes I see people use the low-info heuristic as a “baseline” and then apply some sort of “fudge factor” for the illegible information that isn’t incorporated into the baseline—something like “the baseline probability of this startup succeeding is 10%, but the founders seem really determined so I’ll guesstimate that gives them a 50% higher probability of success.” In principle I could imagine this working reasonably well, but in practice most people who do this aren’t willing to apply as large of a fudge factor as appropriate. Strong evidence is common 202212092211
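
The gap between a fudge factor and a real update can be made concrete with a toy calculation (the numbers below are my own illustration, not Ben Kuhn’s): convert the base rate to odds, apply a likelihood ratio for the evidence, and convert back.

```python
def bayes_update(prior, likelihood_ratio):
    # convert probability to odds, apply the evidence, convert back
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)

base_rate = 0.10           # "the baseline probability of success is 10%"
fudged = base_rate * 1.5   # the timid fudge-factor adjustment: only 15%

# a genuinely strong piece of evidence, say 20:1 in favor (an assumed
# illustrative number), moves the estimate far further than the fudge factor
updated = bayes_update(base_rate, 20)  # ~0.69
```

If strong evidence really is common, the honest adjustment is often several times the base rate, not 1.5x.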

For example, the efficient market hypothesis (“asset prices incorporate all available information, so it’s hard to beat the market,” used in the above example to infer that “venture capitalists value companies correctly”) is justified by economic theory that relies on a few assumptions:

  • Low transaction costs: The cost of doing a trade in the market (in this case, an investment) must be near-zero so that people can use any mispricings to get rich.

  • Enough smart money: The well-informed and rational players in the market need to have enough capital to take advantage of any pricing inefficiencies that they notice.

  • No secrets: The “available information” must be available to enough of the smart money that it can be used to correct mispricings.

  • Ability to profit: There must be a way for a smart market participant to make money from a mispriced asset.

In the case of venture capital, many of these assumptions are super false. Fundraising takes a lot of time and money: transaction costs are high. Venture capitalists YOLO their valuations after a few meetings: they frequently miss important information. And it’s impossible to short-sell startups, so there’s no market mechanism to correct an overpriced company. You can see the outcome of this in the fact that there are venture capitalists that consistently beat “the market’s” returns.

Unfortunately, low-info heuristics tell you that outliers can’t exist. By definition, most members of any group are not outliers, so any generalized heuristic will predict that whatever you’re looking at isn’t an outlier either. If you index too heavily on what the average outcome is, you’re deliberately blinding yourself to the possibility of finding an outlier.

The problem is that the bad consequences of underconfidence and under-ambition are severe but subtle, whereas the bad consequences of overconfidence and wishful thinking are milder but more obvious. If you’re overconfident, you’ll try things that fail, and people will laugh at you. If you’re underconfident, you’ll avoid making risky bets, and miss out on the potential upside, but nobody will know for sure what you missed.

In fact, outperforming low-info heuristics isn’t just possible; it’s practically mandatory if you want to have an outsized impact on the world. That’s because leaning too heavily on low-info heuristics pushes people away from being ambitious or trying to search for outliers.

OK, so what should you do instead of relying on low-info heuristics? Here are my suggestions:

Build gears-level models of the decision you’re trying to make. If you’re deciding, e.g., where to work, try to understand what makes different jobs awesome or terrible for you.

Think really hard 202212101100 about the problem. Most inside views are wrong—to stand a fighting chance of beating the outside view, you’ll need to put a lot of effort in.

Don’t fool yourself with motivated reasoning. Stress-test your ideas; ask yourself what the best arguments against your inside view are and see if you can rebut them.

To the extent that you do use low-info heuristics, use them as a stress test rather than a default belief. “90% of startups fail” is useful to know as a warning to try to mitigate failure modes. It’s dangerous when you hear it and stop thinking there. Don’t be afraid to try ambitious things where the downside of failing is low, and the upside of succeeding is high!

February 22, 2023

Unix Command Line

Going through this guide: GitHub - jlevy/the-art-of-command-line: Master the command line, in one page and going to take notes on notable things as I go.

Tips:

  1. Use curl cheat.sh/<command> for a cheat sheet on whatever command you’re not sure about.
  2. awk 202211260214 is extremely useful for programmatically processing / applying some sort of function to a file line by line.
  3. less 202211260215 is a pager, which makes it so that you don’t have to load a file completely into memory to read it (which is what Vim does). Most Vim commands still apply, with ctrl-f to go one page forward and ctrl-b to go one page backward.
  4. Globbing is important. Reference link:
  5. Bash job management 202211260219
    • appending an & to a command will run it in the background while still allowing you to enter input into the console.
    • fg and bg will move the most recent process into the foreground or background respectively.
    • If you want to kill a job with a certain id (found in jobs, for example), then use kill %{job_id}.
    • For a quick summary of disk usage: du -hs *
  6. Less 202211260215 follow-mode, to see how a file grows:
    • less +F <file>
  7. Regular expressions 202211260228 (grep / egrep):
    • Flags to be aware of:
      • -i: ignore case
      • -v: Invert match. Select lines that don’t match the specified patterns
      • -o: prints only the matching part of the line
      • -A, -B, -C: print the given number of lines of context after, before, or both before and after each match, respectively.
  8. Awk 202211260214 can be used with sed for text manipulation.
  9. Last one for today! What do the columns in ls -l mean?
    • file mode, number of links, owner name, group name, number of bytes in the file, abbreviated month, day-of-month file was last modified, hour file last modified, minute file last modified, and the pathname.
      • The file mode printed under the -l option consists of the entry type and the permissions. The entry type character describes the type of file, as follows:
         -     Regular file.
         b     Block special file.
         c     Character special file.
         d     Directory.
         l     Symbolic link.
         p     FIFO.
         s     Socket.
         w     Whiteout.
      • [Note: the section above isn’t super helpful. I don’t see a situation where this will be very useful, so I’m not going to view it for now.]
  10. Advantages vs. disadvantages of soft links (symlinks)
    • Create a soft link: ln -s file1 link1
    • By contrast, there are some limitations to what hard links can do. For starters, they can only be created for regular files (not directories or special files). Also, a hard link cannot span multiple filesystems; they only work when the new hard link exists on the same filesystem as the original.
    • All of this sounds great, but there are some drawbacks to using a soft link. The biggest concern is data loss and data confusion. If the original file is deleted, the soft link is broken. This situation is referred to as a dangling soft link. If you were to create a new file with the same name as the original, your dangling soft link is no longer dangling at all. It points to the new file created, whether this was your intention or not, so be sure to keep this in mind.
  11. Things to look at in the future: chown, chmod
    • for filesystem management: df, mount, fdisk, mkfs, lsblk
  12. Keybindings in bash (and also zsh): These are all emacs keybindings!
    1. Ctrl-w to delete the last word
    2. Ctrl-u to delete content from cursor to start of the line. Ctrl-k to kill till end of line.
    3. Alt-b to move back one word, and alt-f to move forward.
    4. Ctrl-a to move cursor to beginning of line.
    5. Ctrl-e to move cursor to end of line.
    6. Ctrl-l to clear the screen.
  13. If you are halfway through typing a command but change your mind, hit alt-# to add a # at the beginning and enter it as a comment (or use ctrl-a, #, enter). You can then return to it later via command history.

uid: 202211260212 tags: #software-engineering

February 22, 2023

Rsync

rsync is a utility for efficiently transferring and synchronizing files between a computer and a storage drive and across networked computers by comparing the modification times and sizes of files.

The rsync utility uses an algorithm invented by Australian computer programmer Andrew Tridgell for efficiently transmitting a structure (such as a file) across a communications link when the receiving computer already has a similar, but not identical, version of the same structure.

The recipient splits its copy of the file into chunks and computes two checksums for each chunk: the MD5 hash, and a weaker but easier to compute ‘rolling checksum’. It sends these checksums to the sender.

The sender computes the checksum for each rolling section in its version of the file having the same size as the chunks used by the recipient. While the recipient calculates the checksum only for chunks starting at full multiples of the chunk size, the sender calculates the checksum for all sections starting at any address. If any such rolling checksum calculated by the sender matches a checksum calculated by the recipient, then this section is a candidate for not transmitting the content of the section, but only the location in the recipient’s file instead. In this case, the sender uses the more computationally expensive MD5 hash to verify that the sender’s section and recipient’s chunk are equal. Note that the section in the sender may not be at the same start address as the chunk at the recipient. This allows efficient transmission of files which differ by insertions and deletions. The sender then sends the recipient those parts of its file that did not match, along with information on where to merge existing blocks into the recipient’s version. This makes the copies identical.
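
The description above can be sketched in Python. This is a toy version: real rsync uses an Adler-32-style weak checksum, larger chunks, and network framing, all simplified away here.

```python
import hashlib

CHUNK = 4  # toy chunk size; real rsync uses much larger blocks

def weak(data):
    # toy stand-in for rsync's rolling checksum (really an Adler-32-style pair)
    return sum(data) & 0xFFFF

def signatures(old):
    # recipient side: checksums only for chunks at full multiples of CHUNK
    sigs = {}
    for i in range(0, len(old), CHUNK):
        chunk = old[i:i + CHUNK]
        sigs.setdefault(weak(chunk), []).append((i, hashlib.md5(chunk).hexdigest()))
    return sigs

def delta(new, sigs):
    # sender side: slide a window over *every* offset of its version
    ops, i, lit = [], 0, bytearray()
    while i < len(new):
        window = new[i:i + CHUNK]
        match = None
        if len(window) == CHUNK:
            for offset, digest in sigs.get(weak(window), []):
                # a weak-checksum hit is only a candidate; confirm with MD5
                if hashlib.md5(window).hexdigest() == digest:
                    match = offset
                    break
        if match is not None:
            if lit:
                ops.append(("lit", bytes(lit)))
                lit = bytearray()
            ops.append(("copy", match))  # send a location, not the bytes
            i += CHUNK
        else:
            lit.append(new[i])  # no match: this byte must travel literally
            i += 1
    if lit:
        ops.append(("lit", bytes(lit)))
    return ops

def apply_delta(old, ops):
    # recipient side: merge copied blocks from its old file with literal data
    out = bytearray()
    for kind, arg in ops:
        out += old[arg:arg + CHUNK] if kind == "copy" else arg
    return bytes(out)

old = b"hello world!"
new = b"hello brave world!"
ops = delta(new, signatures(old))
assert apply_delta(old, ops) == new  # only b"o brave wo" travels as literal bytes
```

Note how the matched chunk b"rld!" sits at a different offset in each file, which is exactly the insertions-and-deletions case the rolling checksum exists to handle.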

Created from: Algorithms to know before system design interviews 202211231052


uid: 202211231059 tags: #distributed-systems #software-engineering #algorithms

February 22, 2023