

2026-01-29

minor edits 2026-05-25

Hard numbers for the search pipeline

Disclaimer

I used to think search/discovery/recommendation algos only did embedding search. I have now realised there's an entire pipeline, and no step in it can be skipped.

  1. Start with the entire text internet
  2. Use a hard-coded initial list of people/blogs/URLs/keywords to filter
  3. Use embedding search to filter further
  4. Use inference with both smart prompts (like Paul Graham prompts or surprising-difference prompts) and the user's context to filter further
  5. Show the post to N other users, to filter further
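
The funnel above can be sketched as a chain of filter stages, each one shrinking the candidate set before the next (more expensive) stage runs. This is only a toy sketch: `Post`, `keep_if`, `top_k`, `run_pipeline`, the keyword list and both scoring functions are made-up stand-ins, not a real embedding model or inference call, and step 5 (showing the post to N other users) is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Post:
    url: str
    text: str


Stage = Callable[[List[Post]], List[Post]]


def keep_if(predicate: Callable[[Post], bool]) -> Stage:
    """Build a stage that keeps only posts matching a yes/no rule (step 2)."""
    return lambda posts: [p for p in posts if predicate(p)]


def top_k(score: Callable[[Post], float], k: int) -> Stage:
    """Build a stage that keeps the k highest-scoring posts (steps 3 and 4)."""
    return lambda posts: sorted(posts, key=score, reverse=True)[:k]


def run_pipeline(posts: List[Post], stages: List[Stage]) -> Tuple[List[Post], List[int]]:
    """Apply stages in order, recording how many posts survive each one."""
    counts = [len(posts)]
    for stage in stages:
        posts = stage(posts)
        counts.append(len(posts))
    return posts, counts


# Hypothetical stand-ins. A real pipeline would call an embedding model
# and an inference endpoint here instead of counting words.
KEYWORDS = ("search", "embedding")


def keyword_match(post: Post) -> bool:
    return any(k in post.text.lower() for k in KEYWORDS)


def fake_embedding_score(post: Post) -> float:
    return float(sum(post.text.lower().count(k) for k in KEYWORDS))


def fake_llm_score(post: Post) -> float:
    return float(len(set(post.text.lower().split())))


corpus = [
    Post("a.example/1", "Notes on embedding search and search ranking"),
    Post("b.example/2", "A recipe for bread"),
    Post("c.example/3", "Search pipelines need more than embedding lookups"),
    Post("d.example/4", "Embedding models explained"),
]

survivors, counts = run_pipeline(
    corpus,
    [keep_if(keyword_match), top_k(fake_embedding_score, 2), top_k(fake_llm_score, 1)],
)
print(counts, [p.url for p in survivors])
```

The `counts` list is the useful part: it shows how many candidates reach each stage, which is what drives the cost of that stage.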

Hard numbers for how much each of these steps costs

The text internet contains at least 500T tokens.

Depending on your budget in dollars, you can decide how aggressively to filter your dataset at each step.
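
The budget trade-off can be written as back-of-envelope arithmetic. The sketch below is an assumption-heavy illustration, not a measurement: only the 500T token figure comes from this post, while the function names, per-million-token prices and keep fractions are placeholders you would replace with your own numbers.

```python
from typing import List, Tuple

TOTAL_TOKENS = 500e12  # "at least 500T tokens", from the text above


def stage_costs(
    total_tokens: float,
    stages: List[Tuple[str, float, float]],
) -> List[Tuple[str, float, float]]:
    """Cost of each stage in a funnel.

    stages: (name, usd_per_million_tokens, keep_fraction) in pipeline order.
    Returns (name, tokens_processed, cost_usd) per stage.
    """
    remaining = total_tokens
    rows = []
    for name, usd_per_million, keep_fraction in stages:
        cost = remaining / 1e6 * usd_per_million
        rows.append((name, remaining, cost))
        remaining *= keep_fraction  # only this share reaches the next stage
    return rows


def required_keep_fraction(tokens_in: float, budget_usd: float, usd_per_million: float) -> float:
    """Largest share of tokens_in that a stage can process within budget_usd."""
    affordable_tokens = budget_usd / usd_per_million * 1e6
    return min(1.0, affordable_tokens / tokens_in)


# Placeholder prices and keep fractions, purely illustrative.
example_stages = [
    ("embedding search", 0.01, 0.001),
    ("inference", 1.00, 0.01),
]
for name, tokens, cost in stage_costs(TOTAL_TOKENS, example_stages):
    print(f"{name}: {tokens:.3g} tokens, ${cost:,.0f}")

# How hard must earlier steps filter so a stage fits in a $1,000 budget?
print(required_keep_fraction(TOTAL_TOKENS, budget_usd=1_000, usd_per_million=1.00))
```

Running the second function in reverse order through the pipeline (most expensive stage first) gives the filtering ratio each earlier step has to hit for a given budget.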

