2025-05-12
Samuel x Saksham AI timelines (discussion on 2025-05-09)
- top-level views
  - samuel top-level: 25% that AI!2030 >= ASI; >50% that ASI >> AI!2030 >> AI!2025; <25% that AI!2030 ~= AI!2025
  - saksham top-level: medium probability that AI!2030 >= ASI
  - samuel bullish on model scaling, more uncertain on RL scaling
  - saksham bullish on RL/inference scaling and on grokking
  - samuel: does being bullish on grokking imply being bullish on model scaling? saksham: unsure
- agreements
  - samuel and saksham agree: only 2024-2025 models (o1, o3, deepseek r1, deepseek r0) count as empirical data for extrapolating the RL/inference scaling trend. RLHF done on GPT3.5 is not a valid datapoint on this trend.
  - saksham and samuel agree: if a superhuman mathematician and physicist are built, high likelihood we get ASI (so robotics and other tasks also get solved); robotics progress is not a crux.
- crux: how good is scaling RL for LLMs?
  - saksham is confidently bullish on scaling RL for LLMs; samuel has wider uncertainty about it.
  - testable hypothesis: saksham claims GPT3 + lots of RL in 2025 ~= GPT4, and that a GPT2-size model trained in 2025 on high quality data + lots of RL ~= GPT3. samuel disagrees. top ML labs need to try this more.
  - testable hypothesis: saksham claims models such as qwen 2.5 coder are <50B params but better than GPT3 (175B) and almost as good as GPT4 (1.4T). samuel disagrees and claims they overfit to benchmarks. samuel needs to try <50B param models on tests not in benchmarks (see the sketch after this list).
  - testable hypothesis: samuel thinks a small model trained on a big model's outputs (distillation) overfits benchmarks. saksham unsure. samuel and saksham need to try such models on tests not in benchmarks.
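A minimal sketch of the "tests not in benchmarks" check shared by the hypotheses above: run a small open model on freshly written problems and grade the answers by hand. The Hugging Face transformers library, the Qwen/Qwen2.5-Coder-7B-Instruct checkpoint, and the example problems are all illustrative assumptions, not tooling anyone specified in the discussion.

# Held-out, non-benchmark eval sketch. Assumes: transformers + a GPU;
# the model ID and problems below are illustrative choices, not from the notes.
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "Qwen/Qwen2.5-Coder-7B-Instruct"  # any <50B open model works here

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")

# Write these yourself, on the day of the test, so they cannot already
# sit in any public benchmark or training set.
private_problems = [
    "Write a function that merges two sorted linked lists in O(1) extra space.",
    "A train leaves at 09:40 and arrives at 14:05. How long is the trip?",
]

for problem in private_problems:
    messages = [{"role": "user", "content": problem}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=512, do_sample=False)
    # Decode only the newly generated tokens, then grade by hand.
    print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))

Hand-grading a small private set avoids the contamination problem entirely, at the cost of sample size; it is meant to distinguish "good on benchmarks" from "good in general", not to produce a leaderboard number.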