I wish to give a journalist my scrapes of social graphs of US AI companies, obtained from Twitter and YouTube
Disclaimer
Quick Note
Main
I scraped the YouTube and Twitter of people working at US frontier AI companies, and used AI to extract a social graph: who knows who, who met who, when they met, at what event (podcast, private party, conference, public presentation, etc.), and so on.
I think at least some people on this graph are both low-profile enough to actually respond to your DMs, yet insider enough to have valuable insider knowledge.
I think I am currently not the ideal person to go cold DM all of them and ask for interviews, and build a network that compounds over time, and so on. I think these skills are learnable but I am not prioritising learning them.
If you are a journalist interested in working with me on this, please reach out.
I considered just posting the graph publicly but I decided not to. Reasons:
There is a small chance this could be used for doxxing random people, and while I am okay doxxing these people in private, I am less willing to post this info publicly.
There is a possibility some random troll will go and message all these people, thus causing all these people to close their DMs.
Main 2
As for how I obtained it, it was surprisingly simple:
Use IPRoyal residential proxies. Aggressively rotate proxies, don't reuse them. The main cost is IPRoyal's data cost.
For YouTube, use yt-dlp to search the YouTube index and download transcripts. Some yt-dlp settings matter a lot, like using a PO token provider, no retries, Chrome impersonation, curl_cffi, and so on. Don't just use default settings or ask codex to set default settings. Use rotating proxies in the --proxy field, of course.
For Twitter, use self-hosted nitter to search the Twitter index and download tweets. Use hero-SMS to purchase SIM cards and verify Twitter accounts on your desktop PC. Once you have the cookies, you can run the rest on self-hosted nitter in the cloud. All nitter traffic must go through IPRoyal proxies; ask codex to write a script for this.
To generate the social graph, just use gpt-5.4-mini with the correct prompts. Ask the model to tag any places, events, people, etc. that it notices. Normally I am not a fan of hard-coded tag names (why use them when you can use embedding search instead?), but in this use case they seem to perform okay. Don't use gpt-5.4-nano, it is not intelligent enough.
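As a rough illustration of the last step, here is a minimal sketch of turning tagged records into a graph. The record shape (`source`, `event`, `people`), the placeholder names, and the `build_graph` helper are all assumptions made up for this example, not the actual output format of my pipeline. It only shows the idea: people tagged in the same transcript or tweet get an edge, and each edge remembers which event and source produced it.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical tagger output: one record per transcript or tweet.
# Field names and values are placeholders, not a real schema.
records = [
    {"source": "podcast-ep-1", "event": "podcast",
     "people": ["Person A", "Person B"]},
    {"source": "conf-talk-7", "event": "conference",
     "people": ["Person B", "Person C", "Person A"]},
    {"source": "tweet-123", "event": "public presentation",
     "people": ["Person C"]},
]


def build_graph(records):
    """Map each unordered pair of co-tagged people to its supporting evidence."""
    edges = defaultdict(list)
    for rec in records:
        # Deduplicate and sort so (A, B) and (B, A) become the same key.
        people = sorted(set(rec["people"]))
        for a, b in combinations(people, 2):
            edges[(a, b)].append((rec["event"], rec["source"]))
    return dict(edges)


graph = build_graph(records)
for pair, evidence in sorted(graph.items()):
    print(pair, evidence)
```

Records that tag only one person produce no edge. Keeping the event and source on each edge makes it possible to check a claimed connection against the original transcript or tweet, which matters because the tags come from a model and can be wrong.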
I could publish a cleaned repo if someone needs it.
I am aware that people at these tech companies might read this post and make it harder to do this in future. I am okay with this.