After being prompted to do so by many people, I finally read Leopold Aschenbrenner’s Situational Awareness, his manifesto of essays about the emergence and consequences of superintelligence. Leopold was, until recently, a member of OpenAI’s safety team, so I feel obligated to give his perspective more weight than the multitude of others who have written on these topics before. In short, he believes that we will see AI agents that do research on themselves by ~2027 and that this will result in an explosion of superhuman intelligence across all fields over the following years. I strongly recommend that anyone in or adjacent to tech at least read the first essay.
To say that I have seen the writing on the wall sounds awfully definite, but I have been watching advancements in the field for a few years and intellectually was ready to accept these arguments. To see a researcher from a top lab write them up so eloquently is viscerally uncomfortable though. I feel like a conspiracy theorist, putting on my tinfoil hat and praying to mechanical gods that don’t exist yet—but will soon.
I am terrified of living in a world where human thought has been deprecated. I can already see ways in which LLMs are able to “think” in ways that we will never be able to, like storing the entire documentation of a previously-unknown software library in short-term memory and recalling from it perfectly. Leopold writes of models being “hobbled”, supremely competent in some ways yet entirely crippled in others. The models we interact with can’t yet work on long-horizon tasks or store things in long-term memory; currently, asking ChatGPT a question is like posing it to a person and asking them to say the first thing that comes to mind. The status quo is just that—these limits will be lifted.
Once there are agents that can interact with a computer just as I can, but can hold every machine-learning paper ever written in their silicon brains and reason over them, what will my purpose be? Leopold posits that such models would soon crack robotics, so the physical world is unlikely to be safe either.
I’ve previously said that my goal in life is to contribute to the frontier of human knowledge, but I didn’t feel it until the realization that it might not always be out there to reach for. It scares me to think that “human” might be a crucial modifier: the knowledge of the machines may quickly outstrip ours by orders of magnitude.
As someone still early in my career, much of what I do is just not that deep. I expected to have years to grow and learn but now it feels like I should be optimizing for the short-term—the long-term may not exist in the way that I desire.
things i hope to do this week
May the following serve as evidence that I actually wrote this post in week 4:
I will be going on a road trip with my family to Glacier National Park.
Active quests:
Touching grass
Finessing my way into the Berkeley AI Hackathon
That said, it is 91 days late now…
things i did last week
I keep thinking these are getting shorter but perhaps not. I’m definitely being choosier with the content that I include, but I think now that I have started putting down some roots, many of my experiences are richer. I am concerned that doing so many things steeped in context makes for a worse reading experience—if you have an opinion on this/find my blog increasingly boring please tell me!
On Sunday, I attended Playspace. I feel that I am playing poorly; I keep getting wrapped up in projects that are work-adjacent. Might try an artistic pursuit in future weeks. JP and J? came; afterwards JP asked if I know anywhere else where there are “chill tech people”. I don’t think so—it’s a very unique space. We had grand plans to modernize my LinkedIn profile picture afterwards, but Safeway claimed to have pomelos yet did not, so we left disappointed.
I have heard that the SF art scene is somewhat anemic, so when I learned that LJ’s sister is a professional choral singer with an upcoming concert, I jumped at the chance to attend. The myriad languages and styles of music made for a fascinating experience.
On Monday, I was reminded that Andrej Karpathy is a phenomenal teacher. Some of the buds and I made a pact to work on projects we had been putting off, and I have a long-held desire to train my own (L?)LM—which happens to be exactly the topic of his Zero to Hero course! (Hopefully I am starting from somewhere beyond zero at this point, but I suspect that I am not quite at hero.) I am writing my own auto-differentiation engine; Andrej’s video was a great stimulus to get started. I often forget how easy it is to implement theory that I am highly familiar with.
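For the curious, the core of such an engine is surprisingly small. Here is a minimal sketch of scalar reverse-mode autodiff in the spirit of Karpathy’s micrograd—the class and names below are my own illustration, not the actual code from my project or his course:

```python
class Value:
    """A scalar that records the operations producing it, so gradients
    can be propagated backwards through the computation graph."""

    def __init__(self, data, _parents=()):
        self.data = data
        self.grad = 0.0
        self._parents = _parents
        self._backward = lambda: None  # set by the op that created this node

    def __add__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data + other.data, (self, other))

        def _backward():
            # d(a+b)/da = d(a+b)/db = 1, so the upstream gradient passes through
            self.grad += out.grad
            other.grad += out.grad

        out._backward = _backward
        return out

    def __mul__(self, other):
        other = other if isinstance(other, Value) else Value(other)
        out = Value(self.data * other.data, (self, other))

        def _backward():
            # product rule: d(a*b)/da = b, d(a*b)/db = a
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad

        out._backward = _backward
        return out

    def backward(self):
        # Topologically sort the graph so every node's gradient is fully
        # accumulated before it propagates to its parents.
        topo, visited = [], set()

        def build(v):
            if v not in visited:
                visited.add(v)
                for p in v._parents:
                    build(p)
                topo.append(v)

        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()


# z = x*y + x, so dz/dx = y + 1 and dz/dy = x
x, y = Value(3.0), Value(4.0)
z = x * y + x
z.backward()
print(x.grad, y.grad)  # 5.0 3.0
```

Extending this to the operations an actual (L)LM needs—powers, exponentials, tensors instead of scalars—is mostly more of the same pattern: compute the forward value, stash a closure that applies the local derivative.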
I experienced the magic of SF when I got lunch with MS, an entrepreneur I had met at a previous event. It seems that he was scouting me as a potential machine learning engineer for his latest venture but I told him that I wasn’t particularly interested in the role and yapped about AIMO. I figure that elsewhere the conversation would have ended there but when I mentioned some schemes that I was reluctant to spend money on the compute for, he offered to give me a thousand dollars, no strings attached. Will have to remember to pay the many favors I have received forward.
Later in the evening, I spontaneously received a call from CF asking if I would be willing to come into the Martian office—tomorrow!—to do an on-site for the week. BK sent me a technical spec and it seemed borderline infeasible: the goal was to extract dense representations from the weights of LLMs that reflected their capabilities in some way. I was bouncing off the walls with a spurt of adrenaline, somewhere between being asked to interview with a super cool company days after chatting about the prospect and the task seeming incredibly difficult. Obviously, it was a good time to go to the gym, where I ran into BK, who told me not to worry about it too much—it seemed that we were on the same page about the sort of techniques that could be attempted.
To provide a little bit of context about Martian, they are a startup aiming to commercialize superior understanding of LLMs and their capabilities. Their flagship product is the Martian Router, a drop-in replacement for the OpenAI API. The router automatically sends the user’s prompt to the LLM that will handle it most effectively, with the option to include additional considerations like price and speed. Mechanistic interpretability is the field of research aiming to understand how and why LLMs make decisions by examining their internal structures. In my opinion, it is one of the most exciting avenues that we have for ensuring that models are behaving in the way that we expect and I believe that better understanding will result in better control.
Two of the buds, AU and KR, are throwing together a Y Combinator application at the last second. Lots of excitement around that; I am living vicariously through AU, who is the more technical founder. Wishing them the best of luck!
On Tuesday, I had my first day in the Martian office! I won’t bore you with the details of what I did there, but it was phenomenal: the team seems super sharp, and I forgot how much of a phase-change going into an office triggers. I learned that my happiness costs 27.6 USD/day, the price to rent an A100 (fancy GPU) from Paperspace.1 Doordashing lunch also feels opulent, though I don’t know what else would make sense at startup-scale.
Continuing the process of incrementally doxxing myself: my commute is wonderful. I catch one bus right outside LBB and it takes me to right outside the office—which is itself minutes from a Fitness SF location.
On Wednesday, I spent 13 hours in the office. I wrote “my blog post is going to be so boring this week”. I feel like a 0.2x MLE.
On Thursday and Friday, the grind continued. I think I have a decent grasp of LLM theory, but I have a lot to iterate on in practice. Several Martians2 offered to find time to privately answer any questions I may not want to ask in front of others. I have no idea how to make the most of such an offer; the most scandalous thing I asked about was the names of the people across from me.
On Friday, the last day of my on-site, I got lunch with YU, the technical co-CEO. (Martian-glazing warning.) When I heard of Martian, I was excited about the tech. Being there for the week, I got excited about the team. Talking to YU, I got excited about the company’s mission. They seem to be uniquely positioned to benefit directly from improvements in mechanistic interpretability research, whereas for other labs interpretability work often becomes an after-thought, especially for those with frontier models.
I also got rejected from the Berkeley AI Hackathon. The following content is griping: While it certainly wouldn’t be my first time getting rejected from a hackathon, I suspect that this was due to system errors. After getting my application in by the priority deadline, I received an email along the lines of “make sure to get your application in”, so I checked and it was marked as submitted. Then, when I got rejected, I looked at my application and saw that on the single admission question (“why do you want to come to this hackathon?”), it seemed to have saved my initial half-baked response, which I later fleshed out. I keep all of my hackathon application answers so that I can recycle them and am relatively confident that this was not my mistake.
On Saturday, I started a sleep competition with a friend. Anyone with a Fitbit or other way of getting a 0-100 sleep score is free to join in—possibly subject to some affine transformation.
I went on the Board Walk, as I have been doing weekly. I happened to run into someone dating a Martian employee who guessed that my on-site was there; he said that I “gave off Martian vibes”. Probably another sign that I should seriously consider working there.
GB had invited me to grab lunch with ET and some of his friends in Cupertino. I wanted to take the Caltrain but I was looking at a three-hour commute and figured that two hours would be beyond fashionably late. I think literally everyone in the group was in big tech, it was weird (but probably good) to leave the SF startup bubble. Thanks ID for the ride back to SF.
Afterwards, AU and I joined PC, one of the rarely-spotted buds, at his coworking space of choice. On the way there, I witnessed my first SF-jank: we walked through a sidewalk market of likely-bipped goods. I didn’t feel unsafe at any point, and it was fascinating to see the seedier side of SF. Anyway, PC was the only resident of the space grinding on the weekend, so I got to co-opt someone’s ultrawide monitor—need to acquire one for the office.
That was my week, shoutout to AH, RM, NB, AV, SC and IW for surviving weekly calls. Still looking to schedule calls with RP and KA. If I end up talking to any of you this week, superintelligence will almost certainly be a major talking point.
This is the rate for a 3-year commitment whereas we were using it on-demand, so it might be ~3-5x higher. I did see that the on-demand pricing on Vast is comparable or even cheaper, though. I really need to remember that I can—and should—spend money on compute.
The company is named for the Martians of Budapest, a group of incredibly influential thinkers, including John von Neumann, who all happened to grow up in the same neighborhood in Hungary. Hypothetically, we’re aiming to interface with Martians rather than be them ourselves, but everyone on the team is cracked, so this seemed appropriate beyond being a cute moniker.