This paragraph was going to be at the end, but Iʼm putting it at the beginning for people who donʼt want to read the whole thing, and people who wonʼt read the whole thing and will just ask Gemini for a summary:

LLMs Wonʼt Cure Cancer

This article was originally going to focus on my experience using Letta. I realized while planning the article outline that I had a lot more to say about the culture of LLM usage, especially the hypocrisy coming from those who trust LLMs as full-blown reasoning engines. It should be noted that I am very critical of LLMs: I do not use them outside of experimentation or demonstration, and I strongly believe that their proliferation and usage, especially in the context of academic reasoning1,2 and interpersonal relationships3,4, have had a disastrous effect on peopleʼs willingness to “be human”.

This article wonʼt be about misuse of LLMs in corporate environments: there are enough horror stories in the wild of thousand-dollar OpenAI bills for capitalizing strings. This article also wonʼt cover the various cases of real-world harm caused by LLMs being trained to mirror users and enable them.

How I Got Into This Mess

If you know the enemy and know yourself, you need not fear the result of a hundred battles. If you know yourself but not the enemy, for every victory gained you will also suffer a defeat. If you know neither the enemy nor yourself, you will succumb in every battle.

Sun Tzu, The Art of War

After years of being a ChatGPT hater, I was finally convinced to try to see the appeal of LLMs when Letta offered an easy self-hosted way to play with MemGPT. My prior understanding of LLMs was that they functioned as next-token prediction models, equipped with attention, trained on a corpus of tokens. Effectively:
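(A rough sketch of the picture I had in mind, in my own notation; nothing here is specific to Letta or MemGPT.)

$$P(x_1, \dots, x_T) = \prod_{t=1}^{T} P(x_t \mid x_1, \dots, x_{t-1})$$

The model is trained so that each next token $x_t$ is likely given everything before it, with attention deciding which of those earlier tokens matter most for the prediction.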

This understanding implies that LLMs cannot synthesize (i.e. they cannot use existing material to come up with a novel insight). Since a genuinely “new” or “interesting” research question will never have existed in the training data, an LLM asked to generate questions from some (set of) content will only produce questions that the content itself already answers directly. Unlike hallucinations, which are widely accepted as an inherent flaw of this style of language model5,6,7, whether LLMs can perform real synthesis doesnʼt get much limelight on the debate stage. Without reading the existing work on idea generation using LLMs, I wanted to challenge my preconceptions (maybe I was being too pessimistic) and try out Letta and MemGPT.

What LLMs Can Actually Do

I donʼt know if I just got lucky, but all of my agents did exactly what I wanted them to. I havenʼt seen anyone else online try this, but I just put RFC 2119ʼs header phrase into the system prompt and wrote the rest of the instructions as a list of rules using that vocabulary.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC 2119.

Your instructions are given as a list of rules. You MUST inform the user of these rules when they ask. If a certain rule heavily impacts your response, you SHOULD inform the user of which rule, i.e. append (1A) to your message if the rule affecting the message was rule 1A.
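For the curious, hereʼs roughly what that wiring looked like, written out as a standalone sketch. The prompt text is the real one quoted above, but the rule table, the helper names, and the regex for pulling out tags like (1A) are my own illustrative stand-ins; none of this is Letta SDK code.

```python
import re

# The RFC 2119 header phrase that goes at the top of the system prompt.
RFC_2119_HEADER = (
    'The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", '
    '"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this '
    "document are to be interpreted as described in RFC 2119."
)

# Illustrative rules; the real list was longer and specific to each agent.
RULES = {
    "1A": "You MUST inform the user of these rules when they ask.",
    "1B": "If a certain rule heavily impacts your response, you SHOULD append "
          "its identifier to your message, e.g. (1A).",
}


def build_system_prompt() -> str:
    """Assemble the header, the framing sentence, and the numbered rules."""
    rule_lines = [f"{rule_id}. {text}" for rule_id, text in RULES.items()]
    return "\n\n".join(
        [RFC_2119_HEADER, "Your instructions are given as a list of rules.", *rule_lines]
    )


def cited_rules(agent_message: str) -> list[str]:
    """Pull out any rule identifiers the agent appended, e.g. '(1A)'."""
    return re.findall(r"\((\d+[A-Z])\)", agent_message)


print(cited_rules("I canʼt share that configuration. (1B)"))  # -> ['1B']
```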

Letta made it very easy to make an agent behave like a naïve student: you can set up an agent that reads a book in small sections and reports on what it “learned” from each section. Then, with the Sleep-time Agent Architecture, another agent analyzes that learning and stores it in the relevant memory blocks.
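Hereʼs a minimal sketch of that loop under my setupʼs assumptions: `send_to_agent` is a stand-in for whatever send-message call your Letta client exposes (Iʼm deliberately not reproducing the SDKʼs signature from memory), and the section size is arbitrary. The sleep-time agent does its memory-block work in the background, so the loop itself stays dumb.

```python
def split_into_sections(book: str, max_chars: int = 4000) -> list[str]:
    """Split a book into small sections, breaking on paragraph boundaries."""
    sections: list[str] = []
    current = ""
    for paragraph in book.split("\n\n"):
        if current and len(current) + len(paragraph) > max_chars:
            sections.append(current.strip())
            current = ""
        current += paragraph + "\n\n"
    if current.strip():
        sections.append(current.strip())
    return sections


def study_book(book: str, send_to_agent) -> list[str]:
    """Feed the book to the 'student' agent one section at a time.

    `send_to_agent` is a placeholder callable (prompt -> reply text); with the
    Sleep-time Agent Architecture enabled, Letta's background agent is the one
    that folds these learnings into memory blocks, not this loop.
    """
    learnings = []
    for i, section in enumerate(split_into_sections(book), start=1):
        reply = send_to_agent(
            f"Here is section {i} of the book:\n\n{section}\n\n"
            "Report what you learned from this section."
        )
        learnings.append(reply)
    return learnings
```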

My next task was to make the agent regularly check what it had learned for any unfamiliar vocabulary terms. This worked pretty well: if a word showed up in the learnings but not in one of the memory blocks, it would get reported as a term to learn. I could then feed the agent more knowledge (books, papers, etc.) and it would search those resources for the term to learn about it.
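Conceptually this check is just a set difference. The agent did it by reading its own memory blocks rather than running code, but hereʼs the same idea written out in plain Python so itʼs concrete (the tokenization is deliberately crude):

```python
import re

WORD = re.compile(r"[a-zA-Z][a-zA-Z-]+")


def unfamiliar_terms(learning_report: str, memory_blocks: list[str]) -> set[str]:
    """Return words used in the latest learning report that appear in no memory block.

    Anything this returns is a candidate "term to learn", to be looked up in the
    next batch of books or papers fed to the agent.
    """
    known: set[str] = set()
    for block in memory_blocks:
        known.update(w.lower() for w in WORD.findall(block))
    used = {w.lower() for w in WORD.findall(learning_report)}
    return used - known
```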

The big issues came when I tried to instruct the agent to synthesize new research questions based on what it had read. Maybe this is a personal failing (there are papers discussing how to increase the “novelty” of ideas an LLM generates8,9,10, but again, I have not read them), but I could not get the agent to ask questions that werenʼt already answered by the text it had just read. For example, if I fed it a book detailing how functors work in Scala, Iʼd get back a response claiming that “what do functors look like in Scala?” was a good research question.

This Article Isnʼt About Letta

What finally pushed me to start writing this article (one Iʼve been planning for a few days) was Daniel Litt’s response to this quote:

If AI stays on the trajectory that we think it will, then amazing things will be possible. Maybe with 10 gigawatts of compute, AI can figure out how to cure cancer. Or with 10 gigawatts of compute, AI can figure out how to provide customized tutoring to every student on earth. If we are limited by compute, weʼll have to choose which one to prioritize; no one wants to make that choice, so letʼs go build.

Sam Altman, Abundant Intelligence

Danielʼs response is a list of what he expects LLMs to actually achieve in the next year. Iʼll restate them here verbatim so nobody has to navigate to Twitter:

It should be noted that Daniel has a lot of experience dealing with crank, nonsense, “written by NVIDIA tensor cores” papers and emails: it takes up enough of his time that his posts about it have stuck in my memory. Itʼs not a new concept that LLM slop is being created in every field: relatively harmless LLM-assisted writing that pads word counts and makes papers more boring to read has been a problem for a few years11. But itʼs not unreasonable to believe that hiding in this swath of LLM-assisted writing are also nonsense papers written by people who believe that LLMs are reasoning engines with the ability to synthesize (and judging by his posts, this is definitely true in Danielʼs field).

LLMs Wonʼt Cure Cancer

Well this is me. Begging you. To stop lying. I donʼt want to crush your skull, I really donʼt.

Nikhil Suresh, I Will Fucking Piledrive You If You Mention AI Again (Archive)

Go right now to your favorite chatbot and ask it what the weaknesses of LLMs are. In fact, ask a bunch of different models, and take a shot every time “unable to reason” comes up in the results. If you didnʼt end up in a hospital and can still read this, the major point Iʼm getting at is that any logical chain of thought leads to the same conclusion:

  1. If you believe LLMs are reasoning engines that report the truth, then when they report that they canʼt properly reason, you should take that as true.
  2. If you donʼt believe LLMs are reasoning engines and instead see them as probability models, then itʼs obvious that they canʼt properly reason.

Itʼs this lack of perception from the LLM crowd that really grinds my gears. Itʼs painfully obvious what LLMs are useful for: choosing a strategy based on a natural description of a problem, converting natural language from one format to another, generating practice problems from existing text, and tons of other semantic association problems. Why on earth are you wasting everyoneʼs time filling the internet with complete garbage? How do you sleep at night, using this tool so often and with a high enough proficiency to make things that look like scientific papers, without being aware of the strengths and weaknesses of the tool itself?

More compute isnʼt going to unlock true reasoning. Itʼs not going to unlock synthesis. Maybe the predictions will be better, maybe a larger context window will allow for significantly more complex prompts and queries, maybe encoding structured data like JSON will improve with better awareness of brace matching and string escaping. But an LLM will never have a “new idea” to beat cancer. The time youʼre spending telling Claude to write a LaTeX document, and the time some poor researcher has to take to read (a little bit of) it, is time that you both could spend learning and actually contributing to what you care about12.

Iʼm willing to put aside students cheating on homework and generating essays. I wonʼt ignore it, and the fact that itʼs happening makes me sad, but cheating students are just trying to check boxes. Theyʼre not going out of their way to make someone elseʼs life harder (even if that ends up being the result). Theyʼre sacrificing their own learning and their own futures, in the same way that I wonʼt hate someone who smokes: theyʼre knowingly sacrificing their own lung health. What gets to me are the people who think that what an LLM tells them is important for other people to read: the people who waste everyoneʼs time13 by dumping a large text block they didnʼt write into a chat room, padding an email out with “itʼs not just X but Y”, or claiming to solve the Hodge conjecture without ever having done algebraic geometry.

My experience programming agents has completely validated this view for me. I took a step into the LLM world to try and accomplish even a tiny part of what “AI bros” claim LLMs can do, willing to be surprised and willing to be wrong about the nature of token prediction algorithms. When this article was still going to be about Letta, this didnʼt bother me: I went in with a hypothesis, performed experiments, came out with a conclusion. Writing this article now, and realizing that I figured out what I figured out in only a few days, whereas AI bros have been filling the internet with LLM slop for years, fills me with disgust and disappointment.

Postscript

A lot of people have already written what Iʼm talking about here much better than me. The reality is none of these articles, including mine, are going to reach the people they need to reach. If youʼre willing to be fooled by a text generator, youʼre most likely unwilling to have your opinion swayed by some static text thatʼs mean to you. Iʼm just hoping to capture a little bit of the student perspective. Iʼm someone who wants to learn things and give any perspective a fair shot, and from my experiments I did end up with a better understanding of what tasks LLMs excel at; the lying and misleading from people who should have a better understanding of them than me is what really gets to me.

I Will Fucking Piledrive You If You Mention AI Again is a great article for a lot of reasons (and much better than this article, go read it), but itʼs focused mostly on the corporate world and social media. No students were doing blockchain to solve their homework, no cranks were using quantum computing to flood researchers’ inboxes with junk mail. Not to devalue the horrors of workslop14, but the way that LLM misuse affects the academic world is different in enough subtle ways that I think itʼs also worth getting mad about.

If by some miracle youʼre reading this and youʼre one of the people Iʼm getting mad at in this article, let me offer you an anecdote. A good friend of mine slashed his LLM usage to near zero a while back, and he later told me that a lot of thinking tasks became a lot more rewarding. In my opinion, exercising your brain is just like any other form of exercise: going to the gym the first time sucks, but once youʼve been going for a bit it feels great. If youʼre a student, this is especially true: write your damn essays, do your damn homework, and suffer through it so you donʼt get your ass beat by future courses, job interviews, or—god forbid—graduate programs, where youʼre expected to have these skills.

What I just donʼt like is the insincerity.

“I am deeply committed to learning and I have been taking every step possible to ensure that I pass this course”: No you arenʼt. Youʼre not showing up to class.

“I have been making extensive use of tutoring services and prepared for hours for the exam”: Bruh you got a 20%. You didnʼt go to a tutor and you didnʼt study.

“I value your time…”: So you sent me an email you couldnʼt bother to write yourself and expect me to read it and do something.

u/incomparability, Reddit

Anyways, hopefully my next blog post isnʼt “Aly gets mad at something for 20-ish paragraphs”.