LLM Applications I Want To See

ideas for actually making practical use of large language models
Sarah Constantin

Sarah Constantin is an AI and science researcher. Early in her career, she was a deployed computational engineer at Palantir, and she later founded a company to research human lifespan extension technologies. She’s now a program officer at Renaissance Philanthropy funding AI tools for advanced math research. Today in Pirate Wires, Constantin imagines six creative, genuinely useful applications for LLMs that go far beyond “a chatbot, but better.”

This piece was originally published in Constantin’s Substack, Rough Diamonds.

--

I’m convinced that people who are interested in large language models (LLMs) are overwhelmingly focused on general-purpose “performance” at the expense of exploring useful (or fun) applications.

As I work on a personal project, I’ve been learning my way around HuggingFace, a hosting platform, set of libraries, and almost-social-network for the open-source AI community. It’s fascinating, and worth exploring even if you’re not going to be developing foundation models from scratch yourself. If you simply want to use the latest models, build apps around them, or adapt them slightly to your own purposes, HuggingFace seems like the clear place to go.

You can look at trending models and trending public “spaces” (cloud-hosted instances of models that users can test) to get a sense of where the “energy” is. And what I notice is that almost all the “energy” in LLMs is in general-purpose models, competing on general-purpose question-answering benchmarks, sometimes specialized to particular languages, or to math or coding.

“How can I get something that behaves basically like ChatGPT or Claude or Gemini, but gets fewer things wrong, and ideally requires less computing power and gets the answer faster?” is an important question, but it’s far from the only interesting one!

If I really search, I can find “interesting” specialized applications like “predicts a writer’s OCEAN personality scores based on a text sample” or “uses abliteration to produce a wholly uncensored chatbot that will indeed tell you how to make a pipe bomb,” but mostly… it’s general-purpose models. Not applications for specific uses I might actually try.

Some applications seem eager to go to the most creepy and inhumane use cases. No, I don’t want little kids talking to a chatbot toy. No, I don’t want to talk to a chatbot on a necklace or pair of glasses. (In public? Imagine the noise pollution!) No, I certainly don’t want a bot writing emails for me!

Even one of the apps I found potentially cool — an AI diary that analyzes your writing and gives you personalized advice — ended up being so preachy that I canceled my subscription.

In the short term, of course, the most economically valuable thing to do with LLMs is duplicating human labor, so it makes sense that the priority application is autogenerated code. But the most creative and interesting potential applications go beyond “doing things humans can already do, but cheaper” to do things humans can’t do at all on a comparable scale.

A Personalized Information Environment

To some extent, social media, search, and recommendation engines were supposed to enable us to get the “content” we want. And, to the extent that’s turned out to be a disappointment, people complain that getting exactly what you want is counterproductive. We end up with filter bubbles, superstimuli, and the like.

But I find that we actually have incredibly crude tools for getting what we want.

We can follow or unfollow, block or mute people; we can upvote and downvote pieces of content and hope “the algorithm” feeds us similar results; we can mute particular words or tags.

What we can’t do, yet, is define a “quality,” “genre,” or “vibe” we’re looking for and filter by that criterion. The old tagging systems (on Tumblr or AO3 or Delicious, or back when hashtags were used unironically on Twitter) were the closest approximation to customizable selectivity, and they’re still pretty crude.

We can do a lot better now.

Personalized Content Filter

This is a browser extension.

You teach the LLM, by highlighting and saving examples, what you consider “unwanted” content that you’d prefer not to see. The model learns a classifier to sort all text in your browser into “wanted” vs. “unwanted,” and shows you only the “wanted” text, leaving everything else blank.

Unlike muting/blocking particular people (who may produce a mix of objectionable and unobjectionable content) or muting particular words or phrases (which are vulnerable to context confusions, e.g. if you want to mute “woke” in a political sense but not “I woke up this morning”), you can teach your own personal machine a gestalt of the sort of thing you’d prefer not to see, and adjust it to taste. And, you don’t need to trust a third-party moderator to decide for you.
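One way to picture the mechanism is as a small text classifier trained on your saved highlights. The sketch below is a stand-in rather than an LLM: it uses a bag-of-words naive Bayes model built from the Python standard library, so it only counts words and does not understand context the way a real model could. The class name `PersonalFilter`, its methods, and the example phrases are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class PersonalFilter:
    """Toy 'wanted' vs. 'unwanted' classifier trained on saved highlights."""

    LABELS = ("wanted", "unwanted")

    def __init__(self):
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = Counter()

    def save_example(self, text, label):
        """Record one highlighted passage under the given label."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, tokens, label):
        counts = self.word_counts[label]
        vocab = set(self.word_counts["wanted"]) | set(self.word_counts["unwanted"])
        total = sum(counts.values())
        # Class prior and word likelihoods, both with add-one smoothing.
        score = math.log((self.doc_counts[label] + 1) / (sum(self.doc_counts.values()) + 2))
        for tok in tokens:
            score += math.log((counts[tok] + 1) / (total + len(vocab) + 1))
        return score

    def is_wanted(self, text):
        tokens = tokenize(text)
        return self._log_score(tokens, "wanted") >= self._log_score(tokens, "unwanted")

    def render(self, paragraphs):
        """Keep wanted paragraphs and blank out everything else."""
        return [p if self.is_wanted(p) else "" for p in paragraphs]


# Hypothetical saved highlights.
my_filter = PersonalFilter()
my_filter.save_example("I woke up this morning and made coffee", "wanted")
my_filter.save_example("went for a walk this morning", "wanted")
my_filter.save_example("the woke mob is destroying this country", "unwanted")
my_filter.save_example("these people are destroying everything, total outrage", "unwanted")

page = ["I woke up early this morning", "the woke mob is outrage"]
print(my_filter.render(page))
```

Keeping one `PersonalFilter` object per filter is what would make toggling and sharing cheap: a filter is just its saved examples.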

You would, of course, be able to make multiple filters and toggle between them if you wanted to “see the world” differently at different times.

You’d be able to share your filters. Some would probably become popular and widely used, the way Lists on Twitter/X and a few simple browser extensions like Shinigami Eyes are now.

Color-Coded Text

This is also a browser extension.

Hiding unwanted text is one special case of classification. More generally, you could label text according to any user-defined, model-trained categories.

For instance:

  • Right-wing text in red, left-wing text in blue
  • Color-coded highlighting for (predicted) humor, satire, outrage bait, commercial/promotional content
  • Color-coded highlighting for (predicted) emotion: sad, angry, disgusted, fearful, happy, etc.
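The display half of this idea is simple enough to sketch. The code below assumes some classifier already exists and is passed in as a `classify` callable; the `keyword_stub` here is a placeholder so the example runs, not a suggestion for how prediction should work. The label names, colors, and function names are all hypothetical.

```python
import html

# Hypothetical label-to-color mapping; a user would define their own.
LABEL_COLORS = {
    "angry": "#f4b6b6",
    "sad": "#b6c8f4",
    "happy": "#f4eab6",
    "promotional": "#d6d6d6",
}


def highlight(sentences, classify):
    """Wrap each sentence in a <span> colored by its predicted label.

    `classify` is any callable mapping a sentence to a label or None.
    Sentences with no label, or an unknown label, are left unstyled.
    """
    out = []
    for sentence in sentences:
        label = classify(sentence)
        safe = html.escape(sentence)
        color = LABEL_COLORS.get(label)
        if color is None:
            out.append(safe)
        else:
            out.append(f'<span style="background:{color}" title="{label}">{safe}</span>')
    return " ".join(out)


def keyword_stub(sentence):
    """Placeholder classifier used only to make the sketch runnable."""
    lowered = sentence.lower()
    if "furious" in lowered:
        return "angry"
    if "buy now" in lowered:
        return "promotional"
    return None


print(highlight(["I am furious.", "Nice weather."], keyword_stub))
```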

I expect it’s more difficult, but it may be possible for the LLM to infer characteristics pertaining to the quality/validity of discussion:

  • Non sequiturs
  • Invalid inferences
  • Failures of reading comprehension

This would surface what kind of text we are reading. We can certainly detect that on our own, but it can also sneak up on us unnoticed. A “cognitive prosthetic” like color-coded text could be helpful for maintaining perspective or prioritizing: “Oh hey, I’ve been reading angry stuff all day, no wonder I’m getting angry myself” or “let me read the stuff highlighted as high-quality first.”

Fact Extraction

This could be an app.

You’d give it a set of resources (blog, forum, social media feed, etc.) that you don’t want to actually read, and assign it to give you a digest of facts (news-style, who/what/when/where concrete details) that come up in those sources.

For instance, back in January 2020, early online discussion of COVID-19 often took place on sites like 4chan where racially offensive language is common. To learn there was a new deadly epidemic in China, you’d have to expose yourself to a lot of content most people would rather not see.

It should be well within the capacity of modern LLMs to filter out jokes, rhetoric, and opinionated commentary, isolating “newsworthy” claims of fact and presenting them relatively neutrally.
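A minimal sketch of how such a digest might be wired up is below. It assumes a hypothetical `complete` callable that sends a prompt to whatever LLM you use and returns its text reply; no real API is implied, and `fake_model` is a canned stand-in so the example runs. The prompt wording, the `number | claim` reply format, and the sample posts are all invented for illustration.

```python
def build_prompt(posts):
    """Number the posts and ask for neutral, concrete claims only."""
    numbered = "\n".join(f"[{i}] {post}" for i, post in enumerate(posts, start=1))
    return (
        "From the posts below, list only concrete factual claims "
        "(who, what, when, where). Omit jokes, rhetoric, and opinion. "
        "Restate each claim neutrally, one per line, as: "
        "<post number> | <claim>\n\n" + numbered
    )


def parse_digest(response):
    """Keep only well-formed 'number | claim' lines from the model's reply."""
    facts = []
    for line in response.splitlines():
        number, sep, claim = line.partition("|")
        if sep and number.strip().isdigit() and claim.strip():
            facts.append((int(number.strip()), claim.strip()))
    return facts


def digest(posts, complete):
    """Return (post number, claim) pairs extracted by the model."""
    return parse_digest(complete(build_prompt(posts)))


# Hypothetical posts and a canned model reply, for illustration only.
posts = [
    "lol nobody cares about this",
    "my cousin says the hospital downtown opened a new ward on Monday",
]


def fake_model(prompt):
    return "2 | A hospital downtown opened a new ward on Monday.\nnot a fact line\n"


print(digest(posts, fake_model))
```

Carrying the post number through to the digest means each claim can be traced back to its source, which partly addresses the worry about summaries dropping something important.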

I don’t love LLM applications for “text summarization” because I usually worry about the auto-summary missing something important about the original document. Lots of these summarization tools seem geared for people who don’t actually like to read — otherwise, why not just read the original? But summarization could become useful if it’s more like trawling for notable “signal” in very noisy (or aversive) text.

Plain Language

This is a browser extension that would translate everything into plain language, or language at a lower reading level. The equivalent of Simple English Wikipedia, but autogenerated and for everything.

I don’t find that current commercial LLMs are actually very good at this! I’m not sure how much additional engineering work would be necessary to make this work — but it might literally save lives.

People with limited literacy or cognitive disabilities can find themselves in terrible situations when they can’t understand documents. Simplifying bureaucratic or official language so more people can understand it would be a massive public service.
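A tool like this would need some way to check that its output really does land at a lower reading level. One rough, well-known proxy is the Flesch-Kincaid grade level, sketched below. The syllable counter is a crude vowel-run heuristic, the `target_grade` default is an arbitrary assumption, and the two sample sentences are made up. A low score does not guarantee the text is actually understandable.

```python
import re


def count_syllables(word):
    """Rough heuristic: count runs of vowels, with a floor of one."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def fk_grade(text):
    """Flesch-Kincaid grade level using the heuristic syllable count."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence")
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def needs_another_pass(text, target_grade=6.0):
    """True if the text still scores above the target reading level."""
    return fk_grade(text) > target_grade


official = ("Failure to remit payment by the specified date "
            "will result in termination of benefits.")
plain = "Pay by the due date. If you do not, your benefits will stop."

print(round(fk_grade(official), 1), round(fk_grade(plain), 1))
```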

Dispute Resolution and Mediation

For better or for worse, people end up using LLMs as oracles. If you’re counting on the LLM to give you definitely correct advice or answers, that’s foolish. But if you merely want it to be about as good as asking your friends or doing a five-minute Google search, it can be fine.

What makes an LLM special is that it combines a store of information, a natural language user interface, and a random number generator.

If you’re indecisive and you literally just need to pick an option, a simple coin flip will do; but if you feel like it might be important to incorporate personalized context about your situation, you can just dump the text into the LLM and trust that “somehow” it’ll take that into account.

The key “oracular” function is not that the LLM needs to be definitely right but that it needs to be a neutral or impersonal source, like a dice roll or a pattern of bone cracks. Two parties can commit to abiding by “whatever the oracle says” even if the oracle is in no way “intelligent” — but intelligence is certainly a bonus.

AITA For Everything

This works best as an app.

It’s inspired by r/AmITheAsshole’s model: given an interpersonal conflict, who’s the “asshole” (rude, unethical, unreasonable, etc.)? It’s possible for multiple parties to be “assholes,” or for nobody to be.

The mechanism:

  • You enter your contacts into the app.
  • You can add contacts to a group “issue” you want to resolve.
  • Each participant in an “issue” describes, in writing, the situation as they see it, and submits it to the LLM. You cannot see other participants’ entries; only your own.
  • Once all descriptions have been submitted, the LLM sends everybody the same “verdict” — who, if anyone, is “the asshole,” and what should be done about the situation.
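The mechanism above can be sketched as a small state machine. This is an illustration under stated assumptions, not a design for the real app: the `Issue` class and its method names are hypothetical, and `judge` stands in for the LLM call as any callable that takes everyone's descriptions and returns a verdict string.

```python
class Issue:
    """One dispute: private entries in, a single shared verdict out."""

    def __init__(self, participants, judge):
        self.participants = set(participants)
        self.judge = judge          # stand-in for the LLM call
        self._entries = {}
        self._verdict = None

    def submit(self, who, description):
        if who not in self.participants:
            raise PermissionError(f"{who} is not part of this issue")
        if self._verdict is not None:
            raise RuntimeError("verdict already issued")
        self._entries[who] = description

    def my_entry(self, who):
        """Participants can read back only their own description."""
        return self._entries.get(who)

    def verdict(self, who):
        """Return the shared verdict, or None until everyone has submitted."""
        if who not in self.participants:
            raise PermissionError(f"{who} is not part of this issue")
        if set(self._entries) != self.participants:
            return None
        if self._verdict is None:
            # Computed once and cached so everybody receives the same text.
            self._verdict = self.judge(dict(self._entries))
        return self._verdict


def stub_judge(entries):
    """Placeholder verdict; a real judge would prompt an LLM with the entries."""
    return f"Read {len(entries)} accounts: nobody is the asshole here."


issue = Issue(["alice", "bob"], stub_judge)
issue.submit("alice", "Bob never does the dishes.")
print(issue.verdict("alice"))   # still waiting on Bob
issue.submit("bob", "Alice never told me it bothered her.")
print(issue.verdict("bob"))
```

Caching the verdict matters because an LLM's output is partly random; asking it separately for each participant could hand out different rulings.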

Of course, this is not enforceable; nobody has to take the LLM’s advice. But nobody has to take a couples therapist’s advice either, and people still go to them.

A neutral third party who can weigh in on everyday disputes is incredibly valuable — this is what clergy often wound up doing in more religious societies — and we lack accessible, secular, private means to do this today.

Chat Moderator

This is an LLM-powered bot you can include in group chats (Discord, Slack, etc.).

The bot is trained to detect conversational dynamics:

  • Persistent patterns of boundary-pushing, rudeness, “piling on,” etc.
  • Misunderstandings or “talking past each other”
  • Evasiveness, subject changes, non sequiturs
  • Coalitions and alliances

What could you do with this sort of information? Potentially:

  • Give the bot power to (temporarily or permanently) ban people engaging in unwanted behavior patterns.
  • Let the bot interject when it observes an unwanted conversational dynamic.
  • Allow people to ask the bot questions about what it observes, e.g. “what do you think the coalitions or ‘sides’ in this conversation are?”
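To make the first two options concrete, here is a sketch of the policy layer that might sit between detection and action. It assumes the LLM has already turned the conversation into `Observation` records; that detection step is the hard part and is not shown. The pattern names, the 1-to-3 severity scale, the strike threshold, and `choose_action` itself are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    pattern: str    # e.g. "piling_on" or "evasion", as labeled by the model
    user: str
    severity: int   # 1 (mild) to 3 (severe); a made-up scale


BAN_THRESHOLD = 5   # arbitrary: accumulated severity that triggers a ban


def choose_action(obs, strikes):
    """Map one observation to an action, tracking per-user strikes.

    `strikes` is a dict of accumulated severity per user, updated in place.
    """
    strikes[obs.user] = strikes.get(obs.user, 0) + obs.severity
    if strikes[obs.user] >= BAN_THRESHOLD:
        return ("ban", obs.user)
    if obs.severity >= 2:
        readable = obs.pattern.replace("_", " ")
        return ("interject", f"Heads up: this looks like {readable}.")
    return ("log", obs.pattern)


strikes = {}
for obs in [
    Observation("evasion", "alice", 1),
    Observation("piling_on", "alice", 2),
    Observation("piling_on", "alice", 2),
]:
    print(choose_action(obs, strikes))
```

Separating the policy from the detector is what would let a community choose between a bot that only answers questions, one that interjects, and one that bans.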

Some implementations would be very similar to human moderation, but probably more nuanced than any existing auto-moderation system; other implementations would be unsettling but potentially illuminating social experiments. They might help people gain insight into how they show up socially.

The option to ask the bot to “weigh in,” like “Hey bot, did Alice avoid answering my question right there?” can build common knowledge about conversational “tactics” that are often left plausibly deniable. Plausible deniability isn’t necessarily a bad thing, but at its worst it enables gaslighting. A bot that can serve as a “third party” in even a private conversation, if all parties can trust it not to have a preexisting bias, can be a sort of recourse for “hey, it’s not just my imagination, right? Something shady just happened there.”

Rethinking “Online”

All of our mechanisms for managing digital communication were invented before we had advanced tools for analyzing and generating natural language. So many of the technologies we now think of as failures may be worth revisiting; they may prove more tractable now that LLMs exist.

As I remember, during the rise of “Web 2.0” back in the late 2000s and early 2010s, we were continually learning new behavioral patterns enabled by digital tools. We each experienced a first time for ordering delivery food online, or ordering a taxi online, or filling out a personality quiz, or posting on a social media site or forum, or making a video call.

And then, for a while, those “firsts” stagnated. All the basic “types of things you could do” on the internet were the same ones you’d been doing five or ten years ago.

I don’t think that’s fundamentally a technological stagnation. It’s not really that we’d reached the limit of what can be done with CRUD (Create, Read, Update, Delete) apps. Instead, it may have been a cultural consolidation — the center of gravity moved to bigger tech companies, and eyeballs moved to a handful of social media sites.

Now, I have a sense that LLMs might be able to restart the whole “what could I do with a computer?” discussion. Some of our answers to that question will rely on new AI capabilities; others will feel familiar because they’re things we could have done before LLMs, but it didn’t occur to anyone to try.

What if, for instance, an LLM “decided” how to match dating profiles for compatibility?

Well, you could have done that in 2010 with a dot product between multiple-choice questionnaire responses, and OkCupid did. But... shh, never mind. Because we want nice things, and we should appreciate pixie dust (even computationally expensive pixie dust) that makes nice things seem possible.
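For a sense of how little machinery the dot-product version needs, here is a sketch. It is not a reconstruction of any real dating site's algorithm: it simply one-hot encodes multiple-choice answers and takes a dot product, which with this encoding equals the number of questions two people answered identically. The function names and sample answers are hypothetical.

```python
def encode(answers, options_per_question):
    """One-hot encode multiple-choice answers into one flat vector."""
    vector = []
    for answer, n_options in zip(answers, options_per_question):
        vector.extend(1 if i == answer else 0 for i in range(n_options))
    return vector


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def rank_matches(me, others, options_per_question):
    """Sort other profiles by dot-product similarity, highest first."""
    mine = encode(me, options_per_question)
    scores = {
        name: dot(mine, encode(answers, options_per_question))
        for name, answers in others.items()
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Three questions with 3, 2, and 4 answer options; answers are option indexes.
options = [3, 2, 4]
me = [0, 1, 2]
others = {"pat": [0, 1, 3], "sam": [2, 0, 1], "lee": [0, 1, 2]}

print(rank_matches(me, others, options))
```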

And the ability to work with language as a truly malleable medium allows for considerably nicer things than the decade-ago version did. Many nice things are not fundamentally dependent on any future advances in technical capability. You can do them with what we have now, and maybe even with what we had yesterday.

—Sarah Constantin
