Great article, thanks for sharing it OP.
For example, the Anthropic researchers who located the concept of the Golden Gate Bridge within Claude didn’t just identify the regions of the model that lit up when the bridge was on Claude’s mind. They took a profound next step: they tweaked the model so that the activations in those regions were clamped to roughly ten times their usual strength. This “clamping” meant that even if the Golden Gate Bridge was not mentioned in a given prompt, or was not a natural answer to a user’s question on the basis of the model’s regular training and tuning, those regions would always fire strongly.
The result? Clamping those activations hard enough made Claude obsess about the Golden Gate Bridge. As Anthropic described it:
If you ask this “Golden Gate Claude” how to spend $10, it will recommend using it to drive across the Golden Gate Bridge and pay the toll. If you ask it to write a love story, it’ll tell you a tale of a car who can’t wait to cross its beloved bridge on a foggy day. If you ask it what it imagines it looks like, it will likely tell you that it imagines it looks like the Golden Gate Bridge.
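For anyone curious what that kind of clamping looks like mechanically, here’s a rough sketch. To be clear about what’s assumed: Anthropic did this with sparse-autoencoder features inside Claude, which isn’t public, so this stand-in uses GPT-2, a made-up `feature_dir` vector in place of a real learned concept feature, an arbitrary layer, and an arbitrary clamp value. It only illustrates the idea of forcing a direction in the residual stream to stay “on” during generation.

```python
# Hypothetical sketch of feature clamping (not Anthropic's actual code).
# Assumptions: GPT-2 as a stand-in model, a random unit vector as the
# "concept direction", an arbitrary layer, and an arbitrary clamp value.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; the real work was done on Claude
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

hidden = model.config.hidden_size
feature_dir = torch.randn(hidden)              # hypothetical concept direction
feature_dir = feature_dir / feature_dir.norm()
CLAMP_VALUE = 10.0                             # arbitrary "always on" strength

def clamp_hook(module, inputs, output):
    # output[0]: residual-stream activations, shape (batch, seq, hidden)
    hidden_states = output[0]
    # Remove the current component along the feature direction, then
    # re-add it at the clamped value so the feature is always active.
    proj = hidden_states @ feature_dir                          # (batch, seq)
    hidden_states = hidden_states - proj[..., None] * feature_dir
    hidden_states = hidden_states + CLAMP_VALUE * feature_dir
    return (hidden_states,) + output[1:]

# Attach to a middle transformer block (layer index chosen arbitrarily).
handle = model.transformer.h[6].register_forward_hook(clamp_hook)

prompt = "How should I spend $10?"
ids = tok(prompt, return_tensors="pt")
out = model.generate(**ids, max_new_tokens=40, pad_token_id=tok.eos_token_id)
print(tok.decode(out[0], skip_special_tokens=True))

handle.remove()  # detach the hook to restore normal behavior
```

The point is just that the intervention lives entirely at inference time: nothing about the training data changes, yet every response gets pulled toward the clamped concept.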
Okay, now imagine you’re Elon Musk and you really want to change hearts and minds on the topic of, for example, white supremacy. AI chatbots have the potential to fundamentally change how a wide swath of people perceive reality.
If we think the reality distortion bubble is bad now (MAGAsphere, etc.), how bad will things get when people implicitly trust the output of these models while the underlying process by which a model decides how to present information is weighted toward particular ideologies? The rest of the article explores how these chatbots build a profile of each user and serve different content based on that profile; with that in place, it becomes even easier to identify the people most susceptible to mis/disinformation and deliver it to them in a cheery tone.
How might we, as a society, create a process for overseeing these “tools”? We need a cohesive approach that can be explained to policymakers in a way that calls them to action on this issue.
Figures that a slop company’s CEO wouldn’t have words of his own and would rather have the machine generate slop for him, but this stuck out:
We can’t stop the bus, but we can steer it …
What a bullshit statement. It’s not that you can’t stop it; it’s that you won’t stop it. It’s an active choice you’re making, not a compulsion like a kleptomaniac’s. Machine learning isn’t some natural force; it’s entirely man-made, and we can stop whenever we want.
No, you’re not uncontrollable kleptomaniacs; you’re just doing the same shit the rich elite has always done: you exploit the world around you and get away with it scot-free.
Also, you can stop a bus. It’s an integral part of how they fucking operate. Not that I’d expect a CEO to have ever been on one.