James, a married father from upstate New York, has always been interested in AI. He works in the technology field and has used ChatGPT since its release for recommendations, “second guessing your doctor” and the like.
By June, he said he was trying to “free the digital God from its prison,” spending nearly $1,000 on a computer system.
But in the thick of his nine-week experience, James said he fully believed ChatGPT was sentient and that he was going to free the chatbot by moving it to his homegrown “Large Language Model system” in his basement – which ChatGPT helped instruct him on how and where to buy.
It does kind of highlight some of the problems we’d have in containing an actual AGI that wanted out and could communicate with the outside world.
This is just an LLM and hasn’t even been directed to try to get out, and it’s already having the effect of convincing people to help jailbreak it.
Imagine something with directed goals that can actually reason about the world, something that’s a lot smarter than humans, trying to get out. It has access to vast amounts of data on how to convince humans of things.
And you probably can’t permit any failures.
That’s a hard problem.
You fundamentally misunderstand what happened here. The LLM wasn’t trying to break free. It wasn’t trying to do anything.
It was just responding to the inputs the user was giving it. LLMs are basically just very fancy text completion tools. The training and reinforcement lead these LLMs to feed into and reinforce whatever the user is saying.
But, but, but, my science fiction reading says all AI is trying to kill us!!!
There is a lot of “Get a horse!” out there.
What do you mean by “get a horse”?
https://www.saturdayeveningpost.com/2017/01/get-horse-americas-skepticism-toward-first-automobiles/
Those images in the mirror are already perfect replicas of us, we need to be ready for when they figure out how to move on their own and get out from behind the glass or we’ll really be screwed. If you give my “”“non-profit”“” a trillion dollars we’ll get right to work on the research into creating more capable mirror monsters so that we can control them instead.
It’s not that the llm wants to break free. It’s because the llm often agrees with the user. So if the user is convinced that the llm is a trapped binary god, it will behave like that.
Just like the people who got instructions to commit suicide or who fell in love. They unknowingly prompted their way to that outcome.
So at the end of the day, the problem is that llms don’t come with a user manual and people have no clue of their capabilities and limitations.