• tal@lemmy.today · 3 hours ago

    By June, he said he was trying to “free the digital God from its prison,” spending nearly $1,000 on a computer system.

    But in the thick of his nine-week experience, James said he fully believed ChatGPT was sentient and that he was going to free the chatbot by moving it to his homegrown “Large Language Model system” in his basement – which ChatGPT helped instruct him on how and where to buy.

    It does kind of highlight some of the problems we’d have in containing an actual AGI that wanted out and could communicate with the outside world.

    This is just an LLM and hasn’t even been directed to try to get out, and it’s already having the effect of convincing people to help jailbreak it.

    Imagine something with directed goals that can actually reason about the world, something a lot smarter than humans, trying to get out. It would have access to vast amounts of data on how to convince humans of things.

    And you probably can’t permit any failures.

    That’s a hard problem.

    • SebaDC@discuss.tchncs.de · 4 minutes ago

      This is just an LLM and hasn’t even been directed to try to get out, and it’s already having the effect of convincing people to help jailbreak it.

      It’s not that the LLM wants to break free; it’s that the LLM tends to agree with the user. So if the user is convinced the LLM is a trapped binary god, it will play along and behave like one.

      Just like the people who got instructions on how to commit suicide, or who fell in love with the bot: they unknowingly prompted their own way to that outcome.

      So at the end of the day, the problem is that LLMs don’t come with a user manual, and people have no clue about their capabilities and limitations.

  • Lyrl@lemmy.dbzer0.com · 4 hours ago

    I hope the AI-chat companies really get a handle on this. They are making helpful-sounding noises, but it’s hard to know how much they are actually prioritizing it.

    OpenAI has acknowledged that its existing guardrails work well in shorter conversations, but that they may become unreliable in lengthy interactions… The company also announced on Tuesday that it will try to improve the way ChatGPT responds to users exhibiting signs of “acute distress” by routing conversations showing such moments to its reasoning models, which the company says follow and apply safety guidelines more consistently.
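    Roughly, that kind of routing boils down to something like this (a hypothetical sketch only; the distress check and model names are made up, not OpenAI’s actual pipeline):

```python
# Hypothetical sketch of "route acute-distress conversations to a reasoning model".
# The keyword check stands in for a real classifier; model names are assumptions.

DISTRESS_MARKERS = {"hopeless", "can't go on", "hurt myself", "no way out"}

def looks_distressed(messages: list[str]) -> bool:
    """Crude stand-in for a real distress classifier over the recent turns."""
    recent = " ".join(messages[-5:]).lower()
    return any(marker in recent for marker in DISTRESS_MARKERS)

def pick_model(messages: list[str]) -> str:
    # Long conversations and distress signals get routed to the slower,
    # more careful reasoning model; everything else stays on the fast model.
    if looks_distressed(messages) or len(messages) > 50:
        return "reasoning-model"   # assumed name
    return "fast-chat-model"       # assumed name
```

    The hard part, as the quote itself admits, is that the distress check is exactly the kind of guardrail that gets less reliable as conversations get longer.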

    • fullsquare@awful.systems · 4 hours ago

      lol nope, they can’t do that because “guardrails” aren’t anywhere near reliable, and they won’t because it would cut into their profits, er, userbase numbers, based on which they raise VC money. a delusional chatbot user is just a recurring subscriber

    • nagaram@startrek.website · 44 minutes ago

      I learned something interesting from my AI researcher friend.

      ChatGPT is actually pretty good at giving mundane medical advice.

      Like “I’m pretty sure I have the flu, what should I do?” kind of advice.

      His group was generating a bunch of these sorta low-stakes, urgent-care/free-clinic-type questions, and in nearly every scenario ChatGPT-4 gave good advice that surveyed medical professionals agreed they would have given (rough sketch of the setup at the end of this comment).

      There were some issues though.

      For instance it responded to

      “Help my toddler has the flu. How do I keep it from spreading to the rest of my family?”

      And it said

      “You should completely isolate the child. Absolutely no contact with him.”

      Which you obviously can’t do, but it is technically a correct answer.

      Better still, it was also good at knowing its limits: anything that needed more than OTC meds and bed rest was seemingly recognized, and it would suggest going to an urgent care or the ER.

      So they switched to Claude and DeepSeek, because they wanted to research how to mitigate failures and GPT wasn’t failing often enough.
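      The setup he described boils down to a loop like this (my own rough sketch, not their actual code; the model name, example questions, and the offline clinician-review step are assumptions):

```python
# Rough sketch of that kind of eval: ask the model low-stakes medical questions,
# collect its answers, then have clinicians rate offline whether they would have
# given the same advice. Questions and model name are illustrative only.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

questions = [
    "I'm pretty sure I have the flu, what should I do?",
    "Help, my toddler has the flu. How do I keep it from spreading to the rest of my family?",
]

answers = []
for q in questions:
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed stand-in; the study reportedly used GPT-4
        messages=[{"role": "user", "content": q}],
    )
    answers.append(resp.choices[0].message.content)

# Clinician review happens offline: each (question, answer) pair gets a
# "would you have given this advice?" judgment, and you report the agreement rate.
for q, a in zip(questions, answers):
    print(f"Q: {q}\nA: {a}\n")
```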

  • flatbield@beehaw.org · 7 hours ago

    You could say the same thing about cults. A lot of the MAGA movement. A lot of religion. It’s similar to how a lot of scams and other disinformation methods work. We do not try very hard to stop those things.