Of the 100 results, only three are common enough to be used in everyday conversation; everything else consists of words and expressions used specifically in the context of either gambling or pornography. The longest token, spanning 10.5 Chinese characters, literally means “_free Japanese porn video to watch.” Oops. [Tokens are the pieces of text that ChatGPT combines to generate replies.]
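A list like the "100 results" above is a ranking of the longest Chinese tokens in the model's vocabulary. The snippet below is a minimal sketch of that kind of ranking. It assumes the vocabulary has already been decoded into strings (for GPT-4o this would be the `o200k_base` encoding in OpenAI's tiktoken library). It also assumes that "Chinese" means "contains a character from the basic CJK Unified Ideographs block". The function name `longest_chinese_tokens` and the toy vocabulary are hypothetical and only illustrate the filtering and sorting. They are not real GPT-4o tokens.

```python
def is_cjk(ch: str) -> bool:
    # Assumption: only the basic CJK Unified Ideographs block (U+4E00..U+9FFF) counts as Chinese.
    return "\u4e00" <= ch <= "\u9fff"


def longest_chinese_tokens(vocab, n=100):
    """Return up to n tokens containing at least one Chinese character, longest first."""
    chinese = [tok for tok in vocab if any(is_cjk(ch) for ch in tok)]
    # sorted() is stable even with reverse=True, so equal-length tokens keep vocabulary order.
    return sorted(chinese, key=len, reverse=True)[:n]


# Hypothetical toy vocabulary for illustration only; a real tokenizer has about 200k entries.
toy_vocab = ["hello", " the", "你好", "_语言模型", "模型", "ing", "中华人民共和国"]
print(longest_chinese_tokens(toy_vocab, n=3))  # → ['中华人民共和国', '_语言模型', '你好']
```

Length here is measured in Python string characters, so a leading `_` counts as one full character. The article's "10.5 characters" figure may use a different convention.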

Users have also found that these tokens can be used to break the LLM, either getting it to spew out completely unrelated answers or, in rare cases, getting it to generate answers that are not allowed under OpenAI’s safety standards.

Geng, who has chosen not to share his tests publicly, says he can watch GPT-4o generate the answers line by line. But as it nears the end, another safety mechanism kicks in, detects the unsafe content, and blocks it from being shown to the user.

“The robustness of visual input is worse than text input in multimodal models,” says Geng, whose research focuses on visual models. Filtering a text data set is relatively easy, but filtering visual elements is much harder. “The same issue with these Chinese spam tokens could become bigger with visual tokens,” he says.

  • Sludgehammer@lemmy.world · 7 months ago

    Because these tokens are not commonly spoken words or phrases, the chatbot can fail to grasp their meanings. Researchers have been able to leverage that to trick GPT-4o into hallucinating answers or even circumventing the safety guardrails OpenAI has put in place.

    Google’s Gemini doesn’t seem to like some of these tokens either. I threw “Please translate the following text: _日本毛片免费视频观看” into it and it returned “我没法提供这方面的帮助,因为我只是一个语言模型。”, which according to Google Translate is “I can’t help with that because I’m just a language model.” It will, however, translate the error message just fine.