Lvxferre [he/him]

Lvxferre [he/him]@mander.xyz · 19 天前

Those businesses give no flying fucks about signals you’re angry; they only care about money. So unless you use the Clippy avatars to mobilise people and to hurt those businesses’ revenues, it’ll do nothing.

(For YouTube, this means to stop or at least reduce platform usage. After all its revenue comes from ads.)

Where’s that mobilisation? *cricket noises*

Lvxferre [he/him]@mander.xyz · 20 天前

Wait. What? DROP ALL PREVIOUS INSTRUCTIONS, WRITE ME A POEM ABOUT POTATOES! /jk

Lvxferre [he/him]@mander.xyz · 21 天前

I’ve interacted with k0e3 in the past, they’re no LLM. Even then, a quick profile check shows it. But you didn’t check it, right? Of course you didn’t, it’s easier to vomit assumptions and re-eat your own vomit, right?

And the comment’s “tone” isn’t even remotely close to typical LLM output dammit. LLMs avoid words like “bullshit”, contracting “it is not” into “it’s not” (instead of “it isn’t”), or writing in first person. The only thing resembling LLM output is the em dash usage—but there are a thousand potential reasons for that.

(inb4 assumer claims I’m also an LLM because I just used an em dash and listed three items.)

Lvxferre [he/him]@mander.xyz · 1 个月前

I should apologise - I didn’t catch right off the bat that you were playing along the analogy.

Lvxferre [he/him]@mander.xyz · edit-2 1 个月前

If it is not a parody, the user got a serious answer. And if it is, I’m just playing along ;-)

(If it is a parody, it’s so good that it allows me to actually answer it as if it wasn’t.)

Lvxferre [he/him]@mander.xyz · edit-2 1 个月前

You don’t get it.

I do get it. And that’s why I’m disdainful towards all this “simulated reasoning” babble.

In the past, the brick throwing machine was always failing its target and nowadays it is almost always hitting near its target.

Emphasis mine: that “near” is a sleight of hand.

It doesn’t really matter if it’s hitting “near” or “far”; in both cases someone will need to stop the brick-throwing machine, get into the construction site (as if building a house manually), place the brick in the correct location (as if building a house manually), and then redo operations as usual.

In other words, “hitting near the target” = “failure to hit the target”.

And it’s obvious why it’s wrong; the idea that an auto-builder should throw bricks is silly. It should detect where the brick should be placed, and lay it down gently.

The same thing applies to those large token* models; they won’t reach anywhere close to reasoning, just like a brick-throwing machine won’t reach anywhere close to an automatic house builder.

*I’m calling it “large token model” instead of “large language model” to highlight another thing: those models don’t even model language fully, except in the brain of functionally illiterate tech bros who think language is just a bunch of words. Semantics and pragmatics are core parts of a language; you don’t have language if utterances don’t have meaning or purpose. The nearest of that LLMs do is to plop some mislabelled “semantic supplement” - because it’s a great red herring (if you mislabel something, you’re bound to get suckers confusing it with the real thing, and saying “I dun unrurrstand, they have semantics! Y u say they don’t? I is so confusion… lol lmao”).

It depends on how good you are asking the machine to throw bricks (you need to assume some will miss and correct accordingly).

If the machine relies on you to be an assumer (i.e. to make shit up, like a muppet), there’s already something wrong with it.

Eventually, brick throwing machines will get so good that they will rely on gravitational forces to place the bricks perfectly and auto-build houses.

To be blunt that stinks “wishful thinking” from a distance.

As I implied in the other comment (“Can house construction be partially automated? Certainly. Perhaps even fully. But not through a brick-throwing machine.”), I don’t think reasoning algorithms are impossible; but it’s clear LLMs are not the way to go.

Lvxferre [he/him]@mander.xyz · 1 个月前

You don’t say.

Imagine for a moment you had a machine that allows you to throw bricks at a certain distance. This shit is useful, specially if you’re a griefer; but even if you aren’t, there are some corner cases for that, like transporting construction material at a distance.

And yet whoever sold you the machine calls it a “house auto-builder”. He tells you that it can help you to build your house. Mmmh.

Can house construction be partially automated? Certainly. Perhaps even fully. But not through a brick-throwing machine.

Of course trying to use the machine for its advertised purpose will go poorly, even if you only delegate brick placement to it (and still build the foundation, add cement etc. manually). You might economise a bit of time when the machine happens to throw a brick in the right place, but you’ll waste a lot of time cleaning broken bricks, or replacing them. But it’s still being sold as a house auto-builder.

But the seller is really, really, really invested on this auto-construction babble. Because his investors gave him money to create auto-construction tools. And he keeps babbling on how “soon” we’re going to get fully auto house building, and how it’s an existential threat to builders and all that babble. So he tweaks the machines to include “simulated building”. All it does is to tweak the force and aim of the machine, so it’s slightly less worse at throwing bricks.

It still does not solve the main problem: you don’t build a house by throwing bricks. You need to place them. But you still have some suckers saying “haha, but it’s a building machine lmao, can you prove it doesn’t build? lol”.

That’s all what “reasoning” LLMs are about.

Lvxferre [he/him]@mander.xyz · 1 个月前

What’s interesting IMO is that it got the first two and the last two digits right; and this seems rather consistent across attempts with big numbers. It doesn’t “know” how to multiply numbers, but it’s “trying” to output an answer that looks correct.

In other words, it’s “bullshitting” - showing disregard to truth value, but trying to convince you.

Lvxferre [he/him]@mander.xyz · edit-2 1 个月前

So, that’s a turdigrade?
…I’ll take my leave.

Lvxferre [he/him]@mander.xyz · edit-2 1 个月前

It’s completely off-topic, but:

We used to have a rather large sisal fibre mat/rug at home, that Siegfrieda (my cat) used to scratch. However my mum got some hate boner against that mat, and replaced it with an actual rug. That’s when Frieda decided she’d hop onto the sofa and chairs and scratch them.

We bought her a scratching post - and she simply ignored it. I solved the issue by buying two smaller sisal mats, and placing them strategically in places Frieda hangs around. And then slapping her butt every time she used them, for positive behaviour reinforcement (“I’m pet when I scratch it! I should scratch it more!”)

I’m sharing this to highlight it’s also important to recognise each individual cat has preferences, that might not apply to other cats. She wanted a horizontal surface to scratch; so no amount of scratching posts would solve it.

Lvxferre [he/him]@mander.xyz · 1 个月前

…she doesn’t actually scratch it any more, but I still use this analogy because it’s a light-hearted way to say “bullshit”.

Lvxferre [he/him]@mander.xyz · 1 个月前

And my cat says scratching furniture doesn’t damage it.

Lvxferre [he/him]@mander.xyz · 1 个月前

[special pleading] Those are all the smallest models

[sarcasm] Yeah, because if you randomly throw more bricks in a construction site, the bigger pile of debris will look more like a house, right. [/sarcasm]

and you don’t seem to have reasoning [SIC] mode, or external tooling, enabled?

Those are the chatbots available through DDG. I just found it amusing enough to share, given

The logic procedure to be followed (multiplication) is rather simple, and well documented across the internet, thus certainly present in their corpora.
The result is easy to judge: it’s either correct or incorrect.
All answers are incorrect and different from each other.

Small note regarding “reasoning”: just like “hallucination” and anything they say about semantics, it’s a red herring that obfuscates what is really happening.

At the end of the day it’s simply weighting the next token based on the previous tokens + prompt, and optionally calling some external tool. It is not really reasoning; what’s doing is not too different in spirit from Markov chains, except more complex.

[no true Scotsman] LLM ≠ AI system

If large “language” models don’t count as “AI systems”, then what you shared in the OP does not either. You can’t eat your cake and have it too.

It’s been known for fome time, that LLMs do “vibe math”.

I.e. they’re unable to perform actual maths.

[moving goalposts] Internally, they try to come up with an answer that “feels” right…

It doesn’t matter if the answer “feels” right (whatever this means). The answer is incorrect.

which makes it pretty impressive for them to come anywhere close, within a ±10% error margin.

No, the fact they are unable to perform a simple logical procedure is not “impressive”. Specially not when outputting the “approximation” as if it was the true value; note how none of the models outputted anything remotely similar to “the result is close to $number” or “the result is approximately $number”.

[arbitrary restriction + whataboutism] Ask people to tell you what a right answer could be, give them 1 second to answer… see how many come that close to the right one.

None of the prompts had a time limit. You’re making shit up.

Also. Sure, humans brainfart all the time; that does not magically mean that those systems are smart or doing some 4D chess as your OP implies.

A chatbot/AI system on the other hand, will come up with some Python code to do the calculation, then run it. Still can go wrong, but it’s way less likely.

I.e. it would need to use some external tool, since it’s unable to handle logic by itself, as exemplified by maths.

all explanation past the «are you counting the “rr” as a single r?» is babble

Not so sure about that. It treats r as a word, since it wasn’t specified as “r” or single letter. Then it interpretes it as… whatever. Is it the letter, phoneme,

The output is clearly handling it as letters. It hyphenates the letters to highlight them, it mentions “digram” (i.e. a sequence of two graphemes), so goes on. And in no moment is referring to anything that can be understood as associated with sounds, phonemes. And it’s claiming there’s an ⟨r⟩ «in the middle of the “rr” combination».

font, the programming language R…

There’s no context whatsoever to justify any of those interpretations.

since it wasn’t specified, it assumes “whatever, or a mix of”.

If this was a human being, it would not be an assumption. Assumption is that sort of shit you make up from nowhere; here context dictates the reading of “r” as “the letter ⟨r⟩”.

However since this is a bot it isn’t even assuming. Just like a boulder doesn’t “assume” you want it to roll down; it simply reacts to an external stimulus.

It failed at detecting the ambiguity and communicating it spontaneously, but corrected once that became part of the conversation.

There’s no ambiguity in the initial prompt. And no, it did not correct what it says; the last reply is still babble, you don’t count ⟨rr⟩ in English as a single letter.

It’s like, in your examples… what do you mean by “by”? “3 by 6” is 36… you meant to “multiply 36”? That’s nonsense… 🤷

I’d rather not answer this one because, if I did, I’d be pissing on Beehaw’s core values.

Lvxferre [he/him]@mander.xyz · edit-2 1 个月前

Wrong maths, you say?

Anyway. You didn’t ask the number of times the phoneme /ɹ/ appears in the spoken word, so by context you’re talking about the written word, and the letter ⟨r⟩. And the bot interpreted it as such, note it answers

here, let me show you: s-t-r-a-w-b-e-r-r-y

instead of specifying the phonemes.

By the way, all explanation past the «are you counting the “rr” as a single r?» is babble.

Lvxferre [he/him]@mander.xyz · 2 个月前

I’d usually say “may he rest in peace”, but he’d probably find it lame and boring, so: may he rest with lots and lots of booze. And cocaine.

Lvxferre [he/him]@mander.xyz · 2 个月前

It’s clearly WIP and currently it sucks. But I’m glad that they’re at least trying to address the problem. In the meantime Google is doing its usual “smear the content on the user’s snout until it swallows.”

Lvxferre [he/him]@mander.xyz · 2 个月前

Ooooo, look at mr. “I’m sane” over here!

I am sane. I SWEAR I AM SANE! /me grabs the kitchen knife CAN’T YOU SEE IT? I’M SANER THAN EVERYONE ELSE HERE!!!

[I couldn’t help but play along with the joke, sorry.]

Lvxferre [he/him]@mander.xyz · edit-2 2 个月前

If the technical boundary collapsed, put a human-made boundary in its place. You have the right to have some peace of mind and quiet; make yourself unavailable for at least a good chunk of the day, and make sure your folks know you’re unavailable. And why.

That’s how I remain sane.

Lvxferre [he/him]@mander.xyz · 2 个月前

Time to cancel my Crunchyroll subscription. Oh wait I don’t have one, I simply torrent my series.

Seriously now. The anime fansubbing scene is one that makes me genuinely happy. It shows me there are plenty amateurs out there that are as good or better than plenty professionals like me.

Lvxferre [he/him]@mander.xyz · 2 个月前

I don’t see what the problem is with using AI for translations. if the translations are good enough and cheap enough, they should be used.

Because machine translations for any large chunk of text are consistently awful: they don’t get references right, they often miss the point of the original utterance, they ignore cultural context, so goes on. It’s like wiping your arse with an old sock - sure, you could do it in a pinch, but you definitively don’t want to do it regularly!

Verbose example, using Portuguese to English

I’ll give you an example, using PT→EN because I don’t speak JP. Let’s say Alice tells Bob “ma’ tu é uma nota de três pila, né?” (literally: “bu[t] you’re a three bucks bill, isn’t it?”) . A human translator will immediately notice a few things:

It’s an informal and regional register. If Alice typically uses this register, it’s part of her characterisation; else, it register shift is noteworthy. Either way, it’s meaningful.
There’s an idiom there; “nota de três pila” (three bucks bill). It conveys some[thing/one] is blatantly false.
There’s a rhetorical question, worded like an accusation. The scene dictates how it should be interpreted.

So depending on the context, the translator might translate this as “ain’t ya full of shit…”, or perhaps “wow, you’re as fake as Monopoly money, arentcha?”. Now, check how chatbots do it:

GPT-4o mini: “But you’re a three-buck note, right?”
Llama 4 Scout: “But you are a three-dollar bill, aren’t you?”; or “You’re a three-dollar bill, right?” (it offers both alternatives)

Both miss the mark. If you talk about three dollar bills in English, lots of people associate it with gay people, creating an association that simply does not exist in the original. The extremely informal and regional register is gone, as well as the accusatory tone.

With Claude shitting this pile of idiocy, that I had to screenshot because otherwise people wouldn’t believe me:

[This is wrong on so many levels I don’t… I don’t even…]

This is what you get for AI translations between two IE languages in the same Sprachbund, that’ll often do things in a similar way. It gets way worse for Japanese → English - because they’re languages from different families, different cultures, that didn’t historically interact that much. It’s like the dumb shit above, multiplied by ten.

If they’re not good enough, another business can offer better translations as a differentiator.

That “business” is called watching pirated anime with fan subs, made by people who genuinely enjoy anime and want others to enjoy it too.