• 4 Posts
  • 384 Comments
Joined 1 year ago
Cake day: March 22nd, 2024





  • Yeah. But it also breaks some things relative to the llama.cpp baseline, hides or doesn’t support some features/optimizations, and definitely doesn’t support the more efficient iq_k quants of ik_llama.cpp or its specialized MoE offloading.

    And that’s not even getting into the various controversies around ollama (like broken GGUFs or indications they’re going closed source in some form).

    …It just depends on how much performance you want to squeeze out, and how much time you want to spend on the endeavor. Small LLMs are kinda marginal though, so IMO that squeezing is important if you really want to try; otherwise one is probably better off spending a few bucks on an API that doesn’t log requests.




  • At risk of getting more technical, ik_llama.cpp has a good built-in webui:

    https://github.com/ikawrakow/ik_llama.cpp/

    Getting more technical, it’s also just way better than ollama: you can run way smarter models on the same hardware than ollama can manage.

    For reference, I’m running GLM-4 (667 GB of raw weights) on a single RTX 3090/Ryzen gaming rig, at reading speed, with pretty low quantization distortion.
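    For a sense of how that fits: the bits-per-weight figure below is an assumption for illustration (real iq_k mixes vary per tensor), but the back-of-envelope arithmetic shows why a quantized giant MoE lands within reach of system RAM plus a 24 GB GPU:

```python
# Back-of-envelope math for fitting a huge MoE on a gaming rig.
# Assumption for illustration: ~3 bits per weight on average after
# iq_k quantization; actual quant mixes vary per tensor.
raw_gb = 667                    # raw bf16 weights, 2 bytes per parameter
params_billions = raw_gb / 2    # -> ~333 billion parameters
bits_per_weight = 3.0           # assumed average after quantization
quant_gb = params_billions * bits_per_weight / 8
print(round(quant_gb))          # ~125 GB: system RAM holds the experts,
                                # the 24 GB GPU holds the dense layers
```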

    And if you want a ‘look this up on the internet for me’ assistant (which you need for them to be truly useful), you need another docker project as well.

    …That’s just how LLM self-hosting is now. It’s simply too hardware-intense and ad hoc to be easy, smart, and cheap all at once. You can indeed host a small ‘default’ LLM without much tinkering, but it’s going to be pretty dumb, and pretty slow, on ollama defaults.





  • Oh, and there are other graphics makers that could theoretically work on Linux, like Imagination’s PowerVR and some Chinese startups. Qualcomm’s already trying to push into laptops with Adreno (which has roots in AMD/ATI; hence ‘Adreno’ is an anagram of ‘Radeon’).

    The problem is that making a desktop-sized GPU has a massive capital cost (over $1,000,000,000, maybe even tens of billions these days) just to ‘tape out’ a single chip, much less a full product line, and AMD/Nvidia are just so far ahead in architecture. It’s basically uneconomical to catch up without a massive geopolitical motivation, like there is in China.






  • But nothing is standard.

    As an example from this last week, I tried to install something with a poetry install procedure… it didn’t work. In a nutshell, a bunch of stuff in poetry is apparently ancient and doesn’t even work with that git repo anymore. Or maybe it was my system? I couldn’t tell.

    So I tried uv. Worked amazingly… until I tried to run the project. Apparently some dependency of matplotlib uses Python C libraries in a really bizarre, nonstandard way, so the slight discrepancy broke an import, which broke the library, which broke the whole project on startup.

    So I bit the bullet, cleared a bunch of disk space, and installed conda instead, the repo’s other official recipe. Didn’t freakin’ work out of the box either. I finally got it working with some manual package-version swapping, though.
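    For the record, the manual version swapping amounted to pinning things in an environment file, something along these lines (the package names and versions here are placeholders for illustration, not the repo’s actual pins):

```yaml
# Illustrative conda environment, not the repo's real recipe:
# pin the packages that conflicted to known-good versions.
name: project-env
dependencies:
  - python=3.10        # placeholder pin
  - matplotlib=3.7     # placeholder; the actual culprit version may differ
  - pip
```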

    And there was, of course, zero hope of doing any of this with actual pip, apparently.

    At this point I wasn’t even excited to test the project anymore, and went to bed.


  • The character swapping really isn’t accomplishing much.

    • Speaking from experience: if I’m finetuning an LLM LoRA or something, bigger models will ‘understand’ the character swaps anyway, just like they abstract different languages into semantic meaning. As an example, training one of the Qwen models on Chinese-only text for a task transfers to English performance shockingly well.

    • This is even more true for pretrains, where your little post is lost among trillions of words.

    • If it’s a problem, I can just swap words out in the tokenizer. Or add ‘oþer’ or even individual characters to the banned strings list.

    • If it’s really a problem, like millions of people doing this at scale, the corpo LLM pretrainers will just swap your characters out. It’s trivial to do.
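    For what it’s worth, that swap-out step really is trivial; a minimal sketch (the mapping below is illustrative, covering ‘oþer’-style substitutions, not a real pretraining pipeline’s table):

```python
# Undo simple character-swap obfuscation before tokenization.
# Illustrative mapping only; a real pretraining pipeline would build
# it from corpus statistics or a homoglyph table.
SWAPS = str.maketrans({
    "þ": "th",      # thorn, as in 'oþer' -> 'other'
    "ð": "th",      # eth
    "\u0430": "a",  # Cyrillic 'а' masquerading as Latin 'a'
})

def normalize(text: str) -> str:
    """Return text with known character swaps replaced by plain ASCII."""
    return text.translate(SWAPS)

print(normalize("oþer"))  # -> other
```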

    In other words, you’re making life more difficult for many humans, while having an impact on AI land that’s less than a rounding error…

    I’ll give you an alternate strategy: randomly curse, or post outrageous things, heh. Be politically incorrect. Your post will either be filtered out, or make life significantly harder for the jerks trying to align LLMs into Trumpist Tech Bros, and filtering/finetuning that away is much, much more difficult than undoing character swaps.



  • through phone if you have a phone on your water account, through a system no one knew existed

    I interpreted this as one system. So it’s:

    • The water website, which you’d have to happen to stumble upon

    • Obscure opt-in phone system

    • Facebook

    If that’s the case, the complaint is reasonable, as the water service is basically assuming Facebook (and word of mouth) are the only active notifications folks need.

    But yeah, if OP opted out of SMS warnings or something, that’s more on them.