

What I am talking about is when layers are split across GPUs. I guess this is loading the full model into each GPU to parallelize layers and do batching
Can you try setting num_ctx and num_predict using a Modelfile with ollama? https://github.com/ollama/ollama/blob/main/docs/modelfile.md#parameter
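For example, a minimal Modelfile might look like this (the base model name and the values are just placeholders; `FROM` and `PARAMETER` are the real directives from the doc linked above):

```
# Modelfile — base model and values are examples
FROM llama3
PARAMETER num_ctx 8192
PARAMETER num_predict 512
```

Then something like `ollama create mymodel -f Modelfile` and `ollama run mymodel` should pick those parameters up.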
Are you using a tiny model (1.5B-7B parameters)? ollama pulls a 4-bit quant by default. It looks like vllm does not use quantized models by default, so this is likely the difference. Tiny models are impacted more by quantization.
I have no problems with changing num_ctx or num_predict
Models are computed sequentially (the output of each layer is the input into the next layer in the sequence), so for a single request, splitting layers across more GPUs does not offer any kind of performance benefit: only one GPU is doing work at any given moment.
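A toy sketch of why: with layers split across devices, one request still flows through the layers one at a time, so per-request latency is the sum of the per-layer times no matter how many devices hold them. This is a pure-Python simulation (no real GPUs, layer times are made up):

```python
# Toy simulation: layers assigned round-robin to "GPUs".
# For ONE request, each layer needs the previous layer's output,
# so only one "GPU" is ever busy at a time.

def run_single_request(num_layers, num_gpus, layer_time=1.0):
    elapsed = 0.0
    x = 0  # stand-in for the activation tensor
    for layer in range(num_layers):
        gpu = layer % num_gpus  # which device holds this layer (idle GPUs wait)
        x = x + 1               # stand-in for the layer's computation
        elapsed += layer_time   # sequential: this step can't overlap the next
    return elapsed

# Latency is identical with 1 GPU or 4 — the work is strictly sequential.
print(run_single_request(32, 1))  # 32.0
print(run_single_request(32, 4))  # 32.0
```

Where extra GPUs do help is throughput: while layer 5 of request A runs on one GPU, layer 1 of request B can run on another, which is the batching/pipelining case mentioned above.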
Ummm… did you try /set parameter num_ctx # and /set parameter num_predict #? Are you using a model that actually supports the context length that you desire…?
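In the interactive REPL that would look something like this (the model name and the value 8192 are just examples):

```
$ ollama run llama3
>>> /set parameter num_ctx 8192
>>> /set parameter num_predict 512
```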
That’s great! Hopefully it shows up on F-Droid sometime soon
I’m a big limoncello fan, I wonder if it will turn out similar
Love to see it
Reading comprehension
But saying that a country can’t do anything about espionage unless they pass that law is unrealistic.
Who said that?
No I’m not conflating anything, you’re just moving the goalpost…
from
accountable for data privacy and misinformation/election interference violations.
to
ownership of media and telecommunication infrastructure
People can still commit murder even though it's illegal, and most murderers are never caught, so we shouldn't have laws making murder illegal because they don't "solve" murder
would reduce not eliminate the problem
🙂 perfect is the enemy of good. I don’t think we’re going to “eliminate” espionage, something that has existed for all of written history…
I would expect a meaningful data privacy law would involve forcing the client software to be audited to ensure they aren’t collecting the information in the first place?
They are while doing business in the USA
They all suck, yeah. I think banning individual social media services is not the solution. The solution is to create meaningful laws that hold any company, Chinese or American, accountable for data privacy and misinformation/election interference violations.
So what is considered perfectly fine for Facebook and Twitter to do, got it
It has value in natural language processing, like turning unstructured natural language data into structured data. Not suitable for all situations though, like situations that cannot tolerate hallucinations.
It's also good for reorganizing information and presenting it in a different format, and for classifying the semantic meaning of text. It's good for pretty much anything dealing with semantic meaning, really.
I see people often trying to use generative AI as a knowledge store, such as asking an AI assistant factual questions, but that is an invalid use case.
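As a concrete sketch of the unstructured-to-structured use case: it usually means asking the model to emit JSON. With ollama's REST API that's a request to `/api/generate` with `"format": "json"`. The code below only builds the request payload (the model name and prompt are placeholders, and no server is contacted):

```python
import json

def build_extraction_request(text, model="llama3"):
    """Build an ollama /api/generate payload asking for structured JSON output.

    The model name is a placeholder; the prompt schema is an example.
    """
    prompt = (
        "Extract the person's name and city from the text below. "
        'Reply with a JSON object with keys "name" and "city".\n\n' + text
    )
    return {
        "model": model,     # placeholder model name
        "prompt": prompt,
        "format": "json",   # ask ollama to constrain output to valid JSON
        "stream": False,    # return one complete response instead of chunks
    }

payload = build_extraction_request("Alice moved to Oslo last year.")
print(json.dumps(payload, indent=2))
```

Sending it would be something like `requests.post("http://localhost:11434/api/generate", json=payload)`. Note that even with JSON-constrained output the extracted *values* can still be hallucinated, which is exactly the caveat above.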
My guess is an x86 32bit machine
You can overwrite the model by using the same name instead of creating one with a new name, if it bothers you. Either way, there is no duplication of the underlying LLM model file.