

Force transcoding and burn in subtitles
EDIT: explain the downvote? This is the correct solution, lol. It's not a problem with your client.
Returning the shopping cart
Dunno why this is downvoted, because RAG is the correct answer. Fine-tuning/training is not the tool for this job; RAG is.
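Roughly, RAG here means: embed your documents, retrieve the most relevant one per question, and stuff it into the prompt. A minimal sketch with the ollama Python client (the model names and the two-document corpus are placeholders):

    import ollama  # pip install ollama; assumes a local ollama server is running

    # Placeholder corpus standing in for your real documents.
    docs = [
        "Our return policy allows refunds within 30 days.",
        "Support hours are 9am-5pm Eastern, Monday through Friday.",
    ]

    def embed(text):
        # nomic-embed-text is one embedding model ollama can serve
        return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)

    question = "When can I get a refund?"
    q_vec = embed(question)
    best = max(docs, key=lambda d: cosine(q_vec, embed(d)))  # retrieval step

    resp = ollama.chat(
        model="llama3",  # example generation model
        messages=[{"role": "user",
                   "content": f"Answer using only this context:\n{best}\n\nQ: {question}"}],
    )
    print(resp["message"]["content"])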
starting their tech life from scratch
Lol that’s an exaggeration
It’s the pro driver, for workstation use. If you are gaming then you don’t need it; the gaming driver only exists as open source.
I don’t want any proprietary drivers
So then you don’t want any NVIDIA.
The AMD open-source Linux driver performs better than their Windows driver. And there is no proprietary AMD Linux driver; the official AMD driver for Linux is open source.
If it bothers you, you can overwrite the model by reusing the same name instead of creating one with a new name. Either way, there is no duplication of the LLM model file.
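For example, re-running create with the existing name replaces the old definition instead of adding a second one (the model name here is made up):

    ollama create mymodel -f Modelfile   # same name: overwrites, no duplicate weights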
What I am talking about is when layers are split across GPUs. I guess this is loading the full model into each GPU to parallelize layers and do batching
Can you try setting num_ctx and num_predict using a Modelfile with ollama? https://github.com/ollama/ollama/blob/main/docs/modelfile.md#parameter
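Something like this (the base model and values are just examples):

    FROM llama3
    PARAMETER num_ctx 8192
    PARAMETER num_predict 256

Then build and run it:

    ollama create llama3-longctx -f Modelfile
    ollama run llama3-longctx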
Are you using a tiny model (1.5B-7B parameters)? ollama pulls a 4-bit quant by default. It looks like vLLM does not use quantized models by default, so this is likely the difference. Tiny models are impacted more by quantization.
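If you want an apples-to-apples comparison, pull a higher-precision tag explicitly (this tag is just an example; check the tag list for your model):

    ollama pull llama3:8b-instruct-q8_0   # 8-bit quant instead of the default q4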
I have no problems with changing num_ctx or num_predict
Models are computed sequentially (the output of each layer is the input to the next layer in the sequence), so splitting the layers across more GPUs does not offer any kind of performance benefit for a single request.
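A toy illustration of that data dependency (the two functions are hypothetical stand-ins for the layer halves living on each GPU):

    # Each stage consumes the previous stage's output, so the GPUs take
    # turns rather than working in parallel on a single request.
    def layers_on_gpu0(x):  # imagine the first half of the layers here
        return x + 1

    def layers_on_gpu1(x):  # and the second half here
        return x * 2

    hidden = 0  # stand-in for the embedded prompt
    for stage in (layers_on_gpu0, layers_on_gpu1):
        hidden = stage(hidden)  # strict sequential dependency
    print(hidden)  # -> 2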
Ummm… did you try /set parameter num_ctx # and /set parameter num_predict #? Are you using a model that actually supports the context length that you desire…?
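In an interactive session that looks like this (the values are arbitrary):

    $ ollama run llama3
    >>> /set parameter num_ctx 8192
    >>> /set parameter num_predict 256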
This is plainly incorrect
EDIT: nvm OP has edited the post
I agree. My blocklist has gotten significantly larger since this happened
The majority of reddit content is bots, shills, and astroturfing
“So anyway, here take a billion dollars worth of bombs”
It has value in natural language processing, like turning unstructured natural language data into structured data. It’s not suitable for all situations, though, such as anything that cannot tolerate hallucinations.
It’s also good for reorganizing information and presenting it in a different format, and for classifying the semantic meaning of text. It’s good for pretty much anything dealing with semantic meaning, really.
I see people often trying to use generative AI as a knowledge store, such as asking an AI assistant factual questions, but that is an invalid use case.
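The structured-data use case, by contrast, can look something like this with the ollama Python client (the model name and the requested schema are just examples):

    import json
    import ollama  # pip install ollama; assumes a local ollama server is running

    # Ask the model to turn free text into structured JSON.
    resp = ollama.chat(
        model="llama3",  # example model; any capable local model works
        format="json",   # constrain the output to valid JSON
        messages=[{
            "role": "user",
            "content": 'Return JSON with keys "name" and "date" extracted from: '
                       '"Meet Alice next Tuesday, March 4th."',
        }],
    )
    print(json.loads(resp["message"]["content"]))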
you can tell whoever wrote this has never run that command
Uh… isn’t that a good thing?
My guess is an x86 32-bit machine.
The performance hit during playback and scrubbing is due to the need to transcode the video as it burns the subtitles in. If you have hardware acceleration for transcoding configured properly, it shouldn’t be a big impact on modern hardware.