

Force transcoding and burn in subtitles
EDIT: explain the downvote? This is the correct solution, lol. It's not a problem with your client.
Returning the shopping cart
Dunno why this is downvoted, because RAG is the correct answer. Fine-tuning/training is not the tool for this job; RAG is.
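Roughly, RAG here means: embed your documents, retrieve the most relevant one per question, and stuff it into the prompt. A minimal sketch with the ollama Python client (the model names and the two-document corpus are placeholders):

    import ollama  # pip install ollama; assumes a local ollama server is running

    # Placeholder corpus standing in for your real documents.
    docs = [
        "Our return policy allows refunds within 30 days.",
        "Support hours are 9am-5pm Eastern, Monday through Friday.",
    ]

    def embed(text):
        # nomic-embed-text is one embedding model ollama can serve
        return ollama.embeddings(model="nomic-embed-text", prompt=text)["embedding"]

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)

    question = "When can I get a refund?"
    q_vec = embed(question)
    best = max(docs, key=lambda d: cosine(q_vec, embed(d)))  # retrieval step

    resp = ollama.chat(
        model="llama3",  # example generation model
        messages=[{"role": "user",
                   "content": f"Answer using only this context:\n{best}\n\nQ: {question}"}],
    )
    print(resp["message"]["content"])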
starting their tech life from scratch
Lol that’s an exaggeration
It’s the pro driver, for workstation use. If you are gaming then you don’t need it; the gaming driver only exists as open source.
I don’t want any proprietary drivers
So then you don’t want any NVIDIA.
The AMD open-source Linux driver performs better than their Windows driver. And there is no proprietary AMD Linux driver; the official AMD driver for Linux is open source.
If it bothers you, you can overwrite the model by reusing the same name instead of creating one with a new name. Either way, there is no duplication of the LLM model file.
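For example, re-running create with the existing name replaces the old definition instead of adding a second one (the model name here is made up):

    ollama create mymodel -f Modelfile   # same name: overwrites, no duplicate weights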
What I am talking about is when layers are split across GPUs. I guess this is loading the full model into each GPU to parallelize layers and do batching
Can you try setting num_ctx and num_predict using a Modelfile with ollama? https://github.com/ollama/ollama/blob/main/docs/modelfile.md#parameter
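Something like this (the base model and values are just examples):

    FROM llama3
    PARAMETER num_ctx 8192
    PARAMETER num_predict 256

Then build and run it:

    ollama create llama3-longctx -f Modelfile
    ollama run llama3-longctx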
Are you using a tiny model (1.5B-7B parameters)? ollama pulls a 4-bit quant by default. It looks like vLLM does not use quantized models by default, so this is likely the difference. Tiny models are impacted more by quantization.
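If you want an apples-to-apples comparison, pull a higher-precision tag explicitly (this tag is just an example; check the tag list for your model):

    ollama pull llama3:8b-instruct-q8_0   # 8-bit quant instead of the default q4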
I have no problems with changing num_ctx or num_predict
Models are computed sequentially (the output of each layer is the input to the next layer in the sequence), so splitting the layers across more GPUs does not offer any kind of performance benefit for a single request.
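A toy illustration of that data dependency (the two functions are hypothetical stand-ins for the layer halves living on each GPU):

    # Each stage consumes the previous stage's output, so the GPUs take
    # turns rather than working in parallel on a single request.
    def layers_on_gpu0(x):  # imagine the first half of the layers here
        return x + 1

    def layers_on_gpu1(x):  # and the second half here
        return x * 2

    hidden = 0  # stand-in for the embedded prompt
    for stage in (layers_on_gpu0, layers_on_gpu1):
        hidden = stage(hidden)  # strict sequential dependency
    print(hidden)  # -> 2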
Ummm… did you try /set parameter num_ctx # and /set parameter num_predict #? Are you using a model that actually supports the context length that you desire…?
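In an interactive session that looks like this (the values are arbitrary):

    $ ollama run llama3
    >>> /set parameter num_ctx 8192
    >>> /set parameter num_predict 256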
This is plainly incorrect
EDIT: nvm OP has edited the post
I agree. My blocklist has gotten significantly larger since this happened
The majority of reddit content is bots, shills, and astroturfing
“So anyway, here take a billion dollars worth of bombs”
It has value in natural language processing, like turning unstructured natural language data into structured data. It’s not suitable for all situations, though, such as anything that cannot tolerate hallucinations.
It’s also good for reorganizing information and presenting it in a different format, and for classifying the semantic meaning of text. It’s good for pretty much anything dealing with semantic meaning, really.
I see people often trying to use generative AI as a knowledge store, such as asking an AI assistant factual questions, but that is an invalid use case.
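The structured-data use case, by contrast, can look something like this with the ollama Python client (the model name and the requested schema are just examples):

    import json
    import ollama  # pip install ollama; assumes a local ollama server is running

    # Ask the model to turn free text into structured JSON.
    resp = ollama.chat(
        model="llama3",  # example model; any capable local model works
        format="json",   # constrain the output to valid JSON
        messages=[{
            "role": "user",
            "content": 'Return JSON with keys "name" and "date" extracted from: '
                       '"Meet Alice next Tuesday, March 4th."',
        }],
    )
    print(json.loads(resp["message"]["content"]))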
you can tell whoever wrote this has never run that command
Uh… isn’t that a good thing?
My guess is an x86 32-bit machine.
The performance hit during playback and scrubbing is due to the need to transcode the video as it burns the subtitles in. If you have hardware acceleration for transcoding configured properly, it shouldn’t be a big impact on modern hardware.