Unfortunately that’s not really relevant to LLMs beyond inserting things into the text you feed them. For every single token they predict, they make a full pass through the multi-gigabyte weights. It’s largely memory-bound, and not integrated with any kind of sane external memory algorithm.
There are some techniques that muddy this a bit, like MoE and dynamic LoRA loading, but the principle is the same.
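To make the point concrete, here's a toy sketch (not a real LLM, just illustrative numbers I made up): each generated token requires reading every weight matrix once, so per-token cost scales with total weight size, which is why decoding is memory-bandwidth-bound.

```python
import numpy as np

rng = np.random.default_rng(0)
d, vocab, n_layers = 64, 100, 4

# "Multi-gigabyte weights" in miniature: one matrix per layer plus an output head.
layers = [rng.standard_normal((d, d)) for _ in range(n_layers)]
head = rng.standard_normal((d, vocab))
weight_bytes = sum(w.nbytes for w in layers) + head.nbytes

def next_token(hidden):
    # Every layer's weights are read for this single prediction.
    for w in layers:
        hidden = np.tanh(hidden @ w)
    logits = hidden @ head
    return int(np.argmax(logits)), hidden

hidden = rng.standard_normal(d)
tokens = []
for _ in range(8):  # 8 tokens -> 8 full passes over the weights
    tok, hidden = next_token(hidden)
    tokens.append(tok)

# Bytes of weights streamed from memory is (weight size) x (tokens generated),
# regardless of how short the prompt or output is.
bytes_streamed = weight_bytes * len(tokens)
print(f"{len(tokens)} tokens, ~{bytes_streamed} weight-bytes streamed")
```

Scale `weight_bytes` up to tens of gigabytes and you see why a single token can't be produced without touching essentially all of memory, unlike a database lookup that reads only the pages it needs.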