Yes, this is totally possible and I did it for a couple of years with OPNsense. I actually had an OPNsense box and a pfSense box both on Hyper-V. I could toggle between them easily and it worked well. There are CPU considerations which depend on your traffic load. Security is not an issue as long as you have the network interface assignments correct and have not accidentally attached the WAN interface to any other guest VM’s.
Unfortunately, when I upgraded to 1Gb/s (now 2Gb/s) on the WAN, the VM could not keep up. No amount of tuning in the Hyper-V host (dual Xeon 3GHz) or the VM could resolve the poor throughput. I assume it came down to the 10Gb NICs and their drivers, or the Hyper-V virtual switch subsystem. Depending on what hardware offload and other tuning settings I tried, I would get perfect throughput one way, but terrible performance in the other direction, or some compromise in between on either side. There was a lot of iperf3 testing involved. I don’t blame OPNsense/pfSense – these issues impacted any 10Gb links attached to VM’s.
Ultimately, I eliminated the virtual router and ended up where you are, with a baremetal pfSense on a much less powerful device (Intel Atom-based). I’m still not happy with it – getting a full 2Gb/s up and down is hard.
Aside from performance, one of the other reasons for moving the firewall back to a dedicated unit was that I wanted to isolate it from any issues that might impact the host. The firewall is such a core component of my network, and I didn’t like it going offline when I needed to reboot the server.
You would need to run the LLM on the system that has the GPU (your main PC). The front-end (typically a WebUI) could run in a docker container and make API calls to your LLM system. Unfortunately that requires the model to always be loaded in the VRAM on your main PC, severely reducing what you can do with that computer, GPU-wise.