GGML models still run through llama.cpp, and GPU acceleration there still requires CUDA to be installed, unfortunately. I saw a PR for DirectML, but I'm not really holding my breath.
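For anyone curious, here's roughly what that looks like via the llama-cpp-python bindings; a minimal sketch, assuming a local GGUF/GGML file (the model path below is a placeholder). The n_gpu_layers knob is where the CUDA dependency comes in on NVIDIA builds:

```python
# Minimal sketch using llama-cpp-python (pip install llama-cpp-python).
# Model path is a placeholder -- point it at any local GGUF/GGML file.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-2-7b.Q4_K_M.gguf",  # placeholder path
    n_gpu_layers=-1,  # offload all layers to GPU (needs a CUDA-enabled build); 0 = CPU only
)

out = llm("Q: What is GGML? A:", max_tokens=64)
print(out["choices"][0]["text"])
```

With n_gpu_layers=0 it runs purely on CPU with no CUDA installed at all, which is part of why the DirectML PR matters mostly for Windows GPU users.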
Yeah, I researched this and I completely missed that whole part. In my defense, I looked into it back in 2023, which is ages ago :)
Looks like local models are getting much more mature.