CUDA not permitted
-
After some investigation, it turns out that one has to use custom docker tooling provided by nvidia to add support for this: see https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#getting-started and https://github.com/NVIDIA/nvidia-docker
So this means we have to add support on the Cloudron platform side and also make sure things keep working in both the nvidia and the non-nvidia case, since the docker setup has to be compatible with whatever hardware the server actually provides.
So this is a bit out of scope for the coming release, but hopefully we can get this supported afterwards.
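To illustrate the nvidia/non-nvidia split, here is a minimal sketch of a conditional container start; the variable names and overall shape are purely hypothetical and not Cloudron's actual start code:

```
#!/bin/sh
# Hypothetical sketch: APP_ID and APP_IMAGE are made-up placeholders.
# Only pass GPU flags when an nvidia runtime is actually registered with docker.
if docker info --format '{{json .Runtimes}}' | grep -q nvidia; then
  GPU_ARGS="--gpus all"   # nvidia container toolkit is set up on this host
else
  GPU_ARGS=""             # non-nvidia servers keep working unchanged
fi
docker run -d $GPU_ARGS --name "$APP_ID" "$APP_IMAGE"
```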
-
I'm surprised I missed that, but I'm also surprised Nvidia had to make this difficult. Going forward I'm willing to provide you with the access you need for testing the implementation whenever that can happen. I'll have to live with no hardware acceleration for now.
-
@nebulon this appears to use a docker configuration in privileged mode to get access to the hardware device, which goes against the cloudron use case and security posture.
A better approach would be to use a different runc, such as sysbox-runc, which can solve that issue without requiring privileged mode.
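A rough, untested sketch of what that could look like, going by the sysbox docs (the binary path may differ per distro): register sysbox-runc as an extra runtime in /etc/docker/daemon.json, then start app containers with it instead of privileged mode.

```
# the sysbox installer normally writes this runtime entry itself
cat /etc/docker/daemon.json
# {
#   "runtimes": {
#     "sysbox-runc": { "path": "/usr/bin/sysbox-runc" }
#   }
# }
sudo systemctl restart docker

# then run a container with that runtime instead of --privileged
docker run --runtime=sysbox-runc --rm -it ubuntu:22.04 bash
```
-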
@nebulon it looks like nvidia-docker has been superseded by the nvidia container toolkit
https://github.com/NVIDIA/nvidia-container-toolkit
Does that make the issue go away or change anything at all?
I'm also happy to allow you access to my cloudron server with an nvidia GPU to try it out.
I've got the nvidia drivers installed, and the nvidia container toolkit installed as well, but...
In this guide
https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
I have stopped at running the configuration command sudo nvidia-ctk runtime configure --runtime=docker
because the file /etc/docker/daemon.json doesn't exist... (and I can't find it anywhere else either), which leads me to think it's not going to work.
Also there's nothing explaining what effect it will have on the existing docker containers; I'm assuming it will add access to the GPU to all containers.
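For what it's worth, going by the NVIDIA docs, nvidia-ctk runtime configure is supposed to create /etc/docker/daemon.json if it doesn't exist yet, and registering the runtime by itself shouldn't expose the GPU to existing containers; only containers started with --gpus or --runtime=nvidia get access (unless a default runtime is set). A sketch of the expected result, not verified on a Cloudron box:

```
sudo nvidia-ctk runtime configure --runtime=docker
cat /etc/docker/daemon.json
# {
#   "runtimes": {
#     "nvidia": {
#       "path": "nvidia-container-runtime",
#       "runtimeArgs": []
#     }
#   }
# }
sudo systemctl restart docker   # pick up the new runtime
```
-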
Did it make any difference?
Assuming it's functioning, I would think the docker start would still have to be modified,
i.e. docker run --gpus all <container>, wherever that is.
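Assuming the runtime is registered, the usual smoke test from the NVIDIA docs would be something along these lines (the CUDA image tag is just an example):

```
# should print the same table as running nvidia-smi on the host
docker run --rm --gpus all nvidia/cuda:12.2.0-base-ubuntu22.04 nvidia-smi
```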