When I ran nvidia-smi inside a Docker container on a multi-GPU host, it returned errors. I was starting the container through the Docker SDK for Python. Listing the NVIDIA GPU directory inside the container showed only a single device instead of all of them:
ls /proc/driver/nvidia/gpus
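The same check can be scripted, which is handy when comparing containers started in different ways. This is a minimal sketch: the function name count_visible_gpus is hypothetical, and it assumes the driver exposes one subdirectory per GPU under /proc/driver/nvidia/gpus.

```python
import os


def count_visible_gpus(root="/proc/driver/nvidia/gpus"):
    """Return how many GPU entries the NVIDIA driver exposes under `root`."""
    if not os.path.isdir(root):
        # Directory is missing: driver not loaded, or no GPU was passed in.
        return 0
    # Assumption: each GPU appears as one subdirectory of `root`.
    return sum(
        1
        for name in os.listdir(root)
        if os.path.isdir(os.path.join(root, name))
    )


if __name__ == "__main__":
    print(f"GPUs visible: {count_visible_gpus()}")
```

Run it inside the container; a count lower than the number of GPUs on the host suggests the GPU request did not reach the container.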
The solution is to make sure the GPU request (--gpus all, or a count such as --gpus 2) actually reaches the container. To rule out the host setup, first run the container manually:
docker run --name caviar --detach --gpus all -it --privileged ghcr.io/ehfd/nvidia-dind:latest
With this command, all the GPUs were visible inside the container, so the culprit was the way I was calling the Docker API. The proper way to do it is:
import docker

# client, cfg, image_name, container_name and volumes are assumed to be defined earlier.
# Request the NVIDIA runtime only when the configured options ask for GPUs.
runtime_option = "nvidia" if any("--gpus" in opt for opt in cfg.docker.options) else None

# Run the new container
container = client.containers.run(
    image=image_name,
    name=container_name,
    detach=True,  # Run in background
    tty=True,
    stdin_open=True,
    volumes=volumes,
    runtime=runtime_option,  # None keeps Docker's default runtime
    privileged=True,  # Equivalent to --privileged
    # count=-1 requests all GPUs, like --gpus all
    device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],
)
print(f"✅ Container started successfully: {container.short_id}")
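The call above always requests every GPU. If the configuration can also carry a count such as --gpus 2, the value could be translated into DeviceRequest arguments instead of hard-coding count=-1. The helper below is only a sketch: its name is hypothetical, and it handles just three forms of the value (all, a number, and device=...), which is an assumption about what the config contains.

```python
def gpus_option_to_device_request(value):
    """Translate a `--gpus` value into keyword arguments for DeviceRequest."""
    # Drop surrounding whitespace and quotes, e.g. '"device=0,1"'.
    value = value.strip().strip("\"'")
    request = {"capabilities": [["gpu"]]}
    if value == "all":
        request["count"] = -1  # -1 means every available GPU
    elif value.isdigit():
        request["count"] = int(value)  # a fixed number of GPUs
    elif value.startswith("device="):
        # Specific GPUs by index or UUID, comma-separated.
        request["device_ids"] = value[len("device="):].split(",")
    else:
        raise ValueError(f"unsupported --gpus value: {value!r}")
    return request


if __name__ == "__main__":
    print(gpus_option_to_device_request("2"))
```

The result can be passed on as docker.types.DeviceRequest(**gpus_option_to_device_request("all")). Once the container is up, container.exec_run("nvidia-smi -L") is one way to confirm that the expected GPUs are listed.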