Solving multiple GPUs not being detected in Docker-in-Docker on Google Cloud

When I ran nvidia-smi inside a Docker container on a multi-GPU machine, it returned errors. I was starting the container through the Docker API Python module. Listing the NVIDIA GPU entries showed only a single device rather than multiple:

ls /proc/driver/nvidia/gpus
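
The same check can be scripted. This is a minimal sketch, assuming the driver exposes one entry per GPU under /proc/driver/nvidia/gpus as in the listing above; the function name count_visible_gpus is hypothetical.

```python
import os


def count_visible_gpus(path="/proc/driver/nvidia/gpus"):
    """Return the number of GPU entries under path, or 0 if the path is missing."""
    try:
        return len(os.listdir(path))
    except FileNotFoundError:
        # No NVIDIA driver directory is visible from this environment.
        return 0
```

Running it inside the container and again on the host makes the mismatch easy to see: the two numbers should agree once the GPUs are passed through correctly.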

The solution is to ensure that gpus=all (or gpus=2) is initialized properly when the container is created. First, run the container manually with:

docker run --name caviar --detach --gpus all -it --privileged ghcr.io/ehfd/nvidia-dind:latest
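
One way to confirm what the manually started container sees is to run nvidia-smi -L inside it and count the lines. The sketch below assumes the usual output of one line per GPU, each starting with "GPU <index>:". Both helper names are hypothetical, and the container name caviar comes from the command above.

```python
import subprocess


def count_gpus_in_listing(listing):
    """Count the lines of `nvidia-smi -L` output that describe a GPU."""
    return sum(1 for line in listing.splitlines() if line.strip().startswith("GPU "))


def gpus_seen_by_container(name="caviar"):
    """Run `nvidia-smi -L` in a running container through the docker CLI."""
    result = subprocess.run(
        ["docker", "exec", name, "nvidia-smi", "-L"],
        capture_output=True,
        text=True,
        check=True,  # raise if nvidia-smi fails inside the container
    )
    return count_gpus_in_listing(result.stdout)
```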

This step showed that all the GPUs were loaded, so the culprit was the Docker API call. The proper way to do it is:

import docker

# client, cfg, image_name, container_name and volumes are defined elsewhere in the script.

# Use the NVIDIA runtime only when a --gpus option is configured
runtime_option = "nvidia" if any("--gpus" in opt for opt in cfg.docker.options) else None

# Run the new container
container = client.containers.run(
    image=image_name,
    name=container_name,
    detach=True,  # Run in background
    tty=True,
    stdin_open=True,
    volumes=volumes,
    runtime=runtime_option,  # None keeps the default runtime
    privileged=True,  # Equivalent to --privileged
    device_requests=[docker.types.DeviceRequest(count=-1, capabilities=[["gpu"]])],  # count=-1 requests all GPUs
)

print(f"✅ Container started successfully: {container.short_id}")
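
The call above always requests every GPU. If the configured options may also contain a fixed number such as --gpus 2, the count could be derived from them instead. This is a sketch under that assumption: gpu_count_from_options is a hypothetical helper, and it handles only the "--gpus all", "--gpus N" and "--gpus=N" forms, not device lists.

```python
def gpu_count_from_options(options):
    """Return -1 for '--gpus all', N for '--gpus N', or None without a --gpus option."""
    tokens = []
    for opt in options:
        # Treat "--gpus=2" and "--gpus 2" the same way.
        tokens.extend(opt.replace("=", " ").split())
    for i, tok in enumerate(tokens):
        if tok == "--gpus" and i + 1 < len(tokens):
            value = tokens[i + 1]
            return -1 if value == "all" else int(value)
    return None
```

The result can then be passed as the count of the DeviceRequest in place of the hard-coded -1, leaving device_requests out when it is None.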
