📅 2020-Dec-15 ⬩ ✍️ Ashwin Nanjappa ⬩ 🏷️ docker, mig, nvidia-smi ⬩ 📚 Archive
Multi-Instance GPU (MIG) is a feature of the A100 GPU that partitions the GPU into GPU instances, each of which can be further divided into compute instances. Note that many of the commands listed below may need to be run with sudo.
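To enable or disable MIG mode on all GPUs:
$ sudo nvidia-smi -mig 1
$ sudo nvidia-smi -mig 0
To enable or disable MIG mode on a specific GPU (GPU 3 here):
$ sudo nvidia-smi -i 3 -mig 1
$ sudo nvidia-smi -i 3 -mig 0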
To list the supported GPU instance profiles:
$ sudo nvidia-smi mig -lgip
+--------------------------------------------------------------------------+
| GPU instance profiles:                                                   |
| GPU   Name          ID    Instances   Memory     P2P    SM    DEC   ENC  |
|                           Free/Total   GiB              CE    JPEG  OFA  |
|==========================================================================|
|   0  MIG 1g.5gb     19     7/7        4.75       No     14     0     0   |
|                                                          1     0     0   |
+--------------------------------------------------------------------------+
|   0  MIG 2g.10gb    14     3/3        9.75       No     28     1     0   |
|                                                          2     0     0   |
+--------------------------------------------------------------------------+
|   0  MIG 3g.20gb     9     2/2        19.62      No     42     2     0   |
|                                                          3     0     0   |
+--------------------------------------------------------------------------+
|   0  MIG 4g.20gb     5     1/1        19.62      No     56     2     0   |
|                                                          4     0     0   |
+--------------------------------------------------------------------------+
|   0  MIG 7g.40gb     0     1/1        39.50      No     98     5     0   |
|                                                          7     1     1   |
+--------------------------------------------------------------------------+
To list the possible placements of each GPU instance profile, in {Start}:Size format:
$ sudo nvidia-smi mig -lgipp
GPU 0 Profile ID 19 Placements: {0,1,2,3,4,5,6}:1
GPU 0 Profile ID 14 Placements: {0,2,4}:2
GPU 0 Profile ID 9 Placements: {0,4}:4
GPU 0 Profile ID 5 Placement : {0}:4
GPU 0 Profile ID 0 Placement : {0}:8
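The {Start}:Size notation is easier to read once each start position is expanded into the range of slots it would occupy. The following is a minimal sketch in POSIX shell, assuming the placement spec looks exactly like the output above and reading Start as the first slot and Size as the number of consecutive slots taken. The name expand_placements is a hypothetical helper, not part of nvidia-smi.

```shell
# Hypothetical helper: expand a placement spec such as "{0,2,4}:2"
# into the slot range covered by each possible start position.
# The function body runs in a subshell, so IFS changes stay local.
expand_placements() (
    spec=$1
    starts=${spec%%\}*}     # strip from "}" onwards -> "{0,2,4"
    starts=${starts#\{}     # strip the leading "{"  -> "0,2,4"
    size=${spec##*:}        # keep what follows ":"  -> "2"
    out=""
    IFS=','
    for start in $starts; do
        out="$out${start}-$(( start + size - 1 )) "
    done
    echo "${out% }"
)

expand_placements "{0,2,4}:2"   # prints: 0-1 2-3 4-5
```

Under this reading, profile 14 (2g.10gb) can start at slot 0, 2 or 4 and takes two slots each time, which matches the three instances reported for it in the profile list.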
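To create GPU instances, pass a comma-separated list of profile IDs. For example, to create seven 1g.5gb GPU instances:
$ sudo nvidia-smi mig -cgi 19,19,19,19,19,19,19 -i 3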
Successfully created GPU instance ID 13 on GPU 0 using profile MIG 1g.5gb (ID 19)
Successfully created GPU instance ID 11 on GPU 0 using profile MIG 1g.5gb (ID 19)
Successfully created GPU instance ID 12 on GPU 0 using profile MIG 1g.5gb (ID 19)
Successfully created GPU instance ID 7 on GPU 0 using profile MIG 1g.5gb (ID 19)
Successfully created GPU instance ID 8 on GPU 0 using profile MIG 1g.5gb (ID 19)
Successfully created GPU instance ID 9 on GPU 0 using profile MIG 1g.5gb (ID 19)
Successfully created GPU instance ID 10 on GPU 0 using profile MIG 1g.5gb (ID 19)
To create a compute instance with the default profile in each GPU instance:
$ sudo nvidia-smi mig -cci -i 3
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 7 using profile MIG 1g.5gb (ID 0)
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 8 using profile MIG 1g.5gb (ID 0)
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 9 using profile MIG 1g.5gb (ID 0)
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 10 using profile MIG 1g.5gb (ID 0)
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 11 using profile MIG 1g.5gb (ID 0)
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 12 using profile MIG 1g.5gb (ID 0)
Successfully created compute instance ID 0 on GPU 0 GPU instance ID 13 using profile MIG 1g.5gb (ID 0)
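Typing the repeated profile list by hand is error-prone, and the help text documents a -C option that creates the default compute instance together with each GPU instance. The sketch below builds the list and prints the combined command as a dry run instead of executing it. The name repeat_profile is a hypothetical helper, and the GPU index 3 and profile ID 19 are simply the values used above.

```shell
# Hypothetical helper: repeat a GPU instance profile ID COUNT times,
# comma-separated, in the form that -cgi accepts.
repeat_profile() (
    profile=$1
    count=$2
    list=""
    i=0
    while [ "$i" -lt "$count" ]; do
        list="$list$profile,"
        i=$(( i + 1 ))
    done
    echo "${list%,}"        # drop the trailing comma
)

# Dry run: print the command rather than running it.
# -C asks -cgi to also create the default compute instance,
# which would replace the separate -cci step.
echo "sudo nvidia-smi mig -cgi $(repeat_profile 19 7) -C -i 3"
```

Printing the command first makes it easy to check the profile list before changing the GPU configuration.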
Once GPU instances and compute instances exist, nvidia-smi lists them as MIG devices:
$ nvidia-smi
+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  0    7   0   0  |      1MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0    8   0   1  |      1MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0    9   0   2  |      1MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0   10   0   3  |      1MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0   11   0   4  |      1MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0   12   0   5  |      1MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
|  0   13   0   6  |      1MiB /  4864MiB | 14      0 |  1   0    0    0    0 |
|                  |      0MiB /  8191MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+
To list the MIG devices along with their UUIDs:
$ nvidia-smi -L
GPU 0: A100-PCIE-40GB (UUID: GPU-0069414c-9f30-41f9-d5d8-87890423f0c4)
MIG 1g.5gb Device 0: (UUID: MIG-GPU-0069414c-9f30-41f9-d5d8-87890423f0c4/7/0)
MIG 1g.5gb Device 1: (UUID: MIG-GPU-0069414c-9f30-41f9-d5d8-87890423f0c4/8/0)
MIG 1g.5gb Device 2: (UUID: MIG-GPU-0069414c-9f30-41f9-d5d8-87890423f0c4/9/0)
MIG 1g.5gb Device 3: (UUID: MIG-GPU-0069414c-9f30-41f9-d5d8-87890423f0c4/10/0)
MIG 1g.5gb Device 4: (UUID: MIG-GPU-0069414c-9f30-41f9-d5d8-87890423f0c4/11/0)
MIG 1g.5gb Device 5: (UUID: MIG-GPU-0069414c-9f30-41f9-d5d8-87890423f0c4/12/0)
MIG 1g.5gb Device 6: (UUID: MIG-GPU-0069414c-9f30-41f9-d5d8-87890423f0c4/13/0)
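The MIG UUIDs in this listing are what other tools need in order to address a MIG device, so it can help to pull them out on their own. The sketch below filters the listing with sed, assuming each MIG line carries its UUID as "(UUID: MIG-...)" as in the output above (the exact UUID format may differ between driver versions). The name mig_uuids is a hypothetical helper.

```shell
# Hypothetical helper: read `nvidia-smi -L` output on stdin and print
# one MIG device UUID per line. Lines for whole GPUs are skipped
# because their UUID does not start with "MIG-".
mig_uuids() {
    sed -n 's/.*(UUID: \(MIG-[^)]*\)).*/\1/p'
}

# Sample input copied from the listing above. On a machine with MIG
# devices this would be: nvidia-smi -L | mig_uuids
mig_uuids <<'EOF'
GPU 0: A100-PCIE-40GB (UUID: GPU-0069414c-9f30-41f9-d5d8-87890423f0c4)
MIG 1g.5gb Device 0: (UUID: MIG-GPU-0069414c-9f30-41f9-d5d8-87890423f0c4/7/0)
MIG 1g.5gb Device 1: (UUID: MIG-GPU-0069414c-9f30-41f9-d5d8-87890423f0c4/8/0)
EOF
```

With the sample input, only the two MIG UUIDs ending in /7/0 and /8/0 are printed.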
To list the GPU instances and the compute instances:
$ sudo nvidia-smi mig -lgi
$ sudo nvidia-smi mig -lci
+-------------------------------------------------------+
| Compute instances:                                    |
| GPU     GPU       Name             Profile   Instance |
|       Instance                       ID        ID     |
|         ID                                            |
|=======================================================|
|   0      7       MIG 1g.5gb           0         0     |
+-------------------------------------------------------+
|   0      8       MIG 1g.5gb           0         0     |
+-------------------------------------------------------+
|   0      9       MIG 1g.5gb           0         0     |
+-------------------------------------------------------+
|   0     11       MIG 1g.5gb           0         0     |
+-------------------------------------------------------+
|   0     12       MIG 1g.5gb           0         0     |
+-------------------------------------------------------+
|   0     13       MIG 1g.5gb           0         0     |
+-------------------------------------------------------+
|   0     14       MIG 1g.5gb           0         0     |
+-------------------------------------------------------+
To destroy all the compute instances:
$ sudo nvidia-smi mig -dci -i 3
Successfully destroyed compute instance ID 0 from GPU 0 GPU instance ID 7
Successfully destroyed compute instance ID 0 from GPU 0 GPU instance ID 8
Successfully destroyed compute instance ID 0 from GPU 0 GPU instance ID 9
Successfully destroyed compute instance ID 0 from GPU 0 GPU instance ID 10
Successfully destroyed compute instance ID 0 from GPU 0 GPU instance ID 11
Successfully destroyed compute instance ID 0 from GPU 0 GPU instance ID 12
Successfully destroyed compute instance ID 0 from GPU 0 GPU instance ID 13
To destroy all the GPU instances, after their compute instances have been destroyed:
$ sudo nvidia-smi mig -dgi -i 3
Successfully destroyed GPU instance ID 7 from GPU 0
Successfully destroyed GPU instance ID 8 from GPU 0
Successfully destroyed GPU instance ID 9 from GPU 0
Successfully destroyed GPU instance ID 10 from GPU 0
Successfully destroyed GPU instance ID 11 from GPU 0
Successfully destroyed GPU instance ID 12 from GPU 0
Successfully destroyed GPU instance ID 13 from GPU 0
To see all the MIG options:
$ nvidia-smi mig -h
mig -- Multi Instance GPU management.
Usage: nvidia-smi mig [options]
Options include:
[-h | --help]: Display help information.
[-i | --id]: Enumeration index, PCI bus ID or UUID.
Provide comma separated values for more than one device.
[-gi | --gpu-instance-id]: GPU instance ID.
Provide comma separated values for more than one GPU instance.
[-ci | --compute-instance-id]: Compute instance ID.
Provide comma separated values for more than one compute
instance.
[-lgip | --list-gpu-instance-profiles]: List supported GPU instance profiles.
Option -i can be used to restrict the command to
run on a specific GPU.
[-lgipp | --list-gpu-instance-possible-placements]: List possible GPU instance placements
in the following format, {Start}:Size.
Option -i can be used to restrict the
command to run on a specific GPU.
[-C | --default-compute-instance]: Create compute instance with the default profile when used
with the option to create a GPU instance (-cgi).
[-cgi | --create-gpu-instance]: Create GPU instance for the given profile names or IDs.
Provide comma separated values for more than one profile.
Option -i can be used to restrict the command to run on
a specific GPU.
[-dgi | --destroy-gpu-instance]: Destroy GPU instances.
Options -i and -gi can be used individually or combined
to restrict the command to run on a specific GPU or GPU
instance.
[-lgi | --list-gpu-instances]: List GPU instances.
Option -i can be used to restrict the command to run on a
specific GPU.
[-lcip | --list-compute-instance-profiles]: List supported compute instance profiles.
Options -i and -gi can be used individually or
combined to restrict the command to run on a
specific GPU or GPU instance.
[-cci | --create-compute-instance]: Create compute instance for the given profile name or IDs.
Provide comma separated values for more than one profile.
If no profile name or ID is given, then the default*
compute instance profile ID will be used. Options -i and
-gi can be used individually or combined to restrict the
command to run on a specific GPU or GPU instance.
[-dci | --destroy-compute-instance]: Destroy compute instances.
Options -i, -gi and -ci can be used individually or
combined to restrict the command to run on a specific
GPU or GPU instance or compute instance.
[-lci | --list-compute-instances]: List compute instances.
Options -i and -gi can be used individually or combined
to restrict the command to run on a specific GPU or GPU
instance.
To expose a MIG instance to a Docker container, add --gpus='"device=MIG-GPU-0069414c-9f30-41f9-d5d8-87890423f0c4/7/0"' to your docker --runtime=nvidia ... command.
Multiple MIG instances can be specified by separating them with commas, for example --gpus='"device=0:0,0:1"'.
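The nested quoting in the --gpus value is easy to get wrong when several devices are listed. As a minimal sketch, the hypothetical helper gpus_arg below joins any number of device identifiers with commas and prints the option text in the same quoting style as above, ready to paste into a docker command line. It only prints text and does not call docker.

```shell
# Hypothetical helper: join device identifiers with commas and wrap
# them in the quoting used for --gpus above.
# The function body runs in a subshell, so the IFS change stays local.
gpus_arg() (
    IFS=','
    # "$*" joins the arguments using the first character of IFS.
    printf '%s\n' "--gpus='\"device=$*\"'"
)

gpus_arg "0:0" "0:1"   # prints: --gpus='"device=0:0,0:1"'
```

The same helper accepts full MIG UUIDs in place of the 0:0 style identifiers.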