nvidia-smi cheatsheet

📅 2019-Apr-26 ⬩ ✍️ Ashwin Nanjappa ⬩ 📚 Archive

nvidia-smi (NVIDIA System Management Interface) is a tool to query, monitor, and configure NVIDIA GPUs. It is installed along with the NVIDIA driver and is tied to that specific driver version. It is written using the NVIDIA Management Library (NVML).

Query status of GPUs

$ nvidia-smi

This outputs a summary table that lists the driver and CUDA versions at the top and, for every GPU, its name, fan speed, temperature, power draw and limit, memory usage, utilization, and the compute processes running on it.
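
The same summary can be refreshed at a fixed interval, for example every 5 seconds:

$ nvidia-smi -l 5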

Query parameters of GPUs

$ nvidia-smi -q

This prints detailed information for every GPU, including its VBIOS version, PCIe link generation and width, temperature readings and limits, power readings and limits, and current and default clock values.

To query the parameters of a particular GPU, use its index:

$ nvidia-smi -q -i 9
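
The detailed output can also be limited to selected sections, for example temperature and power:

$ nvidia-smi -q -d TEMPERATURE,POWER -i 9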

Query supported clock values

To list the supported pairs of memory and graphics clock values:

$ nvidia-smi -q -d SUPPORTED_CLOCKS
$ nvidia-smi -q -d SUPPORTED_CLOCKS -i 3

Typically, there are only a few supported memory clock values, while the number of supported graphics clock values is high with a fine granularity.
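
The supported clocks can also be queried in CSV form; the accepted field names can be listed with nvidia-smi --help-query-supported-clocks:

$ nvidia-smi --query-supported-clocks=memory,graphics --format=csv -i 3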

Query current values of GPUs

To view how much power is being consumed by GPUs in watts:

$ nvidia-smi --query-gpu=gpu_name,power.draw --format=csv
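
To log the power draw periodically, for example every 5 seconds, combine the query with the loop option:

$ nvidia-smi --query-gpu=timestamp,gpu_name,power.draw --format=csv -l 5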

To view the VBIOS version of your GPUs:

$ nvidia-smi --query-gpu=gpu_name,vbios_version --format=csv

To get a continuous update of power consumed, GPU and memory temperatures, and current GPU and memory clock values:

$ nvidia-smi dmon -s pc
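
More stat groups, such as utilization and memory usage, can be added and the refresh interval set in seconds; nvidia-smi dmon -h lists the groups supported by your driver:

$ nvidia-smi dmon -s pucm -d 5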

To view all the properties, like gpu_name, power.draw or vbios_version, that can be queried for your GPUs:

$ nvidia-smi --help-query-gpu
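
For use in scripts, the CSV output can be stripped of its header row and units:

$ nvidia-smi --query-gpu=gpu_name,power.draw --format=csv,noheader,nounits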

Set GPU clocks

To reset the application clocks of all GPUs or of one GPU:

$ sudo nvidia-smi -rac
$ sudo nvidia-smi -i 9 -rac

To reset locked GPU clocks:

$ sudo nvidia-smi -rgc
$ sudo nvidia-smi -i 9 -rgc

To disable or enable persistence mode:

$ sudo nvidia-smi -pm 0
$ sudo nvidia-smi -i 9 -pm 0
$ sudo nvidia-smi -pm 1
$ sudo nvidia-smi -i 9 -pm 1

It is recommended to enable persistence mode before locking clocks.

Auto boost should also be disabled before locking clocks:

$ sudo nvidia-smi --auto-boost-default=DISABLED

To set the application clocks (memory clock, then graphics clock, in MHz) of a GPU and to lock its graphics clock, choosing values from the supported clocks listed earlier:

$ sudo nvidia-smi -i 9 -ac 1215,900
$ sudo nvidia-smi -i 9 -lgc 900
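
To verify that the clocks were applied, query the clock section of that GPU:

$ nvidia-smi -q -d CLOCK -i 9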

MIG

Multi-Instance GPU (MIG) is a feature of the A100 GPU that can slice it into GPU instances, and each GPU instance into compute instances.

To enable or disable MIG mode:

$ sudo nvidia-smi -mig 1
$ sudo nvidia-smi -mig 0

To list the GPU instances and the compute instances:

$ sudo nvidia-smi mig -lgi
$ sudo nvidia-smi mig -lci

Sample output of listing the compute instances:

+-------------------------------------------------------+
| Compute instances:                                    |
| GPU     GPU       Name             Profile   Instance |
|       Instance                       ID        ID     |
|         ID                                            |
|=======================================================|
|   0      7       MIG 1g.5gb           0         0     |
+-------------------------------------------------------+
|   0      8       MIG 1g.5gb           0         0     |
+-------------------------------------------------------+
|   0      9       MIG 1g.5gb           0         0     |
+-------------------------------------------------------+
|   0     11       MIG 1g.5gb           0         0     |
+-------------------------------------------------------+
|   0     12       MIG 1g.5gb           0         0     |
+-------------------------------------------------------+
|   0     13       MIG 1g.5gb           0         0     |
+-------------------------------------------------------+
|   0     14       MIG 1g.5gb           0         0     |
+-------------------------------------------------------+
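
Creating and destroying instances uses the same mig subcommand. A sketch, assuming that profile ID 19 is the 1g.5gb GPU instance profile on this A100 (list the profile IDs of your GPU with nvidia-smi mig -lgip):

$ sudo nvidia-smi -i 0 mig -cgi 19 -C
$ sudo nvidia-smi -i 0 mig -dci
$ sudo nvidia-smi -i 0 mig -dgi

Here -cgi creates a GPU instance, -C also creates a compute instance inside it, and -dci and -dgi destroy the compute instances and GPU instances when they are no longer needed.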