NVIDIA/mig-parted

gpu-operator: creating a compute instance (CI) with MIG fails with "Insufficient Resources"

asskss opened this issue · 3 comments

asskss commented

mig-config.yaml

    mig-configs:
      custom-config: 
        - devices: [0]
          mig-enabled: false
        - devices: [1]     
          mig-enabled: true   
          mig-devices:
            "7g.80gb": 1    
        - devices: [2]
          mig-enabled: true
          mig-devices:
            "2g.20gb": 3
        - devices: [3]      
          mig-enabled: true 
          mig-devices:
            "3g.40gb": 1
            "4g.40gb": 1    
        - devices: [4]      
          mig-enabled: true
          mig-devices:
            "3g.40gb": 1
            "4g.40gb": 1    
        - devices: [5]      
          mig-enabled: true
          mig-devices:
            "3g.40gb": 1
            "4g.40gb": 1
        - devices: [6]
          mig-enabled: true
          mig-devices:
            "3g.40gb": 1
            "4g.40gb": 1 
        - devices: [7]
          mig-enabled: true
          mig-devices:
            "1g.10gb": 1
            "2g.20gb": 1
            "4g.40gb": 1 

nvidia-smi mig -lcip -gi 0

+--------------------------------------------------------------------------------------+
| Compute instance profiles:                                                           |
| GPU     GPU       Name             Profile  Instances   Exclusive       Shared       |
|       Instance                       ID     Free/Total     SM       DEC   ENC   OFA  |
|         ID                                                          CE    JPEG       |
|======================================================================================|
|   0      0       MIG 1c.7g.80gb       0      0/7           14        5     0     1   |
|                                                                      7     1         |
+--------------------------------------------------------------------------------------+
|   0      0       MIG 2c.7g.80gb       1      0/3           28        5     0     1   |
|                                                                      7     1         |
+--------------------------------------------------------------------------------------+
|   0      0       MIG 3c.7g.80gb       2      0/2           42        5     0     1   |
|                                                                      7     1         |
+--------------------------------------------------------------------------------------+
|   0      0       MIG 4c.7g.80gb       3      0/1           56        5     0     1   |
|                                                                      7     1         |
+--------------------------------------------------------------------------------------+
|   0      0       MIG 7g.80gb          4*     0/1           98        5     0     1   |
|                                                                      7     1         |
+--------------------------------------------------------------------------------------+
|   1      0       MIG 1c.7g.80gb       0      0/7           14        5     0     1   |
|                                                                      7     1         |
+--------------------------------------------------------------------------------------+
|   1      0       MIG 2c.7g.80gb       1      0/3           28        5     0     1   |
|                                                                      7     1         |
+--------------------------------------------------------------------------------------+
|   1      0       MIG 3c.7g.80gb       2      0/2           42        5     0     1   |
|                                                                      7     1         |
+--------------------------------------------------------------------------------------+
|   1      0       MIG 4c.7g.80gb       3      0/1           56        5     0     1   |
|                                                                      7     1         |
+--------------------------------------------------------------------------------------+
|   1      0       MIG 7g.80gb          4*     0/1           98        5     0     1   |
|                                                                      7     1         |
+--------------------------------------------------------------------------------------+

nvidia-smi mig -lci -gi 0

+--------------------------------------------------------------------+
| Compute instances:                                                 |
| GPU     GPU       Name             Profile   Instance   Placement  |
|       Instance                       ID        ID       Start:Size |
|         ID                                                         |
|====================================================================|
|   0      0       MIG 7g.80gb          4         0          0:7     |
+--------------------------------------------------------------------+
|   1      0       MIG 7g.80gb          4         0          0:7     |
+--------------------------------------------------------------------+

nvidia-smi mig -cci 2 -gi 0

Unable to create a compute instance on GPU  0 GPU instance ID  0 using profile 2: Insufficient Resources
Failed to create compute instances: Insufficient Resources

I want to create a MIG 3c.7g.80gb compute instance, but the command fails with "Insufficient Resources". How can I solve this?

elezar commented

Note that as shown, both GPUs already have [7c.]7g.80gb partitions created on them. This means that the ADDITIONAL 3c.7g.80gb partition cannot be created.
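To illustrate why the request fails, here is a minimal sketch of the slice accounting, assuming a simplified model in which a compute instance only needs a contiguous run of free compute slices inside its GPU instance. The names `ComputeInstance`, `free_slices`, and `can_create` are hypothetical and are not part of nvidia-smi or mig-parted; the real driver may apply additional placement rules.

```python
from dataclasses import dataclass

# A 7g.80gb GPU instance exposes 7 compute slices (Placement 0:7 in the listing above).
TOTAL_SLICES = 7


@dataclass(frozen=True)
class ComputeInstance:
    """Hypothetical record mirroring the Name and Placement (Start:Size) columns."""
    name: str
    start: int
    size: int


def used_slices(existing):
    """Return the set of slice indices occupied by existing compute instances."""
    used = set()
    for ci in existing:
        used.update(range(ci.start, ci.start + ci.size))
    return used


def free_slices(existing, total=TOTAL_SLICES):
    """Count the slices that no existing compute instance occupies."""
    return total - len(used_slices(existing))


def can_create(existing, size, total=TOTAL_SLICES):
    """True if some contiguous run of `size` free slices exists (simplified rule)."""
    used = used_slices(existing)
    for start in range(total - size + 1):
        if all(s not in used for s in range(start, start + size)):
            return True
    return False


# State reported by `nvidia-smi mig -lci -gi 0`: one 7g.80gb CI placed at 0:7.
current = [ComputeInstance("7g.80gb", 0, 7)]
print(free_slices(current))    # 0
print(can_create(current, 3))  # False: this is the "Insufficient Resources" case

# After removing the full-size CI, a 4c and a 3c instance fit side by side.
after_4c = [ComputeInstance("4c.7g.80gb", 0, 4)]
print(can_create([], 4))        # True
print(can_create(after_4c, 3))  # True
```

In practice this means the existing full-size compute instance would have to be destroyed first (for example with `nvidia-smi mig -dci`, selecting the GPU, GPU instance, and compute instance with `-i`, `-gi`, and `-ci`) before the 4c and 3c profiles can be created in its place.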

Note that since you mention the GPU Operator: partitions where c != g are not currently supported there, but they should be supported in mig-parted.

Could you give more details on your use case?

asskss commented

> Note that as shown, both GPUs already have [7c.]7g.80gb partitions created on them. This means that the ADDITIONAL 3c.7g.80gb partition cannot be created.
>
> Note that since you mention the GPU Operator, the use of partitions where c != g are not currently supported there, but they should be in mig-parted.
>
> Could you give more details on your use case?

Thanks. My use case is Kubernetes (K8s).

  1. What I need: I would like to deploy two models side by side on one GPU, one on a 4c.7g.80gb compute instance and one on a 3c.7g.80gb compute instance. Do the compute instances share GPU memory in this mode?
  2. If two models are deployed on one card, are their CUDA workloads scheduled fairly, or can one freely preempt the other?