Statistics of other types of gpu #2484

Open
1 of 2 tasks
zjj2wry opened this issue Oct 31, 2024 · 3 comments · May be fixed by #2631
Labels
enhancement New feature or request raycluster

Comments

@zjj2wry

zjj2wry commented Oct 31, 2024

Search before asking

  • I searched the issues and found no similar feature request.

Description

if strings.HasSuffix(string(key), "gpu") && !val.IsZero() {

With Aliyun Kubernetes GPU sharing, the GPU resource key is aliyun.com/gpu-mem, which does not end in "gpu", so the check above does not match it:

    workerGroupSpecs:
            resources:
              limits:
                aliyun.com/gpu-mem: "1"
                cpu: "1"
                memory: 2Gi
              requests:
                aliyun.com/gpu-mem: "1"
                cpu: "1"
                memory: 2Gi

The autoscaler does not scale up when a task requests a GPU resource:

(autoscaler +3m13s) Error: No available node types can fulfill resource request {'GPU': 1.0, 'CPU': 1.0}. Add suitable node types to this cluster to resolve this issue.

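Code to reproduce:

import ray

ray.init()

# Requesting one GPU is what triggers the autoscaler error above.
@ray.remote(num_gpus=1)
def gpu_task():
    # Import torch inside the task so it is only needed on the GPU worker.
    import torch
    x = torch.rand(10000, 10000).cuda()
    y = torch.mm(x, x)
    return y.sum().item()

future = gpu_task.remote()
result = ray.get(future)

print("Result:", result)

ray.shutdown()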

Use case

No response

Related issues

none

Are you willing to submit a PR?

  • Yes I am willing to submit a PR!
@zjj2wry zjj2wry added enhancement New feature or request triage labels Oct 31, 2024
@win5923
Contributor

win5923 commented Oct 31, 2024

Perhaps using strings.Contains could be a better way.
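
To make the difference concrete, here is a minimal sketch comparing the two checks on the resource keys mentioned in this issue. The helper names isGPUKeyBySuffix and isGPUKeyByContains are hypothetical and exist only for this illustration; they are not KubeRay functions.

```go
package main

import (
	"fmt"
	"strings"
)

// isGPUKeyBySuffix mirrors the suffix check quoted in the issue description.
// (Hypothetical helper name, used only for this illustration.)
func isGPUKeyBySuffix(key string) bool {
	return strings.HasSuffix(key, "gpu")
}

// isGPUKeyByContains is the substring variant suggested in this comment.
// (Hypothetical helper name, used only for this illustration.)
func isGPUKeyByContains(key string) bool {
	return strings.Contains(key, "gpu")
}

func main() {
	keys := []string{"nvidia.com/gpu", "aliyun.com/gpu-mem", "cpu", "memory"}
	for _, key := range keys {
		fmt.Printf("%-20s suffix=%-5t contains=%t\n",
			key, isGPUKeyBySuffix(key), isGPUKeyByContains(key))
	}
}
```

The substring check does match aliyun.com/gpu-mem, but it would also match any other key that happens to contain "gpu", so it trades one kind of miss for possible false positives.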

@zjj2wry
Author

zjj2wry commented Nov 4, 2024

https://github.com/ray-project/ray/blob/ba41ae99097c30cac2dd62e263bbe0b7b9bffc95/python/ray/autoscaler/_private/kuberay/autoscaling_config.py#L346-L351

By setting num-gpus, I can work around the problem of the GPU workers not scaling up automatically. desireGPU is only used for display purposes.

@zjj2wry zjj2wry changed the title from "[Feature] autoscaler support custom gpu key" to "Statistics of other types of gpu" Nov 4, 2024
@andrewsykim
Collaborator

I suggest adding these to the list of well-known accelerators instead of using regex to parse GPU counts: https://github.com/ray-project/kuberay/blob/master/ray-operator/controllers/ray/common/pod.go#L41-L43
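
As a rough sketch of what an explicit list could look like, the snippet below counts only resource keys that appear in an allowlist. The map wellKnownGPUKeys, its contents, and the function countGPUs are assumptions made for this illustration; they are not the actual definitions in pod.go, and plain int64 values stand in for Kubernetes resource quantities.

```go
package main

import "fmt"

// wellKnownGPUKeys is an illustrative allowlist, NOT the actual list in KubeRay.
// aliyun.com/gpu-mem is the key from this issue; nvidia.com/gpu is added as a
// familiar example.
var wellKnownGPUKeys = map[string]struct{}{
	"nvidia.com/gpu":     {},
	"aliyun.com/gpu-mem": {},
}

// countGPUs sums the quantities of all allowlisted keys in a container's
// resource limits. (Hypothetical helper; int64 stands in for a resource quantity.)
func countGPUs(limits map[string]int64) int64 {
	var total int64
	for key, qty := range limits {
		if _, ok := wellKnownGPUKeys[key]; ok {
			total += qty
		}
	}
	return total
}

func main() {
	// Limits taken from the workerGroupSpecs example in the issue description.
	limits := map[string]int64{"aliyun.com/gpu-mem": 1, "cpu": 1}
	fmt.Println("GPUs counted:", countGPUs(limits))
}
```

An exact-match list avoids the false positives of substring or pattern matching, at the cost of needing an update for each new accelerator key. Whether a gpu-mem quantity should be treated as a GPU count at all is a separate question that this sketch does not address.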

4 participants