How to Schedule GPU Workload Pods on OpenShift 3.11

2019-10-08  frederickhou

Reference: https://blog.openshift.com/how-to-use-gpus-with-deviceplugin-in-openshift-3-10/

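Base environment setup: for installing and testing the NVIDIA driver and nvidia-docker, see the reference link above.
This article provides three files that have been verified in practice: nvidia-deviceplugin-scc.yaml, which creates the SCC; nvidia-deviceplugin.yaml, which creates the NVIDIA Device Plugin DaemonSet; and test-gpu.yaml, a test pod that requests GPUs.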
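
The SCC manifest named above is not reproduced in the text of this article. As a rough sketch only, a permissive SCC for the device plugin might look like the following. It is modelled on the general shape of OpenShift 3.x SecurityContextConstraints objects; every field value here is an assumption rather than the author's verified file, and the `users` entry assumes the `nvidia` namespace and `nvidia-deviceplugin` service account used by the DaemonSet.

```yaml
# nvidia-deviceplugin-scc.yaml (illustrative sketch, NOT the author's verified file)
apiVersion: v1
kind: SecurityContextConstraints
metadata:
  name: nvidia-deviceplugin
allowHostDirVolumePlugin: true    # the DaemonSet mounts a hostPath volume
allowHostNetwork: true            # the DaemonSet sets hostNetwork: true
allowHostPID: true                # the DaemonSet sets hostPID: true
allowHostIPC: true
allowHostPorts: true
allowPrivilegedContainer: true    # assumption: permissive, tighten if possible
allowedCapabilities:
- '*'
readOnlyRootFilesystem: false
priority: 10                      # assumption: preferred over the default SCCs
runAsUser:
  type: RunAsAny
seLinuxContext:
  type: RunAsAny                  # lets the pod request type nvidia_container_t
fsGroup:
  type: RunAsAny
supplementalGroups:
  type: RunAsAny
users:
# assumption: namespace "nvidia", service account "nvidia-deviceplugin"
- system:serviceaccount:nvidia:nvidia-deviceplugin
volumes:
- '*'
```

This sketch deliberately grants broad permissions so that the pod is admitted; for anything beyond a test cluster it would be worth narrowing it to only what the DaemonSet actually uses.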

    # nvidia-deviceplugin.yaml
    apiVersion: extensions/v1beta1
    kind: DaemonSet
    metadata:
      name: nvidia-device-plugin-daemonset
      namespace: nvidia
    spec:
      template:
        metadata:
          # Mark this pod as a critical add-on; when enabled, the critical add-on scheduler
          # reserves resources for critical add-on pods so that they can be rescheduled after
          # a failure.  This annotation works in tandem with the toleration below.
          annotations:
            scheduler.alpha.kubernetes.io/critical-pod: ""
          labels:
            name: nvidia-device-plugin-ds
        spec:
          # Run only on nodes that carry the openshift.com/gpu-accelerator label.
          affinity:
            nodeAffinity:
              requiredDuringSchedulingIgnoredDuringExecution:
                nodeSelectorTerms:
                - matchExpressions:
                  - key: openshift.com/gpu-accelerator
                    operator: Exists
          tolerations:
          # Allow this pod to be rescheduled while the node is in "critical add-ons only" mode.
          # This, along with the annotation above, marks this pod as a critical add-on.
          - key: CriticalAddonsOnly
            operator: Exists
          - key: nvidia.com/gpu
            operator: Exists
            effect: NoSchedule
          serviceAccount: nvidia-deviceplugin
          serviceAccountName: nvidia-deviceplugin
          hostNetwork: true
          hostPID: true
          containers:
          - image: nvidia/k8s-device-plugin:1.11
            name: nvidia-device-plugin-ctr
            securityContext:
              allowPrivilegeEscalation: false
              capabilities:
                drop: ["ALL"]
              seLinuxOptions:
                type: nvidia_container_t
            volumeMounts:
            - name: device-plugin
              mountPath: /var/lib/kubelet/device-plugins
          volumes:
          - name: device-plugin
            hostPath:
              path: /var/lib/kubelet/device-plugins
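
The DaemonSet manifest refers to a namespace `nvidia` and a service account `nvidia-deviceplugin`, both of which have to exist before the pods can be created. A minimal sketch of manifests that would create them is shown below; the file name `nvidia-prereqs.yaml` is hypothetical and does not come from the original article, and on OpenShift the namespace could just as well be created as a project.

```yaml
# nvidia-prereqs.yaml (hypothetical file name, illustrative sketch)
apiVersion: v1
kind: Namespace
metadata:
  name: nvidia
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nvidia-deviceplugin   # referenced by serviceAccountName in the DaemonSet
  namespace: nvidia
```

Note also that the node affinity rule uses `operator: Exists` on the key `openshift.com/gpu-accelerator`, so the DaemonSet only lands on nodes that carry a label with that key, whatever its value.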

[Image: Inked图片3_LI.jpg]

    # test-gpu.yaml
    apiVersion: v1
    kind: Pod
    metadata:
      name: cuda3
      namespace: nvidia
    spec:
      restartPolicy: OnFailure
      containers:
      - name: cuda3
        image: "docker.io/nvidia/cuda:9.0-base"
        # Run nvidia-smi once to list the GPUs visible inside the container.
        args: ["nvidia-smi"]
        resources:
          limits:
            nvidia.com/gpu: 3 # request 3 GPUs