k8s pod

2021-11-16 Wu杰语

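After several months of studying k8s, many things I did not understand before are gradually starting to make sense. In particular, some topics I had already gone through once turned out to have been learned only superficially, and some especially important basic concepts I had simply overlooked. The most basic abstraction in k8s is the Pod, and this concept deserves attention and in-depth study.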

What is a Pod

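The Pod is the smallest API object in k8s and the atomic unit of k8s scheduling.
How should we understand this? Let's look at the code first: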

type Pod struct {
    metav1.TypeMeta
    // +optional
    metav1.ObjectMeta

    // Spec defines the behavior of a pod.
    // +optional
    Spec PodSpec

    // Status represents the current information about a pod. This data may not be up
    // to date.
    // +optional
    Status PodStatus
}
// PodSpec is a description of a pod
type PodSpec struct {
    Volumes []Volume
    // List of initialization containers belonging to the pod.
    InitContainers []Container
    // List of containers belonging to the pod.
    Containers []Container
    // List of ephemeral containers run in this pod. Ephemeral containers may be run in an existing
    // pod to perform user-initiated actions such as debugging. This list cannot be specified when
    // creating a pod, and it cannot be modified by updating the pod spec. In order to add an
    // ephemeral container to an existing pod, use the pod's ephemeralcontainers subresource.
    // This field is alpha-level and is only honored by servers that enable the EphemeralContainers feature.
    // +optional
    EphemeralContainers []EphemeralContainer
    // +optional
    RestartPolicy RestartPolicy
    // Optional duration in seconds the pod needs to terminate gracefully. May be decreased in delete request.
    // Value must be non-negative integer. The value zero indicates stop immediately via the kill
    // signal (no opportunity to shut down).
    // If this value is nil, the default grace period will be used instead.
    // The grace period is the duration in seconds after the processes running in the pod are sent
    // a termination signal and the time when the processes are forcibly halted with a kill signal.
    // Set this value longer than the expected cleanup time for your process.
    // +optional
    TerminationGracePeriodSeconds *int64
    // Optional duration in seconds relative to the StartTime that the pod may be active on a node
    // before the system actively tries to terminate the pod; value must be positive integer
    // +optional
    ActiveDeadlineSeconds *int64
    // Set DNS policy for the pod.
    // Defaults to "ClusterFirst".
    // Valid values are 'ClusterFirstWithHostNet', 'ClusterFirst', 'Default' or 'None'.
    // DNS parameters given in DNSConfig will be merged with the policy selected with DNSPolicy.
    // To have DNS options set along with hostNetwork, you have to specify DNS policy
    // explicitly to 'ClusterFirstWithHostNet'.
    // +optional
    DNSPolicy DNSPolicy
    // NodeSelector is a selector which must be true for the pod to fit on a node
    // +optional
    NodeSelector map[string]string

    // ServiceAccountName is the name of the ServiceAccount to use to run this pod
    // The pod will be allowed to use secrets referenced by the ServiceAccount
    ServiceAccountName string
    // AutomountServiceAccountToken indicates whether a service account token should be automatically mounted.
    // +optional
    AutomountServiceAccountToken *bool

    // NodeName is a request to schedule this pod onto a specific node.  If it is non-empty,
    // the scheduler simply schedules this pod onto that node, assuming that it fits resource
    // requirements.
    // +optional
    NodeName string
    // SecurityContext holds pod-level security attributes and common container settings.
    // Optional: Defaults to empty.  See type description for default values of each field.
    // +optional
    SecurityContext *PodSecurityContext
    // ImagePullSecrets is an optional list of references to secrets in the same namespace to use for pulling any of the images used by this PodSpec.
    // If specified, these secrets will be passed to individual puller implementations for them to use.  For example,
    // in the case of docker, only DockerConfig type secrets are honored.
    // +optional
    ImagePullSecrets []LocalObjectReference
    // Specifies the hostname of the Pod.
    // If not specified, the pod's hostname will be set to a system-defined value.
    // +optional
    Hostname string
    // If specified, the fully qualified Pod hostname will be "<hostname>.<subdomain>.<pod namespace>.svc.<cluster domain>".
    // If not specified, the pod will not have a domainname at all.
    // +optional
    Subdomain string
    // If true the pod's hostname will be configured as the pod's FQDN, rather than the leaf name (the default).
    // In Linux containers, this means setting the FQDN in the hostname field of the kernel (the nodename field of struct utsname).
    // In Windows containers, this means setting the registry value of hostname for the registry key HKEY_LOCAL_MACHINE\\SYSTEM\\CurrentControlSet\\Services\\Tcpip\\Parameters to FQDN.
    // If a pod does not have FQDN, this has no effect.
    // +optional
    SetHostnameAsFQDN *bool
    // If specified, the pod's scheduling constraints
    // +optional
    Affinity *Affinity
    // If specified, the pod will be dispatched by specified scheduler.
    // If not specified, the pod will be dispatched by default scheduler.
    // +optional
    SchedulerName string
    // If specified, the pod's tolerations.
    // +optional
    Tolerations []Toleration
    // HostAliases is an optional list of hosts and IPs that will be injected into the pod's hosts
    // file if specified. This is only valid for non-hostNetwork pods.
    // +optional
    HostAliases []HostAlias
    // If specified, indicates the pod's priority. "system-node-critical" and
    // "system-cluster-critical" are two special keywords which indicate the
    // highest priorities with the former being the highest priority. Any other
    // name must be defined by creating a PriorityClass object with that name.
    // If not specified, the pod priority will be default or zero if there is no
    // default.
    // +optional
    PriorityClassName string
    // The priority value. Various system components use this field to find the
    // priority of the pod. When Priority Admission Controller is enabled, it
    // prevents users from setting this field. The admission controller populates
    // this field from PriorityClassName.
    // The higher the value, the higher the priority.
    // +optional
    Priority *int32
    // PreemptionPolicy is the Policy for preempting pods with lower priority.
    // One of Never, PreemptLowerPriority.
    // Defaults to PreemptLowerPriority if unset.
    // This field is beta-level, gated by the NonPreemptingPriority feature-gate.
    // +optional
    PreemptionPolicy *PreemptionPolicy
    // Specifies the DNS parameters of a pod.
    // Parameters specified here will be merged to the generated DNS
    // configuration based on DNSPolicy.
    // +optional
    DNSConfig *PodDNSConfig
    // If specified, all readiness gates will be evaluated for pod readiness.
    // A pod is ready when all its containers are ready AND
    // all conditions specified in the readiness gates have status equal to "True"
    // More info: https://git.k8s.io/enhancements/keps/sig-network/580-pod-readiness-gates
    // +optional
    ReadinessGates []PodReadinessGate
    // RuntimeClassName refers to a RuntimeClass object in the node.k8s.io group, which should be used
    // to run this pod.  If no RuntimeClass resource matches the named class, the pod will not be run.
    // If unset or empty, the "legacy" RuntimeClass will be used, which is an implicit class with an
    // empty definition that uses the default runtime handler.
    // More info: https://git.k8s.io/enhancements/keps/sig-node/585-runtime-class
    // +optional
    RuntimeClassName *string
    // Overhead represents the resource overhead associated with running a pod for a given RuntimeClass.
    // This field will be autopopulated at admission time by the RuntimeClass admission controller. If
    // the RuntimeClass admission controller is enabled, overhead must not be set in Pod create requests.
    // The RuntimeClass admission controller will reject Pod create requests which have the overhead already
    // set. If RuntimeClass is configured and selected in the PodSpec, Overhead will be set to the value
    // defined in the corresponding RuntimeClass, otherwise it will remain unset and treated as zero.
    // More info: https://git.k8s.io/enhancements/keps/sig-node/688-pod-overhead
    // This field is beta-level as of Kubernetes v1.18, and is only honored by servers that enable the PodOverhead feature.
    // +optional
    Overhead ResourceList
    // EnableServiceLinks indicates whether information about services should be injected into pod's
    // environment variables, matching the syntax of Docker links.
    // If not specified, the default is true.
    // +optional
    EnableServiceLinks *bool
    // TopologySpreadConstraints describes how a group of pods ought to spread across topology
    // domains. Scheduler will schedule pods in a way which abides by the constraints.
    // All topologySpreadConstraints are ANDed.
    // +optional
    TopologySpreadConstraints []TopologySpreadConstraint
}

That is how a Pod is defined. Other objects can loosely be thought of as "inheriting" from Pod. Take the definition of Deployment as an example:

// Deployment provides declarative updates for Pods and ReplicaSets.
type Deployment struct {
    metav1.TypeMeta
    // +optional
    metav1.ObjectMeta

    // Specification of the desired behavior of the Deployment.
    // +optional
    Spec DeploymentSpec

    // Most recently observed status of the Deployment.
    // +optional
    Status DeploymentStatus
}

// DeploymentSpec specifies the state of a Deployment.
type DeploymentSpec struct {
    // Number of desired pods. This is a pointer to distinguish between explicit
    // zero and not specified. Defaults to 1.
    // +optional
    Replicas int32

    // Label selector for pods. Existing ReplicaSets whose pods are
    // selected by this will be the ones affected by this deployment.
    // +optional
    Selector *metav1.LabelSelector

    // Template describes the pods that will be created.
    Template api.PodTemplateSpec

    // The deployment strategy to use to replace existing pods with new ones.
    // +optional
    Strategy DeploymentStrategy

    // Minimum number of seconds for which a newly created pod should be ready
    // without any of its container crashing, for it to be considered available.
    // Defaults to 0 (pod will be considered available as soon as it is ready)
    // +optional
    MinReadySeconds int32

    // The number of old ReplicaSets to retain to allow rollback.
    // This is a pointer to distinguish between explicit zero and not specified.
    // This is set to the max value of int32 (i.e. 2147483647) by default, which means
    // "retaining all old ReplicaSets".
    // +optional
    RevisionHistoryLimit *int32

    // Indicates that the deployment is paused and will not be processed by the
    // deployment controller.
    // +optional
    Paused bool

    // DEPRECATED.
    // The config this deployment is rolling back to. Will be cleared after rollback is done.
    // +optional
    RollbackTo *RollbackConfig

    // The maximum time in seconds for a deployment to make progress before it
    // is considered to be failed. The deployment controller will continue to
    // process failed deployments and a condition with a ProgressDeadlineExceeded
    // reason will be surfaced in the deployment status. Note that progress will
    // not be estimated during the time a deployment is paused. This is set to
    // the max value of int32 (i.e. 2147483647) by default, which means "no deadline".
    // +optional
    ProgressDeadlineSeconds *int32
}
// PodTemplateSpec describes the data a pod should have when created from a template
type PodTemplateSpec struct {
    // Metadata of the pods created from this template.
    // +optional
    metav1.ObjectMeta

    // Spec defines the behavior of a pod.
    // +optional
    Spec PodSpec
}

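If you browse the other workload API objects, you will see that each of them has a definition containing a PodTemplateSpec. Strictly speaking this is composition rather than inheritance (Go has no inheritance), but it is fair to think of every API object that ends up being scheduled as built on top of Pod.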

So when we operate on a Deployment, Job, DaemonSet, and so on, we are ultimately operating on Pods; Pods are the objects that the controllers maintain.
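The relationship between a workload object and its Pods can be sketched in a few lines of Go. This is only an illustration under stated assumptions: the structs are trimmed-down stand-ins for the types quoted above, and `podsFromTemplate` is a hypothetical helper, not a Kubernetes function. A real Deployment creates ReplicaSets, which in turn create Pods with generated names.

```go
package main

import "fmt"

// Simplified stand-ins for the API types quoted above; only the fields
// needed for this illustration are kept.
type Container struct {
	Name  string
	Image string
}

type PodSpec struct {
	Containers []Container
}

type ObjectMeta struct {
	Name   string
	Labels map[string]string
}

type PodTemplateSpec struct {
	ObjectMeta
	Spec PodSpec
}

type Pod struct {
	ObjectMeta
	Spec PodSpec
}

type Deployment struct {
	ObjectMeta
	Replicas int32
	Template PodTemplateSpec
}

// podsFromTemplate is a hypothetical helper: it creates one Pod per replica
// by copying the labels and spec from the embedded template.
func podsFromTemplate(d Deployment) []Pod {
	pods := make([]Pod, 0, d.Replicas)
	for i := int32(0); i < d.Replicas; i++ {
		pods = append(pods, Pod{
			ObjectMeta: ObjectMeta{
				Name:   fmt.Sprintf("%s-%d", d.Name, i),
				Labels: d.Template.Labels,
			},
			Spec: d.Template.Spec,
		})
	}
	return pods
}

func main() {
	d := Deployment{
		ObjectMeta: ObjectMeta{Name: "nginx"},
		Replicas:   3,
		Template: PodTemplateSpec{
			ObjectMeta: ObjectMeta{Labels: map[string]string{"app": "nginx"}},
			Spec:       PodSpec{Containers: []Container{{Name: "nginx", Image: "nginx:1.14.2"}}},
		},
	}
	for _, p := range podsFromTemplate(d) {
		fmt.Println(p.Name, p.Spec.Containers[0].Image)
	}
}
```

The point of the sketch is that the Deployment carries no container logic of its own; everything that runs comes from the PodSpec inside its template.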

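Pod should be distinguished from Node here. A Node corresponds to a physical or virtual machine. What the scheduler schedules is Pods, but it has to keep track of all Nodes in order to assign each Pod to a suitable Node.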
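As a rough illustration of "assigning a Pod to a suitable Node", the sketch below uses only the NodeSelector field from the PodSpec shown earlier. The `Node` and `Pod` structs, `matches`, and `pickNode` are hypothetical simplifications; the real scheduler filters on many more criteria and then scores the remaining candidates.

```go
package main

import "fmt"

// Node and Pod are simplified stand-ins for the real API types.
type Node struct {
	Name   string
	Labels map[string]string
}

type Pod struct {
	Name         string
	NodeSelector map[string]string
}

// matches reports whether every key/value pair in the Pod's NodeSelector
// is present in the Node's labels.
func matches(p Pod, n Node) bool {
	for k, v := range p.NodeSelector {
		got, ok := n.Labels[k]
		if !ok || got != v {
			return false
		}
	}
	return true
}

// pickNode returns the name of the first Node the Pod fits on,
// or an empty string when no Node fits.
func pickNode(p Pod, nodes []Node) string {
	for _, n := range nodes {
		if matches(p, n) {
			return n.Name
		}
	}
	return ""
}

func main() {
	nodes := []Node{
		{Name: "node-a", Labels: map[string]string{"disk": "hdd"}},
		{Name: "node-b", Labels: map[string]string{"disk": "ssd"}},
	}
	pod := Pod{Name: "nginx", NodeSelector: map[string]string{"disk": "ssd"}}
	fmt.Println(pickNode(pod, nodes))
}
```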

How the Pod YAML maps to the code

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.14.2
    ports:
    - containerPort: 80

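As shown above, this is what a simple Pod definition looks like. Here apiVersion and kind correspond to the metav1.TypeMeta metadata in the code, metadata corresponds to metav1.ObjectMeta, and spec corresponds to PodSpec.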

As you can see, once you understand the code you can look up the meaning of each field directly in it.
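The mapping from manifest keys to struct fields can be tried out with the standard library alone. The sketch below is an assumption-laden miniature: the structs are cut down to the fields used by the nginx example, the JSON tags are written by hand for this illustration, and the manifest is given as JSON because Go's standard library has no YAML decoder.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Trimmed-down stand-ins for metav1.TypeMeta, metav1.ObjectMeta and PodSpec.
type TypeMeta struct {
	APIVersion string `json:"apiVersion"`
	Kind       string `json:"kind"`
}

type ObjectMeta struct {
	Name string `json:"name"`
}

type ContainerPort struct {
	ContainerPort int32 `json:"containerPort"`
}

type Container struct {
	Name  string          `json:"name"`
	Image string          `json:"image"`
	Ports []ContainerPort `json:"ports"`
}

type PodSpec struct {
	Containers []Container `json:"containers"`
}

type Pod struct {
	TypeMeta              // untagged embedded struct: its fields sit at the top level
	ObjectMeta `json:"metadata"`
	Spec       PodSpec `json:"spec"`
}

// manifest is the nginx Pod from above, written as JSON.
const manifest = `{
  "apiVersion": "v1",
  "kind": "Pod",
  "metadata": {"name": "nginx"},
  "spec": {
    "containers": [
      {"name": "nginx", "image": "nginx:1.14.2", "ports": [{"containerPort": 80}]}
    ]
  }
}`

// parsePod decodes a JSON manifest into the simplified Pod struct.
func parsePod(data []byte) (Pod, error) {
	var p Pod
	err := json.Unmarshal(data, &p)
	return p, err
}

func main() {
	p, err := parsePod([]byte(manifest))
	if err != nil {
		fmt.Println("decode error:", err)
		return
	}
	fmt.Println(p.Kind, p.Name, p.Spec.Containers[0].Image)
}
```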

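From what we saw earlier, a Pod has many fields, Affinity among them, so a definition this short clearly cannot cover every operational requirement. The remaining fields must therefore have default values. It was also possible to predefine Pod settings in advance with an object called PodPreset, which injected items such as environment variables and volumes into matching Pods at creation time. Note that PodPreset was an alpha API and was removed in Kubernetes v1.20, so look for its code in an older release.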
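The idea that unset fields receive default values can be sketched as follows. The `PodSpec` here keeps only four fields and `applyDefaults` is a hypothetical helper, not the API server's actual defaulting code. The DNSPolicy and EnableServiceLinks defaults are taken from the field comments quoted earlier; the RestartPolicy default of Always and the 30-second grace period are Kubernetes's documented defaults.

```go
package main

import "fmt"

// PodSpec is a trimmed-down stand-in with only the fields used below.
type PodSpec struct {
	RestartPolicy                 string
	DNSPolicy                     string
	TerminationGracePeriodSeconds *int64
	EnableServiceLinks            *bool
}

// applyDefaults fills in every field the user left unset and leaves
// explicitly set fields untouched.
func applyDefaults(s *PodSpec) {
	if s.RestartPolicy == "" {
		s.RestartPolicy = "Always"
	}
	if s.DNSPolicy == "" {
		s.DNSPolicy = "ClusterFirst"
	}
	if s.TerminationGracePeriodSeconds == nil {
		grace := int64(30)
		s.TerminationGracePeriodSeconds = &grace
	}
	if s.EnableServiceLinks == nil {
		enabled := true
		s.EnableServiceLinks = &enabled
	}
}

func main() {
	spec := PodSpec{RestartPolicy: "Never"} // only one field set by the user
	applyDefaults(&spec)
	fmt.Println(spec.RestartPolicy, spec.DNSPolicy,
		*spec.TerminationGracePeriodSeconds, *spec.EnableServiceLinks)
}
```

The pointer fields show why the API uses `*int64` and `*bool` in several places: a nil pointer means "not specified", which is different from an explicit zero or false.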

Understanding the Pod itself

As an atomic abstraction, the Pod has its own underlying mechanics.

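We know that a container works on the principle of "Namespaces for isolation, Cgroups for limits, rootfs for the file system", so a container is roughly equivalent to a process on Linux. A Pod can hold more than one container, which makes it comparable to a process group whose members share namespaces (network, IPC, and UTS by default). The PID namespace is not shared by default; when it is enabled with shareProcessNamespace: true in the Pod spec, entering one container with kubectl exec -it <pod> -c <container> -- sh and running ps shows the processes of the Pod's other containers as well.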

How does a Pod share its network? It relies on one thing:

Infra container

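The Infra container has to consume as few resources as possible, so it uses a very special image called k8s.gcr.io/pause. It stays paused all the time and uses almost no CPU or memory.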

This container holds the network namespace, so the other containers can be joined to that same network namespace. This brings several benefits:

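The Pod's lifecycle matches that of the Infra container, so another container in the Pod crashing does not bring the Pod down.
The Pod's network resources are shared by all containers in the Pod.
The Pod has exactly one IP address, which is the IP address of its network namespace.
Containers in the Pod can talk to each other directly over localhost.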
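What "talking over localhost" looks like from the application side can be sketched with two goroutines standing in for two containers of the same Pod. This is an analogy only: the goroutines share a network stack because they live in one process, whereas real containers share it because they joined the Infra container's network namespace. The function names are hypothetical.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
)

// serve plays the role of one container: it accepts a single connection
// and echoes one line back with a prefix.
func serve(ln net.Listener) {
	conn, err := ln.Accept()
	if err != nil {
		return
	}
	defer conn.Close()
	line, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return
	}
	fmt.Fprintf(conn, "echo: %s", line)
}

// callLocalhost plays the role of a second container that reaches the
// first one through the loopback address.
func callLocalhost(addr, msg string) (string, error) {
	conn, err := net.Dial("tcp", addr)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := fmt.Fprintf(conn, "%s\n", msg); err != nil {
		return "", err
	}
	return bufio.NewReader(conn).ReadString('\n')
}

// roundTrip starts the listener on a free loopback port and sends one message.
func roundTrip(msg string) (string, error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", err
	}
	defer ln.Close()
	go serve(ln)
	return callLocalhost(ln.Addr().String(), msg)
}

func main() {
	reply, err := roundTrip("hello")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(reply)
}
```

Because all containers of a Pod share one port space, two of them cannot listen on the same port at the same time, just as two listeners in this process could not.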

A Pod can share volumes in the same way: once a volume is declared at the Pod level, every container in the Pod that mounts it sees the same data.

As you can see, quite a lot is packed into this atomic scheduling unit. The Pod is an innovative abstraction built on top of containers.

Pod design patterns

Another reason the Pod matters so much is that it is the foundation of the k8s design patterns.
For example:


apiVersion: v1
kind: Pod
metadata:
  name: javaweb-2
spec:
  initContainers:
  - image: geektime/sample:v2
    name: war
    command: ["cp", "/sample.war", "/app"]
    volumeMounts:
    - mountPath: /app
      name: app-volume
  containers:
  - image: geektime/tomcat:7.0
    name: tomcat
    command: ["sh","-c","/root/apache-tomcat-7.0.42-v2/bin/start.sh"]
    volumeMounts:
    - mountPath: /root/apache-tomcat-7.0.42-v2/webapps
      name: app-volume
    ports:
    - containerPort: 8080
      hostPort: 8001 
  volumes:
  - name: app-volume
    emptyDir: {}

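The init container in a Pod starts, and runs to completion, before the other containers. Here it copies the WAR file into the designated directory on the shared app-volume, so by the time the tomcat container starts the WAR file is already in place and gets loaded. This is a typical use case.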

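The most typical Pod design pattern is still the sidecar. Istio, for example, mainly adds an envoy container to the Pods that users define. This is done by an admission controller (a mutating admission webhook): when a user creates a Pod, before the Pod is actually persisted, the webhook returns a patch that adds the envoy container to it.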
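Mutating admission webhooks express their changes as a JSON Patch, so the injection step can be sketched as building such a patch. This is a minimal sketch and not Istio's implementation: `sidecarPatch` is a hypothetical helper, the image name is made up, and a real webhook wraps the patch in an AdmissionReview response.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Container is a trimmed-down stand-in for the real container type.
type Container struct {
	Name  string `json:"name"`
	Image string `json:"image"`
}

// patchOp is a single JSON Patch (RFC 6902) operation.
type patchOp struct {
	Op    string      `json:"op"`
	Path  string      `json:"path"`
	Value interface{} `json:"value"`
}

// sidecarPatch returns a JSON Patch that appends the sidecar to
// spec.containers, or nil when a container with that name already exists.
func sidecarPatch(existing []Container, sidecar Container) ([]byte, error) {
	for _, c := range existing {
		if c.Name == sidecar.Name {
			return nil, nil // already injected, nothing to do
		}
	}
	ops := []patchOp{{Op: "add", Path: "/spec/containers/-", Value: sidecar}}
	return json.Marshal(ops)
}

func main() {
	userContainers := []Container{{Name: "nginx", Image: "nginx:1.14.2"}}
	// Hypothetical image name, used only for illustration.
	envoy := Container{Name: "envoy", Image: "example/envoy:hypothetical"}

	patch, err := sidecarPatch(userContainers, envoy)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(string(patch))
}
```

The user's manifest never mentions the sidecar; the patch is applied on the server side, which is why the pattern works without changing application deployments.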

Summary

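The Pod is the most basic concept and deserves to be understood in depth, much like processes when studying operating systems. Once this abstraction is clear, learning the other API objects becomes much easier. For developers, the best approach is still to read the relevant code directly; the code is the best documentation.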
