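Mount Namespaces: Theory
Mount namespaces were the first namespace type added to Linux, which is why the flag is named CLONE_NEWNS ("new namespace") rather than CLONE_NEWMNT. A mount namespace gives its processes an independent view of the filesystem.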
Mount namespaces provide isolation of the list of mount points seen by the processes in each namespace instance. Thus, the processes in each of the mount namespace instances will see distinct single-directory hierarchies.
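In other words, a mount namespace isolates the list of mount points visible to its processes. Each namespace instance has its own single directory hierarchy, and a process sees only the mount points of the mount namespace it belongs to.
The mount points of a process's mount namespace are listed in three files: /proc/[pid]/mounts, /proc/[pid]/mountinfo, and /proc/[pid]/mountstats.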
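As a rough illustration of reading these files, the sketch below parses text in the /proc/[pid]/mounts format (whitespace-separated fields: source, mount point, filesystem type, options, and two numeric fields). The helper names `parse_mounts` and `mounts_of` are made up for this example, and the parser does not decode the octal escapes (such as \040 for a space) that the kernel uses in paths.

```python
from pathlib import Path


def parse_mounts(text):
    """Parse /proc/[pid]/mounts-style text into (source, mount_point, fstype) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        source, mount_point, fstype = fields[:3]
        entries.append((source, mount_point, fstype))
    return entries


def mounts_of(pid="self"):
    """Read the mount list of the mount namespace that process `pid` belongs to."""
    return parse_mounts(Path(f"/proc/{pid}/mounts").read_text())


if __name__ == "__main__":
    # Print the mount points visible to the current process.
    for source, mount_point, fstype in mounts_of():
        print(f"{mount_point}\t{fstype}\t{source}")
```

Running this for two processes in different mount namespaces should in general print different lists, which is the isolation described above.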
A new mount namespace is created using either clone(2) or unshare(2) with the CLONE_NEWNS flag. When a new mount namespace is created, its mount point list is initialized as follows:
* If the namespace is created using clone(2), the mount point list of the child's namespace is a copy of the mount point list in the parent's namespace.
* If the namespace is created using unshare(2), the mount point list of the new namespace is a copy of the mount point list in the caller's previous mount namespace.
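That is, calling clone or unshare with the CLONE_NEWNS flag creates a new mount namespace. With clone, the child's namespace starts with a copy of the mount point list of the parent's namespace. With unshare, the new namespace starts with a copy of the mount point list of the caller's previous mount namespace.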
Subsequent modifications to the mount point list (mount and umount) in either mount namespace will not (by default) affect the mount point list seen in the other namespace (but see the following discussion of shared subtrees).
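By default, later mount and umount operations in either namespace do not change the mount point list seen in the other namespace (shared subtrees, discussed below, are the exception).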
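The unshare path can be sketched from Python by calling the C library's unshare(2) wrapper through ctypes. This is a minimal sketch, assuming Linux with glibc. The helper names `mount_ns_id` and `try_unshare_mount_ns` are invented for the example, and the call normally needs CAP_SYS_ADMIN, so as an ordinary user it is expected to fail with EPERM.

```python
import ctypes
import os

CLONE_NEWNS = 0x00020000  # flag value from <linux/sched.h>


def mount_ns_id(pid="self"):
    """Return the identity of a process's mount namespace, as shown by /proc/[pid]/ns/mnt."""
    return os.readlink(f"/proc/{pid}/ns/mnt")


def try_unshare_mount_ns():
    """Move the calling process into a new mount namespace.

    Returns True on success and False if the kernel refuses the call
    (typically EPERM when the caller lacks CAP_SYS_ADMIN).
    """
    libc = ctypes.CDLL(None, use_errno=True)
    return libc.unshare(CLONE_NEWNS) == 0


if __name__ == "__main__":
    before = mount_ns_id()
    if try_unshare_mount_ns():
        # The new namespace starts with a copy of the previous mount point list.
        print("mount namespace changed:", before, "->", mount_ns_id())
    else:
        print("unshare failed:", os.strerror(ctypes.get_errno()))
```

Comparing the /proc/[pid]/ns/mnt link before and after the call is one way to confirm that the process really moved to a different mount namespace.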
Each mount namespace has an owner user namespace. As explained above, when a new mount namespace is created, its mount point list is initialized as a copy of the mount point list of another mount namespace. If the new namespace and the namespace from which the mount point list was copied are owned by different user namespaces, then the new mount namespace is considered less privileged.
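Every mount namespace is owned by a user namespace. If the new mount namespace and the namespace its mount point list was copied from are owned by different user namespaces, the new mount namespace is treated as less privileged.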
A file or directory that is a mount point in one namespace that is not a mount point in another namespace, may be renamed, unlinked, or removed (rmdir(2)) in the mount namespace in which it is not a mount point (subject to the usual permission checks). Consequently, the mount point is removed in the mount namespace where it was a mount point.
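Put differently: if a file or directory is a mount point in one namespace but not in another, it can be renamed, unlinked, or removed (rmdir) from the namespace in which it is not a mount point, subject to the usual permission checks. When that happens, the mount point is also removed in the namespace where it was a mount point.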
Previously (before Linux 3.18), attempting to unlink, rename, or remove a file or directory that was a mount point in another mount namespace would result in the error EBUSY. That behavior had technical problems of enforcement (e.g., for NFS) and permitted denial-of-service attacks against more privileged users (i.e., preventing individual files from being updated by bind mounting on top of them).
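Before Linux 3.18, trying to unlink, rename, or remove a file or directory that was a mount point in another mount namespace failed with EBUSY. That rule was hard to enforce (for NFS, for example) and opened the door to denial-of-service attacks against more privileged users: a less privileged user could bind mount on top of a file to prevent it from being updated.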
SHARED SUBTREES: After the implementation of mount namespaces was completed, experience showed that the isolation that they provided was, in some cases, too great. For example, in order to make a newly loaded optical disk available in all mount namespaces, a mount operation was required in each namespace. For this use case, and others, the shared subtree feature was introduced in Linux 2.6.15. This feature allows for automatic, controlled propagation of mount and unmount events between namespaces (or, more precisely, between the members of a peer group that are propagating events to one another).
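Mount namespaces isolate mount points, but experience showed that in some scenarios this isolation goes too far. For example, to make a newly loaded optical disk available in every mount namespace, it had to be mounted once in each namespace, which is clearly inconvenient. Linux 2.6.15 introduced shared subtrees to solve this problem. The core idea is automatic, controlled propagation of mount and umount events between namespaces.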
Each mount point is marked (via mount(2)) as having one of the following propagation types:
1) MS_SHARED: This mount point shares events with members of a peer group. Mount and unmount events immediately under this mount point will propagate to the other mount points that are members of the peer group. Propagation here means that the same mount or unmount will automatically occur under all of the other mount points in the peer group. Conversely, mount and unmount events that take place under peer mount points will propagate to this mount point.
In short, a shared mount point exchanges events with its peer group in both directions: a mount or unmount directly under it is replayed under every other member of the group, and events under the other members are replayed under it.
2) MS_PRIVATE: This mount point is private; it does not have a peer group. Mount and unmount events do not propagate into or out of this mount point.
In short, a private mount point has no peer group, so mount and unmount events propagate neither into it nor out of it.
3) MS_SLAVE: Mount and unmount events propagate into this mount point from a (master) shared peer group. Mount and unmount events under this mount point do not propagate to any peer.
In short, a slave mount point receives events from its master peer group, but events under the slave itself are not propagated to any peer.
Note that a mount point can be the slave of another peer group while at the same time sharing mount and unmount events with a peer group of which it is a member. (More precisely, one peer group can be the slave of another peer group.)
4) MS_UNBINDABLE: This is like a private mount, and in addition this mount can't be bind mounted. Attempts to bind mount this mount (mount(2) with the MS_BIND flag) will fail. When a recursive bind mount (mount(2) with the MS_BIND and MS_REC flags) is performed on a directory subtree, any unbindable mounts within the subtree are automatically pruned (i.e., not replicated) when replicating that subtree to produce the target subtree.
In short, an unbindable mount behaves like a private mount that additionally cannot be bind mounted: a direct bind mount attempt fails, and a recursive bind mount of an enclosing subtree silently skips it.
For a discussion of the propagation type assigned to a new mount, see NOTES.
The propagation type is a per-mount-point setting; some mount points may be marked as shared (with each shared mount point being a member of a distinct peer group), while others are private (or slaved or unbindable).
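Propagation type is set per mount point: some mount points may be marked as shared (each shared mount point belonging to its own peer group), while others are private, slave, or unbindable.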
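The four propagation types above are applied to an existing mount point by calling mount(2) with exactly one of these flags, optionally combined with MS_REC to cover the whole subtree. In that form of the call the source, filesystem type, and data arguments are ignored. The sketch below assumes Linux with glibc and uses ctypes. The helper names `propagation_flags` and `set_propagation` are hypothetical, and the actual call requires CAP_SYS_ADMIN.

```python
import ctypes
import os

# Flag values as defined in <sys/mount.h>.
MS_REC = 16384
MS_UNBINDABLE = 1 << 17
MS_PRIVATE = 1 << 18
MS_SLAVE = 1 << 19
MS_SHARED = 1 << 20

PROPAGATION_FLAGS = {
    "shared": MS_SHARED,
    "private": MS_PRIVATE,
    "slave": MS_SLAVE,
    "unbindable": MS_UNBINDABLE,
}


def propagation_flags(kind, recursive=False):
    """Build the mountflags value for a propagation type change."""
    flags = PROPAGATION_FLAGS[kind]
    if recursive:
        flags |= MS_REC  # apply to every mount in the subtree as well
    return flags


def set_propagation(mount_point, kind, recursive=False):
    """Change the propagation type of an existing mount point via mount(2)."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_ulong, ctypes.c_void_p]
    libc.mount.restype = ctypes.c_int
    # source, filesystemtype and data are ignored for this kind of call.
    rc = libc.mount(None, os.fsencode(mount_point), None,
                    propagation_flags(kind, recursive), None)
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err), mount_point)
```

For example, `set_propagation("/", "private", recursive=True)` inside a freshly created mount namespace would stop later mounts there from propagating back to the original namespace, under the assumption that the caller has the needed privilege.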
Note that a mount's propagation type determines whether mounts and unmounts of mount points immediately under the mount point are propagated. Thus, the propagation type does not affect propagation of events for grandchildren and further removed descendant mount points. What happens if the mount point itself is unmounted is determined by the propagation type that is in effect for the parent of the mount point.
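A mount's propagation type decides only whether mount and unmount events immediately under that mount point are propagated. It therefore has no effect on events for grandchildren or more distant descendant mount points. What happens when the mount point itself is unmounted depends on the propagation type of its parent mount.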
Members are added to a peer group when a mount point is marked as shared and either:
* the mount point is replicated during the creation of a new mount namespace; or
* a new bind mount is created from the mount point.
In both of these cases, the new mount point joins the peer group of which the existing mount point is a member.
That is, in either case the new mount point becomes a member of the existing mount point's peer group.
A new peer group is also created when a child mount point is created under an existing mount point that is marked as shared. In this case, the new child mount point is also marked as shared and the resulting peer group consists of all the mount points that are replicated under the peers of parent mount.
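Creating a child mount point under an existing shared mount point also creates a new peer group. The new child is itself marked as shared, and the resulting peer group consists of all the mount points replicated under the peers of the parent mount.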
A mount ceases to be a member of a peer group when either the mount is explicitly unmounted, or when the mount is implicitly unmounted because a mount namespace is removed (because it has no more member processes).
A mount leaves its peer group when it is unmounted, either explicitly or implicitly because its mount namespace is removed once it has no more member processes.
The propagation type of the mount points in a mount namespace can be discovered via the "optional fields" exposed in /proc/[pid]/mountinfo. (See proc(5) for details of this file.) The following tags can appear in the optional fields for a record in that file:
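The propagation type of each mount point in a mount namespace can be read from the optional fields of /proc/[pid]/mountinfo. The tags below can appear in the optional fields of a record: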
shared:X This mount point is shared in peer group X. Each peer group has a unique ID that is automatically generated by the kernel, and all mount points in the same peer group will show the same ID. (These IDs are assigned starting from the value 1, and may be recycled when a peer group ceases to have any members.)
master:X This mount is a slave to shared peer group X.
propagate_from:X (since Linux 2.6.26) This mount is a slave and receives propagation from shared peer group X. This tag will always appear in conjunction with a master:X tag. Here, X is the closest dominant peer group under the process's root directory. If X is the immediate master of the mount, or if there is no dominant peer group under the same root, then only the master:X field is present and not the propagate_from:X field. For further details, see below.
unbindable This is an unbindable mount.
If none of the above tags is present, then this is a private mount.
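The tag rules above can be turned into a small parser. The following is a sketch, assuming the record layout documented in proc(5): mount ID, parent ID, major:minor, root, mount point, mount options, zero or more optional fields, a lone "-" separator, then filesystem type, source, and super options. The function names `propagation_of` and `describe` are invented for this example.

```python
def propagation_of(mountinfo_line):
    """Return (mount_point, info) for one /proc/[pid]/mountinfo record."""
    fields = mountinfo_line.split()
    sep = fields.index("-", 6)   # optional fields end at the lone "-"
    mount_point = fields[4]
    info = {"shared": None, "master": None,
            "propagate_from": None, "unbindable": False}
    for tag in fields[6:sep]:
        if tag == "unbindable":
            info["unbindable"] = True
            continue
        name, _, value = tag.partition(":")
        if name in info and value.isdigit():
            info[name] = int(value)  # peer group ID
    return mount_point, info


def describe(info):
    """Summarize the propagation type implied by the parsed tags."""
    parts = []
    if info["shared"] is not None:
        parts.append(f"shared in peer group {info['shared']}")
    if info["master"] is not None:
        parts.append(f"slave of peer group {info['master']}")
    if info["unbindable"]:
        parts.append("unbindable")
    # No tags at all means the mount is private.
    return ", ".join(parts) or "private"


if __name__ == "__main__":
    with open("/proc/self/mountinfo") as f:
        for line in f:
            mount_point, info = propagation_of(line)
            print(f"{mount_point}: {describe(info)}")
```

A mount can carry both a shared:X and a master:X tag, which corresponds to the earlier note that a mount point can be a slave of one peer group while sharing events with another.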
References
https://segmentfault.com/a/1190000006912742
https://www.cnblogs.com/sparkdev/p/9424649.html
https://man7.org/linux/man-pages/man7/mount_namespaces.7.html