Wednesday, November 3, 2010

per-process mount namespace

systemd uses the following piece of code to read the mount information of the system:
m->proc_self_mountinfo = fopen("/proc/self/mountinfo", "re")
The odd part is that the mountinfo file lives under the self directory rather than at the top level of /proc. In other words, it is per-process, not system-wide.
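To see what the file actually holds, here is a minimal sketch that opens it the same way and prints the mount ID and mount point of each entry. The helper name list_mounts is made up for this post and is not part of systemd. It assumes the field layout documented in proc(5), where the mount point is the fifth whitespace-separated field, and it does not decode escapes such as \040 for spaces. The "e" in the mode string is a glibc extension that opens the file with O_CLOEXEC.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Hypothetical helper (not from systemd): reads a mountinfo-style file,
 * optionally prints "<mount ID>\t<mount point>" for each entry to `out`,
 * and returns the number of entries parsed, or -1 if the file cannot
 * be opened.
 */
int list_mounts(const char *path, FILE *out)
{
    assert(path != NULL);

    /* "r" = read, "e" = O_CLOEXEC (glibc extension), as in the systemd call */
    FILE *f = fopen(path, "re");
    if (!f)
        return -1;

    char *line = NULL;
    size_t cap = 0;
    int count = 0;

    while (getline(&line, &cap, f) != -1) {
        int id, parent;
        char mount_point[4096];

        /* Fields: mount ID, parent ID, major:minor, root, mount point, ... */
        if (sscanf(line, "%d %d %*s %*s %4095s", &id, &parent, mount_point) == 3) {
            if (out)
                fprintf(out, "%d\t%s\n", id, mount_point);
            count++;
        }
    }

    free(line);
    fclose(f);
    return count;
}
```

Calling list_mounts("/proc/self/mountinfo", stdout) from a small main() lists the mounts visible to the calling process. Passing /proc/[pid]/mountinfo for another process lists the mounts in that process's namespace instead, which may differ.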

So... it turns out there is such a thing as a per-process mount namespace.
$ man proc
    /proc/[pid]/mountinfo (since Linux 2.6.26)
        This file contains information about mount points.

    /proc/mounts
        Before kernel 2.4.19, this file was a list of all the file systems currently
        mounted on the system.  With the introduction of per-process mount
        namespaces in Linux 2.4.19, this file became a link to
        /proc/self/mounts, which lists the mount points of the process's own mount
        namespace.
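
The "link" part is easy to check with readlink(2). This is only a sketch: proc_mounts_target is a name invented for this post, and the exact link target depends on the kernel (on the systems I would expect it to be "self/mounts", but treat that as an assumption).

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: stores the target of the /proc/mounts symlink
 * in buf. Returns 0 on success, -1 if /proc/mounts cannot be read as
 * a symlink or size is 0.
 */
int proc_mounts_target(char *buf, size_t size)
{
    if (size == 0)
        return -1;

    ssize_t n = readlink("/proc/mounts", buf, size - 1);
    if (n < 0)
        return -1;

    buf[n] = '\0';   /* readlink() does not NUL-terminate */
    return 0;
}
```

Printing the buffer after a successful call shows where the link points for the current process.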

$ man 2 mount
    Per-process Namespaces
        Starting with kernel 2.4.19, Linux provides per-process mount namespaces.  A
        mount namespace is the set of file system mounts that are visible to a process.
        Mount-point namespaces can be (and usually are) shared between multiple
        processes, and changes to the namespace (i.e., mounts and unmounts) by one
        process are visible to all other processes sharing the same namespace.  (The
        pre-2.4.19 Linux situation can be considered as one in which a single
        namespace was shared by every process on the system.)

        A child process created by fork(2) shares its parent's mount namespace; the
        mount namespace is preserved across an execve(2).

        A process can obtain a private mount namespace if: it was created using the
        clone() CLONE_NEWNS flag, in which case its new namespace is initialized to be
        a copy of the namespace of the process that called clone(); or it calls
        unshare(2) with the CLONE_NEWNS flag, which causes the caller's mount
        namespace to obtain a private copy of the namespace that it was previously
        sharing with other processes, so that future mounts and unmounts by the caller
        are invisible to other processes (except child processes that the caller
        subsequently creates) and vice versa.
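
The unshare(2) route described above can be tried out with a small sketch. The function name try_private_mount_ns is made up for this post. It forks a child so the caller's own namespace is left alone, and in the child it compares the /proc/self/ns/mnt link before and after unshare(CLONE_NEWNS). Two assumptions: that link only exists on kernels much newer than 2.4.19 (it appeared around Linux 3.8), and unshare(CLONE_NEWNS) normally needs CAP_SYS_ADMIN, so without root the call is expected to fail.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for unshare() and CLONE_NEWNS */
#endif
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Reads the namespace identity (a string like "mnt:[...]") into buf. */
static int mount_ns_id(char *buf, size_t size)
{
    ssize_t n = readlink("/proc/self/ns/mnt", buf, size - 1);
    if (n < 0)
        return -1;
    buf[n] = '\0';
    return 0;
}

/*
 * Hypothetical helper: returns 1 if a child process obtained a private
 * mount namespace, 0 if unshare() was refused (typically missing
 * privileges), and -1 on any other error.
 */
int try_private_mount_ns(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: the fork()ed process starts out sharing the parent's namespace. */
        char before[64], after[64];

        if (mount_ns_id(before, sizeof before) < 0)
            _exit(2);
        if (unshare(CLONE_NEWNS) < 0) {
            perror("unshare(CLONE_NEWNS)");
            _exit(1);
        }
        if (mount_ns_id(after, sizeof after) < 0)
            _exit(2);

        /* A different link target means we now hold a private copy. */
        _exit(strcmp(before, after) != 0 ? 0 : 2);
    }

    /* Parent: translate the child's exit status into the return value. */
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;

    switch (WEXITSTATUS(status)) {
    case 0:  return 1;
    case 1:  return 0;
    default: return -1;
    }
}
```

Run as root, the child ends up in its own namespace, and any mount(2) or umount(2) it performs afterwards would, depending on mount propagation settings, not show up in the parent's /proc/self/mountinfo.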