From drozdziak1 at gmail.com Tue Aug 1 12:04:41 2017 From: drozdziak1 at gmail.com (Stanislaw Drozd) Date: Tue, 1 Aug 2017 14:04:41 +0200 Subject: Hello Message-ID: Hello, I'm looking for namespace and cgroups bugs to fix in the kernel. Is this the right list to find something like that? Stan From rppt at linux.vnet.ibm.com Tue Aug 1 16:13:21 2017 From: rppt at linux.vnet.ibm.com (Mike Rapoprt) Date: Tue, 1 Aug 2017 19:13:21 +0300 Subject: [ANNOUNCE] Checkpoint/Restore micro-conference at Linux Plumbers Message-ID: <20170801161320.GA13504@rapoport-lnx> Good news, everyone! The checkpoint/restore micro-conference has been accepted into Linux Plumbers [1] :) The Linux Plumbers Conference (LPC) is a developer conference for the open source community. The LPC brings together the top developers working on the "plumbing" of Linux (kernel subsystems, core libraries, windowing systems, etc.) and gives them three days to work together on core design problems. The conference is divided into several working sessions focusing on different "plumbing" topics, and checkpoint/restore is one such topic. This year the checkpoint/restore MC is going to shift its focus towards discussion of problems and new ideas; presentations and demos are no less welcome :) Talks and discussion topics can be submitted at [2]. The CFP closes on August 15th. [1] https://www.linuxplumbersconf.org/2017/checkpoint-restart-microconference-accepted-into-the-linux-plumbers-conference/ [2] https://linuxplumbersconf.org/2017/ocw/events/LPC2017/proposals/new -- Sincerely yours, Mike.
From tycho at docker.com Tue Aug 1 17:17:02 2017 From: tycho at docker.com (Tycho Andersen) Date: Tue, 1 Aug 2017 11:17:02 -0600 Subject: [RFC PATCH 3/5] ima: mamespace audit status flags In-Reply-To: <20170720225033.21298-4-mkayaalp@linux.vnet.ibm.com> References: <20170720225033.21298-1-mkayaalp@linux.vnet.ibm.com> <20170720225033.21298-4-mkayaalp@linux.vnet.ibm.com> Message-ID: <20170801171702.f2szj5huzbt7fdfl@docker> Hi Mehmet, On Thu, Jul 20, 2017 at 06:50:31PM -0400, Mehmet Kayaalp wrote: > --- a/security/integrity/ima/ima_ns.c > +++ b/security/integrity/ima/ima_ns.c > @@ -301,3 +301,24 @@ struct ns_status *ima_get_ns_status(struct ima_namespace *ns, > > return status; > } > + > +#define IMA_NS_STATUS_ACTIONS IMA_AUDIT > +#define IMA_NS_STATUS_FLAGS IMA_AUDITED > + Seems like these are defined in ima.h above in the patch, and re-defined here? > +unsigned long iint_flags(struct integrity_iint_cache *iint, > + struct ns_status *status) > +{ > + if (!status) > + return iint->flags; > + > + return iint->flags & (status->flags & IMA_NS_STATUS_FLAGS); Just to confirm, is there any situation where: iint->flags & IMA_NS_STATUS_FLAGS != status->flags & IMA_NS_STATUS_FLAGS ? i.e. 
can this line just be: return status->flags & IMA_NS_STATUS_FLAGS; Tycho > +} > + > +unsigned long set_iint_flags(struct integrity_iint_cache *iint, > + struct ns_status *status, unsigned long flags) > +{ > + iint->flags = flags; > + if (status) > + status->flags = flags & IMA_NS_STATUS_FLAGS; > + return flags; > +} > -- > 2.9.4 > From mkayaalp at linux.vnet.ibm.com Tue Aug 1 17:25:31 2017 From: mkayaalp at linux.vnet.ibm.com (Mehmet Kayaalp) Date: Tue, 1 Aug 2017 13:25:31 -0400 Subject: [RFC PATCH 3/5] ima: mamespace audit status flags In-Reply-To: <20170801171702.f2szj5huzbt7fdfl@docker> References: <20170720225033.21298-1-mkayaalp@linux.vnet.ibm.com> <20170720225033.21298-4-mkayaalp@linux.vnet.ibm.com> <20170801171702.f2szj5huzbt7fdfl@docker> Message-ID: <2848EE0A-2DB8-420B-A611-60967EB90F5C@linux.vnet.ibm.com> > On Aug 1, 2017, at 1:17 PM, Tycho Andersen wrote: > > Hi Mehmet, > > On Thu, Jul 20, 2017 at 06:50:31PM -0400, Mehmet Kayaalp wrote: >> --- a/security/integrity/ima/ima_ns.c >> +++ b/security/integrity/ima/ima_ns.c >> @@ -301,3 +301,24 @@ struct ns_status *ima_get_ns_status(struct ima_namespace *ns, >> >> return status; >> } >> + >> +#define IMA_NS_STATUS_ACTIONS IMA_AUDIT >> +#define IMA_NS_STATUS_FLAGS IMA_AUDITED >> + > > Seems like these are defined in ima.h above in the patch, and > re-defined here? Yes, it should be in the ima.h only. >> +unsigned long iint_flags(struct integrity_iint_cache *iint, >> + struct ns_status *status) >> +{ >> + if (!status) >> + return iint->flags; >> + >> + return iint->flags & (status->flags & IMA_NS_STATUS_FLAGS); > > Just to confirm, is there any situation where: > > iint->flags & IMA_NS_STATUS_FLAGS != status->flags & IMA_NS_STATUS_FLAGS > > ? i.e. can this line just be: > > return status->flags & IMA_NS_STATUS_FLAGS; > As Guilherme had pointed out, the first & should be |. 
Mehmet From caosf.fnst at cn.fujitsu.com Wed Aug 2 06:37:29 2017 From: caosf.fnst at cn.fujitsu.com (Cao Shufeng) Date: Wed, 2 Aug 2017 14:37:29 +0800 Subject: [PATCH_v4.1_3/3] Make core_pattern support namespace In-Reply-To: <1501655849-9149-1-git-send-email-caosf.fnst@cn.fujitsu.com> References: <1501655849-9149-1-git-send-email-caosf.fnst@cn.fujitsu.com> Message-ID: <1501655849-9149-4-git-send-email-caosf.fnst@cn.fujitsu.com> Currently, every container shares a single copy of the coredump setting with the host system: if the host changes the setting, every running container is affected. Likewise, when a container changes core_pattern, both the host and the other containers are affected. For containers built on namespaces, it is better to let each container keep its own coredump setting. This brings the following benefits: 1: Each container can change its own coredump setting by writing to /proc/sys/kernel/core_pattern. 2: A coredump setting changed in the host will not affect running containers. 3: Both "putting the coredump in the guest" and "putting the coredump in the host" are supported. Any namespace-based software (lxc, docker, ...) can use this feature to customize its dump setting. It also makes each container behave like a separate system, which fits the design goal of namespaces.
Test(in lxc): # In the host # ---------------- # echo host_core >/proc/sys/kernel/core_pattern # cat /proc/sys/kernel/core_pattern host_core # ulimit -c 1024000 # ./make_dump Segmentation fault (core dumped) # ls -l -rw------- 1 root root 331776 Feb 4 18:02 host_core.2175 -rwxr-xr-x 1 root root 759731 Feb 4 18:01 make_dump # # In the container # ---------------- # cat /proc/sys/kernel/core_pattern host_core # echo container_core >/proc/sys/kernel/core_pattern # ./make_dump Segmentation fault (core dumped) # ls -l -rwxr-xr-x 1 root root 759731 Feb 4 10:45 make_dump -rw------- 1 root root 331776 Feb 4 10:45 container_core.16 # # Return to host # ---------------- # cat /proc/sys/kernel/core_pattern host_core # ls host_core.2175 make_dump make_dump.c # rm -f host_core.2175 # ./make_dump Segmentation fault (core dumped) # ls -l -rw------- 1 root root 331776 Feb 4 18:49 host_core.2351 -rwxr-xr-x 1 root root 759731 Feb 4 18:01 make_dump # --- fs/coredump.c | 25 ++++++++++++++++------ include/linux/pid_namespace.h | 3 +++ kernel/pid.c | 2 ++ kernel/pid_namespace.c | 2 ++ kernel/sysctl.c | 50 ++++++++++++++++++++++++++++++++++++++----- 5 files changed, 70 insertions(+), 12 deletions(-) diff --git a/fs/coredump.c b/fs/coredump.c index 745c757..b0ab533 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -52,7 +52,6 @@ int core_uses_pid; unsigned int core_pipe_limit; -char core_pattern[CORENAME_MAX_SIZE] = "core"; static int core_name_size = CORENAME_MAX_SIZE; struct core_name { @@ -60,8 +59,6 @@ struct core_name { int used, size; }; -/* The maximal length of core_pattern is also specified in sysctl.c */ - static int expand_corename(struct core_name *cn, int size) { char *corename = krealloc(cn->corename, size, GFP_KERNEL); @@ -186,10 +183,10 @@ static int cn_print_exe_file(struct core_name *cn) * name into corename, which must have space for at least * CORENAME_MAX_SIZE bytes plus one byte for the zero terminator. 
*/ -static int format_corename(struct core_name *cn, struct coredump_params *cprm) +static int format_corename(struct core_name *cn, const char *pat_ptr, + struct coredump_params *cprm) { const struct cred *cred = current_cred(); - const char *pat_ptr = core_pattern; int ispipe = (*pat_ptr == '|'); int pid_in_pattern = 0; int err = 0; @@ -668,6 +665,8 @@ void do_coredump(const siginfo_t *siginfo) */ .mm_flags = mm->flags, }; + struct pid_namespace *pid_ns; + char core_pattern[CORENAME_MAX_SIZE]; audit_core_dumps(siginfo->si_signo); @@ -677,6 +676,18 @@ void do_coredump(const siginfo_t *siginfo) if (!__get_dumpable(cprm.mm_flags)) goto fail; + pid_ns = task_active_pid_ns(current); + spin_lock(&pid_ns->core_pattern_lock); + while (pid_ns != &init_pid_ns) { + if (pid_ns->core_pattern[0]) + break; + spin_unlock(&pid_ns->core_pattern_lock); + pid_ns = pid_ns->parent, + spin_lock(&pid_ns->core_pattern_lock); + } + strcpy(core_pattern, pid_ns->core_pattern); + spin_unlock(&pid_ns->core_pattern_lock); + cred = prepare_creds(); if (!cred) goto fail; @@ -698,7 +709,7 @@ void do_coredump(const siginfo_t *siginfo) old_cred = override_creds(cred); - ispipe = format_corename(&cn, &cprm); + ispipe = format_corename(&cn, core_pattern, &cprm); if (ispipe) { int dump_count; @@ -745,7 +756,7 @@ void do_coredump(const siginfo_t *siginfo) } rcu_read_lock(); - vinit_task = find_task_by_vpid(1); + vinit_task = find_task_by_pid_ns(1, pid_ns); rcu_read_unlock(); if (!vinit_task) { printk(KERN_WARNING "failed getting init task info, skipping core dump\n"); diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h index c2a989d..67f70de 100644 --- a/include/linux/pid_namespace.h +++ b/include/linux/pid_namespace.h @@ -9,6 +9,7 @@ #include #include #include +#include struct pidmap { atomic_t nr_free; @@ -52,6 +53,8 @@ struct pid_namespace { int hide_pid; int reboot; /* group exit code if this pidns was rebooted */ struct ns_common ns; + spinlock_t core_pattern_lock; + char 
core_pattern[CORENAME_MAX_SIZE]; }; extern struct pid_namespace init_pid_ns; diff --git a/kernel/pid.c b/kernel/pid.c index 731c4e5..c8cc65d 100644 --- a/kernel/pid.c +++ b/kernel/pid.c @@ -82,6 +82,8 @@ struct pid_namespace init_pid_ns = { #ifdef CONFIG_PID_NS .ns.ops = &pidns_operations, #endif + .core_pattern_lock = __SPIN_LOCK_UNLOCKED(init_pid_ns.core_pattern_lock), + .core_pattern = "core", }; EXPORT_SYMBOL_GPL(init_pid_ns); diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c index 74a5a72..c6540c6 100644 --- a/kernel/pid_namespace.c +++ b/kernel/pid_namespace.c @@ -140,6 +140,8 @@ static struct pid_namespace *create_pid_namespace(struct user_namespace *user_ns for (i = 1; i < PIDMAP_ENTRIES; i++) atomic_set(&ns->pidmap[i].nr_free, BITS_PER_PAGE); + spin_lock_init(&ns->core_pattern_lock); + return ns; out_free_map: diff --git a/kernel/sysctl.c b/kernel/sysctl.c index 4dfba1a..c841d5d 100644 --- a/kernel/sysctl.c +++ b/kernel/sysctl.c @@ -478,7 +478,7 @@ static struct ctl_table kern_table[] = { }, { .procname = "core_pattern", - .data = core_pattern, + .data = NULL, .maxlen = CORENAME_MAX_SIZE, .mode = 0644, .proc_handler = proc_dostring_coredump, @@ -2393,6 +2393,12 @@ int proc_dointvec_minmax(struct ctl_table *table, int write, static void validate_coredump_safety(void) { #ifdef CONFIG_COREDUMP + struct pid_namespace *pid_ns = task_active_pid_ns(current); + const char *core_pattern; + + spin_lock(&pid_ns->core_pattern_lock); + core_pattern = pid_ns->core_pattern; + if (suid_dumpable == SUID_DUMP_ROOT && core_pattern[0] != '/' && core_pattern[0] != '|') { printk(KERN_WARNING @@ -2401,6 +2407,8 @@ static void validate_coredump_safety(void) "Set kernel.core_pattern before fs.suid_dumpable.\n" ); } + + spin_unlock(&pid_ns->core_pattern_lock); #endif } @@ -2417,10 +2425,42 @@ static int proc_dointvec_minmax_coredump(struct ctl_table *table, int write, static int proc_dostring_coredump(struct ctl_table *table, int write, void __user *buffer, size_t *lenp, 
loff_t *ppos) { - int error = proc_dostring(table, write, buffer, lenp, ppos); - if (!error) - validate_coredump_safety(); - return error; + int ret; + char core_pattern[CORENAME_MAX_SIZE]; + struct pid_namespace *pid_ns = task_active_pid_ns(current); + + if (write) { + if (*ppos && sysctl_writes_strict == SYSCTL_WRITES_WARN) + warn_sysctl_write(table); + + ret = _proc_do_string(core_pattern, table->maxlen, write, + (char __user *)buffer, lenp, ppos); + if (ret) + return ret; + + spin_lock(&pid_ns->core_pattern_lock); + strcpy(pid_ns->core_pattern, core_pattern); + spin_unlock(&pid_ns->core_pattern_lock); + } else { + spin_lock(&pid_ns->core_pattern_lock); + while (pid_ns != &init_pid_ns) { + if (pid_ns->core_pattern[0]) + break; + spin_unlock(&pid_ns->core_pattern_lock); + pid_ns = pid_ns->parent, + spin_lock(&pid_ns->core_pattern_lock); + } + strcpy(core_pattern, pid_ns->core_pattern); + spin_unlock(&pid_ns->core_pattern_lock); + + ret = _proc_do_string(core_pattern, table->maxlen, write, + (char __user *)buffer, lenp, ppos); + if (ret) + return ret; + } + + validate_coredump_safety(); + return 0; } #endif -- 2.9.3 From caosf.fnst at cn.fujitsu.com Wed Aug 2 06:37:28 2017 From: caosf.fnst at cn.fujitsu.com (Cao Shufeng) Date: Wed, 2 Aug 2017 14:37:28 +0800 Subject: [PATCH_v4.1_2/3] Limit dump_pipe program's permission to init for container In-Reply-To: <1501655849-9149-1-git-send-email-caosf.fnst@cn.fujitsu.com> References: <1501655849-9149-1-git-send-email-caosf.fnst@cn.fujitsu.com> Message-ID: <1501655849-9149-3-git-send-email-caosf.fnst@cn.fujitsu.com> Currently when we set core_pattern to a pipe, the pipe program is forked by kthread running with root's permission, and write dumpfile into host's filesystem. Same thing happened for container, the dumper and dumpfile are also in host(not in container). 
This has the following problems: 1: It is inconsistent with a file-type core_pattern. When core_pattern is set to a file, the container writes the dump into the container's filesystem rather than the host's. 2: It is not safe for privileged containers. In a privileged container, a user can destroy the host system with the following commands: # # In a container # echo "|/bin/dd of=/boot/vmlinuz" >/proc/sys/kernel/core_pattern # make_dump This patch switches the dumper program's environment to that of the init task, so that in a container the dumper program has the same environment as the container's init task. As a result, the dumper program is looked up in the container's filesystem and writes the coredump into the container's filesystem. The dumper's permissions are also limited to a subset of those of the container's init process. Suggested-by: Eric W. Biederman Suggested-by: KOSAKI Motohiro Signed-off-by: Cao ShuFeng --- fs/coredump.c | 126 +++++++++++++++++++++++++++++++++++++++++++++++- include/linux/binfmts.h | 2 + 2 files changed, 126 insertions(+), 2 deletions(-) diff --git a/fs/coredump.c b/fs/coredump.c index 802f434..745c757 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -507,6 +507,45 @@ static void wait_for_dump_helpers(struct file *file) } /* + * umh_ns_setup + * set the namespaces to the base task of a container. + * we need to switch back to the original namespaces + * so that the thread of workqueue is not influenced. + * + * this method runs in workqueue kernel thread. + */ +static void umh_ns_setup(struct subprocess_info *info) +{ + struct coredump_params *cp = (struct coredump_params *)info->data; + struct task_struct *base_task = cp->base_task; + + if (base_task) { + cp->current_task_nsproxy = current->nsproxy; + //prevent current namespace from being freed + get_nsproxy(current->nsproxy); + /* Set namespaces to base_task */ + get_nsproxy(base_task->nsproxy); + switch_task_namespaces(current, base_task->nsproxy); + } +} + +/* + * umh_ns_cleanup + * cleanup what we have done in umh_ns_setup. + * + * this method runs in workqueue kernel thread. 
+ */ +static void umh_ns_cleanup(struct subprocess_info *info) +{ + struct coredump_params *cp = (struct coredump_params *)info->data; + struct nsproxy *current_task_nsproxy = cp->current_task_nsproxy; + if (current_task_nsproxy) { + /* switch workqueue's original namespace back */ + switch_task_namespaces(current, current_task_nsproxy); + } +} + +/* * umh_pipe_setup * helper function to customize the process used * to collect the core in userspace. Specifically @@ -521,6 +560,8 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new) { struct file *files[2]; struct coredump_params *cp = (struct coredump_params *)info->data; + struct task_struct *base_task; + int err = create_pipe_files(files, 0); if (err) return err; @@ -529,10 +570,76 @@ static int umh_pipe_setup(struct subprocess_info *info, struct cred *new) err = replace_fd(0, files[0], 0); fput(files[0]); + if (err) + return err; + /* and disallow core files too */ current->signal->rlim[RLIMIT_CORE] = (struct rlimit){1, 1}; - return err; + base_task = cp->base_task; + if (base_task) { + const struct cred *base_cred; + + /* Set fs_root to base_task */ + spin_lock(&base_task->fs->lock); + set_fs_root(current->fs, &base_task->fs->root); + set_fs_pwd(current->fs, &base_task->fs->pwd); + spin_unlock(&base_task->fs->lock); + + /* Set cgroup to base_task */ + current->flags &= ~PF_NO_SETAFFINITY; + err = cgroup_attach_task_all(base_task, current); + if (err < 0) + return err; + + /* Set cred to base_task */ + base_cred = get_task_cred(base_task); + + new->uid = base_cred->uid; + new->gid = base_cred->gid; + new->suid = base_cred->suid; + new->sgid = base_cred->sgid; + new->euid = base_cred->euid; + new->egid = base_cred->egid; + new->fsuid = base_cred->fsuid; + new->fsgid = base_cred->fsgid; + + new->securebits = base_cred->securebits; + + new->cap_inheritable = base_cred->cap_inheritable; + new->cap_permitted = base_cred->cap_permitted; + new->cap_effective = base_cred->cap_effective; + 
new->cap_bset = base_cred->cap_bset; + new->cap_ambient = base_cred->cap_ambient; + + security_cred_free(new); +#ifdef CONFIG_SECURITY + new->security = NULL; +#endif + err = security_prepare_creds(new, base_cred, GFP_KERNEL); + if (err < 0) { + put_cred(base_cred); + return err; + } + + free_uid(new->user); + new->user = base_cred->user; + get_uid(new->user); + + put_user_ns(new->user_ns); + new->user_ns = base_cred->user_ns; + get_user_ns(new->user_ns); + + put_group_info(new->group_info); + new->group_info = base_cred->group_info; + get_group_info(new->group_info); + + put_cred(base_cred); + + validate_creds(new); + } + + return 0; } void do_coredump(const siginfo_t *siginfo) @@ -595,6 +702,7 @@ void do_coredump(const siginfo_t *siginfo) if (ispipe) { int dump_count; + struct task_struct *vinit_task; char **helper_argv; struct subprocess_info *sub_info; @@ -636,6 +744,15 @@ void do_coredump(const siginfo_t *siginfo) goto fail_dropcount; } + rcu_read_lock(); + vinit_task = find_task_by_vpid(1); + rcu_read_unlock(); + if (!vinit_task) { + printk(KERN_WARNING "failed getting init task info, skipping core dump\n"); + goto fail_dropcount; + } + + helper_argv = argv_split(GFP_KERNEL, cn.corename, NULL); if (!helper_argv) { printk(KERN_WARNING "%s failed to allocate memory\n", @@ -643,15 +760,20 @@ void do_coredump(const siginfo_t *siginfo) goto fail_dropcount; } + get_task_struct(vinit_task); + + cprm.base_task = vinit_task; + retval = -ENOMEM; sub_info = call_usermodehelper_setup(helper_argv[0], helper_argv, NULL, GFP_KERNEL, - NULL, NULL, umh_pipe_setup, + umh_ns_setup, umh_ns_cleanup, umh_pipe_setup, NULL, &cprm); if (sub_info) retval = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC); + put_task_struct(vinit_task); argv_free(helper_argv); if (retval) { printk(KERN_INFO "Core dump to |%s pipe failed\n", diff --git a/include/linux/binfmts.h b/include/linux/binfmts.h index 05488da..fa13104 100644 --- a/include/linux/binfmts.h +++ b/include/linux/binfmts.h @@ -61,6 
+61,8 @@ struct linux_binprm { /* Function parameter for binfmt->coredump */ struct coredump_params { + struct task_struct *base_task; + struct nsproxy *current_task_nsproxy; const siginfo_t *siginfo; struct pt_regs *regs; struct file *file; -- 2.9.3 From caosf.fnst at cn.fujitsu.com Wed Aug 2 06:37:27 2017 From: caosf.fnst at cn.fujitsu.com (Cao Shufeng) Date: Wed, 2 Aug 2017 14:37:27 +0800 Subject: [PATCH_v4.1_1/3] Make call_usermodehelper_exec possible to set namespaces In-Reply-To: <1501655849-9149-1-git-send-email-caosf.fnst@cn.fujitsu.com> References: <1501655849-9149-1-git-send-email-caosf.fnst@cn.fujitsu.com> Message-ID: <1501655849-9149-2-git-send-email-caosf.fnst@cn.fujitsu.com> Currently, call_usermodehelper_work() cannot set namespaces for the program it executes. This patch adds that capability to call_usermodehelper_work(). The init_intermediate callback is introduced for initialization work that must be done before fork(), which gives us a way to set namespaces for the children. The cleanup_intermediate callback is introduced to undo what was done in init_intermediate, such as switching the namespace back. This is useful for coredump, which needs to run the pipe program in a specific container environment.
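The callback ordering described in this commit message can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct helper_info`, `run_helper()` and the `sample_*` hooks are hypothetical names, not kernel API, and the "child" is modelled as an ordinary function call instead of a real fork, so the concurrency between parent and child that exists in the kernel is not represented.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified stand-in for struct subprocess_info. */
struct helper_info {
	void (*init_intermediate)(struct helper_info *info);
	void (*cleanup_intermediate)(struct helper_info *info);
	int (*init)(struct helper_info *info);
	char trace[64];	/* records the order in which the hooks ran */
	int cleaned;	/* set once the parent-side cleanup has run */
};

/* Append one step marker to the trace, never overflowing the buffer. */
static void trace_add(struct helper_info *info, const char *step)
{
	strncat(info->trace, step,
		sizeof(info->trace) - strlen(info->trace) - 1);
}

/* Sample hooks that only record that they were called. */
static void sample_init_intermediate(struct helper_info *info)
{
	trace_add(info, "I");	/* parent side, before the "fork" */
}

static void sample_cleanup_intermediate(struct helper_info *info)
{
	trace_add(info, "C");	/* parent side, after the "fork" */
}

static int sample_init(struct helper_info *info)
{
	trace_add(info, "i");	/* would run in the child, before exec */
	return 0;
}

/*
 * Simulated worker: parent-side setup, then the child-side init
 * (a plain call here, an assumption of this sketch), then the
 * parent-side cleanup. Missing hooks are simply skipped.
 */
static int run_helper(struct helper_info *info)
{
	int ret = 0;

	if (info->init_intermediate)
		info->init_intermediate(info);

	if (info->init)
		ret = info->init(info);

	if (info->cleanup_intermediate)
		info->cleanup_intermediate(info);
	info->cleaned = 1;

	return ret;
}
```

A caller would fill in the three hooks and call `run_helper()`. The point of the split is that whatever `init_intermediate` changes in the parent (for example its namespaces) is inherited by the child, while `cleanup_intermediate` restores the parent afterwards.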
Signed-off-by: Cao Shufeng --- fs/coredump.c | 3 ++- include/linux/kmod.h | 5 ++++ init/do_mounts_initrd.c | 3 ++- kernel/kmod.c | 56 +++++++++++++++++++++++++++++++++++++-------- lib/kobject_uevent.c | 3 ++- security/keys/request_key.c | 4 ++-- 6 files changed, 59 insertions(+), 15 deletions(-) diff --git a/fs/coredump.c b/fs/coredump.c index 5926837..802f434 100644 --- a/fs/coredump.c +++ b/fs/coredump.c @@ -646,7 +646,8 @@ void do_coredump(const siginfo_t *siginfo) retval = -ENOMEM; sub_info = call_usermodehelper_setup(helper_argv[0], helper_argv, NULL, GFP_KERNEL, - umh_pipe_setup, NULL, &cprm); + NULL, NULL, umh_pipe_setup, + NULL, &cprm); if (sub_info) retval = call_usermodehelper_exec(sub_info, UMH_WAIT_EXEC); diff --git a/include/linux/kmod.h b/include/linux/kmod.h index c4e441e..bb4e1a6 100644 --- a/include/linux/kmod.h +++ b/include/linux/kmod.h @@ -61,6 +61,9 @@ struct subprocess_info { char **envp; int wait; int retval; + bool cleaned; + void (*init_intermediate)(struct subprocess_info *info); + void (*cleanup_intermediate)(struct subprocess_info *info); int (*init)(struct subprocess_info *info, struct cred *new); void (*cleanup)(struct subprocess_info *info); void *data; @@ -72,6 +75,8 @@ call_usermodehelper(const char *path, char **argv, char **envp, int wait); extern struct subprocess_info * call_usermodehelper_setup(const char *path, char **argv, char **envp, gfp_t gfp_mask, + void (*init_intermediate)(struct subprocess_info *info), + void (*cleanup_intermediate)(struct subprocess_info *info), int (*init)(struct subprocess_info *info, struct cred *new), void (*cleanup)(struct subprocess_info *), void *data); diff --git a/init/do_mounts_initrd.c b/init/do_mounts_initrd.c index a1000ca..59d11c9 100644 --- a/init/do_mounts_initrd.c +++ b/init/do_mounts_initrd.c @@ -72,7 +72,8 @@ static void __init handle_initrd(void) current->flags |= PF_FREEZER_SKIP; info = call_usermodehelper_setup("/linuxrc", argv, envp_init, - GFP_KERNEL, init_linuxrc, NULL, NULL); 
+ GFP_KERNEL, NULL, NULL, init_linuxrc, + NULL, NULL); if (!info) return; call_usermodehelper_exec(info, UMH_WAIT_PROC); diff --git a/kernel/kmod.c b/kernel/kmod.c index 563f97e..f75725b 100644 --- a/kernel/kmod.c +++ b/kernel/kmod.c @@ -41,6 +41,7 @@ #include #include #include +#include #include #include @@ -93,7 +94,8 @@ static int call_modprobe(char *module_name, int wait) argv[4] = NULL; info = call_usermodehelper_setup(modprobe_path, argv, envp, GFP_KERNEL, - NULL, free_modprobe_argv, NULL); + NULL, NULL, NULL, free_modprobe_argv, + NULL); if (!info) goto free_module_name; @@ -207,8 +209,15 @@ static void umh_complete(struct subprocess_info *sub_info) */ if (comp) complete(comp); - else + else { + for(;;) { + if (sub_info->cleaned == false) + udelay(20); + else + break; + } call_usermodehelper_freeinfo(sub_info); + } } /* @@ -302,7 +311,10 @@ static void call_usermodehelper_exec_sync(struct subprocess_info *sub_info) /* Restore default kernel sig handler */ kernel_sigaction(SIGCHLD, SIG_IGN); - + if(sub_info->cleanup_intermediate) { + sub_info->cleanup_intermediate(sub_info); + } + sub_info->cleaned = true; umh_complete(sub_info); } @@ -324,6 +336,9 @@ static void call_usermodehelper_exec_work(struct work_struct *work) { struct subprocess_info *sub_info = container_of(work, struct subprocess_info, work); + if(sub_info->init_intermediate) { + sub_info->init_intermediate(sub_info); + } if (sub_info->wait & UMH_WAIT_PROC) { call_usermodehelper_exec_sync(sub_info); @@ -336,6 +351,11 @@ static void call_usermodehelper_exec_work(struct work_struct *work) */ pid = kernel_thread(call_usermodehelper_exec_async, sub_info, CLONE_PARENT | SIGCHLD); + + if(sub_info->cleanup_intermediate) { + sub_info->cleanup_intermediate(sub_info); + } + sub_info->cleaned = true; if (pid < 0) { sub_info->retval = pid; umh_complete(sub_info); @@ -501,25 +521,38 @@ static void helper_unlock(void) * @argv: arg vector for process * @envp: environment for process * @gfp_mask: gfp mask for 
memory allocation - * @cleanup: a cleanup function + * @init_intermediate: init function which is called in parent task + * @cleanup_intermediate: clean function which is called in parent task * @init: an init function + * @cleanup: a cleanup function * @data: arbitrary context sensitive data * * Returns either %NULL on allocation failure, or a subprocess_info * structure. This should be passed to call_usermodehelper_exec to * exec the process and free the structure. * - * The init function is used to customize the helper process prior to - * exec. A non-zero return code causes the process to error out, exit, - * and return the failure to the calling process + * The init_intermediate is called in the parent task of user mode + * helper. It's designed for init works which must be done in + * parent task, like switching the pid_ns_for_children. + * + * The cleanup_intermediate is used when we want to cleanup what + * we have done in init_intermediate, it is also called in parent + * task. * - * The cleanup function is just before ethe subprocess_info is about to + * The init function is called after fork. It is used to customize the + * helper process prior to exec. A non-zero return code causes the + * process to error out, exit, and return the failure to the + * calling process. + * + * The cleanup function is just before the subprocess_info is about to * be freed. This can be used for freeing the argv and envp. The * Function must be runnable in either a process context or the * context in which call_usermodehelper_exec is called. 
*/ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv, char **envp, gfp_t gfp_mask, + void (*init_intermediate)(struct subprocess_info *info), + void (*cleanup_intermediate)(struct subprocess_info *info), int (*init)(struct subprocess_info *info, struct cred *new), void (*cleanup)(struct subprocess_info *info), void *data) @@ -539,8 +572,11 @@ struct subprocess_info *call_usermodehelper_setup(const char *path, char **argv, sub_info->argv = argv; sub_info->envp = envp; - sub_info->cleanup = cleanup; + sub_info->init_intermediate = init_intermediate; + sub_info->cleaned = false; + sub_info->cleanup_intermediate = cleanup_intermediate; sub_info->init = init; + sub_info->cleanup = cleanup; sub_info->data = data; out: return sub_info; @@ -635,7 +671,7 @@ int call_usermodehelper(const char *path, char **argv, char **envp, int wait) gfp_t gfp_mask = (wait == UMH_NO_WAIT) ? GFP_ATOMIC : GFP_KERNEL; info = call_usermodehelper_setup(path, argv, envp, gfp_mask, - NULL, NULL, NULL); + NULL, NULL, NULL, NULL, NULL); if (info == NULL) return -ENOMEM; diff --git a/lib/kobject_uevent.c b/lib/kobject_uevent.c index 719c155..b63e927 100644 --- a/lib/kobject_uevent.c +++ b/lib/kobject_uevent.c @@ -486,7 +486,8 @@ int kobject_uevent_env(struct kobject *kobj, enum kobject_action action, retval = -ENOMEM; info = call_usermodehelper_setup(env->argv[0], env->argv, env->envp, GFP_KERNEL, - NULL, cleanup_uevent_env, env); + NULL, NULL, NULL, + cleanup_uevent_env, env); if (info) { retval = call_usermodehelper_exec(info, UMH_NO_WAIT); env = NULL; /* freed by cleanup_uevent_env */ diff --git a/security/keys/request_key.c b/security/keys/request_key.c index 63e63a4..3f628ce 100644 --- a/security/keys/request_key.c +++ b/security/keys/request_key.c @@ -78,8 +78,8 @@ static int call_usermodehelper_keys(const char *path, char **argv, char **envp, struct subprocess_info *info; info = call_usermodehelper_setup(path, argv, envp, GFP_KERNEL, - umh_keys_init, 
umh_keys_cleanup, - session_keyring); + NULL, NULL, umh_keys_init, + umh_keys_cleanup, session_keyring); if (!info) return -ENOMEM; -- 2.9.3 From caosf.fnst at cn.fujitsu.com Wed Aug 2 06:37:26 2017 From: caosf.fnst at cn.fujitsu.com (Cao Shufeng) Date: Wed, 2 Aug 2017 14:37:26 +0800 Subject: [PATCH 0/3] Make core_pattern support namespace Message-ID: <1501655849-9149-1-git-send-email-caosf.fnst@cn.fujitsu.com> This patchset covers the following points: 1: Make it possible for the usermodehelper function to set the pid namespace, done by: [PATCH_v4.1_1/3] Make call_usermodehelper_exec possible to set namespaces 2: Make a pipe-type core_pattern write the dump into the container's rootfs, done by: [PATCH_v4.1_2/3] Limit dump_pipe program's permission to init for container 3: Give each container its own core_pattern setting, done by: [PATCH_v4.1_3/3] Make core_pattern support namespace 4: Compatibility with the current system, also included in: [PATCH_v4.1_3/3] Make core_pattern support namespace If a container has not changed its core_pattern setting, it keeps the same setting as the host.
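The fallback rule in the last sentence (an unset container setting resolves to the nearest ancestor that has one) can be sketched in userspace C as follows. This is a minimal model, not the kernel code: `struct ns_model`, `effective_core_pattern()`, `set_core_pattern()` and `PATTERN_MAX` are hypothetical names, and the per-namespace spinlock used in the patch is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define PATTERN_MAX 128	/* stand-in for CORENAME_MAX_SIZE (assumed value) */

/* Hypothetical, simplified model of a pid namespace. */
struct ns_model {
	struct ns_model *parent;	/* NULL for the initial namespace */
	char core_pattern[PATTERN_MAX];	/* empty string means "not set here" */
};

/*
 * Walk towards the root until a namespace with its own pattern is
 * found; the initial namespace always terminates the walk.
 */
static const char *effective_core_pattern(const struct ns_model *ns)
{
	while (ns->parent && ns->core_pattern[0] == '\0')
		ns = ns->parent;
	return ns->core_pattern;
}

/* A write only ever changes the caller's own namespace. */
static void set_core_pattern(struct ns_model *ns, const char *pattern)
{
	snprintf(ns->core_pattern, PATTERN_MAX, "%s", pattern);
}
```

With a host, a container and a nested container that have not written anything, every lookup resolves to the host's pattern. Once the container writes its own pattern, the container and anything nested below it resolve to that value, while the host is unaffected.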
Test: 1: Passed a test script for each function of this patchset ## TEST IN HOST ## [root@kerneldev dumptest]# ./test_host Set file core_pattern: OK ./test_host: line 41: 2366 Segmentation fault (core dumped) "$SCRIPT_BASE_DIR"/make_dump Checking dumpfile: OK Set file core_pattern: OK ./test_host: line 41: 2369 Segmentation fault (core dumped) "$SCRIPT_BASE_DIR"/make_dump Checking dump_pipe triggered: OK Checking rootfs: OK Checking dumpfile: OK Checking namespace: OK Checking process list: OK Checking capabilities: OK ## TEST IN GUEST ## # ./test Segmentation fault (core dumped) Checking dump_pipe triggered: OK Checking rootfs: OK Checking dumpfile: OK Checking namespace: OK Checking process list: OK Checking cg pids: OK Checking capabilities: OK [ 64.940734] make_dump[2432]: segfault at 0 ip 000000000040049d sp 00007ffc4af025f0 error 6 in make_dump[400000+a6000] # 2: Passed other tests (which are not easy to script) by hand. Changelog v3.1-v4: 1. remove extra fork pointed out by: Andrei Vagin 2: Rebase on top of v4.9-rc8. 3: Rebase on top of v4.12. Changelog v3-v3.1: 1. Switch "pwd" of pipe program to container's root fs. 2. Rebase on top of v4.9-rc1. Changelog v2->v3: 1: Fix problem of setting pid namespace, pointed out by: Andrei Vagin Changelog v1(RFC)->v2: 1: Add [PATCH 2/2] which was todo in [RFC v1]. 2: Pass a test script for each function. 3: Rebase on top of v4.7. Suggested-by: Eric W. 
Biederman Suggested-by: KOSAKI Motohiro Signed-off-by: Cao Shufeng Cao Shufeng (3): Make call_usermodehelper_exec possible to set namespaces Limit dump_pipe program's permission to init for container Make core_pattern support namespace fs/coredump.c | 150 +++++++++++++++++++++++++++++++++++++++--- include/linux/binfmts.h | 2 + include/linux/kmod.h | 5 ++ include/linux/pid_namespace.h | 3 + init/do_mounts_initrd.c | 3 +- kernel/kmod.c | 56 +++++++++++++--- kernel/pid.c | 2 + kernel/pid_namespace.c | 2 + kernel/sysctl.c | 50 ++++++++++++-- lib/kobject_uevent.c | 3 +- security/keys/request_key.c | 4 +- 11 files changed, 253 insertions(+), 27 deletions(-) -- 2.9.3 From asarai at suse.de Wed Aug 2 07:07:19 2017 From: asarai at suse.de (Aleksa Sarai) Date: Wed, 2 Aug 2017 17:07:19 +1000 Subject: [PATCH_v4.1_3/3] Make core_pattern support namespace In-Reply-To: <1501655849-9149-4-git-send-email-caosf.fnst@cn.fujitsu.com> References: <1501655849-9149-1-git-send-email-caosf.fnst@cn.fujitsu.com> <1501655849-9149-4-git-send-email-caosf.fnst@cn.fujitsu.com> Message-ID: <8bb63f0a-d0b7-edf7-6dca-4d12641074b4@suse.de> > Currently, each container shared one copy of coredump setting > with the host system, if host system changed the setting, each > running containers will be affected. > Same story happened when container changed core_pattern, both > host and other container will be affected. > > For container based on namespace design, it is good to allow > each container keeping their own coredump setting. From what I can see, this is basically setting a per-pidns core_pattern (which is hierarchically applied). I'm not sure this actually solves the more general problem (that usermode helper settings aren't generally namespace-aware) -- and what happens if you have processes in the same pidns that have different mount namespaces? 
If we _had_ to do it like this I would think it makes more sense to pin it to mountns, but I was under the impression that someone was working on making usermode helpers play nicer with namespaces.

Just my $0.02.

--
Aleksa Sarai
Software Engineer (Containers)
SUSE Linux GmbH
https://www.cyphar.com/

From tycho at docker.com Wed Aug 2 21:48:41 2017
From: tycho at docker.com (Tycho Andersen)
Date: Wed, 2 Aug 2017 15:48:41 -0600
Subject: [RFC PATCH 3/5] ima: mamespace audit status flags
In-Reply-To: <2848EE0A-2DB8-420B-A611-60967EB90F5C@linux.vnet.ibm.com>
References: <20170720225033.21298-1-mkayaalp@linux.vnet.ibm.com> <20170720225033.21298-4-mkayaalp@linux.vnet.ibm.com> <20170801171702.f2szj5huzbt7fdfl@docker> <2848EE0A-2DB8-420B-A611-60967EB90F5C@linux.vnet.ibm.com>
Message-ID: <20170802214841.hw4pzjenxw47rcyp@docker>

On Tue, Aug 01, 2017 at 01:25:31PM -0400, Mehmet Kayaalp wrote:
> >> +unsigned long iint_flags(struct integrity_iint_cache *iint,
> >> +			 struct ns_status *status)
> >> +{
> >> +	if (!status)
> >> +		return iint->flags;
> >> +
> >> +	return iint->flags & (status->flags & IMA_NS_STATUS_FLAGS);
> >
> > Just to confirm, is there any situation where:
> >
> > iint->flags & IMA_NS_STATUS_FLAGS != status->flags & IMA_NS_STATUS_FLAGS
> >
> > ? i.e. can this line just be:
> >
> > return status->flags & IMA_NS_STATUS_FLAGS;
> >
>
> As Guilherme had pointed out, the first & should be |.

Sorry, that mail got filtered somehow, thanks. Per your discussion, I guess the most defensive way is:

iint->flags & ~IMA_NS_STATUS_FLAGS | status->flags & IMA_NS_STATUS_FLAGS

in case something comes along and sets IMA_AUDITED on the root iint, we don't want it to propagate to this ns' status unnecessarily.

Anyway, thanks!

Tycho
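The "most defensive" merge of the two flag words discussed in this thread can be sketched in plain userspace C. This is only an illustration under stated assumptions, not the patch's code: the two structs are minimal stand-ins for the kernel structures, and the bit value chosen for IMA_AUDITED is a placeholder. Only the names iint_flags, IMA_NS_STATUS_FLAGS and IMA_AUDITED come from the thread. Since & binds tighter than | in C, the expression as written in the mail already groups this way; the parentheses are added for readability.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder bit value for illustration; the kernel's real value may differ. */
#define IMA_AUDITED          0x1000UL
#define IMA_NS_STATUS_FLAGS  IMA_AUDITED

/* Minimal stand-ins for the kernel structures named in the thread. */
struct integrity_iint_cache { unsigned long flags; };
struct ns_status { unsigned long flags; };

/*
 * Take the namespaced bits from the per-namespace status and every other
 * bit from the iint, so that a namespaced bit set on the root iint does
 * not leak into this namespace's view.
 */
unsigned long iint_flags(const struct integrity_iint_cache *iint,
			 const struct ns_status *status)
{
	assert(iint != NULL);

	/* No per-namespace status: fall back to the iint's own flags. */
	if (!status)
		return iint->flags;

	return (iint->flags & ~IMA_NS_STATUS_FLAGS) |
	       (status->flags & IMA_NS_STATUS_FLAGS);
}
```

With an iint carrying IMA_AUDITED plus one unrelated bit and an ns_status carrying no flags, the result keeps the unrelated bit and drops IMA_AUDITED, which is the propagation the mail wants to avoid.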
From macro at imgtec.com Mon Aug 7 16:18:11 2017
From: macro at imgtec.com (Maciej W. Rozycki)
Date: Mon, 7 Aug 2017 17:18:11 +0100
Subject: [PATCH 4/7] signal/mips: Document a conflict with SI_USER with SIGFPE
In-Reply-To: <20170718140651.15973-4-ebiederm@xmission.com>
References: <87o9shg7t7.fsf_-_@xmission.com> <20170718140651.15973-4-ebiederm@xmission.com>
Message-ID:

On Tue, 18 Jul 2017, Eric W.
Biederman wrote: > diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c > index b68b4d0726d3..6c9cca9c5341 100644 > --- a/arch/mips/kernel/traps.c > +++ b/arch/mips/kernel/traps.c > @@ -735,7 +735,7 @@ void force_fcr31_sig(unsigned long fcr31, void __user *fault_addr, > else if (fcr31 & FPU_CSR_INE_X) > si.si_code = FPE_FLTRES; > else > - si.si_code = __SI_FAULT; > + si.si_code = FPE_FIXME; This is an "impossible" state to reach unless your hardware is on fire. One or more of the FCSR Cause bits will have been set (in `fcr31') or the FPE exception would not have happened. Of course there could be a simulator bug, or we could have breakage somewhere causing `process_fpemu_return' to be called with SIGFPE and inconsistent `fcr31'. So we need to handle it somehow. So what would be the right value of `si_code' to use here for such an unexpected exception condition? I think `BUG()' would be too big a hammer here. Or wouldn't it? Maciej From torvalds at linux-foundation.org Mon Aug 7 17:41:39 2017 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Mon, 7 Aug 2017 10:41:39 -0700 Subject: [PATCH 4/7] signal/mips: Document a conflict with SI_USER with SIGFPE In-Reply-To: References: <87o9shg7t7.fsf_-_@xmission.com> <20170718140651.15973-4-ebiederm@xmission.com> Message-ID: On Mon, Aug 7, 2017 at 9:18 AM, Maciej W. Rozycki wrote: > > So what would be the right value of `si_code' to use here for such an > unexpected exception condition? I think `BUG()' would be too big a > hammer here. Or wouldn't it? Hell no. NEVER EVER BUG(). The only case to use BUG() is if there is some core data structure (say, kernel stack) that is so corrupted that you know you cannot continue. That's the *only* valid use. If this is a "this condition cannot happen" issue, then just remove the damn conditional. It's pointless. Adding a BUG() to show "this cannot happen" is not acceptable. 
Linus From ralf at linux-mips.org Mon Aug 7 19:55:13 2017 From: ralf at linux-mips.org (Ralf Baechle) Date: Mon, 7 Aug 2017 21:55:13 +0200 Subject: [PATCH 4/7] signal/mips: Document a conflict with SI_USER with SIGFPE In-Reply-To: References: <87o9shg7t7.fsf_-_@xmission.com> <20170718140651.15973-4-ebiederm@xmission.com> Message-ID: <20170807195513.GD3509@linux-mips.org> On Mon, Aug 07, 2017 at 10:41:39AM -0700, Linus Torvalds wrote: > On Mon, Aug 7, 2017 at 9:18 AM, Maciej W. Rozycki wrote: > > > > So what would be the right value of `si_code' to use here for such an > > unexpected exception condition? I think `BUG()' would be too big a > > hammer here. Or wouldn't it? > > Hell no. NEVER EVER BUG(). > > The only case to use BUG() is if there is some core data structure > (say, kernel stack) that is so corrupted that you know you cannot > continue. That's the *only* valid use. > > If this is a "this condition cannot happen" issue, then just remove > the damn conditional. It's pointless. Adding a BUG() to show "this > cannot happen" is not acceptable. I queued a patch to remove the code for 4.14. Ralf From guilherme.magalhaes at hpe.com Tue Aug 8 13:22:23 2017 From: guilherme.magalhaes at hpe.com (Magalhaes, Guilherme (Brazil R&D-CL)) Date: Tue, 8 Aug 2017 13:22:23 +0000 Subject: [Linux-ima-devel] [RFC PATCH 1/5] ima: extend clone() with IMA Message-ID: Stefan, Still on the vTPM requirements, could you help answering the following questions? 1. Where will the boot measurements be stored? What is the integrity measurement domain for this vTPM? The current proposal is that the vTPM would be used for the container (or namespace) files/inodes. What else will be available from the vTPM? For example, will the vTPM provide the UEFI measurements on the first PCRs (copied/proxied from physical TPM)? 2. From an attestation/quote perspective, how do you envision the key material to be managed (e.g. 
the vTPM EK and/or Attestation Key is fixed to the physical TPM, or it's cryptographically bound to it)? 3. Can you elaborate more on the alignment of this solution with the TCG requirements, especially considering the lack of isolation on the vTPM solution, do you have a future plan to cover those issues? 4. In a micro services pattern, or a serverless compute pattern, in which one or more containers are created to handle each individual request it is possible that there will be several thousand containers created per hour on a busy server. What is the expected performance and scalability of vTPMs within such an environment? -- Guilherme > -----Original Message----- > From: Stefan Berger [mailto:stefanb at linux.vnet.ibm.com] > Sent: quinta-feira, 27 de julho de 2017 17:52 > To: Magalhaes, Guilherme (Brazil R&D-CL) ; > Mimi Zohar ; Serge E. Hallyn > Cc: Mehmet Kayaalp ; Yuqiong Sun > ; containers foundation.org>; linux-kernel ; David Safford > ; James Bottomley > ; linux-security-module security-module at vger.kernel.org>; ima-devel devel at lists.sourceforge.net>; Yuqiong Sun > Subject: Re: [Linux-ima-devel] [RFC PATCH 1/5] ima: extend clone() with IMA > namespace support > > On 07/27/2017 03:39 PM, Magalhaes, Guilherme (Brazil R&D-CL) wrote: > > > >> There's a vTPM proxy driver in the kernel that enables spawning a > >> frontend /dev/tpm%d and an anonymous backend file descriptor where a > >> vTPM can listen on for TPM commands. I integrated this with 'swtpm' and > >> I have been working on integrating this into runc. Currently each > >> container started with runc can get one (or multiple) vTPMs and > >> /dev/tpm0 [and /dev/tpmrm0 in case of TPM2] then appear inside the > >> container. > >> > > This is an interesting solution especially for nested namespaces with the > > recursive application of measurements and a having list per container. 
> > > > Following the TCG specs/requirements, what could we say about security > > guarantees of real TPMs Vs this vTPM implementation? > > > A non-root user may not be able to do access the (permanent) state of > the vTPM state files since the container management stack would restrict > access to the files using DAC. Access to runtime data is also prevented > since the vTPM would not run under the account of the non-root user. > > To protect the vTPM's permanent state file from access by a root user it > comes down to preventing the root user from getting a hold of the key > used for encrypting that file. Encrypting the state of the vTPM is > probably the best we can do to approximate a temper-resistant chip, but > preventing the root user from accessing the key may be more challenging. > Preventing root from accessing runtime data could be achieved by using > XGS or a similar technology. > > Stefan > From ebiederm at xmission.com Tue Aug 8 15:29:18 2017 From: ebiederm at xmission.com (Eric W. Biederman) Date: Tue, 08 Aug 2017 10:29:18 -0500 Subject: [PATCH 4/7] signal/mips: Document a conflict with SI_USER with SIGFPE In-Reply-To: (Maciej W. Rozycki's message of "Mon, 7 Aug 2017 17:18:11 +0100") References: <87o9shg7t7.fsf_-_@xmission.com> <20170718140651.15973-4-ebiederm@xmission.com> Message-ID: <87mv7agjsh.fsf@xmission.com> "Maciej W. Rozycki" writes: > On Tue, 18 Jul 2017, Eric W. Biederman wrote: > >> diff --git a/arch/mips/kernel/traps.c b/arch/mips/kernel/traps.c >> index b68b4d0726d3..6c9cca9c5341 100644 >> --- a/arch/mips/kernel/traps.c >> +++ b/arch/mips/kernel/traps.c >> @@ -735,7 +735,7 @@ void force_fcr31_sig(unsigned long fcr31, void __user *fault_addr, >> else if (fcr31 & FPU_CSR_INE_X) >> si.si_code = FPE_FLTRES; >> else >> - si.si_code = __SI_FAULT; >> + si.si_code = FPE_FIXME; > > This is an "impossible" state to reach unless your hardware is on fire. 
> One or more of the FCSR Cause bits will have been set (in `fcr31') or the > FPE exception would not have happened. > > Of course there could be a simulator bug, or we could have breakage > somewhere causing `process_fpemu_return' to be called with SIGFPE and > inconsistent `fcr31'. So we need to handle it somehow. > > So what would be the right value of `si_code' to use here for such an > unexpected exception condition? I think `BUG()' would be too big a > hammer here. Or wouldn't it? The possible solutions I can think of are: WARN_ON_ONCE with a comment. Add a new si_code to uapi/asm-generic/siginfo.h perhaps FPE_IMPOSSIBLE. Like syscall numbers si_codes are cheap. Call force_sig() instead of force_sig_info, using just a generic si_code. If this is truly impossible and the compiler doesn't complain just drop the code. Eric From macro at imgtec.com Tue Aug 8 23:19:12 2017 From: macro at imgtec.com (Maciej W. Rozycki) Date: Wed, 9 Aug 2017 00:19:12 +0100 Subject: [PATCH 4/7] signal/mips: Document a conflict with SI_USER with SIGFPE In-Reply-To: <87mv7agjsh.fsf@xmission.com> References: <87o9shg7t7.fsf_-_@xmission.com> <20170718140651.15973-4-ebiederm@xmission.com> <87mv7agjsh.fsf@xmission.com> Message-ID: On Tue, 8 Aug 2017, Eric W. Biederman wrote: > > This is an "impossible" state to reach unless your hardware is on fire. > > One or more of the FCSR Cause bits will have been set (in `fcr31') or the > > FPE exception would not have happened. > > > > Of course there could be a simulator bug, or we could have breakage > > somewhere causing `process_fpemu_return' to be called with SIGFPE and > > inconsistent `fcr31'. So we need to handle it somehow. > > > > So what would be the right value of `si_code' to use here for such an > > unexpected exception condition? I think `BUG()' would be too big a > > hammer here. Or wouldn't it? > > The possible solutions I can think of are: > > WARN_ON_ONCE with a comment. 
>
> Add a new si_code to uapi/asm-generic/siginfo.h perhaps FPE_IMPOSSIBLE.
> Like syscall numbers si_codes are cheap.

I think we ought to do both.

First, we have our own FP emulation code, which is changed from time to time, that uses the same exit path that the hardware exception does. It could happen that we miss something and return SIGFPE from the emulation code without setting the cause bits appropriately. This would be our own bug which might trigger exceedingly rarely and could then be caught by WARN_ON_ONCE or otherwise stay there forever in the absence of that check.

Second, changing `si_code' from __SI_FAULT to 0 aka __SI_KILL will likely interfere with `copy_siginfo_to_user32' in arch/mips/kernel/signal32.c, making the userland lose the address of the faulting instruction in 32-bit software run on 64-bit hardware only, making our API inconsistent. Using a distinct `si_code' value such as FPE_IMPOSSIBLE (though we might choose say FPE_FLTUNK for "FLoaTing point UNKnown" instead, for consistency; mind that most `si_code' macros have the same number of characters within groups associated with individual signals) for such odd traps is allowed by SUS and will prevent the inconsistency from happening, very cheaply as you say.

Maciej
From stefanb at linux.vnet.ibm.com Wed Aug 9 15:31:15 2017
From: stefanb at linux.vnet.ibm.com (Stefan Berger)
Date: Wed, 9 Aug 2017 11:31:15 -0400
Subject: [Linux-ima-devel] [RFC PATCH 1/5] ima: extend clone() with IMA
In-Reply-To:
References:
Message-ID: <16ab6c1f-3071-22ee-f526-3ac3603a047e@linux.vnet.ibm.com>

On 08/08/2017 09:22 AM, Magalhaes, Guilherme (Brazil R&D-CL) wrote:
> Stefan,
> Still on the vTPM requirements, could you help answering the following
> questions?
>
> 1. Where will the boot measurements be stored? What is the integrity
> measurement domain for this vTPM?
The current proposal is that the > vTPM would be used for the container (or namespace) files/inodes. > What else will be available from the vTPM? For example, will the vTPM > provide the UEFI measurements on the first PCRs (copied/proxied from > physical TPM)? The vTPM will receive PCR extends exclusively from the namespace it is associated with. The UEFI measurements could be retrieved from the hardware TPM. They are not copied since this would require copying the UEFI measurement list of the host as well. Otherwise the vTPM allows all commands to be used. > > 2. From an attestation/quote perspective, how do you envision the key > material to be managed (e.g. the vTPM EK and/or Attestation Key is > fixed to the physical TPM, or it's cryptographically bound to it)? For quotes by the vTPM to work the EK and AIK need to be inside the vTPM. Similarly the EK and AIK of the hardware TPM would need be bound to the hardware TPM for the quoting of the hardware TPM's PCRs to work. If there's an official way, design by TCG for example, for how to quote the PCRs of a virtual TPM by the hardware TPM, I would like to know. > > 3. Can you elaborate more on the alignment of this solution with the > TCG requirements, especially considering the lack of isolation on the > vTPM solution, do you have a future plan to cover those issues? A software emulated TPM does have its challenges when it comes to isolation from the root user, as explained in the last email. I am not sure there is a solution for protecting it from attacks from root, though we can protect it from non-root users fairly easily. If there are other isolation requirements than that, let me know. > > 4. In a micro services pattern, or a serverless compute pattern, in > which one or more containers are created to handle each individual > request it is possible that there will be several thousand containers > created per hour on a busy server. 
What is the expected performance > and scalability of vTPMs within such an environment? A vTPM would be created as part of creating a container. The creation of certificates takes a couple of seconds to contact the CA and mint the cert. I would say not all of the containers would need to have a certificate. Stefan > > -- > Guilherme > >> -----Original Message----- >> From: Stefan Berger [mailto:stefanb at linux.vnet.ibm.com] >> Sent: quinta-feira, 27 de julho de 2017 17:52 >> To: Magalhaes, Guilherme (Brazil R&D-CL) ; >> Mimi Zohar ; Serge E. Hallyn >> Cc: Mehmet Kayaalp ; Yuqiong Sun >> ; containers > foundation.org>; linux-kernel ; David Safford >> ; James Bottomley >> ; linux-security-module > security-module at vger.kernel.org>; ima-devel > devel at lists.sourceforge.net>; Yuqiong Sun >> Subject: Re: [Linux-ima-devel] [RFC PATCH 1/5] ima: extend clone() with IMA >> namespace support >> >> On 07/27/2017 03:39 PM, Magalhaes, Guilherme (Brazil R&D-CL) wrote: >>>> There's a vTPM proxy driver in the kernel that enables spawning a >>>> frontend /dev/tpm%d and an anonymous backend file descriptor where a >>>> vTPM can listen on for TPM commands. I integrated this with 'swtpm' and >>>> I have been working on integrating this into runc. Currently each >>>> container started with runc can get one (or multiple) vTPMs and >>>> /dev/tpm0 [and /dev/tpmrm0 in case of TPM2] then appear inside the >>>> container. >>>> >>> This is an interesting solution especially for nested namespaces with the >>> recursive application of measurements and a having list per container. >>> >>> Following the TCG specs/requirements, what could we say about security >>> guarantees of real TPMs Vs this vTPM implementation? >> >> A non-root user may not be able to do access the (permanent) state of >> the vTPM state files since the container management stack would restrict >> access to the files using DAC. 
Access to runtime data is also prevented >> since the vTPM would not run under the account of the non-root user. >> >> To protect the vTPM's permanent state file from access by a root user it >> comes down to preventing the root user from getting a hold of the key >> used for encrypting that file. Encrypting the state of the vTPM is >> probably the best we can do to approximate a temper-resistant chip, but >> preventing the root user from accessing the key may be more challenging. >> Preventing root from accessing runtime data could be achieved by using >> XGS or a similar technology. >> >> Stefan >> > -- > To unsubscribe from this list: send the line "unsubscribe linux-security-module" in > the body of a message to majordomo at vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > From stefanb at linux.vnet.ibm.com Fri Aug 11 15:00:52 2017 From: stefanb at linux.vnet.ibm.com (Stefan Berger) Date: Fri, 11 Aug 2017 11:00:52 -0400 Subject: [RFC PATCH 2/5] ima: Add ns_status for storing namespaced iint data In-Reply-To: <20170720225033.21298-3-mkayaalp@linux.vnet.ibm.com> References: <20170720225033.21298-1-mkayaalp@linux.vnet.ibm.com> <20170720225033.21298-3-mkayaalp@linux.vnet.ibm.com> Message-ID: <9a2ae1ef-95c3-6eb3-bf4d-78b84d5a77e4@linux.vnet.ibm.com> On 07/20/2017 06:50 PM, Mehmet Kayaalp wrote: > This patch adds an rbtree to the IMA namespace structure that stores a > namespaced version of iint->flags in ns_status struct. Similar to the > integrity_iint_cache, both the iint ns_struct are looked up using the > inode pointer value. The lookup, allocate, and insertion code is also > similar, except ns_struct is not free'd when the inode is free'd. > Instead, the lookup verifies the i_ino and i_generation fields are also a > match. A lazy clean up of the rbtree that removes free'd inodes could be > implemented to reclaim the invalid entries. 
> > Signed-off-by: Mehmet Kayaalp > --- > include/linux/ima.h | 3 + > security/integrity/ima/ima.h | 16 ++++++ > security/integrity/ima/ima_ns.c | 120 ++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 139 insertions(+) > > > @@ -181,3 +198,106 @@ struct ima_namespace init_ima_ns = { > .parent = NULL, > }; > EXPORT_SYMBOL(init_ima_ns); > + > +/* > + * __ima_ns_status_find - return the ns_status associated with an inode > + */ > +static struct ns_status *__ima_ns_status_find(struct ima_namespace *ns, > + struct inode *inode) > +{ > + struct ns_status *status; > + struct rb_node *n = ns->ns_status_tree.rb_node; > + > + while (n) { > + status = rb_entry(n, struct ns_status, rb_node); > + > + if (inode < status->inode) > + n = n->rb_left; > + else if (inode->i_ino > status->i_ino) > + n = n->rb_right; Above you are comparing with the inode ptr, here with i_ino. Why can you not compare with inode both times. Also the insertion only seems to consider the inode ptr. Stefan From rgb at redhat.com Mon Aug 14 05:47:11 2017 From: rgb at redhat.com (Richard Guy Briggs) Date: Mon, 14 Aug 2017 01:47:11 -0400 Subject: [PATCH 2/9] Implement containers as kernel objects In-Reply-To: <149547016213.10599.1969443294414531853.stgit@warthog.procyon.org.uk> References: <149547014649.10599.12025037906646164347.stgit@warthog.procyon.org.uk> <149547016213.10599.1969443294414531853.stgit@warthog.procyon.org.uk> Message-ID: <20170814054711.GB29957@madcap2.tricolour.ca> On 2017-05-22 17:22, David Howells wrote: > A container is then a kernel object that contains the following things: > > (1) Namespaces. > > (2) A root directory. > > (3) A set of processes, including one designated as the 'init' process. > > A container is created and attached to a file descriptor by: > > int cfd = container_create(const char *name, unsigned int flags); > > this inherits all the namespaces of the parent container unless otherwise > the mask calls for new namespaces. 
> > CONTAINER_NEW_FS_NS > CONTAINER_NEW_EMPTY_FS_NS > CONTAINER_NEW_CGROUP_NS [root only] > CONTAINER_NEW_UTS_NS > CONTAINER_NEW_IPC_NS > CONTAINER_NEW_USER_NS > CONTAINER_NEW_PID_NS > CONTAINER_NEW_NET_NS > > Other flags include: > > CONTAINER_KILL_ON_CLOSE > CONTAINER_CLOSE_ON_EXEC Hi David, I wanted to respond to this thread to attempt some constructive feedback, better late than never. I had a look at your fsopen/fsmount() patchset(s) to support this patchset which was interesting, but doesn't directly affect my work. The primary patch of interest to the audit kernel folks (Paul Moore and me) is this patch while the rest of the patchset is interesting, but not likely to directly affect us. This patch has most of what we need to solve our problem. Paul and I agree that audit is going to have a difficult time identifying containers or even namespaces without some change to the kernel. The audit subsystem in the kernel needs at least a basic clue about which container caused an event to be able to report this at the appropriate level and ignore it at other levels to avoid a DoS. We also agree that there will need to be some sort of trigger from userspace to indicate the creation of a container and its allocated resources and we're not really picky how that is done, such as a clone flag, a syscall or a sysfs write (or even a read, I suppose), but there will need to be some permission restrictions, obviously. (I'd like to see capabilities used for this by adding a specific container bit to the capabilities bitmask.) I doubt we will be able to accommodate all definitions or concepts of a container in a timely fashion. We'll need to start somewhere with a minimum definition so that we can get traction and actually move forward before another compelling shared kernel microservice method leaves our entire community behind. I'd like to declare that a container is a full set of cloned namespaces, but this is inefficient, overly constricting and unnecessary for our needs.
If we could agree on a minimum definition of a container (which may have only one specific cloned namespace) then we have something on which to build. I could even see a container being defined by a trigger sent from userspace about a process (task) from which all its children are considered to be within that container, subject to further nesting. In the simplest usable model for audit, if a container (definition implies and) starts a PID namespace, then the container ID could simply be the container's "init" process PID in the initial PID namespace. This assumes that as soon as that process vanishes, that entire container and all its children are killed off (which you've done). There may be some container orchestration systems that don't use a unique PID namespace per container and that imposing this will cause them challenges. If containers have at minimum a unique mount namespace then the root path dentry inode device and inode number could be used, but there are likely better identifiers. Again, there may be container orchestrators that don't use a unique mount namespace per container and that imposing this will cause challenges. I expect there are similar examples for each of the other namespaces. If we could pick one namespace type for consensus for which each container has a unique instance of that namespace, we could use the dev/ino tuple from that namespace as had originally been suggested by Aristeu Rozanski more than 4 years ago as part of the set of namespace IDs. 
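As a rough illustration of the dev/ino idea (a sketch only, not taken from any of the posted patches): namespaces(7) documents that two tasks share a namespace when stat() on their /proc/[pid]/ns/* links returns the same st_dev and st_ino. The names struct ns_id, ns_ident() and ns_same() below are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Hypothetical identifier: the dev/ino pair behind a namespace link. */
struct ns_id {
	dev_t dev;
	ino_t ino;
};

/*
 * Fill *id from a path such as "/proc/self/ns/pid".
 * stat() follows the symlink, so the values describe the namespace
 * object itself rather than the link.
 * Returns 0 on success, -1 on failure (errno is set by stat()).
 */
static int ns_ident(const char *path, struct ns_id *id)
{
	struct stat st;

	assert(path != NULL && id != NULL);
	if (stat(path, &st) < 0)
		return -1;
	id->dev = st.st_dev;
	id->ino = st.st_ino;
	return 0;
}

/* Two identifiers name the same namespace only if both fields match. */
static bool ns_same(const struct ns_id *a, const struct ns_id *b)
{
	return a->dev == b->dev && a->ino == b->ino;
}
```

Calling ns_ident() on "/proc/self/ns/pid" and on "/proc/<pid>/ns/pid" for another task, then comparing with ns_same(), would tell whether the two tasks share a PID namespace; whether that tuple is a good enough container identifier is exactly the open question above.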
I had also attempted to solve this problem by using the namespace's proc inode, then switched over to generate a unique kernel serial number for each namespace and then went back to namespace proc dev/ino once Al Viro implemented nsfs: v1 https://lkml.org/lkml/2014/4/22/662 v2 https://lkml.org/lkml/2014/5/9/637 v3 https://lkml.org/lkml/2014/5/20/287 v4 https://lkml.org/lkml/2014/8/20/844 v5 https://lkml.org/lkml/2014/10/6/25 v6 https://lkml.org/lkml/2015/4/17/48 v7 https://lkml.org/lkml/2015/5/12/773 These patches don't use a container ID, but track all namespaces in use for an event. This has the benefit of punting this tracking to userspace for some other tool to analyse and determine to which container an event belongs. This will use a lot of bandwidth in audit log files when a single container ID that doesn't require nesting information to be complete would be a much more efficient use of audit log bandwidth. If we rely only on the setting of arbitrary container names from userspace, then we must provide a map or tree back to the initial audit domain for that running kernel to be able to differentiate between potentially identical container names assigned in a nested container system. If we assign a container serial number sequentially (atomic64_inc) from the kernel on request from userspace like the sessionID and log the creation with all nsIDs and the parent container serial number and/or container name, the nesting is clear due to lack of ambiguity in potential duplicate names in nesting. If a container serial number is used, the tree of inheritance of nested containers can be rebuilt from the audit records showing what containers were spawned from what parent.
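The serial-number scheme can be pictured with a small userspace sketch, under stated assumptions: C11 atomic_fetch_add() stands in for a kernel atomic64 counter, and struct container_record, log_container_create(), container_parent() and container_depth() are hypothetical names that do not come from the audit code or from the posted patches.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical audit record emitted when a container is created. */
struct container_record {
	uint64_t serial;	/* assigned sequentially, never reused */
	uint64_t parent;	/* parent's serial, 0 = initial audit domain */
};

/* Userspace stand-in for a kernel atomic64_t; 0 is reserved. */
static _Atomic uint64_t next_serial = 1;

/* Allocate a serial and append a creation record to the log. */
static uint64_t log_container_create(struct container_record *log,
				     size_t cap, size_t *n, uint64_t parent)
{
	uint64_t serial = atomic_fetch_add(&next_serial, 1);

	assert(*n < cap);
	log[*n].serial = serial;
	log[*n].parent = parent;
	(*n)++;
	return serial;
}

/* Parent serial of a container, or 0 if top level or not in the log. */
static uint64_t container_parent(const struct container_record *log,
				 size_t n, uint64_t serial)
{
	for (size_t i = 0; i < n; i++)
		if (log[i].serial == serial)
			return log[i].parent;
	return 0;
}

/* Nesting depth, rebuilt purely from the creation records. */
static unsigned int container_depth(const struct container_record *log,
				    size_t n, uint64_t serial)
{
	unsigned int depth = 0;

	/* A parent always has a smaller serial, so this terminates. */
	while ((serial = container_parent(log, n, serial)) != 0)
		depth++;
	return depth;
}
```

Because every record carries the parent's serial, a log reader can walk from any container back to the initial domain without relying on container names, which may be duplicated at different nesting levels.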
As was suggested in one of the previous threads, if there are any events not associated with a task (incoming network packets) we log the namespace ID and then only concern ourselves with its container serial number or container name once it becomes associated with a task at which point that tracking will be more important anyways. I'm not convinced that a userspace or kernel generated UUID is that useful since they are large, not human readable and may not be globally unique given the "pets vs cattle" direction we are going with potentially identical conditions in hosts or containers spawning containers, but I see no need to restrict them. How do we deal with setns()? Once it is determined that action is permitted, given the new combination of namespaces and potential membership in a different container, record the transition from one container to another including all namespaces if the latter are a different subset than the target container initial set. David, this patch of yours provides most of what we need, but there is a danger that some compromises (complete freedom of which namespaces to clone) will make it unusable for our needs unless other mechanisms are added (internal container serial number). To answer Andy's inevitable question: We want to be able to attribute audit events, whether they are generated by userspace or by a kernel event, to a specific container. Since the kernel has no concept of a container, it needs at least a rudimentary one to be able to track activity of kernel objects, similar to what is already done with the loginuid (auid) and sessionid, neither of which are kernel concepts, but the kernel keeps track of these as a service to userspace. We are able to track activity by task, but we don't know when that task or its namespaces (both resources) were allocated to a nebulous "container". This resource tracking is required for security certifications. Thanks. > Note that I've added a pointer to the current container to task_struct.
> This doesn't make the nsproxy pointer redundant as you can still make new > namespaces with clone(). > > I've also added a list_head to task_struct to form a list in the container > of its member processes. This is convenient, but redundant since the code > could iterate over all the tasks looking for ones that have a matching > task->container. > > > ================== > FUTURE DEVELOPMENT > ================== > > (1) Setting up the container. > > It should then be possible for the supervising process to modify the > new container by: > > container_mount(int cfd, > const char *source, > const char *target, /* NULL -> root */ > const char *filesystemtype, > unsigned long mountflags, > const void *data); > container_chroot(int cfd, const char *path); > container_bind_mount_across(int cfd, > const char *source, > const char *target); /* NULL -> root */ > mkdirat(int cfd, const char *path, mode_t mode); > mknodat(int cfd, const char *path, mode_t mode, dev_t dev); > int fd = openat(int cfd, const char *path, > unsigned int flags, mode_t mode); > int fd = container_socket(int cfd, int domain, int type, > int protocol); > > Opening a netlink socket inside the container should allow management > of the container's network namespace. > > (2) Starting the container. > > Once all modifications are complete, the container's 'init' process > can be started by: > > fork_into_container(int cfd); > > This precludes further external modification of the mount tree within > the container. Before this point, the container is simply destroyed > if the container fd is closed. > > (3) Waiting for the container to complete. 
> > The container fd can then be polled to wait for init process therein > to complete and the exit code collected by: > > container_wait(int container_fd, int *_wstatus, unsigned int wait, > struct rusage *rusage); > > The container and everything in it can be terminated or killed off: > > container_kill(int container_fd, int initonly, int signal); > > If 'init' dies, all other processes in the container are preemptively > SIGKILL'd by the kernel. > > By default, if the container is active and its fd is closed, the > container is left running and will be cleaned up when its 'init' exits. > The default can be changed with the CONTAINER_KILL_ON_CLOSE flag. > > (4) Supervising the container. > > Given that we have an fd attached to the container, we could make it > such that the supervising process could monitor and override EPERM > returns for mount and other privileged operations within the > container. > > (5) Device restriction. > > Containers could come with a list of device IDs that the container is > allowed to open. Perhaps a list of major numbers, each with a bitmap of > permitted minor numbers. > > (6) Per-container keyring. > > Each container could be given a per-container keyring for the holding > of integrity keys and filesystem keys. This list would be only > modifiable by the container's 'root' user and the supervisor process: > > container_add_key(const char *type, const char *description, > const void *payload, size_t plen, > int container_fd); > > The keys on the keyring would, however, be accessible/usable by all > processes within the keyring.
> > > =============== > EXAMPLE PROGRAM > =============== > > #include > #include > #include > #include > > #define CONTAINER_NEW_FS_NS 0x00000001 /* Dup current fs namespace */ > #define CONTAINER_NEW_EMPTY_FS_NS 0x00000002 /* Provide new empty fs namespace */ > #define CONTAINER_NEW_CGROUP_NS 0x00000004 /* Dup current cgroup namespace [priv] */ > #define CONTAINER_NEW_UTS_NS 0x00000008 /* Dup current uts namespace */ > #define CONTAINER_NEW_IPC_NS 0x00000010 /* Dup current ipc namespace */ > #define CONTAINER_NEW_USER_NS 0x00000020 /* Dup current user namespace */ > #define CONTAINER_NEW_PID_NS 0x00000040 /* Dup current pid namespace */ > #define CONTAINER_NEW_NET_NS 0x00000080 /* Dup current net namespace */ > #define CONTAINER_KILL_ON_CLOSE 0x00000100 /* Kill all member processes when fd closed */ > #define CONTAINER_FD_CLOEXEC 0x00000200 /* Close the fd on exec */ > #define CONTAINER__FLAG_MASK 0x000003ff > > static inline int container_create(const char *name, unsigned int mask) > { > return syscall(333, name, mask, 0, 0, 0); > } > > static inline int fork_into_container(int containerfd) > { > return syscall(334, containerfd); > } > > int main() > { > pid_t pid; > int fd, ws; > > fd = container_create("foo-test", > CONTAINER__FLAG_MASK & ~( > CONTAINER_NEW_EMPTY_FS_NS | > CONTAINER_NEW_CGROUP_NS)); > if (fd == -1) { > perror("container_create"); > exit(1); > } > > system("cat /proc/containers"); > > switch ((pid = fork_into_container(fd))) { > case -1: > perror("fork_into_container"); > exit(1); > case 0: > close(fd); > setenv("PS1", "container>", 1); > execl("/bin/bash", "bash", NULL); > perror("execl"); > exit(1); > default: > if (waitpid(pid, &ws, 0) < 0) { > perror("waitpid"); > exit(1); > } > } > close(fd); > exit(0); > } > > Signed-off-by: David Howells > --- > > arch/x86/entry/syscalls/syscall_32.tbl | 1 > arch/x86/entry/syscalls/syscall_64.tbl | 1 > fs/namespace.c | 5 > include/linux/container.h | 85 ++++++ > include/linux/init_task.h | 4 > 
include/linux/lsm_hooks.h | 21 + > include/linux/sched.h | 3 > include/linux/security.h | 15 + > include/linux/syscalls.h | 3 > include/uapi/linux/container.h | 28 ++ > include/uapi/linux/magic.h | 1 > init/Kconfig | 7 > kernel/Makefile | 2 > kernel/container.c | 462 ++++++++++++++++++++++++++++++++ > kernel/exit.c | 1 > kernel/fork.c | 7 > kernel/namespaces.h | 15 + > kernel/nsproxy.c | 23 +- > kernel/sys_ni.c | 4 > security/security.c | 13 + > 20 files changed, 688 insertions(+), 13 deletions(-) > create mode 100644 include/linux/container.h > create mode 100644 include/uapi/linux/container.h > create mode 100644 kernel/container.c > create mode 100644 kernel/namespaces.h > > diff --git a/arch/x86/entry/syscalls/syscall_32.tbl b/arch/x86/entry/syscalls/syscall_32.tbl > index abe6ea95e0e6..9ccd0f52f874 100644 > --- a/arch/x86/entry/syscalls/syscall_32.tbl > +++ b/arch/x86/entry/syscalls/syscall_32.tbl > @@ -393,3 +393,4 @@ > 384 i386 arch_prctl sys_arch_prctl compat_sys_arch_prctl > 385 i386 fsopen sys_fsopen > 386 i386 fsmount sys_fsmount > +387 i386 container_create sys_container_create > diff --git a/arch/x86/entry/syscalls/syscall_64.tbl b/arch/x86/entry/syscalls/syscall_64.tbl > index 0977c5079831..dab92591511e 100644 > --- a/arch/x86/entry/syscalls/syscall_64.tbl > +++ b/arch/x86/entry/syscalls/syscall_64.tbl > @@ -341,6 +341,7 @@ > 332 common statx sys_statx > 333 common fsopen sys_fsopen > 334 common fsmount sys_fsmount > +335 common container_create sys_container_create > > # > # x32-specific system call numbers start at 512 to avoid cache impact > diff --git a/fs/namespace.c b/fs/namespace.c > index 4e9ad16db79c..7e2d5fe5728b 100644 > --- a/fs/namespace.c > +++ b/fs/namespace.c > @@ -28,6 +28,7 @@ > #include > #include > #include > +#include > > #include "pnode.h" > #include "internal.h" > @@ -3510,6 +3511,10 @@ static void __init init_mount_tree(void) > > set_fs_pwd(current->fs, &root); > set_fs_root(current->fs, &root); > +#ifdef CONFIG_CONTAINERS > + 
path_get(&root); > + init_container.root = root; > +#endif > } > > void __init mnt_init(void) > diff --git a/include/linux/container.h b/include/linux/container.h > new file mode 100644 > index 000000000000..084ea9982fe6 > --- /dev/null > +++ b/include/linux/container.h > @@ -0,0 +1,85 @@ > +/* Container objects > + * > + * Copyright (C) 2017 Red Hat, Inc. All Rights Reserved. > + * Written by David Howells (dhowells at redhat.com) > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public Licence > + * as published by the Free Software Foundation; either version > + * 2 of the Licence, or (at your option) any later version. > + */ > + > +#ifndef _LINUX_CONTAINER_H > +#define _LINUX_CONTAINER_H > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +struct fs_struct; > +struct nsproxy; > +struct task_struct; > + > +/* > + * The container object. > + */ > +struct container { > + char name[24]; > + refcount_t usage; > + int exit_code; /* The exit code of 'init' */ > + const struct cred *cred; /* Creds for this container, including userns */ > + struct nsproxy *ns; /* This container's namespaces */ > + struct path root; /* The root of the container's fs namespace */ > + struct task_struct *init; /* The 'init' task for this container */ > + struct container *parent; /* Parent of this container. 
*/ > + void *security; /* LSM data */ > + struct list_head members; /* Member processes, guarded with ->lock */ > + struct list_head child_link; /* Link in parent->children */ > + struct list_head children; /* Child containers */ > + wait_queue_head_t waitq; /* Someone waiting for init to exit waits here */ > + unsigned long flags; > +#define CONTAINER_FLAG_INIT_STARTED 0 /* Init is started - certain ops now prohibited */ > +#define CONTAINER_FLAG_DEAD 1 /* Init has died */ > +#define CONTAINER_FLAG_KILL_ON_CLOSE 2 /* Kill init if container handle closed */ > + spinlock_t lock; > + seqcount_t seq; /* Track changes in ->root */ > +}; > + > +extern struct container init_container; > + > +#ifdef CONFIG_CONTAINERS > +extern const struct file_operations containerfs_fops; > + > +extern int copy_container(unsigned long flags, struct task_struct *tsk, > + struct container *container); > +extern void exit_container(struct task_struct *tsk); > +extern void put_container(struct container *c); > + > +static inline struct container *get_container(struct container *c) > +{ > + refcount_inc(&c->usage); > + return c; > +} > + > +static inline bool is_container_file(struct file *file) > +{ > + return file->f_op == &containerfs_fops; > +} > + > +#else > + > +static inline int copy_container(unsigned long flags, struct task_struct *tsk, > + struct container *container) > +{ return 0; } > +static inline void exit_container(struct task_struct *tsk) { } > +static inline void put_container(struct container *c) {} > +static inline struct container *get_container(struct container *c) { return NULL; } > +static inline bool is_container_file(struct file *file) { return false; } > + > +#endif /* CONFIG_CONTAINERS */ > + > +#endif /* _LINUX_CONTAINER_H */ > diff --git a/include/linux/init_task.h b/include/linux/init_task.h > index e049526bc188..488385ad79db 100644 > --- a/include/linux/init_task.h > +++ b/include/linux/init_task.h > @@ -9,6 +9,7 @@ > #include > #include > #include > +#include 
> #include > #include > #include > @@ -273,6 +274,9 @@ extern struct cred init_cred; > .signal = &init_signals, \ > .sighand = &init_sighand, \ > .nsproxy = &init_nsproxy, \ > + .container = &init_container, \ > + .container_link.next = &init_container.members, \ > + .container_link.prev = &init_container.members, \ > .pending = { \ > .list = LIST_HEAD_INIT(tsk.pending.list), \ > .signal = {{0}}}, \ > diff --git a/include/linux/lsm_hooks.h b/include/linux/lsm_hooks.h > index 7064c0c15386..7b0d484a6a25 100644 > --- a/include/linux/lsm_hooks.h > +++ b/include/linux/lsm_hooks.h > @@ -1368,6 +1368,17 @@ > * @inode we wish to get the security context of. > * @ctx is a pointer in which to place the allocated security context. > * @ctxlen points to the place to put the length of @ctx. > + * > + * Security hooks for containers: > + * > + * @container_alloc: > + * Permit creation of a new container and assign security data. > + * @container: The new container. > + * > + * @container_free: > + * Free security data attached to a container. > + * @container: The container. > + * > * This is the main security structure. 
> */ > > @@ -1699,6 +1710,12 @@ union security_list_options { > struct audit_context *actx); > void (*audit_rule_free)(void *lsmrule); > #endif /* CONFIG_AUDIT */ > + > + /* Container management security hooks */ > +#ifdef CONFIG_CONTAINERS > + int (*container_alloc)(struct container *container, unsigned int flags); > + void (*container_free)(struct container *container); > +#endif > }; > > struct security_hook_heads { > @@ -1919,6 +1936,10 @@ struct security_hook_heads { > struct list_head audit_rule_match; > struct list_head audit_rule_free; > #endif /* CONFIG_AUDIT */ > +#ifdef CONFIG_CONTAINERS > + struct list_head container_alloc; > + struct list_head container_free; > +#endif /* CONFIG_CONTAINERS */ > }; > > /* > diff --git a/include/linux/sched.h b/include/linux/sched.h > index eba196521562..d9b92a98f99f 100644 > --- a/include/linux/sched.h > +++ b/include/linux/sched.h > @@ -33,6 +33,7 @@ struct backing_dev_info; > struct bio_list; > struct blk_plug; > struct cfs_rq; > +struct container; > struct fs_struct; > struct futex_pi_state; > struct io_context; > @@ -741,6 +742,8 @@ struct task_struct { > > /* Namespaces: */ > struct nsproxy *nsproxy; > + struct container *container; > + struct list_head container_link; > > /* Signal handlers: */ > struct signal_struct *signal; > diff --git a/include/linux/security.h b/include/linux/security.h > index 8c06e158c195..01bdf7637ec6 100644 > --- a/include/linux/security.h > +++ b/include/linux/security.h > @@ -68,6 +68,7 @@ struct ctl_table; > struct audit_krule; > struct user_namespace; > struct timezone; > +struct container; > > /* These functions are in security/commoncap.c */ > extern int cap_capable(const struct cred *cred, struct user_namespace *ns, > @@ -1672,6 +1673,20 @@ static inline void security_audit_rule_free(void *lsmrule) > #endif /* CONFIG_SECURITY */ > #endif /* CONFIG_AUDIT */ > > +#ifdef CONFIG_CONTAINERS > +#ifdef CONFIG_SECURITY > +int security_container_alloc(struct container *container, unsigned 
int flags); > +void security_container_free(struct container *container); > +#else > +static inline int security_container_alloc(struct container *container, > + unsigned int flags) > +{ > + return 0; > +} > +static inline void security_container_free(struct container *container) {} > +#endif > +#endif /* CONFIG_CONTAINERS */ > + > #ifdef CONFIG_SECURITYFS > > extern struct dentry *securityfs_create_file(const char *name, umode_t mode, > diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h > index 07e4f775f24d..5a0324dd024c 100644 > --- a/include/linux/syscalls.h > +++ b/include/linux/syscalls.h > @@ -908,5 +908,8 @@ asmlinkage long sys_statx(int dfd, const char __user *path, unsigned flags, > asmlinkage long sys_fsopen(const char *fs_name, int containerfd, unsigned int flags); > asmlinkage long sys_fsmount(int fsfd, int dfd, const char *path, unsigned int at_flags, > unsigned int flags); > +asmlinkage long sys_container_create(const char __user *name, unsigned int flags, > + unsigned long spare3, unsigned long spare4, > + unsigned long spare5); > > #endif > diff --git a/include/uapi/linux/container.h b/include/uapi/linux/container.h > new file mode 100644 > index 000000000000..43748099b28d > --- /dev/null > +++ b/include/uapi/linux/container.h > @@ -0,0 +1,28 @@ > +/* Container UAPI > + * > + * Copyright (C) 2017 Red Hat, Inc. All Rights Reserved. > + * Written by David Howells (dhowells at redhat.com) > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public Licence > + * as published by the Free Software Foundation; either version > + * 2 of the Licence, or (at your option) any later version. 
> + */ > + > +#ifndef _UAPI_LINUX_CONTAINER_H > +#define _UAPI_LINUX_CONTAINER_H > + > + > +#define CONTAINER_NEW_FS_NS 0x00000001 /* Dup current fs namespace */ > +#define CONTAINER_NEW_EMPTY_FS_NS 0x00000002 /* Provide new empty fs namespace */ > +#define CONTAINER_NEW_CGROUP_NS 0x00000004 /* Dup current cgroup namespace */ > +#define CONTAINER_NEW_UTS_NS 0x00000008 /* Dup current uts namespace */ > +#define CONTAINER_NEW_IPC_NS 0x00000010 /* Dup current ipc namespace */ > +#define CONTAINER_NEW_USER_NS 0x00000020 /* Dup current user namespace */ > +#define CONTAINER_NEW_PID_NS 0x00000040 /* Dup current pid namespace */ > +#define CONTAINER_NEW_NET_NS 0x00000080 /* Dup current net namespace */ > +#define CONTAINER_KILL_ON_CLOSE 0x00000100 /* Kill all member processes when fd closed */ > +#define CONTAINER_FD_CLOEXEC 0x00000200 /* Close the fd on exec */ > +#define CONTAINER__FLAG_MASK 0x000003ff > + > +#endif /* _UAPI_LINUX_CONTAINER_H */ > diff --git a/include/uapi/linux/magic.h b/include/uapi/linux/magic.h > index 88ae83492f7c..758705412b44 100644 > --- a/include/uapi/linux/magic.h > +++ b/include/uapi/linux/magic.h > @@ -85,5 +85,6 @@ > #define BALLOON_KVM_MAGIC 0x13661366 > #define ZSMALLOC_MAGIC 0x58295829 > #define FS_FS_MAGIC 0x66736673 > +#define CONTAINERFS_MAGIC 0x636f6e74 > > #endif /* __LINUX_MAGIC_H__ */ > diff --git a/init/Kconfig b/init/Kconfig > index 1d3475fc9496..3a0ee88df6c8 100644 > --- a/init/Kconfig > +++ b/init/Kconfig > @@ -1288,6 +1288,13 @@ config NET_NS > Allow user space to create what appear to be multiple instances > of the network stack. > > +config CONTAINERS > + bool "Container support" > + default y > + help > + Allow userspace to create and manipulate containers as objects that > + have namespaces and hold a set of processes. 
> + > endif # NAMESPACES > > config SCHED_AUTOGROUP > diff --git a/kernel/Makefile b/kernel/Makefile > index 72aa080f91f0..117479b05fb1 100644 > --- a/kernel/Makefile > +++ b/kernel/Makefile > @@ -7,7 +7,7 @@ obj-y = fork.o exec_domain.o panic.o \ > sysctl.o sysctl_binary.o capability.o ptrace.o user.o \ > signal.o sys.o kmod.o workqueue.o pid.o task_work.o \ > extable.o params.o \ > - kthread.o sys_ni.o nsproxy.o \ > + kthread.o sys_ni.o nsproxy.o container.o \ > notifier.o ksysfs.o cred.o reboot.o \ > async.o range.o smpboot.o ucount.o > > diff --git a/kernel/container.c b/kernel/container.c > new file mode 100644 > index 000000000000..eef1566835eb > --- /dev/null > +++ b/kernel/container.c > @@ -0,0 +1,462 @@ > +/* Implement container objects. > + * > + * Copyright (C) 2017 Red Hat, Inc. All Rights Reserved. > + * Written by David Howells (dhowells at redhat.com) > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public Licence > + * as published by the Free Software Foundation; either version > + * 2 of the Licence, or (at your option) any later version. 
> + */ > + > +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include "namespaces.h" > + > +struct container init_container = { > + .name = ".init", > + .usage = REFCOUNT_INIT(2), > + .cred = &init_cred, > + .ns = &init_nsproxy, > + .init = &init_task, > + .members.next = &init_task.container_link, > + .members.prev = &init_task.container_link, > + .children = LIST_HEAD_INIT(init_container.children), > + .flags = (1 << CONTAINER_FLAG_INIT_STARTED), > + .lock = __SPIN_LOCK_UNLOCKED(init_container.lock), > + .seq = SEQCNT_ZERO(init_fs.seq), > +}; > + > +#ifdef CONFIG_CONTAINERS > + > +static struct vfsmount *containerfs_mnt __read_mostly; > + > +/* > + * Drop a ref on a container and clear it if no longer in use. > + */ > +void put_container(struct container *c) > +{ > + struct container *parent; > + > + while (c && refcount_dec_and_test(&c->usage)) { > + BUG_ON(!list_empty(&c->members)); > + if (c->ns) > + put_nsproxy(c->ns); > + path_put(&c->root); > + > + parent = c->parent; > + if (parent) { > + spin_lock(&parent->lock); > + list_del(&c->child_link); > + spin_unlock(&parent->lock); > + } > + > + if (c->cred) > + put_cred(c->cred); > + security_container_free(c); > + kfree(c); > + c = parent; > + } > +} > + > +/* > + * Allow the user to poll for the container dying. 
> + */ > +static unsigned int containerfs_poll(struct file *file, poll_table *wait) > +{ > + struct container *container = file->private_data; > + unsigned int mask = 0; > + > + poll_wait(file, &container->waitq, wait); > + > + if (test_bit(CONTAINER_FLAG_DEAD, &container->flags)) > + mask |= POLLHUP; > + > + return mask; > +} > + > +static int containerfs_release(struct inode *inode, struct file *file) > +{ > + struct container *container = file->private_data; > + > + put_container(container); > + return 0; > +} > + > +const struct file_operations containerfs_fops = { > + .poll = containerfs_poll, > + .release = containerfs_release, > +}; > + > +/* > + * Indicate the name we want to display the container file as. > + */ > +static char *containerfs_dname(struct dentry *dentry, char *buffer, int buflen) > +{ > + return dynamic_dname(dentry, buffer, buflen, "container:[%lu]", > + d_inode(dentry)->i_ino); > +} > + > +static const struct dentry_operations containerfs_dentry_operations = { > + .d_dname = containerfs_dname, > +}; > + > +/* > + * Allocate a container. 
> + */ > +static struct container *alloc_container(const char __user *name) > +{ > + struct container *c; > + long len; > + int ret; > + > + c = kzalloc(sizeof(struct container), GFP_KERNEL); > + if (!c) > + return ERR_PTR(-ENOMEM); > + > + INIT_LIST_HEAD(&c->members); > + INIT_LIST_HEAD(&c->children); > + init_waitqueue_head(&c->waitq); > + spin_lock_init(&c->lock); > + refcount_set(&c->usage, 1); > + > + ret = -EFAULT; > + len = strncpy_from_user(c->name, name, sizeof(c->name)); > + if (len < 0) > + goto err; > + ret = -ENAMETOOLONG; > + if (len >= sizeof(c->name)) > + goto err; > + ret = -EINVAL; > + if (strchr(c->name, '/')) > + goto err; > + > + c->name[len] = 0; > + return c; > + > +err: > + kfree(c); > + return ERR_PTR(ret); > +} > + > +/* > + * Create a supervisory file for a new container > + */ > +static struct file *create_container_file(struct container *c) > +{ > + struct inode *inode; > + struct file *f; > + struct path path; > + int ret; > + > + inode = alloc_anon_inode(containerfs_mnt->mnt_sb); > + if (!inode) > + return ERR_PTR(-ENFILE); > + inode->i_fop = &containerfs_fops; > + > + ret = -ENOMEM; > + path.dentry = d_alloc_pseudo(containerfs_mnt->mnt_sb, &empty_name); > + if (!path.dentry) > + goto err_inode; > + path.mnt = mntget(containerfs_mnt); > + > + d_instantiate(path.dentry, inode); > + > + f = alloc_file(&path, 0, &containerfs_fops); > + if (IS_ERR(f)) { > + ret = PTR_ERR(f); > + goto err_file; > + } > + > + f->private_data = c; > + return f; > + > +err_file: > + path_put(&path); > + return ERR_PTR(ret); > + > +err_inode: > + iput(inode); > + return ERR_PTR(ret); > +} > + > +static const struct super_operations containerfs_ops = { > + .drop_inode = generic_delete_inode, > + .destroy_inode = free_inode_nonrcu, > + .statfs = simple_statfs, > +}; > + > +/* > + * containerfs should _never_ be mounted by userland - too much of security > + * hassle, no real gain from having the whole whorehouse mounted. 
So we don't > + * need any operations on the root directory. However, we need a non-trivial > + * d_name - container: will go nicely and kill the special-casing in procfs. > + */ > +static struct dentry *containerfs_mount(struct file_system_type *fs_type, > + int flags, const char *dev_name, > + void *data) > +{ > + return mount_pseudo(fs_type, "container:", &containerfs_ops, > + &containerfs_dentry_operations, CONTAINERFS_MAGIC); > +} > + > +static struct file_system_type container_fs_type = { > + .name = "containerfs", > + .mount = containerfs_mount, > + .kill_sb = kill_anon_super, > +}; > + > +static int __init init_container_fs(void) > +{ > + int ret; > + > + ret = register_filesystem(&container_fs_type); > + if (ret < 0) > + panic("Cannot register containerfs\n"); > + > + containerfs_mnt = kern_mount(&container_fs_type); > + if (IS_ERR(containerfs_mnt)) > + panic("Cannot mount containerfs: %ld\n", > + PTR_ERR(containerfs_mnt)); > + > + return 0; > +} > + > +fs_initcall(init_container_fs); > + > +/* > + * Handle fork/clone. > + * > + * A process inherits its parent's container. The first process into the > + * container is its 'init' process and the life of everything else in there is > + * dependent upon that. > + */ > +int copy_container(unsigned long flags, struct task_struct *tsk, > + struct container *container) > +{ > + struct container *c = container ?: tsk->container; > + int ret = -ECANCELED; > + > + spin_lock(&c->lock); > + > + if (!test_bit(CONTAINER_FLAG_DEAD, &c->flags)) { > + list_add_tail(&tsk->container_link, &c->members); > + get_container(c); > + tsk->container = c; > + if (!c->init) { > + set_bit(CONTAINER_FLAG_INIT_STARTED, &c->flags); > + c->init = tsk; > + } > + ret = 0; > + } > + > + spin_unlock(&c->lock); > + return ret; > +} > + > +/* > + * Remove a dead process from a container. > + * > + * If the 'init' process in a container dies, we kill off all the other > + * processes in the container. 
> + */ > +void exit_container(struct task_struct *tsk) > +{ > + struct task_struct *p; > + struct container *c = tsk->container; > + struct siginfo si = { > + .si_signo = SIGKILL, > + .si_code = SI_KERNEL, > + }; > + > + spin_lock(&c->lock); > + > + list_del(&tsk->container_link); > + > + if (c->init == tsk) { > + c->init = NULL; > + c->exit_code = tsk->exit_code; > + smp_wmb(); /* Order exit_code vs CONTAINER_DEAD. */ > + set_bit(CONTAINER_FLAG_DEAD, &c->flags); > + wake_up_bit(&c->flags, CONTAINER_FLAG_DEAD); > + > + list_for_each_entry(p, &c->members, container_link) { > + si.si_pid = task_tgid_vnr(p); > + send_sig_info(SIGKILL, &si, p); > + } > + } > + > + spin_unlock(&c->lock); > + put_container(c); > +} > + > +/* > + * Create some creds for the container. We don't want to pin things we don't > + * have to, so drop all keyrings from the new cred. The LSM gets to audit the > + * cred struct when security_container_alloc() is invoked. > + */ > +static const struct cred *create_container_creds(unsigned int flags) > +{ > + struct cred *new; > + int ret; > + > + new = prepare_creds(); > + if (!new) > + return ERR_PTR(-ENOMEM); > + > +#ifdef CONFIG_KEYS > + key_put(new->thread_keyring); > + new->thread_keyring = NULL; > + key_put(new->process_keyring); > + new->process_keyring = NULL; > + key_put(new->session_keyring); > + new->session_keyring = NULL; > + key_put(new->request_key_auth); > + new->request_key_auth = NULL; > +#endif > + > + if (flags & CONTAINER_NEW_USER_NS) { > + ret = create_user_ns(new); > + if (ret < 0) > + goto err; > + new->euid = new->user_ns->owner; > + new->egid = new->user_ns->group; > + } > + > + new->fsuid = new->suid = new->uid = new->euid; > + new->fsgid = new->sgid = new->gid = new->egid; > + return new; > + > +err: > + abort_creds(new); > + return ERR_PTR(ret); > +} > + > +/* > + * Create a new container. 
> + */ > +static struct container *create_container(const char *name, unsigned int flags) > +{ > + struct container *parent, *c; > + struct fs_struct *fs; > + struct nsproxy *ns; > + const struct cred *cred; > + int ret; > + > + c = alloc_container(name); > + if (IS_ERR(c)) > + return c; > + > + if (flags & CONTAINER_KILL_ON_CLOSE) > + __set_bit(CONTAINER_FLAG_KILL_ON_CLOSE, &c->flags); > + > + cred = create_container_creds(flags); > + if (IS_ERR(cred)) { > + ret = PTR_ERR(cred); > + goto err_cont; > + } > + c->cred = cred; > + > + ret = -ENOMEM; > + fs = copy_fs_struct(current->fs); > + if (!fs) > + goto err_cont; > + > + ns = create_new_namespaces( > + (flags & CONTAINER_NEW_FS_NS ? CLONE_NEWNS : 0) | > + (flags & CONTAINER_NEW_CGROUP_NS ? CLONE_NEWCGROUP : 0) | > + (flags & CONTAINER_NEW_UTS_NS ? CLONE_NEWUTS : 0) | > + (flags & CONTAINER_NEW_IPC_NS ? CLONE_NEWIPC : 0) | > + (flags & CONTAINER_NEW_PID_NS ? CLONE_NEWPID : 0) | > + (flags & CONTAINER_NEW_NET_NS ? CLONE_NEWNET : 0), > + current->nsproxy, cred->user_ns, fs); > + if (IS_ERR(ns)) { > + ret = PTR_ERR(ns); > + goto err_fs; > + } > + > + c->ns = ns; > + c->root = fs->root; > + c->seq = fs->seq; > + fs->root.mnt = NULL; > + fs->root.dentry = NULL; > + > + ret = security_container_alloc(c, flags); > + if (ret < 0) > + goto err_fs; > + > + parent = current->container; > + get_container(parent); > + c->parent = parent; > + spin_lock(&parent->lock); > + list_add_tail(&c->child_link, &parent->children); > + spin_unlock(&parent->lock); > + return c; > + > +err_fs: > + free_fs_struct(fs); > +err_cont: > + put_container(c); > + return ERR_PTR(ret); > +} > + > +/* > + * Create a new container object. 
> + */ > +SYSCALL_DEFINE5(container_create, > + const char __user *, name, > + unsigned int, flags, > + unsigned long, spare3, > + unsigned long, spare4, > + unsigned long, spare5) > +{ > + struct container *c; > + struct file *f; > + int ret, fd; > + > + if (!name || > + flags & ~CONTAINER__FLAG_MASK || > + spare3 != 0 || spare4 != 0 || spare5 != 0) > + return -EINVAL; > + if ((flags & (CONTAINER_NEW_FS_NS | CONTAINER_NEW_EMPTY_FS_NS)) == > + (CONTAINER_NEW_FS_NS | CONTAINER_NEW_EMPTY_FS_NS)) > + return -EINVAL; > + > + c = create_container(name, flags); > + if (IS_ERR(c)) > + return PTR_ERR(c); > + > + f = create_container_file(c); > + if (IS_ERR(f)) { > + ret = PTR_ERR(f); > + goto err_cont; > + } > + > + ret = get_unused_fd_flags(flags & CONTAINER_FD_CLOEXEC ? O_CLOEXEC : 0); > + if (ret < 0) > + goto err_file; > + > + fd = ret; > + fd_install(fd, f); > + return fd; > + > +err_file: > + fput(f); > + return ret; > +err_cont: > + put_container(c); > + return ret; > +} > + > +#endif /* CONFIG_CONTAINERS */ > diff --git a/kernel/exit.c b/kernel/exit.c > index 31b8617aee04..1ff87f7e40a2 100644 > --- a/kernel/exit.c > +++ b/kernel/exit.c > @@ -875,6 +875,7 @@ void __noreturn do_exit(long code) > if (group_dead) > disassociate_ctty(1); > exit_task_namespaces(tsk); > + exit_container(tsk); > exit_task_work(tsk); > exit_thread(tsk); > > diff --git a/kernel/fork.c b/kernel/fork.c > index aec6672d3f0e..ff2779426fe9 100644 > --- a/kernel/fork.c > +++ b/kernel/fork.c > @@ -1728,9 +1728,12 @@ static __latent_entropy struct task_struct *copy_process( > retval = copy_namespaces(clone_flags, p); > if (retval) > goto bad_fork_cleanup_mm; > - retval = copy_io(clone_flags, p); > + retval = copy_container(clone_flags, p, NULL); > if (retval) > goto bad_fork_cleanup_namespaces; > + retval = copy_io(clone_flags, p); > + if (retval) > + goto bad_fork_cleanup_container; > retval = copy_thread_tls(clone_flags, stack_start, stack_size, p, tls); > if (retval) > goto bad_fork_cleanup_io; > 
@@ -1918,6 +1921,8 @@ static __latent_entropy struct task_struct *copy_process( > bad_fork_cleanup_io: > if (p->io_context) > exit_io_context(p); > +bad_fork_cleanup_container: > + exit_container(p); > bad_fork_cleanup_namespaces: > exit_task_namespaces(p); > bad_fork_cleanup_mm: > diff --git a/kernel/namespaces.h b/kernel/namespaces.h > new file mode 100644 > index 000000000000..c44e3cf0e254 > --- /dev/null > +++ b/kernel/namespaces.h > @@ -0,0 +1,15 @@ > +/* Local namespaces defs > + * > + * Copyright (C) 2017 Red Hat, Inc. All Rights Reserved. > + * Written by David Howells (dhowells at redhat.com) > + * > + * This program is free software; you can redistribute it and/or > + * modify it under the terms of the GNU General Public Licence > + * as published by the Free Software Foundation; either version > + * 2 of the Licence, or (at your option) any later version. > + */ > + > +extern struct nsproxy *create_new_namespaces(unsigned long flags, > + struct nsproxy *nsproxy, > + struct user_namespace *user_ns, > + struct fs_struct *new_fs); > diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c > index f6c5d330059a..4bb5184b3a80 100644 > --- a/kernel/nsproxy.c > +++ b/kernel/nsproxy.c > @@ -27,6 +27,7 @@ > #include > #include > #include > +#include "namespaces.h" > > static struct kmem_cache *nsproxy_cachep; > > @@ -61,8 +62,8 @@ static inline struct nsproxy *create_nsproxy(void) > * Return the newly created nsproxy. Do not attach this to the task, > * leave it to the caller to do proper locking and attach it to task. 
> */ > -static struct nsproxy *create_new_namespaces(unsigned long flags, > - struct task_struct *tsk, struct user_namespace *user_ns, > +struct nsproxy *create_new_namespaces(unsigned long flags, > + struct nsproxy *nsproxy, struct user_namespace *user_ns, > struct fs_struct *new_fs) > { > struct nsproxy *new_nsp; > @@ -72,39 +73,39 @@ static struct nsproxy *create_new_namespaces(unsigned long flags, > if (!new_nsp) > return ERR_PTR(-ENOMEM); > > - new_nsp->mnt_ns = copy_mnt_ns(flags, tsk->nsproxy->mnt_ns, user_ns, new_fs); > + new_nsp->mnt_ns = copy_mnt_ns(flags, nsproxy->mnt_ns, user_ns, new_fs); > if (IS_ERR(new_nsp->mnt_ns)) { > err = PTR_ERR(new_nsp->mnt_ns); > goto out_ns; > } > > - new_nsp->uts_ns = copy_utsname(flags, user_ns, tsk->nsproxy->uts_ns); > + new_nsp->uts_ns = copy_utsname(flags, user_ns, nsproxy->uts_ns); > if (IS_ERR(new_nsp->uts_ns)) { > err = PTR_ERR(new_nsp->uts_ns); > goto out_uts; > } > > - new_nsp->ipc_ns = copy_ipcs(flags, user_ns, tsk->nsproxy->ipc_ns); > + new_nsp->ipc_ns = copy_ipcs(flags, user_ns, nsproxy->ipc_ns); > if (IS_ERR(new_nsp->ipc_ns)) { > err = PTR_ERR(new_nsp->ipc_ns); > goto out_ipc; > } > > new_nsp->pid_ns_for_children = > - copy_pid_ns(flags, user_ns, tsk->nsproxy->pid_ns_for_children); > + copy_pid_ns(flags, user_ns, nsproxy->pid_ns_for_children); > if (IS_ERR(new_nsp->pid_ns_for_children)) { > err = PTR_ERR(new_nsp->pid_ns_for_children); > goto out_pid; > } > > new_nsp->cgroup_ns = copy_cgroup_ns(flags, user_ns, > - tsk->nsproxy->cgroup_ns); > + nsproxy->cgroup_ns); > if (IS_ERR(new_nsp->cgroup_ns)) { > err = PTR_ERR(new_nsp->cgroup_ns); > goto out_cgroup; > } > > - new_nsp->net_ns = copy_net_ns(flags, user_ns, tsk->nsproxy->net_ns); > + new_nsp->net_ns = copy_net_ns(flags, user_ns, nsproxy->net_ns); > if (IS_ERR(new_nsp->net_ns)) { > err = PTR_ERR(new_nsp->net_ns); > goto out_net; > @@ -162,7 +163,7 @@ int copy_namespaces(unsigned long flags, struct task_struct *tsk) > (CLONE_NEWIPC | CLONE_SYSVSEM)) > return 
-EINVAL; > > - new_ns = create_new_namespaces(flags, tsk, user_ns, tsk->fs); > + new_ns = create_new_namespaces(flags, tsk->nsproxy, user_ns, tsk->fs); > if (IS_ERR(new_ns)) > return PTR_ERR(new_ns); > > @@ -203,7 +204,7 @@ int unshare_nsproxy_namespaces(unsigned long unshare_flags, > if (!ns_capable(user_ns, CAP_SYS_ADMIN)) > return -EPERM; > > - *new_nsp = create_new_namespaces(unshare_flags, current, user_ns, > + *new_nsp = create_new_namespaces(unshare_flags, current->nsproxy, user_ns, > new_fs ? new_fs : current->fs); > if (IS_ERR(*new_nsp)) { > err = PTR_ERR(*new_nsp); > @@ -251,7 +252,7 @@ SYSCALL_DEFINE2(setns, int, fd, int, nstype) > if (nstype && (ns->ops->type != nstype)) > goto out; > > - new_nsproxy = create_new_namespaces(0, tsk, current_user_ns(), tsk->fs); > + new_nsproxy = create_new_namespaces(0, tsk->nsproxy, current_user_ns(), tsk->fs); > if (IS_ERR(new_nsproxy)) { > err = PTR_ERR(new_nsproxy); > goto out; > diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c > index a0fe764bd5dd..99b1e1f58d05 100644 > --- a/kernel/sys_ni.c > +++ b/kernel/sys_ni.c > @@ -262,3 +262,7 @@ cond_syscall(sys_pkey_free); > /* fd-based mount */ > cond_syscall(sys_fsopen); > cond_syscall(sys_fsmount); > + > +/* Containers */ > +cond_syscall(sys_container_create); > + > diff --git a/security/security.c b/security/security.c > index f4136ca5cb1b..b5c5b5ae1266 100644 > --- a/security/security.c > +++ b/security/security.c > @@ -1668,3 +1668,16 @@ int security_audit_rule_match(u32 secid, u32 field, u32 op, void *lsmrule, > actx); > } > #endif /* CONFIG_AUDIT */ > + > +#ifdef CONFIG_CONTAINERS > + > +int security_container_alloc(struct container *container, unsigned int flags) > +{ > + return call_int_hook(container_alloc, 0, container, flags); > +} > + > +void security_container_free(struct container *container) > +{ > + call_void_hook(container_free, container); > +} > +#endif /* CONFIG_CONTAINERS */ - RGB -- Richard Guy Briggs Sr. 
S/W Engineer, Kernel Security, Base Operating Systems Remote, Ottawa, Red Hat Canada IRC: rgb, SunRaycer Voice: +1.647.777.2635, Internal: (81) 32635 From mtk.manpages at gmail.com Tue Aug 15 19:27:33 2017 From: mtk.manpages at gmail.com (Michael Kerrisk (man-pages)) Date: Tue, 15 Aug 2017 21:27:33 +0200 Subject: [PATCH] ioctl_tty.2: add TIOCGPTPEER documentation In-Reply-To: <20170609170147.32311-1-asarai@suse.de> References: <20170609170147.32311-1-asarai@suse.de> Message-ID: <11706e49-8271-ed8c-3747-19b3e8f2850d@gmail.com> On 06/09/2017 07:01 PM, Aleksa Sarai wrote: > The feature this patch references has currently only been accepted into > tty-testing, but Greg told me to kick this down to man-pages. As a > result, I can't reference upstream commit id's because the code isn't in > Linus' tree yet -- should I resend this once it lands in tty-next or > Linus' tree? > > Also obviously the release version is a bit of a lie. Hello Aleksa, I've applied this patch, and then tweaked the wording a little. Could you please check the following text: TIOCGPTPEER int flags (since Linux 4.13) Given a file descriptor in fd that refers to a pseudoterminal master, open (with the given open(2)-style flags) and return a new file descriptor that refers to the peer pseudoterminal slave device. This operation can be performed regardless of whether the pathname of the slave device is accessible through the calling process's mount namespaces. Security-conscious programs interacting with namespaces may wish to use this operation rather than open(2) with the pathname returned by ptsname(3), and similar library functions that have insecure APIs. I also have a question on the last sentence: what are the "similar library functions that have insecure APIs"? It's not clear to me what you are referring to here.
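For reference, the operation described in the text above might be used roughly as follows. This is only a minimal sketch, assuming Linux 4.13 or later and a glibc-style userspace; the helper name open_pty_pair() is made up for illustration, and the fallback definition of TIOCGPTPEER uses the asm-generic value, which is not correct on every architecture.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <unistd.h>

/*
 * Fallback for older userspace headers. This is the asm-generic value;
 * some architectures define a different number, so prefer the header's.
 */
#ifndef TIOCGPTPEER
#define TIOCGPTPEER _IO('T', 0x41)
#endif

/*
 * Hypothetical helper: open a pseudoterminal pair without resolving the
 * slave's /dev/pts/N pathname via ptsname(3).
 *
 * Returns 0 and stores both descriptors on success; returns -1 with errno
 * describing the failing call otherwise.
 */
int open_pty_pair(int *master_out, int *slave_out)
{
	int master, slave, saved_errno;

	master = posix_openpt(O_RDWR | O_NOCTTY);
	if (master < 0)
		return -1;

	if (grantpt(master) < 0 || unlockpt(master) < 0)
		goto err;

	/* The ioctl argument is the open(2)-style flags for the slave end. */
	slave = ioctl(master, TIOCGPTPEER, O_RDWR | O_NOCTTY);
	if (slave < 0)
		goto err;

	*master_out = master;
	*slave_out = slave;
	return 0;

err:
	saved_errno = errno;	/* close() may clobber errno */
	close(master);
	errno = saved_errno;
	return -1;
}
```

On kernels older than 4.13 the ioctl is expected to fail, so a caller that must run there would still need a fallback to open(2) on the ptsname(3) result.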
Cheers, Michael > > 8<----------------------------------------------------------------------- > > This is an ioctl(2) recently added by myself, to allow for container > runtimes and other programs that interact with (potentially hostile) > Linux namespaces to safely create {master,slave} pseudoterminal pairs > without needing to open potentially unsafe /dev/pts/... filenames that > may be malicious mountpoints or similar in an untrusted namespace > (avoiding the endless issues with ptsname(3) and similar approaches). > > Cc: > Signed-off-by: Aleksa Sarai > --- > man2/ioctl_tty.2 | 15 +++++++++++++++ > 1 file changed, 15 insertions(+) > > diff --git a/man2/ioctl_tty.2 b/man2/ioctl_tty.2 > index d280beacf..61e147d99 100644 > --- a/man2/ioctl_tty.2 > +++ b/man2/ioctl_tty.2 > @@ -380,6 +380,21 @@ Place the current lock state of the pseudoterminal slave device > in the location pointed to by > .IR argp > (since Linux 3.8). > +.TP > +.BI "TIOCGPTPEER int " flags > +Opens and returns a new file handle to the pseudoterminal slave > +device with the given > +.BR open (2)-style > +.IR flags , > +regardless of whether the path is accessible through the calling process's > +mount namespaces. > + > +Security-conscious programs interacting with namespaces may wish to use this > +over > +.BR open (2) > +with the path provided by > +.BR ptsname (3), > +and similar library methods that have insecure APIs (since Linux 4.13). 
> .PP > The BSD ioctls > .BR TIOCSTOP , > -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Linux/UNIX System Programming Training: http://man7.org/training/ From asarai at suse.de Wed Aug 16 04:43:29 2017 From: asarai at suse.de (Aleksa Sarai) Date: Wed, 16 Aug 2017 14:43:29 +1000 Subject: [PATCH] ioctl_tty.2: add TIOCGPTPEER documentation In-Reply-To: <11706e49-8271-ed8c-3747-19b3e8f2850d@gmail.com> References: <20170609170147.32311-1-asarai@suse.de> <11706e49-8271-ed8c-3747-19b3e8f2850d@gmail.com> Message-ID: > I've applied this patch, and then tweaked the wording a little. Could > you please check the following text: > > TIOCGPTPEER int flags > (since Linux 4.13) Given a file descriptor in fd that > refers to a pseudoterminal master, open (with the given > open(2)-style flags) and return a new file descriptor that > refers to the peer pseudoterminal slave device. This operation > can be performed regardless of whether the pathname > of the slave device is accessible through the calling > process's mount namespaces. > > Security-conscious programs interacting with namespaces may > wish to use this operation rather than open(2) with the > pathname returned by ptsname(3), and similar library functions > that have insecure APIs. Yup, that sounds good. > I also have a question on the last sentence: what are the "similar library > functions that have insecure APIs"? It's not clear to me what you are > referring to here. There are a few posix_-style functions provided by glibc that are just wrappers around the open+ptsname combo that I mention earlier in the sentence (and thus are vulnerable to the same issue). But if you feel it's confusing you can feel free to drop it. Thanks. -- Aleksa Sarai Software Engineer (Containers) SUSE Linux GmbH https://www.cyphar.com/ From ebiederm at xmission.com Wed Aug 16 16:43:39 2017 From: ebiederm at xmission.com (Eric W.
Biederman) Date: Wed, 16 Aug 2017 11:43:39 -0500 Subject: [PATCH] ioctl_tty.2: add TIOCGPTPEER documentation In-Reply-To: <11706e49-8271-ed8c-3747-19b3e8f2850d@gmail.com> (Michael Kerrisk's message of "Tue, 15 Aug 2017 21:27:33 +0200") References: <20170609170147.32311-1-asarai@suse.de> <11706e49-8271-ed8c-3747-19b3e8f2850d@gmail.com> Message-ID: <878tijwjic.fsf@xmission.com> "Michael Kerrisk (man-pages)" writes: > On 06/09/2017 07:01 PM, Aleksa Sarai wrote: >> The feature this patch references has currently only been accepted into >> tty-testing, but Greg told me to kick this down to man-pages. As a >> result, I can't reference upstream commit id's because the code isn't in >> Linus' tree yet -- should I resend this once it lands in tty-next or >> Linus' tree? >> >> Also obviously the release version is a bit of a lie. > > Hello Aleksa, > > I've applied this patch, and then tweaked the wording a little. Could > you please check the following text: > > TIOCGPTPEER int flags > (since Linux 4.13) Given a file descriptor in fd that > refers to a pseudoterminal master, open (with the given > open(2)-style flags) and return a new file descriptor that > refers to the peer pseudoterminal slave device. This operation > can be performed regardless of whether the pathname > of the slave device is accessible through the calling > process's mount namespaces. > > Security-conscious programs interacting with namespaces may > wish to use this operation rather than open(2) with the > pathname returned by ptsname(3), and similar library functions > that have insecure APIs. > > I also have a question on the last sentence: what are the "similar library > functions that have insecure APIs"? It's not clear to me what you are > referring to here. A couple of things to note on the bigger picture. The glibc library on all distributions has been changed to not have a setuid binary pt_chown, that uses ptsname. This was the primary fix for the security issue.
The behavior of opening /dev/ptmx has been changed to perform a path lookup of ./pts/ptmx relative to the location of /dev/ptmx, and to open it if it is a devpts filesystem and to fail otherwise. This further makes it hard to confuse userspace this way, as /dev/ptmx always corresponds to /dev/pts/ptmx, even in chroots and in other mount namespaces. Both of these changes largely make glibc's use of these features secure: /dev/ptmx always corresponds to /dev/pts, and there are no readily available suid root applications to fool. That makes TIOCGPTPEER a very nice addition, but not something people have to scramble to use to ensure their system is secure, as a hostile environment now has to work very hard to confuse the existing mechanisms. >> This is an ioctl(2) recently added by myself, to allow for container >> runtimes and other programs that interact with (potentially hostile) >> Linux namespaces to safely create {master,slave} pseudoterminal pairs >> without needing to open potentially unsafe /dev/pts/... filenames that >> may be malicious mountpoints or similar in an untrusted namespace >> (avoiding the endless issues with ptsname(3) and similar approaches). >> >> Cc: >> Signed-off-by: Aleksa Sarai Eric From asarai at suse.de Wed Aug 16 16:54:03 2017 From: asarai at suse.de (Aleksa Sarai) Date: Thu, 17 Aug 2017 02:54:03 +1000 Subject: [PATCH] ioctl_tty.2: add TIOCGPTPEER documentation In-Reply-To: <878tijwjic.fsf@xmission.com> References: <20170609170147.32311-1-asarai@suse.de> <11706e49-8271-ed8c-3747-19b3e8f2850d@gmail.com> <878tijwjic.fsf@xmission.com> Message-ID: > A couple of things to note on the bigger picture. > > The glibc library on all distributions has been changed to not have a > setuid binary pt_chown, that uses ptsname. This was the primary fix > for the security issue.
> > The behavior of opening /dev/ptmx has been changed to perform a path > lookup relative to the location of /dev/ptmx of ./pts/ptmx and open > it it is a devpts filesystem and to fail otherwise. This further > makes it hard to confuse userspace this way as /dev/ptmx always > corresponds to /dev/pts/ptmx. Even in chroots and in other mount > namespaces. I have a feeling that there might be a way to trick glibc if you use FUSE, but I haven't actually tried to create a PoC for it. Fair point though. > That makes TIOCGPTPEER a very nice addition, but not something people > have to scramble to use to ensure their system is secure. As a hostile > environment now has to work very hard to confuse the existing mechanisms. There are use cases where you simply need TIOCGPTPEER, and no other userspace alternative will do, but maybe if we modified the paragraph to read (as suggested): Security-conscious programs interacting with namespaces may wish to use this operation rather than open(2) with the pathname returned by ptsname(3). This would clarify that there are use cases where you need this particular feature, without causing people to panic over inaccurate claims of glibc being broken. Does that sound better? -- Aleksa Sarai Software Engineer (Containers) SUSE Linux GmbH https://www.cyphar.com/ From ebiederm at xmission.com Wed Aug 16 17:14:37 2017 From: ebiederm at xmission.com (Eric W. Biederman) Date: Wed, 16 Aug 2017 12:14:37 -0500 Subject: [PATCH] ioctl_tty.2: add TIOCGPTPEER documentation In-Reply-To: (Aleksa Sarai's message of "Thu, 17 Aug 2017 02:54:03 +1000") References: <20170609170147.32311-1-asarai@suse.de> <11706e49-8271-ed8c-3747-19b3e8f2850d@gmail.com> <878tijwjic.fsf@xmission.com> Message-ID: <87ziaztoxu.fsf@xmission.com> Aleksa Sarai writes: >> A couple of things to note on the bigger picture. >> >> The glibc library on all distributions has been changed to not have a >> setuid binary pt_chown, that uses ptsname.
This was the primary fix >> for the security issue. >> >> The behavior of opening /dev/ptmx has been changed to perform a path >> lookup relative to the location of /dev/ptmx of ./pts/ptmx and open >> it it is a devpts filesystem and to fail otherwise. This further >> makes it hard to confuse userspace this way as /dev/ptmx always >> corresponds to /dev/pts/ptmx. Even in chroots and in other mount >> namespaces. > > I have a feeling that there might be a way to trick glibc if you use > FUSE, but I haven't actually tried to create a PoC for it. Fair point > though. To trick glibc, FUSE would have to be mounted somewhere on /dev. >> That makes TIOCGPTPEER a very nice addition, but not something people >> have to scramble to use to ensure their system is secure. As a hostile >> environment now has to work very hard to confuse the existing mechanisms. > > There are usecases where you simply need TIOCGPTPEER, and no other > userspace alternative will do, but maybe if we modified the paragraph > to read (as suggested): > > Security-conscious programs interacting with namespaces may > wish to use this operation rather than open(2) with the > pathname returned by ptsname(3). > > This would clarify that there are usecases where you need this > particular feature, without saying causing people to panic over > inaccurate claims of glibc being broken. Does that sound better? I think your original words sounded fine. I would even go for saying that new programs may want to use the new ioctl, as it is fundamentally less racy and closer to what the userspace pieces are actually trying to implement. I just wanted to point out that TIOCGPTPEER, while being the interface that it would have been nice to have had since the beginning (and that would have avoided all of the problems), is actually not something we need to scramble to use; it is just a very nice-to-have, as the immediate issues have been fixed in other ways.
It was not clear to me from the other discussions if you and Michael Kerrisk were aware of the mitigations that had been made to address the security issue. The change to the behavior of /dev/ptmx may need to be documented somewhere. I am not certain if anything has been documented since devpts has started allowing multiple mounts. Eric From paul at paul-moore.com Wed Aug 16 22:21:49 2017 From: paul at paul-moore.com (Paul Moore) Date: Wed, 16 Aug 2017 18:21:49 -0400 Subject: [PATCH 2/9] Implement containers as kernel objects In-Reply-To: <20170814054711.GB29957@madcap2.tricolour.ca> References: <149547014649.10599.12025037906646164347.stgit@warthog.procyon.org.uk> <149547016213.10599.1969443294414531853.stgit@warthog.procyon.org.uk> <20170814054711.GB29957@madcap2.tricolour.ca> Message-ID: On Mon, Aug 14, 2017 at 1:47 AM, Richard Guy Briggs wrote: > Hi David, > > I wanted to respond to this thread to attempt some constructive feedback, > better late than never. I had a look at your fsopen/fsmount() patchset(s) to > support this patchset which was interesting, but doesn't directly affect my > work. The primary patch of interest to the audit kernel folks (Paul Moore and > me) is this patch while the rest of the patchset is interesting, but not likely > to directly affect us. This patch has most of what we need to solve our > problem. > > Paul and I agree that audit is going to have a difficult time identifying > containers or even namespaces without some change to the kernel. The audit > subsystem in the kernel needs at least a basic clue about which container > caused an event to be able to report this at the appropriate level and ignore > it at other levels to avoid a DoS. While there is some increased risk of "death by audit", this is really only an issue once we start supporting multiple audit daemons; simply associating auditable events with the container that triggered them shouldn't add any additional overhead (I hope). 
For a number of use cases, a single auditd running outside the containers, but recording all their events with some type of container attribution will be sufficient. This is step #1. However, we will obviously want to go a bit further and support multiple audit daemons on the system to allow containers to record/process their own events (side note: the non-container auditd instance will still see all the events). There are a number of ways we could tackle this, both via in-kernel and in-userspace record routing, each with their own pros/cons. However, how this works is going to be dependent on how we identify containers and track their audit events: the bits from step #1. For this reason I'm not really interested in worrying about the multiple auditd problem just yet; it's obviously important, and something to keep in mind while working up a solution, but it isn't something we should focus on right now. > We also agree that there will need to be some sort of trigger from userspace to > indicate the creation of a container and its allocated resources and we're not > really picky how that is done, such as a clone flag, a syscall or a sysfs write > (or even a read, I suppose), but there will need to be some permission > restrictions, obviously. (I'd like to see capabilities used for this by adding > a specific container bit to the capabilities bitmask.) To be clear, from an audit perspective I think the only thing we would really care about controlling access to is the creation and assignment of a new audit container ID/token, not necessarily the container itself. It's a small point, but an important one I think. > I doubt we will be able to accomodate all definitions or concepts of a > container in a timely fashion. We'll need to start somewhere with a minimum > definition so that we can get traction and actually move forward before another > compelling shared kernel microservice method leaves our entire community > behind. 
I'd like to declare that a container is a full set of cloned > namespaces, but this is inefficient, overly constricting and unnecessary for > our needs. If we could agree on a minimum definition of a container (which may > have only one specific cloned namespace) then we have something on which to > build. I could even see a container being defined by a trigger sent from > userspace about a process (task) from which all its children are considered to > be within that container, subject to further nesting. I really would prefer if we could avoid defining the term "container". Even if we manage to get it right at this particular moment, we will surely be made fools a year or two from now when things change. At the very least let's avoid a rigid definition of container. I'll concede that we will probably need to have some definition simply so we can implement something; I just don't want the design or implementation to depend on a particular definition. This comment is jumping ahead a bit, but from an audit perspective I think we handle this by emitting an audit record whenever a container ID is created which describes it as the kernel sees it; as of now that probably means a list of namespace IDs. Richard mentions this in his email; I just wanted to make it clear that I think we should see this as a flexible mechanism. At the very least we will likely see a few more namespaces before the world moves on from containers. > In the simplest usable model for audit, if a container (definition implies and) > starts a PID namespace, then the container ID could simply be the container's > "init" process PID in the initial PID namespace. This assumes that as soon as > that process vanishes, that entire container and all its children are killed > off (which you've done). There may be some container orchestration systems > that don't use a unique PID namespace per container and that imposing this will > cause them challenges.
I don't follow how this would cause challenges if the containers do not use a unique PID namespace; you are suggesting using the PID from in the context of the initial PID namespace, yes? Regardless, I do worry that using a PID could potentially be a bit racy once we start jumping between kernel and userspace (audit configuration, logs, etc.). > If containers have at minimum a unique mount namespace then the root path > dentry inode device and inode number could be used, but there are likely better > identifiers. Again, there may be container orchestrators that don't use a > unique mount namespace per container and that imposing this will cause > challenges. > > I expect there are similar examples for each of the other namespaces. The PID case is a bit unique as each process is going to have a unique PID regardless of namespaces, but even that has some drawbacks as discussed above. As for the other namespaces, I agree that we can't rely on them (see my earlier comments). > If we could pick one namespace type for consensus for which each container has > a unique instance of that namespace, we could use the dev/ino tuple from that > namespace as had originally been suggested by Aristeu Rozanski more than 4 > years ago as part of the set of namespace IDs. I had also attempted to > solve this problem by using the namespace' proc inode, then switched over to > generate a unique kernel serial number for each namespace and then went back to > namespace proc dev/ino once Al Viro implemented nsfs: > v1 https://lkml.org/lkml/2014/4/22/662 > v2 https://lkml.org/lkml/2014/5/9/637 > v3 https://lkml.org/lkml/2014/5/20/287 > v4 https://lkml.org/lkml/2014/8/20/844 > v5 https://lkml.org/lkml/2014/10/6/25 > v6 https://lkml.org/lkml/2015/4/17/48 > v7 https://lkml.org/lkml/2015/5/12/773 > > These patches don't use a container ID, but track all namespaces in use for an > event. 
This has the benefit of punting this tracking to userspace for some > other tool to analyse and determine to which container an event belongs. > This will use a lot of bandwidth in audit log files when a single > container ID that doesn't require nesting information to be complete > would be a much more efficient use of audit log bandwidth. Relying on a particular namespace to identify a container is a non-starter from my perspective for all the reasons previously discussed. > If we rely only on the setting of arbitrary container names from userspace, > then we must provide a map or tree back to the initial audit domain for that > running kernel to be able to differentiate between potentially identical > container names assigned in a nested container system. If we assign a > container serial number sequentially (atomic64_inc) from the kernel on request > from userspace like the sessionID and log the creation with all nsIDs and the > parent container serial number and/or container name, the nesting is clear due > to lack of ambiguity in potential duplicate names in nesting. If a container > serial number is used, the tree of inheritance of nested containers can be > rebuilt from the audit records showing what containers were spawned from what > parent. I believe we are going to need a container ID to container definition (namespace, etc.) mapping mechanism regardless of whether the container ID is provided by userspace or is a kernel generated serial number. This mapping should be recorded in the audit log when the container ID is created/defined. > As was suggested in one of the previous threads, if there are any events not > associated with a task (incoming network packets) we log the namespace ID and > then only concern ourselves with its container serial number or container name > once it becomes associated with a task at which point that tracking will be > more important anyways. Agreed. After all, a single namespace can be shared between multiple containers.
For those security officers who need to track individual events like this, they will have the container ID mapping information in the logs as well, so they should be able to trace the unassociated event to a set of containers.

> I'm not convinced that a userspace or kernel generated UUID is that useful since they are large, not human readable and may not be globally unique given the "pets vs cattle" direction we are going with potentially identical conditions in hosts or containers spawning containers, but I see no need to restrict them.

From a kernel perspective I think an int should suffice; after all, you can't have more containers than you have processes. If the container engine requires something more complex, it can use the int as input to its own mapping function.

> How do we deal with setns()? Once it is determined that action is permitted, given the new combination of namespaces and potential membership in a different container, record the transition from one container to another including all namespaces if the latter are a different subset than the target container initial set.

That is a fun one, isn't it? I think this is where the container ID-to-definition mapping comes into play. If setns() changes the process such that the existing container ID is no longer valid then we need to do a new lookup in the table to see if another container ID is valid; if no established container ID mappings are valid, the container ID becomes "undefined".
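
As an illustration only, the ID-to-definition lookup discussed above could be sketched in userspace C roughly as follows. Every name here (`contid_define`, `contid_after_setns`, `struct container_def`, `NS_TYPES`, `CONTID_UNDEFINED`) is hypothetical and does not exist in the kernel. The sketch also assumes that a container ID is "valid" for a task when the task's namespace set matches the recorded definition exactly, and that the first matching definition wins; the thread does not settle either point.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names below are hypothetical; none of this is an existing kernel API. */
#define NS_TYPES         7   /* assumed: mnt, pid, net, ipc, uts, user, cgroup */
#define MAX_CONTAINERS   64
#define CONTID_UNDEFINED 0   /* serials start at 1, so 0 can mean "undefined" */

struct container_def {
	uint64_t id;            /* sequentially assigned serial number */
	uint64_t parent;        /* serial of the parent container, 0 if none */
	uint64_t ns[NS_TYPES];  /* namespace IDs, e.g. nsfs inode numbers */
};

/* The table itself is not protected here; real code would need locking. */
static struct container_def table[MAX_CONTAINERS];
static size_t table_len;

/* Stand-in for a kernel atomic64_t counter. */
static _Atomic uint64_t next_serial = 1;

/* Record a container definition and return its new serial number,
 * or CONTID_UNDEFINED if the table is full. */
uint64_t contid_define(uint64_t parent, const uint64_t ns[NS_TYPES])
{
	struct container_def *def;

	assert(ns != NULL);
	if (table_len == MAX_CONTAINERS)
		return CONTID_UNDEFINED;

	def = &table[table_len++];
	def->id = atomic_fetch_add(&next_serial, 1);
	def->parent = parent;
	memcpy(def->ns, ns, sizeof(def->ns));
	return def->id;
}

/* After a setns(): keep the current ID if its definition still matches the
 * task's namespaces, otherwise return the first other matching ID, or
 * CONTID_UNDEFINED if no recorded definition matches. */
uint64_t contid_after_setns(uint64_t current, const uint64_t task_ns[NS_TYPES])
{
	uint64_t match = CONTID_UNDEFINED;
	size_t i;

	for (i = 0; i < table_len; i++) {
		if (memcmp(table[i].ns, task_ns, sizeof(table[i].ns)) != 0)
			continue;
		if (table[i].id == current)
			return current;
		if (match == CONTID_UNDEFINED)
			match = table[i].id;
	}
	return match;
}
```

A caller would invoke `contid_define()` when userspace requests a new container ID (logging the returned serial, the parent serial and the namespace IDs in one audit record), and `contid_after_setns()` once a setns() has been permitted. Because the serial comes from a monotonically increasing counter, an ID is never reused within this sketch.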
--
paul moore
www.paul-moore.com

From rgb at redhat.com Fri Aug 18 08:03:00 2017
From: rgb at redhat.com (Richard Guy Briggs)
Date: Fri, 18 Aug 2017 04:03:00 -0400
Subject: [PATCH 2/9] Implement containers as kernel objects
In-Reply-To:
References: <149547014649.10599.12025037906646164347.stgit@warthog.procyon.org.uk> <149547016213.10599.1969443294414531853.stgit@warthog.procyon.org.uk> <20170814054711.GB29957@madcap2.tricolour.ca>
Message-ID: <20170818080300.GQ7187@madcap2.tricolour.ca>

On 2017-08-16 18:21, Paul Moore wrote:
> On Mon, Aug 14, 2017 at 1:47 AM, Richard Guy Briggs wrote:
> > Hi David,
> >
> > I wanted to respond to this thread to attempt some constructive feedback, better late than never. I had a look at your fsopen/fsmount() patchset(s) to support this patchset which was interesting, but doesn't directly affect my work. The primary patch of interest to the audit kernel folks (Paul Moore and me) is this patch while the rest of the patchset is interesting, but not likely to directly affect us. This patch has most of what we need to solve our problem.
> >
> > Paul and I agree that audit is going to have a difficult time identifying containers or even namespaces without some change to the kernel. The audit subsystem in the kernel needs at least a basic clue about which container caused an event to be able to report this at the appropriate level and ignore it at other levels to avoid a DoS.
>
> While there is some increased risk of "death by audit", this is really only an issue once we start supporting multiple audit daemons; simply associating auditable events with the container that triggered them shouldn't add any additional overhead (I hope).
> For a number of use cases, a single auditd running outside the containers, but recording all their events with some type of container attribution will be sufficient. This is step #1.
>
> However, we will obviously want to go a bit further and support multiple audit daemons on the system to allow containers to record/process their own events (side note: the non-container auditd instance will still see all the events). There are a number of ways we could tackle this, both via in-kernel and in-userspace record routing, each with their own pros/cons. However, how this works is going to be dependent on how we identify containers and track their audit events: the bits from step #1. For this reason I'm not really interested in worrying about the multiple auditd problem just yet; it's obviously important, and something to keep in mind while working up a solution, but it isn't something we should focus on right now.
>
> > We also agree that there will need to be some sort of trigger from userspace to indicate the creation of a container and its allocated resources and we're not really picky how that is done, such as a clone flag, a syscall or a sysfs write (or even a read, I suppose), but there will need to be some permission restrictions, obviously. (I'd like to see capabilities used for this by adding a specific container bit to the capabilities bitmask.)
>
> To be clear, from an audit perspective I think the only thing we would really care about controlling access to is the creation and assignment of a new audit container ID/token, not necessarily the container itself. It's a small point, but an important one I think.
>
> > I doubt we will be able to accommodate all definitions or concepts of a container in a timely fashion.
> > We'll need to start somewhere with a minimum definition so that we can get traction and actually move forward before another compelling shared kernel microservice method leaves our entire community behind. I'd like to declare that a container is a full set of cloned namespaces, but this is inefficient, overly constricting and unnecessary for our needs. If we could agree on a minimum definition of a container (which may have only one specific cloned namespace) then we have something on which to build. I could even see a container being defined by a trigger sent from userspace about a process (task) from which all its children are considered to be within that container, subject to further nesting.
>
> I really would prefer if we could avoid defining the term "container". Even if we manage to get it right at this particular moment, we will surely be made fools a year or two from now when things change. At the very least let's avoid a rigid definition of container; I'll concede that we will probably need to have some definition simply so we can implement something, I just don't want the design or implementation to depend on a particular definition.
>
> This comment is jumping ahead a bit, but from an audit perspective I think we handle this by emitting an audit record whenever a container ID is created which describes it as the kernel sees it; as of now that probably means a list of namespace IDs. Richard mentions this in his email, I just wanted to make it clear that I think we should see this as a flexible mechanism. At the very least we will likely see a few more namespaces before the world moves on from containers.
>
> > In the simplest usable model for audit, if a container (definition implies and) starts a PID namespace, then the container ID could simply be the container's "init" process PID in the initial PID namespace.
> > This assumes that as soon as that process vanishes, that entire container and all its children are killed off (which you've done). There may be some container orchestration systems that don't use a unique PID namespace per container, and imposing this will cause them challenges.
>
> I don't follow how this would cause challenges if the containers do not use a unique PID namespace; you are suggesting using the PID from the context of the initial PID namespace, yes?

The PID of the "init" process of a container (PID=1 inside container, but PID=containerID from the initial PID namespace perspective).

> Regardless, I do worry that using a PID could potentially be a bit racy once we start jumping between kernel and userspace (audit configuration, logs, etc.).

How do you think this could be racy? An event happening before or as the container has been defined?

> > If containers have at minimum a unique mount namespace then the root path dentry inode device and inode number could be used, but there are likely better identifiers. Again, there may be container orchestrators that don't use a unique mount namespace per container, and imposing this will cause challenges.
> >
> > I expect there are similar examples for each of the other namespaces.
>
> The PID case is a bit unique as each process is going to have a unique PID regardless of namespaces, but even that has some drawbacks as discussed above. As for the other namespaces, I agree that we can't rely on them (see my earlier comments).

(In general can you specify which earlier comments so we can be sure to what you are referring?)

> > If we could pick one namespace type for consensus for which each container has a unique instance of that namespace, we could use the dev/ino tuple from that namespace as had originally been suggested by Aristeu Rozanski more than 4 years ago as part of the set of namespace IDs.
> > I had also attempted to solve this problem by using the namespace's proc inode, then switched over to generating a unique kernel serial number for each namespace and then went back to namespace proc dev/ino once Al Viro implemented nsfs:
> >   v1 https://lkml.org/lkml/2014/4/22/662
> >   v2 https://lkml.org/lkml/2014/5/9/637
> >   v3 https://lkml.org/lkml/2014/5/20/287
> >   v4 https://lkml.org/lkml/2014/8/20/844
> >   v5 https://lkml.org/lkml/2014/10/6/25
> >   v6 https://lkml.org/lkml/2015/4/17/48
> >   v7 https://lkml.org/lkml/2015/5/12/773
> >
> > These patches don't use a container ID, but track all namespaces in use for an event. This has the benefit of punting this tracking to userspace for some other tool to analyse and determine to which container an event belongs. This will use a lot of bandwidth in audit log files when a single container ID that doesn't require nesting information to be complete would be a much more efficient use of audit log bandwidth.
>
> Relying on a particular namespace to identify a container is a non-starter from my perspective for all the reasons previously discussed.

I'd rather not either and suspect there isn't much danger of it, but if it is determined that there is one namespace in particular that is a minimum requirement, I'd prefer to use that nsID instead of creating an additional ID.

> > If we rely only on the setting of arbitrary container names from userspace, then we must provide a map or tree back to the initial audit domain for that running kernel to be able to differentiate between potentially identical container names assigned in a nested container system. If we assign a container serial number sequentially (atomic64_inc) from the kernel on request from userspace like the sessionID and log the creation with all nsIDs and the parent container serial number and/or container name, the nesting is clear due to lack of ambiguity in potential duplicate names in nesting.
> > If a container serial number is used, the tree of inheritance of nested containers can be rebuilt from the audit records showing what containers were spawned from what parent.
>
> I believe we are going to need a container ID to container definition (namespace, etc.) mapping mechanism regardless of whether the container ID is provided by userspace or is a kernel-generated serial number. This mapping should be recorded in the audit log when the container ID is created/defined.

Agreed.

> > As was suggested in one of the previous threads, if there are any events not associated with a task (incoming network packets) we log the namespace ID and then only concern ourselves with its container serial number or container name once it becomes associated with a task at which point that tracking will be more important anyways.
>
> Agreed. After all, a single namespace can be shared between multiple containers. For those security officers who need to track individual events like this, they will have the container ID mapping information in the logs as well, so they should be able to trace the unassociated event to a set of containers.
>
> > I'm not convinced that a userspace or kernel generated UUID is that useful since they are large, not human readable and may not be globally unique given the "pets vs cattle" direction we are going with potentially identical conditions in hosts or containers spawning containers, but I see no need to restrict them.
>
> From a kernel perspective I think an int should suffice; after all, you can't have more containers than you have processes. If the container engine requires something more complex, it can use the int as input to its own mapping function.

PIDs roll over. That already causes some ambiguity in reporting.
If a system is constantly spawning and reaping containers, especially single-process containers, I don't want to have to worry about that ID rolling over to keep track of it, even though there should be audit records of the spawn and death of each container. There isn't significant cost added here compared with some of the other overhead we're dealing with.

> > How do we deal with setns()? Once it is determined that action is permitted, given the new combination of namespaces and potential membership in a different container, record the transition from one container to another including all namespaces if the latter are a different subset than the target container initial set.
>
> That is a fun one, isn't it? I think this is where the container ID-to-definition mapping comes into play. If setns() changes the process such that the existing container ID is no longer valid then we need to do a new lookup in the table to see if another container ID is valid; if no established container ID mappings are valid, the container ID becomes "undefined".

Hopefully we can design this stuff so that container IDs are still valid while that transition occurs.

> paul moore

- RGB

--
Richard Guy Briggs
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635