[PATCH 09/10] Enable multiple instances of devpts

Serge E. Hallyn serue at us.ibm.com
Mon Sep 29 06:01:31 PDT 2008


Quoting sukadev at linux.vnet.ibm.com (sukadev at linux.vnet.ibm.com):
> Thanks for detailed review Serge.
> 
> Sorry was unexpectedly off most of this week. Have some responses below
> will review/address other comments over the next few days.
> 
> 
> Serge E. Hallyn [serue at us.ibm.com] wrote:
> | Quoting sukadev at us.ibm.com (sukadev at us.ibm.com):
> | > 
> | > >From 4567a37856205a04cc0617e3fcc8ede36b25bcf5 Mon Sep 17 00:00:00 2001
> | > From: Sukadev Bhattiprolu <sukadev at us.ibm.com>
> | > Date: Tue, 9 Sep 2008 18:52:56 -0700
> | > Subject: [PATCH 09/10] Enable multiple instances of devpts
> | > 
> | > To support containers, allow multiple instances of devpts filesystem, such
> | > that indices of ptys allocated in one instance are independent of ptys
> | > allocated in other instances of devpts.
> | > 
> | > But to preserve backward compatibility, enable this support for multiple
> | > instances only if:
> | > 
> | > 	- CONFIG_DEVPTS_MULTIPLE_INSTANCES is set to Y, and
> | > 	- '-o newinstance' mount option is specified while mounting devpts
> | > 
> | > See Documentation/fs/devpts.txt (next patch in series) for details.
> | > 
> | > To use multi-instance mount, a container startup script could:
> | > 
> | > 	$ ns_exec -cm /bin/bash
> | > 	$ umount /dev/pts
> | > 	$ mount -t devpts -o newinstance lxcpts /dev/pts
> | > 	$ mount -o bind /dev/pts/ptmx /dev/ptmx
> | > 	$ sshd -p 1234
> | > 
> | > where 'ns_exec -cm /bin/bash' is calls clone() with CLONE_NEWNS flag and execs
> | > /bin/bash in the child process. A pty created by the sshd is not visible in
> | > the original mount of /dev/pts.
> | > 
> | > USER-SPACE-IMPACT:
> | > 
> | > 	See Documentation/fs/devpts.txt (included in next patch) for user-space
> | > 	impact in multi-instance and mixed-mode operation.
> | > TODO:
> | > 	- Update mount(8), pts(4) man pages. Highlight impact of not
> | > 	  redirecting /dev/ptmx to /dev/pts/ptmx after a multi-instance mount.
> | > 
> | > Implementation note:
> | > 
> | > 	See comments in new get_sb_ref() function in fs/super.c on why
> | > 	get_sb_single() cannot be directly used.
> | > 
> | > Changelog[v5]:
> | > 	- Move get_sb_ref() definition to earlier patch
> | > 
> | > 	- Move usage info to Documentation/filesystems/devpts.txt (next patch)
> | > 
> | > 	- Make ptmx node even in init_pts_ns, now that default mode is 0000
> | > 	  (defined in earlier patch, enabled here).
> | > 
> | > 	- Cache ptmx dentry and use to update mode during remount
> | > 	  (defined in earlier patch, enabled here).
> | > 
> | > 	- Bugfix: explicitly ignore newinstance on remount (if newinstance was
> | > 	  specified on remount of initial mount, it would be ignored but
> | > 	  /proc/mounts would imply that the option was set)
> | > 
> | > Changelog[v4]:
> | > 
> | > 	- Update patch description to address H. Peter Anvin's comments
> | > 
> | > 	- Consolidate multi-instance mode code under new config token,
> | > 	  CONFIG_DEVPTS_MULTIPLE_INSTANCE.
> | > 
> | > 	- Move usage-details from patch description to
> | > 	  Documentation/fs/devpts.txt
> | > 
> | > Changelog[v3]:
> | > 	- Rename new mount option to 'newinstance'
> | > 
> | > 	- Create ptmx nodes only in 'newinstance' mounts
> | > 
> | > 	- Bugfix: parse_mount_options() modifies @data but since we need to
> | > 	  parse the @data twice (once in devpts_get_sb() and once during
> | > 	  do_remount_sb()), parse a local copy of @data in devpts_get_sb().
> | > 	  (restructured code in devpts_get_sb() to fix this)
> | > 
> | > Changelog[v2]:
> | > 	- Support both single-mount and multiple-mount semantics and
> | > 	  provide '-onewmnt' option to select the semantics.
> | > 
> | > Signed-off-by: Sukadev Bhattiprolu <sukadev at us.ibm.com>
> | > ---
> | >  fs/devpts/inode.c |  168 +++++++++++++++++++++++++++++++++++++++++++++++++++--
> | >  1 files changed, 163 insertions(+), 5 deletions(-)
> | > 
> | > diff --git a/fs/devpts/inode.c b/fs/devpts/inode.c
> | > index 6b56255..c54b010 100644
> | > --- a/fs/devpts/inode.c
> | > +++ b/fs/devpts/inode.c
> | > @@ -48,10 +48,11 @@ struct pts_mount_opts {
> | >  	gid_t   gid;
> | >  	umode_t mode;
> | >  	umode_t ptmxmode;
> | > +	int newinstance;
> | >  };
> | > 
> | >  enum {
> | > -	Opt_uid, Opt_gid, Opt_mode, Opt_ptmxmode,
> | > +	Opt_uid, Opt_gid, Opt_mode, Opt_ptmxmode, Opt_newinstance,
> | >  	Opt_err
> | >  };
> | > 
> | > @@ -61,6 +62,7 @@ static match_table_t tokens = {
> | >  	{Opt_mode, "mode=%o"},
> | >  #ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES
> | >  	{Opt_ptmxmode, "ptmxmode=%o"},
> | > +	{Opt_newinstance, "newinstance"},
> | >  #endif
> | >  	{Opt_err, NULL}
> | >  };
> | > @@ -78,13 +80,15 @@ static inline struct pts_fs_info *DEVPTS_SB(struct super_block *sb)
> | > 
> | >  static inline struct super_block *pts_sb_from_inode(struct inode *inode)
> | >  {
> | > +#ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES
> | >  	if (inode->i_sb->s_magic == DEVPTS_SUPER_MAGIC)
> | >  		return inode->i_sb;
> | > -
> | > +#endif
> | >  	return devpts_mnt->mnt_sb;
> | >  }
> | > 
> | > -static int parse_mount_options(char *data, struct pts_mount_opts *opts)
> | > +static int parse_mount_options(char *data, int remount,
> | > +		struct pts_mount_opts *opts)
> | >  {
> | >  	char *p;
> | > 
> | > @@ -95,6 +99,10 @@ static int parse_mount_options(char *data, struct pts_mount_opts *opts)
> | >  	opts->mode    = DEVPTS_DEFAULT_MODE;
> | >  	opts->ptmxmode = DEVPTS_DEFAULT_PTMX_MODE;
> | 
> | So does this mean that if I do a remount, the mode and ptmxmode will get
> | reset to the defaults unless I specify them again?
> 
> Yes.
> 
> I think I pointed out this behavior earlier but don't think I received any
> comments and stuck to current behavior (which for instance resets the mode
> to default).
> 
> Besides, this behavior may help in the transition - initial kernel mount
> would set ptmx-mode to 0 (default) and admin can update /etc/fstab to set
> ptmx mode meaningfully.
> 
> | 
> | > 
> | > +	/* ignore newinstance on remount to avoid confusing show_options */
> | > +	if (!remount)
> | > +		opts->newinstance = 0;
> | > +
> | >  	while ((p = strsep(&data, ",")) != NULL) {
> | >  		substring_t args[MAX_OPT_ARGS];
> | >  		int token;
> | > @@ -128,6 +136,10 @@ static int parse_mount_options(char *data, struct pts_mount_opts *opts)
> | >  				return -EINVAL;
> | >  			opts->ptmxmode = option & S_IALLUGO;
> | >  			break;
> | > +		case Opt_newinstance:
> | > +			if (!remount)
> | > +				opts->newinstance = 1;
> | > +			break;
> | >  #endif
> | >  		default:
> | >  			printk(KERN_ERR "devpts: called with bogus options\n");
> | > @@ -180,6 +192,8 @@ static int mknod_ptmx(struct super_block *sb)
> | > 
> | >  	d_add(dentry, inode);
> | > 
> | > +	fsi->ptmx_dentry = dentry;
> | > +
> | >  	printk(KERN_DEBUG "Created ptmx node in devpts ino %lu\n",
> | >  			inode->i_ino);
> | > 
> | > @@ -207,7 +221,7 @@ static int devpts_remount(struct super_block *sb, int *flags, char *data)
> | >  	struct pts_fs_info *fsi = DEVPTS_SB(sb);
> | >  	struct pts_mount_opts *opts = &fsi->mount_opts;
> | > 
> | > -	err = parse_mount_options(data, opts);
> | > +	err = parse_mount_options(data, 1, opts);
> | 
> | A guess a #define rather than '1' would be better here.
> 
> Ok.
> 
> | 
> | > 
> | >  	/*
> | >  	 * parse_mount_options() restores options to default values
> | > @@ -232,6 +246,8 @@ static int devpts_show_options(struct seq_file *seq, struct vfsmount *vfs)
> | >  	seq_printf(seq, ",mode=%03o", opts->mode);
> | >  #ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES
> | >  	seq_printf(seq, ",ptmxmode=%03o", opts->ptmxmode);
> | > +	if (opts->newinstance)
> | > +		seq_printf(seq, ",newinstance");
> | 
> | Is actually that something we want to show?  It doesn't seem
> | informative.
> 
> Without this users have no easy way of knowing whether they have a 
> private mount specially if they mounted from command line ?

If they were in a container to begin with, then they still don't know.

Now if you were to keep a unique per-instance id and have show_options
list 'instance=%x', that would be helpful.  Either that or just
dropping the info altogether make sense.  This 'newinstance' listing
is meaningless.

> Does it mislead or confuse ?
> 
> | 
> | >  #endif
> | > 
> | >  	return 0;
> | > @@ -298,12 +314,153 @@ fail:
> | >  	return -ENOMEM;
> | >  }
> | > 
> | > +#ifdef CONFIG_DEVPTS_MULTIPLE_INSTANCES
> | > +/*
> | > + * Safely parse the mount options in @data and update @opts.
> | > + *
> | > + * devpts ends up parsing options two times during mount, due to the
> | > + * two modes of operation it supports. The first parse occurs in
> | > + * devpts_get_sb() when determining the mode (single-instance or
> | > + * multi-instance mode). The second parse happens in devpts_remount()
> | > + * or new_pts_mount() depending on the mode.
> | > + *
> | > + * Parsing of options modifies the @data making subsequent parsing
> | > + * incorrect. So make a local copy of @data and parse it.
> | > + *
> | > + * Return: 0 On success, -errno on error
> | > + */
> | > +static int safe_parse_mount_options(void *data, struct pts_mount_opts *opts)
> | > +{
> | > +	int rc;
> | > +	void *datacp;
> | > +
> | > +	if (!data)
> | > +		return 0;
> | > +
> | > +	/* Use kstrdup() ?  */
> | > +	datacp = kmalloc(PAGE_SIZE, GFP_KERNEL);
> | > +	if (!datacp)
> | > +		return -ENOMEM;
> | > +
> | > +	memcpy(datacp, data, PAGE_SIZE);
> | > +	rc = parse_mount_options((char *)datacp, 0, opts);
> | > +	kfree(datacp);
> | > +
> | > +	return rc;
> | > +}
> | > +
> | > +/*
> | > + * Mount a new (private) instance of devpts.  PTYs created in this
> | > + * instance are independent of the PTYs in other devpts instances.
> | > + */
> | > +static int new_pts_mount(struct file_system_type *fs_type, int flags,
> | > +		void *data, struct vfsmount *mnt)
> | > +{
> | > +	int err;
> | > +	struct pts_fs_info *fsi;
> | > +	struct pts_mount_opts *opts;
> | > +
> | > +	printk(KERN_NOTICE "devpts: newinstance mount\n");
> | > +
> | > +	err = get_sb_nodev(fs_type, flags, data, devpts_fill_super, mnt);
> | > +	if (err)
> | > +		return err;
> | > +
> | > +	fsi = DEVPTS_SB(mnt->mnt_sb);
> | > +	opts = &fsi->mount_opts;
> | > +
> | > +	err = parse_mount_options(data, 0, opts);
> | > +	if (err)
> | > +		goto fail;
> | > +
> | > +	err = mknod_ptmx(mnt->mnt_sb);
> | > +	if (err)
> | > +		goto fail;
> | > +
> | > +	return 0;
> | > +
> | > +fail:
> | > +	dput(mnt->mnt_sb->s_root);
> | > +	deactivate_super(mnt->mnt_sb);
> | > +	return err;
> | > +}
> | > +
> | > +/*
> | > + * Check if 'newinstance' mount option was specified in @data.
> | > + *
> | > + * Return: -errno  	on error (eg: invalid mount options specified)
> | > + * 	 : 1 		if 'newinstance' mount option was specified
> | > + * 	 : 0 		if 'newinstance' mount option was NOT specified
> | > + */
> | > +static int is_new_instance_mount(void *data)
> | > +{
> | > +	int rc;
> | > +	struct pts_mount_opts opts;
> | > +
> | > +	if (!data)
> | > +		return 0;
> | > +
> | > +	rc = safe_parse_mount_options(data, &opts);
> | > +	if (!rc)
> | > +		rc = opts.newinstance;
> | > +
> | > +	return rc;
> | > +}
> | > +
> | > +/*
> | > + * Mount or remount the initial kernel mount of devpts. This type of
> | > + * mount maintains the legacy, single-instance semantics, while the
> | > + * kernel still allows multiple-instances.
> | > + */
> | > +static int init_pts_mount(struct file_system_type *fs_type, int flags,
> | > +		void *data, struct vfsmount *mnt)
> | > +{
> | > +	int err;
> | > +
> | > +	if (!devpts_mnt) {
> | > +		err = get_sb_single(fs_type, flags, data, devpts_fill_super,
> | > +				mnt);
> | > +
> | > +		err = mknod_ptmx(mnt->mnt_sb);
> | > +		if (err) {
> | > +			dput(mnt->mnt_sb->s_root);
> | > +			deactivate_super(mnt->mnt_sb);
> | > +		} else
> | > +			devpts_mnt = mnt;
> | > +
> | > +		return err;
> | 
> | There is no locking here, so in early-userspace two competing processes
> | could both try to set devpts_mnt, right?
> 
> Hmm. I was thinking there would be only one thread calling the
> vfs_kern_mount() in init_devpts_fs.

But what if init happens to (perhaps mistakenly) lead to 2 racing ones?

Sure it's just a small memory leak, but why not just prevent it.

> | 
> | > +	}
> | > +
> | > +	return get_sb_ref(devpts_mnt->mnt_sb, flags, data, mnt);
> | > +}
> | > +
> | >  static int devpts_get_sb(struct file_system_type *fs_type,
> | >  	int flags, const char *dev_name, void *data, struct vfsmount *mnt)
> | >  {
> | > +	int new;
> | > +
> | > +	new = is_new_instance_mount(data);
> | > +	if (new < 0)
> | > +		return new;
> | > +
> | > +	if (new)
> | > +		return new_pts_mount(fs_type, flags, data, mnt);
> | > +
> | > +	return init_pts_mount(fs_type, flags, data, mnt);
> | 
> | Wait a sec - so if a container does
> | 
> | 	mount -t devpts -o newinstance none /dev/pts
> | 	and then later on just does
> | 	mount -t devpts none /dev/pts
> | 
> | it'll get the init_pts_ns, not the one it had created?
> 
> Yes.  Should we treat the latter as remount of the private instance ?
> If so, user could add '-oremount' ?
> 
> The logic seems simple: With newinstance create a private namespace.
> Without newinstance, bind to initial ns.

But if I'm in a container in a new mounts ns and somehow managed 
to umount -l /dev/pts, shouldn't i be able to remount my container's
devpts by just doing 'mount -t devpts devpts /dev/pts'?

> | 
> | Yup, just confirmed, using:
> | 
> | 	ns_exec -cmiup /bin/sh
> | 	   mount -t devpts -o newinstance -n none /dev/pts
> | 	   mount -t devpts -n none /mnt
> | 	   ls /dev/pts
> | 	     ptmx
> | 	   ls /mnt
> | 	     0 ptmx
> | That's weird.
> | > +}
> | > +#else
> | > +/*
> | > + * This supports only the legacy single-instance semantics (no
> | > + * multiple-instance semantics)
> | > + */
> | > +static int devpts_get_sb(struct file_system_type *fs_type, int flags,
> | > +		const char *dev_name, void *data, struct vfsmount *mnt)
> | > +{
> | >  	return get_sb_single(fs_type, flags, data, devpts_fill_super, mnt);
> | >  }
> | > 
> | > +#endif
> | > +
> | >  static void devpts_kill_sb(struct super_block *sb)
> | >  {
> | >  	struct pts_fs_info *fsi = DEVPTS_SB(sb);
> | > @@ -431,8 +588,9 @@ void devpts_pty_kill(struct tty_struct *tty)
> | >  	if (dentry && !IS_ERR(dentry)) {
> | >  		inode->i_nlink--;
> | >  		d_delete(dentry);
> | > -		dput(dentry);
> | > +		dput(dentry);		// d_lookup in devpts_pty_new
> | >  	}
> | > +	dput(dentry);			// d_find_alias above
> | > 
> | >  	mutex_unlock(&root->d_inode->i_mutex);
> | >  }
> | > -- 
> | > 1.5.2.5


More information about the Containers mailing list