[Linux-kernel-mentees] [PATCH] cred: Use RCU primitives to access RCU pointers

Joel Fernandes joel at joelfernandes.org
Thu Feb 6 01:32:51 UTC 2020


On Wed, Jan 29, 2020 at 03:14:56PM +0100, Jann Horn wrote:
> On Wed, Jan 29, 2020 at 7:57 AM Amol Grover <frextrite at gmail.com> wrote:
> > On Tue, Jan 28, 2020 at 08:09:17PM +0100, Jann Horn wrote:
> > > On Tue, Jan 28, 2020 at 6:04 PM Amol Grover <frextrite at gmail.com> wrote:
> > > > On Tue, Jan 28, 2020 at 10:30:19AM +0100, Jann Horn wrote:
> > > > > On Tue, Jan 28, 2020 at 8:28 AM Amol Grover <frextrite at gmail.com> wrote:
> > > > > > task_struct.cred and task_struct.real_cred are annotated by __rcu,
> > > > >
> > > > > task_struct.cred doesn't actually have RCU semantics though, see
> > > > > commit d7852fbd0f0423937fa287a598bfde188bb68c22. For task_struct.cred,
> > > > > it would probably be more correct to remove the __rcu annotation?
> > > >
> > > > Hi Jann,
> > > >
> > > > I went through the commit you mentioned. If I understand it correctly,
> > > > ->cred was not being accessed concurrently (via RCU), hence, a non_rcu
> > > > flag was introduced, which determined if the clean-up should wait for
> > > > RCU grace-periods or not. And since, the changes were 'thread local'
> > > > there was no need to wait for an entire RCU GP to elapse.
> > >
> > > Yeah.
> > >
> > > > The commit too, as you said, mentions the removal of __rcu annotation.
> > > > However, simply removing the annotation won't work, as there are quite a
> > > > few instances where RCU primitives are used. Even get_current_cred()
> > > > uses RCU APIs to get a reference to ->cred.
> > >
> > > Luckily, there aren't too many places that directly access ->cred,
> > > since luckily there are helper functions like get_current_cred() that
> > > will do it for you. Grepping through the kernel, I see:
> [...]
> > > So actually, the number of places that already don't use RCU accessors
> > > is much higher than the number of places that use them.
> > >
> > > > So, currently, maybe we
> > > > should continue to use RCU APIs and leave the __rcu annotation in?
> > > > (Until someone who takes it on himself to remove __rcu annotation and
> > > > fix all the instances). Does that sound good? Or do you want me to
> > > > remove __rcu annotation and get the process started?
> > >
> > > I don't think it's a good idea to add more uses of RCU APIs for
> > > ->cred; you shouldn't "fix" warnings by making the code more wrong.
> > >
> > > If you want to fix this, I think it would be relatively easy to fix
> > > this properly - as far as I can tell, there are only seven places that
> > > you'll have to change, although you may have to split it up into three
> > > patches.
> >
> > Thank you for the detailed analysis. I'll try my best and send you a
> > patch.

Amol, Jann, if I understand the discussion correctly, objects ->cred
point (the subjective creds) are never (or never need to be) RCU-managed.
This makes sense in light of the commit Jann pointed out
(d7852fbd0f0423937fa287a598bfde188bb68c22).

How about the following diff as a starting point?

1. Remove all ->cred accessing happening through RCU primitive.
2. Remove __rcu from task_struct ->cred
3. Also I removed the whole non_rcu flag, and introduced a new put_cred_non_rcu() API
   which places that task-synchronously use ->cred can overwrite. Callers
   doing such accesses like access() can use this API instead.

I have only build tested the below diff and it is likely buggy but Amol you
can use it as a starting point, or we can discuss more on this thread.

---8<-----------------------

diff --git a/fs/open.c b/fs/open.c
index 8cdb2b6758678..bbdd1f352e4e3 100644
--- a/fs/open.c
+++ b/fs/open.c
@@ -374,25 +374,6 @@ long do_faccessat(int dfd, const char __user *filename, int mode)
 				override_cred->cap_permitted;
 	}
 
-	/*
-	 * The new set of credentials can *only* be used in
-	 * task-synchronous circumstances, and does not need
-	 * RCU freeing, unless somebody then takes a separate
-	 * reference to it.
-	 *
-	 * NOTE! This is _only_ true because this credential
-	 * is used purely for override_creds() that installs
-	 * it as the subjective cred. Other threads will be
-	 * accessing ->real_cred, not the subjective cred.
-	 *
-	 * If somebody _does_ make a copy of this (using the
-	 * 'get_current_cred()' function), that will clear the
-	 * non_rcu field, because now that other user may be
-	 * expecting RCU freeing. But normal thread-synchronous
-	 * cred accesses will keep things non-RCY.
-	 */
-	override_cred->non_rcu = 1;
-
 	old_cred = override_creds(override_cred);
 retry:
 	res = user_path_at(dfd, filename, lookup_flags, &path);
@@ -436,7 +417,13 @@ long do_faccessat(int dfd, const char __user *filename, int mode)
 	}
 out:
 	revert_creds(old_cred);
-	put_cred(override_cred);
+
+	/*
+	 * override_cred points to current task's ->cred and will not be used
+	 * by anyone but the current task. Since we are done using it, remove it
+	 * without waiting for a grace period.
+	 */
+	put_cred_non_rcu(override_cred);
 	return res;
 }
 
diff --git a/include/linux/cred.h b/include/linux/cred.h
index 18639c069263f..2f41a0fa75741 100644
--- a/include/linux/cred.h
+++ b/include/linux/cred.h
@@ -145,14 +145,10 @@ struct cred {
 	struct user_struct *user;	/* real user ID subscription */
 	struct user_namespace *user_ns; /* user_ns the caps and keyrings are relative to. */
 	struct group_info *group_info;	/* supplementary groups for euid/fsgid */
-	/* RCU deletion */
-	union {
-		int non_rcu;			/* Can we skip RCU deletion? */
-		struct rcu_head	rcu;		/* RCU deletion hook */
-	};
+	struct rcu_head	rcu;		/* RCU deletion hook if needed*/
 } __randomize_layout;
 
-extern void __put_cred(struct cred *);
+extern void __put_cred(struct cred *, bool rcu);
 extern void exit_creds(struct task_struct *);
 extern int copy_creds(struct task_struct *, unsigned long);
 extern const struct cred *get_task_cred(struct task_struct *);
@@ -250,7 +246,6 @@ static inline const struct cred *get_cred(const struct cred *cred)
 	if (!cred)
 		return cred;
 	validate_creds(cred);
-	nonconst_cred->non_rcu = 0;
 	return get_new_cred(nonconst_cred);
 }
 
@@ -262,7 +257,6 @@ static inline const struct cred *get_cred_rcu(const struct cred *cred)
 	if (!atomic_inc_not_zero(&nonconst_cred->usage))
 		return NULL;
 	validate_creds(cred);
-	nonconst_cred->non_rcu = 0;
 	return cred;
 }
 
@@ -270,8 +264,9 @@ static inline const struct cred *get_cred_rcu(const struct cred *cred)
  * put_cred - Release a reference to a set of credentials
  * @cred: The credentials to release
  *
- * Release a reference to a set of credentials, deleting them when the last ref
- * is released.  If %NULL is passed, nothing is done.
+ * Release a reference to a set of credentials, deleting them after an RCU
+ * grace period when the last ref is released.
+ * If %NULL is passed, nothing is done.
  *
  * This takes a const pointer to a set of credentials because the credentials
  * on task_struct are attached by const pointers to prevent accidental
@@ -284,18 +279,35 @@ static inline void put_cred(const struct cred *_cred)
 	if (cred) {
 		validate_creds(cred);
 		if (atomic_dec_and_test(&(cred)->usage))
-			__put_cred(cred);
+			__put_cred(cred, true);
+	}
+}
+
+/**
+ * put_cred - Release a reference to a set of credentials
+ * @cred: The credentials to release
+ *
+ * Same as put_cred() except that the freeing of the cred object is done
+ * immediately without waiting for RCU grace period. This is useful when
+ * using a task's temporary subjective credentials (task_struct::cred).
+ */
+static inline void put_cred_non_rcu(const struct cred *_cred)
+{
+	struct cred *cred = (struct cred *) _cred;
+
+	if (cred) {
+		validate_creds(cred);
+		if (atomic_dec_and_test(&(cred)->usage))
+			__put_cred(cred, false);
 	}
 }
 
 /**
  * current_cred - Access the current task's subjective credentials
  *
- * Access the subjective credentials of the current task.  RCU-safe,
- * since nobody else can modify it.
+ * Access the subjective credentials of the current task.
  */
-#define current_cred() \
-	rcu_dereference_protected(current->cred, 1)
+#define current_cred() READ_ONCE(current->cred)
 
 /**
  * current_real_cred - Access the current task's objective credentials
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 716ad1d8d95e1..227375311d992 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -879,8 +879,9 @@ struct task_struct {
 	/* Objective and real subjective task credentials (COW): */
 	const struct cred __rcu		*real_cred;
 
-	/* Effective (overridable) subjective task credentials (COW): */
-	const struct cred __rcu		*cred;
+	/* Effective (overridable) subjective task credentials (COW):
+	 * Access is not managed by RCU and is used task-synchronously */
+	const struct cred		*cred;
 
 #ifdef CONFIG_KEYS
 	/* Cached requested key. */
diff --git a/kernel/auditsc.c b/kernel/auditsc.c
index 4effe01ebbe2b..2e70a73646d53 100644
--- a/kernel/auditsc.c
+++ b/kernel/auditsc.c
@@ -430,24 +430,19 @@ static int audit_field_compare(struct task_struct *tsk,
 /* Determine if any context name data matches a rule's watch data */
 /* Compare a task_struct with an audit_rule.  Return 1 on match, 0
  * otherwise.
- *
- * If task_creation is true, this is an explicit indication that we are
- * filtering a task rule at task creation time.  This and tsk == current are
- * the only situations where tsk->cred may be accessed without an rcu read lock.
  */
 static int audit_filter_rules(struct task_struct *tsk,
 			      struct audit_krule *rule,
 			      struct audit_context *ctx,
 			      struct audit_names *name,
-			      enum audit_state *state,
-			      bool task_creation)
+			      enum audit_state *state)
 {
 	const struct cred *cred;
 	int i, need_sid = 1;
 	u32 sid;
 	unsigned int sessionid;
 
-	cred = rcu_dereference_check(tsk->cred, tsk == current || task_creation);
+	cred = READ_ONCE(tsk->cred);
 
 	for (i = 0; i < rule->field_count; i++) {
 		struct audit_field *f = &rule->fields[i];
@@ -745,7 +740,7 @@ static enum audit_state audit_filter_task(struct task_struct *tsk, char **key)
 	rcu_read_lock();
 	list_for_each_entry_rcu(e, &audit_filter_list[AUDIT_FILTER_TASK], list) {
 		if (audit_filter_rules(tsk, &e->rule, NULL, NULL,
-				       &state, true)) {
+				       &state)) {
 			if (state == AUDIT_RECORD_CONTEXT)
 				*key = kstrdup(e->rule.filterkey, GFP_ATOMIC);
 			rcu_read_unlock();
@@ -791,7 +786,7 @@ static enum audit_state audit_filter_syscall(struct task_struct *tsk,
 	list_for_each_entry_rcu(e, list, list) {
 		if (audit_in_mask(&e->rule, ctx->major) &&
 		    audit_filter_rules(tsk, &e->rule, ctx, NULL,
-				       &state, false)) {
+				       &state)) {
 			rcu_read_unlock();
 			ctx->current_state = state;
 			return state;
@@ -815,7 +810,7 @@ static int audit_filter_inode_name(struct task_struct *tsk,
 
 	list_for_each_entry_rcu(e, list, list) {
 		if (audit_in_mask(&e->rule, ctx->major) &&
-		    audit_filter_rules(tsk, &e->rule, ctx, n, &state, false)) {
+		    audit_filter_rules(tsk, &e->rule, ctx, n, &state)) {
 			ctx->current_state = state;
 			return 1;
 		}
diff --git a/kernel/cred.c b/kernel/cred.c
index 809a985b17934..120258b9d87df 100644
--- a/kernel/cred.c
+++ b/kernel/cred.c
@@ -129,7 +129,7 @@ static void put_cred_rcu(struct rcu_head *rcu)
  *
  * Destroy a set of credentials on which no references remain.
  */
-void __put_cred(struct cred *cred)
+void __put_cred(struct cred *cred, bool rcu)
 {
 	kdebug("__put_cred(%p{%d,%d})", cred,
 	       atomic_read(&cred->usage),
@@ -144,10 +144,10 @@ void __put_cred(struct cred *cred)
 	BUG_ON(cred == current->cred);
 	BUG_ON(cred == current->real_cred);
 
-	if (cred->non_rcu)
-		put_cred_rcu(&cred->rcu);
-	else
+	if (rcu)
 		call_rcu(&cred->rcu, put_cred_rcu);
+	else
+		put_cred_rcu(&cred->rcu);
 }
 EXPORT_SYMBOL(__put_cred);
 
@@ -264,7 +264,6 @@ struct cred *prepare_creds(void)
 	old = task->cred;
 	memcpy(new, old, sizeof(struct cred));
 
-	new->non_rcu = 0;
 	atomic_set(&new->usage, 1);
 	set_cred_subscribers(new, 0);
 	get_group_info(new->group_info);
@@ -485,7 +484,7 @@ int commit_creds(struct cred *new)
 	if (new->user != old->user)
 		atomic_inc(&new->user->processes);
 	rcu_assign_pointer(task->real_cred, new);
-	rcu_assign_pointer(task->cred, new);
+	WRITE_ONCE(task->cred, new);
 	if (new->user != old->user)
 		atomic_dec(&old->user->processes);
 	alter_cred_subscribers(old, -2);
@@ -549,20 +548,9 @@ const struct cred *override_creds(const struct cred *new)
 	validate_creds(old);
 	validate_creds(new);
 
-	/*
-	 * NOTE! This uses 'get_new_cred()' rather than 'get_cred()'.
-	 *
-	 * That means that we do not clear the 'non_rcu' flag, since
-	 * we are only installing the cred into the thread-synchronous
-	 * '->cred' pointer, not the '->real_cred' pointer that is
-	 * visible to other threads under RCU.
-	 *
-	 * Also note that we did validate_creds() manually, not depending
-	 * on the validation in 'get_cred()'.
-	 */
-	get_new_cred((struct cred *)new);
+	get_cred(new);
 	alter_cred_subscribers(new, 1);
-	rcu_assign_pointer(current->cred, new);
+	WRITE_ONCE(current->cred, new);
 	alter_cred_subscribers(old, -1);
 
 	kdebug("override_creds() = %p{%d,%d}", old,
@@ -590,7 +578,7 @@ void revert_creds(const struct cred *old)
 	validate_creds(old);
 	validate_creds(override);
 	alter_cred_subscribers(old, 1);
-	rcu_assign_pointer(current->cred, old);
+	WRITE_ONCE(current->cred, old);
 	alter_cred_subscribers(override, -1);
 	put_cred(override);
 }
@@ -697,7 +685,6 @@ struct cred *prepare_kernel_cred(struct task_struct *daemon)
 	validate_creds(old);
 
 	*new = *old;
-	new->non_rcu = 0;
 	atomic_set(&new->usage, 1);
 	set_cred_subscribers(new, 0);
 	get_uid(new->user);


More information about the Linux-kernel-mentees mailing list