[Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues

Lukas Bulwahn lukas.bulwahn at gmail.com
Fri Sep 25 07:35:37 UTC 2020



On Fri, 25 Sep 2020, Dwaipayan Ray wrote:

> > Let us try to put some systematic structure into this handling of the
> > NO_AUTHOR_SIGN_OFF.
> >
> > There are basically two aspects to consider:
> >
> > - which classes are potential mismatches can we potentially observe?
> >
> > - which level of severity do we assign to each class of mismatch?
> >
> > To the first point, you started with some initial classes above.
> >
> >   0) same name, same address (no mismatch)
> >   1) same name, different address
> >      subclasses: (how 'different' address?)
> >      - email address differ by valid difference, e.g., extensions that
> >        will go to the same inbox, e.g.,  the xyz+fds extension in mail
> >        addresses. (that is your class 3, right?)
> >      - two email addresses that are known to belong to the same individual
> >        - known because of .mailmap
> >        - known otherwise?
> >   2) different name, same address
> >      subclasses: (how 'different' name?)
> >      - examples:
> >        - Firstname, Lastname (but middle-name initials differ)
> >        - special "regional" characters, like ü vs. ue, etc.
> >        - firstname and lastname are reverted etc.
> >   3) different name, different address (but still we believe it "matches")
> >      combinations of subclasses of 1) and 2), which we want to consider.
> >
> > Then, to the second question:
> >
> > So, these different classes we can think of and "assign" different
> > severities that checkpatch.pl can use.
> >
> > Checkpatch.pl already has four levels:
> >
> > three severity reporting levels: ERROR, WARN, CHECK, and
> > the level _do not report_ (which is implemented by just ignoring some kind
> > of difference).
> >
> > I think a lot of discussion will be around what severity to assign to
> > which class above, and in some way, long-term maintainers have a larger
> > say than we do here.
> >
> > So, let us first modify checkpatch.pl to identify the various classes
> > above, maybe just starting very basic, with different name, same address
> > and same name, different address and run it over the commit range and see
> > which examples show up and how often.
> >
> > For now, we first just use checkpatch.pl to identify the different types
> > and refine them into subclasses; then, we can discuss with severity should
> > be assigned to which type of mismatch.
> >
> > The second question should not invalidate our data collection and
> > identification of subclasses, though.
> >
> > Does that help? What do you think?
> >
> >
> > Lukas
> 
> Yes, that should help I think. Currently checkpatch is very vague
> about Author sign offs, so subclassing it will really be helpful.
> But at the same point I don't think it should become
> too complex either.

Agree, that is where the evaluation will help (what is really happening, 
what do we need to consider) and the severity comes into play (if we 
consider all subclasses of same severity, we do not need to differentiate 
between them.)

> 
> I have already prepared a patch for the basic two classes:
> 1) same name, different address
> 2) same address, different name
> 
> It would be great to have your feedback on this.

Okay, I suggest to run this script on a larger set of commits and see.

Probably, you can speed it your evaluation with checkpatch.pl by 
restricting it to the AUTHOR_SIGN_OFF type. I will try that over the 
weekend as well.

Lukas

> 
> ---
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index 9e65d21456f1..e23e753fcaf0 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -1210,14 +1210,22 @@ sub reformat_email {
>   return format_email($email_name, $email_address);
>  }
> 
> -sub same_email_addresses {
> +sub same_mail_check {
>   my ($email1, $email2) = @_;
> 
>   my ($email1_name, $name1_comment, $email1_address, $comment1) =
> parse_email($email1);
>   my ($email2_name, $name2_comment, $email2_address, $comment2) =
> parse_email($email2);
> 
> - return $email1_name eq $email2_name &&
> -        $email1_address eq $email2_address;
> + my $same_name = $email1_name eq $email2_name;
> + my $same_address= $email1_address eq $email2_address;
> +
> + return ($same_name, $same_address);
> +}
> +
> +sub same_email_addresses {
> + my ($same_name, $same_address) = same_mail_check(@_);
> +
> + return $same_name && $same_address;
>  }
> 
>  sub which {
> @@ -2347,6 +2355,7 @@ sub process {
>   my $signoff = 0;
>   my $author = '';
>   my $authorsignoff = 0;
> + my $authorsignerr = '';
>   my $is_patch = 0;
>   my $is_binding_patch = -1;
>   my $in_header_lines = $file ? 0 : 1;
> @@ -2677,6 +2686,13 @@ sub process {
>   if ($author ne '') {
>   if (same_email_addresses($1, $author)) {
>   $authorsignoff = 1;
> + } else {
> + my ($same_name, $same_address) = same_mail_check($1, $author);
> + if($same_name) {
> + $authorsignerr = "ADDRESS_MISMATCH:${1}";
> + } elsif ($same_address) {
> + $authorsignerr = "NAME_MISMATCH:${1}";
> + }
>   }
>   }
>   }
> @@ -6891,8 +6907,16 @@ sub process {
>   ERROR("MISSING_SIGN_OFF",
>         "Missing Signed-off-by: line(s)\n");
>   } elsif (!$authorsignoff) {
> - WARN("NO_AUTHOR_SIGN_OFF",
> -      "Missing Signed-off-by: line by nominal patch author '$author'\n");
> + if($authorsignerr =~ /^NAME_MISMATCH:(.*)/) {
> + WARN("NO_AUTHOR_SIGN_OFF",
> +      "Author name mismatch: 'From: $author' != 'Signed-off-by: $1'\n");
> + } elsif ($authorsignerr =~ /^ADDRESS_MISMATCH:(.*)/) {
> + WARN("NO_AUTHOR_SIGN_OFF",
> +      "Author email address mismatch: 'From: $author' !=
> 'Signed-off-by: $1'\n");
> + } else {
> + WARN("NO_AUTHOR_SIGN_OFF",
> + "Missing Signed-off-by: line by nominal patch author '$author'\n");
> + }
>   }
>   }
> ---
> 
> Thanks,
> Dwaipayan.
> 


More information about the Linux-kernel-mentees mailing list