[Linux-kernel-mentees] checkpatch.pl investigation: NO_AUTHOR_SIGN_OFF issues

Lukas Bulwahn lukas.bulwahn at gmail.com
Fri Sep 18 10:44:14 UTC 2020



On Fri, 18 Sep 2020, Dwaipayan Ray wrote:

> On Fri, Sep 18, 2020 at 3:36 PM Lukas Bulwahn <lukas.bulwahn at gmail.com> wrote:
> >
> > Hi Dwaipayan, hi others,
> >
> > I had a quick look on the NO AUTHOR SIGN OFF issues reported by
> > checkpatch.pl on v5.4..v5.8 on my own small script.
> >
> > After collecting all the data in a tsv (no details on that tsv here):
> >
> > $ grep "NO_AUTHOR_SIGN_OFF" v5.4..v5.8.tsv | wc -l
> > 1064
> >
> > $ grep "NO_AUTHOR_SIGN_OFF" v5.4..v5.8.tsv  | cut -f 7 | sort  | uniq -c |
> > sort -nr | head -n 8
> >     175 Missing Signed-off-by: line by nominal patch author 'Daniel Vetter <daniel.vetter at ffwll.ch>'
> >     116 Missing Signed-off-by: line by nominal patch author ''
> >      68 Missing Signed-off-by: line by nominal patch author 'Trond Myklebust <trondmy at gmail.com>'
> >      43 Missing Signed-off-by: line by nominal patch author 'Thinh Nguyen <Thinh.Nguyen at synopsys.com>'
> >      40 Missing Signed-off-by: line by nominal patch author 'Pascal van Leeuwen <pascalvanl at gmail.com>'
> >      36 Missing Signed-off-by: line by nominal patch author 'Alex Maftei <amaftei at solarflare.com>'
> >      31 Missing Signed-off-by: line by nominal patch author 'Valdis Kletnieks <valdis.kletnieks at vt.edu>'
> >      24 Missing Signed-off-by: line by nominal patch author 'Luke Nelson <lukenels at cs.washington.edu>'
> >
> >
> > Here a quick look at the first two:
> >
> >     175 Missing Signed-off-by: line by nominal patch author 'Daniel Vetter
> > <daniel.vetter at ffwll.ch>'
> >
> > $ grep "NO_AUTHOR_SIGN_OFF" v5.4..v5.8.tsv | grep "Daniel Vetter" \
> >   | cut -f 1 > commits-NO_AUTHOR_SIGN_OFF-Daniel-Vetter
> > $ cat commits-NO_AUTHOR_SIGN_OFF-Daniel-Vetter | \
> >   xargs git show  --format="%b" -s | \
> >   grep "Signed-off-by: Daniel Vetter <daniel.vetter at intel.com>" | \
> >   wc -l
> >
> > So all 175 commits are of the type:
> >
> > Author: Daniel Vetter <daniel.vetter at ffwll.ch>
> > ...
> > Signed-off-by: Daniel Vetter <daniel.vetter at intel.com>
> >
> >
> > and the second:
> >
> >     116 Missing Signed-off-by: line by nominal patch author ''
> >
> > That is probably due to not parsing the patch author with a line break.
> >
> >
> > So, if we can find a solution for Daniel Vetter (of course, not
> > hard-coding it in checkpatch.pl), e.g., by adding a .mailmap entry for him
> > and making use of that in checkpatch.pl,
> >
> > ... and for the encoding problem, then we got around 27% of the
> > NO_AUTHOR_SIGN_OFF 'problems' solved. Next, we can continue to look at the
> > next few remaining ones. The longer tail of warnings are clearly warnings
> > that deserve to be pointed out to newbies with a broken setup.
> >
> > I hope you can continue to work on a solution for this class.
> >
> >
> > Thanks for your initial investigation,
> >
> > Lukas
> 
> Hi,
> I think I have figured out how to fix the authors with a line break.
> The checkpatch script keeps a track of the previous line in a variable
> $prevline. I can use it to fix the author's initial signature. I will
> send you a patch on this when it's done.
>

Okay, let us start to have this first patch correct and accepted by Joe 
Perches.
 
> And for Daniel Vetter's issue, it will be a bit more complex i think
> as checkpatch
> seems to check both author's name and author's email by parsing two
> strings.
>

When the first patch is accepted, let us discuss with Joe the best 
solution for this case.

> At revision 10b82d5176488acee2820e5a2cf0f2ec5c3488b6,
> scripts/checkpatch.pl,
> 
> 
> line 2673:
> if ($author ne '') {
> if (same_email_addresses($1, $author)) {
> $authorsignoff = 1;
> }
> }
> 
> line 1213:
> sub same_email_addresses {
> my ($email1, $email2) = @_;
> 
> my ($email1_name, $name1_comment, $email1_address, $comment1)
>  = parse_email($email1);
> my ($email2_name, $name2_comment, $email2_address, $comment2)
>  = parse_email($email2);
> 
> return $email1_name eq $email2_name &&
> $email1_address eq $email2_address;
> }
> 
> The easiest way at this point will be to modify same_email_address
> subroutine and add checks for other email addresses belonging to
> the same author by loading .mailmap.
> But will this make the subroutine heavy?
>

Yes, it is becoming heavier... but maybe still as performant as before if 
done right.

You could always run a quick fast path check and only if you cannot find 
the sign-off, you do a slow path, loading .mailmap and checking.

By the way, there is already the function read_mailmap in 
./scripts/get_maintainers.pl, so we would not need to implement that, just 
find a good way for both scripts to share this implementation. 

But let us do the first patch and then discuss with Joe.

Lukas


More information about the Linux-kernel-mentees mailing list