[Linux-kernel-mentees] [PATCH RFC] checkpatch: improve handling of email comments

Lukas Bulwahn lukas.bulwahn at gmail.com
Wed Oct 28 15:59:12 UTC 2020



On Wed, 28 Oct 2020, Dwaipayan Ray wrote:

> On Wed, Oct 28, 2020 at 8:55 PM Dwaipayan Ray <dwaipayanray1 at gmail.com> wrote:
> >
> > checkpatch has limited support for parsing email comments. It only
> > support single name comments or single after address comments.
> > Whereas, RFC 5322 specifies that comments can be inserted in
> > between any tokens of the email fields.
> >
> > On analyzing 50,000 commits from v5.4 it was found that there were
> > about 370 false positives resulting from wrong parsing of comments.
> >
> > Improve comment parsing mechanism in checkpatch.
> >
> > What is handled now:
> >
> > - Multiple name/address comments
> > - Comments anywhere in between name/address
> > - Multi level comments like (John (Doe) )
> >
> > Signed-off-by: Dwaipayan Ray <dwaipayanray1 at gmail.com>
> > ---
> >  scripts/checkpatch.pl | 19 ++++++++++++++-----
> >  1 file changed, 14 insertions(+), 5 deletions(-)
> >
> > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > index fab38b493cef..ae8436385fc1 100755
> > --- a/scripts/checkpatch.pl
> > +++ b/scripts/checkpatch.pl
> > @@ -1183,14 +1183,20 @@ sub parse_email {
> >                 }
> >         }
> >
> > -       $comment = trim($comment);
> > +       # Comments in between name like John(A nice chap) Doe
> > +       while ($name =~ s/\s*($balanced_parens)\s*/ /) {
> > +               $name_comment .= trim($1);
> > +       }
> >         $name = trim($name);
> >         $name =~ s/^\"|\"$//g;
> > -       if ($name =~ s/(\s*\([^\)]+\))\s*//) {
> > -               $name_comment = trim($1);
> > +
> > +       # Comments in between address like <john(his account)@doe.com>
> > +       while ($address =~ s/\s*($balanced_parens)\s*//) {
> > +               $comment .= trim($1);
> >         }
> >         $address = trim($address);
> >         $address =~ s/^\<|\>$//g;
> > +       $comment = trim($comment);
> >
> >         if ($name =~ /[^\w \-]/i) { ##has "must quote" chars
> >                 $name =~ s/(?<!\\)"/\\"/g; ##escape quotes
> > @@ -1205,8 +1211,6 @@ sub format_email {
> >
> >         my $formatted_email;
> >
> > -       $name_comment = trim($name_comment);
> > -       $comment = trim($comment);
> >         $name = trim($name);
> >         $name =~ s/^\"|\"$//g;
> >         $address = trim($address);
> > @@ -1216,6 +1220,11 @@ sub format_email {
> >                 $name = "\"$name\"";
> >         }
> >
> > +       $name_comment = trim($name_comment);
> > +       $name_comment =~ s/(.+)/ $1/;
> > +       $comment = trim($comment);
> > +       $comment =~ s/(.+)/ $1/;
> > +
> >         if ("$name" eq "") {
> >                 $formatted_email = "$address";
> >         } else {
> > --
> > 2.27.0
> >
> 
> Hi,
> This patch is a follow up to our discussion earlier about improving the
> comment handling at:
> https://lore.kernel.org/linux-kernel-mentees/alpine.DEB.2.21.2010251022060.25172@felia/
> 
> I have only tested it on a few patterns.  But I will run an
> evaluation again on those 50k commits to see what changes.
> 
> What do you think of this?
>

That was exactly the comment I wanted to make.

Commit message is clear; code addition also looks pretty clear.

We are just missing the evaluation to make sure how much we improve and 
that we did not add some stupid bug on the way.

Aditya, can you help us here with evaluation?

E.g., Dwaipayan takes from v5.4 into the future, e.g., until v5.8/v5.9 or 
so.

Aditya, you could then check e.g., v5.0..v5.4, so v5.4 down to the past as 
far as your computer and scripts handles within roughly half a day...

An evaluation on 100,000 commits is certainly a good basis.

Of course, every mentorship candidate can join here as well and show off 
their own evaluation script.

Lukas


More information about the Linux-kernel-mentees mailing list