[Linux-kernel-mentees] [PATCH RFC] checkpatch: improve handling of email comments

Dwaipayan Ray dwaipayanray1 at gmail.com
Wed Oct 28 15:28:12 UTC 2020


On Wed, Oct 28, 2020 at 8:55 PM Dwaipayan Ray <dwaipayanray1 at gmail.com> wrote:
>
> checkpatch has limited support for parsing email comments. It only
> support single name comments or single after address comments.
> Whereas, RFC 5322 specifies that comments can be inserted in
> between any tokens of the email fields.
>
> On analyzing 50,000 commits from v5.4 it was found that there were
> about 370 false positives resulting from wrong parsing of comments.
>
> Improve comment parsing mechanism in checkpatch.
>
> What is handled now:
>
> - Multiple name/address comments
> - Comments anywhere in between name/address
> - Multi level comments like (John (Doe) )
>
> Signed-off-by: Dwaipayan Ray <dwaipayanray1 at gmail.com>
> ---
>  scripts/checkpatch.pl | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> index fab38b493cef..ae8436385fc1 100755
> --- a/scripts/checkpatch.pl
> +++ b/scripts/checkpatch.pl
> @@ -1183,14 +1183,20 @@ sub parse_email {
>                 }
>         }
>
> -       $comment = trim($comment);
> +       # Comments in between name like John(A nice chap) Doe
> +       while ($name =~ s/\s*($balanced_parens)\s*/ /) {
> +               $name_comment .= trim($1);
> +       }
>         $name = trim($name);
>         $name =~ s/^\"|\"$//g;
> -       if ($name =~ s/(\s*\([^\)]+\))\s*//) {
> -               $name_comment = trim($1);
> +
> +       # Comments in between address like <john(his account)@doe.com>
> +       while ($address =~ s/\s*($balanced_parens)\s*//) {
> +               $comment .= trim($1);
>         }
>         $address = trim($address);
>         $address =~ s/^\<|\>$//g;
> +       $comment = trim($comment);
>
>         if ($name =~ /[^\w \-]/i) { ##has "must quote" chars
>                 $name =~ s/(?<!\\)"/\\"/g; ##escape quotes
> @@ -1205,8 +1211,6 @@ sub format_email {
>
>         my $formatted_email;
>
> -       $name_comment = trim($name_comment);
> -       $comment = trim($comment);
>         $name = trim($name);
>         $name =~ s/^\"|\"$//g;
>         $address = trim($address);
> @@ -1216,6 +1220,11 @@ sub format_email {
>                 $name = "\"$name\"";
>         }
>
> +       $name_comment = trim($name_comment);
> +       $name_comment =~ s/(.+)/ $1/;
> +       $comment = trim($comment);
> +       $comment =~ s/(.+)/ $1/;
> +
>         if ("$name" eq "") {
>                 $formatted_email = "$address";
>         } else {
> --
> 2.27.0
>

Hi,
This patch is a follow up to our discussion earlier about improving the
comment handling at:
https://lore.kernel.org/linux-kernel-mentees/alpine.DEB.2.21.2010251022060.25172@felia/

I have only tested it on a few patterns.  But I will run an
evaluation again on those 50k commits to see what changes.

What do you think of this?

Thanks,
Dwaipayan.


More information about the Linux-kernel-mentees mailing list