[Linux-kernel-mentees] [PATCH v2] checkpatch: improve email parsing

Lukas Bulwahn lukas.bulwahn at gmail.com
Tue Nov 3 08:10:54 UTC 2020


On Tue, Nov 3, 2020 at 8:28 AM Joe Perches <joe at perches.com> wrote:
>
> On Tue, 2020-11-03 at 11:28 +0530, Dwaipayan Ray wrote:
> > On Tue, Nov 3, 2020 at 11:18 AM Dwaipayan Ray <dwaipayanray1 at gmail.com> wrote:
> > >
> > > checkpatch doesn't report warnings for many common mistakes
> > > in emails, such as trailing commas and incorrect use of
> > > email comments.
> > >
> > > At the same time, several false positives are reported due to
> > > incorrect handling of mail comments. The most common one is
> > > caused by the pattern:
> > >
> > > <stable@vger.kernel.org> # X.X
> > >
> > > Improve email parsing mechanism in checkpatch.
> > >
> > > What is added:
> > >
> > > - Support for multiple name/address comments.
> > > - Improved handling of quoted names.
> > > - Sanitize improperly formatted comments.
> > > - Sanitize trailing semicolon or dot after email.
> []
> > What do you think? Should warnings for the names which should
> > be quoted be reported considering this result?
>
> Clearly the quote suggestion is unnecessary.
>
> I think that "cc: stable@(?:vger\.)?kernel\.org" should be
> treated differently from other forms of invalid/odd address lines.
>
> My suggestion is that the case insensitive form of
>
> Cc: stable@vger.kernel.org
>
> or only other similar case-insensitive forms with a
> # comment separator like
>
> Cc: <stable@vger.kernel.org> # some comment
>
> be acceptable for stable.
>
> All other forms with stable@ should emit some message.
>

I agree that stable@vger.kernel.org should be handled as a special case.
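A minimal sketch of that special case, written in Python for illustration (checkpatch.pl is Perl). It assumes the rules proposed above: the address is matched case-insensitively, it may be bare or enclosed in `<...>`, it may optionally be followed by a `# comment`, and every other line mentioning stable@ gets a warning. The function name `check_stable_cc` and the warning text are hypothetical.

```python
import re

# Accepted stable forms (case-insensitive), following the proposal above:
#   Cc: stable@vger.kernel.org
#   Cc: <stable@vger.kernel.org> # some comment
# "(?:vger\.)?" also admits stable@kernel.org. Allowing a "# comment"
# after the bare form as well is an assumption of this sketch.
STABLE_OK = re.compile(
    r"cc:\s*(?:<stable@(?:vger\.)?kernel\.org>|stable@(?:vger\.)?kernel\.org)"
    r"(?:\s*#.*)?",
    re.IGNORECASE,
)
# \b keeps addresses such as unstable@example.org from being treated as stable.
MENTIONS_STABLE = re.compile(r"\bstable@", re.IGNORECASE)


def check_stable_cc(line):
    """Return None if the line is acceptable, else a warning (hypothetical)."""
    text = line.strip()
    if not MENTIONS_STABLE.search(text):
        return None  # not a stable address; the generic tag checks apply
    if STABLE_OK.fullmatch(text):
        return None
    return "unexpected form of stable address: " + text
```

With this sketch, `Cc: stable@vger.kernel.org, 4.19` and `Cc: Stable <stable@vger.kernel.org>` are flagged, while `Cc: <stable@vger.kernel.org> # 5.4.x` passes.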

We could even ask Greg KH and Sasha whether they prefer a particular
format for the meta information after the #, so that their scripts
can pick it up.

> And other <foo>-by: and cc: addresses should only have a form like
>
> Signed-off-by: "Full.Name" (possible comment) <email@domain.tld>
> or
> Signed-off-by: Full Name (possible comment) <email@domain.tld>
>
> etc..
>
> and any additional content after .tld in the email address be flagged
> with some message like "unexpected content after email address" rather
> than "might be better as".
>

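The accepted shape proposed above can be sketched as a single regular expression. This is illustrative Python, not checkpatch's actual code. It assumes that the name is required, that it is either quoted or plain, that an optional `(comment)` may follow it, and that the address must be in angle brackets. Any trailing text produces "unexpected content after email address". The names `TAG_LINE` and `check_tag_line` are hypothetical.

```python
import re

# Accepted shape (sketch), following the proposal above:
#   Tag: "Full.Name" (possible comment) <email@domain.tld>
#   Tag: Full Name (possible comment) <email@domain.tld>
TAG_LINE = re.compile(
    r'(?P<tag>[a-z-]+-by|cc):\s*'
    r'(?P<name>"[^"]*"|[^"<(]+?)\s*'             # quoted or plain name
    r'(?:\((?P<comment>[^)]*)\)\s*)?'            # optional (comment)
    r'<(?P<addr>[^<>\s@]+@[^<>\s@]+\.[a-z]+)>'   # <email@domain.tld>
    r'(?P<rest>.*)',                             # whatever follows the address
    re.IGNORECASE,
)


def check_tag_line(line):
    """Return None for an acceptable tag line, else a message string."""
    m = TAG_LINE.fullmatch(line.strip())
    if m is None:
        return "malformed tag line: " + line.strip()
    rest = m.group("rest").strip()
    if rest:
        return "unexpected content after email address: " + rest
    return None
```

Under this sketch, a `# comment` after an ordinary address is also reported as unexpected content. The stable@ address would therefore need to be checked separately before this generic check runs.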
I agree with refining the error message here. Also, Aditya, Dwaipayan,
here we can probably add suitable fix methods, e.g., detect where the
parsing fails (a missing ">", a space where there should not be one,
just a few stray characters at the end, or a long list of email
addresses that should be split).

Maybe you can coordinate among yourselves who wants to create
suitable fix rules here?

Also, start with the most frequent class of mistakes involving
unexpected content after email addresses.

I imagine that a maintainer could simply run a tag-sanitizing script
that cleans up those stupid mistakes before creating their git trees
or sending pull requests to Linus. Let us add these sanitizing rules
to checkpatch.pl with fix options for now; if the sanitizing feature
grows into a monster script of its own within checkpatch.pl, we can
refactor it into an independent clean-up script.
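The following Python sketch shows a sanitizing pass of the kind described above. It is hypothetical and is not checkpatch's `--fix` implementation. It rewrites only `*-by:` and `Cc:` trailer lines. It removes trailing commas, semicolons, or dots after the address, as in the patch description. It also splits a comma-separated list of `<...>` addresses into one line per address. Bare addresses separated by commas are deliberately left alone, because the sketch only splits at commas that follow a `>`.

```python
import re

# Only trailer-style tags are touched; other lines pass through unchanged.
TAG = re.compile(r"(?P<tag>[A-Za-z-]+-by|Cc):\s*(?P<value>.*)", re.IGNORECASE)


def sanitize_tags(lines):
    """Hypothetical clean-up pass over commit-message lines."""
    out = []
    for line in lines:
        m = TAG.match(line)
        if m is None or not m.group("value").strip():
            out.append(line)  # not a tag, or nothing to clean
            continue
        tag, value = m.group("tag"), m.group("value")
        # One address per line: cut only at commas directly after a ">".
        for part in re.split(r"(?<=>)\s*,\s*", value):
            # Drop trailing commas, semicolons or dots after the address.
            part = re.sub(r"\s*[,;.]+\s*$", "", part)
            if part:
                out.append(f"{tag}: {part}")
    return out
```

For example, `Cc: A <a@x.org>, B <b@y.org>` becomes two `Cc:` lines, and `Signed-off-by: Full Name <email@domain.tld>;` loses its trailing semicolon.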

Lukas

