[Linux-kernel-mentees] checkpatch: improving comment parsing in email

Dwaipayan Ray dwaipayanray1 at gmail.com
Sun Oct 25 07:25:37 UTC 2020


On Wed, Oct 21, 2020 at 2:26 PM Lukas Bulwahn <lukas.bulwahn at gmail.com> wrote:
>
>
>
> On Wed, 21 Oct 2020, Dwaipayan Ray wrote:
>
> > On Wed, Oct 21, 2020 at 11:01 AM Lukas Bulwahn <lukas.bulwahn at gmail.com> wrote:
> > >
> > >
> > >
> > > On Tue, 20 Oct 2020, Dwaipayan Ray wrote:
> > >
> > > > Hi,
> > > > checkpatch seems to have problems parsing some types of
> > > > email comments. Some examples I could find were of the type:
> > > >
> > > > 1) address at example.com (Comment)
> > > > 2) "Name (Comment) " address at example.com
> > > > 3) address at example.com #comment
> > > >
> > > > These comments aren't processed currently which causes
> > > > false BAD_SIGN_OFF warnings.
> > > >
> > > > Examples are:
> > > >
> > > > WARNING:BAD_SIGN_OFF: email address 'David.Laight at aculab.com
> > > > (big endian system concerns)' might be better as 'David.Laight at aculab.com
> > > > (big endian system concerns)'
> > > >
> > > > WARNING:BAD_SIGN_OFF: email address 'stable at vger.kernel.org #4.20+'
> > > > might be better as 'stable at vger.kernel.org#4.20+'
> > > >
> > > > The earlier warning is very frequent in the kernel.
> > > >
> > > > I did send a patch solving (1 and 3),
> > > > https://lore.kernel.org/linux-kernel-mentees/a15a6cc0ddea068d78113f5e315eaba6f52b917a.camel@perches.com/
> > > >
> > > > Joe points out that comments and multiple comments can
> > > > exist at any part of email. (perhaps RFC 5322 Appendix A.5). So
> > > > that patch didn't solve the problem at the very root.
> > > >
> > > > What do you recommend be done?
> > > >
> > >
> > > I cannot say much of BIG value and insight.
> > >
> > > As of now, I think these cases should be handled correctly and the check
> > > for: is it the same name or is it the same email should work properly no
> > > matter where and which kind of comment is used (as described by RFC 5322
> > > as you pointed out).
> > >
> > > I might change my opinion when:
> > >
> > > 1. We have an evaluation on how many cases of BAD_SIGN_OFF (and related
> > > types) do we currently still observed among 100,000 commits and how many
> > > due to not handling comments properly?
> > >
> > > 2. We have a rough idea how complex this whole checking function will get.
> > > At the moment, I cannot say if it "just 10 lines of code addition" (even
> > > if they might be quite intrinsic to get right) or if we need hundreds of
> > > lines with thousand of special cases etc. (which I do not expect, but who
> > > knows which complexity might be involved, once we go into the details).
> > >
> > > So, I suggest to evaluate and prototype and then we can better judge if it
> > > is worth really going forward, discussing it with Joeand adding it to mainline.
> > >
> > >
> > > What do you think, Dwaipayan?
> > >
> >
> > Hi,
> > I completely agree with you. We could run the evaluation first and
> > see what kinds of comment patterns are used. We could also keep
> > the RFC 5322 guidelines as a base and modify the code to recognize
> > the comment blocks as per the specifications.
> >
> > It could get fairly complex, because comment blocks are allowed almost
> > anywhere. So maybe we should just solve the most
> > common ones first. That would help the committers largely.
> >
> > I think Joe wanted a complete solution for it. So I guess i will try
> > to cover as many cases which doesn't increase the code complexity
> > much.
> >
> > I will start with the evaluation. If others can help out too it will be great.
> > After it, we can determine what cases need to be handled and what
> > can be ignored.
> >
>
> Go for it and share your results with the group once you have something
> which others can review and help to evaluate.
>
> The requirement for future applicants to be selected is to show that they
> can help you with their work on evaluations and their explanations.
>
> Lukas

Hi,
Sorry I took a bit of time for this. It was awfully slow and I had to cancel
it several times.

Nevertheless, here is my report.
I ran checkpatch over 50,000 commits from v5.4.

There were:
1516 BAD_SIGN_OFF warnings.

out of which:
798 BAD_SIGN_OFF: email address <email1> might be better as <email2>
warnings.

Since this was the potential error point, I dug deeper, and there were
293 BAD_SIGN_OFF: email address '<stable at vger.kernel.org> # .*' might
be better as warnings.

There are 68 other potential warnings, some of which are are of type:

email address '"Kai Mäkisara (Kolumbus)" <kai.makisara at kolumbus.fi>' might be
 better as '"Kai Mäkisara"(Kolumbus) <kai.makisara at kolumbus.fi>'

The name comment wasn't parsed properly due to the quotes handling.

Recursive parantheses types like (john (dean)) were not found.

So I think for now, taking care of the correct name comment parsing
for quoted names and proper comment parsing after the email address
should fix most of the BAD_SIGN_OFF false positives.

What do you think?

Thanks,
Dwaipayan.


More information about the Linux-kernel-mentees mailing list