[Linux-kernel-mentees] checkpatch: improving comment parsing in email

Lukas Bulwahn lukas.bulwahn at gmail.com
Sun Oct 25 09:26:53 UTC 2020



On Sun, 25 Oct 2020, Dwaipayan Ray wrote:

> On Wed, Oct 21, 2020 at 2:26 PM Lukas Bulwahn <lukas.bulwahn at gmail.com> wrote:
> >
> >
> >
> > On Wed, 21 Oct 2020, Dwaipayan Ray wrote:
> >
> > > On Wed, Oct 21, 2020 at 11:01 AM Lukas Bulwahn <lukas.bulwahn at gmail.com> wrote:
> > > >
> > > >
> > > >
> > > > On Tue, 20 Oct 2020, Dwaipayan Ray wrote:
> > > >
> > > > > Hi,
> > > > > checkpatch seems to have problems parsing some types of
> > > > > email comments. Some examples I could find were of the type:
> > > > >
> > > > > 1) address at example.com (Comment)
> > > > > 2) "Name (Comment) " address at example.com
> > > > > 3) address at example.com #comment
> > > > >
> > > > > These comments aren't processed currently which causes
> > > > > false BAD_SIGN_OFF warnings.
> > > > >
> > > > > Examples are:
> > > > >
> > > > > WARNING:BAD_SIGN_OFF: email address 'David.Laight at aculab.com
> > > > > (big endian system concerns)' might be better as 'David.Laight at aculab.com
> > > > > (big endian system concerns)'
> > > > >
> > > > > WARNING:BAD_SIGN_OFF: email address 'stable at vger.kernel.org #4.20+'
> > > > > might be better as 'stable at vger.kernel.org#4.20+'
> > > > >
> > > > > The earlier warning is very frequent in the kernel.
> > > > >
> > > > > I did send a patch solving (1 and 3),
> > > > > https://lore.kernel.org/linux-kernel-mentees/a15a6cc0ddea068d78113f5e315eaba6f52b917a.camel@perches.com/
> > > > >
> > > > > Joe points out that comments and multiple comments can
> > > > > exist at any part of email. (perhaps RFC 5322 Appendix A.5). So
> > > > > that patch didn't solve the problem at the very root.
> > > > >
> > > > > What do you recommend be done?
> > > > >
> > > >
> > > > I cannot say much of BIG value and insight.
> > > >
> > > > As of now, I think these cases should be handled correctly and the check
> > > > for: is it the same name or is it the same email should work properly no
> > > > matter where and which kind of comment is used (as described by RFC 5322
> > > > as you pointed out).
> > > >
> > > > I might change my opinion when:
> > > >
> > > > 1. We have an evaluation on how many cases of BAD_SIGN_OFF (and related
> > > > types) do we currently still observed among 100,000 commits and how many
> > > > due to not handling comments properly?
> > > >
> > > > 2. We have a rough idea how complex this whole checking function will get.
> > > > At the moment, I cannot say if it "just 10 lines of code addition" (even
> > > > if they might be quite intrinsic to get right) or if we need hundreds of
> > > > lines with thousand of special cases etc. (which I do not expect, but who
> > > > knows which complexity might be involved, once we go into the details).
> > > >
> > > > So, I suggest to evaluate and prototype and then we can better judge if it
> > > > is worth really going forward, discussing it with Joeand adding it to mainline.
> > > >
> > > >
> > > > What do you think, Dwaipayan?
> > > >
> > >
> > > Hi,
> > > I completely agree with you. We could run the evaluation first and
> > > see what kinds of comment patterns are used. We could also keep
> > > the RFC 5322 guidelines as a base and modify the code to recognize
> > > the comment blocks as per the specifications.
> > >
> > > It could get fairly complex, because comment blocks are allowed almost
> > > anywhere. So maybe we should just solve the most
> > > common ones first. That would help the committers largely.
> > >
> > > I think Joe wanted a complete solution for it. So I guess i will try
> > > to cover as many cases which doesn't increase the code complexity
> > > much.
> > >
> > > I will start with the evaluation. If others can help out too it will be great.
> > > After it, we can determine what cases need to be handled and what
> > > can be ignored.
> > >
> >
> > Go for it and share your results with the group once you have something
> > which others can review and help to evaluate.
> >
> > The requirement for future applicants to be selected is to show that they
> > can help you with their work on evaluations and their explanations.
> >
> > Lukas
> 
> Hi,
> Sorry I took a bit of time for this. It was awfully slow and I had to cancel
> it several times.
> 
> Nevertheless, here is my report.
> I ran checkpatch over 50,000 commits from v5.4.
> 
> There were:
> 1516 BAD_SIGN_OFF warnings.
> 
> out of which:
> 798 BAD_SIGN_OFF: email address <email1> might be better as <email2>
> warnings.
> 
> Since this was the potential error point, I dug deeper, and there were
> 293 BAD_SIGN_OFF: email address '<stable at vger.kernel.org> # .*' might
> be better as warnings.
>

Yeah, that is a default pattern for stable; that should be handled.
 
> There are 68 other potential warnings, some of which are are of type:
> 
> email address '"Kai Mäkisara (Kolumbus)" <kai.makisara at kolumbus.fi>' might be
>  better as '"Kai Mäkisara"(Kolumbus) <kai.makisara at kolumbus.fi>'
> 
> The name comment wasn't parsed properly due to the quotes handling.
>

Maybe you can address that as well.
 
> Recursive parantheses types like (john (dean)) were not found.
> 
> So I think for now, taking care of the correct name comment parsing
> for quoted names and proper comment parsing after the email address
> should fix most of the BAD_SIGN_OFF false positives.
> 
> What do you think?
>

I think both are worth the effort to add proper support to the script.

Once you have those patches, I can run another evaluation as well.

Lukas


More information about the Linux-kernel-mentees mailing list