[Linux-kernel-mentees] checkpatch: improving comment parsing in email

Dwaipayan Ray dwaipayanray1 at gmail.com
Wed Oct 21 08:32:47 UTC 2020


On Wed, Oct 21, 2020 at 11:01 AM Lukas Bulwahn <lukas.bulwahn at gmail.com> wrote:
>
>
>
> On Tue, 20 Oct 2020, Dwaipayan Ray wrote:
>
> > Hi,
> > checkpatch seems to have problems parsing some types of
> > email comments. Some examples I could find were of the type:
> >
> > 1) address at example.com (Comment)
> > 2) "Name (Comment) " address at example.com
> > 3) address at example.com #comment
> >
> > These comments aren't processed currently which causes
> > false BAD_SIGN_OFF warnings.
> >
> > Examples are:
> >
> > WARNING:BAD_SIGN_OFF: email address 'David.Laight at aculab.com
> > (big endian system concerns)' might be better as 'David.Laight at aculab.com
> > (big endian system concerns)'
> >
> > WARNING:BAD_SIGN_OFF: email address 'stable at vger.kernel.org #4.20+'
> > might be better as 'stable at vger.kernel.org#4.20+'
> >
> > The earlier warning is very frequent in the kernel.
> >
> > I did send a patch solving (1 and 3),
> > https://lore.kernel.org/linux-kernel-mentees/a15a6cc0ddea068d78113f5e315eaba6f52b917a.camel@perches.com/
> >
> > Joe points out that comments and multiple comments can
> > exist at any part of email. (perhaps RFC 5322 Appendix A.5). So
> > that patch didn't solve the problem at the very root.
> >
> > What do you recommend be done?
> >
>
> I cannot say much of BIG value and insight.
>
> As of now, I think these cases should be handled correctly and the check
> for: is it the same name or is it the same email should work properly no
> matter where and which kind of comment is used (as described by RFC 5322
> as you pointed out).
>
> I might change my opinion when:
>
> 1. We have an evaluation on how many cases of BAD_SIGN_OFF (and related
> types) do we currently still observed among 100,000 commits and how many
> due to not handling comments properly?
>
> 2. We have a rough idea how complex this whole checking function will get.
> At the moment, I cannot say if it "just 10 lines of code addition" (even
> if they might be quite intrinsic to get right) or if we need hundreds of
> lines with thousand of special cases etc. (which I do not expect, but who
> knows which complexity might be involved, once we go into the details).
>
> So, I suggest to evaluate and prototype and then we can better judge if it
> is worth really going forward, discussing it with Joeand adding it to mainline.
>
>
> What do you think, Dwaipayan?
>

Hi,
I completely agree with you. We could run the evaluation first and
see what kinds of comment patterns are used. We could also keep
the RFC 5322 guidelines as a base and modify the code to recognize
the comment blocks as per the specifications.

It could get fairly complex, because comment blocks are allowed almost
anywhere. So maybe we should just solve the most
common ones first. That would help the committers largely.

I think Joe wanted a complete solution for it. So I guess i will try
to cover as many cases which doesn't increase the code complexity
much.

I will start with the evaluation. If others can help out too it will be great.
After it, we can determine what cases need to be handled and what
can be ignored.

Thanks,
Dwaipayan.


More information about the Linux-kernel-mentees mailing list