[Linux-kernel-mentees] Fix for BAD_SIGN_OFF: non-standard signature

Lukas Bulwahn lukas.bulwahn at gmail.com
Tue Nov 17 17:42:07 UTC 2020


On Tue, Nov 17, 2020 at 7:03 PM Aditya <yashsri421 at gmail.com> wrote:
>
> On 13/11/20 11:55 pm, Aditya wrote:
> > On 13/11/20 8:56 pm, Lukas Bulwahn wrote:
> >> On Fri, Nov 13, 2020 at 4:00 PM Aditya <yashsri421 at gmail.com> wrote:
> >>>
> >>> On 13/11/20 8:05 pm, Aditya wrote:
> >>>> On 12/11/20 1:34 am, Lukas Bulwahn wrote:
> >>>>> On Wed, Nov 11, 2020 at 3:13 PM Aditya <yashsri421 at gmail.com> wrote:
> >>>>>>
> >>>>>> Hi Sir
> >>>>>> I have analyzed the checkpatch report for BAD_SIGN_OFF(over
> >>>>>> v4.13..v5.8) for non-standard signature and generated reports for it.
> >>>>>> Some mistakes are more frequent than others, whereas some mistakes
> >>>>>> even have a frequency of 1.
> >>>>>>
> >>>>>> Non-standard signatures occurring with their frequency:
> >>>>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/non_standard_signs.txt
> >>>>>>
> >>>>>> Complete warning messages:
> >>>>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/warn_msgs.txt
> >>>>>>
> >>>>>> Should I implement the fix similar to TYPO_FIX, where we have a
> >>>>>> separate file for common misspellings and corrected words? Or should I
> >>>>>> make a hash of these misspellings in checkpatch.pl file as well?
> >>>>>>
> >>>>>> Also should I include all these misspelled words in it? Or omit words
> >>>>>> below certain frequency?
> >>>>>>
> >>>>>
> >>>>> I think the best way would be to compute some kind of edit distance to
> >>>>> the known signature tags and if this edit distance is below a certain
> >>>>> threshold, suggest that signature tag as the fix. We can then evaluate
> >>>>> to determine the best suitable threshold. The edit distance between
> >>>>> the different tags are so large that this should always work as
> >>>>> intended.
> >>>>>
> >>>>> Then, we can look into these other creative tags and propose suitable
> >>>>> existing tags for the more frequent ones that are non-standard. Or in
> >>>>> the case, none of the existing ones fit we can start the discussion on
> >>>>> proposing some new standard ones.
> >>>>>
> >>>>
> >>>> I have generated a list of non-standard signatures and their fixes on
> >>>> the basis of edit distance.
> >>>>
> >>>> This is the common list of non standard signatures and fixes (in
> >>>> detail):
> >>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/min_dists.txt
> >>>>
> >>>> As I observed, I think, we can consider '<=2' as the threshold edit
> >>>> distance.
> >>>> List for non-standard signature and their proposed fix with edit
> >>>> distance<=2 :
> >>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/less_than_3.txt
> >>>>
> >>>> I have also generated lists for 3 and 4 edit distance separately for
> >>>> reference:
> >>>> Equal to 3:
> >>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/equal_3.txt
> >>>>
> >>>> Equal to 4:
> >>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/equal_4.txt
> >>>>
> >>>> For the rest I guess we'll need to hard code eg. for 'Debugged-by',
> >>>> 'Requested-by' etc.
> >>>>
> >>>> These are the complete lists of non-standard signatures:
> >>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/non_standard_signs.txt
> >>>>
> >>
> >> Can you share which non-standard-signatures would be
> >> handled/transformed with edit distance 2 and which would not in a
> >> similar format to non_standard_signs.txt (so, ordered by frequency).
> >>
> >> We can then consider those that remain and find a good next strategy
> >> for the most frequent non-standard signatures.
> >>
> >
> > Non standard signatures handled with edit distance 2:
> > https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/less_than2/signs_freq.txt
> >
> > Non standard signatures with edit distance greater than 2:
> > https://github.com/AdityaSrivast/kernel-tasks/tree/master/random/non_standard_signature/more_than2
> >
>
> I think this mail probably got missed. I'll summarize it a bit for
> simplicity:
> With edit distance approach and threshold as 2, we're able to handle
> 39 out of 109 'distinct' cases of non-standard signature. In this 39,
> the maximum count of non-standard signature is 19 for 'Reviwed-by:'; 9
> for 'Reviewd-by:' and other common mispellings.
> Complete List:
> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/less_than2/signs_freq.txt
>
> However, still we are unable to account for 70 non-standard signatures
> which occur more frequently (eg 'Debugged-by:', which has occurred 61
> times; 'Requested-by:', 48 times; and so on).
> Complete list:
> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/more_than2/signs_freq.txt
>
> I think for these cases we'd need to make some file (as is used for
> TYPO_SPELLING), or hash.
> What do you think/suggest?
>

Yes, I agree.

Goal 1: Try to map all the non-default signatures to their "standard"
counterpart as much as possible.

Goal 2: Introduce a few very little signatures to handle those cases
that really cannot be mapped to a non-default signature.

Provide good rationales that you can defend and provide documentation
for when checkpatch shall explain the fix it proposes.

Here an example for the first ten cases:

1)Debugged-by: 61 -> Codeveloped-by:

Rationale: Debugging is part of Software Development; so
Codeveloped-by is perfectly fine, even if the contributor did not
create code.

(alternatively: maybe a new Assisted-by would do here.)

2)Requested-by: 48 -> Suggested-by:

Rationale: In an open-source project, there are "no requests", just
"suggestions" to convince a maintainer to accept your patch.

3)Co-authored-by: 43 -> Codeveloped-by:

Rationale: clear. Codeveloped-by and Co-authored-by are synonyms.

4)Originally-by: 39

Maybe something like this deserves to be a new tag. There is a
significant difference to codeveloped-by. But that needs discussion.

5)Analyzed-by: 22

Rationale: Analyzing is part of Software Development; so
Codeveloped-by is perfectly fine, even if the contributor did not
create code.
(alternatively: maybe a new Assisted-by would do here.)

6)Bisected-by: 20

Difficult...
(maybe a new Assisted-by would do here.)

7)Improvements-by: 19 -> Codeveloped-by:

8)Generated-by: 17 -> Reported-by: ?

What does generated-by actually mean?

9)Noticed-by: 11 -> Reported-by:

10)Inspired-by: 11 -> Suggested-by:

Maybe you can come up with a list for the next twenty and then we
discuss them with Joe Perches and then a larger group?

Lukas

> Thanks
> Aditya
>
> > Thanks
> > Aditya
> >
>


More information about the Linux-kernel-mentees mailing list