[Linux-kernel-mentees] Fix for BAD_SIGN_OFF: non-standard signature

Aditya yashsri421 at gmail.com
Tue Nov 17 20:54:32 UTC 2020


On 17/11/20 11:12 pm, Lukas Bulwahn wrote:
> On Tue, Nov 17, 2020 at 7:03 PM Aditya <yashsri421 at gmail.com> wrote:
>>
>> On 13/11/20 11:55 pm, Aditya wrote:
>>> On 13/11/20 8:56 pm, Lukas Bulwahn wrote:
>>>> On Fri, Nov 13, 2020 at 4:00 PM Aditya <yashsri421 at gmail.com> wrote:
>>>>>
>>>>> On 13/11/20 8:05 pm, Aditya wrote:
>>>>>> On 12/11/20 1:34 am, Lukas Bulwahn wrote:
>>>>>>> On Wed, Nov 11, 2020 at 3:13 PM Aditya <yashsri421 at gmail.com> wrote:
>>>>>>>>
>>>>>>>> Hi Sir
>>>>>>>> I have analyzed the checkpatch report for BAD_SIGN_OFF(over
>>>>>>>> v4.13..v5.8) for non-standard signature and generated reports for it.
>>>>>>>> Some mistakes are more frequent than others, whereas some mistakes
>>>>>>>> even have a frequency of 1.
>>>>>>>>
>>>>>>>> Non-standard signatures occurring with their frequency:
>>>>>>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/non_standard_signs.txt
>>>>>>>>
>>>>>>>> Complete warning messages:
>>>>>>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/warn_msgs.txt
>>>>>>>>
>>>>>>>> Should I implement the fix similar to TYPO_FIX, where we have a
>>>>>>>> separate file for common misspellings and corrected words? Or should I
>>>>>>>> make a hash of these misspellings in checkpatch.pl file as well?
>>>>>>>>
>>>>>>>> Also should I include all these misspelled words in it? Or omit words
>>>>>>>> below certain frequency?
>>>>>>>>
>>>>>>>
>>>>>>> I think the best way would be to compute some kind of edit distance to
>>>>>>> the known signature tags and if this edit distance is below a certain
>>>>>>> threshold, suggest that signature tag as the fix. We can then evaluate
>>>>>>> to determine the best suitable threshold. The edit distance between
>>>>>>> the different tags are so large that this should always work as
>>>>>>> intended.
>>>>>>>
>>>>>>> Then, we can look into these other creative tags and propose suitable
>>>>>>> existing tags for the more frequent ones that are non-standard. Or in
>>>>>>> the case, none of the existing ones fit we can start the discussion on
>>>>>>> proposing some new standard ones.
>>>>>>>
>>>>>>
>>>>>> I have generated a list of non-standard signatures and their fixes on
>>>>>> the basis of edit distance.
>>>>>>
>>>>>> This is the common list of non standard signatures and fixes (in
>>>>>> detail):
>>>>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/min_dists.txt
>>>>>>
>>>>>> As I observed, I think, we can consider '<=2' as the threshold edit
>>>>>> distance.
>>>>>> List for non-standard signature and their proposed fix with edit
>>>>>> distance<=2 :
>>>>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/less_than_3.txt
>>>>>>
>>>>>> I have also generated lists for 3 and 4 edit distance separately for
>>>>>> reference:
>>>>>> Equal to 3:
>>>>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/equal_3.txt
>>>>>>
>>>>>> Equal to 4:
>>>>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/equal_4.txt
>>>>>>
>>>>>> For the rest I guess we'll need to hard code eg. for 'Debugged-by',
>>>>>> 'Requested-by' etc.
>>>>>>
>>>>>> These are the complete lists of non-standard signatures:
>>>>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/non_standard_signs.txt
>>>>>>
>>>>
>>>> Can you share which non-standard-signatures would be
>>>> handled/transformed with edit distance 2 and which would not in a
>>>> similar format to non_standard_signs.txt (so, ordered by frequency).
>>>>
>>>> We can then consider those that remain and find a good next strategy
>>>> for the most frequent non-standard signatures.
>>>>
>>>
>>> Non standard signatures handled with edit distance 2:
>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/less_than2/signs_freq.txt
>>>
>>> Non standard signatures with edit distance greater than 2:
>>> https://github.com/AdityaSrivast/kernel-tasks/tree/master/random/non_standard_signature/more_than2
>>>
>>
>> I think this mail probably got missed. I'll summarize it a bit for
>> simplicity:
>> With edit distance approach and threshold as 2, we're able to handle
>> 39 out of 109 'distinct' cases of non-standard signature. In this 39,
>> the maximum count of non-standard signature is 19 for 'Reviwed-by:'; 9
>> for 'Reviewd-by:' and other common mispellings.
>> Complete List:
>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/less_than2/signs_freq.txt
>>
>> However, still we are unable to account for 70 non-standard signatures
>> which occur more frequently (eg 'Debugged-by:', which has occurred 61
>> times; 'Requested-by:', 48 times; and so on).
>> Complete list:
>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/more_than2/signs_freq.txt
>>
>> I think for these cases we'd need to make some file (as is used for
>> TYPO_SPELLING), or hash.
>> What do you think/suggest?
>>
> 
> Yes, I agree.
> 
> Goal 1: Try to map all the non-default signatures to their "standard"
> counterpart as much as possible.
> 
> Goal 2: Introduce a few very little signatures to handle those cases
> that really cannot be mapped to a non-default signature.
> 
> Provide good rationales that you can defend and provide documentation
> for when checkpatch shall explain the fix it proposes.
> 
> Here an example for the first ten cases:
> 
> 1)Debugged-by: 61 -> Codeveloped-by:
> 
> Rationale: Debugging is part of Software Development; so
> Codeveloped-by is perfectly fine, even if the contributor did not
> create code.
> 
> (alternatively: maybe a new Assisted-by would do here.)
> 
> 2)Requested-by: 48 -> Suggested-by:
> 
> Rationale: In an open-source project, there are "no requests", just
> "suggestions" to convince a maintainer to accept your patch.
> 
> 3)Co-authored-by: 43 -> Codeveloped-by:
> 
> Rationale: clear. Codeveloped-by and Co-authored-by are synonyms.
> 
> 4)Originally-by: 39
> 
> Maybe something like this deserves to be a new tag. There is a
> significant difference to codeveloped-by. But that needs discussion.
> 
> 5)Analyzed-by: 22
> 
> Rationale: Analyzing is part of Software Development; so
> Codeveloped-by is perfectly fine, even if the contributor did not
> create code.
> (alternatively: maybe a new Assisted-by would do here.)
> 
> 6)Bisected-by: 20
> 
> Difficult...
> (maybe a new Assisted-by would do here.)
> 
> 7)Improvements-by: 19 -> Codeveloped-by:
> 
> 8)Generated-by: 17 -> Reported-by: ?
> 
> What does generated-by actually mean?
> 
> 9)Noticed-by: 11 -> Reported-by:
> 
> 10)Inspired-by: 11 -> Suggested-by:
> 
> Maybe you can come up with a list for the next twenty and then we
> discuss them with Joe Perches and then a larger group?
> 

Sounds good. Will send by tomorrow morning :)

Thanks
Aditya


More information about the Linux-kernel-mentees mailing list