[Linux-kernel-mentees] Fix for BAD_SIGN_OFF: non-standard signature

Aditya yashsri421 at gmail.com
Tue Nov 17 18:03:20 UTC 2020


On 13/11/20 11:55 pm, Aditya wrote:
> On 13/11/20 8:56 pm, Lukas Bulwahn wrote:
>> On Fri, Nov 13, 2020 at 4:00 PM Aditya <yashsri421 at gmail.com> wrote:
>>>
>>> On 13/11/20 8:05 pm, Aditya wrote:
>>>> On 12/11/20 1:34 am, Lukas Bulwahn wrote:
>>>>> On Wed, Nov 11, 2020 at 3:13 PM Aditya <yashsri421 at gmail.com> wrote:
>>>>>>
>>>>>> Hi Sir
>>>>>> I have analyzed the checkpatch report for BAD_SIGN_OFF (over
>>>>>> v4.13..v5.8) for non-standard signatures and generated reports for it.
>>>>>> Some mistakes are more frequent than others, whereas some mistakes
>>>>>> even have a frequency of 1.
>>>>>>
>>>>>> Non-standard signatures occurring with their frequency:
>>>>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/non_standard_signs.txt
>>>>>>
>>>>>> Complete warning messages:
>>>>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/warn_msgs.txt
>>>>>>
>>>>>> Should I implement the fix similar to TYPO_SPELLING, where we have a
>>>>>> separate file for common misspellings and corrected words? Or should I
>>>>>> make a hash of these misspellings in the checkpatch.pl file itself?
>>>>>>
>>>>>> Also should I include all these misspelled words in it? Or omit words
>>>>>> below certain frequency?
>>>>>>
>>>>>
>>>>> I think the best way would be to compute some kind of edit distance to
>>>>> the known signature tags and if this edit distance is below a certain
>>>>> threshold, suggest that signature tag as the fix. We can then evaluate
>>>>> to determine the best suitable threshold. The edit distance between
>>>>> the different tags are so large that this should always work as
>>>>> intended.
>>>>>
>>>>> Then, we can look into these other creative tags and propose suitable
>>>>> existing tags for the more frequent ones that are non-standard. Or in
>>>>> the case, none of the existing ones fit we can start the discussion on
>>>>> proposing some new standard ones.
>>>>>
>>>>
>>>> I have generated a list of non-standard signatures and their fixes on
>>>> the basis of edit distance.
>>>>
>>>> This is the common list of non standard signatures and fixes (in
>>>> detail):
>>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/min_dists.txt
>>>>
>>>> As I observed, I think, we can consider '<=2' as the threshold edit
>>>> distance.
>>>> List for non-standard signature and their proposed fix with edit
>>>> distance<=2 :
>>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/less_than_3.txt
>>>>
>>>> I have also generated lists for 3 and 4 edit distance separately for
>>>> reference:
>>>> Equal to 3:
>>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/equal_3.txt
>>>>
>>>> Equal to 4:
>>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/equal_4.txt
>>>>
>>>> For the rest, I guess we'll need to hard-code the fixes, e.g. for
>>>> 'Debugged-by', 'Requested-by' etc.
>>>>
>>>> These are the complete lists of non-standard signatures:
>>>> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/non_standard_signs.txt
>>>>
>>
>> Can you share which non-standard-signatures would be
>> handled/transformed with edit distance 2 and which would not in a
>> similar format to non_standard_signs.txt (so, ordered by frequency).
>>
>> We can then consider those that remain and find a good next strategy
>> for the most frequent non-standard signatures.
>>
> 
> Non standard signatures handled with edit distance 2:
> https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/less_than2/signs_freq.txt
> 
> Non standard signatures with edit distance greater than 2:
> https://github.com/AdityaSrivast/kernel-tasks/tree/master/random/non_standard_signature/more_than2
> 

I think this mail probably got missed, so I'll summarize it briefly:
With the edit distance approach and a threshold of 2, we are able to
handle 39 out of 109 distinct non-standard signatures. Among these 39,
the most frequent is 'Reviwed-by:' with 19 occurrences, followed by
'Reviewd-by:' with 9 and other common misspellings.
Complete List:
https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/less_than2/signs_freq.txt
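
To make the approach concrete, here is a minimal Python sketch of the
edit-distance matching described above. It is only an illustration, not
the checkpatch.pl change itself (which would be written in Perl). The
tag list, the function names and the case-insensitive comparison are
assumptions made for this example; 'To:' and 'Cc:' are left out on
purpose because such short tags are easy to match by accident.

```python
# Assumed list of standard signature tags, for illustration only.
STANDARD_TAGS = [
    "Signed-off-by:", "Co-developed-by:", "Acked-by:", "Tested-by:",
    "Reviewed-by:", "Reported-by:", "Suggested-by:",
]


def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def suggest_tag(tag, threshold=2):
    """Return the closest standard tag within the threshold, else None."""
    best = min(STANDARD_TAGS,
               key=lambda t: edit_distance(tag.lower(), t.lower()))
    if edit_distance(tag.lower(), best.lower()) <= threshold:
        return best
    return None
```

With a threshold of 2, suggest_tag("Reviwed-by:") gives "Reviewed-by:",
while suggest_tag("Debugged-by:") gives None, since it is not close to
any tag in the list.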

However, we are still unable to account for the remaining 70
non-standard signatures, which include more frequent ones (e.g.
'Debugged-by:', which occurred 61 times; 'Requested-by:', 48 times;
and so on).
Complete list:
https://github.com/AdityaSrivast/kernel-tasks/blob/master/random/non_standard_signature/more_than2/signs_freq.txt

I think for these cases we would need to maintain a separate file (as
is done for TYPO_SPELLING) or a hash in checkpatch.pl.
What do you think/suggest?
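
As a rough sketch of the file-based option, the Python snippet below
parses entries in a 'wrong||right' layout, similar to the one used by
scripts/spelling.txt, into a hash and looks a tag up in it. The
function names are hypothetical, and the right-hand sides in the sample
are placeholders only: which standard tag should replace 'Debugged-by:'
or 'Requested-by:' has not been decided.

```python
# Sample data in an assumed "wrong||right" layout; the replacements are
# placeholders, not agreed fixes.
SAMPLE = """\
# non-standard tag||suggested standard tag
Debugged-by:||PLACEHOLDER-A:
Requested-by:||PLACEHOLDER-B:
"""


def load_fixes(text):
    """Parse 'wrong||right' lines into a dict keyed by lower-cased tag."""
    fixes = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        wrong, sep, right = line.partition("||")
        if sep:  # ignore lines without the separator
            fixes[wrong.strip().lower()] = right.strip()
    return fixes


def lookup_fix(tag, fixes):
    """Return the configured replacement for tag, or None if unknown."""
    return fixes.get(tag.lower())


FIXES = load_fixes(SAMPLE)
```

Such a lookup could be tried first for the frequent cases, falling back
to the edit-distance check for everything else.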

Thanks
Aditya

> Thanks
> Aditya
> 