[Linux-kernel-mentees] [PATCH] checkpatch: fix false positive for REPEATED_WORD warning

Aditya yashsri421 at gmail.com
Wed Oct 21 08:35:44 UTC 2020


On 21/10/20 1:50 pm, Dwaipayan Ray wrote:
> Hey Aditya and Lukas,
> 
>>>> diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
>>>> index 9b9ffd876e8a..181c95691715 100755
>>>> --- a/scripts/checkpatch.pl
>>>> +++ b/scripts/checkpatch.pl
>>>> @@ -3052,7 +3052,9 @@ sub process {
>>>>
>>>>  # check for repeated words separated by a single space
>>>>             if ($rawline =~ /^\+/ || $in_commit_log) {
>>>> -                   while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
>>>> +                   # avoid repeating hex occurrences like 'ff ff fe 09 ...'
>>>> +                   while ($rawline !~ /((\s)*[0-9a-z]{2}( )+){4,}/ &&
> 
> Pattern is probably wrong. It doesn't recognize word boundaries or
> tabs between words. Example of the first type:
> 
> 000 00 ff ff ...
> 
> The regex matches "00 00 ff ff" ignoring the first 0.
> 
> I think it could be perhaps better with something like:
> 
>  # check for repeated words separated by a single space
> -               if ($rawline =~ /^\+/ || $in_commit_log) {
> +               if (($rawline =~ /^\+/ || $in_commit_log) &&
> +                   $rawline !~ /(?:\b(?:[0-9a-f]{2}\s+){4,})/) {
>                         pos($rawline) = 1 if (!$in_commit_log);
>                         while ($rawline =~ /\b($word_pattern) (?=($word_pattern))/g) {
> 
> Please test it though. I only ran it on a few patterns.
> 
> Apart from it, this does fix the problem. But I am quite sceptical about
> matching 4 or more 2 lettered words in a row. There could be counter
> examples but I guess that is very rare. It's not very general, but for
> the moment it does the job.
> 
> So I think it's probably good with some changes. Not sure what Joe
> would have in mind though.
> 
> Lukas, I think with the changes in place, it is ready to go for discussion.
> 
> Thanks,
> Dwaipayan.
> 

Thanks Dwaipayan. You're correct.
I'll use \b to check word boundaries and regenerate the reports. I used
4 as the minimum because there were some occurrences with only 4 hex
words, for example:
WARNING:REPEATED_WORD: Possible repeated word: 'ff'
#15:
 d68:	61 29 ff ff 	ori     r9,r9,65535

This is from commit 332ce969b763 ("powerpc/8xx: Reduce time spent in
allow_user_access() and friends").
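
To make the difference concrete, here is a rough sketch in Python rather
than Perl (the constructs used behave the same way in Python's re module).
The two patterns are copied from the patch and from the suggestion quoted
above. The helper name looks_like_hex_dump and the tab-separated sample
are only my own illustration and are not part of checkpatch.pl.

```python
import re

# Pattern from the original patch, transcribed unchanged.
ORIGINAL = re.compile(r"((\s)*[0-9a-z]{2}( )+){4,}")
# Pattern suggested in the review: word boundary, \s+ and [0-9a-f].
SUGGESTED = re.compile(r"(?:\b(?:[0-9a-f]{2}\s+){4,})")


def looks_like_hex_dump(pattern, line):
    """Return True if the pattern finds a run of hex-like words in line.

    Hypothetical helper for illustration; checkpatch.pl applies the
    regex inline with !~ instead.
    """
    return pattern.search(line) is not None


# Sample lines: the reviewer's example, the objdump line from the
# commit log, and a made-up tab-separated variant of the same bytes.
BOUNDARY_CASE = "000 00 ff ff ..."
OBJDUMP_LINE = " d68:\t61 29 ff ff \tori     r9,r9,65535"
TAB_SEPARATED = "61\t29\tff\tff\tori"

if __name__ == "__main__":
    for line in (BOUNDARY_CASE, OBJDUMP_LINE, TAB_SEPARATED):
        print(repr(line),
              "original:", looks_like_hex_dump(ORIGINAL, line),
              "suggested:", looks_like_hex_dump(SUGGESTED, line))
```

The original pattern starts matching in the middle of "000", and it needs
a literal space after every pair, so the tab-separated line is missed. The
suggested pattern rejects the first case and accepts the tab-separated one.
Both patterns accept the objdump line with exactly 4 hex words.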

In addition to your changes, I also plan to change the character class
in the regex to [0-9a-f] (instead of [0-9a-z]).
I'll apply all the changes and resend the report, along with the list of
removed warnings.
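
As a quick illustration of why the narrower character class matters, here
is a small Python sketch (again Python rather than Perl). The first
pattern is the one from the original patch. The second is my reading of
the planned change (\b, \s+ and [0-9a-f]) and may differ from what I
finally send. The sample sentence is made up.

```python
import re

# Pattern from the original patch: any two lowercase alphanumerics.
WIDE_CLASS = re.compile(r"((\s)*[0-9a-z]{2}( )+){4,}")
# Assumed form of the planned pattern: only hex digits, with \b and \s+.
HEX_CLASS = re.compile(r"\b(?:[0-9a-f]{2}\s+){4,}")

# A made-up added line consisting of ordinary two-letter English words.
PROSE_LINE = "+ we do it as is now"
# A made-up added line that really is a hex dump.
HEX_LINE = "+ ff ff fe 09 aa"


def skips_repeated_word_check(pattern, line):
    """True if the line would be treated as hex and skipped (illustrative)."""
    return pattern.search(line) is not None


if __name__ == "__main__":
    for line in (PROSE_LINE, HEX_LINE):
        print(repr(line),
              "a-z:", skips_repeated_word_check(WIDE_CLASS, line),
              "a-f:", skips_repeated_word_check(HEX_CLASS, line))
```

With a-z, the prose line is treated as a hex dump, so a repeated word in
it would go unreported. With a-f it is not, while the real hex line is
still skipped.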

Thanks
Aditya

