[Linux-kernel-mentees] [PATCH] checkpatch: Fix SPDX license check for scripts

Lukas Bulwahn lukas.bulwahn at gmail.com
Sun Jul 19 07:13:06 UTC 2020


On So., 19. Juli 2020 at 08:28, Mrinal Pandey <mrinalmni at gmail.com> wrote:

>
>
> On Fri, Jul 17, 2020 at 5:18 PM Lukas Bulwahn <lukas.bulwahn at gmail.com>
> wrote:
>
>>
>>
>> On Fri, Jul 17, 2020 at 11:54 AM Mrinal Pandey <mrinalmni at gmail.com>
>> wrote:
>>
>>>
>>>
>>> On Thu, Jul 16, 2020, 11:01 Lukas Bulwahn <lukas.bulwahn at gmail.com>
>>> wrote:
>>>
>>>>
>>>>
>>>> On Thu, Jul 16, 2020 at 7:15 AM Mrinal Pandey <mrinalmni at gmail.com>
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Jul 14, 2020 at 11:33 AM Lukas Bulwahn <
>>>>> lukas.bulwahn at gmail.com> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Tue, 14 Jul 2020, Mrinal Pandey wrote:
>>>>>>
>>>>>> >
>>>>>> >
>>>>>> > On Tue, Jul 14, 2020 at 1:16 AM Lukas Bulwahn <
>>>>>> lukas.bulwahn at gmail.com> wrote:
>>>>>> >
>>>>>> >
>>>>>> >       On Mon, 13 Jul 2020, Mrinal Pandey wrote:
>>>>>> >
>>>>>> >       > In all the scripts, the SPDX license should be on the
>>>>>> second line,
>>>>>> >       > the first line being the "sh-bang", but checkpatch issues a
>>>>>> warning
>>>>>> >       > "Misplaced SPDX-License-Identifier tag - use line 1
>>>>>> instead" for the
>>>>>> >       > scripts that have SPDX license in the second line.
>>>>>> >       >
>>>>>> >       > However, this warning is not issued when checkpatch is run
>>>>>> on a file using
>>>>>> >       > `-f` option. The case for files has been handled gracefully
>>>>>> by changing
>>>>>> >       > `$checklicenseline` to `2` but a corresponding check when
>>>>>> running checkpatch
>>>>>> >       > on a commit hash is missing.
>>>>>> >       >
>>>>>> >       > I noticed this false positive while running checkpatch on
>>>>>> the set of
>>>>>> >       > commits from v5.7 to v5.8-rc1 of the kernel on the commits
>>>>>> which modified
>>>>>> >       > a script file.
>>>>>> >       >
>>>>>> >       > This check is missing in checkpatch since commit
>>>>>> a8da38a9cf0e
>>>>>> >       > ("checkpatch: add test for SPDX-License-Identifier on wrong
>>>>>> line #")
>>>>>> >       > when the corresponding rule was first commited.
>>>>>> >       >
>>>>>> >       > Fix this by setting `$checklicenseline` to `2` when the
>>>>>> diff content that
>>>>>> >       > is being checked originates from a script, thus, informing
>>>>>> checkpatch that
>>>>>> >       > the SPDX license should be on the second line.
>>>>>> >       >
>>>>>> >       > Signed-off-by: Mrinal Pandey <mrinalmni at gmail.com>
>>>>>> >       > ---
>>>>>> >       >  scripts/checkpatch.pl | 3 +++
>>>>>> >       >  1 file changed, 3 insertions(+)
>>>>>> >       >
>>>>>> >       > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
>>>>>> >       > index 4c820607540b..bbffd0c4449d 100755
>>>>>> >       > --- a/scripts/checkpatch.pl
>>>>>> >       > +++ b/scripts/checkpatch.pl
>>>>>> >       > @@ -3218,6 +3218,9 @@ sub process {
>>>>>> >       >               next if ($realfile !~
>>>>>> /\.(h|c|s|S|sh|dtsi|dts)$/);
>>>>>> >       >
>>>>>> >       >  # check for using SPDX-License-Identifier on the wrong
>>>>>> line number
>>>>>> >       > +             if ($realfile =~ /^scripts/) {
>>>>>> >       > +                    $checklicenseline = 2;
>>>>>> >       > +             }
>>>>>> >
>>>>>> >       I think this is somehow wrong here. The check for
>>>>>> checklicenseline = 2
>>>>>> >       looks very different above.
>>>>>> >
>>>>>> >       Why does -f work and using a patch file not work?
>>>>>> >
>>>>>> >
>>>>>> > Sir,
>>>>>> >
>>>>>> > I am going to explain my observation based on file
>>>>>> `scripts/atomic/gen-atomic-fallback.sh` and
>>>>>> > commit hash `37f8173dd849`.
>>>>>> >
>>>>>> > If we are checking against the file, `checklicenseline` is set to 1
>>>>>> and when `realline` is 1 the above
>>>>>> > `if` block is triggered, then we check if this line is of the form
>>>>>> `#!/` using the regular expression
>>>>>> > `^[ \+]\s*\#\!\s*\/`. If this is the case we set `checklicenseline`
>>>>>> to `2` informing checkpatch that it should
>>>>>> > expect license on the second line and this works all fine for a
>>>>>> file.
>>>>>> > The `if` block below my proposed changes evaluates to false in this
>>>>>> case and thus it emits no false warning.
>>>>>> >
>>>>>> > However, If we are checking a diff content, the above `if` block is
>>>>>> not triggered at all. This is
>>>>>> > because `realline` stores the actual line number of the line we are
>>>>>> checking currently out of diff content.
>>>>>> > This value is 2 because SPDX identifier is indeed at the second
>>>>>> line in the file but `checklicenseline` is still
>>>>>> > `1`.
>>>>>> > `realline` will never become equal to 1 again and thus the above
>>>>>> `if` condition will never be true in this case.
>>>>>> > Even if the above `if` block is triggered it would not update
>>>>>> `checklicenseline` to 2 as the regular expression
>>>>>> > is not satisfied since we don't have sh-bang in diff content and
>>>>>> just the SPDX tag.
>>>>>> > If we don't do this, the `if` block below evaluates to true when
>>>>>> `realline` is 2 and `checklicensline` is `1`
>>>>>> > leading
>>>>>> > to the emission of a false warning.
>>>>>> >
>>>>>>
>>>>>> So, maybe this whole logic needs to be reworked. If you do not know
>>>>>> the
>>>>>> first line, you need to have a different criteria in the first place
>>>>>> to determine if you expect the license tag in the first or the
>>>>>> second,
>>>>>> e.g., the file extension, and then checking line 1 for a shebang is
>>>>>> just
>>>>>> sanity checking. If it is of a specific file extension, you know line
>>>>>> 1
>>>>>> and it is not a shebang, that is probably worth noting as a different
>>>>>> recommendation in checkpatch.pl anyway.
>>>>>>
>>>>>
>>>>> Sir,
>>>>>
>>>>> When we know the first line, i.e. we are running checkpatch against a
>>>>> file, the existing logic
>>>>> works fine. We probably don't want to induce any changes there.
>>>>>
>>>>>
>>>> Why not? Do you think we would break things there? Then we should not
>>>> touch the code at all.
>>>> Do you think we cannot test it properly after the change? Then we
>>>> should think about how we make a proper regression test suite for that.
>>>>
>>>
>>> Sir,
>>>
>>> No, breaking code or not being able to test is not why I suggest this. I
>>> feel that the existing logic handles the case of
>>> "Improper or malformed SPDX tag" and "Misplaced SPDX tag" for files i.e.
>>> when the first line is known. Anyway, the logic
>>> for "Misplaced SPDX tag" is written as a different rule. We just need to
>>> add in the logic for patches there.
>>> I tried to do this by checking for the scripts directory which was
>>> wrong. If I check instead for the file being a script that would make much
>>> more sense.
>>> Please let me know if you suggest something else.
>>>
>>
>> Well, you are going to add a different way of checking as you suggested,
>> right? So are you suggesting to have two duplicated ways of checking for
>> the same thing? That seems strange to me.
>>
>
>> Go ahead, make a suitable first proposal, then we will see if and how to
>> refactor.
>>
>
> Sir,
>
> No, I would not want to duplicate things. Yes, let me first send a patch
> to you first, and then we can refactor it if needed.
>
>>
>>
>>>
>>>> But when we don't know the first line, if am not wrong, it would go
>>>>> somewhat like:
>>>>> if (the file is a script) {
>>>>>     if (the first line is shebang) {
>>>>>         if (the second line is SPDX) {
>>>>>             All good
>>>>>         } else {
>>>>>             Issue a misplaced or missing SPDX tag warning
>>>>>         }
>>>>>     } else {
>>>>>             Issue a missing shebang warning
>>>>>     }
>>>>> } else {
>>>>>     if (the first line is SPDX) {
>>>>>         All good
>>>>>     } else {
>>>>>         Issue a misplaced or missing SPDX tag warning
>>>>>     }
>>>>> }
>>>>>
>>>>>
>>>> Basically agree, but that logic applies when you know the first line as
>>>> well (and only, right?). What if you do not know the first line, how would
>>>> you check "the first line is shebang" if you do not know the first line?
>>>>
>>>>
>>>> The missing shebang warning probably needs to go elsewhere in the whole
>>>> script.
>>>>
>>>
>>> By not knowing the first line I mean to say that the first line doesn't
>>> show up in diff content of the patch but
>>> what if we open the file at that point in the commit history and check
>>> for the first line to be a shebang?
>>> Would it be okay to do that? Once we check the first line we can then
>>> continue as suggested.
>>>
>>
>> I think that is essentially against the general design decision of
>> checkpatch.pl; checkpatch.pl takes the patch but it does not check
>> anything in the current working tree, nor does it know what the "parent" of
>> that patch in the git history really is.
>>
>> Maybe Shuah can confirm? Otherwise, I suggest to look if you find
>> checking a file beyond the patch happening anywhere in the current code of
>> checkpatch.pl and documented in the sparse documentation on checkpatch.pl
>> .
>>
>> So, I believe you need to make it work without checking the first line
>> (if that is not in the patch).
>>
>
> Yes. We can't open a file for checking, you are correct here but we need
> to check for shebang to be on the first line only if it appears
> in the diff content or when we check a complete file, otherwise, we should
> anyway not chek it since checkpatch only checks the patch or the complete
> file.
> Am I correct here?
>
>>
>>
>>>
>>>>>> > So, what I did was to check if the diff content we are checking
>>>>>> actually comes from a script, if yes, we can set
>>>>>> > `checklicenseline` to `2` to avoid this confusion.
>>>>>> >
>>>>>>
>>>>>> Why would you think that scripts are only in scripts?
>>>>>>
>>>>>> How about first listing all files where the SPDX tag is in line 2 in
>>>>>> the
>>>>>> current repository, e.g., v5.8-rc5?
>>>>>>
>>>>>> Then, we look at that list and determine a suitable criteria for
>>>>>> looking
>>>>>> in line 2 for the SPDX tag.
>>>>>>
>>>>>
>>>>> Yes, the scripts are not only in scripts. I have listed all the files
>>>>> where the SPDX tag should be
>>>>> on the second line. I've attached the list for reference. We should
>>>>> probably be checking the file
>>>>> extension to determine if the tag needs to be on the second line or
>>>>> not.
>>>>> The documentation says the SPDX tag should be present in all source
>>>>> files. Do these source files include
>>>>> Documentation files too?
>>>>>
>>>>>
>>>> How did you create that list?
>>>> Agree (if the way you created that list makes sense). File extension
>>>> seems to cover all cases, and checking for the directory 'scripts' does not.
>>>>
>>>> I issued the command `find . -regex ".*\.\(py\|sh\|pl\)"` to make this
>>> list. I should have included awk, YAML and tc files too since they are
>>> scripts too.
>>>
>>>
>> Why not look for all files that have a shebang in the first line?
>>
>
> We could be checking a file or a patch. I suggest we take the name of the
> file we are checking(or the name of the file from which the diff content
> comes from)
> and then run a regex, similar to above, on it to determine if it is a
> script. If we instead go for checking the first line to be shebang in the
> current file would it not require to cat or open
> the file?
>


All good. Create a patch and then we will see.

Lukas

>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.linuxfoundation.org/pipermail/linux-kernel-mentees/attachments/20200719/38b120c5/attachment-0001.html>


More information about the Linux-kernel-mentees mailing list