[Linux-kernel-mentees] [PATCH] checkpatch: Improve SPDX license identifier check for script files

Mrinal Pandey mrinalmni at gmail.com
Mon Aug 24 08:23:25 UTC 2020


On 20/08/22 10:24AM, Lukas Bulwahn wrote:
> 
> 
> On Thu, 20 Aug 2020, Mrinal Pandey wrote:
> 
> > On 20/08/09 12:52PM, Mrinal Pandey wrote:
> > > On 20/08/04 09:37PM, Lukas Bulwahn wrote:
> > > > 
> > > > 
> > > > On Tue, 4 Aug 2020, Mrinal Pandey wrote:
> > > > 
> > > > > On 20/08/03 12:59PM, Lukas Bulwahn wrote:
> > > > > > 
> > > > > > 
> > > > > > On Mon, 3 Aug 2020, Mrinal Pandey wrote:
> > > > > > 
> > > > > > > The diff content includes the SPDX licensing information but excludes the
> > > > > > > shebang when a change is made to a script file in commit 37f8173dd849
> > > > > > > ("locking/atomics: Flip fallbacks and  instrumentation") and commit
> > > > > > > 075c8aa79d54 ("selftests: forwarding: tc_actions.sh: add matchall mirror
> > > > > > > test"). In these cases checkpatch issues a false positive warning:
> > > > > > > "Misplaced SPDX-License-Identifier tag - use line 1 instead".
> > > > > > > 
> > > > > > > Currently, if checkpatch finds a shebang in line 1, it expects the
> > > > > > > license identifier in line 2. However, this doesn't work when a shebang
> > > > > > > isn't found on the line 1.
> > > > > > 
> > > > > > It does not work when the diff does not contain line 1, but only line 2,
> > > > > > because then the shebang check for line 1 cannot work.
> > > > > > 
> > > > > > > 
> > > > > > > I noticed this false positive, while running checkpatch on the set of
> > > > > > > commits from v5.7 to v5.8-rc1 of the kernel, on the said commits.
> > > > > > > This false positive exists in checkpatch since commit a8da38a9cf0e
> > > > > > > ("checkpatch: add test for SPDX-License-Identifier on wrong line #")
> > > > > > > when the corresponding rule was first added.
> > > > > > > 
> > > > > > > The alternatives considered to improve this check were looking the file
> > > > > > > to be a script by either examining the file extension or file permissions.
> > > > > > >
> > > > > > 
> > > > > > Make this sentence shorter. Try.
> > > > > >  
> > > > > > > The evaluation on former option resulted in 120 files which had a shebang
> > > > > > > in the first line but no file extension. This didn't look like a promising
> > > > > > > result and hence I dropped the idea of using this approach.
> > > > > > > 
> > > > > > > The evaluation on the latter approach shows that there are 53 files in the
> > > > > > > kernel which have an executable bit set but don't have a shebang in the
> > > > > > > first line.
> > > > > > > 
> > > > > > > At the first sight on these 53 files, it seems that they either have a
> > > > > > > wrong file permission set or could be reasonably extended with a shebang
> > > > > > > and SPDX license information. Thus, further cleanup in the repository
> > > > > > > would make the latter approach to work even more precisely.
> > > > > > > 
> > > > > > > Hence, I chose to check the file permissions to determine if the file is a
> > > > > > > script and notify checkpatch to expect SPDX on second line for such files.
> > > > > > >
> > > > > > 
> > > > > > There is no notification here. Think about better wording.
> > > > > >  
> > > > > > > Signed-off-by: Mrinal Pandey <mrinalmni at gmail.com>
> > > > > > > ---
> > > > > > >  scripts/checkpatch.pl | 3 +++
> > > > > > >  1 file changed, 3 insertions(+)
> > > > > > > 
> > > > > > > diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
> > > > > > > index 4c820607540b..bae1dd824518 100755
> > > > > > > --- a/scripts/checkpatch.pl
> > > > > > > +++ b/scripts/checkpatch.pl
> > > > > > > @@ -3166,6 +3166,9 @@ sub process {
> > > > > > >  		}
> > > > > > >  
> > > > > > >  # check for using SPDX license tag at beginning of files
> > > > > > > +		if ($line =~ /^index\ .*\.\..*\ .*[7531]\d{0,2}$/) {
> > > > > > > +			$checklicenseline = 2;
> > > > > > > +		}
> > > > > > 
> > > > > > That check looks good now.
> > > > > > 
> > > > > > >  		if ($realline == $checklicenseline) {
> > > > > > >  			if ($rawline =~ /^[ \+]\s*\#\!\s*\//) {
> > > > > > >  				$checklicenseline = 2;
> > > > > > 
> > > > > > This is probably broken now. It should check for shebang in line 1 and 
> > > > > > then set checklicenseline to line 2, right?
> > > > > 
> > > > > Sir,
> > > > > 
> > > > > Should we remove this check? Earlier when I checked for file extension
> > > > > we had 120 cases where this check was also needed but now we have a
> > > > > better heuristic which is going to work for all cases where license
> > > > > should be on line 2 irrespective of the fact that we know the first line
> > > > > or not.
> > > > >
> > > > 
> > > > Are you sure about that? Where is the evaluation that proves your point?
> > > > 
> > > > E.g., are all files that contain a shebang really with an executable flag?
> > > > 
> > > > Which commands did you run to check this?
> > > >  
> > > > > If I am missing out on something and we should not be removing this check,
> > > > > then I suggest placing the new heuristics below this block so that it doesn't
> > > > > interfere with the existing logic.
> > > > > 
> > > > > Please let me know which path should I go about and then I shall resend
> > > > > the patch with the modified commit message.
> > > > > 
> > > > 
> > > > Think about the strengths and weaknesses of the potential solutions, then 
> > > > show with some commands (as I did for example, for finding the first 
> > > > lines previously) that you can show that it practically makes a 
> > > > difference and you can numbers on those differences.
> > > > 
> > > > When you did that, send a new patch.
> > > > 
> > > > Lukas
> > > > 
> > > Sir,
> > > 
> > > I ran the evaluation as:
> > > 
> > > mrinalpandey at mrinalpandey:~/linux/linux$ cat get_permissions.sh
> > > #!/bin/bash
> > > 
> > > for file in $(git ls-files)
> > > do
> > >         permissions="$(stat -c "%a %n" $file)"
> > >         echo "$permissions"
> > > done
> > > 
> > > mrinalpandey at mrinalpandey:~/linux/linux$ sh get_permissions.sh | grep ^[7531] > temp
> > > 
> > > mrinalpandey at mrinalpandey:~/linux/linux$ cut -d ' ' -f 2 temp > executables
> > > 
> > > mrinalpandey at mrinalpandey:~/linux/linux$ cat first_line.sh
> > > #!/bin/bash
> > > file="executables"
> > > while IFS= read -r line
> > > do
> > >         firstline=`head -n 1 $line`
> > >         printf '%s:%s\n' "$firstline" "$line"
> > > done <"$file"
> > > 
> > > mrinalpandey at mrinalpandey:~/linux/linux$ cat executables | wc -l
> > > 611
> > > 
> > > mrinalpandey at mrinalpandey:~/linux/linux$ sh first_line.sh | grep ^#! | wc -l
> > > head: error reading 'scripts/dtc/include-prefixes/arc': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/arm': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/arm64': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/c6x': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/dt-bindings': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/h8300': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/microblaze': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/mips': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/nios2': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/openrisc': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/powerpc': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/sh': Is a directory
> > > head: error reading 'scripts/dtc/include-prefixes/xtensa': Is a directory
> > > 540
> > > 
> > > We can see that there are 71 files where the executable bit is set but
> > > the first line is not a shebang. These include 13 directories which
> > > throw the error above. Remaining 58 files(earlier the number was 53)
> > > could be cleaned so that this heuristic works better as we saw. So, by
> > > checking only for the executable bit we can say that license should be
> > > on second line, we probably don't need to check for the shebang on line
> > > 1.
> > > Please let me know if the evaluation makes sense.
> > >
> 
> This evaluation makes sense to find the cases that should be cleaned up.
> 
> Either the executable flag is simply set wrongly and should be dropped or 
> it is actually a script and should get a shebang in the beginning.
> 
> I actually already started cleaning up. See:
> 
> https://lore.kernel.org/lkml/20200819081808.26796-1-lukas.bulwahn@gmail.com/
> 
> We can discuss how to continue this cleanup.
>

Sir,

Sure. I would love to discuss that.
Should I send cleanup related patches directly to lkml or do they need to
go through you, Shuah ma'am and mentee list first?

> However, you cannot use this evaluation to say the checking the executable 
> bit is enough, simply as from this evaluation, you do not know how many 
> files have a shebang in the first line and are not set with executable 
> flag. For those files, we expect the SPDX identifier in the second line 
> but your suggested approach would expect it in the first line.
> 
> So we need an evaluation to find out how many cases of that situation 
> exist?
> 
> Then, can we easily fix those as well?

Here is the evaluation I ran to find such cases:

mrinalpandey at mrinalpandey:~/linux/linux$ cat first_lines.sh
#!/bin/bash

for file in $(git ls-files)
do
	permissions="$(stat -c '%a %n' $file)"
	firstline="$(head -n 1 $file)"
	echo "$permissions : $firstline"
done
mrinalpandey at mrinalpandey:~/linux/linux$ sh first_lines.sh | grep ^[642].*#! | wc -l
head: error reading 'scripts/dtc/include-prefixes/arc': Is a directory
head: error reading 'scripts/dtc/include-prefixes/arm': Is a directory
head: error reading 'scripts/dtc/include-prefixes/arm64': Is a directory
head: error reading 'scripts/dtc/include-prefixes/c6x': Is a directory
head: error reading 'scripts/dtc/include-prefixes/dt-bindings': Is a directory
head: error reading 'scripts/dtc/include-prefixes/h8300': Is a directory
head: error reading 'scripts/dtc/include-prefixes/microblaze': Is a directory
head: error reading 'scripts/dtc/include-prefixes/mips': Is a directory
head: error reading 'scripts/dtc/include-prefixes/nios2': Is a directory
head: error reading 'scripts/dtc/include-prefixes/openrisc': Is a directory
head: error reading 'scripts/dtc/include-prefixes/powerpc': Is a directory
head: error reading 'scripts/dtc/include-prefixes/sh': Is a directory
head: error reading 'scripts/dtc/include-prefixes/xtensa': Is a directory
82

The last line in these 82 lines is "Binary file (standard input) matches" so we have 81 instances
where we have shebang in the first line but the file is non-executable.

> 
> This is certainly good investigation and it leads to some cleanup, so let 
> us continue here.
>
Thank you sir.

> Lukas
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 833 bytes
Desc: not available
URL: <http://lists.linuxfoundation.org/pipermail/linux-kernel-mentees/attachments/20200824/0c41ca3b/attachment-0001.sig>


More information about the Linux-kernel-mentees mailing list