[Fuego] [PATCH] Optimize the code of assert_has_program.

Fri Oct 18 04:46:49 UTC 2019

The TL;DR version:
There's a  new much faster version of assert_has_program() in fuego-core,
in the master branch. Please try it out.

Ok - now the longer version...

I have re-written the assert_has_program code in the fuego core
so that this patch no longer applies.  Sorry about that.

But I'd like to discuss my solution to this problem, and the roadmap
for what I've implemented so far, and maybe get some feedback
on it before going further with it.

I modified assert_has_program to call a new routine:
check_has_program, that uses the command 'command -v'
on the target to find programs on the PATH.  This is a POSIX-compatible
command, which is built in to all POSIX shells.  I tested with
busybox ash, dash and bash.

The busybox version of 'command -v' does not give the full path, but it
does give a value if the  command is on the path, which is sufficient for
assert_has_program.

This is much faster than the old method of searching through the path
using a loop on the host.  Instead of many invocations of 'cmd' per item being
checked, there is only one.

Also, I have implemented a dependency cache feature.  It is possible to 
run a program once, after provisioning the board, to accumulate the
program installation status for programs required by all tests.  The data
for this is placed in variables in the stored variable file at:
/fuego-rw/board/<board>.vars

All these program variables become part of the prolog.sh file for a test during
overlay generation, and they can be tested very quickly, without having
to communicate with the target board.  This takes the cost of doing a single
assert_has_program from many seconds (the original cost) to 
under 3 seconds (using 'command -v', when the data is not in cache) to
under .1 second, if the data is in cache.

This introduces an issue, however: the dependency cache needs
to be cleared and re-populated when the board is provisioned, so that
stale data is not used for tests run after the board is re-installed.

This adds an extra step to the CI loop.  By default, if you never populate the
dependency cache by running the new program (populate_board_cache.sh),
then fuego core will use 'command -v' for each assert_has_program in a test's
test_pre_check() function.   This means the new system should be completely
backward compatible with the old system.  Without the cache, test_pre_check
with multiple calls to assert_has_program will still take over 10 seconds.
However, if populate_board_cache.sh is run,
then the dependency checks runs very quickly, for all tests.  However, executing this
script after installing a board is now required in order to avoid using stale cache
data.  It should only need to be run once per after the board is re-provisioned.
This is something that seems like it might be error-prone, and might need
some additional support (in the way of messages or other help notices) to make
sure that when people set up their CI loop with Fuego, they include a call
this this script.  I'm still thinking about the exact method of invoking this
whether through an external script, as it is now, or via a special "action"
script implemented similar to a Fuego test.

Note that I'm also looking into re-factoring the pre_test functionality
to remove or consolidate pre-test operations on the board, to save
time and reduce test overhead.

This system could be improved further by supporting multiple arguments to 
assert_has_program (and check_program), and implementing the loop
over them on the target, as Mingyu did in this patch.  I haven't had time
to go back and refactor this initial implementation to do that.  Theoretically,
this could result in a check for multiple programs only requiring a single
call to 'cmd' to communicate with the target, which would speed things
up (in the non-cache case) even more.  I’m considering writing a little
function as part of need_check.sh, which would do all the data accumulation
for all dependencies listed in a test with a single communication with the
target.  This would use a somewhat complicated shell snippet to gather all
the data at once.  The downside is that it would be a bit complicated.
The upside would be that there would be no need to maintain the
dependency cache in order to have correct operation.

Let me know what you think about this new system.
Please try it out and make sure that nothing breaks in your tests.

Finally - this dependency cache feature actually has a dual purpose.
I've been working on implementing a remote testing feature for Fuego 
using fserver.  Likely, the server will eventually support adding meta-data
information (including supported hardware, board features, and installed
memory and programs) for a board, so that a remote scheduler can select
a board for a test based on the board's attributes.  That feature is still a ways
off in terms of implementation.  But it's on the roadmap, and may help
explain why I'd like to gather this dependency data (as one element of
that board information.)

Regards,
 -- Tim

> -----Original Message-----
> From: Wang, Mingyu on  Tuesday, October 15, 2019 7:52 PM
> To: fuego at lists.linuxfoundation.org
> Subject: Re: [Fuego] [PATCH] Optimize the code of assert_has_program.
> 
> Hi Tim,
> 
> >> Usage:
> >> before:
> > >    assert_has_program AAA
> > >    assert_has_program BBB
> >>
> > >after:
> > >    assert_has_program "AAA BBB"
> > >    assert_has_program "AAA BBB" SEARCH_PATH
> >>
> > >Note: SEARCH_PATH is optional.
> >Can you explain when SEARCH_PATH is needed?
> 
> It is not necessary.
> I see that the original is_on_target can search in the specified path, so I
> added SEARCH_PATH here.
> If need to do the same thing with the original assert_has_program, this
> variable can be deleted.
> 
> 
> >> +             if [ ! -s $tmpfile ]
> >> +             then
> >> +                 find / -name \"\$prg\" | head -n 1 >$tmpfile
> >> +             fi
> >This will detect the program anywhere in the filesystem, not just on the
> PATH.
> >That changes the behavior of assert_has_program, which currently only
> looks on the target's PATH.
> 
> This code is still used to consider the user-specified path.
> If need to do the same thing with the original assert_has_program, this code
> can be deleted.
> 
> >> +             if [ ! -s $tmpfile ]
> >> +             then
> > >+                 echo \"\$prg\" >$tmpfile ; break;
> >This would put the program name into tmpfile, which means that even if
> the program was not detected, tmpfile will have a non-empty result.
> >I don't understand this.
> The program was not detected will be saved into tmpfile.
> And assert_on_target will get the program and transform to
> assert_has_program to show  an error  message.
> 
> 
> >> +             else
> > >+                 > $tmpfile
> >I also don't understand why this is needed.  The file is always touched
> above, so it exists as an empty file.  Why does it need to have the empty
> redirection into it?
> Consider the case where assert_has_program specifies multiple programs at
> the same time.
> 
> >I can't tell what happens if I provide the name of a file which isn't on the
> path, but is in the filesystem, or which isn't in the filesystem at all.
> >Did you test for these cases?
> I tested a variety of situations, including specifying multiple programs at the
> same time。
> And existing or non-existent  program  is also tested.
> 
> When I designed assert_has_program, I considered the existing is_on_target
> and is_on_target_path and wanted to be as compatible as possible with
> them so that they could be replaced in the future.
> 
> best regards!
> 
> 
> 
> -----Original Message-----
> From: Tim.Bird at sony.com [mailto:Tim.Bird at sony.com]
> Sent: Saturday, September 21, 2019 10:25 AM
> To: Wang, Mingyu/王 鸣瑜 <wangmy at cn.fujitsu.com>;
> fuego at lists.linuxfoundation.org
> Subject: RE: [Fuego] [PATCH] Optimize the code of assert_has_program.
> 
> I'm not sure this gives the same results at the original code.
> See my comments inline below.
> 
> Also, combining multiple programs into a single 'assert_has_program'
> lines will make it a bit more difficult to implement a cache system for
> programs detected on the board.
> 
> I will have to think about this one.
> 
> > -----Original Message-----
> > From: Wang Mingyu on Tuesday, September 03, 2019 3:09 PM
> >
> > It is faster to check the prgoram or file on the target.
> prgoram -> program
> 
> >
> > Usage:
> > before:
> >     assert_has_program AAA
> >     assert_has_program BBB
> >
> > after:
> >     assert_has_program "AAA BBB"
> >     assert_has_program "AAA BBB" SEARCH_PATH
> >
> > Note: SEARCH_PATH is optional.
> 
> Can you explain when SEARCH_PATH is needed?
> 
> >
> > Signed-off-by: Wang Mingyu <wangmy at cn.fujitsu.com>
> > ---
> >  scripts/functions.sh  | 65
> > +++++++++++++++++++++++++++++++++++++++----
> >  scripts/need_check.sh | 11 ++------
> >  2 files changed, 61 insertions(+), 15 deletions(-)
> >
> > diff --git a/scripts/functions.sh b/scripts/functions.sh index
> > 0fa80b8..3aea0c8 100755
> > --- a/scripts/functions.sh
> > +++ b/scripts/functions.sh
> > @@ -1010,6 +1010,57 @@ function is_on_target_path {
> >      is_on_target $1 $2 $TARGET_PATH
> >  }
> >
> > +# check for a program or file on the target, and set a variable if
> > +it's present # $1 - file, dir or program on target # $2 is the path
> > +of target to search the program/file # $3 - tmpfile to save the
> > +detect result function set_cmd_str {
> In general, I like this approach of executing a command on the target that
> does the search loop.
> 
> > +   tmpfile=$3
> > +   cmd_str="touch $tmpfile
> > +         # split $1 on whitespace, without file globbing
> > +         for prg in \$(echo $1 | tr \" \" \"\\n\") ; do
> > +             # split search path on colon
> > +             for d in \$(echo $2 | tr \":\" \"\\n\") ; do
> > +                 # execute a command on the target to detect \$d/\$prg
> > +                 if [ -z \"\$(cat $tmpfile)\" -a -e \"\$d/\$prg\" ]
> > +                 then
> > +                     echo \"\$d/\$prg\" >$tmpfile ;break;
> > +                 fi
> > +             done
> > +             if [ ! -s $tmpfile ]
> > +             then
> > +                 find / -name \"\$prg\" | head -n 1 >$tmpfile
> > +             fi
> This will detect the program anywhere in the filesystem, not just on the
> PATH.
> That changes the behavior of assert_has_program, which currently only
> looks on the target's PATH.
> 
> > +             if [ ! -s $tmpfile ]
> > +             then
> > +                 echo \"\$prg\" >$tmpfile ; break;
> This would put the program name into tmpfile, which means that even if the
> program was not detected, tmpfile will have a non-empty result.
> I don't understand this.
> 
> > +             else
> > +                 > $tmpfile
> I also don't understand why this is needed.  The file is always touched above,
> so it exists as an empty file.  Why does it need to have the empty redirection
> into it?
> 
> > +             fi
> > +         done"
> > +}
> > +
> > +# check for a program or file on the target, and set a variable if
> > +it's present # $1 - file, dir or program on target # $2 - variable to
> > +set during the build # $3 is the path of target to search the
> > +program/file (optional, default:
> > $PATH)
> > +function assert_on_target {
> > +    local TARGET_PATH=${3}
> > +    if [ -z "$TARGET_PATH" ]; then
> > +       TARGET_PATH=$(cmd "echo \$PATH")
> > +    fi
> > +
> > +    tmpfile=$(mktemp /tmp/found_loc.XXXXXX)
> > +    set_cmd_str "$1" $TARGET_PATH $tmpfile
> > +    cmd "$cmd_str"
> > +    get $tmpfile $tmpfile
> > +    if [ -s $tmpfile ] ; then
> > +        export $2=$(cat $tmpfile)
> > +    fi
> > +    cmd "rm $tmpfile"
> > +    rm -f $tmpfile # -f for tests running on the host }
> > +
> >  # check for a library on the SDK, and set a variable if it's present
> > # $1 - the file (library) we want to search for in the SDK  # $2 -
> > variable to set the location (if the file is found) @@ -1036,13
> > +1087,15 @@ function is_on_sdk {
> >
> >  # check for a program or file on the target, and send message if the
> > program or file is missing  # $1 has the program that is required on
> > the target board -# this has the side effect of defining PROGRAM_$1
> > (uppercased) -# with the value as the directory where $1 is found on
> > the board.
> > +# $2 is the path of target to search the program/file (optional, default:
> > $PATH)
> >  function assert_has_program {
> > -   upName=${1^^}
> > -   progVar=PROGRAM_${upName//[-,.]/_}
> > -   is_on_target_path $1 ${progVar}
> > -   assert_define ${progVar} "Missing '$1' program on target board"
> > +   assert_on_target "$1" prgName $2
> > +   if [ -n "$prgName" ]
> > +   then
> > +       upName=${prgName^^}
> > +       progVar=PROGRAM_${upName//[-,.]/_}
> > +       assert_define ${progVar} "Missing '$prgName' program on target
> board"
> > +   fi
> >  }
> >
> >  # check for a module on the target, and abort if it is missing diff
> > --git a/scripts/need_check.sh b/scripts/need_check.sh index
> > c9bbb75..9bb0098 100755
> > --- a/scripts/need_check.sh
> > +++ b/scripts/need_check.sh
> > @@ -319,18 +319,11 @@ function check_root {  # check if those
> > specified commands exist on board  # $1 has single string with a list
> > of command entries to check for  function check_program {
> > -  # split $1 on whitespace, without file globbing
> > -  set -f
> > -  arg_array=($1)
> > -  set +f
> > -
> > -  for prg in "${arg_array[@]}" ; do
> > -    is_on_target_path $prg prg_path
> > -    if [ -z "$prg_path" ] ; then
> > +    assert_on_target "$1" prg
> > +    if [ -n "$prg" ] ; then
> >        echo -e "\n\nABORTED: Expected command \"$prg\" on the target,
> > but it's not there!"
> >        return 1
> >      fi
> > -  done
> >
> >    # return OK if all necessary commands exist on the target
> >    return 0
> > --
> > 2.17.1
> 
> I'm not sure this does the same thing as the original assert_has_program.
> 
> I can't tell what happens if I provide the name of a file which isn't on the
> path, but is in the filesystem, or which isn't in the filesystem at all.
> Did you test for these cases?
>  -- Tim
> 
> 
> 
> 
> 
> _______________________________________________
> Fuego mailing list
> Fuego at lists.linuxfoundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/fuego