[Chaoss-members] [Oss-health-metrics] Growth Maturity and Decline Working Group Update

dmg dmg at uvic.ca
Thu Jun 14 20:52:48 UTC 2018


Sean Goggins <s at goggins.com> writes:

> Hi All:
>
> During our Growth Maturity and Decline Metrics working group 
> today we discussed two specific metrics:
>

With all respect to those who are doing the work, I feel this 
method of defining metrics is flawed.

Take, for example, pull request 13:

+[New Overall Contributors](activity-metrics/new-contributors.md) | What is the overall number of new contributors?
+[New Contributors of Commits](activity-metrics/new-contributors-commits.md) | What is the number of persons contributing with an accepted commit for the first time?
+[New Contributors of Opened Issues](activity-metrics/new-contributors-issues-opened.md) | What is the number of persons opening an issue for the first time?
+[New Contributors of Closed Issues](activity-metrics/new-contributors-issues-closed.md) | What is the number of persons closing an issue for the first time?
+[New Contributors of Initiated Code Reviews](activity-metrics/new-contributors-code-reviews-opened.md) | What is the number of persons initiating a code review for the first time?
+[New Contributors of Reviews for Code](activity-metrics/new-contributors-code-reviews.md) | What is the number of persons contributing with reviews of code for the first time?
+[New Contributors of Posted Messages](activity-metrics/new-contributors-posts.md) | What is the number of persons posting messages in mailing lists for the first time?

Based on this definition, I assert that the number of new 
contributors to a project is equal to the number of contributors 
of that project: with no time period specified, every contributor 
was a first-time contributor at some point. Does anybody want to 
prove me wrong?

What we need is to think more holistically, and more in terms 
of what we are measuring.

First, a "new contributors" metric is not a _new_ metric. It is a 
derived metric: an activity metric that has been filtered to a 
particular subset of individuals.

We need to clearly define what we can measure and what we can 
derive from what we can measure.

Here is a proposal:

Perhaps we should first start with what we can measure. What are 
the observable entities? Then, based on these entities, define 
"lists" of activities.
Each activity has many attributes: type, who is involved with it, 
when it was done, etc. An activity is polymorphic.

Then we can define metrics in terms of filtering. For instance, 
"commits by first contributors" is the result of filtering 
activities of type commit such that we only capture the first 
commit from each person.
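
The filtering idea above can be sketched in Python. This is only an illustration: the `Activity` record, its field names, the helper functions, and the sample data are all assumptions made for the example, not part of any CHAOSS definition.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Activity:
    """One observable event; the fields are hypothetical, for illustration."""
    kind: str   # "commit", "issue", "review", "post", ...
    who: str    # identity of the person involved
    when: date  # when it happened


def of_kind(activities, kind):
    """Keep only activities of the given kind."""
    return [a for a in activities if a.kind == kind]


def first_per_person(activities):
    """Keep only the earliest activity of each person."""
    first = {}
    for a in sorted(activities, key=lambda a: a.when):
        first.setdefault(a.who, a)  # later activities by the same person are ignored
    return list(first.values())


# Made-up sample data.
activities = [
    Activity("issue", "alice", date(2018, 1, 3)),
    Activity("commit", "alice", date(2018, 1, 10)),
    Activity("commit", "bob", date(2018, 2, 1)),
    Activity("commit", "alice", date(2018, 2, 7)),
]

# "Commits by first contributors": filter by type, then keep each person's first.
first_commits = first_per_person(of_kind(activities, "commit"))
```

With no time period attached, `len(first_commits)` is simply the number of distinct committers, which is the same observation made earlier about "new contributors".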

Now, there is also the issue of 'work' vs 'power'. Work is 
absolute (think physics), while power is work averaged over a 
unit of time.

The metric I defined above is absolute. If I want to compute its 
"time related" counterpart I have to define a period: basically, 
the "average number of commits by first contributors" over "some 
unit of time".
Or I can define it at a finer grain, computing the average over 
each fixed period; the result is then a time series.

For example, I can define the time series of new contributors as:

monthly new contributors = TimeSeries( count(filter <keep only the 
first activity of each contributor> activities)) per month

monthly new committers = TimeSeries( count(filter <keep only the 
first activity of each contributor> filter <commits> activities)) 
per month
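
One possible reading of this notation, as a small Python sketch. The tuple layout, the helper names, and the data are hypothetical; filters are applied right to left, so the commit filter runs before the first-activity filter.

```python
from collections import Counter
from datetime import date

# Hypothetical activities as (kind, who, when) tuples; the data is made up.
activities = [
    ("issue", "alice", date(2018, 1, 3)),
    ("commit", "alice", date(2018, 1, 10)),
    ("commit", "bob", date(2018, 2, 1)),
    ("commit", "alice", date(2018, 2, 7)),
    ("post", "carol", date(2018, 2, 20)),
]


def first_per_person(acts):
    """filter <keep only the first activity of each contributor>"""
    first = {}
    for act in sorted(acts, key=lambda act: act[2]):
        first.setdefault(act[1], act)
    return list(first.values())


def monthly_count(acts):
    """TimeSeries(count(...)) per month, as sorted ((year, month), count) pairs."""
    counts = Counter((when.year, when.month) for _, _, when in acts)
    return sorted(counts.items())


monthly_new_contributors = monthly_count(first_per_person(activities))

commits = [act for act in activities if act[0] == "commit"]
monthly_new_committers = monthly_count(first_per_person(commits))
```

In this sketch, months with no matching activity are simply absent from the result; a fuller version would probably fill them with zeros.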


Efficiency in PR 12 is flawed too.

Note that in this context, efficiency (as defined in the PR) is 
also an absolute metric:

   Formula: 'issues_closed / (issues_opened + issues_backlog)'

but that is ok, because it can be converted into a time series.

We can still define it in terms of a filtering of the activities:

issue resolution efficiency = count(filter <type=issue and 
status=closed> activities) / count(filter <type=issue and 
status=(not closed)> activities)

but this rate is only useful when it is converted into a time 
series. So with my made-up-notation:

monthly issue resolution efficiency = TimeSeries(count(filter 
<type=issue and status=closed> activities) / count(filter 
<type=issue and status=(not closed)> activities)) per month

I personally don't like the name "efficiency". Its meaning is the 
ratio of output to input, and that is not what this measures. A 
project that did not have any new issues and did not close an 
outstanding issue would have the same efficiency as in the 
previous period, even though nothing has been done.
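
A rough Python sketch of that last point. It assumes one particular reading of "per month": the closed / not-closed ratio is taken as a snapshot at the end of each month. The issue records, the dates, and the function name are made up for illustration.

```python
from datetime import date

# Hypothetical issues as (opened, closed); closed is None while still open.
issues = [
    (date(2018, 1, 5), date(2018, 1, 20)),
    (date(2018, 1, 8), None),
    (date(2018, 1, 15), date(2018, 2, 10)),
    (date(2018, 2, 3), None),
]


def efficiency_at(issues, cutoff):
    """count(closed) / count(not closed), using only what is known at cutoff."""
    seen = [(opened, closed) for opened, closed in issues if opened <= cutoff]
    closed = sum(1 for _, c in seen if c is not None and c <= cutoff)
    still_open = len(seen) - closed
    # The ratio is undefined when nothing is open; report None in that case.
    return closed / still_open if still_open else None


month_ends = [date(2018, 1, 31), date(2018, 2, 28), date(2018, 3, 31)]
series = [(d.strftime("%Y-%m"), efficiency_at(issues, d)) for d in month_ends]
```

In this made-up data nothing is opened or closed in March, yet the March value equals the February one, so the number alone does not show that the project was idle.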


--dmg


> 1. New Contributors and 
> https://github.com/chaoss/wg-gmd/pull/13
> 2. Issue Resolution Efficiency 
> https://github.com/chaoss/wg-gmd/pull/12
>
> These two metrics share the characteristic that their expression 
> is likely to be parameterized in different ways. You can follow 
> the examples and discussion on the associated pull requests, 
> noted above.
>
> We encourage participation from community managers during our 
> next call, at 11am CDT on June 
> 28th. https://unomaha.zoom.us/j/720431288
>
> Whether or not you are able to make the next call, please review 
> and comment if you are interested on the two pull requests from 
> Jesus, noted above and here:
>
> https://github.com/chaoss/wg-gmd/pulls
>
> Thanks!
>
> Jesus & Sean


--
Daniel M. German                  "Often a small and simple 
question can chisel away at the biggest problems"
                                   Levitt and Dubner
http://turingmachine.org/
http://silvernegative.com/
dmg (at) uvic (dot) ca
replace (at) with @ and (dot) with .

