[Chaoss-members] [Oss-health-metrics] Growth Maturity and Decline Working Group Update

Goggins, Sean Patrick GogginsS at missouri.edu
Thu Jun 14 21:13:17 UTC 2018


Hi Daniel!

Exactly the kind of perspective we are looking to draw into the conversation!  Thank you!

I have a few comments below, and I think Jesus may have some as well.

It would be great if open source community managers could be brought into the discussion as we try to figure out precisely what the metrics should express, what they should be called, and how we should filter (parameterize) them. I think this thread makes that need fairly clear.

Thank you!!

Sean 

> On Jun 14, 2018, at 3:52 PM, dmg <dmg at uvic.ca> wrote:
> 
> 
> Sean Goggins <s at goggins.com> writes:
> 
>> Hi All:
>> 
>> During our Growth Maturity and Decline Metrics working group today we discussed two specific metrics:
>> 
> 
> With all respect to those who are doing the work, I feel this method of defining metrics is flawed.
> 
> Take, for example, pull request 13:
> 
> + [New Overall Contributors](activity-metrics/new-contributors.md) | What is the overall number of new contributors?
> +[New Contributors of Commits](activity-metrics/new-contributors-commits.md) | What is the number of persons contributing with an accepted commit for the first time?
> +[New Contributors of Opened Issues](activity-metrics/new-contributors-issues-opened.md) | What is the number of persons opening an issue for the first time?
> +[New Contributors of Closed Issues](activity-metrics/new-contributors-issues-closed.md) | What is the number of persons closing an issue for the first time?
> +[New Contributors of Initiated Code Reviews](activity-metrics/new-contributors-code-reviews-opened.md) | What is the number of persons initiating a code review for the first time?
> +[New Contributors of Reviews for Code](activity-metrics/new-contributors-code-reviews.md) | What is the number of persons contributing with reviews of code for the first time?
> +[New Contributors of Posted Messages](activity-metrics/new-contributors-posts.md) | What is the number of persons posting messages in mailing lists for the first time?
> 
> Based on this definition, I assert that the number of new contributors to a project is equal to the number of contributors of that project. Does anybody want to prove me wrong?

One of the things we discussed is that these metrics are computed over a period of time that is a parameter, probably defining “new” as “having never made a contribution of type X at any point in the repository’s record”. So new contributors in 2017 for a project would be, hypothetically, all the people who had not made a particular type of contribution before that year. New contributors over the life of the project could also be trended using “blocks of time” such as weeks, months, or, where available, software releases.
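To make dmg's objection and the parameterized answer concrete, here is a minimal Python sketch. It assumes each activity is a (person, contribution type, date) tuple; the log, the names, and the `new_contributors` helper are hypothetical. “New” means the person's first contribution of that type anywhere in the record, and the optional window [start, end) plays the role of the time parameter.

```python
from datetime import date

# Hypothetical activity log: (person, contribution type, date of contribution).
log = [
    ("alice", "commit", date(2016, 3, 1)),
    ("bob", "commit", date(2017, 2, 10)),
    ("alice", "commit", date(2017, 5, 4)),
    ("carol", "issue", date(2017, 6, 1)),
    ("dave", "commit", date(2017, 9, 9)),
    ("carol", "commit", date(2018, 1, 15)),
]


def new_contributors(activities, kind, start=None, end=None):
    """Return people whose first `kind` contribution in the whole record
    falls in [start, end). With no window, every contributor is "new"."""
    first_seen = {}
    for person, k, when in activities:
        if k == kind and (person not in first_seen or when < first_seen[person]):
            first_seen[person] = when
    return {
        person
        for person, first in first_seen.items()
        if (start is None or first >= start) and (end is None or first < end)
    }


# Without a time window, "new contributors" is simply "all contributors".
all_time = new_contributors(log, "commit")

# Trended over yearly blocks, the same metric carries information.
by_year = {
    year: len(new_contributors(log, "commit", date(year, 1, 1), date(year + 1, 1, 1)))
    for year in (2016, 2017, 2018)
}
```

Note that carol is a new issue opener in 2017 but a new committer only in 2018, so the contribution type is as much a parameter as the time window.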

It's in these kinds of details, and in questions about the difference between metrics and the parameters applied to them, that I think we need discussion. The more clearly we can state both the metric and its parameters, the clearer the metric will become, and the less likely we will be to compare two projects using the “same metric” that, behind the scenes, two implementers parameterized differently …

> 
> What we need is to think more holistically and more in terms of what we are measuring.
> 
> First, a "new contributors" metric is not a _new_ metric. It is a derived metric: an activity metric that has been filtered to a particular subset of individuals.
> 
> We need to clearly define what we can measure and what we can derive from what we can measure.

Agree … I should have read this before writing the note above … I agree that the metrics in question could be defined as derived. They do not emerge directly from the raw record. As you state, we are filtering. I think the specific filters we are discussing in this case are more “time focused” than focused on a “subset of individuals” …

> 
> Here is a proposal:
> 
> Perhaps we should first start with what we can measure. What are the observable entities? Then, based on these entities, define "lists" of activities.
> Each activity has many attributes: type, who is involved with it, when it was done, etc. An activity is polymorphic.
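As an illustration of the proposed “polymorphic activity”, here is a minimal Python sketch. The `Activity` class, its field names, the `select` helper, and the kind-specific keys in `attrs` are hypothetical choices, not a schema agreed on in the working group: common attributes (type, who, when) are fields, and anything specific to one kind of activity lives in `attrs`.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Activity:
    """One observable action in a repository, tracker, or mailing list."""
    kind: str        # e.g. "commit", "issue_opened", "issue_closed", "review", "post"
    who: str         # person who performed the action
    when: datetime   # when the action happened
    # Kind-specific attributes make the record polymorphic (hypothetical keys).
    attrs: dict = field(default_factory=dict)


def select(activities, **criteria):
    """Keep activities whose named attributes equal the given values."""
    return [a for a in activities
            if all(getattr(a, k) == v for k, v in criteria.items())]


activities = [
    Activity("commit", "alice", datetime(2018, 6, 1, 9, 30), {"files_changed": 3}),
    Activity("issue_opened", "bob", datetime(2018, 6, 2, 14, 0), {"labels": ["bug"]}),
    Activity("post", "alice", datetime(2018, 6, 3, 8, 15), {"list": "oss-health-metrics"}),
]

commits = select(activities, kind="commit")
by_alice = select(activities, who="alice")
```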

I think your notion of an “activity” maps to a record of an action performed in a repository or other OSS serving system … 

> 
> Then we can define metrics in terms of filtering. For instance, "commits by first contributors" is the result of filtering activities of type commit such that we only capture the first commit from each person.
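The “commits by first contributors” definition can be sketched as two composable filters. This is a minimal Python illustration with a hypothetical log and helper names (`of_kind`, `first_per_person`); it also shows that the order of filtering matters.

```python
from datetime import date

# Hypothetical activity log.
log = [
    {"who": "alice", "kind": "commit", "when": date(2018, 1, 5)},
    {"who": "alice", "kind": "commit", "when": date(2018, 2, 1)},
    {"who": "bob", "kind": "issue", "when": date(2018, 1, 10)},
    {"who": "bob", "kind": "commit", "when": date(2018, 3, 1)},
]


def of_kind(kind, activities):
    """Filter: keep only activities of one type."""
    return [a for a in activities if a["kind"] == kind]


def first_per_person(activities):
    """Filter: keep only the earliest activity of each person."""
    first = {}
    for a in sorted(activities, key=lambda a: a["when"]):
        first.setdefault(a["who"], a)
    return list(first.values())


# "Commits by first contributors": the first commit of each person.
first_commits = first_per_person(of_kind("commit", log))

# Reversing the filters answers a different question: people whose
# very first activity of any kind was a commit.
first_activity_commits = of_kind("commit", first_per_person(log))
```

Here bob appears in `first_commits` but not in `first_activity_commits`, because his first activity was an issue. Whichever order is intended should be stated with the metric.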

I think I agree with this generally. I think an implementation maps conceptually to what you describe, though in somewhat more precise language. What I am thinking of as the “parameterization” of a “metric” reflects two differences from your notion of filtering:
1. A derived metric is likely implemented as a function
2. In order to make the output (metric) clear, its parameters also need to be stated. What you are describing as a filter is taking the view of the metric from its most granular data perspective … Since most metric implementations are functions, I gravitate toward the use of the term “parameter” … but, really, “filter” means the same thing so long as the filters applied in a particular case are clear. 


> 
> Now, there is also the issue of 'work' vs 'power'. Work is absolute (think physics), while power is work per unit of time (average power over an interval).
> 
> The metric I defined above is absolute. If I want to compute its time-related counterpart, I have to define a period: basically, the "average number of commits by first contributors" over "some unit of time".
> Or I can define it at a finer grain, as a time series, where I compute the average over a fixed period. Then the result is a time series.
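The work/power distinction can be shown with a few lines of Python. This is a sketch under assumed inputs: the dates are hypothetical first-commit dates, and a “unit of time” is approximated as a fixed number of days (`period_days`), which is an illustrative simplification rather than a calendar-aware definition.

```python
from datetime import date

# Hypothetical dates of first commits by new contributors.
first_commit_dates = [
    date(2018, 1, 5), date(2018, 1, 20), date(2018, 2, 3),
    date(2018, 2, 14), date(2018, 3, 1), date(2018, 3, 30),
]


def absolute_count(events):
    """'Work': total number of events, with no time unit attached."""
    return len(events)


def average_per_period(events, start, end, period_days):
    """'Power': average number of events per period of `period_days`
    within [start, end)."""
    in_range = [d for d in events if start <= d < end]
    n_periods = (end - start).days / period_days
    return len(in_range) / n_periods


work = absolute_count(first_commit_dates)
power = average_per_period(first_commit_dates, date(2018, 1, 1), date(2018, 4, 1), 30)
```

Six first commits over a 90-day window split into 30-day periods gives an absolute value of 6 and an average of 2.0 per period.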
> 
> For example, I can define the time series of new contributors as:
> 
> monthly new contributors = TimeSeries( count(filter <keep only the first activity of each contributor> activities)) per month
> 
> monthly new committers = TimeSeries( count(filter <keep only the first activity of each contributor> filter <commits> activities)) per month
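dmg's made-up notation translates fairly directly into function composition. In this minimal Python sketch, the log and the helpers `keep_first_per_contributor`, `keep_commits`, and `monthly_count` are hypothetical stand-ins for `filter <...>` and `TimeSeries(count(...)) per month`; months with no events are simply omitted rather than reported as zero.

```python
from collections import Counter
from datetime import date

# Hypothetical activity log.
log = [
    {"who": "alice", "kind": "issue", "when": date(2018, 1, 3)},
    {"who": "alice", "kind": "commit", "when": date(2018, 2, 7)},
    {"who": "bob", "kind": "commit", "when": date(2018, 2, 20)},
    {"who": "carol", "kind": "post", "when": date(2018, 3, 2)},
    {"who": "carol", "kind": "commit", "when": date(2018, 3, 15)},
]


def keep_first_per_contributor(activities):
    """filter <keep only the first activity of each contributor>"""
    first = {}
    for a in sorted(activities, key=lambda a: a["when"]):
        first.setdefault(a["who"], a)
    return list(first.values())


def keep_commits(activities):
    """filter <commits>"""
    return [a for a in activities if a["kind"] == "commit"]


def monthly_count(activities):
    """TimeSeries(count(...)) per month, keyed by (year, month).
    Months with no events are omitted in this sketch."""
    counts = Counter((a["when"].year, a["when"].month) for a in activities)
    return dict(sorted(counts.items()))


monthly_new_contributors = monthly_count(keep_first_per_contributor(log))
monthly_new_committers = monthly_count(keep_first_per_contributor(keep_commits(log)))
```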

I think I agree with you; your example specifies a data filter where I am imagining a “metric function” that takes a parameter. Using your last example, I think about this as new_committers(begin_date, end_date, time_period): a time series of new committers counted for each time_period (day, week, month, year), starting at begin_date and ending at end_date …

It seems that implemented metrics should make the filters/parameters applied transparent. I think that will build trust in the metrics defined here.
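One way to make the parameters transparent is for the metric function to return them alongside its values. Below is a hypothetical Python sketch of `new_committers(begin_date, end_date, time_period)`; the return layout, the Monday week start, and the input format of (person, commit date) pairs are assumptions for illustration. “New” is computed over the whole record, and the window only selects which first commits are counted.

```python
from datetime import date, timedelta


def period_start(d, time_period):
    """Map a date to the first day of its day/week/month/year bucket."""
    if time_period == "day":
        return d
    if time_period == "week":
        return d - timedelta(days=d.weekday())  # weeks start on Monday
    if time_period == "month":
        return d.replace(day=1)
    if time_period == "year":
        return d.replace(month=1, day=1)
    raise ValueError(f"unknown time_period: {time_period!r}")


def new_committers(commits, begin_date, end_date, time_period):
    """Time series of new committers per time_period in [begin_date, end_date)."""
    first = {}
    for who, when in commits:
        if who not in first or when < first[who]:
            first[who] = when
    series = {}
    for when in first.values():
        if begin_date <= when < end_date:
            bucket = period_start(when, time_period)
            series[bucket] = series.get(bucket, 0) + 1
    # Report the parameters with the result so two runs can be compared honestly.
    return {
        "metric": "new_committers",
        "parameters": {
            "begin_date": begin_date.isoformat(),
            "end_date": end_date.isoformat(),
            "time_period": time_period,
        },
        "series": dict(sorted(series.items())),
    }


# Hypothetical (person, commit date) records.
commits = [
    ("alice", date(2017, 12, 30)),
    ("bob", date(2018, 1, 2)),
    ("alice", date(2018, 1, 5)),
    ("carol", date(2018, 1, 9)),
]
result = new_committers(commits, date(2018, 1, 1), date(2018, 2, 1), "week")
```

alice committed in January 2018 but is not counted, because her first commit falls before begin_date.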

> 
> 
> Efficiency in PR 12 is flawed too.
> 
> Note that in this context, efficiency (as defined in the PR) is also an absolute metric:
> 
>  Formula: 'issues_closed / (issues_opened + issues_backlog)'
> 
> but that is ok, because it can be converted into a time series.
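For reference, PR 12's formula as a plain function, with a worked example. This Python sketch assumes `issues_backlog` means issues still open at the start of the period and returns `None` when the denominator is zero; both are interpretive choices, not part of the PR text quoted here.

```python
def issue_resolution_efficiency(issues_closed, issues_opened, issues_backlog):
    """PR 12's formula: issues_closed / (issues_opened + issues_backlog).

    Assumes issues_backlog is the number of issues still open at the start
    of the period; returns None when the denominator is zero."""
    denominator = issues_opened + issues_backlog
    if denominator == 0:
        return None
    return issues_closed / denominator


# Hypothetical period: 40 issues carried over, 10 opened, 20 closed.
example = issue_resolution_efficiency(issues_closed=20, issues_opened=10, issues_backlog=40)
```

With those numbers the value is 20 / (10 + 40) = 0.4.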
> 
> We can still define it in terms of a filtering of the activities:
> 
> issue resolution efficiency = count(filter <type=issue and status=closed> activities) / count(filter <type=issue and status=(not closed)> activities)
> 
> but this rate is only useful when it is converted into a time series. So, in my made-up notation:
> 
> monthly issue resolution efficiency = TimeSeries(count(filter <type=issue and status=closed> activities) / count(filter <type=issue and status=(not closed)> activities)) per month
> 
> I personally don't like the name "efficiency". It means the ratio of output to input, which is not what this is measuring. A project that did not have any new issues
> and did not close an outstanding issue would have the same efficiency as in the previous period, even though nothing has been done.
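dmg's objection can be checked with a worked example. This Python sketch reads the count(closed) / count(not closed) formula as a snapshot taken at the end of each month, which is an assumption (the thread does not pin this down). The issue records and the `closed_to_open_ratio` helper are hypothetical. March has no new and no closed issues, yet its value equals February's.

```python
from datetime import date

# Hypothetical issue records: (opened_on, closed_on or None if still open).
issues = [
    (date(2018, 1, 3), date(2018, 1, 20)),
    (date(2018, 1, 10), None),
    (date(2018, 2, 5), date(2018, 2, 15)),
    (date(2018, 2, 8), date(2018, 2, 25)),
    (date(2018, 2, 12), None),
]


def closed_to_open_ratio(issues, as_of):
    """count(closed issues) / count(not-closed issues) as of a date,
    considering only issues opened on or before that date."""
    opened = [(o, c) for o, c in issues if o <= as_of]
    closed = sum(1 for _, c in opened if c is not None and c <= as_of)
    still_open = len(opened) - closed
    return closed / still_open if still_open else None


month_ends = [date(2018, 1, 31), date(2018, 2, 28), date(2018, 3, 31)]
series = {d: closed_to_open_ratio(issues, d) for d in month_ends}
```

The series is 1.0 for January and 1.5 for both February and the inactive March, so the number alone cannot distinguish a quiet month from a productive one.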

This also makes sense.  I agree that “efficiency” is a flawed term. 

Sean 

> 
> 
> --dmg
> 
> 
>> 1. New Contributors: https://github.com/chaoss/wg-gmd/pull/13
>> 2. Issue Resolution Efficiency: https://github.com/chaoss/wg-gmd/pull/12
>> 
>> These two metrics share the characteristic that their expression is likely to be parameterized in different ways. You can follow the examples and discussion on the associated pull requests, noted above.
>> 
>> We encourage participation from community managers during our next call, at 11am CDT on June 28th: https://unomaha.zoom.us/j/720431288
>> 
>> Whether or not you are able to make the next call, if you are interested, please review and comment on the two pull requests from Jesus, noted above and here:
>> 
>> https://github.com/chaoss/wg-gmd/pulls
>> 
>> Thanks!
>> 
>> Jesus & Sean
>> _______________________________________________
>> Oss-health-metrics mailing list
>> Oss-health-metrics at lists.linuxfoundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/oss-health-metrics
> 
> 
> --
> Daniel M. German                  "Often a small and simple question can chisel away at the biggest problems"
>                                  Levitt and Dubner
> http://turingmachine.org/
> http://silvernegative.com/
> dmg (at) uvic (dot) ca
> replace (at) with @ and (dot) with .


