[Chaoss-members] [Oss-health-metrics] Growth Maturity and Decline Working Group Update

Serebrenik, A. a.serebrenik at tue.nl
Fri Jun 15 01:05:49 UTC 2018


Dear all

Just a small note that I agree with Daniel that the time period should be a parameter: this way we can evaluate the sensitivity of the measurement-based conclusions to the specific choice of time period duration.

Another point to keep in mind is the zeros, i.e., periods when no new activity has taken place, no new developers have joined, etc. A simple but not necessarily correct way of dealing with the zeros would be to exclude them from consideration. Alternatively, one can keep the zeros and use more complex statistical machinery to analyze the measurements.
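
To make the effect of the zeros concrete, here is a minimal Python sketch. The dates, the monthly period, and the helper name month_range are assumptions made up for illustration; they do not come from the pull requests. It compares the mean number of new contributors per month when empty months are dropped and when they are kept as explicit zeros.

```python
from collections import Counter
from datetime import date


def month_range(start, end):
    """Yield (year, month) keys from start to end, inclusive."""
    year, month = start
    while (year, month) <= end:
        yield (year, month)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


# Hypothetical data: the date on which each new contributor first appeared.
first_seen = [date(2018, 1, 5), date(2018, 1, 20), date(2018, 4, 2)]

counts = Counter((d.year, d.month) for d in first_seen)

# Option 1: exclude the zeros (only months with at least one newcomer remain).
without_zeros = [counts[key] for key in sorted(counts)]

# Option 2: keep the zeros (every month in the observed range is present).
with_zeros = [counts.get(key, 0) for key in month_range((2018, 1), (2018, 4))]

mean_without = sum(without_zeros) / len(without_zeros)
mean_with = sum(with_zeros) / len(with_zeros)
```

In this toy data the two choices give noticeably different means, which is the kind of sensitivity one would want to check before drawing conclusions.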

Best wishes 
Alexander

Sent from my iPhone

> On 15 Jun 2018, at 01:19, dmg <dmg at turingmachine.org> wrote:
> 
> I think these metrics should be defined as time series, where the
> period between the observations is a parameter.
> 
> This is NOT the same as "over a period of time". A time series implies a
> list of values, one for each specific moment in time.
> 
> I guess there is a use case for a specific period: how many new
> contributors were added in the last year? But this is simply an
> element of the time series.
> 
> I think there is a strong case that many metrics must be computed as a
> time series. The user might determine the period of interest (weekly,
> hourly, yearly, etc).
> 
> 
>> On Thu, Jun 14, 2018 at 3:52 PM, dmg <dmg at uvic.ca> wrote:
>> I think these metrics should be defined as time series, where the
>> period between the observations is a parameter.
>> 
>> This is NOT the same as "over a period of time". A time series implies a
>> list of values, one for each specific moment in time.
>> 
>> On Thu, Jun 14, 2018 at 3:50 PM, Jesus M. Gonzalez-Barahona
>> <jgb at bitergia.com> wrote:
>>> On Thu, 2018-06-14 at 23:59 +0200, Jesus M. Gonzalez-Barahona wrote:
>>>>> On Thu, 2018-06-14 at 13:52 -0700, dmg wrote:
>>>>> Sean Goggins <s at goggins.com> writes:
>>>>> 
>>>>>> Hi All:
>>>>>> 
>>>>>> During our Growth Maturity and Decline Metrics working group
>>>>>> today we discussed two specific metrics:
>>>>>> 
>>>>> 
>>>>> with all respect to those who are doing the work, I feel this
>>>>> method of defining metrics is flawed.
>>>>> 
>>>>> Take for example Pullrequest 13:
>>>>> 
>>>>> + [New Overall Contributors](activity-metrics/new-contributors.md)
>>>>>> What is the overall number of new contributors?
>>>>> 
>>>>> +[New Contributors of
>>>>> Commits](activity-metrics/new-contributors-commits.md) | What is
>>>>> the number of persons contributing with an accepted commit for
>>>>> the first time?
>>>>> +[New Contributors of Opened
>>>>> Issues](activity-metrics/new-contributors-issues-opened.md) |
>>>>> What is the number of persons opening an issue for the first
>>>>> time?
>>>>> +[New Contributors of Closed
>>>>> Issues](activity-metrics/new-contributors-issues-closed.md) |
>>>>> What is the number of persons closing an issue for the first
>>>>> time?
>>>>> +[New Contributors of Initiated Code
>>>>> Reviews](activity-metrics/new-contributors-code-reviews-opened.md)
>>>>> | What is the number of persons initiating a code review for the
>>>>> first time?
>>>>> +[New Contributors of Reviews for
>>>>> Code](activity-metrics/new-contributors-code-reviews.md) | What
>>>>> is the number of persons contributing with reviews of code for
>>>>> the first time?
>>>>> +[New Contributors of Posted
>>>>> Messages](activity-metrics/new-contributors-posts.md) | What is
>>>>> the number of persons posting messages in mailing lists for the
>>>>> first time?
>>>>> 
>>>>> Based on this definition, I assert that the number of new
>>>>> contributors to a project is equal to the number of contributors
>>>>> of that project. Does anybody want to prove me wrong?
>>>> 
>>>> Daniel, have a look at the pr. The metric is defined for a period of
>>>> time. Or maybe I'm missing something?
>>> 
>>> /me kicks /me pretty hard for being so dumb.
>>> 
>>> You are completely right, Daniel, the pr does not mention anywhere
>>> that this is for a period of time. I confused it with the pr on
>>> efficiency, which I was discussing in some detail during our meeting
>>> today.
>>> 
>>> I'm so sorry for my confusion.
>>> 
>>> Please see https://github.com/chaoss/wg-gmd/pull/12/files for how we
>>> are dealing with the period in that other metric about efficiency.
>>> 
>>> Yes, the detailed definition of the metric (to be written) should
>>> clearly state that it is defined over a certain period of time. If you
>>> feel that should be in the name of the metric, which is the only part
>>> which is written for now, we can discuss it. I see pros and cons to
>>> having very detailed names for the metrics.
>>> 
>>> Again, sorry for the noise,
>>> 
>>>        Jesus.
>>> 
>>>>      Jesus.
>>>> 
>>>>> What we need is to think more holistically and think more in terms
>>>>> of what we are measuring.
>>>>> 
>>>>> First, a "new contributors" metric is not a _new_ metric. It is a
>>>>> derived metric: an activity metric that has been filtered to a
>>>>> particular subset of individuals.
>>>>> 
>>>>> We need to clearly define what we can measure and what we can
>>>>> derive from what we can measure.
>>>>> 
>>>>> here is a proposal:
>>>>> 
>>>>> Perhaps we should first start with what we can measure. What are
>>>>> the observable entities? Then, based on these entities, define "lists"
>>>>> of activities.
>>>>> Each activity has many attributes: type, who is involved with it,
>>>>> when it was done, etc. An activity is polymorphic.
>>>>> 
>>>>> Then we can define metrics in terms of filtering. For instance,
>>>>> "commits by first contributors" is the result of filtering
>>>>> activities of type commit such that we only capture the first
>>>>> commit from each person.
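
The filtering idea above can be sketched in Python roughly as follows. The Activity record and the helpers of_kind and first_per_contributor are hypothetical names chosen for this illustration, not anything defined in the working group's repository, and the sample log is made up.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Activity:
    kind: str   # e.g. "commit", "issue_opened", "review" (assumed labels)
    who: str    # identity of the person involved
    when: date  # when the activity took place


def of_kind(activities, kind):
    """Filter: keep only activities of the given type."""
    return [a for a in activities if a.kind == kind]


def first_per_contributor(activities):
    """Filter: keep only the earliest activity of each person."""
    first = {}
    for a in sorted(activities, key=lambda a: a.when):
        first.setdefault(a.who, a)
    return list(first.values())


# Hypothetical activity log.
log = [
    Activity("issue_opened", "alice", date(2018, 1, 3)),
    Activity("commit", "alice", date(2018, 2, 10)),
    Activity("commit", "bob", date(2018, 2, 11)),
    Activity("commit", "alice", date(2018, 3, 1)),
]

# "Commits by first contributors": filter by type, then keep first per person.
first_commits = first_per_contributor(of_kind(log, "commit"))
```

Composing small filters like this keeps the measured data (the activity list) separate from the derived metrics built on top of it.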
>>>>> 
>>>>> Now, there is also the issue of 'work' vs 'power'. Work is
>>>>> absolute (think physics), while power is the average work per unit
>>>>> of time.
>>>>> 
>>>>> The metric I defined above is absolute. If I want to compute its
>>>>> "time-related" counterpart, I have to define a period: basically, the
>>>>> "average number of commits by first contributors" over "some unit
>>>>> of time".
>>>>> Or I can define it in a more fine-grained way, as a time series, where
>>>>> I compute the average over a fixed period. Then the result is a time
>>>>> series.
>>>>> 
>>>>> For example, I can define the time series of new contributors as:
>>>>> 
>>>>> monthly new contributors = TimeSeries( count(filter <keep only the
>>>>> first activity of each contributor> activities)) per month
>>>>> 
>>>>> monthly new committers = TimeSeries( count(filter <keep only the
>>>>> first activity of each contributor> filter <commits> activities))
>>>>> per month
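
One possible reading of the made-up notation above, as a small Python sketch with the period as a parameter. The function names bucket and new_contributors_series, the tuple layout of the log, and the sample data are all assumptions for illustration only.

```python
from collections import Counter
from datetime import date


def bucket(d, period):
    """Map a date to the key of the period it falls in."""
    if period == "year":
        return (d.year,)
    if period == "month":
        return (d.year, d.month)
    if period == "week":
        iso = d.isocalendar()
        return (iso[0], iso[1])  # ISO year, ISO week number
    raise ValueError(f"unsupported period: {period}")


def new_contributors_series(log, period="month", kind=None):
    """Count, per period, the people whose first activity falls in it.

    log is a list of (kind, who, when) tuples; if kind is given, only
    activities of that type are considered.
    """
    first = {}
    for k, who, when in log:
        if kind is not None and k != kind:
            continue
        if who not in first or when < first[who]:
            first[who] = when
    counts = Counter(bucket(d, period) for d in first.values())
    return dict(sorted(counts.items()))


# Hypothetical activity log.
log = [
    ("issue", "alice", date(2018, 1, 3)),
    ("commit", "alice", date(2018, 2, 10)),
    ("commit", "bob", date(2018, 2, 11)),
    ("commit", "carol", date(2018, 4, 2)),
]

monthly_new_contributors = new_contributors_series(log)
monthly_new_committers = new_contributors_series(log, kind="commit")
yearly_new_contributors = new_contributors_series(log, period="year")
```

Periods with no new contributors are simply absent from the result here; whether to fill them in with zeros is a separate decision.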
>>>>> 
>>>>> 
>>>>> Efficiency in PR 12 is flawed too.
>>>>> 
>>>>> Note that in this context, efficiency (as defined in the PR) is
>>>>> also an absolute metric:
>>>>> 
>>>>>   Formula: 'issues_closed / (issues_opened + issues_backlog)'
>>>>> 
>>>>> but that is ok, because it can be converted into a time series.
>>>>> 
>>>>> We can still define it in terms of a filtering of the activities:
>>>>> 
>>>>> issue resolution efficiency = count(filter <type=issue and
>>>>> status=closed> activities)/ count(filter <type=issue and
>>>>> status=(not closed)> activities)
>>>>> 
>>>>> but this rate is only useful when it is converted into a time
>>>>> series. So with my made-up-notation:
>>>>> 
>>>>> monthly issue resolution efficiency = TimeSeries(count(filter
>>>>> <type=issue and status=closed> activities)/ count(filter
>>>>> <type=issue and status=(not closed)> activities)) per month
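
As a rough sketch, the formula quoted from the PR, issues_closed / (issues_opened + issues_backlog), could be evaluated per month as below. How the PR actually delimits the backlog is not stated in this thread, so treating it as "issues opened before the month that are still open when the month starts" is an assumption, as are the sample issues and the function names.

```python
from datetime import date

# Hypothetical issues: (date opened, date closed or None if still open).
issues = [
    (date(2018, 1, 5), date(2018, 1, 20)),
    (date(2018, 1, 9), date(2018, 3, 2)),
    (date(2018, 2, 14), None),
    (date(2018, 3, 1), date(2018, 3, 15)),
]


def month_bounds(year, month):
    """Return the half-open interval [start, end) covering one month."""
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


def efficiency(issues, year, month):
    """issues_closed / (issues_opened + issues_backlog) for one month."""
    start, end = month_bounds(year, month)
    opened = sum(1 for o, _ in issues if start <= o < end)
    closed = sum(1 for _, c in issues if c is not None and start <= c < end)
    # Assumed backlog: opened earlier and not yet closed at the month start.
    backlog = sum(
        1 for o, c in issues if o < start and (c is None or c >= start)
    )
    denominator = opened + backlog
    # With nothing opened and no backlog the ratio is undefined.
    return None if denominator == 0 else closed / denominator


series = {(2018, m): efficiency(issues, 2018, m) for m in range(1, 5)}
```

Returning None for a month with nothing to work on keeps an undefined ratio distinct from a genuine zero.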
>>>>> 
>>>>> I personally don't like the name "efficiency". Its meaning is the
>>>>> ratio of output to input. This is not what this is measuring. A
>>>>> project that did not have any new issues
>>>>> and did not close an outstanding issue would have the same
>>>>> efficiency as in the previous period, but nothing has been done.
>>>>> 
>>>>> 
>>>>> --dmg
>>>>> 
>>>>> 
>>>>>> 1. New Contributors
>>>>>> https://github.com/chaoss/wg-gmd/pull/13
>>>>>> 2. Issue Resolution Efficiency
>>>>>> https://github.com/chaoss/wg-gmd/pull/12
>>>>>> 
>>>>>> These two metrics share the characteristic that their expression
>>>>>> is likely to be parameterized in different ways. You can follow
>>>>>> the examples and discussion on the associated pull requests,
>>>>>> noted above.
>>>>>> 
>>>>>> We encourage participation from community managers during our
>>>>>> next call, at 11am CDT on June
>>>>>> 28th. https://unomaha.zoom.us/j/720431288
>>>>>> 
>>>>>> Whether or not you are able to make the next call, if you are
>>>>>> interested, please review and comment on the two pull requests from
>>>>>> Jesus, noted above and here:
>>>>>> 
>>>>>> https://github.com/chaoss/wg-gmd/pulls
>>>>>> 
>>>>>> Thanks!
>>>>>> 
>>>>>> Jesus & Sean
>>>>>> _______________________________________________
>>>>>> Oss-health-metrics mailing list
>>>>>> Oss-health-metrics at lists.linuxfoundation.org
>>>>>> https://lists.linuxfoundation.org/mailman/listinfo/oss-health-metrics
>>>>> 
>>>>> 
>>>>> --
>>>>> Daniel M. German                  "Often a small and simple
>>>>> question can chisel away at the biggest problems"
>>>>>                                   Levitt and Dubner
>>>>> http://turingmachine.org/
>>>>> http://silvernegative.com/
>>>>> dmg (at) uvic (dot) ca
>>>>> replace (at) with @ and (dot) with .
>>> --
>>> Bitergia: http://bitergia.com
>>> /me at Twitter: https://twitter.com/jgbarah
>>> 
>>> _______________________________________________
>>> Chaoss-members mailing list
>>> Chaoss-members at lists.linuxfoundation.org
>>> https://lists.linuxfoundation.org/mailman/listinfo/chaoss-members
>> 
>> 
>> 
>> --
>> --dmg
>> 
>> ---
>> Daniel M. German
>> http://turingmachine.org
> 
> 
> 
> -- 
> --dmg
> 
> ---
> D M German
> http://turingmachine.org