[Chaoss-members] [Oss-health-metrics] Growth Maturity and Decline Working Group Update

dmg dmg at turingmachine.org
Thu Jun 14 23:18:03 UTC 2018


I think these metrics should be defined as time series, where the
period between the observations is a parameter.

This is NOT the same as "over a period of time". A time series implies a
list of values, one for each specific moment in time.

I guess there is a use case for a specific period (how many new
contributors were added in the last year?), but this is simply one
element of the time series.

I think there is a strong case that many metrics must be computed as
time series. The user can then choose the period of interest (hourly,
weekly, yearly, etc.).
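To make the idea concrete, here is a minimal sketch of "new contributors" as a time series whose period is a parameter. The activity log, the names `ACTIVITIES`, `PERIODS` and `new_contributors_series`, and the three bucket sizes are all assumptions made up for illustration; nothing here comes from the pull requests.

```python
from collections import Counter
from datetime import datetime

# Hypothetical activity log: (contributor, timestamp) pairs.
ACTIVITIES = [
    ("alice", datetime(2018, 1, 3)),
    ("bob", datetime(2018, 1, 20)),
    ("alice", datetime(2018, 2, 1)),
    ("carol", datetime(2018, 2, 14)),
    ("bob", datetime(2018, 3, 9)),
]

# The period is a parameter: each entry maps a timestamp to a bucket label.
PERIODS = {
    "yearly": lambda t: t.strftime("%Y"),
    "monthly": lambda t: t.strftime("%Y-%m"),
    "daily": lambda t: t.strftime("%Y-%m-%d"),
}


def new_contributors_series(activities, period):
    """Return {bucket: number of contributors first seen in that bucket}."""
    bucket_of = PERIODS[period]
    first_seen = {}
    for who, when in activities:
        # Keep only the earliest activity of each contributor.
        if who not in first_seen or when < first_seen[who]:
            first_seen[who] = when
    counts = Counter(bucket_of(when) for when in first_seen.values())
    return dict(sorted(counts.items()))


print(new_contributors_series(ACTIVITIES, "monthly"))
print(new_contributors_series(ACTIVITIES, "yearly"))
```

In this sketch, "new contributors in the last year" is just one element of the yearly series. Buckets with no new contributors are omitted here; a fuller version would probably report them as zero.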


On Thu, Jun 14, 2018 at 3:52 PM, dmg <dmg at uvic.ca> wrote:
> I think these metrics should be defined as time series, where the
> period between the observations is a parameter.
>
> This is NOT the same as "over a period of time". A time series implies a
> list of values, one for each specific moment in time.
>
> On Thu, Jun 14, 2018 at 3:50 PM, Jesus M. Gonzalez-Barahona
> <jgb at bitergia.com> wrote:
>> On Thu, 2018-06-14 at 23:59 +0200, Jesus M. Gonzalez-Barahona wrote:
>>> On Thu, 2018-06-14 at 13:52 -0700, dmg wrote:
>>> > Sean Goggins <s at goggins.com> writes:
>>> >
>>> > > Hi All:
>>> > >
>>> > > During our Growth Maturity and Decline Metrics working group
>>> > > today we discussed two specific metrics:
>>> > >
>>> >
>>> > With all respect to those who are doing the work, I feel this
>>> > method of defining metrics is flawed.
>>> >
>>> > Take, for example, pull request 13:
>>> >
>>> > + [New Overall Contributors](activity-metrics/new-contributors.md)
>>> > > What is the overall number of new contributors?
>>> >
>>> >  +[New Contributors of
>>> >  Commits](activity-metrics/new-contributors-commits.md) | What is
>>> >  the number of persons contributing with an accepted commit for
>>> >  the first time?
>>> >  +[New Contributors of Opened
>>> >  Issues](activity-metrics/new-contributors-issues-opened.md) |
>>> >  What is the number of persons opening an issue for the first
>>> >  time?
>>> >  +[New Contributors of Closed
>>> >  Issues](activity-metrics/new-contributors-issues-closed.md) |
>>> >  What is the number of persons closing an issue for the first
>>> >  time?
>>> >  +[New Contributors of Initiated Code
>>> >  Reviews](activity-metrics/new-contributors-code-reviews-opened.md)
>>> >  | What is the number of persons initiating a code review for the
>>> >  first time?
>>> >  +[New Contributors of Reviews for
>>> >  Code](activity-metrics/new-contributors-code-reviews.md) | What
>>> >  is the number of persons contributing with reviews of code for
>>> >  the first time?
>>> >  +[New Contributors of Posted
>>> >  Messages](activity-metrics/new-contributors-posts.md) | What is
>>> >  the number of persons posting messages in mailing lists for the
>>> >  first time?
>>> >
>>> > Based on this definition, I assert that the number of new
>>> > contributors to a project is equal to the number of contributors
>>> > of that project. Does anybody want to prove me wrong?
>>>
>>> Daniel, have a look at the pr. The metric is defined for a period of
>>> time. Or maybe I'm missing something?
>>
>> /me kicks /me pretty hard for being so dumb.
>>
>> You are completely right, Daniel, the pr does not mention anywhere
>> that this is for a period of time. I confused it with the pr on
>> efficiency, which I was discussing in some detail during our meeting
>> today.
>>
>> I'm so sorry for my confusion.
>>
>> Please see https://github.com/chaoss/wg-gmd/pull/12/files for how we
>> are dealing with period in that other metric about efficiency.
>>
>> Yes, the detailed definition of the metric (to be written) should
>> clearly state that it is defined over a certain period of time. If you
>> feel that should be in the name of the metric, which is the only part
>> which is written for now, we can discuss it. I see pros and cons to
>> having very detailed names for the metrics.
>>
>> Again, sorry for the noise,
>>
>>         Jesus.
>>
>>>       Jesus.
>>>
>>> > What we need is to think more holistically, and more in terms
>>> > of what we are measuring.
>>> >
>>> > First, a "new contributors" metric is not a _new_ metric. It is a
>>> > derived metric: an activity metric that has been filtered to a
>>> > particular subset of individuals.
>>> >
>>> > We need to clearly define what we can measure and what we can
>>> > derive from what we can measure.
>>> >
>>> > Here is a proposal:
>>> >
>>> > Perhaps we should first start with what we can measure. What are
>>> > the observable entities? Then, based on these entities, define
>>> > "lists" of activities.
>>> > Each activity has many attributes: type, who is involved with it,
>>> > when it was done, etc. An activity is polymorphic.
>>> >
>>> > Then we can define metrics in terms of filtering. For instance,
>>> > "commits by first contributors" is the result of filtering
>>> > activities of type commit such that we only capture the first
>>> > commit from each person.
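A minimal sketch of this filtering view, assuming a deliberately small activity record: the `Activity` class, its three attributes, the helper names `of_kind` and `first_per_person`, and the sample log are all made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Activity:
    kind: str       # e.g. "commit", "issue", "review" (illustrative values)
    who: str        # person involved in the activity
    when: datetime  # when it was done


def of_kind(activities, kind):
    """Filter: keep only activities of the given type."""
    return [a for a in activities if a.kind == kind]


def first_per_person(activities):
    """Filter: keep only the earliest activity of each person."""
    earliest = {}
    for a in activities:
        if a.who not in earliest or a.when < earliest[a.who].when:
            earliest[a.who] = a
    return sorted(earliest.values(), key=lambda a: a.when)


# Hypothetical log of observed activities.
LOG = [
    Activity("issue", "alice", datetime(2018, 1, 3)),
    Activity("commit", "alice", datetime(2018, 1, 10)),
    Activity("commit", "bob", datetime(2018, 2, 2)),
    Activity("commit", "alice", datetime(2018, 2, 5)),
]

# "Commits by first contributors": filter by type, then keep first per person.
first_commits = first_per_person(of_kind(LOG, "commit"))
print(len(first_commits))
```

The order of the filters matters in this sketch. Keeping each person's first activity of any type and only then selecting commits would drop alice, whose first activity is an issue.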
>>> >
>>> > Now, there is also the issue of 'work' vs 'power'. Work is
>>> > absolute (think physics), while power is work per unit of
>>> > time.
>>> >
>>> > The metric I defined above is absolute. If I want to compute its
>>> > "time related" counterpart I have to define a period: basically, the
>>> > "average number of commits by first contributors" over "some unit
>>> > of time".
>>> > Or I can define it in a more fine-grained way, as a time series,
>>> > where I compute the average over a fixed period. Then the result is
>>> > a time series.
>>> >
>>> > For example, I can define the time series of new contributors as:
>>> >
>>> > monthly new contributors = TimeSeries( count(filter <keep only the
>>> > first activity of each contributor> activities)) per month
>>> >
>>> > monthly new committers = TimeSeries( count(filter <keep only the
>>> > first activity of each contributor> filter <commits> activities))
>>> > per month
>>> >
>>> >
>>> > Efficiency in PR 12 is flawed too.
>>> >
>>> > Note that in this context, efficiency (as defined in the PR) is
>>> > also an absolute metric:
>>> >
>>> >    Formula: 'issues_closed / (issues_opened + issues_backlog)'
>>> >
>>> > But that is OK, because it can be converted into a time series.
>>> >
>>> > We can still define it in terms of a filtering of the activities:
>>> >
>>> > issue resolution efficiency = count(filter <type=issue and
>>> > status=closed> activities) / count(filter <type=issue and
>>> > status=(not closed)> activities)
>>> >
>>> > But this rate is only useful when it is converted into a time
>>> > series. So, with my made-up notation:
>>> >
>>> > monthly issue resolution efficiency = TimeSeries(count(filter
>>> > <type=issue and status=closed> activities) / count(filter
>>> > <type=issue and status=(not closed)> activities)) per month
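One possible reading of that made-up notation, sketched in Python. The notation does not say how issues map to months, so this sketch assumes that "closed" means closed during the month and "not closed" means opened by the end of the month and still open at that point. The issue data and the names `ISSUES`, `month_of` and `monthly_efficiency` are hypothetical.

```python
from datetime import date

# Hypothetical issues: (opened, closed), with closed=None if still open.
ISSUES = [
    (date(2018, 1, 5), date(2018, 1, 20)),
    (date(2018, 1, 9), date(2018, 2, 3)),
    (date(2018, 2, 11), None),
    (date(2018, 2, 15), date(2018, 2, 28)),
]


def month_of(d):
    """Bucket a date into a (year, month) pair, which sorts chronologically."""
    return (d.year, d.month)


def monthly_efficiency(issues, months):
    """Return {month: closed during month / still open at end of month}."""
    series = {}
    for m in months:
        closed = sum(
            1 for _, c in issues if c is not None and month_of(c) == m
        )
        still_open = sum(
            1 for o, c in issues
            if month_of(o) <= m and (c is None or month_of(c) > m)
        )
        # The ratio is undefined when nothing is open; report None then.
        series[m] = closed / still_open if still_open else None
    return series


print(monthly_efficiency(ISSUES, [(2018, 1), (2018, 2), (2018, 3)]))
```

Other bucketing rules are equally defensible, which is one reason the definition should state the period explicitly.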
>>> >
>>> > I personally don't like the name "efficiency". Its meaning is the
>>> > ratio of output to input. This is not what this is measuring. A
>>> > project that did not have any new issues
>>> > and did not close an outstanding issue would have the same
>>> > efficiency as in the previous period, but nothing has been done.
>>> >
>>> >
>>> > --dmg
>>> >
>>> >
>>> > > 1. New Contributors and
>>> > > https://github.com/chaoss/wg-gmd/pull/13
>>> > > <https://github.com/chaoss/wg-gmd/pull/13>
>>> > > 2. Issue Resolution Efficiency
>>> > > https://github.com/chaoss/wg-gmd/pull/12
>>> > > <https://github.com/chaoss/wg-gmd/pull/12>
>>> > >
>>> > > These two metrics share the characteristic that their expression
>>> > > is likely to be parameterized in different ways. You can follow
>>> > > the examples and discussion on the associated pull requests,
>>> > > noted above.
>>> > >
>>> > > We encourage participation from community managers during our
>>> > > next call, at 11am CDT on June
>>> > > 28th. https://unomaha.zoom.us/j/720431288
>>> > > <https://unomaha.zoom.us/j/720431288>
>>> > >
>>> > > Whether or not you are able to make the next call, please review
>>> > > and comment, if you are interested, on the two pull requests from
>>> > > Jesus, noted above and here:
>>> > >
>>> > > https://github.com/chaoss/wg-gmd/pulls
>>> > > <https://github.com/chaoss/wg-gmd/pulls>
>>> > >
>>> > > Thanks!
>>> > >
>>> > > Jesus & Sean
>>> > > _______________________________________________
>>> > > Oss-health-metrics mailing list
>>> > > Oss-health-metrics at lists.linuxfoundation.org
>>> > > https://lists.linuxfoundation.org/mailman/listinfo/oss-health-metrics
>>> >
>>> >
>>> > --
>>> > Daniel M. German                  "Often a small and simple
>>> > question can chisel away at the biggest problems"
>>> >                                    Levitt and Dubner
>>> > http://turingmachine.org/
>>> > http://silvernegative.com/
>>> > dmg (at) uvic (dot) ca
>>> > replace (at) with @ and (dot) with .
>> --
>> Bitergia: http://bitergia.com
>> /me at Twitter: https://twitter.com/jgbarah
>>
>> _______________________________________________
>> Chaoss-members mailing list
>> Chaoss-members at lists.linuxfoundation.org
>> https://lists.linuxfoundation.org/mailman/listinfo/chaoss-members
>
>
>
> --
> --dmg
>
> ---
> Daniel M. German
> http://turingmachine.org



-- 
--dmg

---
D M German
http://turingmachine.org

