[Chaoss-members] [Oss-health-metrics] Growth Maturity and Decline Working Group Update

Jesus M. Gonzalez-Barahona jgb at bitergia.com
Mon Jun 25 21:02:01 UTC 2018


Thanks, Daniel. I feel we're getting close to interesting results. Let me
summarize:

* We could define, for each metric, how it relates to time. In
particular, whether it is computed, in your words:

- At a point in time. For example, number of SLOCs in a project
 
- During a period. For example, number of commits.

I tend to call the first kind "snapshot metrics", because you compute
them on a snapshot of the artifact (e.g., source code, when calculating
SLOC) at a given point in time. For the second kind, you in fact have
discrete events that can be grouped by time period (possibly computing
some aggregated metric in the process, not necessarily just a count).

How do you think we could reflect this in our definitions for the
different metrics?
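
To make the distinction concrete, here is a minimal Python sketch of the
two kinds of metric. Everything in it is hypothetical and only for
illustration: the Commit record, the function names, and the sample
history are my own, and deriving SLOC by summing per-commit deltas is a
simplifying assumption, not a proposed definition.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Commit:
    """Hypothetical event record: one commit and its net effect on SLOC."""
    when: datetime
    author: str
    sloc_delta: int  # lines added minus lines removed (assumed available)


def sloc_at(commits, instant):
    """Snapshot metric: size of the artifact at a single point in time."""
    return sum(c.sloc_delta for c in commits if c.when <= instant)


def commits_during(commits, start, end):
    """Period metric: number of events in the half-open interval [start, end)."""
    return sum(1 for c in commits if start <= c.when < end)


# Made-up sample history, only to exercise the two functions.
history = [
    Commit(datetime(2018, 1, 10), "ana", 120),
    Commit(datetime(2018, 2, 3), "bo", -20),
    Commit(datetime(2018, 2, 20), "ana", 50),
]

snapshot = sloc_at(history, datetime(2018, 2, 15))
in_february = commits_during(history, datetime(2018, 2, 1), datetime(2018, 3, 1))
```

The snapshot function takes one instant, while the period function takes
two bounds. That difference in signature is perhaps what the metric
definitions could state explicitly.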

* We could define which kinds of time-based aggregations are meaningful
for each metric (which time series make sense, which time periods...).
I have not made up my mind about this, but maybe we could have default
periods (like "during the previous 30 days") or default period lengths
for time series (like "monthly")?
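
As a rough sketch of what those two defaults could mean in practice, the
Python below computes a monthly time series and a "previous 30 days"
count from the same list of event dates. The function names and the
sample dates are hypothetical. Months with no events are kept as
explicit zeros, which is one of the two options Alexander mentioned.

```python
from collections import Counter
from datetime import date, timedelta


def month_range(first, last):
    """Yield (year, month) pairs from first to last, inclusive."""
    year, month = first
    while (year, month) <= last:
        yield (year, month)
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


def monthly_series(event_dates, first, last):
    """Count events per calendar month, keeping months with zero events."""
    counts = Counter((d.year, d.month) for d in event_dates)
    return [(ym, counts.get(ym, 0)) for ym in month_range(first, last)]


def count_previous_days(event_dates, today, days=30):
    """Count events in the window [today - days, today)."""
    start = today - timedelta(days=days)
    return sum(1 for d in event_dates if start <= d < today)


# Hypothetical dates of first contributions, one per new contributor.
first_contributions = [
    date(2018, 3, 5), date(2018, 3, 28), date(2018, 5, 30), date(2018, 6, 20),
]

series = monthly_series(first_contributions, (2018, 3), (2018, 6))
recent = count_previous_days(first_contributions, date(2018, 6, 25))
```

Here the period length (calendar month, or 30 days) is a parameter of the
aggregation and not of the metric itself, which is the separation I would
like the definitions to keep.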

Saludos,

	Jesus.
 
On Fri, 2018-06-15 at 12:01 -0700, dmg wrote:
> Jesus M. Gonzalez-Barahona <jgb at bitergia.com> writes:
> 
> > WRT the time series: I agree that in most cases, what you want is to
> > know the evolution of the number of new people over time. And that's
> > basically a time series. However, as Daniel mentions, there are some
> > cases where what you want is to know new people over a certain time
> > period. For example, number of people joining during the last year, or
> > number of people joining over a certain release cycle.
> > 
> > Therefore, even though the time series is a very usual case, the usual
> > assumption in time series that all periods are equal (or at least
> > comparable) may not hold for all cases.
> > 
> 
> I think the generalization of metrics in this regard is the following:
> 
> There are two types of metrics (maybe more?) based on when they are
> applied:
> 
> - At a point in time. For example, number of SLOCs in a project
> 
> - During a period. For example, number of commits.
> 
> Note that we can have a period that includes everything from beginning
> to end of the life of a project (number of people who have left the
> project, as Jesús implies).
> 
> This is an intrinsic feature of a metric.
> 
> Then, the evolution of this metric is a time series. The calculation of
> this time series is defined by the distance between observations. A
> "point in time" metric is computed at the edges of the periods. The
> "during-a-period" metric is computed for each period of the time series.
> 
> > This said, I think this is a pretty good case example, because many
> > metrics are going to be useful in periods. That's why, in general, I'm
> > more in favor of first defining the metric, and then deciding which
> > filters or groupings (binning) are applicable to it. In this case, both
> > filtering for a period and binning for regular periods of time are
> > sensible. But neither of them influences the metric itself, I think.
> 
> Note that this is not exactly the same as "binning". A time series
> might look like a histogram, but it is not the same.
> 
> The way a time series is displayed depends on the resolution at which
> it is displayed, and on how the data are "collated" when the resolution
> of the time series is much finer than the resolution it is displayed at.
> 
> > 
> > Compare this to the case of the efficiency metric in
> > 
> > https://github.com/chaoss/wg-gmd/pull/12
> > 
> > which, as it is defined in that proposal, *needs* a time frame, because
> > there are specific references to it in the definition of the metric
> > itself.
> > 
> > And yes, Daniel is completely right: if there are no time periods
> > defined, the number of new people is exactly the same as the number of
> > people. However, for a different perspective on a very similar metric,
> > consider the case of "number of people leaving the project", where
> > both cases make sense and are potentially interesting: "how many people
> > left the project" (during all the history of the project) or "how many
> > people left the project" (for a certain period).
> > 
> > In the case of people leaving, of course you need to define when you
> > consider that people have left, but that's another issue.
> > 
> > Saludos,
> > 
> > 	Jesus.
> > 
> > On Fri, 2018-06-15 at 01:05 +0000, Serebrenik, A. wrote:
> > > Dear all
> > > 
> > > > Just a small note that I agree with Daniel that the time period
> > > > should be a parameter: in this way we can evaluate the sensitivity
> > > > of the measurement-based conclusions to the specific choice of the
> > > > time period duration.
> > > > 
> > > > Yet another point to keep in mind is the zeros, i.e., periods when
> > > > no new activity has taken place, no new developers have joined, etc.
> > > > A simple but not necessarily correct way of dealing with the zeros
> > > > would be to exclude them from consideration. Alternatively, one can
> > > > keep the zeros and use more complex statistical machinery to
> > > > analyze the measurements.
> > > 
> > > Best wishes
> > > Alexander
> > > 
> > > > Sent from my iPhone
> > > 
> > > > > On 15 Jun 2018, at 01:19, dmg <dmg at uvic.ca> wrote:
> > > > 
> > > > > I think these metrics should be defined as time series, where the
> > > > > period between the observations is a parameter.
> > > > > 
> > > > > This is NOT the same as "over a period of time". A time series
> > > > > implies a list of values, one for each specific moment in time.
> > > > > 
> > > > > I guess there is a use case for a specific period: how many new
> > > > > contributors were added in the last year? But this is simply an
> > > > > element of the time series.
> > > > > 
> > > > > I think there is a strong case that many metrics must be computed
> > > > > as a time series. The user might determine the period of interest
> > > > > (weekly, hourly, yearly, etc).
> > > > 
> > > > 
> > > > > On Thu, Jun 14, 2018 at 3:52 PM, dmg <dmg at uvic.ca> wrote:
> > > > > > I think these metrics should be defined as time series, where
> > > > > > the period between the observations is a parameter.
> > > > > > 
> > > > > > This is NOT the same as "over a period of time". A time series
> > > > > > implies a list of values, one for each specific moment in time.
> > > > > 
> > > > > On Thu, Jun 14, 2018 at 3:50 PM, Jesus M. Gonzalez-Barahona
> > > > > <jgb at bitergia.com> wrote:
> > > > > > > On Thu, 2018-06-14 at 23:59 +0200, Jesus M. Gonzalez-Barahona
> > > > > > > wrote:
> > > > > > > > On Thu, 2018-06-14 at 13:52 -0700, dmg wrote:
> > > > > > > > Sean Goggins <s at goggins.com> writes:
> > > > > > > > 
> > > > > > > > > Hi All:
> > > > > > > > > 
> > > > > > > > > > During our Growth Maturity and Decline Metrics working group
> > > > > > > > > > today we discussed two specific metrics:
> > > > > > > > > 
> > > > > > > > 
> > > > > > > > > With all respect to those who are doing the work, I feel this
> > > > > > > > > method of defining metrics is flawed.
> > > > > > > > 
> > > > > > > > > Take for example pull request 13:
> > > > > > > > > 
> > > > > > > > > + [New Overall Contributors](activity-metrics/new-contributors.md)
> > > > > > > > >   What is the overall number of new contributors?
> > > > > > > > > + [New Contributors of Commits](activity-metrics/new-contributors-commits.md)
> > > > > > > > >   What is the number of persons contributing with an accepted commit
> > > > > > > > >   for the first time?
> > > > > > > > > + [New Contributors of Opened Issues](activity-metrics/new-contributors-issues-opened.md)
> > > > > > > > >   What is the number of persons opening an issue for the first time?
> > > > > > > > > + [New Contributors of Closed Issues](activity-metrics/new-contributors-issues-closed.md)
> > > > > > > > >   What is the number of persons closing an issue for the first time?
> > > > > > > > > + [New Contributors of Initiated Code Reviews](activity-metrics/new-contributors-code-reviews-opened.md)
> > > > > > > > >   What is the number of persons initiating a code review for the
> > > > > > > > >   first time?
> > > > > > > > > + [New Contributors of Reviews for Code](activity-metrics/new-contributors-code-reviews.md)
> > > > > > > > >   What is the number of persons contributing with reviews of code
> > > > > > > > >   for the first time?
> > > > > > > > > + [New Contributors of Posted Messages](activity-metrics/new-contributors-posts.md)
> > > > > > > > >   What is the number of persons posting messages in mailing lists
> > > > > > > > >   for the first time?
> > > > > > > > 
> > > > > > > > > Based on this definition, I assert that the number of new
> > > > > > > > > contributors to a project is equal to the number of contributors
> > > > > > > > > of that project. Does anybody want to prove me wrong?
> > > > > > > 
> > > > > > > > Daniel, have a look at the PR. The metric is defined for a
> > > > > > > > period of time. Or maybe I'm missing something?
> > > > > > 
> > > > > > /me kicks /me pretty hard for being so dumb.
> > > > > > 
> > > > > > > You are completely right, Daniel, the PR does not mention anywhere
> > > > > > > that this is for a period of time. I was confusing it with the PR
> > > > > > > on efficiency, which I was discussing in some detail during our
> > > > > > > meeting today.
> > > > > > 
> > > > > > I'm so sorry for my confusion.
> > > > > > 
> > > > > > > Please see https://github.com/chaoss/wg-gmd/pull/12/files for how
> > > > > > > we are dealing with periods in that other metric about efficiency.
> > > > > > 
> > > > > > > Yes, the detailed definition of the metric (to be written) should
> > > > > > > clearly state that it is defined over a certain period of time.
> > > > > > > If you feel that should be in the name of the metric, which is the
> > > > > > > only part written for now, we can discuss it. I see pros and cons
> > > > > > > to having very detailed names for the metrics.
> > > > > > 
> > > > > > Again, sorry for the noise,
> > > > > > 
> > > > > >        Jesus.
> > > > > > 
> > > > > > >      Jesus.
> > > > > > > 
> > > > > > > > > What we need is to think more holistically, and more in terms
> > > > > > > > > of what we are measuring.
> > > > > > > > 
> > > > > > > > > First, a "new contributors" metric is not a _new_ metric. It is
> > > > > > > > > a derived metric: an activity metric that has been filtered to
> > > > > > > > > a particular subset of individuals.
> > > > > > > > 
> > > > > > > > > We need to clearly define what we can measure and what we can
> > > > > > > > > derive from what we can measure.
> > > > > > > > 
> > > > > > > > > Here is a proposal:
> > > > > > > > > 
> > > > > > > > > Perhaps we should first start with what we can measure. What
> > > > > > > > > are the observable entities? Then, based on these entities,
> > > > > > > > > define "lists" of activities. Each activity has many
> > > > > > > > > attributes: type, who is involved with it, when it was done,
> > > > > > > > > etc. An activity is polymorphic.
> > > > > > > > 
> > > > > > > > > Then we can define metrics in terms of filtering. For instance,
> > > > > > > > > "commits by first contributors" is the result of filtering
> > > > > > > > > activities of type commit such that we only capture the first
> > > > > > > > > commit from each person.
> > > > > > > > 
> > > > > > > > > Now, there is also the issue of 'work' vs 'power'. Work is
> > > > > > > > > absolute (think physics), while power is the average work per
> > > > > > > > > unit of time.
> > > > > > > > 
> > > > > > > > > The metric I defined above is absolute. If I want to compute
> > > > > > > > > its "time related" counterpart I have to define a period:
> > > > > > > > > basically, the "average number of commits by first
> > > > > > > > > contributors" over "some unit of time". Or I can define it in
> > > > > > > > > a more fine-grained way, as a time series, where I compute the
> > > > > > > > > average over a fixed period. Then the result is a time series.
> > > > > > > > 
> > > > > > > > > For example, I can define the time series of new contributors
> > > > > > > > > as:
> > > > > > > > > 
> > > > > > > > > monthly new contributors = TimeSeries( count(filter <keep only
> > > > > > > > > the first activity of each contributor> activities)) per month
> > > > > > > > > 
> > > > > > > > > monthly new committers = TimeSeries( count(filter <keep only
> > > > > > > > > the first activity of each contributor> filter <commits>
> > > > > > > > > activities)) per month
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > Efficiency in PR 12 is flawed too.
> > > > > > > > 
> > > > > > > > > Note that in this context, efficiency (as defined in the PR) is
> > > > > > > > > also an absolute metric:
> > > > > > > > > 
> > > > > > > > >   Formula: 'issues_closed / (issues_opened + issues_backlog)'
> > > > > > > > 
> > > > > > > > > But that is OK, because it can be converted into a time series.
> > > > > > > > > 
> > > > > > > > > We can still define it in terms of a filtering of the
> > > > > > > > > activities:
> > > > > > > > > 
> > > > > > > > > issue resolution efficiency = count(filter <type=issue and
> > > > > > > > > status=closed> activities) / count(filter <type=issue and
> > > > > > > > > status=(not closed)> activities)
> > > > > > > > > 
> > > > > > > > > But this rate is only useful when it is converted into a time
> > > > > > > > > series. So with my made-up notation:
> > > > > > > > > 
> > > > > > > > > monthly issue resolution efficiency = TimeSeries(count(filter
> > > > > > > > > <type=issue and status=closed> activities) / count(filter
> > > > > > > > > <type=issue and status=(not closed)> activities)) per month
> > > > > > > > 
> > > > > > > > > I personally don't like the name "efficiency". Its meaning is
> > > > > > > > > the ratio of output to input. This is not what this is
> > > > > > > > > measuring. A project that did not have any new issues and did
> > > > > > > > > not close any outstanding issue would have the same efficiency
> > > > > > > > > as in the previous period, but nothing has been done.
> > > > > > > > 
> > > > > > > > 
> > > > > > > > --dmg
> > > > > > > > 
> > > > > > > > 
> > > > > > > > > > 1. New Contributors: https://github.com/chaoss/wg-gmd/pull/13
> > > > > > > > > > 2. Issue Resolution Efficiency:
> > > > > > > > > >    https://github.com/chaoss/wg-gmd/pull/12
> > > > > > > > > 
> > > > > > > > > > These two metrics share the characteristic that their
> > > > > > > > > > expression is likely to be parameterized in different ways.
> > > > > > > > > > You can follow the examples and discussion on the associated
> > > > > > > > > > pull requests, noted above.
> > > > > > > > > 
> > > > > > > > > > We encourage participation from community managers during our
> > > > > > > > > > next call, at 11am CDT on June 28th:
> > > > > > > > > > https://unomaha.zoom.us/j/720431288
> > > > > > > > > 
> > > > > > > > > > Whether or not you are able to make the next call, please
> > > > > > > > > > review and comment, if you are interested, on the two pull
> > > > > > > > > > requests from Jesus, noted above and here:
> > > > > > > > > > 
> > > > > > > > > > https://github.com/chaoss/wg-gmd/pulls
> > > > > > > > > 
> > > > > > > > > Thanks!
> > > > > > > > > 
> > > > > > > > > Jesus & Sean
> > > > > > > > > _______________________________________________
> > > > > > > > > Oss-health-metrics mailing list
> > > > > > > > > Oss-health-metrics at lists.linuxfoundation.org
> > > > > > > > > > https://lists.linuxfoundation.org/mailman/listinfo/oss-health-metrics
> > > > > > > > 
> > > > > > > > 
> > > > > > > > --
> > > > > > > > Daniel M. German                  "Often a small and 
> > > > > > > > simple
> > > > > > > > question can chisel away at the biggest problems"
> > > > > > > >                                   Levitt and Dubner
> > > > > > > > http://turingmachine.org/
> > > > > > > > http://silvernegative.com/
> > > > > > > > dmg (at) uvic (dot) ca
> > > > > > > > replace (at) with @ and (dot) with .
> > > > > > > > _______________________________________________
> > > > > > > > Oss-health-metrics mailing list
> > > > > > > > Oss-health-metrics at lists.linuxfoundation.org
> > > > > > > > > https://lists.linuxfoundation.org/mailman/listinfo/oss-health-metrics
> > > > > > 
> > > > > > --
> > > > > > Bitergia: http://bitergia.com
> > > > > > /me at Twitter: https://twitter.com/jgbarah
> > > > > > 
> > > > > > _______________________________________________
> > > > > > Chaoss-members mailing list
> > > > > > Chaoss-members at lists.linuxfoundation.org
> > > > > > > https://lists.linuxfoundation.org/mailman/listinfo/chaoss-members
> > > > > 
> > > > > 
> > > > > 
> > > > > --
> > > > > --dmg
> > > > > 
> > > > > ---
> > > > > Daniel M. German
> > > > > http://turingmachine.org
> > > > 
> > > > 
> > > > 
> > > > --
> > > > --dmg
> > > > 
> > > > ---
> > > > D M German
> > > > http://turingmachine.org
> > > > _______________________________________________
> > > > Oss-health-metrics mailing list
> > > > Oss-health-metrics at lists.linuxfoundation.org
> > > > > https://lists.linuxfoundation.org/mailman/listinfo/oss-health-metrics
> 
> 
> --
> Daniel M. German                  "My friends would think I was a 
> nut,
>                                    turning water into wine,
>    Solsbury Hill, Peter Gabriel -> opening doors that seem to be 
>    shut."
> http://turingmachine.org/
> http://silvernegative.com/
> dmg (at) uvic (dot) ca
> replace (at) with @ and (dot) with .
-- 
Bitergia: http://bitergia.com
/me at Twitter: https://twitter.com/jgbarah


