[PATCH 06/12] cgroup, memcg: move cgroup_event implementation to memcg

Tue Aug 27 20:00:02 UTC 2013

Hey, Michal.

On Tue, Aug 27, 2013 at 04:20:02PM +0200, Michal Hocko wrote:
> Doing notification and read() every time might be a concern for
> embedded-world who are worried about too many wakeups. We have
> already seen suggestions to do a different modes of event triggering
> to reduce wake up costs because of the power consumption (e.g.
> http://comments.gmane.org/gmane.linux.kernel.mm/101628).

Given the expected frequency of the event, I highly doubt it's gonna
matter.  The patch you linked is different.  That's just the event
designed wrong.  Sigh... so it's mixing events for vmpressure state
changing and reclaim activities happening?  Am I understanding it
correctly?  If so, I really have no idea why so many things are so ill
designed, even the new things.

* As a general principle, events should either be edge-triggered or
  level-triggered with a way to acknowledge the current state;
  otherwise, it's a fundamentally broken design.

* Let's please not slide in some completely unobvious side-channel
  information into an event.  If someone wants to watch reclaim
  events, add a dedicated event for that rather than overloading state
  changed event.

Let's just hope I grossly misunderstood the whole thread.

If it *really* is critical to discern low/mid and mid/high
transitions, we can add a seprate file so that the transitions can be
monitored separately but I can't emphasize enough that we shouldn't
chase every single requirement people come up with without the overall
sense of its relative importance and impact on long term maintenance
of the subsystem.  I'm highly skeptical that distinguishing the two
transitions and saving the occassional spurious events would be
justifiable.

> You just forgot to tell us what is that "something simpler". Does it
> exist yet? What is the semantic?

It's gonna be a file modified event, exactly the same as regular
files.  Haven't we gone over this multiple times now?  cgroup
implementation doesn't exist yet but everything about normal file
events including its semantics are already well defined and
understood.

> We have been trying to reduce memcg specific things in the past and
> this will add non trivial chunk of code. I would at least expect some
> justification _why_ moving the maintenance burden is worth it. It
> certainly won't make memcg live easier. I can bite a bullet though if
> this is the roadblock for making important changes in the cgroup core.
> But you didn't tell us anything like that, except that you do not like
> the interface because like other parts it is over-engineered thus bad.

Yes, it is a road block in two ways.

* Everything is being made per-css which is necessary as css's
  lifetime would no longer coincide with cgroup's.  Keeping this in
  cgroup proper would mean that it needs to be updated so that it's
  attached to css instead of cgroup.

* The file system interface of cgroup will go through sysfs so that
  cgroup doesn't have to worry about inode locking maze.  This is a
  major issue for nested subsys enable / disable and implementating
  migration as currently we end up having to lock inodes which belong
  to different subtrees and vfs doesn't define locking order between
  them.  So, short of meddling with vfs rename mutex, it'll deadlock.

> You have mentioned it will help you clean up code further in the past
> but I do not see any mention about it in this patch neither in the
> leader email. Could you be more specific? How much? Is this piece of
> code blocking those cleanups?

See above.

> That is what I _really_ dislike about this patch and why I am really
> reluctant to ack it.

See above.

> And we might end up having that code there for ever because your new and
> yet to be shown interface might turn out to be not the best fit for the
> current users.

We're gonna keep that code for years no matter what and you know that
the existing interface has fundamental issues.  Regardless of what we
do in the future, it needs to go.  That much is clear, isn't it?

> Tejun, you have done _a lot_ of great work on cleaning up cgroup
> core mess and I really appreciate that! I was supporting you in some
> of those, I wish I had more time to do more. But I think you are
> deprecating some things too easily without carrying much about the
> current users assuming they will cope with that somehow and that a magic
> central authority will do everything for them in a sane way. I am quite
> skeptical, to be honest.

This one eventually has to go.  It's the wrong piece of logic at the
wrong place / layer.  The sequence could be debatable but I don't
wanna introduce new thing before the filesystem part is converted to
sysfs as it'll need to be re-done then and I can tell you with
relative high level of confidence that the new interface will be
implemented in some months and its interface will be something along
the line of css_notify(css).

And as for the general skepticism, well, I can't make you to see
things my way.  I have been trying pretty hard with all involved and
am confident enough that at least enough are looking towards the same
general direction and the progress is pretty healthy too.  As for
memcg, I don't know.  I frankly have no idea what your general
direction and long term plan are and memcg in general still seems to
be lost quite often.  So, ummm, I don't know.  Can you at least
pretend to play along?

Thanks.

-- 
tejun