Per user rlimits

Aleksa Sarai asarai at suse.de
Mon Aug 31 08:09:41 UTC 2020


On 2020-08-28, Sargun Dhillon <sargun at sargun.me> wrote:
> On Fri, Aug 28, 2020 at 12:29 PM Eric W. Biederman
> <ebiederm at xmission.com> wrote:
> > Just to scope how much work it would be to fix rlimits
> > so they are not a problem for user namespaces I took a quick
> > survey.
> >
> > The rlimits can be found in
> > include/uapi/asm-generic/resource.h
> >
> > There are a total of 16 rlimits.
> > There are only 4 rlimits that are enforced at anything other
> > than process granularity.
> >
> > RLIMIT_NPROC
> > RLIMIT_MEMLOCK
> > RLIMIT_SIGPENDING
> > RLIMIT_MSGQUEUE
> >
> > So it should not be difficult to fix those rlimits.
> 
> What are your proposed semantics for what the "fix" would look like? Or
> are you saying that once we take on Christian's proposal of 64-bit kuid
> they would be trivial to fix? I think the reason we didn't move forward with
> fixing it is the only real thing we could agree upon is an rlimit namespace,

From memory, we did briefly discuss how this would work in the call. I
believe the basic idea was that the host rlimit would act as a maximum
setting, but a user namespace could optionally set a lower limit that
would be accounted separately. That way containers wouldn't interfere
with each other's rlimit settings. I imagine this would nest along with
user namespaces, which presumably means that rlimits would now be
attached to the userns directly.

(But I might be misremembering the details of the proposal. I do
remember Eric mentioning that the "maximum namespaces" sysctl semantics
were a useful model to look at.)
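
To make the nesting idea concrete, here is a rough userspace model of how
I understand those semantics. It is only a sketch, not kernel code: the
UserNS class, its fields and the charge()/uncharge() helpers are all
hypothetical names invented for illustration, and it assumes a charge is
applied to the namespace and every ancestor, all-or-nothing.

```python
class UserNS:
    """Hypothetical model of a user namespace carrying one rlimit counter."""

    def __init__(self, limit, parent=None):
        self.limit = limit    # maximum usage permitted in this namespace
        self.count = 0        # usage currently charged to this namespace
        self.parent = parent  # enclosing user namespace (None for the host)

    def charge(self, n=1):
        """Charge n units here and in every ancestor, or nothing at all."""
        charged = []
        ns = self
        while ns is not None:
            if ns.count + n > ns.limit:
                # Some level would exceed its limit: undo partial charges.
                for done in charged:
                    done.count -= n
                return False
            ns.count += n
            charged.append(ns)
            ns = ns.parent
        return True

    def uncharge(self, n=1):
        """Release n units from this namespace and every ancestor.

        The sketch assumes the caller only releases what it charged.
        """
        ns = self
        while ns is not None:
            ns.count -= n
            ns = ns.parent


# The host limit acts as the overall maximum; each container's namespace
# may set its own (usually lower) limit and is accounted separately.
host = UserNS(limit=100)
container_a = UserNS(limit=10, parent=host)
container_b = UserNS(limit=200, parent=host)  # still capped by the host
```

In this model container_a hitting its own limit doesn't stop container_b
from charging, but neither can push the total past the host's limit.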

> and then you get into a question of why do these even exist, and should
> they just be cgroup(v2) controllers, and should calling setrlimit just
> be a wrapper around a cgroup(v2) controller that has a map of
> uid -> limit?

To mirror what I said when this came up in the actual discussion, the
reason why we don't have cgroups for all of these things is that some of
those limits aren't "real resources"; arguably they should all be
managed through kmemcg policies instead.

Right after getting the pids cgroup controller merged, I did mention
adding controllers for the other rlimits and Tejun said that they didn't
make sense to add ([1] is one of the responses I found through a quick
search). The only reason the pids controller was merged is that you
could still fork-bomb a system even with modest kmemcg limits.

[1]: https://lore.kernel.org/lkml/20150227114940.GB3964@htj.duckdns.org/

-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH
<https://www.cyphar.com/>