From hare at suse.com  Sat Jul  1 17:24:39 2017
From: hare at suse.com (Hannes Reinecke)
Date: Sat, 1 Jul 2017 19:24:39 +0200
Subject: [Ksummit-discuss] [TECH TOPIC] is Kconfig a bit hard sometimes?
In-Reply-To: <20170630175201.GC26257@fury>
References: <20170627135839.GB1886@jagdpanzerIV.localdomain>
 <20170630175201.GC26257@fury>
Message-ID: <92305974-328a-d1b2-7301-4321f374ab8f@suse.com>

On 06/30/2017 07:52 PM, Darren Hart wrote:
> On Tue, Jun 27, 2017 at 10:18:04AM -0700, Linus Torvalds wrote:
>> On Tue, Jun 27, 2017 at 6:58 AM, Sergey Senozhatsky wrote:
>>>
>>> am I the only one who struggles with the Kconfig sometimes?
>>
>> I hate our Kconfig. It's my least favorite part of the kernel. It asks
>> questions about insane things that nobody can know the answer to.
>>
>> Taking a distro default config and doing "make localmodconfig" is what
>> I end up doing on new machines, and it has all kinds of suckage too.
>>
>> I don't have a solution to it. But I think part of the solution would
>> be for us to have various "sane minimal requirement" Kconfig
>> fragments, and the ability to feed them incrementally, so that people
>> can build up a sane Kconfig from "I want this".
>
> This was, in part, the intent behind the configuration fragments and the
> merge_config.sh script. I use this with the x86 platform drivers:
>
> $ make defconfig pdx86.config
>
> But I have to generate, also scripted, the pdx86.config by scraping the
> Kconfig file. The kvm_guest.config. There are other things I would like
> to see subconfigs for, like "efi.config" - but I wasn't sure what the
> current view on such things was. I'm glad to know I'm not alone in my
> frustration with the overly granular nature of Kconfig.
>
> The problem with this model of course is keeping the config fragments
> current with Kconfig changes. The mergeconfig script does call out
> problems with specified config options.
> We can address this with a configcheck target or similar which would
> audit the config fragments to ensure they are kept in sync with the
> Kconfig files.
>
> ...
>
>>
>> And note that none of this is about technology, and SAT solvers and
>> resolving the Kconfig dependencies that some techie people love
>> talking about. It's all about "what if we just had some kconfig
>> fragments to enable some commonly used stuff" (where "commonly used"
>> is obviously architecture dependent, but also target-dependent - a
>> "simpleconfig" for a PC workstation kind of config is very different
>> from a "simpleconfig" for a server or some ARM embedded thing).
>>
>
> It sounds like the existing config fragment mechanism is sufficient for
> what you describe and what we need to do is create these fragments.
>
> One thing that would be nice is if we could have fragment nesting so you
> could create your "simpleconfig" which in turn includes a few of the
> more specific config fragments.
>
And what would be totally cool is if we could have fragments _per
default_. E.g. by not having a massive .config, but rather keeping it per
directory, or maybe in the directory where each Kconfig lives.
That way it would be easier to figure out where this blasted option came
from, plus one could easily provide (and check!) configurations for
several systems, keeping the common parts intact and modifying only the
machine-specific ones.
And it would solve the 'keeping the config current' problem, as one
could quite simply identify which configuration will need to be changed
for a Kconfig change, seeing that both will be kept in the same directory.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke            Teamlead Storage & Networking
hare at suse.com                +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G.
Norton
HRB 21284 (AG Nürnberg)

From sre at kernel.org  Sun Jul  2 11:28:26 2017
From: sre at kernel.org (Sebastian Reichel)
Date: Sun, 2 Jul 2017 13:28:26 +0200
Subject: [Ksummit-discuss] [TECH TOPIC] mobile phones
In-Reply-To: <20170628211008.GA19571@amd>
References: <20170625104850.GA24717@amd>
 <87shinzkp9.fsf@notabene.neil.brown.name> <20170626083407.GA9621@amd>
 <20170627123947.krne6a2saolcndih@earth> <20170627215755.GC5250@amd>
 <20170628164502.itggbf4xuhsv3oyf@earth> <20170628211008.GA19571@amd>
Message-ID: <20170702112826.zirafz7lbo5lnabd@earth>

Hi,

On Wed, Jun 28, 2017 at 11:10:08PM +0200, Pavel Machek wrote:
> > > So to be exact... u-boot does not know about battery charging. And
> > > NoLo can only do very, very slow charging.
> >
> > Yes. The idea is, that normally NoLo only charges far enough, that
> > Linux can be booted.
> >
> > > Yes, unfortunately that does not work quite well here. Voltage goes
> > > too low before Linux can boot, so it resets, but it is still high
> > > enough for the bootloader, so it attempts to boot Linux one more time,
> > > but battery is empty and voltage goes too low before Linux can boot, ...
> >
> > I guess your battery is not the fittest anymore?
>
> I guess that's one issue. (One of my batteries is actually so bad that
> GSM modem fails with it.)
>
> Well, I guess Debian boots a little longer than Maemo. Plus, I believe
> we should charge the battery from kernel by default; it will enable
> running fsck etc, and it will mean slow userspace boot will not
> break...

That sounds like really bad battery :)

>
> > > > On N950 there is an unsupported gps connected via i2c iirc (with
> > > > unknown protocol that needs to be RE'd) and TI's WiLink provides
> > > > GPS on a shared UART link with bluetooth-style header using yet
> > > > another protocol. I agree, that we should have a GPS subsystem.
> > >
> > > Two GPSes in one box, interesting design. Are both of them connected
> > > to useful antenna?
> >
> > Actually there are probably 3 GPS implementations in Droid 4:
> >
> >  * WL1285
> >  * MDM6600 modem
> >  * LTE modem
> >
> > As far as I understand it modems are required to have GPS access in
> > US. I'm not yet sure which of the implementations is used by Droid 4's
> > stock system, but Motorola explicitly added a driver for the WL1285 GPS
> > making it a likely candidate (The userspace part is a closed source
> > shared object used by Android).
>
> Interesting :-). I guess you could do really fair comparison of the
> chipsets.
>
> (Binary driver -- bad Motorola :-( )

Yeah :( Note, that Nokia also has binary driver for the N900 (which
had been reverse engineered) and a different one on N950 (which has
not been reverse engineered so far).

> > > +static int generic_protect(struct power_supply *psy)
> > > +{
> > > +	union power_supply_propval val;
> > > +	int res;
> > > +	int mV, mA, mOhm = 430, mVadj = 0;
> >
> > 430 mOhm?
>
> Yes, 0.43 Ohm.

What's the source of this value?

-- Sebastian

From sre at kernel.org  Sun Jul  2 12:03:21 2017
From: sre at kernel.org (Sebastian Reichel)
Date: Sun, 2 Jul 2017 14:03:21 +0200
Subject: [Ksummit-discuss] [TECH TOPIC] mobile phones
In-Reply-To: <20170628202722.GC18101@amd>
References: <20170626111207.GA11688@amd> <20170626114931.GG23064@atomide.com>
 <20170626131401.GA11980@amd> <20170626134904.GH23064@atomide.com>
 <20170626204932.GA19396@amd> <20170627071835.GJ23064@atomide.com>
 <20170627121455.tljtekx6bmzlezxa@earth> <20170627215727.GA5250@amd>
 <20170628160112.ip2ambkzlkkoz2ww@earth> <20170628202722.GC18101@amd>
Message-ID: <20170702120321.tskgcrbfehg4fccx@earth>

Hi,

On Wed, Jun 28, 2017 at 10:27:22PM +0200, Pavel Machek wrote:
> > > Oh, another major piece is DSP coprocessor that is there.
> > > Unlike graphics, we don't even know how support for it should look.
> >
> > https://www.kernel.org/doc/Documentation/remoteproc.txt
> >
> > config OMAP_REMOTEPROC
> >         tristate "OMAP remoteproc support"
> >         [...]
> >         help
> >           Say y here to support OMAP's remote processors (dual M3
> >           and DSP on OMAP4) via the remote processor framework.
> >
> >           Currently only supported on OMAP4.
> >
> >           Usually you want to say Y here, in order to enable multimedia
> >           use-cases to run on your platform (multimedia codecs are
> >           offloaded to remote DSP processors using this framework).
> >
> >           It's safe to say N here if you're not interested in multimedia
> >           offloading or just want a bare minimum kernel.
> >
> > I have been told by some Nokia people (I do not remember who it
> > was, possibly Sakari), that the DSP is not that powerful and any
> > calculation should also be possible on CPU (wasting a bit of
> > energy).
>
> Ok, we probably don't care about DSP, but let's say we had really
> fast DSP or really cared about power.
>
> We'd need remoteproc. Sure. But that has no interface for userland,
> right?

remoteproc is only for start/stop + fw loading. The actual
communication is done using rpmsg, which does not yet have
a userspace API afaik.

> So we'd need to introduce interface for userland... fine.
>
> And then we'd need cross-compilers for the DSP used. Ok.

Yes. As far as I know there is currently no open source
toolchain for the TI DSP.

> And then we'd need to split mpg123 to CPU and DSP parts, and modify it
> so that it can run the DSP parts using remoteproc, when available?
>
> That starts to be ... complex, with changes all over the system :-(.

With little gain expected on N900...

-- Sebastian

From sre at kernel.org  Sun Jul  2 12:11:04 2017
From: sre at kernel.org (Sebastian Reichel)
Date: Sun, 2 Jul 2017 14:11:04 +0200
Subject: [Ksummit-discuss] [TECH TOPIC] mobile phones
In-Reply-To: <20170628183756.GA30277@amd>
References: <20170625104850.GA24717@amd>
 <87shinzkp9.fsf@notabene.neil.brown.name> <20170626083407.GA9621@amd>
 <20170626112052.oze7qxmxiyu67wzh@sirena.org.uk> <20170626122224.GA11441@amd>
 <20170627114026.iwsqbbwytleyurmi@sirena.org.uk> <20170628183756.GA30277@amd>
Message-ID: <20170702121104.azcq3jhx3hmjphq5@earth>

Hi,

On Wed, Jun 28, 2017 at 08:37:56PM +0200, Pavel Machek wrote:
> Right now, first priority is to get useful quality of the voice
> calls. If Nokia is the only one, then this is perhaps not a big
> problem.
>
> I'd like to know how Pyra handles this (
> https://pyra-handheld.com/boards/pages/pyra/ ).

It provides a normal PCM interface, check
20161116-DragonFly-MAIN-4G-sch.pdf, page 11 from here:

https://pyra-handheld.com/boards/threads/power-memory-and-schematics.78631/

The Nokia phones are really special in this regard.

-- Sebastian

From linux at leemhuis.info  Sun Jul  2 17:51:43 2017
From: linux at leemhuis.info (Thorsten Leemhuis)
Date: Sun, 2 Jul 2017 19:51:43 +0200
Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve
 regression tracking
Message-ID: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info>

Hi! Sorry, I know I'm late -- real life (travel, day job, ...) kept me
away from spending time on Linux kernel regression work :-/

Maybe I'm taking it a bit too far for the new kid in town, but I think I
want to propose two sessions. One for the maintainer summit, that deals
with the most critical issues relevant to regression tracking.
And one technical session to deal with all the other stuff. Obviously we
can move below mentioned topics from one to the other or talk about them
at both if we want.

= [MAINTAINERS SUMMIT] Improve regression tracking =

 * Follow up from last year: What to do about bugzilla.kernel.org?
Reporters still get stranded there.
 * How to get subsystem maintainers more involved in regression tracking
to better make sure that reported regressions are tracked and not
forgotten accidentally.
 * Frustrations with regression tracking, aka how to establish
regression tracking properly to make sure it will never go away again.

= [TECH TOPIC] Improve the kernel's quality by getting more people
involved in regression testing and reporting =

 * A short report on the outcome of the maintainer summit discussion;
also pick up any topics here that were not properly discussed at the
maintainer summit or were postponed to this session.
 * How to get distros more involved in regression tracking; especially
those that have a technically aware user base or normally ship up-to-date
kernel images (and thus have a greater interest in avoiding
regressions). I'm mainly thinking about Arch Linux, Debian, Fedora, and
openSUSE Tumbleweed here; having Ubuntu in the boat would be good, too!
(might be wise to talk about this on the maintainers summit as well, if
the right people are there)
 * How to make it easier to (ideally automatically!) track the
current status and the progress of each regression? Are there any tools
that could make regression tracking easier for all of us while not
introducing much overhead for maintainers?

= Details =

Below you'll find a few more words about some points mentioned above;
there are a few other topics as well we could discuss if we want. But
first, a few general words on regression tracking from my point of view:

 * There are a lot of areas in regression tracking where things are far
from good (read: in a bad state).
That makes it easy to discuss current
problems and their solutions for hours -- and at the same time forget
that discussing itself doesn't get us much further (the old bugzilla
issue mentioned in this mail is a good example). We thus IMHO should
focus on the most important issues and lay the groundwork to establish
regression tracking properly again, then move on to the things that
are harder to solve.

 * Regression tracking currently is quite boring and exhausting (read:
high burn-out risk), as it involves quite a lot of manual work finding
regressions and keeping track of their progress (and at the end of the
day it does not feel like you achieved much). Some of that work cannot
be automated. But quite a bit can, and that would help a great deal to
establish regression tracking properly (currently I'm the only one doing
it and in some development cycles I simply don't find spare time for it).

 I currently don't see any existing solutions that fit well with our
mail focused workflow and at the same time do not introduce much
overhead for subsystem maintainers (which I assume is what everyone
wants, as I fear solutions with much overhead won't fly at all). Ideas
how to solve this tricky problem area are highly welcome. It's
something that can be discussed when the aforementioned points
"establish regression tracking properly" and "make it more easy to
manually or automatically track the current status of a regression" come up.

== What to do about bugzilla.kernel.org ==

Discussed last year already; see https://lwn.net/Articles/705245/ for
details. Situation didn't change much since then: the bugzilla instance
was updated, but people still get stranded there as most subsystems
ignore it. That afaics frustrates people and makes them stop testing or
reporting bugs.

Discuss how to improve things.
[my2cent] Maybe a short term solution
like this could work: Serve a static page on bugzilla.kernel.org that
tells people where regressions/bugs for certain subsystems can be
reported, as it most of the time is some mailing list anyway. Such a
page could get compiled from MAINTAINERS (there is the "B:" field now
that points to bugzilla; if it's not there, point to a mailing list; also
explain get_maintainer.pl).

Leave our bugzilla reachable via bugzilla.kernel.org/frontpage (or
something like that) for those few subsystems that use it; that's afaics
ACPI and PM (including Cpufreq, Cpuidle, Hibernation, Suspend, ...) and
maybe PCI (not sure) -- or should we tell them to move to
bugzilla.freedesktop.org (or somewhere else) to get rid of our bugzilla
in the long term and make Konstantin's life easier? Anyway: Make sure
bugs for other subsystems can't get filed in bugzilla.kernel.org anymore,
so that they don't get lost there. [/my2cent]

== How to get subsystem maintainers more involved in regression tracking
to [...] ==

One reason why I put this up is: It would help me a lot if people told
regressions at leemhuis.info (side note: might be wise to make a
mailing-list that replaces this address) about regressions --
simply CCing it on reports or answers to regression reports is enough;
forwarding/bouncing mails there (even without additional text) is fine,
too.

The other reason I included it: This came up in last year's discussion on
this list and it seemed some people thought we can get the subsystem
maintainers more involved; so I thought it might be wise to discuss it.
Might also be a good idea to discuss here how to get distro kernel
maintainers more involved if enough are around.

== How to establish regression tracking properly [...] ==

This is a pretty vague topic on purpose.
People seem to agree that
regression tracking is important, but for years nobody did it (it
stopped a little while after Rafael had to move on) and the little bit
that I can do in my rare spare time won't help much (and I have no idea
how long I can continue to find time for it).

== Make it easier to track the progress of regression ==

One of the main reasons that makes regression tracking hard currently:
becoming aware of regressions and tracking their progress is a lot of
manual work. I plan one step that hopefully makes the job a little
easier and at the same time might allow some automation in the long
term: ask people to include a certain keyword in their regression
reports. Maybe something like "Linux-Regression" that doesn't get too
many false positives when searching for it on lists and via Google
(suggestions for a better tag welcome).

In addition, I plan to hand out some form of ID for each regression I
track and ask people to include it -- especially when they post patches
that fix said regression or move the discussion to a new place (like
"Corrects: Linux-Regression-d2afd"; again: suggestions welcome! Maybe I
should just use a URL where people find details?).

That way I can notice more easily when a fix for a regression hits
linux-next or master; I also become aware if a discussion moves from
bugzilla to LKML or from one thread to another (fingers crossed).
Obviously it depends on cooperation of those involved.

If this works out we could write a script or something that watches
mailing lists, bug trackers and git trees for the tag in question. That
script could fill a database and automatically do some of the tracking job.

== get distros more involved ==

I assume at least Ben (Debian), Laura (Fedora), and Takashi (openSUSE)
are around, so it might be a good idea to sit together and talk
regression tracking in general and how we could get the distros' kernel
maintainers more involved.
Even better would be to sit down beforehand to
maybe come up with some ideas/plans we could talk about during this session.

One topic could be: How to make it easier for users of popular distros
to get involved in testing. The "Kernel of the day" (KOTD) from
SUSE/openSUSE was mentioned recently on this list already, but I got the
impression that the existence of this repo is not well known; guess it's
the same for my own Kernel Vanilla Repositories for Fedora (those
contain packages with a quite recent mainline version; see
https://fedoraproject.org/wiki/Kernel_Vanilla_Repositories ) or the fact
that Fedora rawhide ships a recent mainline snapshot all the time. But
should distros also offer Linux-next somewhere? Or anything else? And
should the distros send experienced users upstream when they found a
regression? Or will subsystem maintainers send those users away because
they assume those kernels are not vanilla?


== Topics or vague ideas I left out on purpose ==

Here is a list of other things we could talk about, but I think better
left for a later time:

 * Kerneloops (http://oops.kernel.org/): It was discussed last year on
this list. I have no idea what the current status is. Is someone
watching & analysing it? And poking the right people when needed? (I
doubt it)

 * Regression tracking for stable kernels (many bugs only get noticed
once a new mainline version got released; at that time it might still be
easy to revert a certain patch in mainline and stable)

 * statistics: I didn't spend time to create statistics, like Rafael did
in the past. They'd be nice to have, but for now I think my time is
better spent elsewhere.

 * work towards growing the number of testers by making it easier for
them (better documentation, easier configuration, bisection scripts, ...)

 * maybe document a few procedures for those that are not regular
kernel developers (like the "When users report bugs on the Fedora
tracker that look like actual upstream bugs, what's the best way to have
those reported?" thing that Laura mentioned earlier this month in the
mail "Bug reporting feedback loop")

 * provide better services than only a plain text list of regressions on
a mailing list?

 * better documentation? for example explain the difference between bugs
and regressions somewhere to make people understand why their bugs might
get ignored, but at the same time know that we handle regressions more
seriously.

 * Should the regression tracker nag subsystem maintainers (and
reporters) more often if they are inactive? How do people for example
feel about (semi-)automatic nagging mails for regressions where there is
no progress?

 * Are the data and the format of the current reports useful at all?
If not: How to improve them?

 * regression tracking is a fair amount of work, and it's frustrating,
and people burn out. How to avoid that? Can we maybe get regression
tracking on solid ground by somehow building a healthy community around
it (containing kernel developers, distro maintainers and people that are
willing to help in their spare time) that works on regression
testing/tracking and other QA stuff?

 * how to make the Linux kernel development so good that the mainstream
distros stop their kernel forks and do what they do with Firefox: Ship
the latest stable version (users get a new version with new features
every few weeks) or a longterm branch (makes a big version jump about
once a year; see Firefox ESR).

Ugh, pretty long mail. Sorry about that. Maybe I shouldn't have looked
so closely into LWN.net articles about regression tracking and older
discussions about it.
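
For illustration only, here is a minimal sketch of the core of such a
tag-watching script: it looks at the text of one mail or commit message
and reports which regression IDs it mentions. The "Linux-Regression"
keyword and the "Corrects:" line come from the examples in this mail;
the assumption that IDs are short lowercase hex strings, and the name
scan_message, are made up for the sketch.

```python
import re

# Hypothetical tag formats, modelled on the examples in this mail.
# The bare keyword marks a new report; it must not be followed by "-<id>".
KEYWORD_RE = re.compile(r"\bLinux-Regression\b(?!-)")
# Assumption: IDs are lowercase hex strings of at least five characters.
ID_RE = re.compile(r"\bLinux-Regression-([0-9a-f]{5,})\b")
CORRECTS_RE = re.compile(
    r"^Corrects:\s*Linux-Regression-([0-9a-f]{5,})\s*$", re.MULTILINE
)


def scan_message(text):
    """Return which regression IDs a mail or commit message refers to.

    'mentioned' holds every ID found anywhere in the text, 'fixed' only
    those named on a "Corrects:" line, and 'new_report' is True when the
    bare keyword appears without an ID attached.
    """
    return {
        "mentioned": set(ID_RE.findall(text)),
        "fixed": set(CORRECTS_RE.findall(text)),
        "new_report": bool(KEYWORD_RE.search(text)),
    }
```

Feeding this function the bodies of list mails and the output of git log
would be enough to fill a small table keyed by ID; where the mails and
commits come from is deliberately left out of the sketch.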

Ciao, Thorsten

From pavel at ucw.cz  Sun Jul  2 18:14:45 2017
From: pavel at ucw.cz (Pavel Machek)
Date: Sun, 2 Jul 2017 20:14:45 +0200
Subject: [Ksummit-discuss] [TECH TOPIC] mobile phones
In-Reply-To: <20170702112826.zirafz7lbo5lnabd@earth>
References: <20170625104850.GA24717@amd>
 <87shinzkp9.fsf@notabene.neil.brown.name> <20170626083407.GA9621@amd>
 <20170627123947.krne6a2saolcndih@earth> <20170627215755.GC5250@amd>
 <20170628164502.itggbf4xuhsv3oyf@earth> <20170628211008.GA19571@amd>
 <20170702112826.zirafz7lbo5lnabd@earth>
Message-ID: <20170702181445.GB1894@xo-6d-61-c0.localdomain>

Hi!

> > > I guess your battery is not the fittest anymore?
> >
> > I guess that's one issue. (One of my batteries is actually so bad that
> > GSM modem fails with it.)
> >
> > Well, I guess Debian boots a little longer than Maemo. Plus, I believe
> > we should charge the battery from kernel by default; it will enable
> > running fsck etc, and it will mean slow userspace boot will not
> > break...
>
> That sounds like really bad battery :)

Yes, fortunately I have two others :-).

> > Interesting :-). I guess you could do really fair comparison of the
> > chipsets.
> >
> > (Binary driver -- bad Motorola :-( )
>
> Yeah :( Note, that Nokia also has binary driver for the N900 (which
> had been reverse engineered) and a different one on N950 (which has
> not been reverse engineered so far).

Yes... and another reason to keep N900.

> > > > +	int mV, mA, mOhm = 430, mVadj = 0;
> > >
> > > 430 mOhm?
> >
> > Yes, 0.43 Ohm.
>
> What's the source of this value?

Estimate on one of my batteries. Real value differs with temperature
and battery age (among other things) but this is already
significantly better than assuming internal resistance is zero.
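
To make the role of the 430 mOhm estimate concrete, here is a small
sketch of the usual internal-resistance correction: under load the
terminal voltage sags by I * R, so the resting voltage is estimated by
adding that drop back. This is only an illustration; the quoted patch
fragment does not show how generic_protect() uses mOhm, and the function
name and sign convention below are assumptions.

```python
def compensate_voltage_mv(measured_mv, current_ma, internal_mohm=430):
    """Estimate the open-circuit battery voltage in millivolts.

    measured_mv is the terminal voltage under load. current_ma is
    positive while discharging and negative while charging (a sign
    convention assumed for this sketch). internal_mohm defaults to the
    430 mOhm estimate from the thread.
    """
    # mA * mOhm gives microvolts, so divide by 1000 to get millivolts.
    drop_mv = round(current_ma * internal_mohm / 1000)
    return measured_mv + drop_mv
```

With a 500 mA load the correction is 215 mV, so a reading of 3500 mV
corresponds to roughly 3715 mV at rest, whereas assuming zero internal
resistance would leave it at 3500 mV.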

Best regards,
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

From rostedt at goodmis.org  Mon Jul  3 16:30:25 2017
From: rostedt at goodmis.org (Steven Rostedt)
Date: Mon, 3 Jul 2017 12:30:25 -0400
Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve
 regression tracking
In-Reply-To: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info>
References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info>
Message-ID: <20170703123025.7479702e@gandalf.local.home>

On Sun, 2 Jul 2017 19:51:43 +0200
Thorsten Leemhuis wrote:

> Hi! Sorry, I know I'm late -- real life (travel, day job, ...) kept me
> away from spending time on Linux kernel regression work :-/
>
> Maybe I'm taking it a bit too far for the new kid in town, but I think I
> want to propose two sessions. One for the maintainer summit, that deals
> with the most critical issues relevant to regression tracking. And one
> technical session to deal with all the other stuff. Obviously we can
> move below mentioned topics from one to the other or talk about them at
> both if we want.
>
> = [MAINTAINERS SUMMIT] Improve regression tracking =
>
>  * Follow up from last year: What to do about bugzilla.kernel.org?
> Reporters still get stranded there.
>  * How to get subsystem maintainers more involved in regression tracking
> to better make sure that reported regressions are tracked and not
> forgotten accidentally.

We should push harder for all reproducer tests to be put into selftests.
I try to do that myself (although I admit, I forget to do it here and
there, but I'm pushing myself to be better).

>  * Frustrations with regression tracking, aka how to establish
> regression tracking properly to make sure it will never go away again.

By adding reproducing tests to selftests, we can easily see what
regressions are still there.

>
> = [TECH TOPIC] Improve the kernel's quality by getting more people
> involved in regression testing and reporting =

Again, this can be answered by placing more reproducers into selftests.

>
>  * A short report on the outcome of the maintainer summit discussion;
> also pick up any topics here that were not properly discussed at the
> maintainer summit or were postponed to this session.
>  * How to get distros more involved in regression tracking; especially
> those that have a technically aware user base or normally ship up-to-date
> kernel images (and thus have a greater interest in avoiding
> regressions). I'm mainly thinking about Arch Linux, Debian, Fedora, and
> openSUSE Tumbleweed here; having Ubuntu in the boat would be good, too!
> (might be wise to talk about this on the maintainers summit as well, if
> the right people are there)
>  * How to make it easier to (ideally automatically!) track the
> current status and the progress of each regression? Are there any tools
> that could make regression tracking easier for all of us while not
> introducing much overhead for maintainers?

What is selftests? (Jeopardy answer for all of the above ;-)

>
> = Details =
>
> Below you'll find a few more words about some points mentioned above;
> there are a few other topics as well we could discuss if we want. But
> first, a few general words on regression tracking from my point of view:
>
>  * There are a lot of areas in regression tracking where things are far
> from good (read: in a bad state). That makes it easy to discuss current
> problems and their solutions for hours -- and at the same time forget
> that discussing itself doesn't get us much further (the old bugzilla
> issue mentioned in this mail is a good example). We thus IMHO should
> focus on the most important issues and lay the groundwork to establish
> regression tracking properly again, then move on to the things that
> are harder to solve.
>
>  * Regression tracking currently is quite boring and exhausting (read:
> high burn-out risk), as it involves quite a lot of manual work finding
> regressions and keeping track of their progress (and at the end of the
> day it does not feel like you achieved much). Some of that work cannot
> be automated. But quite a bit can, and that would help a great deal to
> establish regression tracking properly (currently I'm the only one doing
> it and in some development cycles I simply don't find spare time for it).
>
>  I currently don't see any existing solutions that fit well with our
> mail focused workflow and at the same time do not introduce much
> overhead for subsystem maintainers (which I assume is what everyone
> wants, as I fear solutions with much overhead won't fly at all). Ideas
> how to solve this tricky problem area are highly welcome. It's
> something that can be discussed when the aforementioned points
> "establish regression tracking properly" and "make it more easy to
> manually or automatically track the current status of a regression" come up.
>
> == What to do about bugzilla.kernel.org ==
>
> Discussed last year already; see https://lwn.net/Articles/705245/ for
> details. Situation didn't change much since then: the bugzilla instance
> was updated, but people still get stranded there as most subsystems
> ignore it. That afaics frustrates people and makes them stop testing or
> reporting bugs.
>
> Discuss how to improve things. [my2cent] Maybe a short term solution
> like this could work: Serve a static page on bugzilla.kernel.org that
> tells people where regressions/bugs for certain subsystems can be
> reported, as it most of the time is some mailing list anyway. Such a
> page could get compiled from MAINTAINERS (there is the "B:" field now
> that points to bugzilla; if it's not there, point to a mailing list; also
> explain get_maintainer.pl).
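
As a rough sketch of how the page described in the quoted paragraph
could be compiled, the following assumes the usual MAINTAINERS layout:
entries separated by blank lines, a subsystem name on the first line,
then single-letter fields such as "L:" (mailing list) and "B:" (where to
file bugs). The function name and the sample entries in the usage note
are invented for illustration.

```python
def parse_bug_targets(maintainers_text):
    """Map each subsystem name to where its bugs should be reported.

    The "B:" entries are preferred; if an entry has none, its "L:"
    mailing lists are used instead, as the proposal suggests.
    """
    targets = {}
    for block in maintainers_text.strip().split("\n\n"):
        lines = [ln for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        name = lines[0].strip()
        fields = {}
        for ln in lines[1:]:
            # Field lines look like "L:<tab>value".
            if len(ln) > 2 and ln[1] == ":":
                fields.setdefault(ln[0], []).append(ln[2:].strip())
        if fields:
            targets[name] = fields.get("B") or fields.get("L", [])
    return targets
```

For a made-up entry "EXAMPLE DRIVER" that only lists
"L: example at lists.example.org", the result points at that list; an
entry carrying a "B:" line points at the bug tracker instead. Rendering
the mapping as a static page is left out.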
>
> Leave our bugzilla reachable via bugzilla.kernel.org/frontpage (or
> something like that) for those few subsystems that use it; that's afaics
> ACPI and PM (including Cpufreq, Cpuidle, Hibernation, Suspend, ...) and
> maybe PCI (not sure) -- or should we tell them to move to
> bugzilla.freedesktop.org (or somewhere else) to get rid of our bugzilla
> in the long term and make Konstantin's life easier? Anyway: Make sure
> bugs for other subsystems can't get filed in bugzilla.kernel.org anymore,
> so that they don't get lost there. [/my2cent]
>
> == How to get subsystem maintainers more involved in regression tracking
> to [...] ==
>
> One reason why I put this up is: It would help me a lot if people told
> regressions at leemhuis.info (side note: might be wise to make a
> mailing-list that replaces this address) about regressions --
> simply CCing it on reports or answers to regression reports is enough;
> forwarding/bouncing mails there (even without additional text) is fine,
> too.
>
> The other reason I included it: This came up in last year's discussion on
> this list and it seemed some people thought we can get the subsystem
> maintainers more involved; so I thought it might be wise to discuss it.
> Might also be a good idea to discuss here how to get distro kernel
> maintainers more involved if enough are around.
>
> == How to establish regression tracking properly [...] ==
>
> This is a pretty vague topic on purpose. People seem to agree that
> regression tracking is important, but for years nobody did it (it
> stopped a little while after Rafael had to move on) and the little bit
> that I can do in my rare spare time won't help much (and I have no idea
> how long I can continue to find time for it).
>
> == Make it easier to track the progress of regression ==
>
> One of the main reasons that makes regression tracking hard currently:
> becoming aware of regressions and tracking their progress is a lot of
> manual work.
> I plan one step that hopefully makes the job a little
> easier and at the same time might allow some automation in the long
> term: ask people to include a certain keyword in their regression
> reports. Maybe something like "Linux-Regression" that doesn't get too
> many false positives when searching for it on lists and via Google
> (suggestions for a better tag welcome).
>
> In addition, I plan to hand out some form of ID for each regression I
> track and ask people to include it -- especially when they post patches
> that fix said regression or move the discussion to a new place (like
> "Corrects: Linux-Regression-d2afd"; again: suggestions welcome! Maybe I
> should just use a URL where people find details?).
>
> That way I can notice more easily when a fix for a regression hits
> linux-next or master; I also become aware if a discussion moves from
> bugzilla to LKML or from one thread to another (fingers crossed).
> Obviously it depends on cooperation of those involved.
>
> If this works out we could write a script or something that watches
> mailing lists, bug trackers and git trees for the tag in question. That
> script could fill a database and automatically do some of the tracking job.
>
> == get distros more involved ==
>
> I assume at least Ben (Debian), Laura (Fedora), and Takashi (openSUSE)
> are around, so it might be a good idea to sit together and talk
> regression tracking in general and how we could get the distros' kernel
> maintainers more involved. Even better would be to sit down beforehand to
> maybe come up with some ideas/plans we could talk about during this session.
>
> One topic could be: How to make it easier for users of popular distros
> to get involved in testing.
The "Kernel of the day" (KOTD) from > SUSE/openSUSE was mentioned recently on this list already, but I got the > impression that the existence of this repo is not well known; guess it's > the same for my own Kernel Vanilla Repositories for Fedora (those > contain packages with a quite recent mainline version; see > https://fedoraproject.org/wiki/Kernel_Vanilla_Repositories ) or the fact > that Fedora rawhide ships a recent mainline snapshot all the time. But > should distros also offer Linux-next somewhere? Or anything else? And > should the distros send experienced users upstream when they found a > regression? Or will subsystem maintainers send those users away because > they assume those kernels are not vanilla? > > > == Topics or vague ideas I left out on purpose == > > Here is a list of other things we could talk about, but I think better > left for a later time: > > * Kerneloops (http://oops.kernel.org/): It was discussed last year on > this list. I have no idea what the current status is. Is someone > watching & analysing it? And poking the right people when needed? (I > doubt it) > > * Regression tracking for stable kernels (many bugs only get noticed > once a new mainline version got released; at that time it might still be > easy to revert a certain patch in mainline and stable) > > * statistics: I didn't spend time to create statistics, like Rafael did > in the past. They'd be nice to have, but for now I think my time is > better spend elsewhere. > > * work towards growing the number of tester by making it easier for > them (better documentation, easier configuration, bisection scripts, ...) > > * maybe document a few some procedures for those that are not regular > kernel developers (like the "When users report bugs on the Fedora > tracker that look like actual upstream bugs, what's the best way to have > those reported?" 
thing that Laura mentioned earlier this month in the > mail "Bug reporting feedback loop" > > * provide better services than only a plain text list of regression on > a mailing list? > > * better documentation? for example explain the difference between bugs > and regressions somewhere to make people understand why their bugs might > get ignored, but as the same time know that we handle regressions more > seriously. > > * Should the regression tracker nag subsystem maintainers (and > reporters) more often if they are inactive? How do people for example > feel about (Semi-)Automatic nagging mails for regressions where there is > no progress? > > * Is the data and the format of the current reports show useful at all? > If not: How to improve it? > > * regression tracking is a fair amount of work, and it's frustrating, > and people burn out. How to avoid that? Can we maybe get regression > tracking on solid ground by somehow building a healthy community around > it (containing kernel developers, Distro maintainers and people that are > willing to help in their spare time) that work on regressions > testing/tracking and other QA stuff? > > * how to make the Linux kernel development so good that the mainstream > distros stop their kernel forks and do what they do with Firefox: Ship > the latest stable version (users get a new version with new features > every few weeks) or a longterm branch (makes a big version jump about > once a year; see Firefox ESR). This wont ever happen (famous last words). Distros want "stable kernels" with new features. That's not what stable is about. > > Ugh, pretty long mail. Sorry about that. Maybe I shouldn't have looked > so closely into LWN.net articles about regression tracking and older > discussions about it. Anyway, I know that selftests are not the answer for everything, but anything that has a way to reproduce a bug should be added to it. 
Sure, it may depend on various hardware and/or file systems and different configs, but if we have a central location to place all bug reproducing tests (which we do have), then we should utilize it. When it's in the kernel tree, it will be used much more often. -- Steve From dan.j.williams at intel.com Mon Jul 3 18:50:42 2017 From: dan.j.williams at intel.com (Dan Williams) Date: Mon, 3 Jul 2017 11:50:42 -0700 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170703123025.7479702e@gandalf.local.home> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> Message-ID: On Mon, Jul 3, 2017 at 9:30 AM, Steven Rostedt wrote: > On Sun, 2 Jul 2017 19:51:43 +0200 [..] >> >> Ugh, pretty long mail. Sorry about that. Maybe I shouldn't have looked >> so closely into LWN.net articles about regression tracking and older >> discussions about it. > > Anyway, I know that selftests are not the answer for everything, but > anything that has a way to reproduce a bug should be added to it. Sure, > it may depend on various hardware and/or file systems and different > configs, but if we have a central location to place all bug reproducing > tests (which we do have), then we should utilize it. I agree with Steven, and I would add that you don't necessarily need specific hardware to write a test for a driver regression, see examples in tools/testing/nvdimm. I also tend to think that back-stopping regressions with new tests helps with the burn-out problem of tracking regressions. Where building tools and tests is potentially more fulfilling than just bug tracking. 
From jkosina at suse.cz Mon Jul 3 20:41:16 2017 From: jkosina at suse.cz (Jiri Kosina) Date: Mon, 3 Jul 2017 22:41:16 +0200 (CEST) Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] Driver and/or module versions In-Reply-To: <20170630162155.GB26257@fury> References: <20170625072423.GR1248@mtr-leonro.local> <20170630162155.GB26257@fury> Message-ID: On Fri, 30 Jun 2017, Darren Hart wrote: > New features also fall into the independent tracking bucket, although > your point about feature masks could reduce that need. Is there a > definitive mechanism for the feature mask approach? I see a lot of > sysfs_filename:value key:value pairs for this kind of thing. Adding those sysfs attributes seems like exactly the thing that people will keep forgetting to do, as there is no (real) functionality depending on them. I doubt there is any better 'description' of the 'state' of the driver than SHA of the topmost commit + the tree it's related to. -- Jiri Kosina SUSE Labs From dvhart at infradead.org Mon Jul 3 21:25:35 2017 From: dvhart at infradead.org (Darren Hart) Date: Mon, 3 Jul 2017 14:25:35 -0700 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] Driver and/or module versions In-Reply-To: References: <20170625072423.GR1248@mtr-leonro.local> <20170630162155.GB26257@fury> Message-ID: <20170703212535.GA6739@fury> On Mon, Jul 03, 2017 at 10:41:16PM +0200, Jiri Kosina wrote: > On Fri, 30 Jun 2017, Darren Hart wrote: > > > New features also fall into the independent tracking bucket, although > > your point about feature masks could reduce that need. Is there a > > definitive mechanism for the feature mask approach? I see a lot of > > sysfs_filename:value key:value pairs for this kind of thing. > > Adding those sysfs attributes seems like exactly the thing that people > will keep forgetting to do, as there is no (real) functionality depending > on them. > > I doubt there is any better 'description' of the 'state' of the driver > than SHA of the topmost commit + the tree it's related to. 
This is exactly what I have been saying in my inward facing roles. Here I'm trying to make sure I'm not missing something that makes this not 100% accurate. For specific things, I could see those sysfs attributes being useful, but as you say, they are informational only. -- Darren Hart VMware Open Source Technology Center From peterz at infradead.org Tue Jul 4 14:51:10 2017 From: peterz at infradead.org (Peter Zijlstra) Date: Tue, 4 Jul 2017 16:51:10 +0200 Subject: [Ksummit-discuss] [TECH TOPIC] Pulling away from the tracing ABI quicksands In-Reply-To: <20170629221245.489760b1@gandalf.local.home> References: <152520246.5707.1498771254819.JavaMail.zimbra@efficios.com> <20170629195537.534445e7@gandalf.local.home> <20170629203224.6bf7f29a@gandalf.local.home> <20170629205218.5b9a7923@gandalf.local.home> <20170629211641.5aeb3af7@gandalf.local.home> <20170629212750.5c3542ee@gandalf.local.home> <20170629221245.489760b1@gandalf.local.home> Message-ID: <20170704145110.GD7287@worktop> Yay, tracing fight!! :/ On Thu, Jun 29, 2017 at 10:12:45PM -0400, Steven Rostedt wrote: > On Thu, 29 Jun 2017 18:51:14 -0700 > Linus Torvalds wrote: > > But yes, I was talking about something very similar to what I think > > Peter is talking about - the ability to attach a ebpf script to > > kprobes and extract data dynamically. We've supported ebpf tracepoints > > for years afaik, what is actually missing from using that for whatever > > particular extension people want to use? > > Well, I don't want to put words in his mouth, but as he's probably > currently putting mush in a baby's mouth, so I'll do it anyway. ;-) We > were talking about making the static tracepoints more "dynamic". I'm not > sure he's ever used eBPF with tracing. So my concerns/objections are two-fold: - I want only a single static tracepoint in the code. - I want only a single 'event' associated with this in userspace. 
(in particular I only see confusion happening when we have: sched_switch_fair, sched_switch_rt, sched_switch_deadline events for the exact same event; people will forget to enable one or more and wonder WTF they have holes in their traces) These are not strange constraints / demands in my book. Just turns out it's 'difficult' to pull off or something. I'm in fact fine with simply adding bits to the one tracepoint we have; although others (that'd be you Steve) are not because expensive. Further complications seem to stem from the fact that I use the tracefs interface exclusively. I don't know how to use perf or trace-cmd or any of that newfangled stuff to do tracing -- nor do I really care, it works for me (same why I'm happy with sysvinit, I don't _want_ to have to relearn my 20+ year old sysadmin skillz, there's better things in life to spend time on, that baby you mentioned for example). So in that same vein, I'd be entirely helpless using eBPF to do tracing, that's even more complicated. That said, I don't typically need this crud anyway, I just change my kernel and rebuild, reboot and am happy, that's far easier than trying to figure out how eBPF works. In any case, baby vomit is more fun than this subject :-) From linux at leemhuis.info Tue Jul 4 19:03:22 2017 From: linux at leemhuis.info (Thorsten Leemhuis) Date: Tue, 4 Jul 2017 21:03:22 +0200 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170703123025.7479702e@gandalf.local.home> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> Message-ID: On 03.07.2017 18:30, Steven Rostedt wrote: > On Sun, 2 Jul 2017 19:51:43 +0200 > Thorsten Leemhuis wrote: >> * How to get subsystems maintainer involved more in regression tracking >> to better make sure that reported regressions are tracked and not >> forgotten accidentally. > We should push harder for all reproducer tests to be put into > selftests.
I try to do that myself [...] > [...] > By adding reproducing tests to selftests, we can easily see what > regressions are still there. > [...] > What is selftests? (Jeopardy answer for all of the above ;-) Sure, writing and running selftests is a good idea. But as you said yourself in the later part of your mail: it won't help much in situations where the kernel (or a selftest) needs to run on a certain hardware or a specific (and maybe rare or complex) configuration. Sadly a lot of the regressions in my recent reports were of this kind afaics :-/ In fact I got the impression that most of the regressions that might get caught by selftests were directly handled by the subsystem maintainer and never made it to me or my reports -- and thus I can't ask maintainers to write selftests. *If* I got better aware of those problems I (a) could make sure they are not forgotten and (b) sooner or later could publicly state something like "hey, you had ten regressions recently in your subsystem where writing a selftest might have been a good idea, but you didn't even write one -- why?" (if we want something like that). > [?] >> * how to make the Linux kernel development so good that the mainstream >> distros stop their kernel forks and do what they do with Firefox: Ship >> the latest stable version (users get a new version with new features >> every few weeks) or a longterm branch (makes a big version jump about >> once a year; see Firefox ESR). Hehe, I maybe left the field "regression tracking" to much here and wandered too far into QA territory. > This wont ever happen (famous last words). Distros want "stable > kernels" with new features. Ha, yes, it's a long shot (and maybe more a vague idea to work towards to). And maybe Debian stable and RHEL will always use the model they use today. But Fedora, rolling release distros (Tumbleweed, Arch, ...), and some others are updating to the latest Linux kernel release every few weeks already and it works fine for them. 
Maybe we can get Ubuntu and others to follow sooner or later. Sure, for some people a version jump to a major new kernel release will sound crazy, but when Linus introduced the current development scheme a lot of people also said "that will never fly" -- that was 13 years ago now and it works quite well. The situation was similar with Firefox as well. > That's not what stable is about. That afaics (disclaimer: English is not my mother tongue) depends on the interpretation of the word, as it can mean "nothing changes" or "rock solid/reliable" (even when two people have a "stable relationship" it does not mean that nothing changes between them...). >> Ugh, pretty long mail. Sorry about that. Maybe I shouldn't have looked >> so closely into LWN.net articles about regression tracking and older >> discussions about it. > Anyway, I know that selftests are not the answer for everything, but > anything that has a way to reproduce a bug should be added to it. Sure, > it may depend on various hardware and/or file systems and different > configs, but if we have a central location to place all bug reproducing > tests (which we do have), then we should utilize it. > When it's in the kernel tree, it will be used much more often. 
+1 Ciao, Thorsten From rostedt at goodmis.org Wed Jul 5 12:45:28 2017 From: rostedt at goodmis.org (Steven Rostedt) Date: Wed, 5 Jul 2017 08:45:28 -0400 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> Message-ID: <20170705084528.67499f8c@gandalf.local.home> On Tue, 4 Jul 2017 21:03:22 +0200 Thorsten Leemhuis wrote: > On 03.07.2017 18:30, Steven Rostedt wrote: > > On Sun, 2 Jul 2017 19:51:43 +0200 > > Thorsten Leemhuis wrote: > >> * How to get subsystems maintainer involved more in regression tracking > >> to better make sure that reported regressions are tracked and not > >> forgotten accidentally. > > We should push harder for all reproducer tests to be put into > > selftests. I try to do that myself [...] > > [...] > > By adding reproducing tests to selftests, we can easily see what > > regressions are still there. > > [...] > > What is selftests? (Jeopardy answer for all of the above ;-) > > Sure, writing and running selftests is a good idea. But as you said > yourself in the later part of your mail: it won't help much in > situations where the kernel (or a selftest) needs to run on a certain > hardware or a specific (and maybe rare or complex) configuration. Sadly > a lot of the regressions in my recent reports were of this kind afaics :-/ > > In fact I got the impression that most of the regressions that might get > caught by selftests were directly handled by the subsystem maintainer > and never made it to me or my reports -- and thus I can't ask > maintainers to write selftests. *If* I got better aware of those > problems I (a) could make sure they are not forgotten and (b) sooner or > later could publicly state something like "hey, you had ten regressions > recently in your subsystem where writing a selftest might have been a > good idea, but you didn't even write one -- why?" 
(if we want something > like that). I'm betting there's a lot of reproducer code that never makes it into a test. How do we solve that? Perhaps we need people looking at LKML for any signs of "I did this, and it caused a bug" or "Here's a test case which can trigger the bug". Each of these instances should end up in selftests, and I'm sure they do not. We can't do much for special hardware, even though those tests should still be in the selftests for those that have the hardware, but we can do something about special configs. Perhaps selftests should have a "config test" section. I have that in my own tests, but I use ktest to build them. > > > [...] > >> * how to make the Linux kernel development so good that the mainstream > >> distros stop their kernel forks and do what they do with Firefox: Ship > >> the latest stable version (users get a new version with new features > >> every few weeks) or a longterm branch (makes a big version jump about > >> once a year; see Firefox ESR). > > Hehe, I maybe left the field "regression tracking" too much here and > wandered too far into QA territory. > > > This won't ever happen (famous last words). Distros want "stable > > kernels" with new features. > > Ha, yes, it's a long shot (and maybe more a vague idea to work towards > to). And maybe Debian stable and RHEL will always use the model they use > today. But Fedora, rolling release distros (Tumbleweed, Arch, ...), and > some others are updating to the latest Linux kernel release every few > weeks already and it works fine for them. Maybe we can get Ubuntu and > others to follow sooner or later. > > Sure, for some people a version jump to a major new kernel release will > sound crazy, but when Linus introduced the current development scheme a > lot of people also said "that will never fly" -- that was 13 years ago > now and it works quite well. The situation was similar with Firefox as well. > > > That's not what stable is about.
> > That afaics (disclaimer: English is not my mother tongue) depends on the > interpretation of the word, as it can mean "nothing changes" or "rock > solid/reliable" (even when two people have a "stable relationship" it > does not mean that nothing changes between them...). Nothing to do with what language your mother tongue is ;-) When the stable releases were created, there was some pretty strict requirements for what should go into stable. Of course the requirements have changed throughout the years. But there are big differences in what Red Hat considers something "stable" and what the Linux stable releases consider to be stable. That is where I meant that things wont change. -- Steve From rostedt at goodmis.org Wed Jul 5 13:27:57 2017 From: rostedt at goodmis.org (Steven Rostedt) Date: Wed, 5 Jul 2017 09:27:57 -0400 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> Message-ID: <20170705092757.63dc2328@gandalf.local.home> On Wed, 5 Jul 2017 09:09:51 -0400 Carlos O'Donell wrote: > This problem is a reflection of our own explicit or implicit priorities. > The priorities of developers and reviewers needs to change to make an > impact on the problem. This is a hard problem. I 100% agree. > > As a concrete action item, glibc core developers took a harder stance on > (a) all user-visible bugs need a bug # (forces people to think about the Unfortunately, we don't have a good system for a "bug #". Most kernel developers hate bugzilla, and I think that includes Linus ;-) Which means, unless Linus builds us a new bug tracking system, there wont be any mandate for it. 
> problem and file a coherent public bug about it) (b) all bugs needs a > regression test if possible, (c) and if not possible we need to extend I would love all bug fixes to come with a test (when possible). > the testing framework to make it possible (we've started using kernel > namespaces to create isolated test configurations). Well, we have a selftest directory that should include all of these. And most people run them on either a test box or a VM. > > This change in reviewer priorities has had a noticeable impact on developer > priorities over the last 5 years. Timelines for this problem will be > measured in years. > Your "b" above is what I would like to push. But who's going to enforce this? With 10,000 changes per release, and a lot of them are fixes, the best we can do is the honor system. Start shaming people that don't have a regression test along with a Fixes tag (but we don't want people to fix bugs without adding that tag either). There is a fine line one must walk between getting people to change their approaches to bugs and regression tests, and pissing them off where they start doing the opposite of what would be best for the community. -- Steve From greg at kroah.com Wed Jul 5 14:06:07 2017 From: greg at kroah.com (Greg KH) Date: Wed, 5 Jul 2017 16:06:07 +0200 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705092757.63dc2328@gandalf.local.home> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> Message-ID: <20170705140607.GA30187@kroah.com> On Wed, Jul 05, 2017 at 09:27:57AM -0400, Steven Rostedt wrote: > Your "b" above is what I would like to push. But who's going to enforce > this? With 10,000 changes per release, and a lot of them are fixes, the > best we can do is the honor system. 
Start shaming people that don't > have a regression test along with a Fixes tag (but we don't want people > to fix bugs without adding that tag either). There is a fine line one > must walk between getting people to change their approaches to bugs and > regression tests, and pissing them off where they start doing the > opposite of what would be best for the community. I would bet, for the huge majority of our fixes, they are fixes for specific hardware, or workarounds for specific hardware issues. Now writing tests for those is not an impossible task (look at what the i915 developers have), but it is very very hard overall, especially if the base infrastructure isn't there to do it. For specific examples, here's the shortlog for fixes that went into drivers/usb/host/ for 4.12 after 4.12-rc1 came out. Do you know of a way to write a test for these types of things? usb: xhci: ASMedia ASM1042A chipset need shorts TX quirk usb: xhci: Fix USB 3.1 supported protocol parsing usb: host: xhci-plat: propagate return value of platform_get_irq() xhci: Fix command ring stop regression in 4.11 xhci: remove GFP_DMA flag from allocation USB: xhci: fix lock-inversion problem usb: host: xhci-ring: don't need to clear interrupt pending for MSI enabled hcd usb: host: xhci-mem: allocate zeroed Scratchpad Buffer xhci: apply PME_STUCK_QUIRK and MISSING_CAS quirk for Denverton usb: xhci: trace URB before giving it back instead of after USB: host: xhci: use max-port define USB: ehci-platform: fix companion-device leak usb: r8a66597-hcd: select a different endpoint on timeout usb: r8a66597-hcd: decrease timeout And look at the commits with the "Fixes:" tag in it, I do, I read every one of them. See if writing a test for the majority of them would even be possible... 
I don't mean to poo-poo the idea, but please realize that around 75% of the kernel is hardware/arch support, so that means that 75% of the changes/fixes deal with hardware things (yes, change is in direct correlation to size of the codebase in the tree, strange but true). If only I had a subsystem that didn't have to deal with hardware, that must be so easy to work with :) thanks, greg k-h From rostedt at goodmis.org Wed Jul 5 14:33:35 2017 From: rostedt at goodmis.org (Steven Rostedt) Date: Wed, 5 Jul 2017 10:33:35 -0400 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705140607.GA30187@kroah.com> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> Message-ID: <20170705103335.0cbd9984@gandalf.local.home> On Wed, 5 Jul 2017 16:06:07 +0200 Greg KH wrote: > On Wed, Jul 05, 2017 at 09:27:57AM -0400, Steven Rostedt wrote: > > Your "b" above is what I would like to push. But who's going to enforce > > this? With 10,000 changes per release, and a lot of them are fixes, the > > best we can do is the honor system. Start shaming people that don't > > have a regression test along with a Fixes tag (but we don't want people > > to fix bugs without adding that tag either). There is a fine line one > > must walk between getting people to change their approaches to bugs and > > regression tests, and pissing them off where they start doing the > > opposite of what would be best for the community. > > I would bet, for the huge majority of our fixes, they are fixes for > specific hardware, or workarounds for specific hardware issues. 
Now > writing tests for those is not an impossible task (look at what the i915 > developers have), but it is very very hard overall, especially if the > base infrastructure isn't there to do it. > > For specific examples, here's the shortlog for fixes that went into > drivers/usb/host/ for 4.12 after 4.12-rc1 came out. Do you know of a > way to write a test for these types of things? > usb: xhci: ASMedia ASM1042A chipset need shorts TX quirk > usb: xhci: Fix USB 3.1 supported protocol parsing > usb: host: xhci-plat: propagate return value of platform_get_irq() > xhci: Fix command ring stop regression in 4.11 > xhci: remove GFP_DMA flag from allocation > USB: xhci: fix lock-inversion problem > usb: host: xhci-ring: don't need to clear interrupt pending for MSI enabled hcd > usb: host: xhci-mem: allocate zeroed Scratchpad Buffer > xhci: apply PME_STUCK_QUIRK and MISSING_CAS quirk for Denverton > usb: xhci: trace URB before giving it back instead of after > USB: host: xhci: use max-port define > USB: ehci-platform: fix companion-device leak > usb: r8a66597-hcd: select a different endpoint on timeout > usb: r8a66597-hcd: decrease timeout > > And look at the commits with the "Fixes:" tag in it, I do, I read every > one of them. See if writing a test for the majority of them would even > be possible... > > I don't mean to poo-poo the idea, but please realize that around 75% of > the kernel is hardware/arch support, so that means that 75% of the > changes/fixes deal with hardware things (yes, change is in direct > correlation to size of the codebase in the tree, strange but true). I would say that if it's for a specific hardware, then it's really up to the maintainer if there should be a test or not. As a lot of these is just to deal with some quirk or non standard that the hardware does. But are these regressions, or just some feature that's been broken on that hardware since its conception? 
That is, Thorsten this is more for you, how many real regressions are in hardware? A bug that's been there forever is not a regression. It's a feature ;-) A regression is something that used to work and now does not. Is that number still as high with hardware? That is probably where tests could be focused. I'm worried more about infrastructure too. I would look at general functionality of say USB, to see if something can be written to test a device. Using the one change above that actually mentions "regression" would it be possible to test completion codes? (I have no idea, I only read the change log and I'm speaking out of my derrière) If we have a bunch of generic tests that can test hardware (general video tests, USB tests, network cards, etc) and people ran these on their own hardware, and it were to trigger a failure, then it would be easier for users to report these issues to the maintainers. And these would be easier to find and fix. No test should be written for a single specific hardware. It should be a general functionality that different hardware can execute. > > If only I had a subsystem that didn't have to deal with hardware, that > must be so easy to work with :) *smack*!
;-) -- Steve From broonie at kernel.org Wed Jul 5 14:33:41 2017 From: broonie at kernel.org (Mark Brown) Date: Wed, 5 Jul 2017 15:33:41 +0100 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705140607.GA30187@kroah.com> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> Message-ID: <20170705143341.oees22k2snhtmkxo@sirena.org.uk> On Wed, Jul 05, 2017 at 04:06:07PM +0200, Greg KH wrote: > I don't mean to poo-poo the idea, but please realize that around 75% of > the kernel is hardware/arch support, so that means that 75% of the > changes/fixes deal with hardware things (yes, change is in direct > correlation to size of the codebase in the tree, strange but true). Then add in all the fixes for concurrency/locking issues and so on that're hard to reliably reproduce as well...
From rostedt at goodmis.org Wed Jul 5 14:36:58 2017 From: rostedt at goodmis.org (Steven Rostedt) Date: Wed, 5 Jul 2017 10:36:58 -0400 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705143341.oees22k2snhtmkxo@sirena.org.uk> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705143341.oees22k2snhtmkxo@sirena.org.uk> Message-ID: <20170705103658.226099c6@gandalf.local.home> On Wed, 5 Jul 2017 15:33:41 +0100 Mark Brown wrote: > On Wed, Jul 05, 2017 at 04:06:07PM +0200, Greg KH wrote: > > > I don't mean to poo-poo the idea, but please realize that around 75% of > > the kernel is hardware/arch support, so that means that 75% of the > > changes/fixes deal with hardware things (yes, change is in direct > > correlation to size of the codebase in the tree, strange but true). > > Then add in all the fixes for concurrency/locking issues and so on > that're hard to reliably reproduce as well...
All tests should be run with lockdep enabled ;-) Which a surprising few developers appear to do :-p -- Steve From James.Bottomley at HansenPartnership.com Wed Jul 5 14:50:28 2017 From: James.Bottomley at HansenPartnership.com (James Bottomley) Date: Wed, 05 Jul 2017 07:50:28 -0700 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705103658.226099c6@gandalf.local.home> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705143341.oees22k2snhtmkxo@sirena.org.uk> <20170705103658.226099c6@gandalf.local.home> Message-ID: <1499266228.3668.10.camel@HansenPartnership.com> On Wed, 2017-07-05 at 10:36 -0400, Steven Rostedt wrote: > On Wed, 5 Jul 2017 15:33:41 +0100 > Mark Brown wrote: > > > > > On Wed, Jul 05, 2017 at 04:06:07PM +0200, Greg KH wrote: > > > > > > > > I don't mean to poo-poo the idea, but please realize that around > > > 75% of the kernel is hardware/arch support, so that means that > > > 75% of the changes/fixes deal with hardware things (yes, change > > > is in direct correlation to size of the codebase in the tree, > > > strange but true). > > > > Then add in all the fixes for concurrency/locking issues and so on > > that're hard to reliably reproduce as well... > > All tests should be run with lockdep enabled ;-) Which a surprising > few developers appear to do :-p Lockdep checks the locking hierarchies and makes assumptions about them which it then validates ... it doesn't tell you if the data you think you're protecting was accessed outside the lock, which is the usual source of concurrency problems. In other words lockdep is useful but it's not a panacea.
James From broonie at kernel.org Wed Jul 5 14:52:20 2017 From: broonie at kernel.org (Mark Brown) Date: Wed, 5 Jul 2017 15:52:20 +0100 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705103335.0cbd9984@gandalf.local.home> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705103335.0cbd9984@gandalf.local.home> Message-ID: <20170705145220.5u3qpxs45sbbpzpx@sirena.org.uk> On Wed, Jul 05, 2017 at 10:33:35AM -0400, Steven Rostedt wrote: > That is, Thorsten this is more for you, how much real regressions are in > hardware? A bug that's been there forever is not a regression. It's a > feature ;-) A regression is something that use to work and now does > not. Is that number still as high with hardware? Those probably could > be where tests can be focused on. A relatively common case IME is things that were always bugs but depend on some external thing to become visible, like someone trying to use a device in a slightly different way, doing more detailed testing of some kind or some subsystem change. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From rostedt at goodmis.org Wed Jul 5 14:56:51 2017 From: rostedt at goodmis.org (Steven Rostedt) Date: Wed, 5 Jul 2017 10:56:51 -0400 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <1499266228.3668.10.camel@HansenPartnership.com> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705143341.oees22k2snhtmkxo@sirena.org.uk> <20170705103658.226099c6@gandalf.local.home> <1499266228.3668.10.camel@HansenPartnership.com> Message-ID: <20170705105651.5da9c969@gandalf.local.home> On Wed, 05 Jul 2017 07:50:28 -0700 James Bottomley wrote: > On Wed, 2017-07-05 at 10:36 -0400, Steven Rostedt wrote: > > On Wed, 5 Jul 2017 15:33:41 +0100 > > Mark Brown wrote: > > > > > > > > On Wed, Jul 05, 2017 at 04:06:07PM +0200, Greg KH wrote: > > > > > > > > > > > I don't mean to poo-poo the idea, but please realize that around > > > > 75% of the kernel is hardware/arch support, so that means that > > > > 75% of the changes/fixes deal with hardware things (yes, change > > > > is in direct correlation to size of the codebase in the tree, > > > > strange but true). > > > > > > Then add in all the fixes for concurrency/locking issues and so on > > > that're hard to reliably reproduce as well... > > > > All tests should be run with lockdep enabled ;-) Which a surprising > > few developers appear to do :-p > > Lockdep checks the locking hierarchies and makes assumptions about them > which it then validates ... it doesn't tell you if the data you think We should probably look at adding infrastructure that helps in that. RCU already has a lot there to help know if data is being protected by RCU or not.
Hmm, maybe we could add a __rcu-like type that we can associate protected data with, where a config can associate access to a variable with a lock being held? > you're protecting was accessed outside the lock, which is the usual > source of concurrency problems. In other words lockdep is useful but > it's not a panacea. Still not an excuse to not have lockdep enabled during tests. -- Steve From James.Bottomley at HansenPartnership.com Wed Jul 5 15:09:49 2017 From: James.Bottomley at HansenPartnership.com (James Bottomley) Date: Wed, 05 Jul 2017 08:09:49 -0700 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705105651.5da9c969@gandalf.local.home> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705143341.oees22k2snhtmkxo@sirena.org.uk> <20170705103658.226099c6@gandalf.local.home> <1499266228.3668.10.camel@HansenPartnership.com> <20170705105651.5da9c969@gandalf.local.home> Message-ID: <1499267389.3668.16.camel@HansenPartnership.com> On Wed, 2017-07-05 at 10:56 -0400, Steven Rostedt wrote: > On Wed, 05 Jul 2017 07:50:28 -0700 > James Bottomley wrote: > > > > > On Wed, 2017-07-05 at 10:36 -0400, Steven Rostedt wrote: > > > > > > On Wed, 5 Jul 2017 15:33:41 +0100 > > > Mark Brown wrote: > > > > > > > > > > > > > > > On Wed, Jul 05, 2017 at 04:06:07PM +0200, Greg KH wrote: > > > > > > > > > > > > > > > > > > > I don't mean to poo-poo the idea, but please realize that > > > > > around 75% of the kernel is hardware/arch support, so that > > > > > means that 75% of the changes/fixes deal with hardware things > > > > > (yes, change is in direct correlation to size of the codebase > > > > > in the tree, strange but true).
> > > > > > > > Then add in all the fixes for concurrency/locking issues and so > > > > on that're hard to reliably reproduce as well... > > > > > > All tests should be run with lockdep enabled ;-) Which a > > > surprising few developers appear to do :-p > > > > Lockdep checks the locking hierarchies and makes assumptions about > > them which it then validates ... it doesn't tell you if the data > > you think > > We should probably look at adding infrastructure that helps in that. > RCU already has a lot of there to help know if data is being > protected by RCU or not. > > Hmm, maybe we could add a __rcu like type that we can associate > protected data with, where a config can associate access to a > variable with a lock being held? That's about 10x more complex than the releases/acquires/must_hold annotation, which we have fairly dismal coverage on. If you remember the hotplug annotations, which were a shining example: there's a limit of complexity before any annotation system simply becomes a make-work tyranny. > > you're protecting was accessed outside the lock, which is the usual > > source of concurrency problems. In other words lockdep is useful > > but it's not a panacea. > > Still not an excuse to not have lockdep enabled during tests. OK, what makes you think lockdep isn't enabled? Since Kconfig is so complex, I usually use a distro config ... they have it enabled (or at least openSUSE does), so it's enabled for everything I do.
James From linux at roeck-us.net Wed Jul 5 15:16:33 2017 From: linux at roeck-us.net (Guenter Roeck) Date: Wed, 5 Jul 2017 08:16:33 -0700 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705140607.GA30187@kroah.com> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> Message-ID: On 07/05/2017 07:06 AM, Greg KH wrote: > On Wed, Jul 05, 2017 at 09:27:57AM -0400, Steven Rostedt wrote: >> Your "b" above is what I would like to push. But who's going to enforce >> this? With 10,000 changes per release, and a lot of them are fixes, the >> best we can do is the honor system. Start shaming people that don't >> have a regression test along with a Fixes tag (but we don't want people >> to fix bugs without adding that tag either). There is a fine line one >> must walk between getting people to change their approaches to bugs and >> regression tests, and pissing them off where they start doing the >> opposite of what would be best for the community. > > I would bet, for the huge majority of our fixes, they are fixes for > specific hardware, or workarounds for specific hardware issues. Now > writing tests for those is not an impossible task (look at what the i915 > developers have), but it is very very hard overall, especially if the > base infrastructure isn't there to do it. > > For specific examples, here's the shortlog for fixes that went into > drivers/usb/host/ for 4.12 after 4.12-rc1 came out. Do you know of a > way to write a test for these types of things? 
> usb: xhci: ASMedia ASM1042A chipset need shorts TX quirk > usb: xhci: Fix USB 3.1 supported protocol parsing > usb: host: xhci-plat: propagate return value of platform_get_irq() > xhci: Fix command ring stop regression in 4.11 > xhci: remove GFP_DMA flag from allocation > USB: xhci: fix lock-inversion problem > usb: host: xhci-ring: don't need to clear interrupt pending for MSI enabled hcd > usb: host: xhci-mem: allocate zeroed Scratchpad Buffer > xhci: apply PME_STUCK_QUIRK and MISSING_CAS quirk for Denverton > usb: xhci: trace URB before giving it back instead of after > USB: host: xhci: use max-port define > USB: ehci-platform: fix companion-device leak > usb: r8a66597-hcd: select a different endpoint on timeout > usb: r8a66597-hcd: decrease timeout > > And look at the commits with the "Fixes:" tag in it, I do, I read every > one of them. See if writing a test for the majority of them would even > be possible... > > I don't mean to poo-poo the idea, but please realize that around 75% of > the kernel is hardware/arch support, so that means that 75% of the > changes/fixes deal with hardware things (yes, change is in direct > correlation to size of the codebase in the tree, strange but true). > The reproducers for several of the usb fixes I submitted recently took hours of stress test to reproduce the underlying problems. I have one more to fix which takes days to reproduce, if at all (I have seen that problem only two or three times during weeks of stress test). Due to the nature of the problems, reproducing them heavily depended on the underlying hardware. None of the reproducers can guarantee that the problem is fixed; they are intended to show the problem, not that it is fixed. This happens a lot with race conditions - in many cases it is impossible to prove that the problem is fixed; one can only prove that it still exists. 
Echoing what you said, I have no idea how it would even be possible to write unit tests to verify if the problems I fixed are really fixed. Several of the fixes I have submitted are based on single-instance error logs with no reproducer. Many others are compile time fixes or fix problems found with code inspection (manual or automatic). If we start shaming people for not providing unit tests, all we'll accomplish is that people will stop providing bug fixes. Guenter From broonie at kernel.org Wed Jul 5 15:20:26 2017 From: broonie at kernel.org (Mark Brown) Date: Wed, 5 Jul 2017 16:20:26 +0100 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <1499267389.3668.16.camel@HansenPartnership.com> References: <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705143341.oees22k2snhtmkxo@sirena.org.uk> <20170705103658.226099c6@gandalf.local.home> <1499266228.3668.10.camel@HansenPartnership.com> <20170705105651.5da9c969@gandalf.local.home> <1499267389.3668.16.camel@HansenPartnership.com> Message-ID: <20170705152026.rkw73q2f6xmiju37@sirena.org.uk> On Wed, Jul 05, 2017 at 08:09:49AM -0700, James Bottomley wrote: > On Wed, 2017-07-05 at 10:56 -0400, Steven Rostedt wrote: > > James Bottomley wrote: > > > you're protecting was accessed outside the lock, which is the usual > > > source of concurrency problems. In other words lockdep is useful > > > but it's not a panacea. > > Still not an excuse to not have lockdep enabled during tests. > OK, what makes you think lockdep isn't enabled? Since Kconfig is so > complex, I usually use a distro config ... they have it enabled (or at > least openSUSE does), so it's enabled for everything I do. Yeah, I see enough reports with it in embedded contexts to make me think people use it there. I know I tend to have it turned on most of the time.
The concurrency stuff I'm thinking of here is more the things you're mentioning with just not taking locks at all when they are needed or concurrency with hardware. -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From rostedt at goodmis.org Wed Jul 5 15:20:47 2017 From: rostedt at goodmis.org (Steven Rostedt) Date: Wed, 5 Jul 2017 11:20:47 -0400 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <1499267389.3668.16.camel@HansenPartnership.com> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705143341.oees22k2snhtmkxo@sirena.org.uk> <20170705103658.226099c6@gandalf.local.home> <1499266228.3668.10.camel@HansenPartnership.com> <20170705105651.5da9c969@gandalf.local.home> <1499267389.3668.16.camel@HansenPartnership.com> Message-ID: <20170705112047.23ee09f6@gandalf.local.home> On Wed, 05 Jul 2017 08:09:49 -0700 James Bottomley wrote: > > > you're protecting was accessed outside the lock, which is the usual > > > source of concurrency problems. In other words lockdep is useful > > > but it's not a panacea. > > > > Still not an excuse to not have lockdep enabled during tests. > > OK, what makes you think lockdep isn't enabled? Since Kconfig is so > complex, I usually use a distro config ... they have it enabled (or at > least openSUSE does), so it's enabled for everything I do. openSUSE has it enabled? I hope not for its production config, as lockdep has a huge performance penalty. I'm thinking you don't have it enabled. What config are you looking at?
The actual config that does the testing of locks is CONFIG_PROVE_LOCKING, which selects LOCKDEP to be compiled in. -- Steve From rostedt at goodmis.org Wed Jul 5 15:27:07 2017 From: rostedt at goodmis.org (Steven Rostedt) Date: Wed, 5 Jul 2017 11:27:07 -0400 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> Message-ID: <20170705112707.54d7f345@gandalf.local.home> On Wed, 5 Jul 2017 08:16:33 -0700 Guenter Roeck wrote: > The reproducers for several of the usb fixes I submitted recently took hours of > stress test to reproduce the underlying problems. I have one more to fix which > takes days to reproduce, if at all (I have seen that problem only two or three > times during weeks of stress test). Due to the nature of the problems, reproducing > them heavily depended on the underlying hardware. None of the reproducers can > guarantee that the problem is fixed; they are intended to show the problem, > not that it is fixed. This happens a lot with race conditions - in many cases > it is impossible to prove that the problem is fixed; one can only prove that > it still exists. > > Echoing what you said, I have no idea how it would even be possible to write > unit tests to verify if the problems I fixed are really fixed. > > Several of the fixes I have submitted are based on single-instance error logs with > no reproducer. Many others are compile time fixes or fix problems found with code > inspection (manual or automatic). > > If we start shaming people for not providing unit tests, all we'll accomplish is > that people will stop providing bug fixes. I need to be clearer on this. 
What I meant was, if there's a bug where someone has a test that easily reproduces the bug, then if there's not a test added to selftests for said bug, then we should shame those into doing so. A bug that is found by inspection or hard to reproduce test cases are not applicable, as they don't have tests that can show a regression. And I'm betting that those bugs are NOT REGRESSIONS! Most likely are bugs that always existed, but because of the unpredictable hitting of the bug (as you said, it required hours of stress tests to reproduce), the bug was not previously hit during development. That's not a regression, that's a feature. Are we tracking regressions or just simply bugs? -- Steve From carlos at redhat.com Wed Jul 5 13:09:51 2017 From: carlos at redhat.com (Carlos O'Donell) Date: Wed, 5 Jul 2017 09:09:51 -0400 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705084528.67499f8c@gandalf.local.home> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> Message-ID: <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> On 07/05/2017 08:45 AM, Steven Rostedt wrote: > I'm betting there's a lot of reproducer code that never makes it into a > test. How do we solve that? Perhaps we need people looking at LKML for > any signs "I did this, and it caused a bug" or "Here's a test case > which can trigger the bug". Each of these instances should end up in > selftests, and I'm sure they are not. > > We can't do much for special hardware, even though those tests should > still be in the selftests for those that have the hardware, but we can > do something about special configs. Perhaps selfttests should have a > "config test" section. I have that in my own tests, but I use ktest to > build them. This problem is a reflection of our own explicit or implicit priorities. 
The priorities of developers and reviewers needs to change to make an impact on the problem. This is a hard problem. As a concrete action item, glibc core developers took a harder stance on (a) all user-visible bugs need a bug # (forces people to think about the problem and file a coherent public bug about it) (b) all bugs needs a regression test if possible, (c) and if not possible we need to extend the testing framework to make it possible (we've started using kernel namespaces to create isolated test configurations). This change in reviewer priorities has had a noticeable impact on developer priorities over the last 5 years. Timelines for this problem will be measured in years. -- Cheers, Carlos. From carlos at redhat.com Wed Jul 5 14:06:24 2017 From: carlos at redhat.com (Carlos O'Donell) Date: Wed, 5 Jul 2017 10:06:24 -0400 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705092757.63dc2328@gandalf.local.home> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> Message-ID: <9b377a08-bf38-b41e-040c-41cb078bcfc3@redhat.com> On 07/05/2017 09:27 AM, Steven Rostedt wrote: >> As a concrete action item, glibc core developers took a harder stance on >> (a) all user-visible bugs need a bug # (forces people to think about the > > Unfortunately, we don't have a good system for a "bug #". Most kernel > developers hate bugzilla, and I think that includes Linus ;-) Which > means, unless Linus builds us a new bug tracking system, there wont be > any mandate for it. Use the XMLRPC API to build a better interface for kernel developers? Our "fixed bugs" list is automatically culled via XMLRPC to generate our release announcement with "fixed bugs." The bug # mandate has had a few key effects. 
It allows non-developers to search for old similar regressions in an easier fashion than having to trawl the mailing list for incomprehensible (to them) discussions about semantics. The bugs are described and talked about in terms of user facing aspects, not internal implementation details. Regressed bugs can be reopened and discussed on the mailing list with links to the discussions and summaries of conclusions. All of this means we have a cleaner, clearer, description of the problem from the user side. This again needs priority from a group of people for whom time is precious, so you have to get buy in from them. I don't think (a) is needed, but the glibc community found it helpful. >> problem and file a coherent public bug about it) (b) all bugs needs a >> regression test if possible, (c) and if not possible we need to extend > > I would love all bug fixes to come with a test (when possible). We have lots of hardware-specific tests that are marked UNSUPPORTED if say you're not running on AVX512 enabled hardware. >> the testing framework to make it possible (we've started using kernel >> namespaces to create isolated test configurations). > > Well, we have a selftest directory that should include all of these. > And most people run them on either a test box or a VM. Improving the test infrastructure must also be a priority, otherwise you will grow to the limit of that infrastructure. >> This change in reviewer priorities has had a noticeable impact on developer >> priorities over the last 5 years. Timelines for this problem will be >> measured in years. > > Your "b" above is what I would like to push. But who's going to enforce > this? With 10,000 changes per release, and a lot of them are fixes, the > best we can do is the honor system. Start shaming people that don't > have a regression test along with a Fixes tag (but we don't want people > to fix bugs without adding that tag either). 
There is a fine line one > must walk between getting people to change their approaches to bugs and > regression tests, and pissing them off where they start doing the > opposite of what would be best for the community. I did say "hard problem" earlier didn't I? * Start with yourself. * For everyone you know well, and have met in person, be brutal and require them to submit regression tests with their bug fixes. These people are already committed to getting their fixes in and they will understand you are making an example of them. * For everyone you don't know well, be gentle, and begin reminding them you need a regression test, and if you feel generous try to write one yourself for them. Often the act of writing such a test will show you how hard it is, and what is missing from your infrastructure to make this easy, because if it was easy everyone would do it. YMMV. -- Cheers, Carlos. From carlos at redhat.com Wed Jul 5 14:28:30 2017 From: carlos at redhat.com (Carlos O'Donell) Date: Wed, 5 Jul 2017 10:28:30 -0400 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705140607.GA30187@kroah.com> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> Message-ID: <6401b327-cc2c-5e0a-716b-0b9ea70adcb0@redhat.com> On 07/05/2017 10:06 AM, Greg KH wrote: > I don't mean to poo-poo the idea, but please realize that around 75% of > the kernel is hardware/arch support, so that means that 75% of the > changes/fixes deal with hardware things (yes, change is in direct > correlation to size of the codebase in the tree, strange but true). We should distinguish between the reviewer reviewing the regression test and running the regression test. 
As long as the submitter ran the regression test on their hardware, and it passed, the reviewer need only review the test for logical consistency and correctness? Lack of test infrastructure was a serious problem for us in glibc. We are relying on namespaces for more complex network and filesystem testing. Without namespaces we would have needed a much more complex setup that might never have seen developer adoption. When I attended LPC 2016 I prioritized listening in on namespaces discussions to make sure nothing was changing that might break our testing framework. This conversation is going to lead down the path of driver HAL or emulation in order to provide regression testing for code above the actual hardware, and that's another hard problem, but one need not go there. Starting with real hardware tests can have benefit. In glibc we test SSE, AVX, AVX512, TSX etc. but if you don't have the extensions you get a bunch of UNSUPPORTED tests. While upstream kernel may have a more limited set of available hardware per-person, the collective set of developers has hardware to cover all configurations, and they should run the regression tests for hardware they care about ... and *must* do so if they submit a patch to fix a bug! :-) -- Cheers, Carlos. 
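The "UNSUPPORTED when the hardware is missing" convention described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not code from glibc or kselftest: `cpu_has` and `run_hw_test` are hypothetical helper names, the CPU flags are read from a cpuinfo-style file, and exit code 4 is assumed to be the skip code the test runner understands.

```shell
#!/bin/sh
# Hypothetical sketch: report "skip" rather than "fail" when the hardware
# feature a test depends on is not present on this machine.
KSFT_SKIP=4   # assumed skip exit code

# cpu_has FLAG [CPUINFO]: succeed if FLAG is listed on a "flags" line.
# CPUINFO defaults to /proc/cpuinfo; the argument exists to ease testing.
cpu_has() {
    grep '^flags' "${2:-/proc/cpuinfo}" 2>/dev/null | grep -qw -- "$1"
}

# run_hw_test FLAG CPUINFO COMMAND...: run COMMAND only when FLAG is present,
# otherwise print a SKIP line and return the skip code.
run_hw_test() {
    flag=$1; cpuinfo=$2; shift 2
    if ! cpu_has "$flag" "$cpuinfo"; then
        echo "SKIP: $flag not available"
        return "$KSFT_SKIP"
    fi
    "$@"
}
```

A submitter with the hardware would see the real pass/fail result, while everyone else sees a skip instead of a spurious failure, e.g. `run_hw_test avx512f /proc/cpuinfo ./my-avx512-test` (the test binary name is made up).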
From carlos at redhat.com Wed Jul 5 15:08:48 2017 From: carlos at redhat.com (Carlos O'Donell) Date: Wed, 5 Jul 2017 11:08:48 -0400 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705103335.0cbd9984@gandalf.local.home> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705103335.0cbd9984@gandalf.local.home> Message-ID: <8c6843e8-73d9-a898-0366-0b72dfeb79a2@redhat.com> On 07/05/2017 10:33 AM, Steven Rostedt wrote: > No test should be written for a single specific hardware. It should be a > general functionality that different hardware can execute. Why? We test all sorts of hardware in userspace and we see value in that testing. -- Cheers, Carlos. From James.Bottomley at HansenPartnership.com Wed Jul 5 15:32:28 2017 From: James.Bottomley at HansenPartnership.com (James Bottomley) Date: Wed, 05 Jul 2017 08:32:28 -0700 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705112047.23ee09f6@gandalf.local.home> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705143341.oees22k2snhtmkxo@sirena.org.uk> <20170705103658.226099c6@gandalf.local.home> <1499266228.3668.10.camel@HansenPartnership.com> <20170705105651.5da9c969@gandalf.local.home> <1499267389.3668.16.camel@HansenPartnership.com> <20170705112047.23ee09f6@gandalf.local.home> Message-ID: <1499268748.3668.20.camel@HansenPartnership.com> On Wed, 2017-07-05 at 11:20 -0400, Steven Rostedt wrote: > On Wed, 05 Jul 2017 08:09:49 -0700 
> James Bottomley wrote: > > > > > > > > > > > > > > > you're protecting was accessed outside the lock, which is the > > > > usual source of concurrency problems. In other words lockdep > > > > is useful but it's not a panacea. > > > > > > Still not an excuse to not have lockdep enabled during tests. > > > > OK, what makes you think lockdep isn't enabled? Since Kconfig is > > so complex, I usually use a distro config ... they have it enabled > > (or at least openSUSE does), so it's enabled for everything I do. > > openSUSE has it enabled? I hope not for its production config, as > lockdep has a huge performance penalty. Then, surely, it's the last thing we want when tracking down race conditions since it will alter timings dramatically. > I'm thinking you don't have it enabled. What config are you looking > at? The actual config that does the testing of locks is > CONFIG_PROVE_LOCKING, which selects LOCKDEP to be compiled in. This is what it has: jejb at jarvis:~/git/linux-build> grep LOCKDEP /boot/config-4.4.73-18.17-default CONFIG_LOCKDEP_SUPPORT=y James From greg at kroah.com Wed Jul 5 15:32:59 2017 From: greg at kroah.com (Greg KH) Date: Wed, 5 Jul 2017 17:32:59 +0200 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> Message-ID: <20170705153259.GA7265@kroah.com> On Wed, Jul 05, 2017 at 08:16:33AM -0700, Guenter Roeck wrote: > If we start shaming people for not providing unit tests, all we'll accomplish is > that people will stop providing bug fixes. Yes, this is the key!
Steven, just look at everything marked with a "Fixes:" or "stable@" tag from 4.12-rc1..4.12 and try to determine how you would write a test for the majority of them. Yes, for some subsystems this can work (look at xfstests as one great example for filesystems, same for the i915 tests), but for the majority of the kernel, at this point in time, it doesn't make sense. So take Carlos's advice, start small, do it for your subsystem if you don't touch hardware (easy peasy, right?), and let's see how it goes, and see if we have the infrastructure to do it even today. Right now, kselftests is finally getting a unified output format, which is great, it shows that people are starting to use and rely on it. What else will we need to make this more widely used, we don't know yet... thanks, greg k-h From carlos at redhat.com Wed Jul 5 15:36:23 2017 From: carlos at redhat.com (Carlos O'Donell) Date: Wed, 5 Jul 2017 11:36:23 -0400 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705153259.GA7265@kroah.com> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705153259.GA7265@kroah.com> Message-ID: On 07/05/2017 11:32 AM, Greg KH wrote: > So take Carlos's advice, start small, do it for your subsystem if you > don't touch hardware (easy peasy, right?), and let's see how it goes, > and see if we have the infrastructure to do it even today. Right now, > kselftests is finally getting a unified output format, which is great, > it shows that people are starting to use and rely on it. What else will > we need to make this more widely used, we don't know yet... +1 ;-) -- Cheers, Carlos. 
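For anyone who wants to try the exercise suggested above (looking at everything carrying a "Fixes:" tag in a release range), a small helper along these lines may be a starting point. It is a sketch only: `count_fixes` is a hypothetical name, it counts "Fixes:" trailer lines rather than commits (a commit can carry more than one), and it assumes the commit messages arrive on standard input in the indented form `git log` prints.

```shell
#!/bin/sh
# Hypothetical helper: count "Fixes:" trailer lines in commit messages read
# from standard input. Leading whitespace is allowed because `git log`
# indents the message body.
count_fixes() {
    # grep -c exits non-zero when the count is 0; keep the function successful.
    grep -c '^[[:space:]]*Fixes:[[:space:]]' || true
}
```

Typical use would be something like `git log v4.12-rc1..v4.12 -- drivers/usb/host/ | count_fixes`, run inside a kernel tree; reading the matching commits themselves is still the only way to judge whether a test could have been written for them.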
From James.Bottomley at HansenPartnership.com Wed Jul 5 15:36:55 2017 From: James.Bottomley at HansenPartnership.com (James Bottomley) Date: Wed, 05 Jul 2017 08:36:55 -0700 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705112707.54d7f345@gandalf.local.home> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705112707.54d7f345@gandalf.local.home> Message-ID: <1499269015.3668.25.camel@HansenPartnership.com> On Wed, 2017-07-05 at 11:27 -0400, Steven Rostedt wrote: > On Wed, 5 Jul 2017 08:16:33 -0700 > Guenter Roeck wrote: > > > > > The reproducers for several of the usb fixes I submitted recently > > took hours of stress test to reproduce the underlying problems. I > > have one more to fix which takes days to reproduce, if at all (I > > have seen that problem only two or three times during weeks of > > stress test). Due to the nature of the problems, reproducing > > them heavily depended on the underlying hardware. None of the > > reproducers can guarantee that the problem is fixed; they are > > intended to show the problem, not that it is fixed. This happens a > > lot with race conditions - in many cases it is impossible to prove > > that the problem is fixed; one can only prove that it still exists. > > > > Echoing what you said, I have no idea how it would even be possible > > to write unit tests to verify if the problems I fixed are really > > fixed. > > > > Several of the fixes I have submitted are based on single-instance > > error logs with no reproducer. Many others are compile time fixes > > or fix problems found with code inspection (manual or automatic). 
> > > > If we start shaming people for not providing unit tests, all we'll > > accomplish is that people will stop providing bug fixes. > > I need to be clearer on this. What I meant was, if there's a bug > where someone has a test that easily reproduces the bug, then if > there's not a test added to selftests for said bug, then we should > shame those into doing so. > > A bug that is found by inspection or hard to reproduce test cases are > not applicable, as they don't have tests that can show a regression. > > And I'm betting that those bugs are NOT REGRESSIONS! Most likely are > bugs that always existed, but because of the unpredictable hitting of > the bug (as you said, it required hours of stress tests to > reproduce), the bug was not previously hit during development. That's > not a regression, that's a feature. > > Are we tracking regressions or just simply bugs? A lot of device driver regressions are bugs that previously existed in the code but which didn't manifest until something else happened. A huge number of locking and timing issues are like this. The irony is that a lot of them go from race always being won (so bug never noticed) to race being lost often enough to make something unusable. To a user that ends up being a kernel regression because it's a bug in the current kernel which they didn't see previously which makes it unusable for them. I've got to vote with my users here: that's a regression not a "feature".
James From geert at linux-m68k.org Wed Jul 5 15:40:44 2017 From: geert at linux-m68k.org (Geert Uytterhoeven) Date: Wed, 5 Jul 2017 17:40:44 +0200 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705152026.rkw73q2f6xmiju37@sirena.org.uk> References: <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705143341.oees22k2snhtmkxo@sirena.org.uk> <20170705103658.226099c6@gandalf.local.home> <1499266228.3668.10.camel@HansenPartnership.com> <20170705105651.5da9c969@gandalf.local.home> <1499267389.3668.16.camel@HansenPartnership.com> <20170705152026.rkw73q2f6xmiju37@sirena.org.uk> Message-ID: On Wed, Jul 5, 2017 at 5:20 PM, Mark Brown wrote: > On Wed, Jul 05, 2017 at 08:09:49AM -0700, James Bottomley wrote: >> On Wed, 2017-07-05 at 10:56 -0400, Steven Rostedt wrote: >> > James Bottomley wrote: > >> > > you're protecting was accessed outside the lock, which is the usual >> > > source of concurrency problems. In other words lockdep is useful >> > > but it's not a panacea. > >> > Still not an excuse to not have lockdep enabled during tests. > >> OK, what makes you think lockdep isn't enabled? Since Kconfig is so >> complex, I usually use a distro config ... they have it enabled (or at >> least openSUSE does), so it's enabled for everything I do. > > Yeah, I see enough reports with it in embedded contexts to make me think > people use it there. I know I tend to have it turned on most of the > time. The concurrency stuff I'm thinking of here is more the things > you're mentioning with just not taking locks at all when they are needed > or concurrency with hardware. I try to have it enabled as much as possible. However, as it increases kernel size (huge static tables), hitting boot loader limitations on several boards, I cannot enable all debugging I would like to on all boards. 
Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds From rostedt at goodmis.org Wed Jul 5 15:43:16 2017 From: rostedt at goodmis.org (Steven Rostedt) Date: Wed, 5 Jul 2017 11:43:16 -0400 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <1499268748.3668.20.camel@HansenPartnership.com> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705143341.oees22k2snhtmkxo@sirena.org.uk> <20170705103658.226099c6@gandalf.local.home> <1499266228.3668.10.camel@HansenPartnership.com> <20170705105651.5da9c969@gandalf.local.home> <1499267389.3668.16.camel@HansenPartnership.com> <20170705112047.23ee09f6@gandalf.local.home> <1499268748.3668.20.camel@HansenPartnership.com> Message-ID: <20170705114316.424a9e28@gandalf.local.home> On Wed, 05 Jul 2017 08:32:28 -0700 James Bottomley wrote: > > openSuSE has it enabled? I hope not for its production config, as > > lockdep has a huge performance penalty. > > Then, surely, it's the last thing we want when tracking down race > conditions since it will alter timings dramatically. It's to be run when you want to make sure locking order is at least not an issue. And it's not about running when tracking down race conditions; it's to be run when developing new code. > > > I'm thinking you don't have it enabled. What config are you looking > > at? The actual config that does the testing of locks is > > CONFIG_PROVE_LOCKING, which selects LOCKDEP to be compiled in.
> > This is what it has: > > jejb at jarvis:~/git/linux-build> grep LOCKDEP /boot/config-4.4.73-18.17-default > CONFIG_LOCKDEP_SUPPORT=y That only means your architecture supports it; it does not mean it's enabled. -- Steve From broonie at kernel.org Wed Jul 5 15:47:47 2017 From: broonie at kernel.org (Mark Brown) Date: Wed, 5 Jul 2017 16:47:47 +0100 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> Message-ID: <20170705154747.gtu6v5rrol2xrgbx@sirena.org.uk> On Wed, Jul 05, 2017 at 09:09:51AM -0400, Carlos O'Donell wrote: > This problem is a reflection of our own explicit or implicit priorities. > The priorities of developers and reviewers need to change to make an > impact on the problem. This is a hard problem. Take a look at the trajectory for the build and boot testing for a concrete example of this - the failure rates go down over time but it's not a quick process. > As a concrete action item, glibc core developers took a harder stance on > (a) all user-visible bugs need a bug # (forces people to think about the > problem and file a coherent public bug about it) (b) all bugs need a > regression test if possible, (c) and if not possible we need to extend > the testing framework to make it possible (we've started using kernel > namespaces to create isolated test configurations). One thing I'd really like to see here is an equivalent of the build and boot testing we currently have that exercises some of the testsuites on a regular basis so we can push on keeping them running cleanly. As well as just the intrinsic value of the tests themselves I'd hope that a visible practical interest would help push more activity in this area.
There's a couple of efforts I'm aware of in this area, hopefully one or both of them will start delivering. From rostedt at goodmis.org Wed Jul 5 15:52:19 2017 From: rostedt at goodmis.org (Steven Rostedt) Date: Wed, 5 Jul 2017 11:52:19 -0400 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705153259.GA7265@kroah.com> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705153259.GA7265@kroah.com> Message-ID: <20170705115219.02370220@gandalf.local.home> On Wed, 5 Jul 2017 17:32:59 +0200 Greg KH wrote: > On Wed, Jul 05, 2017 at 08:16:33AM -0700, Guenter Roeck wrote: > > If we start shaming people for not providing unit tests, all we'll accomplish is > > that people will stop providing bug fixes. > > Yes, this is the key! And I mentioned this in my initial email. > > Steven, just look at everything marked with a "Fixes:" or "stable@" tag > from 4.12-rc1..4.12 and try to determine how you would write a test for > the majority of them. It only makes sense if there's a reproducible case. For cases where stress testing is required and you hope to hit the bug, well, that's never an easy answer, and this is not something that will fix it. > > Yes, for some subsystems this can work (look at xfstests as one great > example for filesystems, same for the i915 tests), but for the majority > of the kernel, at this point in time, it doesn't make sense. I already do. Actually, I have just fixed a bug that I need to add a selftest for.
Yes, it is easier for non-hardware code, but for cases that have specs on hardware behavior, why can't we have tests to test if the hardware matches the spec? Everyone is focusing on that "shaming" comment and not looking at the rest of what I wrote. My main point is, there's a lot of reproducers in change logs or emails that are not in selftests. There's no excuse for that. Let's fix that issue, and not go into a bike shedding fight about the entire approach. > > So take Carlos's advice, start small, do it for your subsystem if you Yes, let's start small. What do you think about all reproducers getting into selftests? If it doesn't reproduce 100% of the time, then it's up to the individual, but any test that can trigger a bug 100% of the time should be added. I'd like to expand selftests to include configs too. If there's a config that triggers a bug, that should be added to a list of "configs" to be tested as well. > don't touch hardware (easy peasy, right?), and let's see how it goes, > and see if we have the infrastructure to do it even today. Right now, > kselftests is finally getting a unified output format, which is great, > it shows that people are starting to use and rely on it. What else will > we need to make this more widely used, we don't know yet... I've been using selftests for ftrace for some time. I have my own tests that I run (which do test any config that has failed me in the past), and I'm slowly getting those into the selftests directory as well.
-- Steve From rostedt at goodmis.org Wed Jul 5 16:04:59 2017 From: rostedt at goodmis.org (Steven Rostedt) Date: Wed, 5 Jul 2017 12:04:59 -0400 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <1499269015.3668.25.camel@HansenPartnership.com> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705112707.54d7f345@gandalf.local.home> <1499269015.3668.25.camel@HansenPartnership.com> Message-ID: <20170705120459.41e81f7b@gandalf.local.home> On Wed, 05 Jul 2017 08:36:55 -0700 James Bottomley wrote: > > Are we tracking regressions or just simply bugs? > > A lot of device driver regressions are bugs that previously existed in > the code but which didn't manifest until something else happened. ?A > huge number of locking and timing issues are like this. ?The irony is > that a lot of them go from race always being won (so bug never noticed) > to race being lost often enough to make something unusable. ?To a user > that ends up being a kernel regression because it's a bug in the > current kernel which they didn't see previously which makes it unusable > for them. > > I've got to vote with my users here: that's a regression not a > "feature". Let's take a step back. What exactly is the problem? The regressions that we want to track? Why are they not fixed? Is it because they are hard to reproduce? If so, how do we know they are a regression or just some hard to hit bug? If it's hard to hit, how do we know we fixed it? What exactly are the questions we want solved. Granted, I used this thread to push more use of kselftests, and I don't see any SCSI tests there at all! 
-- Steve From rostedt at goodmis.org Wed Jul 5 16:10:00 2017 From: rostedt at goodmis.org (Steven Rostedt) Date: Wed, 5 Jul 2017 12:10:00 -0400 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <8c6843e8-73d9-a898-0366-0b72dfeb79a2@redhat.com> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705103335.0cbd9984@gandalf.local.home> <8c6843e8-73d9-a898-0366-0b72dfeb79a2@redhat.com> Message-ID: <20170705121000.5430d7d0@gandalf.local.home> On Wed, 5 Jul 2017 11:08:48 -0400 Carlos O'Donell wrote: > On 07/05/2017 10:33 AM, Steven Rostedt wrote: > > No test should be written for a single specific hardware. It should be a > > general functionality that different hardware can execute. > > Why? We test all sorts of hardware in userspace and we see value in that > testing. > One reason is bit rot. I'm not totally against it. But I envision that if we have hundreds of tests for very specific pieces of hardware, their value will diminish over time. Unless we can get a good infrastructure written where the hardware info is more of a data sheet than a single test itself.
-- Steve From linux at roeck-us.net Wed Jul 5 16:48:31 2017 From: linux at roeck-us.net (Guenter Roeck) Date: Wed, 5 Jul 2017 09:48:31 -0700 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705112707.54d7f345@gandalf.local.home> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705112707.54d7f345@gandalf.local.home> Message-ID: On 07/05/2017 08:27 AM, Steven Rostedt wrote: > On Wed, 5 Jul 2017 08:16:33 -0700 > Guenter Roeck wrote: [ ... ] >> >> If we start shaming people for not providing unit tests, all we'll accomplish is >> that people will stop providing bug fixes. > > I need to be clearer on this. What I meant was, if there's a bug > where someone has a test that easily reproduces the bug, then if > there's not a test added to selftests for said bug, then we should > shame those into doing so. > I don't think that public shaming of kernel developers is going to work any better than public shaming of children or teenagers. Maybe a friendlier approach would be more useful ? If a test to reproduce a problem exists, it might be more beneficial to suggest to the patch submitter that it would be great if that test would be submitted as unit test instead of shaming that person for not doing so. Acknowledging and praising kselftest submissions might help more than shaming for non-submissions. > A bug that is found by inspection or hard to reproduce test cases are > not applicable, as they don't have tests that can show a regression. > My concern would be that once the shaming starts, it won't stop. 
Guenter From dan.j.williams at intel.com Wed Jul 5 16:54:29 2017 From: dan.j.williams at intel.com (Dan Williams) Date: Wed, 5 Jul 2017 09:54:29 -0700 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705140607.GA30187@kroah.com> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> Message-ID: On Wed, Jul 5, 2017 at 7:06 AM, Greg KH wrote: > On Wed, Jul 05, 2017 at 09:27:57AM -0400, Steven Rostedt wrote: >> Your "b" above is what I would like to push. But who's going to enforce >> this? With 10,000 changes per release, and a lot of them are fixes, the >> best we can do is the honor system. Start shaming people that don't >> have a regression test along with a Fixes tag (but we don't want people >> to fix bugs without adding that tag either). There is a fine line one >> must walk between getting people to change their approaches to bugs and >> regression tests, and pissing them off where they start doing the >> opposite of what would be best for the community. > > I would bet, for the huge majority of our fixes, they are fixes for > specific hardware, or workarounds for specific hardware issues. Now > writing tests for those is not an impossible task (look at what the i915 > developers have), but it is very very hard overall, especially if the > base infrastructure isn't there to do it. > > For specific examples, here's the shortlog for fixes that went into > drivers/usb/host/ for 4.12 after 4.12-rc1 came out. Do you know of a > way to write a test for these types of things? 
> usb: xhci: ASMedia ASM1042A chipset need shorts TX quirk > usb: xhci: Fix USB 3.1 supported protocol parsing > usb: host: xhci-plat: propagate return value of platform_get_irq() > xhci: Fix command ring stop regression in 4.11 > xhci: remove GFP_DMA flag from allocation > USB: xhci: fix lock-inversion problem > usb: host: xhci-ring: don't need to clear interrupt pending for MSI enabled hcd > usb: host: xhci-mem: allocate zeroed Scratchpad Buffer > xhci: apply PME_STUCK_QUIRK and MISSING_CAS quirk for Denverton > usb: xhci: trace URB before giving it back instead of after > USB: host: xhci: use max-port define > USB: ehci-platform: fix companion-device leak > usb: r8a66597-hcd: select a different endpoint on timeout > usb: r8a66597-hcd: decrease timeout I wrote some test infrastructure to go after xhci TRB boundary conditions [1]. So, yes, some of these are possible to unit test, but of course not all. [1]: http://marc.info/?l=linux-usb&m=140872785411304&w=2 From dan.j.williams at intel.com Wed Jul 5 16:58:06 2017 From: dan.j.williams at intel.com (Dan Williams) Date: Wed, 5 Jul 2017 09:58:06 -0700 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705112707.54d7f345@gandalf.local.home> Message-ID: On Wed, Jul 5, 2017 at 9:48 AM, Guenter Roeck wrote: > On 07/05/2017 08:27 AM, Steven Rostedt wrote: >> >> On Wed, 5 Jul 2017 08:16:33 -0700 >> Guenter Roeck wrote: > > [ ... ] >>> >>> >>> If we start shaming people for not providing unit tests, all we'll >>> accomplish is >>> that people will stop providing bug fixes. >> >> >> I need to be clearer on this. 
What I meant was, if there's a bug >> where someone has a test that easily reproduces the bug, then if >> there's not a test added to selftests for said bug, then we should >> shame those into doing so. >> > > I don't think that public shaming of kernel developers is going to work > any better than public shaming of children or teenagers. > > Maybe a friendlier approach would be more useful ? > > If a test to reproduce a problem exists, it might be more beneficial to > suggest > to the patch submitter that it would be great if that test would be > submitted > as unit test instead of shaming that person for not doing so. Acknowledging > and > praising kselftest submissions might help more than shaming for > non-submissions. > >> A bug that is found by inspection or hard to reproduce test cases are >> not applicable, as they don't have tests that can show a regression. >> > > My concern would be that once the shaming starts, it won't stop. Agreed, this shouldn't be a new burden for maintainers, this should be a contribution path for new kernel developers. Go beyond our standard "fix a bug" advice, which is great advice, and also recommend "backstop a regression with a unit test".
From James.Bottomley at HansenPartnership.com Wed Jul 5 16:58:28 2017 From: James.Bottomley at HansenPartnership.com (James Bottomley) Date: Wed, 05 Jul 2017 09:58:28 -0700 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705120459.41e81f7b@gandalf.local.home> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705112707.54d7f345@gandalf.local.home> <1499269015.3668.25.camel@HansenPartnership.com> <20170705120459.41e81f7b@gandalf.local.home> Message-ID: <1499273908.3668.30.camel@HansenPartnership.com> On Wed, 2017-07-05 at 12:04 -0400, Steven Rostedt wrote: > On Wed, 05 Jul 2017 08:36:55 -0700 > James Bottomley wrote: > > > > > > > > > Are we tracking regressions or just simply bugs? > > > > A lot of device driver regressions are bugs that previously existed > > in the code but which didn't manifest until something else > > happened. A huge number of locking and timing issues are like > > this. The irony is that a lot of them go from race always being > > won (so bug never noticed) to race being lost often enough to make > > something unusable. To a user that ends up being a kernel > > regression because it's a bug in the current kernel which they > > didn't see previously which makes it unusable for them. > > > > I've got to vote with my users here: that's a regression not a > > "feature". > > Let's take a step back. What exactly is the problem? You mean what question was I answering? It was your "is your problem a regression?" one. > The regressions that we want to track? Why are they not fixed? Is it > because they are hard to reproduce? If so, how do we know they are a > regression or just some hard to hit bug? If it's hard to hit, how do > we know we fixed it?
Usually for the exposed races we develop a theoretical model which tells us what the problem is and also the solution. > What exactly are the questions we want solved. In the context of this subthread? Tracking and fixing of regressions meaning behaviour that damages or destroys usability of version k+1 that wasn't present in version k. > Granted, I used this thread to push more use of kselftests, and I > don't see any SCSI tests there at all! It would be an interesting question for another thread to consider whether that's a problem or not. James From rostedt at goodmis.org Wed Jul 5 17:02:00 2017 From: rostedt at goodmis.org (Steven Rostedt) Date: Wed, 5 Jul 2017 13:02:00 -0400 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705112707.54d7f345@gandalf.local.home> Message-ID: <20170705130200.7c653f61@gandalf.local.home> On Wed, 5 Jul 2017 09:48:31 -0700 Guenter Roeck wrote: > On 07/05/2017 08:27 AM, Steven Rostedt wrote: > > On Wed, 5 Jul 2017 08:16:33 -0700 > > Guenter Roeck wrote: > [ ... ] > >> > >> If we start shaming people for not providing unit tests, all we'll accomplish is > >> that people will stop providing bug fixes. > > > > I need to be clearer on this. What I meant was, if there's a bug > > where someone has a test that easily reproduces the bug, then if > > there's not a test added to selftests for said bug, then we should > > shame those into doing so. > > > > I don't think that public shaming of kernel developers is going to work > any better than public shaming of children or teenagers. > > Maybe a friendlier approach would be more useful ?
I'm a friendly shamer ;-) > > If a test to reproduce a problem exists, it might be more beneficial to suggest > to the patch submitter that it would be great if that test would be submitted > as unit test instead of shaming that person for not doing so. Acknowledging and > praising kselftest submissions might help more than shaming for non-submissions. > > > A bug that is found by inspection or hard to reproduce test cases are > > not applicable, as they don't have tests that can show a regression. > > > > My concern would be that once the shaming starts, it won't stop. I think this is a communication issue. By "shaming" I meant calling out a developer for not submitting a test. It wasn't about making fun of them, or anything like that. I was only making a point about how to teach people that they need to be more aware of the testing infrastructure. Not about actually demeaning people. Let's take a hypothetical example. Say someone posted a bug report with an associated reproducer for it. The developer then runs the reproducer, sees the bug, makes a fix and sends it to Linus and stable. Now the developer forgets this and continues on their merry way. Along comes someone like myself and sees a reproducing test case for a bug, but sees no test added to kselftests. I would send an email along the lines of "Hi, I noticed that there was a reproducer for this bug you fixed. How come there was no test added to the kselftests to make sure it doesn't appear again?"
There, I "shamed" them ;-) -- Steve From rostedt at goodmis.org Wed Jul 5 17:07:24 2017 From: rostedt at goodmis.org (Steven Rostedt) Date: Wed, 5 Jul 2017 13:07:24 -0400 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <1499273908.3668.30.camel@HansenPartnership.com> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705112707.54d7f345@gandalf.local.home> <1499269015.3668.25.camel@HansenPartnership.com> <20170705120459.41e81f7b@gandalf.local.home> <1499273908.3668.30.camel@HansenPartnership.com> Message-ID: <20170705130724.66637518@gandalf.local.home> On Wed, 05 Jul 2017 09:58:28 -0700 James Bottomley wrote: > On Wed, 2017-07-05 at 12:04 -0400, Steven Rostedt wrote: > > On Wed, 05 Jul 2017 08:36:55 -0700 > > James Bottomley wrote: > > > > > > > > > > > > > Are we tracking regressions or just simply bugs? > > > > > > A lot of device driver regressions are bugs that previously existed > > > in the code but which didn't manifest until something else > > > happened. A huge number of locking and timing issues are like > > > this. The irony is that a lot of them go from race always being > > > won (so bug never noticed) to race being lost often enough to make > > > something unusable. To a user that ends up being a kernel > > > regression because it's a bug in the current kernel which they > > > didn't see previously which makes it unusable for them. > > > > > > I've got to vote with my users here: that's a regression not a > > > "feature". > > > > Let's take a step back. What exactly is the problem? > > You mean what question was I answering? It was your "is your problem a > regression?" one. No that's not what I meant.
I mean that we are going off on a tangent from the original topic. > > > The regressions that we want to track? Why are they not fixed? Is it > > because they are hard to reproduce? If so, how do we know they are a > > regression or just some hard to hit bug? If it's hard to hit, how do > > we know we fixed it? > > Usually for the exposed races we develop a theoretical model which > tells us what the problem is and also the solution. I think the problem is that the regressions that are not being fixed happen to be where we have no model to create, as the problem may be too hard to hit, and it could just be a "works for me" issue. > > > What exactly are the questions we want solved. > > In the context of this subthread? Tracking and fixing of regressions > meaning behaviour that damages or destroys usability of version k+1 > that wasn't present in version k. Agreed with this part. And I believe this is also in the context of the entire thread. > > > Granted, I used this thread to push more use of kselftests, and I > > don't see any SCSI tests there at all! > > It would be an interesting question for another thread to consider > whether that's a problem or not. It's not a problem for me, but it begs the question of whether it would be useful or not. But I agree, that's for another thread.
-- Steve From daniel.vetter at ffwll.ch Wed Jul 5 18:17:05 2017 From: daniel.vetter at ffwll.ch (Daniel Vetter) Date: Wed, 5 Jul 2017 20:17:05 +0200 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705103658.226099c6@gandalf.local.home> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705143341.oees22k2snhtmkxo@sirena.org.uk> <20170705103658.226099c6@gandalf.local.home> Message-ID: On Wed, Jul 5, 2017 at 4:36 PM, Steven Rostedt wrote: > On Wed, 5 Jul 2017 15:33:41 +0100 > Mark Brown wrote: >> On Wed, Jul 05, 2017 at 04:06:07PM +0200, Greg KH wrote: >> > I don't mean to poo-poo the idea, but please realize that around 75% of >> > the kernel is hardware/arch support, so that means that 75% of the >> > changes/fixes deal with hardware things (yes, change is in direct >> > correlation to size of the codebase in the tree, strange but true). >> >> Then add in all the fixes for concurrency/locking issues and so on >> that're hard to reliably reproduce as well... > > All tests should be run with lockdep enabled ;-) Which a surprising > few developers appear to do :-p We're slowly working towards running the i915 testsuite with kasan enabled as the next level of evil. It's ... interesting, to say the least. 
-Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch From daniel.vetter at ffwll.ch Wed Jul 5 18:24:26 2017 From: daniel.vetter at ffwll.ch (Daniel Vetter) Date: Wed, 5 Jul 2017 20:24:26 +0200 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <1499267389.3668.16.camel@HansenPartnership.com> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705143341.oees22k2snhtmkxo@sirena.org.uk> <20170705103658.226099c6@gandalf.local.home> <1499266228.3668.10.camel@HansenPartnership.com> <20170705105651.5da9c969@gandalf.local.home> <1499267389.3668.16.camel@HansenPartnership.com> Message-ID: On Wed, Jul 5, 2017 at 5:09 PM, James Bottomley wrote: >> > > All tests should be run with lockdep enabled ;-) Which a >> > > surprising few developers appear to do :-p >> > >> > Lockdep checks the locking hierarchies and makes assumptions about >> > them which it then validates ... it doesn't tell you if the data >> > you think >> >> We should probably look at adding infrastructure that helps in that. >> RCU already has a lot of there to help know if data is being >> protected by RCU or not. >> >> Hmm, maybe we could add a __rcu like type that we can associate >> protected data with, where a config can associate access to a >> variable with a lock being held? > > That's about 10x more complex than the releases/acquires/must_hold > annotation, which we have fairly dismal coverage on. Yeah, I've never found those useful at all. 
What we're trying to do in drm code is liberally sprinkle lockdep_assert_held into accessor and helper functions (there's lots of nontrivial stuff where you need a little bit of computation around a pure access, so doesn't result in ugly code). That catches a lot of these, but of course not all. The problem with static annotations is that often the lock you need to hold isn't statically known, and annotating the entire callchain is a no-go as James points out. But maybe we could use such annotations plus a gcc plugin to auto-insert the right lockdep_assert_held every time you read/write into a given field? That's not going to cover locking rules where the locking rules change during the lifetime of an object, but I think even without that it would cover a _lot_ of cases. And if your static annotation would be allowed to chase pointers (well, just any C expression that takes the struct pointer as parameter would be sweet) you could even annotate fields where the protecting lock is in some parent struct. Another thing I'm really looking forward to (but it's somehow not moving fast) is the cross-release stuff. Too many times I've screamed at kernel backtraces stuck in wait_event, and lockdep could have directly told me what's wrong long before a stress test successfully hit that race. There's definitely a lot of room to prove more stuff in locking using tools. 
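A rough userspace sketch of that accessor pattern, for illustration only. Everything named demo_* is made up here and is not a kernel or drm API: demo_assert_held() stands in for lockdep_assert_held(), and a plain flag stands in for lockdep's own tracking. Instead of warning, the stand-in just counts violations so the effect is easy to observe.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a kernel lock; 'held' replaces lockdep's tracking. */
struct demo_lock {
	bool held;
};

/* Number of accesses made without the protecting lock (illustration only). */
static int demo_violations;

static void demo_lock_acquire(struct demo_lock *l) { l->held = true; }
static void demo_lock_release(struct demo_lock *l) { l->held = false; }

/* Stand-in for lockdep_assert_held(): record a violation instead of warning. */
static void demo_assert_held(const struct demo_lock *l)
{
	if (!l->held)
		demo_violations++;
}

/* Made-up object whose fields are protected by its embedded lock. */
struct demo_mode {
	struct demo_lock lock;
	int hdisplay;	/* protected by lock */
	int vdisplay;	/* protected by lock */
};

/*
 * Accessor with a little computation around the raw field access.
 * Because the assertion lives in the helper, every caller is checked
 * without annotating the whole call chain.
 */
static int demo_mode_pixels(const struct demo_mode *m)
{
	demo_assert_held(&m->lock);
	return m->hdisplay * m->vdisplay;
}
```

Calling demo_mode_pixels() between demo_lock_acquire() and demo_lock_release() leaves demo_violations untouched; calling it after the release bumps the counter, which is roughly the point where a lockdep-enabled kernel would complain.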
-Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch From daniel.vetter at ffwll.ch Wed Jul 5 18:29:51 2017 From: daniel.vetter at ffwll.ch (Daniel Vetter) Date: Wed, 5 Jul 2017 20:29:51 +0200 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705153259.GA7265@kroah.com> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705153259.GA7265@kroah.com> Message-ID: On Wed, Jul 5, 2017 at 5:32 PM, Greg KH wrote: > On Wed, Jul 05, 2017 at 08:16:33AM -0700, Guenter Roeck wrote: >> If we start shaming people for not providing unit tests, all we'll accomplish is >> that people will stop providing bug fixes. > > Yes, this is the key! > > Steven, just look at everything marked with a "Fixes:" or "stable@" tag > from 4.12-rc1..4.12 and try to determine how you would write a test for > the majority of them. > > Yes, for some subsystems this can work (look at xfstests as one great > example for filesystems, same for the i915 tests), but for the majority > of the kernel, at this point in time, it doesn't make sense. > > So take Carlos's advice, start small, do it for your subsystem if you > don't touch hardware (easy peasy, right?), and let's see how it goes, > and see if we have the infrastructure to do it even today. Right now, > kselftests is finally getting a unified output format, which is great, > it shows that people are starting to use and rely on it. What else will > we need to make this more widely used, we don't know yet... This is very hard work and takes a long time. For 3 years I've been trying to establish the i915 test suite as an overall drm validation set.
At least the generic parts like for the cross-driver kernel modeset interfaces, but also allowing other drivers to test their hw specific command submission. It's very slow going ... -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch From greg at kroah.com Wed Jul 5 18:42:45 2017 From: greg at kroah.com (Greg KH) Date: Wed, 5 Jul 2017 20:42:45 +0200 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705115219.02370220@gandalf.local.home> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705153259.GA7265@kroah.com> <20170705115219.02370220@gandalf.local.home> Message-ID: <20170705184245.GA22044@kroah.com> On Wed, Jul 05, 2017 at 11:52:19AM -0400, Steven Rostedt wrote: > > So take Carlos's advice, start small, do it for your subsystem if you > > Yes, lets start small. What do you think about all reproducers getting > into selftests? If it's not 100% reproducing, then it's up to the > individual, but any test that can trigger a bug 100% should be added. That would be great. One could argue that we should be adding the "stack guard" testing apps to the selftest tree now, as a number of us have them floating around in their test directories at the moment. > I'd like to expand selftests to include configs too. If there's a > config that triggers a bug, that should be added to a list of "configs" > to be tested as well. So a test needs a specific configuration? We need a way to specify that in a generic fashion so that all tests don't have to duplicate that logic. 
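One possible shape for such a generic check, purely as a sketch: a test declares the option it needs and a shared helper looks it up in the kernel config text. The function name config_option_enabled() is an assumption made up for this example, not an existing kselftest helper. The sketch also assumes the config has already been read and decompressed into a string (/proc/config.gz is gzip-compressed and only exists when the kernel was built with CONFIG_IKCONFIG_PROC), so that part is left out.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: return true when config_text (the decompressed
 * kernel config, one option per line) contains "<option>=y" or
 * "<option>=m".  Lines such as "# CONFIG_FOO is not set" never match
 * because they start with '#'.
 */
static bool config_option_enabled(const char *config_text, const char *option)
{
	size_t len = strlen(option);
	const char *line = config_text;

	while (line && *line) {
		/* Require '=' right after the name so CONFIG_FOO != CONFIG_FOOBAR. */
		if (strncmp(line, option, len) == 0 && line[len] == '=' &&
		    (line[len + 1] == 'y' || line[len + 1] == 'm'))
			return true;

		line = strchr(line, '\n');
		if (line)
			line++;	/* step past the newline to the next line */
	}
	return false;
}
```

A test could then skip itself early when a required option is missing, instead of each test open-coding its own parsing.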
Time to write a helper function to parse /proc/config.gz :) thanks, greg k-h From greg at kroah.com Wed Jul 5 18:45:44 2017 From: greg at kroah.com (Greg KH) Date: Wed, 5 Jul 2017 20:45:44 +0200 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> Message-ID: <20170705184544.GB22044@kroah.com> On Wed, Jul 05, 2017 at 09:54:29AM -0700, Dan Williams wrote: > > I wrote some test infrastructure to go after xhci TRB boundary > conditions [1]. So, yes, some of these are possible to unit test, but > of course not all. > > [1]: http://marc.info/?l=linux-usb&m=140872785411304&w=2 I forgot about that, what ever happened to it, any reason it never got merged? thanks, greg k-h From dan.j.williams at intel.com Wed Jul 5 19:47:25 2017 From: dan.j.williams at intel.com (Dan Williams) Date: Wed, 5 Jul 2017 12:47:25 -0700 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705184544.GB22044@kroah.com> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705184544.GB22044@kroah.com> Message-ID: On Wed, Jul 5, 2017 at 11:45 AM, Greg KH wrote: > On Wed, Jul 05, 2017 at 09:54:29AM -0700, Dan Williams wrote: >> >> I wrote some test infrastructure to go after xhci TRB boundary >> conditions [1]. So, yes, some of these are possible to unit test, but >> of course not all. 
>> >> [1]: http://marc.info/?l=linux-usb&m=140872785411304&w=2 > > I forgot about that, what ever happened to it, any reason it never got > merged? Ran out of time before being consumed by NVDIMM stuff, but I did take some of the lessons learned over into tools/testing/nvdimm/. I haven't done the work to integrate that into kselftest, so far it's only exercised by the tests in the 'ndctl' [1] utility. [1]: https://github.com/pmem/ndctl From broonie at kernel.org Thu Jul 6 09:28:36 2017 From: broonie at kernel.org (Mark Brown) Date: Thu, 6 Jul 2017 10:28:36 +0100 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705130200.7c653f61@gandalf.local.home> References: <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705112707.54d7f345@gandalf.local.home> <20170705130200.7c653f61@gandalf.local.home> Message-ID: <20170706092836.ifcnc2qqwufndhdl@sirena.org.uk> On Wed, Jul 05, 2017 at 01:02:00PM -0400, Steven Rostedt wrote: > Guenter Roeck wrote: > > If a test to reproduce a problem exists, it might be more beneficial to suggest > > to the patch submitter that it would be great if that test would be submitted > > as unit test instead of shaming that person for not doing so. Acknowledging and > > praising kselftest submissions might help more than shaming for non-submissions. > > My concern would be that once the shaming starts, it won't stop. > I think this is a communication issue. My word for "shaming" was to > call out a developer for not submitting a test. It wasn't about making > fun of them, or anything like that. I was only making a point > about how to teach people that they need to be more aware of the > testing infrastructure. Not about actually demeaning people. 
I think before anything like that is viable we need to show a concerted and visible interest in actually running the tests we already have and paying attention to the results - if people can see that they're just checking a checkbox that will often result in low quality tests which can do more harm than good.
From daniel.vetter at ffwll.ch Thu Jul 6 09:41:39 2017 From: daniel.vetter at ffwll.ch (Daniel Vetter) Date: Thu, 6 Jul 2017 11:41:39 +0200 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170706092836.ifcnc2qqwufndhdl@sirena.org.uk> References: <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705112707.54d7f345@gandalf.local.home> <20170705130200.7c653f61@gandalf.local.home> <20170706092836.ifcnc2qqwufndhdl@sirena.org.uk> Message-ID: On Thu, Jul 6, 2017 at 11:28 AM, Mark Brown wrote: > On Wed, Jul 05, 2017 at 01:02:00PM -0400, Steven Rostedt wrote: >> Guenter Roeck wrote: > >> > If a test to reproduce a problem exists, it might be more beneficial to suggest >> > to the patch submitter that it would be great if that test would be submitted >> > as unit test instead of shaming that person for not doing so. Acknowledging and >> > praising kselftest submissions might help more than shaming for non-submissions. > >> > My concern would be that once the shaming starts, it won't stop. > >> I think this is a communication issue. My word for "shaming" was to >> call out a developer for not submitting a test. It wasn't about making >> fun of them, or anything like that.
I was only making a point >> about how to teach people that they need to be more aware of the >> testing infrastructure. Not about actually demeaning people. > > I think before anything like that is viable we need to show a concerted > and visible interest in actually running the tests we already have and > paying attention to the results - if people can see that they're just > checking a checkbox that will often result in low quality tests which > can do more harm than good. +1. That pretty much means large-scale CI. The i915 test suite has suffered quite a bit over the past years because the CI infrastructure didn't keep up. Result is that running full CI kills pretty much every platform there is eventually, and it's really hard to get back to a state where the testsuite can be used to catch regressions again. -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch From laurent.pinchart at ideasonboard.com Thu Jul 6 11:34:38 2017 From: laurent.pinchart at ideasonboard.com (Laurent Pinchart) Date: Thu, 06 Jul 2017 14:34:38 +0300 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705121000.5430d7d0@gandalf.local.home> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <8c6843e8-73d9-a898-0366-0b72dfeb79a2@redhat.com> <20170705121000.5430d7d0@gandalf.local.home> Message-ID: <7042009.5tkGy6PEBL@avalon> On Wednesday 05 Jul 2017 12:10:00 Steven Rostedt wrote: > On Wed, 5 Jul 2017 11:08:48 -0400 Carlos O'Donell wrote: > > On 07/05/2017 10:33 AM, Steven Rostedt wrote: > > > No test should be written for a single specific hardware. It should be a > > > general functionality that different hardware can execute. > > > > Why? We test all sorts of hardware in userspace and we see value in that > > testing. > > One reason is for bit rot. I'm not totally against it. 
But I envision > that if we have hundreds of tests for very specific pieces of hardware, > its value will diminish over time. Unless we can get a good > infrastructure written where the hardware info is more of a data sheet > than a single test itself. That's all nice, but when the hardware is complex and not fully abstracted behind a kernel API, tests are bound to be hardware-specific. Of course, a bug or regression observed only with a specific device, but triggered through the usage of abstract APIs only, can lead to a test case written for that device but runnable with any device in the same category. In that case the test case should certainly be added to a test suite for the corresponding API/subsystem, not to a hidden test suite for a particular device. -- Regards, Laurent Pinchart From dan.carpenter at oracle.com Thu Jul 6 14:40:29 2017 From: dan.carpenter at oracle.com (Dan Carpenter) Date: Thu, 6 Jul 2017 17:40:29 +0300 Subject: [Ksummit-discuss] [TECH TOPIC] is Kconfig a bit hard sometimes? In-Reply-To: <20170627135839.GB1886@jagdpanzerIV.localdomain> References: <20170627135839.GB1886@jagdpanzerIV.localdomain> Message-ID: <20170706144028.46a2mt2mdzpt6ip7@mwanda> People have mentioned "make oldconfig" but I've never had a lot of luck with that. It always just prints "* Restart config..." and deletes my config. Also I hate menus. It's such a pain if you want to enable a feature and you have to do a dungeon crawl through our menu system to try to find it. I wrote a script a couple years ago to create kernel configs. I do a make defconfig, then I take a distro config and I do: for i in $(grep =m old_config) ; do ./scripts/kconfig/kconfig set $i done This prints a lot of errors and the code is only half implemented but it's honestly the easiest way for me to get a bootable kernel these days.
If someone wanted to, they could add a "./scripts/kconfig/kconfig file " command that would read a line at a time and call `./scripts/kconfig/kconfig set $line` over and over. regards, dan carpenter From dan.carpenter at oracle.com Thu Jul 6 14:41:16 2017 From: dan.carpenter at oracle.com (Dan Carpenter) Date: Thu, 6 Jul 2017 17:41:16 +0300 Subject: [Ksummit-discuss] [PATCH 1/2] kconfig: add a silent option to conf_write() In-Reply-To: <20170706144028.46a2mt2mdzpt6ip7@mwanda> Message-ID: <20170706144116.kcvhyxezcpinhwq7@mwanda> The conf_write() function prints output "configuration written to .config" but I don't want it to print anything so I have added an option for that. Signed-off-by: Dan Carpenter --- scripts/kconfig/conf.c | 4 ++-- scripts/kconfig/confdata.c | 5 +++-- scripts/kconfig/gconf.c | 4 ++-- scripts/kconfig/lkc_proto.h | 2 +- scripts/kconfig/mconf.c | 4 ++-- scripts/kconfig/nconf.c | 4 ++-- 6 files changed, 12 insertions(+), 11 deletions(-) diff --git a/scripts/kconfig/conf.c b/scripts/kconfig/conf.c index 866369f10ff8..c73b5ab859a2 100644 --- a/scripts/kconfig/conf.c +++ b/scripts/kconfig/conf.c @@ -690,7 +690,7 @@ int main(int ac, char **av) /* silentoldconfig is used during the build so we shall update autoconf. * All other commands are only used to generate a config.
*/ - if (conf_get_changed() && conf_write(NULL)) { + if (conf_get_changed() && conf_write(NULL, 0)) { fprintf(stderr, _("\n*** Error during writing of the configuration.\n\n")); exit(1); } @@ -705,7 +705,7 @@ int main(int ac, char **av) return 1; } } else if (input_mode != listnewconfig) { - if (conf_write(NULL)) { + if (conf_write(NULL, 0)) { fprintf(stderr, _("\n*** Error during writing of the configuration.\n\n")); exit(1); } diff --git a/scripts/kconfig/confdata.c b/scripts/kconfig/confdata.c index 297b079ae4d9..7e8dbae6af30 100644 --- a/scripts/kconfig/confdata.c +++ b/scripts/kconfig/confdata.c @@ -738,7 +738,7 @@ int conf_write_defconfig(const char *filename) return 0; } -int conf_write(const char *name) +int conf_write(const char *name, bool silent) { FILE *out; struct symbol *sym; @@ -831,7 +831,8 @@ int conf_write(const char *name) return 1; } - conf_message(_("configuration written to %s"), newname); + if (!silent) + conf_message(_("configuration written to %s"), newname); sym_set_change_count(0); diff --git a/scripts/kconfig/gconf.c b/scripts/kconfig/gconf.c index cfddddb9c9d7..115b5602d05e 100644 --- a/scripts/kconfig/gconf.c +++ b/scripts/kconfig/gconf.c @@ -523,7 +523,7 @@ void on_load1_activate(GtkMenuItem * menuitem, gpointer user_data) void on_save_activate(GtkMenuItem * menuitem, gpointer user_data) { - if (conf_write(NULL)) + if (conf_write(NULL, 0)) text_insert_msg(_("Error"), _("Unable to save configuration !")); } @@ -536,7 +536,7 @@ store_filename(GtkFileSelection * file_selector, gpointer user_data) fn = gtk_file_selection_get_filename(GTK_FILE_SELECTION (user_data)); - if (conf_write(fn)) + if (conf_write(fn, 0)) text_insert_msg(_("Error"), _("Unable to save configuration !")); gtk_widget_destroy(GTK_WIDGET(user_data)); diff --git a/scripts/kconfig/lkc_proto.h b/scripts/kconfig/lkc_proto.h index d5398718ec2a..1690888bdbc4 100644 --- a/scripts/kconfig/lkc_proto.h +++ b/scripts/kconfig/lkc_proto.h @@ -5,7 +5,7 @@ void conf_parse(const char
*name); int conf_read(const char *name); int conf_read_simple(const char *name, int); int conf_write_defconfig(const char *name); -int conf_write(const char *name); +int conf_write(const char *name, bool silent); int conf_write_autoconf(void); bool conf_get_changed(void); void conf_set_changed_callback(void (*fn)(void)); diff --git a/scripts/kconfig/mconf.c b/scripts/kconfig/mconf.c index 315ce2c7cb9d..c029b5417fa9 100644 --- a/scripts/kconfig/mconf.c +++ b/scripts/kconfig/mconf.c @@ -937,7 +937,7 @@ static void conf_save(void) case 0: if (!dialog_input_result[0]) return; - if (!conf_write(dialog_input_result)) { + if (!conf_write(dialog_input_result, 0)) { set_config_filename(dialog_input_result); return; } @@ -971,7 +971,7 @@ static int handle_exit(void) switch (res) { case 0: - if (conf_write(filename)) { + if (conf_write(filename, 0)) { fprintf(stderr, _("\n\n" "Error while writing of the configuration.\n" "Your configuration changes were NOT saved." diff --git a/scripts/kconfig/nconf.c b/scripts/kconfig/nconf.c index 003114779815..b4b0666bdc4c 100644 --- a/scripts/kconfig/nconf.c +++ b/scripts/kconfig/nconf.c @@ -666,7 +666,7 @@ static int do_exit(void) /* if we got here, the user really wants to exit */ switch (res) { case 0: - res = conf_write(filename); + res = conf_write(filename, 0); if (res) btn_dialog( main_window, @@ -1436,7 +1436,7 @@ static void conf_save(void) case 0: if (!dialog_input_result[0]) return; - res = conf_write(dialog_input_result); + res = conf_write(dialog_input_result, 0); if (!res) { set_config_filename(dialog_input_result); return; -- 2.11.0 From dan.carpenter at oracle.com Thu Jul 6 14:42:08 2017 From: dan.carpenter at oracle.com (Dan Carpenter) Date: Thu, 6 Jul 2017 17:42:08 +0300 Subject: [Ksummit-discuss] [PATCH 2/2] kconfig: new command line kernel configuration tool In-Reply-To: <20170706144028.46a2mt2mdzpt6ip7@mwanda> Message-ID: <20170706144208.6hlgxwo37gntk6qm@mwanda> This tool barely works, it's just a rough draft. 
Sometimes I want to search for a config so I have to load menuconfig,
then search for the config entry, then exit.  With this script I simply
run:

    ./scripts/kconfig/kconfig search COMEDI

Quite often I find myself trying to enable a feature by doing this:

    echo CONFIG_FEATURE=y >> .config

But when I try to boot the new kernel, I find that the feature isn't
there because the kernel runs `make oldconfig` and I didn't have all the
depends selected so it silently removed it.  With this feature what you
can do is:

    ./scripts/kconfig/kconfig set FEATURE=y

It helps you enable the dependencies or it at least prints an error if
it can't enable the feature.

But this code isn't all implemented.

1) It doesn't calculate the dependencies well.  See expr_parse() for
   more details.
2) It doesn't work well for things like:

    ./scripts/kconfig/kconfig set BT_INTEL=m

   because those aren't visible, they can only be using depend
   statements.  Or say you try to set FEATURE=m when something else
   depends on it being set =y then the error message is wrong.

The other problem is that I don't know how to print the help text.

Again, this is just a rough draft.
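For illustration, the argument handling that `set FEATURE=y` relies on could be sketched in standalone C roughly like this. It is a simplified stand-in, not the code from this patch: split_config_arg() and its buffer-based interface are assumptions made up for the example. It accepts the symbol with or without the CONFIG_ prefix and splits it at the '='.

```c
#include <string.h>

/*
 * Hypothetical helper: split "CONFIG_FEATURE=y" or "FEATURE=y" into a
 * symbol name and a value.  Returns 0 on success, -1 when there is no
 * '=', the name is empty, or one of the output buffers is too small.
 */
static int split_config_arg(const char *arg, char *name, size_t name_sz,
			    char *value, size_t value_sz)
{
	const char *eq;
	size_t name_len;

	if (strncmp(arg, "CONFIG_", 7) == 0)
		arg += 7;	/* accept the name with or without the prefix */

	eq = strchr(arg, '=');
	if (!eq)
		return -1;

	name_len = (size_t)(eq - arg);
	if (name_len == 0 || name_len >= name_sz || strlen(eq + 1) >= value_sz)
		return -1;

	memcpy(name, arg, name_len);
	name[name_len] = '\0';
	strcpy(value, eq + 1);	/* length was checked against value_sz above */
	return 0;
}
```

With that, "CONFIG_BT_INTEL=m" and "BT_INTEL=m" both yield the name "BT_INTEL" and the value "m", while a bare "FEATURE" is rejected so the caller can fall back to prompting.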
Signed-off-by: Dan Carpenter --- scripts/kconfig/Makefile | 6 +- scripts/kconfig/kconfig | 33 +++++ scripts/kconfig/lconf.c | 332 +++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 370 insertions(+), 1 deletion(-) create mode 100755 scripts/kconfig/kconfig create mode 100644 scripts/kconfig/lconf.c diff --git a/scripts/kconfig/Makefile b/scripts/kconfig/Makefile index eb8144643b78..a2a90be2e149 100644 --- a/scripts/kconfig/Makefile +++ b/scripts/kconfig/Makefile @@ -33,6 +33,9 @@ config: $(obj)/conf nconfig: $(obj)/nconf $< $(silent) $(Kconfig) +lconfig: $(obj)/lconf + @ $< $(silent) $(Kconfig) + silentoldconfig: $(obj)/conf $(Q)mkdir -p include/config include/generated $(Q)test -e include/generated/autoksyms.h || \ @@ -183,12 +186,13 @@ lxdialog += lxdialog/textbox.o lxdialog/yesno.o lxdialog/menubox.o conf-objs := conf.o zconf.tab.o mconf-objs := mconf.o zconf.tab.o $(lxdialog) nconf-objs := nconf.o zconf.tab.o nconf.gui.o +lconf-objs := lconf.o zconf.tab.o kxgettext-objs := kxgettext.o zconf.tab.o qconf-cxxobjs := qconf.o qconf-objs := zconf.tab.o gconf-objs := gconf.o zconf.tab.o -hostprogs-y := conf nconf mconf kxgettext qconf gconf +hostprogs-y := conf nconf mconf kxgettext qconf gconf lconf clean-files := qconf.moc .tmp_qtcheck .tmp_gtkcheck clean-files += zconf.tab.c zconf.lex.c zconf.hash.c gconf.glade.h diff --git a/scripts/kconfig/kconfig b/scripts/kconfig/kconfig new file mode 100755 index 000000000000..beab8fc829c9 --- /dev/null +++ b/scripts/kconfig/kconfig @@ -0,0 +1,33 @@ +#!/bin/sh + +usage() { + echo "kconfig [search|set] string" + exit 1; +} + +if [ "$1" = "" ] ; then + usage +fi + +if [ "$1" = "search" ] ; then + + search=$2 + NCONFIG_MODE=kconfig_search SEARCH=${search} make lconfig + +elif [ "$1" = "set" ] ; then + + config=$2 + setting=$3 + + if [ "$config" = "" ] ; then + echo "nothing to set" + exit 1 + fi + + NCONFIG_MODE=kconfig_set CONFIG=${config} SETTING=${setting} make lconfig + +else + usage +fi + + diff --git
a/scripts/kconfig/lconf.c b/scripts/kconfig/lconf.c new file mode 100644 index 000000000000..ebc3cbd4ef83 --- /dev/null +++ b/scripts/kconfig/lconf.c @@ -0,0 +1,332 @@ +/* + * Copyright (C) 2015 Oracle + * Released under the terms of the GNU GPL v2.0. + * + */ +#define _GNU_SOURCE +#include +#include + +#include "lkc.h" +#include "nconf.h" +#include + +static int indent; +static char line[128]; + +static int get_depends(struct symbol *sym); + +static void strip(char *str) +{ + char *p = str; + int l; + + while ((isspace(*p))) + p++; + l = strlen(p); + if (p != str) + memmove(str, p, l + 1); + if (!l) + return; + p = str + l - 1; + while ((isspace(*p))) + *p-- = 0; +} + +static void xfgets(char *str, int size, FILE *in) +{ + if (fgets(str, size, in) == NULL) + fprintf(stderr, "\nError in reading or end of file.\n"); +} + +static tristate str_to_tristate(const char *str) +{ + switch (str[0]) { + case 'y': case 'Y': + return yes; + case 'm': case 'M': + return mod; + case 'n': case 'N': + default: + return no; + } +} + +static int conf_askvalue(struct symbol *sym, const char *def) +{ + enum symbol_type type = sym_get_type(sym); + + if (!sym_has_value(sym)) + printf(_("(NEW) ")); + + line[0] = '\n'; + line[1] = 0; + + if (!sym_is_changable(sym)) { + printf("%s\n", def); + line[0] = '\n'; + line[1] = 0; + return 0; + } + + fflush(stdout); + xfgets(line, 128, stdin); + + switch (type) { + case S_INT: + case S_HEX: + case S_STRING: + printf("%s\n", def); + return 1; + default: + ; + } + printf("%s", line); + return 1; +} + +static struct property *get_symbol_prop(struct symbol *sym) +{ + struct property *prop = NULL; + + for_all_properties(sym, prop, P_SYMBOL) + break; + return prop; +} + +static int conf_sym(struct symbol *sym) +{ + tristate oldval, newval; + struct property *prop; + + while (1) { + if (sym->name) + printf("%s: ", sym->name); + for_all_prompts(sym, prop) + printf("%*s%s ", indent - 1, "", _(prop->text)); + putchar('['); + oldval = 
sym_get_tristate_value(sym); + switch (oldval) { + case no: + putchar('N'); + break; + case mod: + putchar('M'); + break; + case yes: + putchar('Y'); + break; + } + if (oldval != no && sym_tristate_within_range(sym, no)) + printf("/n"); + if (oldval != mod && sym_tristate_within_range(sym, mod)) + printf("/m"); + if (oldval != yes && sym_tristate_within_range(sym, yes)) + printf("/y"); + /* FIXME: I don't know how to get the help text from the sym */ + printf("] "); + if (!conf_askvalue(sym, sym_get_string_value(sym))) + return 0; + strip(line); + + switch (line[0]) { + case 'n': + case 'N': + newval = no; + if (!line[1] || !strcmp(&line[1], "o")) + break; + continue; + case 'm': + case 'M': + newval = mod; + if (!line[1]) + break; + continue; + case 'y': + case 'Y': + newval = yes; + if (!line[1] || !strcmp(&line[1], "es")) + break; + continue; + case 0: + newval = oldval; + break; + default: + continue; + } + if (sym_set_tristate_value(sym, newval)) { + /* FIXME: if I don't write it doesn't save */ + conf_write(NULL, 1); + return 1; + } + } +} + +static int enable_sym(struct symbol *sym) +{ + if (sym_get_tristate_value(sym) != no) + return 0; + + if (!sym->visible) { + if (!get_depends(sym)) + return 0; + printf("%s: has missing dependencies\n", sym->name); + } + + return conf_sym(sym); +} + +static void expr_parse(struct expr *e) +{ + if (!e) + return; + + switch (e->type) { + case E_EQUAL: + printf("set '%s' to '%s'\n", e->left.sym->name, e->right.sym->name); + break; + + case E_AND: + expr_parse(e->left.expr); + expr_parse(e->right.expr); + break; + + case E_SYMBOL: + enable_sym(e->left.sym); + break; + + case E_NOT: + case E_UNEQUAL: + case E_OR: + case E_LIST: + case E_RANGE: + default: + printf("HELP. Lot of unimplemented code. 
%d\n", e->type); + break; + } +} + +static int get_depends(struct symbol *sym) +{ + struct property *prop; + struct gstr res = str_new(); + + prop = get_symbol_prop(sym); + if (!prop) + return 0; + + expr_gstr_print(prop->visible.expr, &res); + printf("%s\n\n", str_get(&res)); + + expr_parse(prop->visible.expr); + + return 1; +} + +static void kconfig_search(void) +{ + char *search_str; + struct symbol **sym_arr; + struct gstr res; + + search_str = getenv("SEARCH"); + if (!search_str) + return; + + sym_arr = sym_re_search(search_str); + res = get_relations_str(sym_arr, NULL); + printf("%s", str_get(&res)); +} + +static void kconfig_set(void) +{ + struct symbol *sym; + char *config; + char *setting; + int res; + + config = getenv("CONFIG"); + if (!config) + return; + if (strncmp(config, "CONFIG_", 7) == 0) + config += 7; + + setting = strchr(config, '='); + if (setting) { + *setting = '\0'; + setting++; + } else { + setting = getenv("SETTING"); + if (setting && *setting == '\0') + setting = NULL; + } + + sym = sym_find(config); + if (!sym) { + printf("Error: '%s' not found.\n", config); + return; + } + + if (sym->curr.tri == str_to_tristate(setting)) { + printf("Already set: %s=%s\n", sym->name, setting); + return; + } + + if (!sym->visible) { + printf("\n%s: has missing dependencies\n", sym->name); + if (!get_depends(sym)) + return; + } + if (!sym->visible) { + printf("Error: unmet dependencies\n"); + return; + } + + if (!setting) { + conf_sym(sym); + } else if (!sym_set_string_value(sym, setting)) { + printf("Error: setting '%s=%s' failed.\n", sym->name, setting); + return; + } + + res = conf_write(NULL, 1); + if (res) { + printf("Error during writing of configuration.\n" + "Your configuration changes were NOT saved.\n"); + return; + } + + printf("set: %s=%s\n", config, sym_get_string_value(sym)); +} + +int main(int ac, char **av) +{ + char *mode; + + setlocale(LC_ALL, ""); + bindtextdomain(PACKAGE, LOCALEDIR); + textdomain(PACKAGE); + + if (ac > 1 && 
strcmp(av[1], "-s") == 0) { + /* Silence conf_read() until the real callback is set up */ + conf_set_message_callback(NULL); + av++; + } + conf_parse(av[1]); + conf_read(NULL); + + mode = getenv("NCONFIG_MODE"); + if (!mode) + return 1; + + if (strcmp(mode, "kconfig_search") == 0) { + kconfig_search(); + return 0; + } + if (strcmp(mode, "kconfig_set") == 0) { + kconfig_set(); + return 0; + } + + return 1; +} -- 2.11.0 From James.Bottomley at HansenPartnership.com Thu Jul 6 14:48:05 2017 From: James.Bottomley at HansenPartnership.com (James Bottomley) Date: Thu, 06 Jul 2017 07:48:05 -0700 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170706092836.ifcnc2qqwufndhdl@sirena.org.uk> References: <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705112707.54d7f345@gandalf.local.home> <20170705130200.7c653f61@gandalf.local.home> <20170706092836.ifcnc2qqwufndhdl@sirena.org.uk> Message-ID: <1499352485.2765.14.camel@HansenPartnership.com> On Thu, 2017-07-06 at 10:28 +0100, Mark Brown wrote: > On Wed, Jul 05, 2017 at 01:02:00PM -0400, Steven Rostedt wrote: > > > > Guenter Roeck wrote: > > > > > > > > > If a test to reproduce a problem exists, it might be more > > > beneficial to suggest to the patch submitter that it would be > > > great if that test would be submitted as unit test instead of > > > shaming that person for not doing so. Acknowledging and > > > praising kselftest submissions might help more than shaming for > > > non-submissions. > > > > > > > > > My concern would be that once the shaming starts, it won't stop. > > > > > I think this is a communication issue. My word for "shaming" was to > > call out a developer for not submitting a test. It wasn't about > > making fun of them, or anything like that. 
I was only making a > > point about how to teach people that they need to be more aware of > > the testing infrastructure. Not about actually demeaning people. > > I think before anything like that is viable we need to show a > concerted and visible interest in actually running the tests we > already have and paying attention to the results - if people can see > that they're just checking a checkbox that will often result in low > quality tests which can do more harm than good. It depends what you mean by "we". I used to run a battery of tests over every SCSI commit. It was time consuming and slowed down the process, plus it was me who always got to diagnose failures. Nowadays I don't bother: I rely on 0day to run its usual tests plus a couple of extras I asked for; it's a much more streamlined process (meaning less work for me) and everyone is happy. The corollary I take away from this is that the less intrusive the test infrastructure is (at least to my process) the happier I am. The 0day quantum leap for me was going from testing my tree and telling me of problems after I've added the patch to testing patches posted to the mailing list, which tells me of problems *before* the commit gets added to the tree. James
From tytso at mit.edu Thu Jul 6 14:53:46 2017 From: tytso at mit.edu (Theodore Ts'o) Date: Thu, 6 Jul 2017 10:53:46 -0400 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: References: <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705112707.54d7f345@gandalf.local.home> <20170705130200.7c653f61@gandalf.local.home> <20170706092836.ifcnc2qqwufndhdl@sirena.org.uk> Message-ID: <20170706145346.6w2uzcf7xacbr3or@thunk.org> On Thu, Jul 06, 2017 at 11:41:39AM +0200, Daniel Vetter wrote: > > +1. That pretty much means large-scale CI. The i915 test suite has > suffered quite a bit over the past years because the CI infrastructure > didn't keep up. Result is that running full CI kills pretty much every > platform there is eventually, and it's really hard to get back to a > state where the testsuite can be used to catch regressions again. I assume the i915 test suite requires real hardware and can't be run on VMs; is that correct? - Ted From rostedt at goodmis.org Thu Jul 6 15:08:04 2017 From: rostedt at goodmis.org (Steven Rostedt) Date: Thu, 6 Jul 2017 11:08:04 -0400 Subject: [Ksummit-discuss] [PATCH 1/2] kconfig: add a silent option to conf_write() In-Reply-To: <20170706144116.kcvhyxezcpinhwq7@mwanda> References: <20170706144028.46a2mt2mdzpt6ip7@mwanda> <20170706144116.kcvhyxezcpinhwq7@mwanda> Message-ID: <20170706110804.44ca24b9@gandalf.local.home> On Thu, 6 Jul 2017 17:41:16 +0300 Dan Carpenter wrote: > The conf_write() function prints output "configuration written to .config" but > I don't want it to print anything so I have added an option for that.
> > Signed-off-by: Dan Carpenter > --- I know you replied to the TECH TOPIC about kconfig, but did you really mean to send patches to the ksummit-discuss mailing list and not to any other mailing list (like LKML or linux-kbuild at vger.kernel.org)? -- Steve From torvalds at linux-foundation.org Thu Jul 6 16:41:36 2017 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Thu, 6 Jul 2017 09:41:36 -0700 Subject: [Ksummit-discuss] [TECH TOPIC] is Kconfig a bit hard sometimes? In-Reply-To: <20170706144028.46a2mt2mdzpt6ip7@mwanda> References: <20170627135839.GB1886@jagdpanzerIV.localdomain> <20170706144028.46a2mt2mdzpt6ip7@mwanda> Message-ID: On Thu, Jul 6, 2017 at 7:40 AM, Dan Carpenter wrote: > People have mentioned "make oldconfig" but I've never had a lot of luck > with that. It always just prints "* Restart config..." and deletes my > config. Really? For me, "make oldconfig" is pretty much the only thing I ever use (apart from build testing). It's very convenient once you have a baseline, and want to just get the new questions for when the Kconfig files change. It's also how I notice when somebody adds a new config entry that doesn't default to 'n'. It's also very convenient when you end up changing your config: just edit the damn .config file directly, and then re-run "make oldconfig" just to make sure everything gets updated (and then you'll notice that you tried to disable some config entry, but it got re-enabled again because there was something else that depended on it and selected it ;) So I wonder why it wouldn't work for you. Now, admittedly, I literally only ever use two source files: the previous ".config" file, and if that is missing (after a "git clean -dqfx" or similar), just /etc/kernel-config. The oldconfig logic has fallbacks to other cases, but they are all useless imho. Also, I build in the source tree. Maybe you use a separate object tree and it gets that case wrong. 
Linus From rdunlap at infradead.org Thu Jul 6 17:11:26 2017 From: rdunlap at infradead.org (Randy Dunlap) Date: Thu, 6 Jul 2017 10:11:26 -0700 Subject: [Ksummit-discuss] [TECH TOPIC] is Kconfig a bit hard sometimes? In-Reply-To: References: <20170627135839.GB1886@jagdpanzerIV.localdomain> <20170706144028.46a2mt2mdzpt6ip7@mwanda> Message-ID: <966c1fce-6f2f-d158-d086-cf8e2eac97a9@infradead.org> On 07/06/2017 09:41 AM, Linus Torvalds wrote: > On Thu, Jul 6, 2017 at 7:40 AM, Dan Carpenter wrote: >> People have mentioned "make oldconfig" but I've never had a lot of luck >> with that. It always just prints "* Restart config..." and deletes my >> config. > > Really? > > For me, "make oldconfig" is pretty much the only thing I ever use > (apart from build testing). > > It's very convenient once you have a baseline, and want to just get > the new questions for when the Kconfig files change. It's also how I > notice when somebody adds a new config entry that doesn't default to > 'n'. > > It's also very convenient when you end up changing your config: just > edit the damn .config file directly, and then re-run "make oldconfig" > just to make sure everything gets updated (and then you'll notice that > you tried to disable some config entry, but it got re-enabled again > because there was something else that depended on it and selected it > ;) > > So I wonder why it wouldn't work for you. > > Now, admittedly, I literally only ever use two source files: the > previous ".config" file, and if that is missing (after a "git clean > -dqfx" or similar), just /etc/kernel-config. > > The oldconfig logic has fallbacks to other cases, but they are all useless imho. > > Also, I build in the source tree. Maybe you use a separate object tree > and it gets that case wrong. Nah, I use O=objdir all the time and oldconfig works for me. 
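
A minimal sketch, in Python, of the check described above: after hand-editing .config and re-running "make oldconfig", compare the file you edited with the regenerated one to see which edits were overridden (for example an option that got re-enabled because something else selects it). The helper names parse_config and changed_options are hypothetical and not part of kconfig; the sketch assumes only the "CONFIG_FOO=value" and "# CONFIG_FOO is not set" line formats used in .config files.

```python
import re

# Matches "CONFIG_FOO=value" lines in a .config file.
_SET = re.compile(r"^(CONFIG_[A-Za-z0-9_]+)=(.*)$")
# Matches "# CONFIG_FOO is not set" lines.
_UNSET = re.compile(r"^# (CONFIG_[A-Za-z0-9_]+) is not set$")


def parse_config(text):
    """Return {option: value}; disabled options map to "n" (hypothetical helper)."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        m = _SET.match(line)
        if m:
            options[m.group(1)] = m.group(2)
            continue
        m = _UNSET.match(line)
        if m:
            options[m.group(1)] = "n"
    return options


def changed_options(before, after):
    """Return {option: (old, new)} for every option whose value differs.

    Options missing from one side are reported with None on that side.
    """
    old, new = parse_config(before), parse_config(after)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


if __name__ == "__main__":
    # Made-up example input: CONFIG_B was disabled by hand, then re-enabled.
    edited = "CONFIG_A=y\n# CONFIG_B is not set\nCONFIG_C=m\n"
    regenerated = "CONFIG_A=y\nCONFIG_B=y\nCONFIG_C=m\nCONFIG_D=y\n"
    for name, (was, now) in changed_options(edited, regenerated).items():
        print(f"{name}: {was} -> {now}")
```

In practice one would save a copy of the edited .config before running "make oldconfig" and feed both files' contents to changed_options().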
-- ~Randy From rostedt at goodmis.org Thu Jul 6 19:10:08 2017 From: rostedt at goodmis.org (Steven Rostedt) Date: Thu, 6 Jul 2017 15:10:08 -0400 Subject: [Ksummit-discuss] [TECH TOPIC] Pulling away from the tracing ABI quicksands In-Reply-To: <658A3F80-5E48-4EC4-A591-E3783AD3DADC@fb.com> References: <20170629195537.534445e7@gandalf.local.home> <20170629203224.6bf7f29a@gandalf.local.home> <20170629205218.5b9a7923@gandalf.local.home> <20170629211641.5aeb3af7@gandalf.local.home> <20170629212750.5c3542ee@gandalf.local.home> <20170629221245.489760b1@gandalf.local.home> <20170630025852.xjoif3aai6rny5a2@ast-mbp> <20170629230251.02f380cb@gandalf.local.home> <6AE378F0-42F7-45DE-9F3C-050A5019A1E8@fb.com> <20170630142956.7e0cb2d6@gandalf.local.home> <20170630143030.305b68a0@gandalf.local.home> <658A3F80-5E48-4EC4-A591-E3783AD3DADC@fb.com> Message-ID: <20170706151008.24addd2b@gandalf.local.home> On Fri, 30 Jun 2017 18:37:59 +0000 Josef Bacik wrote: > [ I forgot to add Tom to the Cc list. Sending again. ] > > On Fri, 30 Jun 2017 14:29:56 -0400 > Steven Rostedt wrote: > > > On Fri, 30 Jun 2017 18:24:12 +0000 > > Josef Bacik wrote: > > > > > Yup I'll start bugging people to submit talk proposals, starting with you! I'll put up my proposal in the next day or two, I think Brendan has something he's going to talk about. Thanks, > > > > I shouldn't have used the term "talk", as it really is all about > > discussions. In fact, if you need more than one slide, you have too > > many. > > > > That said, I could probably come up with a few things, starting with > > this trace event issue. But it will be pointless if Peter Zijlstra and > > Mathieu are not there. > > > > But having ideas about dynamic fields in tracepoints is always > > interesting. Not to mention talking about Tom Zanussi's latest > > histogram work. It may be pretty much completed, but I would like to > > discuss where we go from there. > > > > One last thing.
I don't want to have too many responsibilities, as I'm > > on the LPC program committee and I need to make sure I have time to > > fulfill any action items I'm responsible for during the conference. > > > > Yeah plumbers is a weird venue for tracing, I always hope that we are > going to have people like Brendan or other sysadmin-y people show up > and say "this is what sucks about tracing, please fix it", and then > we can go fix it. It doesn't really seem to happen that way tho, and > for things like tracing ABI there just aren't the right people in the > room to have that kind of discussion. My proposal was just going to > be a laundry list of things that would make my life easier, but it > doesn't really warrant a full micro-conference to listen to me bitch > for an hour. If it turns out nobody else has much to talk about then > we can just declare tracing is feature complete and we can talk about > something else ;). Thanks, > At this rate, I'm guessing that Tracing is not going to be on the Plumbers' agenda. -- Steve From laurent.pinchart at ideasonboard.com Thu Jul 6 21:19:46 2017 From: laurent.pinchart at ideasonboard.com (Laurent Pinchart) Date: Fri, 07 Jul 2017 00:19:46 +0300 Subject: [Ksummit-discuss] [TECH TOPIC] is Kconfig a bit hard sometimes? In-Reply-To: <20170706144028.46a2mt2mdzpt6ip7@mwanda> References: <20170627135839.GB1886@jagdpanzerIV.localdomain> <20170706144028.46a2mt2mdzpt6ip7@mwanda> Message-ID: <1601331.LiGNiPeBdk@avalon> On Thursday 06 Jul 2017 17:40:29 Dan Carpenter wrote: > People have mentioned "make oldconfig" but I've never had a lot of luck > with that. It always just prints "* Restart config..." and deletes my > config. I like oldconfig as it makes it easy to find out about new options when upgrading the kernel. However, there's one thing that bothers me. When jumping by more than one kernel version, the number of options can be quite high, in which case I sometimes make mistakes answering questions.
I'd love it if Kconfig allowed me to go back and correct mistakes, instead of having to note the option down and modify it manually afterwards. > Also I hate menus. It's such a pain if you want to enable a feature and > you have to do a dungeon crawl through our menu system to try find it. > > I wrote a script a couple years ago to create kernel configs. I do a > make defconfig, then I take a distro config and I do: > > for i in $(grep =m old_config) ; do > ./scripts/kconfig/kconfig set $i > done > > This prints a lot of errors and the code is only half implemented but > it's honestly the easiest way for me to get a bootable kernel these > days. If someone wanted to, they could add a "./scripts/kconfig/kconfig > file " command that would read a line at a time and call > `./scripts/kconfig/kconfig set $line` over and over. -- Regards, Laurent Pinchart From daniel.vetter at ffwll.ch Thu Jul 6 21:28:28 2017 From: daniel.vetter at ffwll.ch (Daniel Vetter) Date: Thu, 6 Jul 2017 23:28:28 +0200 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170706145346.6w2uzcf7xacbr3or@thunk.org> References: <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705112707.54d7f345@gandalf.local.home> <20170705130200.7c653f61@gandalf.local.home> <20170706092836.ifcnc2qqwufndhdl@sirena.org.uk> <20170706145346.6w2uzcf7xacbr3or@thunk.org> Message-ID: On Thu, Jul 6, 2017 at 4:53 PM, Theodore Ts'o wrote: > On Thu, Jul 06, 2017 at 11:41:39AM +0200, Daniel Vetter wrote: >> +1. That pretty much means large-scale CI. The i915 test suite has >> suffered quite a bit over the past years because the CI infrastructure >> didn't keep up.
Result is that running full CI kills pretty much every >> platform there is eventually, and it's really hard to get back to a >> state where the testsuite can be used to catch regressions again. > > I assume the i915 test suite requires real hardware and can't be run > on VM's; is that correct? Yes, that's another problem. If all bigger teams/subsystems would do what we'd do, but extended to all mailing lists, your patch series would get replies from a few hundred CI farms. Not sure that would scale ... And there's no way ever that one single entity will have hardware for everything. And if you only CI the mailing list for your own subsystem then every time a merge window happens your CI will be out of service (4.13 seems extremely bad, atm nothing survives an extended run on linux-next for us). -Daniel -- Daniel Vetter Software Engineer, Intel Corporation +41 (0) 79 365 57 48 - http://blog.ffwll.ch From shuahkh at osg.samsung.com Thu Jul 6 22:24:01 2017 From: shuahkh at osg.samsung.com (Shuah Khan) Date: Thu, 6 Jul 2017 16:24:01 -0600 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705153259.GA7265@kroah.com> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705153259.GA7265@kroah.com> Message-ID: On 07/05/2017 09:32 AM, Greg KH wrote: > On Wed, Jul 05, 2017 at 08:16:33AM -0700, Guenter Roeck wrote: >> If we start shaming people for not providing unit tests, all we'll accomplish is >> that people will stop providing bug fixes. > > Yes, this is the key! > > Steven, just look at everything marked with a "Fixes:" or "stable@" tag > from 4.12-rc1..4.12 and try to determine how you would write a test for > the majority of them. 
> > Yes, for some subsystems this can work (look at xfstests as one great > example for filesystems, same for the i915 tests), but for the majority > of the kernel, at this point in time, it doesn't make sense. > > So take Carlos's advice, start small, do it for your subsystem if you > don't touch hardware (easy peasy, right?), and let's see how it goes, > and see if we have the infrastructure to do it even today. Right now, > kselftests is finally getting a unified output format, which is great, > it shows that people are starting to use and rely on it. What else will > we need to make this more widely used, we don't know yet... > Over the past couple of years, kselftests have seen improvements to run on ARM in kernel ci rings. TAP13 will definitely make it easier to find run to run differences. There is the effort to use kselftests to test stable releases (4.4 LTS for example), which will help make the tests fail/skip gracefully when a feature isn't enabled/supported. The work so far is twofold: - enable them to run in test rings. - making them easy to use As per test development, we are constantly adding tests and I see new tests getting added for sub-systems that aren't hardware dependent. You will see lots of activity in mm, timers, seccomp, net, sys-calls to name a few. I am going to be looking for TAP13 format compliance for new tests starting 4.13. I am not sure how popular they are among developers and sub-system maintainers though. Maybe this is one area we can try to improve usage.
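
A rough sketch, in Python, of the kind of run-to-run comparison that TAP output makes possible. It assumes only the basic TAP result lines ("ok N description" and "not ok N description", with an optional "# SKIP" directive); the function names parse_tap and tap_differences are hypothetical, and this is not an existing kselftest tool.

```python
import re

# "ok 1 name" or "not ok 2 - name", optionally followed by "# SKIP reason".
_RESULT = re.compile(r"^(not ok|ok)\s+(\d+)\s*(?:-\s*)?(.*)$")


def parse_tap(text):
    """Return {test description: "pass" | "fail" | "skip"} from TAP text."""
    results = {}
    for line in text.splitlines():
        m = _RESULT.match(line.strip())
        if not m:
            continue  # version header, plan line, diagnostics
        status, number, rest = m.groups()
        name, _, directive = rest.partition("#")
        name = name.strip() or f"test {number}"
        if directive.strip().upper().startswith("SKIP"):
            results[name] = "skip"
        else:
            results[name] = "pass" if status == "ok" else "fail"
    return results


def tap_differences(old_run, new_run):
    """Return {test: (old status, new status)} for tests whose status changed."""
    old, new = parse_tap(old_run), parse_tap(new_run)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }
```

Keying on the test description rather than the test number is a design choice of this sketch: numbers shift when tests are added, while descriptions tend to stay stable between runs.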
thanks, -- Shuah From rostedt at goodmis.org Thu Jul 6 22:32:49 2017 From: rostedt at goodmis.org (Steven Rostedt) Date: Thu, 6 Jul 2017 18:32:49 -0400 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705153259.GA7265@kroah.com> Message-ID: <20170706183249.60b2aef9@gandalf.local.home> On Thu, 6 Jul 2017 16:24:01 -0600 Shuah Khan wrote: > Over the past couple of years, kselftests have seen improvements to run > on ARM in kernel ci rings. TAP13 will definitely make it easier to find > run to run differences. There is the effort to use ksefltests to test > stable releases (4.4 LTS for example), which will help make the tests > fail/skip gracefully when a feature isn't enabled/supported. > > The work so far is two fold: > > - enable them to run in test rings. > - making them easy to use > > As per test development, we are constantly adding tests and I see new tests > getting added for sub-systems that aren't hardware dependent. You will see > lots of activity in mm, timers, seccomp, net, sys-calls to name a few. > > I am going to be looking for TAP13 format compliance for new tests starting > 4.13. > > I am not sure how popular they are among developers and sub-system maintainers > though. Maybe this is one area we can try to improve usage. Maybe this should be included in the MAINTAINERS SUMMIT as well. To consolidate the format of all the kselftests and have something that everyone (or most) developers agree on. 
-- Steve From shuahkh at osg.samsung.com Thu Jul 6 22:40:45 2017 From: shuahkh at osg.samsung.com (Shuah Khan) Date: Thu, 6 Jul 2017 16:40:45 -0600 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170706183249.60b2aef9@gandalf.local.home> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705153259.GA7265@kroah.com> <20170706183249.60b2aef9@gandalf.local.home> Message-ID: <803733a4-491b-3303-5e22-a057d4eadd3d@osg.samsung.com> On 07/06/2017 04:32 PM, Steven Rostedt wrote: > On Thu, 6 Jul 2017 16:24:01 -0600 > Shuah Khan wrote: > > >> Over the past couple of years, kselftests have seen improvements to run >> on ARM in kernel ci rings. TAP13 will definitely make it easier to find >> run to run differences. There is the effort to use kselftests to test >> stable releases (4.4 LTS for example), which will help make the tests >> fail/skip gracefully when a feature isn't enabled/supported. >> >> The work so far is twofold: >> >> - enable them to run in test rings. >> - making them easy to use >> >> As per test development, we are constantly adding tests and I see new tests >> getting added for sub-systems that aren't hardware dependent. You will see >> lots of activity in mm, timers, seccomp, net, sys-calls to name a few. >> >> I am going to be looking for TAP13 format compliance for new tests starting >> 4.13. >> >> I am not sure how popular they are among developers and sub-system maintainers >> though. Maybe this is one area we can try to improve usage. As a clarification, what I meant by "how popular they are among developers and sub-system maintainers" is how often developers and sub-system maintainers run kselftests and whether there are any obstacles to running them.
It would be good to get feedback on usage by us as in developers. > > Maybe this should be included in the MAINTAINERS SUMMIT as well. To > consolidate the format of all the kselftests and have something that > everyone (or most) developers agree on. thanks, -- Shuah From fengguang.wu at intel.com Fri Jul 7 03:33:02 2017 From: fengguang.wu at intel.com (Fengguang Wu) Date: Fri, 7 Jul 2017 11:33:02 +0800 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705112707.54d7f345@gandalf.local.home> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705112707.54d7f345@gandalf.local.home> Message-ID: <20170707033302.rgpq5knzx3qvvr2p@wfg-t540p.sh.intel.com> On Wed, Jul 05, 2017 at 11:27:07AM -0400, Steven Rostedt wrote: [snip] >I need to be clearer on this. What I meant was, if there's a bug >where someone has a test that easily reproduces the bug, then if >there's not a test added to selftests for said bug, then we should >shame those into doing so. Besides shaming, there's one more option -- acknowledgement. When it's a test case or test tool that discovered the bug, we could acknowledge it by adding one line in the bug fixing patch. 
The exact forms can be discussed, but here are some examples to show the basic idea: Tool: lockdep Tool: ktest Tool: smatch Tool: trinity Tool: syzkaller Tool: xfstests/tests/ext4/025 Tool: scripts/coccinelle/locks/call_kern.cocci Tool: tools/testing/selftests/bpf/test_align.c Reports from test infrastructures like 0day could go further to help acknowledge the tool author or maintainer by showing such lines in its bug report email: You may consider adding these lines in the bug fixing patch: -----------------------[ cut here ]---------------------------------- Fixes: XXXXXXXXXX ("title of the buggy commit") Tool: tools/testing/selftests/bpf/test_align.c Reported-by: 0day test robot -----------------------[ cut here ]---------------------------------- Regards, Fengguang From frowand.list at gmail.com Fri Jul 7 04:52:25 2017 From: frowand.list at gmail.com (Frank Rowand) Date: Thu, 6 Jul 2017 21:52:25 -0700 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170707033302.rgpq5knzx3qvvr2p@wfg-t540p.sh.intel.com> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705112707.54d7f345@gandalf.local.home> <20170707033302.rgpq5knzx3qvvr2p@wfg-t540p.sh.intel.com> Message-ID: <595F1389.60308@gmail.com> On 07/06/17 20:33, Fengguang Wu wrote: > On Wed, Jul 05, 2017 at 11:27:07AM -0400, Steven Rostedt wrote: > [snip] >> I need to be clearer on this. What I meant was, if there's a bug >> where someone has a test that easily reproduces the bug, then if >> there's not a test added to selftests for said bug, then we should >> shame those into doing so. > > Besides shaming, there's one more option -- acknowledgement. 
> > When it's a test case or test tool that discovered the bug, we could > acknowledge it by adding one line in the bug fixing patch. The exact > forms can be discussed, but here are some examples to show the basic > idea: > > Tool: lockdep > Tool: ktest > Tool: smatch > Tool: trinity > Tool: syzkaller > Tool: xfstests/tests/ext4/025 > Tool: scripts/coccinelle/locks/call_kern.cocci > Tool: tools/testing/selftests/bpf/test_align.c > > Reports from test infrastructures like 0day could go further to help > acknowledge the tool author or maintainer by showing such lines in its > bug report email: > > You may consider adding these lines in the bug fixing patch: > > -----------------------[ cut here ]---------------------------------- > Fixes: XXXXXXXXXX ("title of the buggy commit") > Tool: tools/testing/selftests/bpf/test_align.c > Reported-by: 0day test robot > -----------------------[ cut here ]---------------------------------- > > Regards, > Fengguang > _______________________________________________ > Ksummit-discuss mailing list > Ksummit-discuss at lists.linuxfoundation.org > https://lists.linuxfoundation.org/mailman/listinfo/ksummit-discuss > That is a great idea! If a tool is shown to be catching a large number of bugs then I am more likely to add it to my test process. -Frank From avagin at gmail.com Fri Jul 7 06:15:20 2017 From: avagin at gmail.com (Andrei Vagin) Date: Thu, 6 Jul 2017 23:15:20 -0700 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> Message-ID: <20170707061519.GA25786@gmail.com> Here I want to share our experience of testing linux-next and other trees. In CRIU we have a lot of tests for all sorts of user-visible primitives. Our goal is to catch changes that break CRIU before they are pushed to Linus' tree.
https://criu.org/linux-next We run our test suite once a day for linux-next and a dozen other trees. About a year ago we used DO to get a virtual machine to run tests, but now we use travis-ci. Here is an example of a daily report: https://travis-ci.org/avagin/criu/builds/250632728 What are the benefits of this approach? * It is free. * Everyone can run these tests for any kernel without spending hours working out how to do that. * You don't need to have hardware to run the tests. * You can do this periodically or for each patch or patchset. For example, if we want to run the CRIU tests for a kernel, we need to apply this patch to it: https://github.com/avagin/linux/commit/2f34796b04cead83fa85cf92cf694ac4369ca970 and push its code to github, then travis-ci will run the tests for this kernel: https://travis-ci.org/avagin/linux/builds/250895561 Here is a detailed article which describes how we start a new kernel in travis-ci: https://avagin.github.io/travis-kexec-criu The main point I want to make is that developers will use tests only if they can run them with minimal effort. Ideally, someone else runs the tests for them. In CRIU, we run our tests for each patchset, and a patchset can be accepted only if it passes all tests: https://patchwork.criu.org/project/criu/series/?ordering=-last_updated I know that the first problem is to write tests, but the next step is to set up CI to run these tests for all changes and I think we can start thinking about this problem too. On Sun, Jul 02, 2017 at 07:51:43PM +0200, Thorsten Leemhuis wrote: > Hi! Sorry, I know I'm late -- real life (travel, day job, ...) kept me > away from spending time on Linux kernel regression work :-/ > > Maybe I'm taking it a bit too far for the new kid in town, but I think I > want to propose two sessions. One for the maintainer summit, that deals > with the most critical issues relevant to regression tracking.
And one > technical session to deal with all the other stuff. Obviously we can > move below mentioned topics from one to the other or talk about them at > both if we want. > > = [MAINTAINERS SUMMIT] Improve regression tracking = > > * Follow up from last year: What to do about bugzilla.kernel.org? > Reporters still get stranded there. > * How to get subsystems maintainer involved more in regression tracking > to better make sure that reported regressions are tracked and not > forgotten accidentally. > * Frustrations with regression tracking aka. how to establish > regression tracking properly to make sure it will never go away again. > > = [TECH TOPIC] Improve the kernels quality by getting more people > involved in regression testing and reporting = > > * A short report from the outcome of the maintainer summit discussion; > also pick up and topics here that where not properly discussed on the > maintainer summit or were postponed to this session. > * How to get distros more involved in regression tracking; especially > those that have a technical aware user base or normally ship up2date > kernel images (and thus have an greater interest in avoiding > regressions). I'm mainly thinking about Arch Linux, Debian, Fedora, and > openSUSE Tumbleweed here; having Ubuntu in the boat would be good, too! > (might be wise to talk about this on the maintainers summit as well, if > the right people are there) > * How to make it more easy to (ideally automatically!) track the > current status and the progress of each regression? Are there any tools > that could make regression tracking easier for all of us while not > introducing much overhead for maintainers? > > = Details = > > Below you'll find few more words about some points mentioned above; > there are a few other topics as well we could discuss if we want. 
But > first, a few general words on regression tracking from my point of view: > > * There are a lot of areas in regression tracking where things are far > from good (read: in a bad state). That makes it easy to discuss current > problems and their solutions for hours -- and at the same time forget > that discussing itself doesn't get us much forward (the old bugzilla > issue mentioned in this mail is a good example). We thus IMHO should > focus on the most important issues and lay the groundwork to establish > regression tracking properly again, then we move on to solve things that > are harder to solve. > > * Regression tracking currently is quite boring and exhausting (read: > high burn-out risk), as it involves quite a lot of manual work finding > regressions and keeping track of their progress (and at the end of the > day it does not feel like you achieved much). Some of that work can not > be automated. But quite a bit can and that would help a great deal to > establish regression tracking properly (currently I'm the only one doing > it and some development cycles I simply don't find spare time for it). > > I currently don't see any existing solutions that fit well with our > mail focused workflow and at the same time do not introduce much > overhead for subsystem maintainers (which I assume is what everyone > wants, as I fear solutions with much overhead won't fly at all). Ideas > how to solve this tricky problem area are highly welcomed. It's > something that can be discussed when the aforementioned points > "establish regression tracking properly" and "make it more easy to > manually or automatically track the current status of a regression" come up. > > == What to do about bugzilla.kernel.org = > > Discussed last year already; see https://lwn.net/Articles/705245/ for > details. Situation didn't change much since then: the bugzilla instance > was updated, but people still get stranded there as most subsystems > ignore it. 
That afaics frustrates people and makes them stop testing or > reporting bugs. > > Discuss how to improve things. [my2cent] Maybe a short term solution > like this could work: Serve a static page on bugzilla.kernel.org that > tells people where regressions/bugs for certain subsystems can be > reported, as it most of the time is some mailing list anyway. Such a > page could get compiled from MAINTAINERS (there is the "B:" field now > that points to bugzilla; if it's not there, point to a mailing list; also > explain get_maintainer.pl). > > Leave our bugzilla reachable via bugzilla.kernel.org/frontpage (or > something like that) for those few subsystems that use it; that's afaics > ACPI and PM (including Cpufreq, Cpuidle, Hibernation, Suspend, ...) and > maybe PCI (not sure) -- or should we tell them to move to > bugzilla.freedesktop.org (or somewhere else) to get rid of our bugzilla > in the long term and make Konstantin's life easier? Anyway: Make sure > bugs for other subsystems can't get filed in bugzilla.kernel.org anymore > so they don't get lost there. [/my2cent] > > == How to get subsystems maintainer more involved in regression tracking > to [...] == > > One reason why I put this up is: It would help me a lot if people told > regressions at leemhuis.info (side note: might be wise to make a > mailing-list that replaces this address) about regressions -- > simply CCing it on reports or answers to regressions reports is enough; > forwarding/bouncing mails there (even without additional text) is fine, > too. > > The other reason I included it: This came up in last year's discussion on > this list and it seemed some people thought we can get the subsystems > maintainers more involved; so I thought it might be wise to discuss it. > Might also be a good idea to discuss here how to get distro kernel > maintainer more involved if enough are around. > > == How to establish regression tracking properly [...] == > > This is a pretty vague topic on purpose.
People seem to agree that > regression tracking is important, but for years nobody did it (it > stopped a little while after Rafael had to move on) and the little bit > that I can do in my rare spare time won't help much (and I have no idea > how long I can continue to find time for it). > > == Make it easier to track the progress of regression == > > One of the main reasons that makes regression tracking hard currently: > getting aware or regressions and tracking their progress is a lot of > manual work. I plan one step that hopefully makes the job a little > easier and at the same time might allow some automation in the long > term: ask people to include a certain keyword in their regressions > reports. Maybe something like "Linux-Regression" that doesn't get too > much false positives when searching for it on lists and via Google > (suggestions for a better tag welcome). > > In addition, I plan to hand out some form of ID for each regressions I > track and ask people to include it -- especially when they post patches > that fix said regression or move the discussion to a new place (like > "Corrects: Linux-Regression-d2afd"; again: suggestions welcome! Maybe I > should just use a URL where people find details?). > > That way I can notice more easy when a fix for a regression hits > linux-next or master; I also get aware if a discussion moves from > bugzilla to LKML or from one thread to another (fingers crossed). > Obviously it depends on cooperation of those involved. > > If this works out we could write a script or something that watches > mailing lists, bug trackers and git trees for the tag in question. That > script could file a database and automatically do some of the tracking job. > > == get distros more involved == > > I assume at least Ben (Debian), Laura (Fedora), and Takashi (openSUSE) > are around, so it might be a good idea to sit together and talk > regression tracking in general and how we could get the distros kernel > maintainers more involved. 
Even better would be to sit down before to > maybe come up with some ideas/plans we could talk during this session. > > One topic could be: How to make it easier for users of popular distros > to get involved in testing. The "Kernel of the day" (KOTD) from > SUSE/openSUSE was mentioned recently on this list already, but I got the > impression that the existence of this repo is not well known; guess it's > the same for my own Kernel Vanilla Repositories for Fedora (those > contain packages with a quite recent mainline version; see > https://fedoraproject.org/wiki/Kernel_Vanilla_Repositories ) or the fact > that Fedora rawhide ships a recent mainline snapshot all the time. But > should distros also offer Linux-next somewhere? Or anything else? And > should the distros send experienced users upstream when they found a > regression? Or will subsystem maintainers send those users away because > they assume those kernels are not vanilla? > > > == Topics or vague ideas I left out on purpose == > > Here is a list of other things we could talk about, but I think better > left for a later time: > > * Kerneloops (http://oops.kernel.org/): It was discussed last year on > this list. I have no idea what the current status is. Is someone > watching & analysing it? And poking the right people when needed? (I > doubt it) > > * Regression tracking for stable kernels (many bugs only get noticed > once a new mainline version got released; at that time it might still be > easy to revert a certain patch in mainline and stable) > > * statistics: I didn't spend time to create statistics, like Rafael did > in the past. They'd be nice to have, but for now I think my time is > better spend elsewhere. > > * work towards growing the number of tester by making it easier for > them (better documentation, easier configuration, bisection scripts, ...) 
> > * maybe document a few some procedures for those that are not regular > kernel developers (like the "When users report bugs on the Fedora > tracker that look like actual upstream bugs, what's the best way to have > those reported?" thing that Laura mentioned earlier this month in the > mail "Bug reporting feedback loop" > > * provide better services than only a plain text list of regression on > a mailing list? > > * better documentation? for example explain the difference between bugs > and regressions somewhere to make people understand why their bugs might > get ignored, but as the same time know that we handle regressions more > seriously. > > * Should the regression tracker nag subsystem maintainers (and > reporters) more often if they are inactive? How do people for example > feel about (Semi-)Automatic nagging mails for regressions where there is > no progress? > > * Is the data and the format of the current reports show useful at all? > If not: How to improve it? > > * regression tracking is a fair amount of work, and it's frustrating, > and people burn out. How to avoid that? Can we maybe get regression > tracking on solid ground by somehow building a healthy community around > it (containing kernel developers, Distro maintainers and people that are > willing to help in their spare time) that work on regressions > testing/tracking and other QA stuff? > > * how to make the Linux kernel development so good that the mainstream > distros stop their kernel forks and do what they do with Firefox: Ship > the latest stable version (users get a new version with new features > every few weeks) or a longterm branch (makes a big version jump about > once a year; see Firefox ESR). > > Ugh, pretty long mail. Sorry about that. Maybe I shouldn't have looked > so closely into LWN.net articles about regression tracking and older > discussions about it. 
> > Ciao, Thorsten > _______________________________________________ > Ksummit-discuss mailing list > Ksummit-discuss at lists.linuxfoundation.org > https://lists.linuxfoundation.org/mailman/listinfo/ksummit-discuss From dan.carpenter at oracle.com Fri Jul 7 09:02:06 2017 From: dan.carpenter at oracle.com (Dan Carpenter) Date: Fri, 7 Jul 2017 12:02:06 +0300 Subject: [Ksummit-discuss] [PATCH 2/2] kconfig: new command line kernel configuration tool In-Reply-To: References: <20170706144028.46a2mt2mdzpt6ip7@mwanda> <20170706144208.6hlgxwo37gntk6qm@mwanda> Message-ID: <20170707090206.uiry6j7yizpl7yw4@mwanda> On Fri, Jul 07, 2017 at 07:55:27AM +0200, Krzysztof Kozlowski wrote: > On Thu, Jul 6, 2017 at 4:42 PM, Dan Carpenter wrote: > > This tool barely works, it's just a rough draft. > > > > Sometimes I want to search for a config so I have to load menuconfig, > > then search for the config entry, then exit. With this script I > > simply run: > > > > ./scripts/kconfig/kconfig search COMEDI > > > > Quite often I find myself trying to enable a feature by doing this: > > > > echo CONFIG_FEATURE=y >> .config > > > > But when I try to boot the new kernel, I find that the feature isn't > > there because the kernel runs `make oldconfig` and I didn't have all > > the depends selected so it silently removed it. With this feature > > what you can do is: > > > > ./scripts/kconfig/kconfig set FEATURE=y > > Sounds useful. I need to enable few options from scripts and if > dependencies change they could be silently skipped. > > Probably it would be nice to print what was effectively enabled to get > your feature in. However why not extending existing scripts/config? It > already has the feature for setting kconfig options (without looking > at dependencies - so like >> of yours). > I didn't know about scripts/config when I wrote it. scripts/config is essentially a UI around "echo CONFIG_FOO=m >> .config". It's totally useless. 
regards, dan carpenter From broonie at kernel.org Fri Jul 7 10:03:40 2017 From: broonie at kernel.org (Mark Brown) Date: Fri, 7 Jul 2017 11:03:40 +0100 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <1499352485.2765.14.camel@HansenPartnership.com> References: <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705112707.54d7f345@gandalf.local.home> <20170705130200.7c653f61@gandalf.local.home> <20170706092836.ifcnc2qqwufndhdl@sirena.org.uk> <1499352485.2765.14.camel@HansenPartnership.com> Message-ID: <20170707100340.kgks5aykbnwtc6om@sirena.org.uk> On Thu, Jul 06, 2017 at 07:48:05AM -0700, James Bottomley wrote: > On Thu, 2017-07-06 at 10:28 +0100, Mark Brown wrote: > > I think before anything like that is viable we need to show a > > concerted and visible interest in actually running the tests we > > already have and paying attention to the results - if people can see > > that they're just checking a checkbox that will often result in low > > quality tests which can do more harm than good. > it depends what you mean by "we". I used to run a battery of tests > over every SCSI commit. It was time consuming and slowed down the We as a community, I think something viable needs to be central services like kernelci that's automated and allows multiple people to be involved with the analysis. Hand running tests at scale just doesn't. > The corollary I take away from this is that the less intrusive the test > infrastructure is (at least to my process) the happier I am. The 0day > quantum leap for me was going from testing my tree and telling me of > problems after I've added the patch to testing patches posted to the > mailing list, which tells me of problems *before* the commit gets added > to the tree.
I think we'd get a long way just by looking at what's ending up in -next - it's not as good as detecting things before they go in but it's workable if people keep on top of things. From dan.carpenter at oracle.com Fri Jul 7 11:36:51 2017 From: dan.carpenter at oracle.com (Dan Carpenter) Date: Fri, 7 Jul 2017 14:36:51 +0300 Subject: [Ksummit-discuss] [TECH TOPIC] is Kconfig a bit hard sometimes? In-Reply-To: References: <20170627135839.GB1886@jagdpanzerIV.localdomain> <20170706144028.46a2mt2mdzpt6ip7@mwanda> Message-ID: <20170707113650.ee6oys5u4vq5hgdi@mwanda> On Thu, Jul 06, 2017 at 09:41:36AM -0700, Linus Torvalds wrote: > On Thu, Jul 6, 2017 at 7:40 AM, Dan Carpenter wrote: > > People have mentioned "make oldconfig" but I've never had a lot of luck > > with that. It always just prints "* Restart config..." and deletes my > > config. > > Really? > Argh. You're right. I'm an idiot. It's actually working fine, but it asked so many questions I thought it was broken. regards, dan carpenter From krzk at kernel.org Fri Jul 7 05:55:27 2017 From: krzk at kernel.org (Krzysztof Kozlowski) Date: Fri, 7 Jul 2017 07:55:27 +0200 Subject: [Ksummit-discuss] [PATCH 2/2] kconfig: new command line kernel configuration tool In-Reply-To: <20170706144208.6hlgxwo37gntk6qm@mwanda> References: <20170706144028.46a2mt2mdzpt6ip7@mwanda> <20170706144208.6hlgxwo37gntk6qm@mwanda> Message-ID: On Thu, Jul 6, 2017 at 4:42 PM, Dan Carpenter wrote: > This tool barely works, it's just a rough draft. > > Sometimes I want to search for a config so I have to load menuconfig, > then search for the config entry, then exit.
With this script I > simply run: > > ./scripts/kconfig/kconfig search COMEDI > > Quite often I find myself trying to enable a feature by doing this: > > echo CONFIG_FEATURE=y >> .config > > But when I try to boot the new kernel, I find that the feature isn't > there because the kernel runs `make oldconfig` and I didn't have all > the depends selected so it silently removed it. With this feature > what you can do is: > > ./scripts/kconfig/kconfig set FEATURE=y Sounds useful. I need to enable few options from scripts and if dependencies change they could be silently skipped. Probably it would be nice to print what was effectively enabled to get your feature in. However why not extending existing scripts/config? It already has the feature for setting kconfig options (without looking at dependencies - so like >> of yours). Best regards, Krzysztof > > It helps you enable the dependencies or it at least prints an error > if it can't enable the feature. > > But this code isn't all implemented. 1) It doesn't calculate the > dependencies well. See expr_parse() for more details. 2) It > doesn't work well for things like: > > ./scripts/kconfig/kconfig set BT_INTEL=m > > because those aren't visible, they can only be using depend > statements. Or say you try to set FEATURE=m when something else > depends on it be set =y then the error message is wrong. The > other problem is that I don't know how to print the help text. > Again, this is just a rough draft. 
> > Signed-off-by: Dan Carpenter > --- > scripts/kconfig/Makefile | 6 +- > scripts/kconfig/kconfig | 33 +++++ > scripts/kconfig/lconf.c | 332 +++++++++++++++++++++++++++++++++++++++++++++++ > 3 files changed, 370 insertions(+), 1 deletion(-) > create mode 100755 scripts/kconfig/kconfig > create mode 100644 scripts/kconfig/lconf.c > From linus.walleij at linaro.org Sun Jul 9 03:56:47 2017 From: linus.walleij at linaro.org (Linus Walleij) Date: Sun, 9 Jul 2017 05:56:47 +0200 Subject: [Ksummit-discuss] [PATCH 2/2] kconfig: new command line kernel configuration tool In-Reply-To: <20170707090206.uiry6j7yizpl7yw4@mwanda> References: <20170706144028.46a2mt2mdzpt6ip7@mwanda> <20170706144208.6hlgxwo37gntk6qm@mwanda> <20170707090206.uiry6j7yizpl7yw4@mwanda> Message-ID: On Fri, Jul 7, 2017 at 11:02 AM, Dan Carpenter wrote: > Krzysztof Kozlowski wrote: >> However why not extending existing scripts/config? It >> already has the feature for setting kconfig options (without looking >> at dependencies - so like >> of yours). > > I didn't know about scripts/config when I wrote it. scripts/config is > essentially a UI around "echo CONFIG_FOO=m >> .config". It's totally > useless. Maybe useless for you but i use it every day in my work. To compile a kernel for a purpose I have a custom makefile.mak in my top kernel dir that calls scripts/config to set stuff up on-the-fly with multiple rules like this: config-base: FORCE @mkdir -p $(build_dir) @cp $(rootfs) $(build_dir)/$(rootfsbase) $(MAKE) $(make_options) u8500_defconfig config-initramfs: have-rootfs config-base # Configure in the initramfs $(CURDIR)/scripts/config --file $(config_file) \ --enable BLK_DEV_INITRD \ --set-str INITRAMFS_SOURCE $(rootfsbase) \ --enable RD_GZIP \ --enable INITRAMFS_COMPRESSION_GZIP (....) 
config: config-common config-distro config-initramfs $(CURDIR)/scripts/config --file $(config_file) \ --enable USE_OF \ --enable ARM_APPENDED_DTB \ --enable ARM_ATAG_DTB_COMPAT \ --enable PROC_DEVICETREE yes "" | make $(make_options) oldconfig For the full Makefile see: https://dflund.se/~triad/krad/makefiles/ux500.mak There are several of these, like some that create a minimal i586 system with busybox on an initramfs: https://dflund.se/~triad/krad/makefiles/i586.mak I don't know if I am stupid in using this rather than config fragments, but it works for me. That said, what you have brewing looks better :) Yours, Linus Walleij From geert at linux-m68k.org Sun Jul 9 08:31:59 2017 From: geert at linux-m68k.org (Geert Uytterhoeven) Date: Sun, 9 Jul 2017 10:31:59 +0200 Subject: [Ksummit-discuss] [PATCH 2/2] kconfig: new command line kernel configuration tool In-Reply-To: References: <20170706144028.46a2mt2mdzpt6ip7@mwanda> <20170706144208.6hlgxwo37gntk6qm@mwanda> <20170707090206.uiry6j7yizpl7yw4@mwanda> Message-ID: On Sun, Jul 9, 2017 at 5:56 AM, Linus Walleij wrote: >> Krzysztof Kozlowski wrote: >>> However why not extending existing scripts/config? It >>> already has the feature for setting kconfig options (without looking >>> at dependencies - so like >> of yours). >> >> I didn't know about scripts/config when I wrote it. scripts/config is >> essentially a UI around "echo CONFIG_FOO=m >> .config". It's totally >> useless. > > Maybe useless for you but i use it every day in my work. To compile a kernel I assume the script has it uses. But to scratch Dan's itch (and mine, for generating .config from DTS), which is the non-trivial case, it may not work. So I'll definitely give Dan's script a try, thanks! > yes "" | make $(make_options) oldconfig That will become an infinite loop if "y" is not a valid answer for the newly introduced option (e.g. if it needs a number)? 
Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds From linux at leemhuis.info Sun Jul 9 13:46:50 2017 From: linux at leemhuis.info (Thorsten Leemhuis) Date: Sun, 9 Jul 2017 15:46:50 +0200 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705103335.0cbd9984@gandalf.local.home> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705103335.0cbd9984@gandalf.local.home> Message-ID: <8e49d1f3-2216-ca77-ac06-d62c08c18aea@leemhuis.info> On 05.07.2017 16:33, Steven Rostedt wrote: > On Wed, 5 Jul 2017 16:06:07 +0200 > Greg KH wrote: > [...] >> I don't mean to poo-poo the idea, but please realize that around 75% of >> the kernel is hardware/arch support, so that means that 75% of the >> changes/fixes deal with hardware things (yes, change is in direct >> correlation to size of the codebase in the tree, strange but true). > I would say that if it's for a specific hardware, then it's really up > to the maintainer if there should be a test or not. As a lot of these > is just to deal with some quirk or non standard that the hardware does. > But are these regressions, or just some feature that's been broken on > that hardware since its conception? > > That is, Thorsten this is more for you, how much real regressions are in > hardware? [...] 
From this and other mails in this thread I got the impression some more data would be helpful -- for example a few percentage numbers on * how many of the regressions are in hardware-specific/driver code * how many regressions suddenly pop up due to an unrelated (and maybe even correct) change * for how many regressions does it make sense to write a selftest to catch similar issues beforehand in the future. I'll try to gather some of those numbers when doing regression tracking for 4.13 (sorry again that I had to skip 4.12), so prepare yourself for a mail when you include a "Fixes:" tag in a commit ;-) Then there is some data to talk about on the summit or continue the discussion on this mailing list or LKML. BTW, Steven, you in this thread wrote "discuss if we want to consolidate the format of all the kselftests and have something that everyone (or most) developers agree on". I put that in my notes and will try to make sure we do not forget about this. Or is this something you'll drive forward yourself? Ciao, Thorsten P.S.: Sorry, I'm a bit late with my reply here. My real job (which is not really about kernel work) and some other things required my attention in the past few days... From rdunlap at infradead.org Sun Jul 9 17:03:03 2017 From: rdunlap at infradead.org (Randy Dunlap) Date: Sun, 9 Jul 2017 10:03:03 -0700 Subject: [Ksummit-discuss] [PATCH 2/2] kconfig: new command line kernel configuration tool In-Reply-To: References: <20170706144028.46a2mt2mdzpt6ip7@mwanda> <20170706144208.6hlgxwo37gntk6qm@mwanda> <20170707090206.uiry6j7yizpl7yw4@mwanda> Message-ID: <404833db-da51-e348-060e-c3b4f6a27e0d@infradead.org> On 07/09/2017 01:31 AM, Geert Uytterhoeven wrote: > On Sun, Jul 9, 2017 at 5:56 AM, Linus Walleij wrote: >>> Krzysztof Kozlowski wrote: >>>> However why not extending existing scripts/config? It >>>> already has the feature for setting kconfig options (without looking >>>> at dependencies - so like >> of yours).
>>> >>> I didn't know about scripts/config when I wrote it. scripts/config is >>> essentially a UI around "echo CONFIG_FOO=m >> .config". It's totally >>> useless. >> >> Maybe useless for you but i use it every day in my work. To compile a kernel > > I assume the script has it uses. > But to scratch Dan's itch (and mine, for generating .config from DTS), which > is the non-trivial case, it may not work. > So I'll definitely give Dan's script a try, thanks! > >> yes "" | make $(make_options) oldconfig > > That will become an infinite loop if "y" is not a valid answer for the newly > introduced option (e.g. if it needs a number)? yes "" just answers with a null string, not 'y'. -- ~Randy From frowand.list at gmail.com Sun Jul 9 17:32:22 2017 From: frowand.list at gmail.com (Frank Rowand) Date: Sun, 9 Jul 2017 10:32:22 -0700 Subject: [Ksummit-discuss] [PATCH 2/2] kconfig: new command line kernel configuration tool In-Reply-To: References: <20170706144028.46a2mt2mdzpt6ip7@mwanda> <20170706144208.6hlgxwo37gntk6qm@mwanda> <20170707090206.uiry6j7yizpl7yw4@mwanda> Message-ID: <596268A6.3080007@gmail.com> On 07/09/17 01:31, Geert Uytterhoeven wrote: > On Sun, Jul 9, 2017 at 5:56 AM, Linus Walleij wrote: >>> Krzysztof Kozlowski wrote: >>>> However why not extending existing scripts/config? It >>>> already has the feature for setting kconfig options (without looking >>>> at dependencies - so like >> of yours). >>> >>> I didn't know about scripts/config when I wrote it. scripts/config is >>> essentially a UI around "echo CONFIG_FOO=m >> .config". It's totally >>> useless. >> >> Maybe useless for you but i use it every day in my work. To compile a kernel > > I assume the script has it uses. > But to scratch Dan's itch (and mine, for generating .config from DTS), which > is the non-trivial case, it may not work. Hi Geert, An aid, though not a full solution, is scripts/dtc/dt_to_config. Though if I remember correctly, you are already familiar with that. 
For anyone who wants more information on the complexities of using dt_to_config, how to use it, and why it has difficulties providing a precise configuration automatically, see: http://elinux.org/Device_Tree_presentations_papers_articles#linux_kernel_configuration slides 33 - 80. -Frank > So I'll definitely give Dan's script a try, thanks! > >> yes "" | make $(make_options) oldconfig > > That will become an infinite loop if "y" is not a valid answer for the newly > introduced option (e.g. if it needs a number)? > > Gr{oetje,eeting}s, > > Geert > > -- > Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org > > In personal conversations with technical people, I call myself a hacker. But > when I'm talking to journalists I just say "programmer" or something like that. > -- Linus Torvalds From geert at linux-m68k.org Sun Jul 9 19:43:20 2017 From: geert at linux-m68k.org (Geert Uytterhoeven) Date: Sun, 9 Jul 2017 21:43:20 +0200 Subject: [Ksummit-discuss] [PATCH 2/2] kconfig: new command line kernel configuration tool In-Reply-To: <404833db-da51-e348-060e-c3b4f6a27e0d@infradead.org> References: <20170706144028.46a2mt2mdzpt6ip7@mwanda> <20170706144208.6hlgxwo37gntk6qm@mwanda> <20170707090206.uiry6j7yizpl7yw4@mwanda> <404833db-da51-e348-060e-c3b4f6a27e0d@infradead.org> Message-ID: On Sun, Jul 9, 2017 at 7:03 PM, Randy Dunlap wrote: > On 07/09/2017 01:31 AM, Geert Uytterhoeven wrote: >> On Sun, Jul 9, 2017 at 5:56 AM, Linus Walleij wrote: >>>> Krzysztof Kozlowski wrote: >>>>> However why not extending existing scripts/config? It >>>>> already has the feature for setting kconfig options (without looking >>>>> at dependencies - so like >> of yours). >>>> >>>> I didn't know about scripts/config when I wrote it.
scripts/config is >>>> essentially a UI around "echo CONFIG_FOO=m >> .config". It's totally >>>> useless. >>> >>> Maybe useless for you but i use it every day in my work. To compile a kernel >> >> I assume the script has it uses. >> But to scratch Dan's itch (and mine, for generating .config from DTS), which >> is the non-trivial case, it may not work. >> So I'll definitely give Dan's script a try, thanks! >> >>> yes "" | make $(make_options) oldconfig >> >> That will become an infinite loop if "y" is not a valid answer for the newly >> introduced option (e.g. if it needs a number)? > > yes "" > just answers with a null string, not 'y'. Oops, that's correct. /me on a lazy Sunday afternoon... So the difficult part is "yes y" or "yes n". Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. -- Linus Torvalds From geert at linux-m68k.org Mon Jul 10 09:44:22 2017 From: geert at linux-m68k.org (Geert Uytterhoeven) Date: Mon, 10 Jul 2017 11:44:22 +0200 Subject: [Ksummit-discuss] [PATCH 2/2] kconfig: new command line kernel configuration tool In-Reply-To: <20170706144208.6hlgxwo37gntk6qm@mwanda> References: <20170706144028.46a2mt2mdzpt6ip7@mwanda> <20170706144208.6hlgxwo37gntk6qm@mwanda> Message-ID: Hi Dan, On Thu, Jul 6, 2017 at 4:42 PM, Dan Carpenter wrote: > This tool barely works, it's just a rough draft. > > Sometimes I want to search for a config so I have to load menuconfig, > then search for the config entry, then exit. 
With this script I > simply run: > > ./scripts/kconfig/kconfig search COMEDI > > Quite often I find myself trying to enable a feature by doing this: > > echo CONFIG_FEATURE=y >> .config > > But when I try to boot the new kernel, I find that the feature isn't > there because the kernel runs `make oldconfig` and I didn't have all > the depends selected so it silently removed it. With this feature > what you can do is: > > ./scripts/kconfig/kconfig set FEATURE=y > > It helps you enable the dependencies or it at least prints an error > if it can't enable the feature. > > But this code isn't all implemented. 1) It doesn't calculate the > dependencies well. See expr_parse() for more details. 2) It > doesn't work well for things like: > > ./scripts/kconfig/kconfig set BT_INTEL=m > > because those aren't visible, they can only be using depend > statements. Or say you try to set FEATURE=m when something else > depends on it be set =y then the error message is wrong. The > other problem is that I don't know how to print the help text. > Again, this is just a rough draft. > > Signed-off-by: Dan Carpenter Thanks! With the small fixes below, it worked fine for all cases I tried it with. > --- /dev/null > +++ b/scripts/kconfig/lconf.c > @@ -0,0 +1,332 @@ > +/* > + * Copyright (C) 2015 Oracle > + * Released under the terms of the GNU GPL v2.0. > + * > + */ > +#define _GNU_SOURCE scripts/kconfig/lconf.c:6:0: warning: "_GNU_SOURCE" redefined #define _GNU_SOURCE ^ :0:0: note: this is the location of the previous definition You can do: #ifndef _GNU_SOURCE #define _GNU_SOURCE #endif like scripts/kconfig/nconf.c does. > +static int conf_sym(struct symbol *sym) > +{ > + if (sym_set_tristate_value(sym, newval)) { > + /* FIXME: if I don't write it doesn't save */ > + conf_write(NULL, 1); scripts/kconfig/lconf.c: In function ‘conf_sym’: scripts/kconfig/lconf.c:159:4: error: too many arguments to function ‘conf_write’
conf_write(NULL, 1); ^ In file included from scripts/kconfig/lkc.h:24:0, from scripts/kconfig/lconf.c:10: scripts/kconfig/lkc_proto.h:8:5: note: declared here It seems it never took 2 parameters in upstream? Dropping the "1" works. > +static void kconfig_set(void) > +{ > + res = conf_write(NULL, 1); Likewise For search, it doesn't work with the CONFIG_ prefix: $ path-to-source-tree/scripts/kconfig/kconfig search CONFIG_IPMMU_VMSA GEN ./Makefile No matches found. $ path-to-source-tree/scripts/kconfig/kconfig search IPMMU_VMSA GEN ./Makefile Symbol: IPMMU_VMSA [=n] Type : boolean Prompt: Renesas VMSA-compatible IPMMU Location: -> Device Drivers -> IOMMU Hardware Support (IOMMU_SUPPORT [=n]) Defined at drivers/iommu/Kconfig:275 Depends on: IOMMU_SUPPORT [=n] && (ARM [=y] || IOMMU_DMA [=n]) && (ARCH_RENESAS [=y] || COMPILE_TEST [=n]) Selects: IOMMU_API [=n] && IOMMU_IO_PGTABLE_LPAE [=n] && ARM_DMA_USE_IOMMU [=n] For set, it works with or without the CONFIG_ prefix: $ path-to-source-tree/scripts/kconfig/kconfig set CONFIG_IPMMU_VMSA=y GEN ./Makefile IPMMU_VMSA: has missing dependencies IOMMU_SUPPORT [=n] && (ARM [=y] || IOMMU_DMA [=n]) && (ARCH_RENESAS [=y] || COMPILE_TEST [=n]) IOMMU_SUPPORT: IOMMU Hardware Support [N/y] y y # # configuration written to .config # HELP. Lot of unimplemented code. 1 HELP. Lot of unimplemented code. 
1 # # configuration written to .config # set: IPMMU_VMSA=y $ diff .config{.orig,} --- .config.orig 2017-07-10 11:34:13.181395059 +0200 +++ .config 2017-07-10 11:34:23.297370970 +0200 @@ -4,6 +4,9 @@ # CONFIG_ARM=y CONFIG_ARM_HAS_SG_CHAIN=y +CONFIG_NEED_SG_DMA_LENGTH=y +CONFIG_ARM_DMA_USE_IOMMU=y +CONFIG_ARM_DMA_IOMMU_ALIGNMENT=8 CONFIG_MIGHT_HAVE_PCI=y CONFIG_SYS_SUPPORTS_APM_EMULATION=y CONFIG_HAVE_PROC_CPU=y @@ -3452,6 +3455,7 @@ CONFIG_SYNC_FILE=y # CONFIG_SW_SYNC is not set # CONFIG_AUXDISPLAY is not set # CONFIG_UIO is not set +# CONFIG_VFIO is not set # CONFIG_VIRT_DRIVERS is not set # @@ -3634,7 +3638,19 @@ CONFIG_RENESAS_OSTM=y CONFIG_SH_TIMER_TMU=y CONFIG_EM_TIMER_STI=y # CONFIG_MAILBOX is not set -# CONFIG_IOMMU_SUPPORT is not set +CONFIG_IOMMU_API=y +CONFIG_IOMMU_SUPPORT=y + +# +# Generic IOMMU Pagetable Support +# +CONFIG_IOMMU_IO_PGTABLE=y +CONFIG_IOMMU_IO_PGTABLE_LPAE=y +# CONFIG_IOMMU_IO_PGTABLE_LPAE_SELFTEST is not set +# CONFIG_IOMMU_IO_PGTABLE_ARMV7S is not set +CONFIG_OF_IOMMU=y +CONFIG_IPMMU_VMSA=y +# CONFIG_ARM_SMMU is not set # # Remoteproc drivers Nice! BTW, forgetting the =y causes a crash: $ path-to-source-tree/scripts/kconfig/kconfig set IPMMU_VMSA GEN ./Makefile path-to-source-tree/scripts/kconfig/Makefile:37: recipe for target 'lconfig' failed make[4]: *** [lconfig] Segmentation fault path-to-source-tree/Makefile:548: recipe for target 'lconfig' failed make[3]: *** [lconfig] Error 2 Makefile:152: recipe for target 'sub-make' failed make[2]: *** [sub-make] Error 2 Makefile:24: recipe for target '__sub-make' failed make[1]: *** [__sub-make] Error 2 GNUmakefile:10: recipe for target 'all' failed make: *** [all] Error 2 Gr{oetje,eeting}s, Geert -- Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert at linux-m68k.org In personal conversations with technical people, I call myself a hacker. But when I'm talking to journalists I just say "programmer" or something like that. 
-- Linus Torvalds From dan.carpenter at oracle.com Mon Jul 10 11:15:56 2017 From: dan.carpenter at oracle.com (Dan Carpenter) Date: Mon, 10 Jul 2017 14:15:56 +0300 Subject: [Ksummit-discuss] [PATCH 2/2] kconfig: new command line kernel configuration tool In-Reply-To: References: <20170706144028.46a2mt2mdzpt6ip7@mwanda> <20170706144208.6hlgxwo37gntk6qm@mwanda> Message-ID: <20170710111555.b66w4vuc6irur5n4@mwanda> On Mon, Jul 10, 2017 at 11:44:22AM +0200, Geert Uytterhoeven wrote: > > --- /dev/null > > +++ b/scripts/kconfig/lconf.c > > @@ -0,0 +1,332 @@ > > +/* > > + * Copyright (C) 2015 Oracle > > + * Released under the terms of the GNU GPL v2.0. > > + * > > + */ > > +#define _GNU_SOURCE > > scripts/kconfig/lconf.c:6:0: warning: "_GNU_SOURCE" redefined > #define _GNU_SOURCE > ^ > :0:0: note: this is the location of the previous definition > > You can do: > > #ifndef _GNU_SOURCE > #define _GNU_SOURCE > #endif > > like scripts/kconfig/nconf.c does. Will do. > > > > +static int conf_sym(struct symbol *sym) > > +{ > > > + if (sym_set_tristate_value(sym, newval)) { > > + /* FIXME: if I don't write it doesn't save */ > > + conf_write(NULL, 1); > > scripts/kconfig/lconf.c: In function ‘conf_sym’: > scripts/kconfig/lconf.c:159:4: error: too many arguments to > function ‘conf_write’ > conf_write(NULL, 1); I added that in [PATCH 1/2], otherwise there is a lot of unwanted output. > Likewise > > > For search, it doesn't work with the CONFIG_ prefix: Will fix. > > Nice! > > BTW, forgetting the =y causes a crash: > Oops. Sorry. Will fix. regards, dan carpenter From tony.luck at intel.com Mon Jul 10 17:15:33 2017 From: tony.luck at intel.com (Luck, Tony) Date: Mon, 10 Jul 2017 17:15:33 +0000 Subject: [Ksummit-discuss] [TECH TOPIC] is Kconfig a bit hard sometimes?
In-Reply-To: <20170707113650.ee6oys5u4vq5hgdi@mwanda> References: <20170627135839.GB1886@jagdpanzerIV.localdomain> <20170706144028.46a2mt2mdzpt6ip7@mwanda> <20170707113650.ee6oys5u4vq5hgdi@mwanda> Message-ID: <3908561D78D1C84285E8C5FCA982C28F613009A7@ORSMSX114.amr.corp.intel.com> > Argh. You're right. I'm an idiot. It's actually working fine, but it > asked so many questions I thought it was broken. I run: $ yes "" | make oldconfig to just pick the default answer for all the questions. It works almost all of the time. Only recent break was when using a RHEL config as the start point some change in the MPT2SAS and MPT3SAS bits left me with a kernel with no driver to get to my root file system. -Tony From alexandre.belloni at free-electrons.com Mon Jul 10 17:33:35 2017 From: alexandre.belloni at free-electrons.com (Alexandre Belloni) Date: Mon, 10 Jul 2017 19:33:35 +0200 Subject: [Ksummit-discuss] [TECH TOPIC] is Kconfig a bit hard sometimes? In-Reply-To: <3908561D78D1C84285E8C5FCA982C28F613009A7@ORSMSX114.amr.corp.intel.com> References: <20170627135839.GB1886@jagdpanzerIV.localdomain> <20170706144028.46a2mt2mdzpt6ip7@mwanda> <20170707113650.ee6oys5u4vq5hgdi@mwanda> <3908561D78D1C84285E8C5FCA982C28F613009A7@ORSMSX114.amr.corp.intel.com> Message-ID: <20170710173335.4ksnso6dzaekoxz4@piout.net> On 10/07/2017 at 17:15:33 +0000, Luck, Tony wrote: > > Argh. You're right. I'm an idiot. It's actually working fine, but it > > asked so many questions I thought it was broken. > > I run: > > $ yes "" | make oldconfig > I know the yes trick works for kernels older than 3.7 but maybe people should start using make olddefconfig ;) -- Alexandre Belloni, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com From torvalds at linux-foundation.org Mon Jul 10 18:28:58 2017 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Mon, 10 Jul 2017 11:28:58 -0700 Subject: [Ksummit-discuss] [TECH TOPIC] is Kconfig a bit hard sometimes? 
In-Reply-To: <20170710173335.4ksnso6dzaekoxz4@piout.net> References: <20170627135839.GB1886@jagdpanzerIV.localdomain> <20170706144028.46a2mt2mdzpt6ip7@mwanda> <20170707113650.ee6oys5u4vq5hgdi@mwanda> <3908561D78D1C84285E8C5FCA982C28F613009A7@ORSMSX114.amr.corp.intel.com> <20170710173335.4ksnso6dzaekoxz4@piout.net> Message-ID: On Mon, Jul 10, 2017 at 10:33 AM, Alexandre Belloni wrote: > > I know the yes trick works for kernels older than 3.7 but maybe people > should start using make olddefconfig ;) Honestly, I wish more people just ran "oldconfig" and then started complaining about people adding insane Kconfig options. I seem to be the only one ever pushing back against some of the people out there that add Kconfig options that really make zero sense (or that add the oddest drivers or features with a crazy "default this thing to on"). Linus From rdunlap at infradead.org Mon Jul 10 19:44:55 2017 From: rdunlap at infradead.org (Randy Dunlap) Date: Mon, 10 Jul 2017 12:44:55 -0700 Subject: [Ksummit-discuss] [TECH TOPIC] is Kconfig a bit hard sometimes? In-Reply-To: References: <20170627135839.GB1886@jagdpanzerIV.localdomain> <20170706144028.46a2mt2mdzpt6ip7@mwanda> <20170707113650.ee6oys5u4vq5hgdi@mwanda> <3908561D78D1C84285E8C5FCA982C28F613009A7@ORSMSX114.amr.corp.intel.com> <20170710173335.4ksnso6dzaekoxz4@piout.net> Message-ID: On 07/10/2017 11:28 AM, Linus Torvalds wrote: > On Mon, Jul 10, 2017 at 10:33 AM, Alexandre Belloni > wrote: >> >> I know the yes trick works for kernels older than 3.7 but maybe people >> should start using make olddefconfig ;) > > Honestly, I wish more people just ran "oldconfig" and then started > complaining about people adding insane Kconfig options. > > I seem to be the only one ever pushing back against some of the people > out there that add Kconfig options that really make zero sense (or > that add the oddest drivers or features with a crazy "default this > thing to on"). I could -- and I have a few times. 
But usually it needs a $maintainer to make others listen. I'm just a lurker. -- ~Randy From vrothberg at suse.com Tue Jul 11 06:21:32 2017 From: vrothberg at suse.com (Valentin Rothberg) Date: Tue, 11 Jul 2017 08:21:32 +0200 Subject: [Ksummit-discuss] [TECH TOPIC] is Kconfig a bit hard sometimes? In-Reply-To: References: <20170627135839.GB1886@jagdpanzerIV.localdomain> <20170706144028.46a2mt2mdzpt6ip7@mwanda> <20170707113650.ee6oys5u4vq5hgdi@mwanda> <3908561D78D1C84285E8C5FCA982C28F613009A7@ORSMSX114.amr.corp.intel.com> <20170710173335.4ksnso6dzaekoxz4@piout.net> Message-ID: <20170711062132.GA13470@nebuchadnezzar.suse.de> On Jul 10 '17 11:28, Linus Torvalds wrote: > On Mon, Jul 10, 2017 at 10:33 AM, Alexandre Belloni > wrote: > > > > I know the yes trick works for kernels older than 3.7 but maybe people > > should start using make olddefconfig ;) > > Honestly, I wish more people just ran "oldconfig" and then started > complaining about people adding insane Kconfig options. If you want, we could add a "--diff-options" to checkkconfigsymbols.py. It runs reasonably fast and would also report options outside the current architecture. 
Kind regards, Valentin From dhowells at redhat.com Wed Jul 12 12:43:30 2017 From: dhowells at redhat.com (David Howells) Date: Wed, 12 Jul 2017 13:43:30 +0100 Subject: [Ksummit-discuss] [TECH TOPIC] Getting better/supplementary error info back to userspace Message-ID: <10144.1499863410@warthog.procyon.org.uk> Whilst undertaking a foray into container space and, related to that, looking at overhauling the mounting API, it occurred to me that I could make use of the mount context (now fs_context) that I was creating to allow the filesystem driver to pass supplementary error information back to the userspace program that was driving it in the form of textual messages:

    int fd = fsopen("ext4");
    write(fd, "d /dev/sda2");
    write(fd, "o user_xattr");
    if (fsmount(fd, "/mnt") == -1) {
        /* Something went wrong, read back any error info */
        size = read(fd, buffer, sizeof(buffer));
        /* Now print the supplementary error message */
        fprintf(stderr, "%*.*s\n", size, size, buffer);
    }

This would be particularly useful in the case of mounting a filesystem where so many things can go wrong that a small number is insufficient to represent them all. Yes, you have the dmesg log, but that's not necessarily available to you and is potentially intermixed with other things. Further, it's more user-friendly if the mount command or your GUI gives you the errors directly.

However, it occurred to me that this feature might be useful in other cases, not just mounting, and there are cases where it's not easy or not possible to get the message back to userspace because there's no user-accessible context (eg. automounting), or because the context is buried several levels down the stack (eg. NFS mount doing a pathwalk).

In which case, would it make sense to attach such a facility to the task_struct instead? I implemented a test of this using prctl, but a new syscall might be a better idea, at least for reading.
(*) int old_setting = prctl(PR_ERRMSG_ENABLE, int setting); Enable (setting == 1) or disable (setting == 0) the facility. Disabling the facility clears the error buffer. (*) int size = prctl(PR_ERRMSG_READ, char *buffer, int buf_size); Read back a message and discard it. Anyway, some questions: (1) Is this something that would be of interest on a more global scale? Or should I just stick with stashing it in the fs_context structure and find someway to route around the pathwalk in nfs mount? Or is this totally a bad idea and only dmesg should ever be used? If it is of interest globally: (2) How big should I make each task's message buffer? My current implementation allows each task to hold a single message if enabled. (3) Should I allow warnings in addition to errors? (4) Should I allow wait() and co. to try and retrieve errors from zombies? (5) Should execve() disable the facility? (6) Could all the messages be static (not kmalloc'd) and cleared/redacted by rmmod? This would potentially prevent the use of formatted messages. 
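As a rough illustration of the semantics described for PR_ERRMSG_ENABLE and PR_ERRMSG_READ, the following userspace C sketch models a single-message slot. Nothing here is the actual kernel patch: struct errmsg_state, errmsg_enable(), errmsg_set() and errmsg_read() are hypothetical names, the 128-byte buffer is an arbitrary choice (question (2) leaves the size open), and letting a newer message overwrite a pending one is an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace stand-in for the proposed per-task state. */
struct errmsg_state {
    int enabled;    /* toggled by the PR_ERRMSG_ENABLE analogue */
    int has_msg;    /* 1 while a message is pending */
    char msg[128];  /* size is an assumption; the thread leaves it open */
};

/*
 * Analogue of prctl(PR_ERRMSG_ENABLE, setting): returns the old setting.
 * Disabling the facility clears the buffer, as the proposal describes.
 */
static int errmsg_enable(struct errmsg_state *st, int setting)
{
    int old = st->enabled;

    st->enabled = setting ? 1 : 0;
    if (!st->enabled) {
        st->has_msg = 0;
        st->msg[0] = '\0';
    }
    return old;
}

/*
 * Producer side: store one message. Ignored while the facility is
 * disabled; a pending message is overwritten (assumed policy).
 */
static void errmsg_set(struct errmsg_state *st, const char *text)
{
    if (!st->enabled)
        return;
    snprintf(st->msg, sizeof(st->msg), "%s", text);
    st->has_msg = 1;
}

/*
 * Analogue of prctl(PR_ERRMSG_READ, buffer, buf_size): copy the pending
 * message out and discard it. Returns the number of bytes copied, or 0
 * if nothing is pending. The copy is not NUL-terminated.
 */
static int errmsg_read(struct errmsg_state *st, char *buffer, int buf_size)
{
    int len;

    if (!st->has_msg || buf_size <= 0)
        return 0;
    len = (int)strlen(st->msg);
    if (len > buf_size)
        len = buf_size;
    memcpy(buffer, st->msg, (size_t)len);
    st->has_msg = 0;
    st->msg[0] = '\0';
    return len;
}
```

A caller would enable the facility once, attempt the operation, and call errmsg_read() only after a failure; a second read returns 0 because the first one discarded the message.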
David From acme at kernel.org Wed Jul 12 14:33:21 2017 From: acme at kernel.org (Arnaldo Carvalho de Melo) Date: Wed, 12 Jul 2017 11:33:21 -0300 Subject: [Ksummit-discuss] [TECH TOPIC] Getting better/supplementary error info back to userspace In-Reply-To: <10144.1499863410@warthog.procyon.org.uk> References: <10144.1499863410@warthog.procyon.org.uk> Message-ID: <20170712143321.GL27350@kernel.org> On Wed, Jul 12, 2017 at 01:43:30PM +0100, David Howells wrote: > Whilst undertaking a foray into container space and, related to that, looking > at overhauling the mounting API, it occurred to me that I could make use of > the mount context (now fs_context) that I was creating to allow the filesystem > driver to pass supplementary error information back to the userspace program > that was driving it in the form of textual messages: > > int fd = fsopen("ext4"); > write(fd, "d /dev/sda2"); > write(fd, "o user_xattr"); > if (fsmount(fd, "/mnt") == -1) { > /* Something went wrong, read back any error info */ > size = read(fd, buffer, sizeof(buffer)); > /* Now print the supplementary error message */ > fprintf(stderr, "%*.*s\n", size, size, buffer); > } > > This would be particularly useful in the case of mounting a filesystem where > so many things can go wrong that a small number is insufficient to represent > them all. Yes, you have the dmesg log, but that's not necessarily available > to you and is potentially intermixed with other things. Further, it's more > user-friendly if the mount command or your GUI gives you the errors directly. > > However, it occurred to me that this feature might be useful in other cases, > not just mounting, and there are cases where it's not easy or not possible to > get the message back to userspace because there's no user-accessible context > (eg. automounting), or because the context is buried several levels down the > stack (eg. NFS mount doing a pathwalk).
> > In which case, would it make sense to attach such a facility to the > task_struct instead? I implemented a test of this using prctl, but a new > syscall might be a better idea, at least for reading. > > (*) int old_setting = prctl(PR_ERRMSG_ENABLE, int setting); > > Enable (setting == 1) or disable (setting == 0) the facility. > Disabling the facility clears the error buffer. > > (*) int size = prctl(PR_ERRMSG_READ, char *buffer, int buf_size); > > Read back a message and discard it. There were discussions about this in the not so distant past, perf being one of the areas where something like this would help a lot, lemme dig it, yeah, there is even a short LWN article describing it and with links to the lkml posts: https://lwn.net/Articles/657341/ Involves prctl as yours, etc, etc. What we do now in tools/perf/ with what we do have now is to have strerrno like messages for each class and method (well, we have for some of them), like: int perf_evsel__open_strerror(struct perf_evsel *evsel, struct target *target, int err, char *msg, size_t size); where we have a switch to see, from syscall errno return and intended target (CPU, system wide, a specific thread, cgroups, etc), who is asking this (user, root, etc) and lots of other tunables, how to best translate this to the user, formatting it in a string allows us to show it in whatever GUI is in use. - Arnaldo > > Anyway, some questions: > > (1) Is this something that would be of interest on a more global scale? > > Or should I just stick with stashing it in the fs_context structure and > find someway to route around the pathwalk in nfs mount? > > Or is this totally a bad idea and only dmesg should ever be used? > > If it is of interest globally: > > (2) How big should I make each task's message buffer? My current > implementation allows each task to hold a single message if enabled. > > (3) Should I allow warnings in addition to errors? > > (4) Should I allow wait() and co.
to try and retrieve errors from zombies? > > (5) Should execve() disable the facility? > > (6) Could all the messages be static (not kmalloc'd) and cleared/redacted by > rmmod? This would potentially prevent the use of formatted messages. > > David From acme at kernel.org Wed Jul 12 14:44:28 2017 From: acme at kernel.org (Arnaldo Carvalho de Melo) Date: Wed, 12 Jul 2017 11:44:28 -0300 Subject: [Ksummit-discuss] [TECH TOPIC] Getting better/supplementary error info back to userspace In-Reply-To: <20170712143321.GL27350@kernel.org> References: <10144.1499863410@warthog.procyon.org.uk> <20170712143321.GL27350@kernel.org> Message-ID: <20170712144428.GM27350@kernel.org> On Wed, Jul 12, 2017 at 11:33:21AM -0300, Arnaldo Carvalho de Melo wrote: > What we do now in tools/perf/ with what we do have now is to have > strerrno like messages for each class and method (well, we have for some > of them), like: > > int perf_evsel__open_strerror(struct perf_evsel *evsel, > struct target *target, > int err, char *msg, size_t size); > > where we have a switch to see, from syscall errno return and intended > target (CPU, system wide, a specific thread, cgroups, etc), who is > asking this (user, root, etc) and lots of other tunables, how to best > translate this to the user, formatting it in a string allows us to show > it in whatever GUI is in use.
To get this clearer in terms of actual usage, here is a (simplified) snippet for 'perf top':

    try_again:
        if (perf_evsel__open(event, cpus, threads) < 0) {
            if (perf_evsel__fallback(event, errno, msg, sizeof(msg))) {
                if (verbose > 0)
                    ui__warning("%s\n", msg);
                goto try_again;
            }

            perf_evsel__open_strerror(event, target, errno, msg, sizeof(msg));
            ui__error("%s\n", msg);
            goto out_err;
        }

- Arnaldo From dhowells at redhat.com Wed Jul 12 14:57:56 2017 From: dhowells at redhat.com (David Howells) Date: Wed, 12 Jul 2017 15:57:56 +0100 Subject: [Ksummit-discuss] [TECH TOPIC] Getting better/supplementary error info back to userspace In-Reply-To: <10144.1499863410@warthog.procyon.org.uk> References: <10144.1499863410@warthog.procyon.org.uk> Message-ID: <12463.1499871476@warthog.procyon.org.uk> David Howells wrote: > In which case, would it make sense to attach such a facility to the > task_struct instead? I implemented a test of this using prctl, but a new > syscall might be a better idea, at least for reading. > > (*) int old_setting = prctl(PR_ERRMSG_ENABLE, int setting); > > Enable (setting == 1) or disable (setting == 0) the facility. > Disabling the facility clears the error buffer. > > (*) int size = prctl(PR_ERRMSG_READ, char *buffer, int buf_size); > > Read back a message and discard it. I forgot to add that I've kept the in-kernel interface I have for this very simple for the moment:

    void errorf(const char *fmt, ...);
    int invalf(const char *fmt, ...);

where these functions take printf-style arguments and where invalf() is the same as errorf(), but returns -EINVAL for convenience.
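The two helpers described above could look roughly like this in userspace C. This is only a sketch under stated assumptions: the global errmsg_buf and its 128-byte size stand in for wherever the real message would be stored (fs_context or task_struct), and verrorf() and check_limit() are hypothetical names invented for the illustration.

```c
#include <errno.h>
#include <stdarg.h>
#include <stdio.h>

/*
 * Hypothetical single-message buffer. In the proposal the storage would
 * hang off the fs_context or the task_struct, not a global.
 */
static char errmsg_buf[128];

/* Shared formatter; vsnprintf() truncates over-long messages safely. */
static void verrorf(const char *fmt, va_list ap)
{
    vsnprintf(errmsg_buf, sizeof(errmsg_buf), fmt, ap);
}

/* Record a printf-style error message. */
static void errorf(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    verrorf(fmt, ap);
    va_end(ap);
}

/* Same as errorf(), but returns -EINVAL so callers can "return invalf(...)". */
static int invalf(const char *fmt, ...)
{
    va_list ap;

    va_start(ap, fmt);
    verrorf(fmt, ap);
    va_end(ap);
    return -EINVAL;
}

/* Hypothetical caller: reject a count that would overflow a fixed table. */
static int check_limit(unsigned int count, unsigned int max)
{
    if (count + 1 >= max)
        return invalf("too many entries: %u (max %u)", count, max);
    return 0;
}
```

Returning the error code from the formatting helper is what lets a three-line "log, then return -EINVAL" block collapse into a single return statement at the call site.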
To take an example from NFS:

    -	if (auth_info->flavor_len + 1 >= max_flavor_len) {
    -		dfprintk(MOUNT, "NFS: too many sec= flavors\n");
    -		return -EINVAL;
    -	}
    +	if (auth_info->flavor_len + 1 >= max_flavor_len)
    +		return invalf("NFS: too many sec= flavors");

David From stephen at networkplumber.org Wed Jul 12 15:21:39 2017 From: stephen at networkplumber.org (Stephen Hemminger) Date: Wed, 12 Jul 2017 08:21:39 -0700 Subject: [Ksummit-discuss] [TECH TOPIC] Getting better/supplementary error info back to userspace In-Reply-To: <12463.1499871476@warthog.procyon.org.uk> References: <10144.1499863410@warthog.procyon.org.uk> <12463.1499871476@warthog.procyon.org.uk> Message-ID: <20170712082139.17cfd33a@xeon-e3> On Wed, 12 Jul 2017 15:57:56 +0100 David Howells wrote: > David Howells wrote: > > > In which case, would it make sense to attach such a facility to the > > task_struct instead? I implemented a test of this using prctl, but a new > > syscall might be a better idea, at least for reading. > > > > (*) int old_setting = prctl(PR_ERRMSG_ENABLE, int setting); > > > > Enable (setting == 1) or disable (setting == 0) the facility. > > Disabling the facility clears the error buffer. > > > > (*) int size = prctl(PR_ERRMSG_READ, char *buffer, int buf_size); > > > > Read back a message and discard it. > > I forgot to add that I've kept the in-kernel interface I have for this very > simple for the moment: > > void errorf(const char *fmt, ...); > int invalf(const char *fmt, ...); > > where these functions take printf-style arguments and where invalf() is the > same as errorf(), but returns -EINVAL for convenience.
To take an example > from NFS: > > - if (auth_info->flavor_len + 1 >= max_flavor_len) { > - dfprintk(MOUNT, "NFS: too many sec= flavors\n"); > - return -EINVAL; > - } > + if (auth_info->flavor_len + 1 >= max_flavor_len) > + return invalf("NFS: too many sec= flavors"); Netlink has recently got extended error reporting, still not used widely and library support is lacking in most places. From torvalds at linux-foundation.org Wed Jul 12 16:19:55 2017 From: torvalds at linux-foundation.org (Linus Torvalds) Date: Wed, 12 Jul 2017 09:19:55 -0700 Subject: [Ksummit-discuss] [TECH TOPIC] Getting better/supplementary error info back to userspace In-Reply-To: <20170712082139.17cfd33a@xeon-e3> References: <10144.1499863410@warthog.procyon.org.uk> <12463.1499871476@warthog.procyon.org.uk> <20170712082139.17cfd33a@xeon-e3> Message-ID: On Wed, Jul 12, 2017 at 8:21 AM, Stephen Hemminger wrote: > > Netlink has recently got extended error reporting, still not used widely > and library support is lacking in most places. Yeah, and that "not widely supported and library support is lacking" is always going to be an issue with anything like that. Along with internationalization, which is a whole nasty set of issues in itself with error messages. It's not going to happen, in other words. The problems are basically insurmountable, and the thing it fixes will always be some special case that doesn't much matter. Every time it comes up it is because some developer found one case that they were hunting down and it annoyed them, and the developer went "if only it had included more information and it would have been obvious". But every time it comes up people ignore this basic issue: [torvalds at i7 linux]$ git grep -e '-E[A-Z]\{4\}' | wc -l 182523 Give it up. It's really is a horrible idea for so many reasons. 
Linus From stephen at networkplumber.org Wed Jul 12 16:35:07 2017 From: stephen at networkplumber.org (Stephen Hemminger) Date: Wed, 12 Jul 2017 09:35:07 -0700 Subject: [Ksummit-discuss] [TECH TOPIC] Getting better/supplementary error info back to userspace In-Reply-To: References: <10144.1499863410@warthog.procyon.org.uk> <12463.1499871476@warthog.procyon.org.uk> <20170712082139.17cfd33a@xeon-e3> Message-ID: <20170712093507.4482f3fc@xeon-e3> On Wed, 12 Jul 2017 09:19:55 -0700 Linus Torvalds wrote: > On Wed, Jul 12, 2017 at 8:21 AM, Stephen Hemminger > wrote: > > > > Netlink has recently got extended error reporting, still not used widely > > and library support is lacking in most places. > > Yeah, and that "not widely supported and library support is lacking" > is always going to be an issue with anything like that. > > Along with internationalization, which is a whole nasty set of issues > in itself with error messages. > > It's not going to happen, in other words. The problems are basically > insurmountable, and the thing it fixes will always be some special > case that doesn't much matter. > > Every time it comes up it is because some developer found one case > that they were hunting down and it annoyed them, and the developer > went "if only it had included more information and it would have been > obvious". > > But every time it comes up people ignore this basic issue: > > [torvalds at i7 linux]$ git grep -e '-E[A-Z]\{4\}' | wc -l > 182523 > > > Give it up. It's really is a horrible idea for so many reasons. For netlink, it isn't so bad. 80% of the usage is in iproute2 and therefore getting tool support for the usual cases isn't too hard. I fear kernel developers think at too low a level. They think if glibc and/or 1st level command can handle an extension, their work is done. But in the modern world, there are many scripts and layers above that. 
For the networking case, the worst case examples are things where configuration is done in stuff like some layer on top of Openstack, in python code which is scripting ip commands, which is talking to the kernel. Good luck on trying to get any meaningful error handling out of that dog pile. From leon at kernel.org Fri Jul 14 04:04:47 2017 From: leon at kernel.org (Leon Romanovsky) Date: Fri, 14 Jul 2017 07:04:47 +0300 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] Developing across multiple areas of the kernel In-Reply-To: <20170630062717.534b06e9@canb.auug.org.au> References: <1498754169.2834.61.camel@HansenPartnership.com> <1498758126.2834.70.camel@HansenPartnership.com> <20170629182044.GP21846@wotan.suse.de> <20170630062717.534b06e9@canb.auug.org.au> Message-ID: <20170714040447.GT1528@mtr-leonro.local> On Fri, Jun 30, 2017 at 06:27:17AM +1000, Stephen Rothwell wrote: > Hi Kees, > > On Thu, 29 Jun 2017 13:16:40 -0700 Kees Cook wrote: > > > > [1] If the solution for this is to merge other -next trees into mine, > > I guess I can do that, though it can be very messy if any of them are > > forced to make their commits unstable. It also creates headaches, > > AIUI, for sfr if my tree suddenly gains a bunch of other trees so it's > > not clear where something came from. > > I don't have a problem with trees in linux-next sharing *commits* - I > have problems when they share *patches* that are different commits > (that affect files that get changed in other commits). Do we have any sane way to overcome this limitation? I tried to add my tree [1] to participate in linux-next. My tree includes my submission queue and important patches posted to the mailing list to the RDMA subsystem. The absence of ability to add parallel tree with same commits doesn't allow us effectively test the RDMA patches. 
The reasons to it are combination of mostly two factors: my tree is not official one [2] (all patches in my tree are not officially final) and very sporadic update very close and/or during merge window [3]. In this cycle, we missed merge window [4] because lack of ready for pull tree [5]. Thanks [1] https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/ [2] https://git.kernel.org/cgit/linux/kernel/git/dledford/rdma.git/ [3] http://marc.info/?l=linux-next&m=149999488214297&w=2 [4] http://marc.info/?l=linux-rdma&m=149980130008834&w=2 [5] http://marc.info/?l=linux-rdma&m=149987945120683&w=2 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From greg at kroah.com Fri Jul 14 09:54:09 2017 From: greg at kroah.com (Greg KH) Date: Fri, 14 Jul 2017 11:54:09 +0200 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] Developing across multiple areas of the kernel In-Reply-To: <20170714040447.GT1528@mtr-leonro.local> References: <1498754169.2834.61.camel@HansenPartnership.com> <1498758126.2834.70.camel@HansenPartnership.com> <20170629182044.GP21846@wotan.suse.de> <20170630062717.534b06e9@canb.auug.org.au> <20170714040447.GT1528@mtr-leonro.local> Message-ID: <20170714095409.GF2269@kroah.com> On Fri, Jul 14, 2017 at 07:04:47AM +0300, Leon Romanovsky wrote: > On Fri, Jun 30, 2017 at 06:27:17AM +1000, Stephen Rothwell wrote: > > Hi Kees, > > > > On Thu, 29 Jun 2017 13:16:40 -0700 Kees Cook wrote: > > > > > > [1] If the solution for this is to merge other -next trees into mine, > > > I guess I can do that, though it can be very messy if any of them are > > > forced to make their commits unstable. It also creates headaches, > > > AIUI, for sfr if my tree suddenly gains a bunch of other trees so it's > > > not clear where something came from. 
> > > > I don't have a problem with trees in linux-next sharing *commits* - I > > have problems when they share *patches* that are different commits > > (that affect files that get changed in other commits). > > Do we have any sane way to overcome this limitation? > > I tried to add my tree [1] to participate in linux-next. My tree > includes my submission queue and important patches posted to the mailing list > to the RDMA subsystem. > > The absence of ability to add parallel tree with same commits doesn't allow us > effectively test the RDMA patches. Why do you need "parallel" trees in linux-next? What is that going to help with? > The reasons to it are combination of mostly two factors: my tree is not > official one [2] (all patches in my tree are not officially final) and very > sporadic update very close and/or during merge window [3]. If it's not "official", why should it be in linux-next? thanks, greg k-h From leon at kernel.org Fri Jul 14 10:29:20 2017 From: leon at kernel.org (Leon Romanovsky) Date: Fri, 14 Jul 2017 13:29:20 +0300 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] Developing across multiple areas of the kernel In-Reply-To: <20170714095409.GF2269@kroah.com> References: <1498754169.2834.61.camel@HansenPartnership.com> <1498758126.2834.70.camel@HansenPartnership.com> <20170629182044.GP21846@wotan.suse.de> <20170630062717.534b06e9@canb.auug.org.au> <20170714040447.GT1528@mtr-leonro.local> <20170714095409.GF2269@kroah.com> Message-ID: <20170714102920.GY1528@mtr-leonro.local> On Fri, Jul 14, 2017 at 11:54:09AM +0200, Greg KH wrote: > On Fri, Jul 14, 2017 at 07:04:47AM +0300, Leon Romanovsky wrote: > > On Fri, Jun 30, 2017 at 06:27:17AM +1000, Stephen Rothwell wrote: > > > Hi Kees, > > > > > > On Thu, 29 Jun 2017 13:16:40 -0700 Kees Cook wrote: > > > > > > > > [1] If the solution for this is to merge other -next trees into mine, > > > > I guess I can do that, though it can be very messy if any of them are > > > > forced to make their commits 
unstable. It also creates headaches, > > > > AIUI, for sfr if my tree suddenly gains a bunch of other trees so it's > > > > not clear where something came from. > > > > > > I don't have a problem with trees in linux-next sharing *commits* - I > > > have problems when they share *patches* that are different commits > > > (that affect files that get changed in other commits). > > > > Do we have any sane way to overcome this limitation? > > > > I tried to add my tree [1] to participate in linux-next. My tree > > includes my submission queue and important patches posted to the mailing list > > to the RDMA subsystem. > > > > The absence of ability to add parallel tree with same commits doesn't allow us > > effectively test the RDMA patches. > > Why do you need "parallel" trees in linux-next? What is that going to > help with? We are developing against two subsystems at the same time (netdev vs. RDMA) and need to ensure that combination of them is working. Currently me (RDMA) and Saeed (netdev) are merging out trees by ourselves [1] and instructs our verification (end-to-end and QA) to run from that tree. It means that we are missing a lot of stuff related to PCI, nvme, vitalization and storage where our technology is used. The difference in maintainers style between netdev and RDMA causes to have long queue (100+) of patches posted to the ML [2], which are not cross-checked in various CIs. And the situation is worse when someone posts patches which has potential to break other vendors. Ability to have "parallel" trees will allow us to run our (other vendors expressed the same desire) verification on top of linux-next with all goodies of automatic regression systems which we have as a hardware vendor. So I would like to have "parallel" tree where I can put all my RDMA patches + important patches from other parties and run from linux-next. 
> > > The reasons to it are combination of mostly two factors: my tree is not > > official one [2] (all patches in my tree are not officially final) and very > > sporadic update very close and/or during merge window [3]. > > If it's not "official", why should it be in linux-next? Because, official updates occur mostly twice in the cycle on -rc3 (for fixes) and before merge window, while it is too late for us because we are preparing our submission queues for next cycle (Linus's requirement for Mellanox's submissions) and verification is busy with that. Thanks [1] https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux.git/ branches:queue-next and queue-rc [2] https://git.kernel.org/pub/scm/linux/kernel/git/leon/linux-rdma.git/log/?h=testing/queue-next > > thanks, > > greg k-h From andrew at lunn.ch Fri Jul 14 14:10:57 2017 From: andrew at lunn.ch (Andrew Lunn) Date: Fri, 14 Jul 2017 16:10:57 +0200 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] Developing across multiple areas of the kernel In-Reply-To: <20170714102920.GY1528@mtr-leonro.local> References: <1498754169.2834.61.camel@HansenPartnership.com> <1498758126.2834.70.camel@HansenPartnership.com> <20170629182044.GP21846@wotan.suse.de> <20170630062717.534b06e9@canb.auug.org.au> <20170714040447.GT1528@mtr-leonro.local> <20170714095409.GF2269@kroah.com> <20170714102920.GY1528@mtr-leonro.local> Message-ID: <20170714141057.GC21743@lunn.ch> > The difference in maintainers style between netdev and RDMA causes to have long queue > (100+) of patches posted to the ML [2], which are not cross-checked in various CIs. It is possible to get 0-day to run against any arbitrary git tree, if you ask nicely. If same is true for the kernel-ci project. So if you are willing to do the merge work, you can get it tested.
Andrew From broonie at kernel.org Fri Jul 14 15:05:50 2017 From: broonie at kernel.org (Mark Brown) Date: Fri, 14 Jul 2017 16:05:50 +0100 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] Developing across multiple areas of the kernel In-Reply-To: <20170714141057.GC21743@lunn.ch> References: <1498758126.2834.70.camel@HansenPartnership.com> <20170629182044.GP21846@wotan.suse.de> <20170630062717.534b06e9@canb.auug.org.au> <20170714040447.GT1528@mtr-leonro.local> <20170714095409.GF2269@kroah.com> <20170714102920.GY1528@mtr-leonro.local> <20170714141057.GC21743@lunn.ch> Message-ID: <20170714150550.ubtkwmd3wcx554m6@sirena.org.uk> On Fri, Jul 14, 2017 at 04:10:57PM +0200, Andrew Lunn wrote: > > The difference in maintainers style between netdev and RDMA causes to have long queue > > (100+) of patches posted to the ML [2], which are not cross-checked in various CIs. > It is possible to get 0-day to run against any arbitrary git tree, if > you ask nicely. If same is true for the kernel-ci project. So if you > are willing to do the merge work, you can get it tested. Trees can be added to kernelci, yes. Another approach would be to work out a workflow with the upstreams that makes this better, if they'd take pull requests for example.
From leon at kernel.org Fri Jul 14 15:35:44 2017 From: leon at kernel.org (Leon Romanovsky) Date: Fri, 14 Jul 2017 18:35:44 +0300 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] Developing across multiple areas of the kernel In-Reply-To: <20170714141057.GC21743@lunn.ch> References: <1498758126.2834.70.camel@HansenPartnership.com> <20170629182044.GP21846@wotan.suse.de> <20170630062717.534b06e9@canb.auug.org.au> <20170714040447.GT1528@mtr-leonro.local> <20170714095409.GF2269@kroah.com> <20170714102920.GY1528@mtr-leonro.local> <20170714141057.GC21743@lunn.ch> Message-ID: <20170714153544.GE1528@mtr-leonro.local> On Fri, Jul 14, 2017 at 04:10:57PM +0200, Andrew Lunn wrote: > > The difference in maintainers style between netdev and RDMA causes to have long queue > > (100+) of patches posted to the ML [2], which are not cross-checked in various CIs. > > It is possible to get 0-day to run against any arbitrary git tree, if > you ask nicely. If same is true for the kernel-ci project. So if you > are willing to do the merge work, you can get it tested. 0-day is checking my tree, so it is not the problem. I don't see how kernel-ci can help me, because RDMA requires special hardware to run it and it usually requires more than two endpoints (servers) connected together. My problem is related to changes in other trees for example netdev, which can break RDMA functionality. Technology wise, there are: 1. RoCE - RDMA over Converged Ethernet - netdev is below RDMA 2. IPoIB - IP over Infiniband - netdev is above RDMA 3. HFI-VNIC - Ethernet over OmniPath - netdev is above RDMA 4. iWARP - RDMA over IP networks e.t.c. > > Andrew
From James.Bottomley at HansenPartnership.com Fri Jul 14 15:43:58 2017 From: James.Bottomley at HansenPartnership.com (James Bottomley) Date: Fri, 14 Jul 2017 08:43:58 -0700 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] Developing across multiple areas of the kernel In-Reply-To: <20170714153544.GE1528@mtr-leonro.local> References: <1498758126.2834.70.camel@HansenPartnership.com> <20170629182044.GP21846@wotan.suse.de> <20170630062717.534b06e9@canb.auug.org.au> <20170714040447.GT1528@mtr-leonro.local> <20170714095409.GF2269@kroah.com> <20170714102920.GY1528@mtr-leonro.local> <20170714141057.GC21743@lunn.ch> <20170714153544.GE1528@mtr-leonro.local> Message-ID: <1500047038.2853.16.camel@HansenPartnership.com> On Fri, 2017-07-14 at 18:35 +0300, Leon Romanovsky wrote: > On Fri, Jul 14, 2017 at 04:10:57PM +0200, Andrew Lunn wrote: > > > > > > > > The difference in maintainers style between netdev and RDMA > > > causes to have long queue > > > (100+) of patches posted to the ML [2], which are not cross- > > > checked in various CIs. > > > > It is possible to get 0-day to run against any arbitrary git tree, > > if you ask nicely. If same is true for the kernel-ci project. So if > > you are willing to do the merge work, you can get it tested. > > 0-day is checking my tree, so it is not the problem. > > I don't see how kernel-ci can help me, because RDMA requires special > hardware to run it and it usually requires more than two endpoints > (servers) connected together. > > My problem is related to changes in other trees for example netdev, > which can break RDMA functionality. > > Technology wise, there are: > 1. RoCE - RDMA over Converged Ethernet - netdev is below RDMA > 2. IPoIB - IP over Infiniband - netdev is above RDMA > 3. HFI-VNIC - Ethernet over OmniPath - netdev is above RDMA > 4. iWARP - RDMA over IP networks > e.t.c.
So I think your goal is to get your tree and the one above you (Doug's tree) into linux-next without causing a mismerge nightmare? I still didn't get why you can't change workflow to share commits? If you can do that, linux-next can be based on both your tree and the one above it. You can do this either by you sending pull requests or by you basing on the upstream tree and rebasing when the patches are accepted (rebase is very good at recognizing and discarding the same patch with a different commit id). James -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From leon at kernel.org Fri Jul 14 15:51:42 2017 From: leon at kernel.org (Leon Romanovsky) Date: Fri, 14 Jul 2017 18:51:42 +0300 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] Developing across multiple areas of the kernel In-Reply-To: <20170714150550.ubtkwmd3wcx554m6@sirena.org.uk> References: <1498758126.2834.70.camel@HansenPartnership.com> <20170629182044.GP21846@wotan.suse.de> <20170630062717.534b06e9@canb.auug.org.au> <20170714040447.GT1528@mtr-leonro.local> <20170714095409.GF2269@kroah.com> <20170714102920.GY1528@mtr-leonro.local> <20170714141057.GC21743@lunn.ch> <20170714150550.ubtkwmd3wcx554m6@sirena.org.uk> Message-ID: <20170714155142.GF1528@mtr-leonro.local> On Fri, Jul 14, 2017 at 04:05:50PM +0100, Mark Brown wrote: > On Fri, Jul 14, 2017 at 04:10:57PM +0200, Andrew Lunn wrote: > > > The difference in maintainers style between netdev and RDMA causes to have long queue > > > (100+) of patches posted to the ML [2], which are not cross-checked in various CIs. > > > It is possible to get 0-day to run against any arbitrary git tree, if > > you ask nicely. If same is true for the kernel-ci project. So if you > > are willing to do the merge work, you can get it tested. > > Trees can be added to kernelci, yes. 
Another approach would be to work > out a workflow with the upstreams that makes this better, if they'd take > pull requests for example. Isn't the goal of this topic in maintainers summit? Improving workflows :) So, my way to overcome my issues was to add "parallel" tree and to stop crying about "long queues". Thanks -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From leon at kernel.org Fri Jul 14 16:08:41 2017 From: leon at kernel.org (Leon Romanovsky) Date: Fri, 14 Jul 2017 19:08:41 +0300 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] Developing across multiple areas of the kernel In-Reply-To: <1500047038.2853.16.camel@HansenPartnership.com> References: <20170629182044.GP21846@wotan.suse.de> <20170630062717.534b06e9@canb.auug.org.au> <20170714040447.GT1528@mtr-leonro.local> <20170714095409.GF2269@kroah.com> <20170714102920.GY1528@mtr-leonro.local> <20170714141057.GC21743@lunn.ch> <20170714153544.GE1528@mtr-leonro.local> <1500047038.2853.16.camel@HansenPartnership.com> Message-ID: <20170714160841.GG1528@mtr-leonro.local> On Fri, Jul 14, 2017 at 08:43:58AM -0700, James Bottomley wrote: > On Fri, 2017-07-14 at 18:35 +0300, Leon Romanovsky wrote: > > On Fri, Jul 14, 2017 at 04:10:57PM +0200, Andrew Lunn wrote: > > > > > > > > > > > The difference in maintainers style between netdev and RDMA > > > > causes to have long queue > > > > (100+) of patches posted to the ML [2], which are not cross- > > > > checked in various CIs. > > > > > > It is possible to get 0-day to run against any arbitrary git tree, > > > if you ask nicely. If same is true for the kernel-ci project. So if > > > you are willing to do the merge work, you can get it tested. > > > > 0-day is checking my tree, so it is not the problem. 
> >
> > I don't see how kernel-ci can help me, because RDMA requires special
> > hardware to run it and it usually requires more than two endpoints
> > (servers) connected together.
> >
> > My problem is related to changes in other trees for example netdev,
> > which can break RDMA functionality.
> >
> > Technology wise, there are:
> > 1. RoCE - RDMA over Converged Ethernet - netdev is below RDMA
> > 2. IPoIB - IP over Infiniband - netdev is above RDMA
> > 3. HFI-VNIC - Ethernet over OmniPath - netdev is above RDMA
> > 4. iWARP - RDMA over IP networks
> > e.t.c.
>
> So I think your goal is to get your tree and the one above you (Doug's
> tree) into linux-next without causing a mismerge nightmare?

Yeah, exactly. I acknowledge Doug's work and just want to be sure that the
other trees are not breaking our technology, and I want to see any breakage
as soon as possible.

As for my own submissions, I'm pretty confident in them. The patches are
backed by verification teams and don't go public without approval.

>
> I still didn't get why you can't change workflow to share commits? If
> you can do that, linux-next can be based on both your tree and the one
> above it. You can do this either by you sending pull requests or by you
> basing on the upstream tree and rebasing when the patches are accepted
> (rebase is very good at recognizing and discarding the same patch with
> a different commit id).

1. I would like to send pull requests, but whether a pull request is
   honored does not depend on me.
2. In my early days, I tried to base on upstream and rebase, but it led
   to emails from Stephen [1]; maybe I need to try it again.

[1] http://www.mail-archive.com/linux-kernel at vger.kernel.org/msg1302627.html

>
> James
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc Type: application/pgp-signature Size: 833 bytes Desc: not available URL: From andrew at lunn.ch Fri Jul 14 16:18:24 2017 From: andrew at lunn.ch (Andrew Lunn) Date: Fri, 14 Jul 2017 18:18:24 +0200 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] Developing across multiple areas of the kernel In-Reply-To: <20170714153544.GE1528@mtr-leonro.local> References: <1498758126.2834.70.camel@HansenPartnership.com> <20170629182044.GP21846@wotan.suse.de> <20170630062717.534b06e9@canb.auug.org.au> <20170714040447.GT1528@mtr-leonro.local> <20170714095409.GF2269@kroah.com> <20170714102920.GY1528@mtr-leonro.local> <20170714141057.GC21743@lunn.ch> <20170714153544.GE1528@mtr-leonro.local> Message-ID: <20170714161824.GJ21743@lunn.ch> On Fri, Jul 14, 2017 at 06:35:44PM +0300, Leon Romanovsky wrote: > On Fri, Jul 14, 2017 at 04:10:57PM +0200, Andrew Lunn wrote: > > > The difference in maintainers style between netdev and RDMA causes to have long queue > > > (100+) of patches posted to the ML [2], which are not cross-checked in various CIs. > > > > It is possible to get 0-day to run against any arbitrary git tree, if > > you ask nicely. If same is true for the kernel-ci project. So if you > > are willing to do the merge work, you can get it tested. > > 0-day is checking my tree, so it is not the problem. > > I don't see how kernel-ci can help me, because RDMA requires special > hardware to run it and it usually requires more than two endpoints (servers) > connected together. kernel-ci are happy to receive hardware. I've sent them boards in the past which have been added to their test farm. Kernel-ci is mostly about boot testing, but they do do some tests post boot. So if you can supply tests as well, they may run them for you. > My problem is related to changes in other trees for example netdev, which > can break RDMA functionality. > > Technology wise, there are: > 1. RoCE - RDMA over Converged Ethernet - netdev is below RDMA > 2. 
IPoIB - IP over Infiniband - netdev is above RDMA > 3. HFI-VNIC - Ethernet over OmniPath - netdev is above RDMA > 4. iWARP - RDMA over IP networks How much of this do you already have automated test for? You can also setup your own test farm, using the kernels kernel-ci builds. Andrew From broonie at sirena.org.uk Fri Jul 14 16:20:25 2017 From: broonie at sirena.org.uk (Mark Brown) Date: Fri, 14 Jul 2017 17:20:25 +0100 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] Developing across multiple areas of the kernel In-Reply-To: <20170714155142.GF1528@mtr-leonro.local> References: <20170629182044.GP21846@wotan.suse.de> <20170630062717.534b06e9@canb.auug.org.au> <20170714040447.GT1528@mtr-leonro.local> <20170714095409.GF2269@kroah.com> <20170714102920.GY1528@mtr-leonro.local> <20170714141057.GC21743@lunn.ch> <20170714150550.ubtkwmd3wcx554m6@sirena.org.uk> <20170714155142.GF1528@mtr-leonro.local> Message-ID: <20170714162025.liz3hmpedz2rfquq@sirena.org.uk> On Fri, Jul 14, 2017 at 06:51:42PM +0300, Leon Romanovsky wrote: > On Fri, Jul 14, 2017 at 04:05:50PM +0100, Mark Brown wrote: > > Trees can be added to kernelci, yes. Another approach would be to work > > out a workflow with the upstreams that makes this better, if they'd take > > pull requests for example. > Isn't the goal of this topic in maintainers summit? Improving workflows :) It's also the goal here! > So, my way to overcome my issues was to add "parallel" tree and to stop > crying about "long queues". So I guess what everyone is suggesting here is changing this from being a parallel tree to a tree that's part of the normal workflow for this code. -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 488 bytes Desc: not available URL: From Bart.VanAssche at wdc.com Fri Jul 14 16:28:04 2017 From: Bart.VanAssche at wdc.com (Bart Van Assche) Date: Fri, 14 Jul 2017 16:28:04 +0000 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] Developing across multiple areas of the kernel In-Reply-To: <20170714161824.GJ21743@lunn.ch> References: <1498758126.2834.70.camel@HansenPartnership.com> <20170629182044.GP21846@wotan.suse.de> <20170630062717.534b06e9@canb.auug.org.au> <20170714040447.GT1528@mtr-leonro.local> <20170714095409.GF2269@kroah.com> <20170714102920.GY1528@mtr-leonro.local> <20170714141057.GC21743@lunn.ch> <20170714153544.GE1528@mtr-leonro.local> <20170714161824.GJ21743@lunn.ch> Message-ID: <1500049683.2662.6.camel@wdc.com> On Fri, 2017-07-14 at 18:18 +0200, Andrew Lunn wrote: > On Fri, Jul 14, 2017 at 06:35:44PM +0300, Leon Romanovsky wrote: > > On Fri, Jul 14, 2017 at 04:10:57PM +0200, Andrew Lunn wrote: > > > > The difference in maintainers style between netdev and RDMA causes to have long queue > > > > (100+) of patches posted to the ML [2], which are not cross-checked in various CIs. > > > > > > It is possible to get 0-day to run against any arbitrary git tree, if > > > you ask nicely. If same is true for the kernel-ci project. So if you > > > are willing to do the merge work, you can get it tested. > > > > 0-day is checking my tree, so it is not the problem. > > > > I don't see how kernel-ci can help me, because RDMA requires special > > hardware to run it and it usually requires more than two endpoints (servers) > > connected together. > > kernel-ci are happy to receive hardware. I've sent them boards in the > past which have been added to their test farm. Kernel-ci is mostly > about boot testing, but they do do some tests post boot. So if you can > supply tests as well, they may run them for you. 
> > > My problem is related to changes in other trees for example netdev, which > > can break RDMA functionality. > > > > Technology wise, there are: > > 1. RoCE - RDMA over Converged Ethernet - netdev is below RDMA > > 2. IPoIB - IP over Infiniband - netdev is above RDMA > > 3. HFI-VNIC - Ethernet over OmniPath - netdev is above RDMA > > 4. iWARP - RDMA over IP networks > > How much of this do you already have automated test for? You can also > setup your own test farm, using the kernels kernel-ci builds. Hello Andrew, The srp-test software is fully automated. It requires IB hardware today but does not require a second server because it uses IB loopback. As soon as I have the time I will add RoCE support to the upstream SRP initiator and target drivers such that these tests can be run on top of Ethernet hardware. Please let me know if you would like to start using this software and if you need help. See also https://github.com/bvanassche/srp-test. Bart. From sergey.senozhatsky.work at gmail.com Wed Jul 19 06:24:01 2017 From: sergey.senozhatsky.work at gmail.com (Sergey Senozhatsky) Date: Wed, 19 Jul 2017 15:24:01 +0900 Subject: [Ksummit-discuss] [TECH TOPIC] printk redesign In-Reply-To: <20170619052146.GA2889@jagdpanzerIV.localdomain> References: <20170619052146.GA2889@jagdpanzerIV.localdomain> Message-ID: <20170719062401.GA12064@jagdpanzerIV.localdomain> On (06/19/17 14:21), Sergey Senozhatsky wrote: > Hello, > > I, Petr Mladek and Steven Rostedt would like to propose a printk > tech topic (as suggested by Steven). We are currently exploring the idea > of complete redesign and rework of printk and it would be extremely helpful > to hear from the community. printk serves different purposes, and some of > requirements of printk tend to contradict each other; printk is monolithic > and quite heavy, no wonder, it causes problems sometimes. I made a trivial printk TODO list. 
The list is incomplete and mostly was created for personal use: thus it's probably a bit hard to read, but at the same time it contains some quotes/opinions/ideas copy-pastes and web-links. May be can be of some use. This also looks like our possible (some approximation) agenda [if the topic will be accepted]. -ss From sergey.senozhatsky.work at gmail.com Wed Jul 19 06:25:38 2017 From: sergey.senozhatsky.work at gmail.com (Sergey Senozhatsky) Date: Wed, 19 Jul 2017 15:25:38 +0900 Subject: [Ksummit-discuss] [TECH TOPIC] printk redesign In-Reply-To: <20170719062401.GA12064@jagdpanzerIV.localdomain> References: <20170619052146.GA2889@jagdpanzerIV.localdomain> <20170719062401.GA12064@jagdpanzerIV.localdomain> Message-ID: <20170719062538.GB12064@jagdpanzerIV.localdomain> On (07/19/17 15:24), Sergey Senozhatsky wrote: > On (06/19/17 14:21), Sergey Senozhatsky wrote: > > Hello, > > > > I, Petr Mladek and Steven Rostedt would like to propose a printk > > tech topic (as suggested by Steven). We are currently exploring the idea > > of complete redesign and rework of printk and it would be extremely helpful > > to hear from the community. printk serves different purposes, and some of > > requirements of printk tend to contradict each other; printk is monolithic > > and quite heavy, no wonder, it causes problems sometimes. > > I made a trivial printk TODO list. The list is incomplete and mostly > was created for personal use: thus it's probably a bit hard to read, > but at the same time it contains some quotes/opinions/ideas copy-pastes > and web-links. May be can be of some use. This also looks like our > possible (some approximation) agenda [if the topic will be accepted]. > d'oh... the link... 
https://github.com/sergey-senozhatsky/printk-todo -ss From daniel.vetter at ffwll.ch Wed Jul 19 07:26:23 2017 From: daniel.vetter at ffwll.ch (Daniel Vetter) Date: Wed, 19 Jul 2017 09:26:23 +0200 Subject: [Ksummit-discuss] [TECH TOPIC] printk redesign In-Reply-To: <20170719062538.GB12064@jagdpanzerIV.localdomain> References: <20170619052146.GA2889@jagdpanzerIV.localdomain> <20170719062401.GA12064@jagdpanzerIV.localdomain> <20170719062538.GB12064@jagdpanzerIV.localdomain> Message-ID: On Wed, Jul 19, 2017 at 8:25 AM, Sergey Senozhatsky wrote: > On (07/19/17 15:24), Sergey Senozhatsky wrote: >> On (06/19/17 14:21), Sergey Senozhatsky wrote: >> > Hello, >> > >> > I, Petr Mladek and Steven Rostedt would like to propose a printk >> > tech topic (as suggested by Steven). We are currently exploring the idea >> > of complete redesign and rework of printk and it would be extremely helpful >> > to hear from the community. printk serves different purposes, and some of >> > requirements of printk tend to contradict each other; printk is monolithic >> > and quite heavy, no wonder, it causes problems sometimes. >> >> I made a trivial printk TODO list. The list is incomplete and mostly >> was created for personal use: thus it's probably a bit hard to read, >> but at the same time it contains some quotes/opinions/ideas copy-pastes >> and web-links. May be can be of some use. This also looks like our >> possible (some approximation) agenda [if the topic will be accepted]. >> > > d'oh... the link... > > https://github.com/sergey-senozhatsky/printk-todo lgtm, two quick notes: - my mail with the fbdev discussion seems to be in the wrong chapter. Move it from "console_sem" to "fbdev, tty, drm, etc .."? - feature request for per-console output: Per-console flag to always use a kthread/offloading, even when oops/panic is happening. kms definitely wants that. Please note that in that section. I can help with implementing, once we get there. 
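A minimal user-space sketch of what the per-console flag requested above could mean for the choice between direct printing and offloading. Everything here is hypothetical: the flag name, the enum and the helper are invented for illustration and are not an existing kernel API. The default of printing directly during an oops/panic is an assumption taken from the discussion in this thread.

```c
/* Hypothetical per-console flag: not an existing kernel definition. */
#define SKETCH_CON_ALWAYS_OFFLOAD 0x1u

/* How the output for one console would be emitted. */
enum sketch_path { SKETCH_DIRECT, SKETCH_OFFLOAD };

/*
 * Pick the output path for a single console.
 *   con_flags:       flags of this console (hypothetical)
 *   offload_enabled: non-zero if offloading to a kthread is generally on
 *   in_panic:        non-zero while an oops/panic is in progress
 */
enum sketch_path sketch_choose_path(unsigned int con_flags,
                                    int offload_enabled, int in_panic)
{
    if (con_flags & SKETCH_CON_ALWAYS_OFFLOAD)
        return SKETCH_OFFLOAD;  /* this console never wants direct printing */
    if (in_panic)
        return SKETCH_DIRECT;   /* assumed default: no offloading in panic */
    return offload_enabled ? SKETCH_OFFLOAD : SKETCH_DIRECT;
}
```

The point of keeping the decision per console is that a console such as kms could opt out of direct printing without changing the behaviour of a serial console on the same machine.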
Cheers, Daniel
--
Daniel Vetter
Software Engineer, Intel Corporation
+41 (0) 79 365 57 48 - http://blog.ffwll.ch

From dwmw2 at infradead.org Wed Jul 19 07:35:20 2017
From: dwmw2 at infradead.org (David Woodhouse)
Date: Wed, 19 Jul 2017 09:35:20 +0200
Subject: [Ksummit-discuss] [TECH TOPIC] printk redesign
In-Reply-To: <20170620172738.zh4maxtfmlwhyrnt@sirena.org.uk>
References: <20170619052146.GA2889@jagdpanzerIV.localdomain>
 <20170619103912.2edbf88a@gandalf.local.home>
 <20170619152055.GM3786@lunn.ch>
 <01a7d603-c0a2-7aae-8c8d-587063da5e61@suse.com>
 <20170619162317.4nxx6jsvuzvdtasz@sirena.org.uk>
 <20170620155825.GC409@tigerII.localdomain>
 <3908561D78D1C84285E8C5FCA982C28F612DAC67@ORSMSX114.amr.corp.intel.com>
 <20170620171134.GA444@tigerII.localdomain>
 <20170620172738.zh4maxtfmlwhyrnt@sirena.org.uk>
Message-ID: <1500449720.19151.7.camel@infradead.org>

On Tue, 2017-06-20 at 18:27 +0100, Mark Brown wrote:
> On Wed, Jun 21, 2017 at 02:11:34AM +0900, Sergey Senozhatsky wrote:
>
> >
> > another thing that I found useful is a CPU number of the processor
> > that stored a particular line to the logbuf.
>
> At some point we start reinventing ftrace... there's issues with
> joining the two up but there should at least be lessons we can learn.
>

The other way of looking at this is "why are you abusing printk for
stuff that should have been done via ftrace or other means instead".

I confess I haven't got my curmudgeonly brain out of that mode at all,
ever since realising that printk had been made asynchronous and
unreliable (how long ago was that?) and that you could no longer see
the dying gasp of a crashing kernel on its serial console.

Rather than morphing printk into something more capable of bulk
transport, I'd rather see it go back to its roots of
debugging/diagnostics.

The original complaint of "all this printk output makes things too
slow" was better addressed by printing less or at lower severity (or
adjusting the console loglevel), IMO.
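To make the loglevel remark concrete: the console only receives messages whose severity value is numerically below the console loglevel, so either logging at a less severe level or lowering the console loglevel reduces the amount of (slow) console output. The following is a user-space sketch of that filtering rule only; the struct, the enum names and the helper functions are invented for illustration and are not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Severity values numbered as in the kernel: lower value = more severe. */
enum { LVL_EMERG = 0, LVL_ERR = 3, LVL_WARNING = 4, LVL_INFO = 6, LVL_DEBUG = 7 };

/* Illustrative log record (not the kernel's record layout). */
struct msg {
    int level;
    const char *text;
};

/* A message reaches the console only if its level is below the console loglevel. */
static int goes_to_console(const struct msg *m, int console_loglevel)
{
    return m->level < console_loglevel;
}

/* Count how many of the n messages would be written to the console. */
size_t console_write_count(const struct msg *msgs, size_t n, int console_loglevel)
{
    size_t i, count = 0;

    assert(msgs != NULL || n == 0);
    for (i = 0; i < n; i++)
        if (goes_to_console(&msgs[i], console_loglevel))
            count++;
    return count;
}
```

With four messages at levels 3, 6, 7 and 4, a console loglevel of 8 lets all four through, while a console loglevel of 5 lets only the error and the warning through. The messages are still stored in the log buffer in either case; only the console traffic changes.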
As things stand, the requirements for the various printk (ab)use cases
seem to be contradictory; if we're going to have a redesign then I
think it would be good to take a holistic view and decide what it's
actually *supposed* to be used for. And, perhaps more to the point,
what it isn't supposed to be used for.

From dwmw2 at infradead.org Wed Jul 19 07:59:31 2017
From: dwmw2 at infradead.org (David Woodhouse)
Date: Wed, 19 Jul 2017 09:59:31 +0200
Subject: [Ksummit-discuss] [TECH TOPIC] printk redesign
In-Reply-To:
References: <20170619103912.2edbf88a@gandalf.local.home>
 <20170619152055.GM3786@lunn.ch>
 <20170619122651.57ba27c4@gandalf.local.home>
 <20170624081411.58b4fb6a@vento.lan>
 <20170624140659.GM4875@lunn.ch>
 <20170624184216.2ffd4a96@gandalf.local.home>
 <20170624232140.GA27473@lunn.ch>
 <20170624234805.GT10672@ZenIV.linux.org.uk>
 <20170625012913.GC27473@lunn.ch>
Message-ID: <1500451171.19151.13.camel@infradead.org>

On Mon, 2017-06-26 at 10:46 +0200, Jiri Kosina wrote:
> On Sat, 24 Jun 2017, Linus Torvalds wrote:
>
> >
> > >
> > > It is how the embedded world operates, RS232, or now more often, RS232
> > > with a built in USB-RS232 converter, so you use USB on the host.
> > I'm not saying that serial lines shouldn't be an option.
> >
> > But for a *large* user base, they simply aren't.
> >
> > On regular PC's, it's often not an option any more. Even in the data
> > center, it's often not an option any more.
> I don't really agree here. Yes, the mid-to-high-end servers don't probably
> contain the actual UART chip any more, but the vast majority of those have
> something that's emulated in firmware, and actually do have a serial
> console line connector (not the 9-pin one, but rather RJ-45 with either
> Cisco or Yost pinout), which is then connected into serial-over-TCP
> concentrator box, exposing the serial console over telnet (or some
> proprietary client application). This is seen in DCs quite frequently.
>
> Even machines that have very good IPMI support still ship with this.

Yeah, we definitely still have a "serial console" in the data centre,
even if it's not actually RS232 any more. Or indeed "serial".

You want to catch those failures where even kdump doesn't manage to
give you a viable report of the original crash? You'd better be
watching...

Even on regular PCs we have the USB debug ports which can serve the
same purpose.

But still, we're talking about printk being used for its original
$DEITY-intended purpose for debugging and diagnostic data. Not for the
random "hey, here's a channel I can abuse to send data up to userspace"
stuff.

I heartily agree with Steven when he says that "printk is used too
freely".

From rostedt at goodmis.org Wed Jul 19 13:02:39 2017
From: rostedt at goodmis.org (Steven Rostedt)
Date: Wed, 19 Jul 2017 09:02:39 -0400
Subject: [Ksummit-discuss] [TECH TOPIC] Getting better/supplementary error info back to userspace
In-Reply-To:
References: <10144.1499863410@warthog.procyon.org.uk>
 <12463.1499871476@warthog.procyon.org.uk>
 <20170712082139.17cfd33a@xeon-e3>
Message-ID: <20170719090239.39f031c5@gandalf.local.home>

On Wed, 12 Jul 2017 09:19:55 -0700 Linus Torvalds wrote:
> On Wed, Jul 12, 2017 at 8:21 AM, Stephen Hemminger
> wrote:
> >
> > Netlink has recently got extended error reporting, still not used widely
> > and library support is lacking in most places.
>
> Yeah, and that "not widely supported and library support is lacking"
> is always going to be an issue with anything like that.
>
> Along with internationalization, which is a whole nasty set of issues
> in itself with error messages.
> > It's not going to happen, in other words. The problems are basically > insurmountable, and the thing it fixes will always be some special > case that doesn't much matter. > > Every time it comes up it is because some developer found one case > that they were hunting down and it annoyed them, and the developer > went "if only it had included more information and it would have been > obvious". > > But every time it comes up people ignore this basic issue: > > [torvalds at i7 linux]$ git grep -e '-E[A-Z]\{4\}' | wc -l > 182523 > Note a lot of those -E* are not going to user space. Some are in comments, and some are used internally. I use them to pass back information to other kernel only routines, as some errors are more critical than others. > > Give it up. It's really is a horrible idea for so many reasons. > One reason that this has never taken off is that there is no good infrastructure in doing it. I wouldn't tell people to give it up, but I don't see a one size fits all. In tracing, we have ways to pass detailed errors back to user space. But that's probably one of the easier cases as we have defined methods to do so. A more generic approach would require a lot more planning, and making it simple to use both in user space and in the kernel. If it is too complex in either place, it will be ignored. -- Steve From sergey.senozhatsky.work at gmail.com Thu Jul 20 05:19:08 2017 From: sergey.senozhatsky.work at gmail.com (Sergey Senozhatsky) Date: Thu, 20 Jul 2017 14:19:08 +0900 Subject: [Ksummit-discuss] [TECH TOPIC] printk redesign In-Reply-To: References: <20170619052146.GA2889@jagdpanzerIV.localdomain> <20170719062401.GA12064@jagdpanzerIV.localdomain> <20170719062538.GB12064@jagdpanzerIV.localdomain> Message-ID: <20170720051908.GB7483@jagdpanzerIV.localdomain> On (07/19/17 09:26), Daniel Vetter wrote: [..] > > d'oh... the link... 
> >
> > https://github.com/sergey-senozhatsky/printk-todo
>
> lgtm, two quick notes:
> - my mail with the fbdev discussion seems to be in the wrong chapter.
> Move it from "console_sem" to "fbdev, tty, drm, etc .."?

thanks for taking a look! and sorry for not being very responsive these
weeks, still struggling to recover from my sickness. the list is
incomplete and very spontaneous, I'll improve it.

> - feature request for per-console output: Per-console flag to always
> use a kthread/offloading, even when oops/panic is happening. kms
> definitely wants that. Please note that in that section. I can help
> with implementing, once we get there.

thanks. will add.

> Per-console flag to always use a kthread/offloading, even when oops/panic
> is happening. kms definitely wants that.
>

hmm... kthread offloading during panic() is really risky. nothing
guarantees that we will be able to call into the scheduler and wake up
that console printing-kthread, or that we will be able to schedule at
all. we may be in panic() from NMI handler, with the rest of CPUs
stopped. it's quite a risky thing to do. that's why we disable printk
offloading when in panic() - we don't want to make the things any worse.

before doing this I think I want to make call_console_drivers() more
reliable. right now we pick the first unseen messages from the logbuf
and iterate over registered consoles calling ->write() on every driver
from the console drivers list. if one of the consoles is misbehaving,
then the entire console output mechanism stops: we don't print anything
on other consoles until the current con->write() returns. so probably I
want to make it more independent.
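One way to picture "more independent" console output is to give every console its own position in the log instead of one shared position. The sketch below is a user-space model under stated assumptions, not the kernel's implementation: the struct, the functions and the `stalled` field are hypothetical, and a stuck driver is modelled as a write that fails immediately. A real ->write() that blocks forever could not be skipped this simply.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical console: 'stalled' stands in for a driver that cannot make progress. */
struct sketch_console {
    int stalled;        /* non-zero: the write would fail */
    size_t next_seq;    /* first log record this console has not printed yet */
    size_t written;     /* number of records printed so far */
};

/* Pretend driver write: returns 0 on success, -1 if the console is stuck. */
static int sketch_write(struct sketch_console *con, const char *text)
{
    (void)text;         /* a real driver would emit the text here */
    if (con->stalled)
        return -1;
    con->written++;
    return 0;
}

/*
 * Flush pending records to every console. Each console advances its own
 * cursor, so a console that fails is left behind and can catch up on a
 * later call instead of holding back all the others.
 */
void sketch_flush(struct sketch_console *cons, size_t ncons,
                  const char *const *log, size_t log_len)
{
    size_t c;

    assert(cons != NULL && log != NULL);
    for (c = 0; c < ncons; c++) {
        struct sketch_console *con = &cons[c];

        while (con->next_seq < log_len) {
            if (sketch_write(con, log[con->next_seq]) != 0)
                break;  /* skip this console for now, move on to the next */
            con->next_seq++;
        }
    }
}
```

With one stalled and one healthy console, a flush of three records leaves the stalled console at position 0 while the healthy one has printed all three. Once the first console recovers, the next flush lets it catch up without reprinting anything on the second.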
-ss

From sergey.senozhatsky.work at gmail.com Thu Jul 20 07:53:47 2017
From: sergey.senozhatsky.work at gmail.com (Sergey Senozhatsky)
Date: Thu, 20 Jul 2017 16:53:47 +0900
Subject: [Ksummit-discuss] [TECH TOPIC] printk redesign
In-Reply-To: <1500449720.19151.7.camel@infradead.org>
References: <20170619103912.2edbf88a@gandalf.local.home>
 <20170619152055.GM3786@lunn.ch>
 <01a7d603-c0a2-7aae-8c8d-587063da5e61@suse.com>
 <20170619162317.4nxx6jsvuzvdtasz@sirena.org.uk>
 <20170620155825.GC409@tigerII.localdomain>
 <3908561D78D1C84285E8C5FCA982C28F612DAC67@ORSMSX114.amr.corp.intel.com>
 <20170620171134.GA444@tigerII.localdomain>
 <20170620172738.zh4maxtfmlwhyrnt@sirena.org.uk>
 <1500449720.19151.7.camel@infradead.org>
Message-ID: <20170720075347.GA356@jagdpanzerIV.localdomain>

Hello,

On (07/19/17 09:35), David Woodhouse wrote:
[..]
> > At some point we start reinventing ftrace... there's issues with
> > joining the two up but there should at least be lessons we can learn.
> >
>
> The other way of looking at this is "why are you abusing printk for
> stuff that should have been done via ftrace or other means instead".
>
> I confess I haven't got my curmudgeonly brain out of that mode at all,
> ever since realising that printk had been made asynchronous and
> unreliable (how long ago was that?) and that you could no longer see
> the dying gasp of a crashing kernel on its serial console.
>
> Rather than morphing printk into something more capable of bulk
> transport, I'd rather see it go back to its roots of
> debugging/diagnostics.
>
> The original complaint of "all this printk output makes things too
> slow" was better addressed by printing less or at lower severity (or
> adjusting the console loglevel), IMO.
>
> As things stand, the requirements for the various printk (ab)use cases
> seem to be contradictory; if we're going to have a redesign then I
> think it would be good to take a holistic view and decide what it's
> actually *supposed* to be used for.
And, perhaps more to the point,
> what it isn't supposed to be used for.

just some thoughts,

at a glance printk has 3 major issues

- it has to do offloading, no doubt.

- printk() can deadlock, easily.
  (that's the whole reason there is printk_deferred())

- printk from NMI is not completely reliable.
  this area has been improved recently; but there are still cases when
  we can lose NMI-printk messages

... but there are more problems. and those issues are not completely
printk's fault.

what I mean (and I'm not criticizing anyone) is this: we can split
printk, deferring the printing of debug messages and printing important
messages directly. and that's where the redesign hits the first
obstacle: direct printing is unreliable. when we do
call_console_drivers() we pass control to the outside world, and we
never know where we will end up. consoles can invoke timekeeping,
networking, MM, and so on. so I think the printk redesign had better
start from this part - make the call to console drivers more reliable.
if possible.

what I'm talking about, by just one example:

bug report
https://marc.info/?l=dri-devel&m=149938825811219

root cause
https://marc.info/?l=linux-mm&m=149939515214223&w=2

so printk live-locked, and there was no way to see any kernel logs
until Tetsuo sysrq-c'ed the system. and the root cause was all those
complex and difficult dependencies between completely different
subsystems that printk depends on and that, in turn, depend on printk.

> hm, this allocation, per se, looks ok to me. can't really blame it.
> what you had is a combination of factors
>
>   CPU0                  CPU1                         CPU2
>                                                      console_callback()
>                                                       console_lock()
>                                                       ^^^^^^^^^^^^^
>   vprintk_emit()        mutex_lock(&par->bo_mutex)
>                          kzalloc(GFP_KERNEL)
>    console_trylock()      kmem_cache_alloc()          mutex_lock(&par->bo_mutex)
>    ^^^^^^^^^^^^^^^^        io_schedule_timeout

there are more examples.

closer to the point,

to the best of my knowledge, we don't have that many problems with the
printk logbuf now. we made some progress there over the last year.
yes, NMI printk is not completely awesome.

where we do have problems, I think:

a) we probably need to make more progress towards "and now we print it
   to the console"
b) print out offloading
c) printk deadlock and the need of printk_deferred()

and it's not only crazy printk abuse that justifies the existence of
printk offloading.

example:
https://marc.info/?l=linux-mm&m=149977866327662

> you will find that calling cond_resched() (from console_unlock() from printk())
> can cause a delay of nearly one minute, and it can cause a delay of nearly 5 minutes
> to complete one out_of_memory() call.

example:
https://marc.info/?l=linux-kernel&m=149509270422321

printk, to me, is a debugging/diagnostics tool. and we can't fully rely
on it, even when we do reasonable things, like OOM print out.

moreover, I think, to some extent, due to printk imperfections, the
more debugging options we enable (CONFIG_DEBUG_PREEMPT,
CONFIG_DEBUG_SPINLOCK, etc.) the less stable the kernel, potentially,
gets. because those options use printk() to report the problems. so
might_sleep() or spin_dump() called from "a wrong place" can eventually
deadlock printk() and the system.

example:
https://marc.info/?l=linux-kernel&m=149007148320611

well, just my thoughts.

-ss

From dhowells at redhat.com Fri Jul 21 13:41:39 2017
From: dhowells at redhat.com (David Howells)
Date: Fri, 21 Jul 2017 14:41:39 +0100
Subject: [Ksummit-discuss] [TECH TOPIC] Getting better/supplementary error info back to userspace
In-Reply-To:
References: <10144.1499863410@warthog.procyon.org.uk>
 <12463.1499871476@warthog.procyon.org.uk>
 <20170712082139.17cfd33a@xeon-e3>
Message-ID: <7884.1500644499@warthog.procyon.org.uk>

Linus Torvalds wrote:
> But every time it comes up people ignore this basic issue:
>
> [torvalds at i7 linux]$ git grep -e '-E[A-Z]\{4\}' | wc -l
> 182523
>
>
> Give it up. It's really is a horrible idea for so many reasons.
Are you okay with me making it possible to retrieve mount errors, warnings and informational messages through fd-arbitrated-mount I'm working on? For example (and skipping some of the parameters for brevity):

	int fs_fd;

	static inline void e(int x)
	{
		char buf[1024];
		int i;

		if (x == -1)
			fprintf(stderr, "Mount error: %m\n");

		/* Read back any messages */
		while (i = read(fs_fd, buf), i != -1) {
			buf[i] = 0;
			fprintf(stderr, "%s\n", buf);
		}

		if (x == -1)
			exit(1);
	}

	fs_fd = fsopen("ext4");
	e(write(fs_fd, "d /dev/sda3"));
	e(write(fs_fd, "o user_xattr"));
	e(write(fs_fd, "o acl"));
	e(write(fs_fd, "o data=ordered"));
	e(write(fs_fd, "x create"));
	e(fsmount(fs_fd, AT_FDCWD, "/mnt", MS_NODEV));
	close(fs_fd);

David

From mathieu.desnoyers at efficios.com Fri Jul 21 21:45:57 2017 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Fri, 21 Jul 2017 21:45:57 +0000 (UTC) Subject: [Ksummit-discuss] [TECH TOPIC] Pulling away from the tracing ABI quicksands In-Reply-To: <20170706151008.24addd2b@gandalf.local.home> References: <20170629195537.534445e7@gandalf.local.home> <20170630025852.xjoif3aai6rny5a2@ast-mbp> <20170629230251.02f380cb@gandalf.local.home> <6AE378F0-42F7-45DE-9F3C-050A5019A1E8@fb.com> <20170630142956.7e0cb2d6@gandalf.local.home> <20170630143030.305b68a0@gandalf.local.home> <658A3F80-5E48-4EC4-A591-E3783AD3DADC@fb.com> <20170706151008.24addd2b@gandalf.local.home> Message-ID: <1188050494.22035.1500673557010.JavaMail.zimbra@efficios.com> ----- On Jul 6, 2017, at 3:10 PM, rostedt rostedt at goodmis.org wrote: > On Fri, 30 Jun 2017 18:37:59 +0000 > Josef Bacik wrote: > >> [ I forgot to add Tom to the Cc list. Sending again. ] >> >> On Fri, 30 Jun 2017 14:29:56 -0400 >> Steven Rostedt wrote: >> >> > On Fri, 30 Jun 2017 18:24:12 +0000 >> > Josef Bacik wrote: >> > >> > > Yup I'll start bugging people to submit talk proposals, starting with you! I'll >> > > put up my proposal in the next day or two, I think Brendan has something he's >> > > going to talk about.
Thanks, >> > >> > I shouldn't have used the term "talk", as it really is all about >> > discussions. In fact, if you need more than one slide, you have too >> > many. >> > >> > That said, I could probably come up with a few things, starting with >> > this trace event issue. But it will be pointless if Peter Zijlstra and >> > Mathieu are not there. >> > >> > But having ideas about dynamic fields in tracepoints is always >> > interesting. Not to mention talking about Tom Zanussi's latest >> > histogram work. It may be pretty much completed, but I would like to >> > discuss where we go from there. >> > >> > One last thing. I don't want to have too many responsibilities, as I'm >> > on the LPC program committee and I need to make sure I have time to >> > fulfill any action items I'm responsible for during the conference. >> > >> >> Yeah plumbers is a weird venue for tracing, I always hope that we are >> going to have people like Brendan or other sysadmin-y people show up >> and say "this is what sucks about tracing, please fix it", and then >> we can go fix it. It doesn't really seem to happen that way tho, and >> for things like tracing ABI there just aren't the right people in the >> room to have that kind of discussion. My proposal was just going to >> be a laundry list of things that would make my life easier, but it >> doesn't really warrant a full micro-conference to listen to me bitch >> for an hour. If it turns out nobody else has much to talk about then >> we can just declare tracing is feature complete and we can talk about >> something else ;). Thanks, >> > > At this rate, I'm guessing that Tracing is not going to be on the > Plumbers' agenda. Since the Kernel Summit and Plumbers do not seem like a good fit to have discussions involving both tracing end users and developers, we have adapted the Tracing Summit schedule to have half day of the usual presentations, and half day dedicated to such discussions. Steven has volunteered to run the discussion part.
The Tracing Summit will take place on October 27th in Prague, on the Friday right after Kernel Summit. So if you have tracing topics that you would like to discuss at this event, the CFP/CFD and all the information are available here: http://tracingsummit.org/wiki/TracingSummit2017 Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. http://www.efficios.com From James.Bottomley at HansenPartnership.com Fri Jul 21 23:15:14 2017 From: James.Bottomley at HansenPartnership.com (James Bottomley) Date: Fri, 21 Jul 2017 16:15:14 -0700 Subject: [Ksummit-discuss] [TECH TOPIC] Pulling away from the tracing ABI quicksands In-Reply-To: <1188050494.22035.1500673557010.JavaMail.zimbra@efficios.com> References: <20170629195537.534445e7@gandalf.local.home> <20170630025852.xjoif3aai6rny5a2@ast-mbp> <20170629230251.02f380cb@gandalf.local.home> <6AE378F0-42F7-45DE-9F3C-050A5019A1E8@fb.com> <20170630142956.7e0cb2d6@gandalf.local.home> <20170630143030.305b68a0@gandalf.local.home> <658A3F80-5E48-4EC4-A591-E3783AD3DADC@fb.com> <20170706151008.24addd2b@gandalf.local.home> <1188050494.22035.1500673557010.JavaMail.zimbra@efficios.com> Message-ID: <1500678914.2900.77.camel@HansenPartnership.com> On Fri, 2017-07-21 at 21:45 +0000, Mathieu Desnoyers wrote: > > ----- On Jul 6, 2017, at 3:10 PM, rostedt rostedt at goodmis.org wrote: > > > > > On Fri, 30 Jun 2017 18:37:59 +0000 > > Josef Bacik wrote: > > > > > > > > [ I forgot to add Tom to the Cc list. Sending again. ] > > > > > > On Fri, 30 Jun 2017 14:29:56 -0400 > > > Steven Rostedt wrote: > > > > > > > > > > > On Fri, 30 Jun 2017 18:24:12 +0000 > > > > Josef Bacik wrote: > > > > > > > > > > > > > > Yup I'll start bugging people to submit talk proposals, > > > > > starting with you! I'll put up my proposal in the next day > > > > > or two, I think Brendan has something he's going to talk > > > > > about. Thanks, > > > > > > > > I shouldn't have used the term "talk", as it really is all > > > > about discussions.
In fact, if you need more than one slide, > > > > you have too many. > > > > > > > > That said, I could probably come up with a few things, starting > > > > with this trace event issue. But it will be pointless if Peter > > > > Zijlstra and Mathieu are not there. > > > > > > > > But having ideas about dynamic fields in tracepoints is always > > > > interesting. Not to mention talking about Tom Zanussi's latest > > > > histogram work. It may be pretty much completed, but I would > > > > like to discuss where we go from there. > > > > > > > > One last thing. I don't want to have too many responsibilities, > > > > as I'm on the LPC program committee and I need to make sure I > > > > have time to fulfill any action items I'm responsible for > > > > during the conference. > > > > > > > > > > Yeah plumbers is a weird venue for tracing, I always hope that we > > > are going to have people like Brendan or other sysadmin-y people > > > show up and say "this is what sucks about tracing, please fix > > > it", and then we can go fix it. It doesn't really seem to happen > > > that way tho, and for things like tracing ABI there just aren't > > > the right people in the room to have that kind of discussion. My > > > proposal was just going to be a laundry list of things that would > > > make my life easier, but it doesn't really warrant a full micro- > > > conference to listen to me bitch for an hour. If it turns out > > > nobody else has much to talk about then we can just declare > > > tracing is feature complete and we can talk about something else > > > ;). Thanks, > > > > > > > At this rate, I'm guessing that Tracing is not going to be on the > > Plumbers' agenda. > > Since the Kernel Summit and Plumbers do not seem like a good fit to > have discussions involving both tracing end users and developers First the disclaimer: being on the Plumbers Programme Committee, I'm biased.
However, I have to say that the design of Plumbers is to bring together everyone interested in the plumbing of Linux. That means end users as well, so it's not correct to say it's not a good fit. It also looks like there's been some renewed interest in having a Tracing MC at Plumbers, so my best guess now is that it will happen. That's not to say the two events can't easily co-exist: being on different continents means better opportunities for attendees with international travel restrictions. James From rostedt at goodmis.org Sun Jul 23 21:25:14 2017 From: rostedt at goodmis.org (Steven Rostedt) Date: Sun, 23 Jul 2017 17:25:14 -0400 Subject: [Ksummit-discuss] [TECH TOPIC] Pulling away from the tracing ABI quicksands In-Reply-To: <40F38E70-C173-463F-99AF-099927AC63E4@fb.com> References: <20170629195537.534445e7@gandalf.local.home> <20170630025852.xjoif3aai6rny5a2@ast-mbp> <20170629230251.02f380cb@gandalf.local.home> <6AE378F0-42F7-45DE-9F3C-050A5019A1E8@fb.com> <20170630142956.7e0cb2d6@gandalf.local.home> <20170630143030.305b68a0@gandalf.local.home> <658A3F80-5E48-4EC4-A591-E3783AD3DADC@fb.com> <20170706151008.24addd2b@gandalf.local.home> <1188050494.22035.1500673557010.JavaMail.zimbra@efficios.com> <37EF4BA7-FC04-4580-8AD8-28E4C384DA88@goodmis.org> <40F38E70-C173-463F-99AF-099927AC63E4@fb.com> Message-ID: <20170723172514.564ed7c5@gandalf.local.home> On Sun, 23 Jul 2017 16:24:09 +0000 Josef Bacik wrote: > Do we want to talk about ABI at the micro conference? Facebook uses > tracing everywhere in production so I can talk about it from both a > user and maintainer standpoint. Thanks, Yes, please add that to the wiki.
-- Steve From rostedt at goodmis.org Sat Jul 22 02:18:25 2017 From: rostedt at goodmis.org (Steven Rostedt) Date: Fri, 21 Jul 2017 22:18:25 -0400 Subject: [Ksummit-discuss] [TECH TOPIC] Pulling away from the tracing ABI quicksands In-Reply-To: <1188050494.22035.1500673557010.JavaMail.zimbra@efficios.com> References: <20170629195537.534445e7@gandalf.local.home> <20170630025852.xjoif3aai6rny5a2@ast-mbp> <20170629230251.02f380cb@gandalf.local.home> <6AE378F0-42F7-45DE-9F3C-050A5019A1E8@fb.com> <20170630142956.7e0cb2d6@gandalf.local.home> <20170630143030.305b68a0@gandalf.local.home> <658A3F80-5E48-4EC4-A591-E3783AD3DADC@fb.com> <20170706151008.24addd2b@gandalf.local.home> <1188050494.22035.1500673557010.JavaMail.zimbra@efficios.com> Message-ID: <37EF4BA7-FC04-4580-8AD8-28E4C384DA88@goodmis.org> Actually, Brendan Gregg got enough proposals together and there will be a tracing MC at Plumbers this year. -- Steve On July 21, 2017 5:45:57 PM EDT, Mathieu Desnoyers wrote: > > >----- On Jul 6, 2017, at 3:10 PM, rostedt rostedt at goodmis.org wrote: > >> On Fri, 30 Jun 2017 18:37:59 +0000 >> Josef Bacik wrote: >> >>> [ I forgot to add Tom to the Cc list. Sending again. ] >>> >>> On Fri, 30 Jun 2017 14:29:56 -0400 >>> Steven Rostedt wrote: >>> >>> > On Fri, 30 Jun 2017 18:24:12 +0000 >>> > Josef Bacik wrote: >>> > >>> > > Yup I'll start bugging people to submit talk proposals, starting >with you! I'll >>> > > put up my proposal in the next day or two, I think Brendan has >something he's >>> > > going to talk about. Thanks, >>> > >>> > I shouldn't have used the term "talk", as it really is all about >>> > discussions. In fact, if you need more than one slide, you have >too >>> > many. >>> > >>> > That said, I could probably come up with a few things, starting >with >>> > this trace event issue. But it will be pointless if Peter Zijlstra >and >>> > Mathieu are not there. >>> > >>> > But having ideas about dynamic fields in tracepoints is always >>> > interesting.
Not to mention talking about Tom Zanussi's latest >>> > histogram work. It may be pretty much completed, but I would like >to >>> > discuss where we go from there. >>> > >>> > One last thing. I don't want to have too many responsibilities, as >I'm >>> > on the LPC program committee and I need to make sure I have time >to >>> > fulfill any action items I'm responsible for during the >conference. >>> > >>> >>> Yeah plumbers is a weird venue for tracing, I always hope that we >are >>> going to have people like Brendan or other sysadmin-y people show up >>> and say "this is what sucks about tracing, please fix it", and then >>> we can go fix it. It doesn't really seem to happen that way tho, >and >>> for things like tracing ABI there just aren't the right people in >the >>> room to have that kind of discussion. My proposal was just going to >>> be a laundry list of things that would make my life easier, but it >>> doesn't really warrant a full micro-conference to listen to me bitch >>> for an hour. If it turns out nobody else has much to talk about >then >>> we can just declare tracing is feature complete and we can talk >about >>> something else ;). Thanks, >>> >> >> At this rate, I'm guessing that Tracing is not going to be on the >> Plumbers' agenda. > >Since the Kernel Summit and Plumbers do not seem like a good fit to >have >discussions involving both tracing end users and developers, we have >adapted the Tracing Summit schedule to have half day of the usual >presentations, and half day dedicated to such discussions. Steven has >volunteered to run the discussion part. > >The Tracing Summit will take place on October 27th in Prague, on the >Friday right after Kernel Summit. > >So if you have tracing topics that you would like to discuss at this >event, the CFP/CFD and all the information are available here: > >http://tracingsummit.org/wiki/TracingSummit2017 > >Thanks, > >Mathieu -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
From jbacik at fb.com Sun Jul 23 16:24:09 2017 From: jbacik at fb.com (Josef Bacik) Date: Sun, 23 Jul 2017 16:24:09 +0000 Subject: [Ksummit-discuss] [TECH TOPIC] Pulling away from the tracing ABI quicksands In-Reply-To: <37EF4BA7-FC04-4580-8AD8-28E4C384DA88@goodmis.org> References: <20170629195537.534445e7@gandalf.local.home> <20170630025852.xjoif3aai6rny5a2@ast-mbp> <20170629230251.02f380cb@gandalf.local.home> <6AE378F0-42F7-45DE-9F3C-050A5019A1E8@fb.com> <20170630142956.7e0cb2d6@gandalf.local.home> <20170630143030.305b68a0@gandalf.local.home> <658A3F80-5E48-4EC4-A591-E3783AD3DADC@fb.com> <20170706151008.24addd2b@gandalf.local.home> <1188050494.22035.1500673557010.JavaMail.zimbra@efficios.com>, <37EF4BA7-FC04-4580-8AD8-28E4C384DA88@goodmis.org> Message-ID: <40F38E70-C173-463F-99AF-099927AC63E4@fb.com> Do we want to talk about ABI at the micro conference? Facebook uses tracing everywhere in production so I can talk about it from both a user and maintainer standpoint. Thanks, Josef Sent from my iPhone > On Jul 23, 2017, at 11:49 AM, Steven Rostedt wrote: > > Actually, Brendan Gregg got enough proposals together and there will be a tracing MC at Plumbers this year. > > -- Steve > > >> On July 21, 2017 5:45:57 PM EDT, Mathieu Desnoyers wrote: >> >> >> ----- On Jul 6, 2017, at 3:10 PM, rostedt rostedt at goodmis.org wrote: >> >>> On Fri, 30 Jun 2017 18:37:59 +0000 >>> Josef Bacik wrote: >>> >>>> [ I forgot to add Tom to the Cc list. Sending again. ] >>>> >>>> On Fri, 30 Jun 2017 14:29:56 -0400 >>>> Steven Rostedt wrote: >>>> >>>>> On Fri, 30 Jun 2017 18:24:12 +0000 >>>>> Josef Bacik wrote: >>>>> >>>>>> Yup I'll start bugging people to submit talk proposals, starting >> with you! I'll >>>>>> put up my proposal in the next day or two, I think Brendan has >> something he's >>>>>> going to talk about. Thanks, >>>>> >>>>> I shouldn't have used the term "talk", as it really is all about >>>>> discussions. In fact, if you need more than one slide, you have >> too >>>>> many.
>>>>> >>>>> That said, I could probably come up with a few things, starting >> with >>>>> this trace event issue. But it will be pointless if Peter Zijlstra >> and >>>>> Mathieu are not there. >>>>> >>>>> But having ideas about dynamic fields in tracepoints is always >>>>> interesting. Not to mention talking about Tom Zanussi's latest >>>>> histogram work. It may be pretty much completed, but I would like >> to >>>>> discuss where we go from there. >>>>> >>>>> One last thing. I don't want to have too many responsibilities, as >> I'm >>>>> on the LPC program committee and I need to make sure I have time >> to >>>>> fulfill any action items I'm responsible for during the >> conference. >>>>> >>>> >>>> Yeah plumbers is a weird venue for tracing, I always hope that we >> are >>>> going to have people like Brendan or other sysadmin-y people show up >>>> and say "this is what sucks about tracing, please fix it", and then >>>> we can go fix it. It doesn't really seem to happen that way tho, >> and >>>> for things like tracing ABI there just aren't the right people in >> the >>>> room to have that kind of discussion. My proposal was just going to >>>> be a laundry list of things that would make my life easier, but it >>>> doesn't really warrant a full micro-conference to listen to me bitch >>>> for an hour. If it turns out nobody else has much to talk about >> then >>>> we can just declare tracing is feature complete and we can talk >> about >>>> something else ;). Thanks, >>>> >>> >>> At this rate, I'm guessing that Tracing is not going to be on the >>> Plumbers' agenda. >> >> Since the Kernel Summit and Plumbers do not seem like a good fit to >> have >> discussions involving both tracing end users and developers, we have >> adapted the Tracing Summit schedule to have half day of the usual >> presentations, and half day dedicated to such discussions. Steven has >> volunteered to run the discussion part.
>> >> The Tracing Summit will take place on October 27th in Prague, on the >> Friday right after Kernel Summit. >> >> So if you have tracing topics that you would like to discuss at this >> event, the CFP/CFD and all the information are available here: >> >> http://tracingsummit.org/wiki/TracingSummit2017 >> >> Thanks, >> >> Mathieu > > -- > Sent from my Android device with K-9 Mail. Please excuse my brevity. From miklos at szeredi.hu Mon Jul 24 07:55:19 2017 From: miklos at szeredi.hu (Miklos Szeredi) Date: Mon, 24 Jul 2017 09:55:19 +0200 Subject: [Ksummit-discuss] [TECH TOPIC] Getting better/supplementary error info back to userspace In-Reply-To: <20170719090239.39f031c5@gandalf.local.home> References: <10144.1499863410@warthog.procyon.org.uk> <12463.1499871476@warthog.procyon.org.uk> <20170712082139.17cfd33a@xeon-e3> <20170719090239.39f031c5@gandalf.local.home> Message-ID: On Wed, Jul 19, 2017 at 3:02 PM, Steven Rostedt wrote: > On Wed, 12 Jul 2017 09:19:55 -0700 > Linus Torvalds wrote: > >> On Wed, Jul 12, 2017 at 8:21 AM, Stephen Hemminger >> wrote: >> > >> > Netlink has recently got extended error reporting, still not used widely >> > and library support is lacking in most places. >> >> Yeah, and that "not widely supported and library support is lacking" >> is always going to be an issue with anything like that. >> >> Along with internationalization, which is a whole nasty set of issues >> in itself with error messages. >> >> It's not going to happen, in other words. The problems are basically >> insurmountable, and the thing it fixes will always be some special >> case that doesn't much matter.
>> >> Every time it comes up it is because some developer found one case >> that they were hunting down and it annoyed them, and the developer >> went "if only it had included more information and it would have been >> obvious". >> >> But every time it comes up people ignore this basic issue: >> >> [torvalds at i7 linux]$ git grep -e '-E[A-Z]\{4\}' | wc -l >> 182523 >> > > Note a lot of those -E* are not going to user space. Some are in > comments, and some are used internally. I use them to pass back > information to other kernel only routines, as some errors are more > critical than others. a) it wouldn't have to be for every error b) kernel prints detailed error in dmesg anyway, why not allow that info to be bound to the syscall that triggered the error? c) internationalization can be solved at the level where it matters (NOT in the kernel) My suggestion was to keep the kernel interface really simple, e.g.: return detailed_error(-EINVAL, "failure to do foo because of bar"); What are the insurmountable issues you are talking about? Thanks, Miklos From dhowells at redhat.com Mon Jul 24 08:25:17 2017 From: dhowells at redhat.com (David Howells) Date: Mon, 24 Jul 2017 09:25:17 +0100 Subject: [Ksummit-discuss] [TECH TOPIC] Getting better/supplementary error info back to userspace In-Reply-To: References: <10144.1499863410@warthog.procyon.org.uk> <12463.1499871476@warthog.procyon.org.uk> <20170712082139.17cfd33a@xeon-e3> <20170719090239.39f031c5@gandalf.local.home> Message-ID: <25485.1500884717@warthog.procyon.org.uk> Miklos Szeredi wrote: > My suggestion was to keep the kernel interface really simple, e.g.: > > return detailed_error(-EINVAL, "failure to do foo because of bar"); That's what I was thinking of, though I'd prefix the string with a source tag, such as "nfs", "vfs" or "dvb-core". 
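
To make the tagged variant concrete, here is a minimal user-space sketch of the idea, not kernel code and not anything posted in this thread. The three-argument detailed_error() signature, the last_error() accessor, the per-thread buffer and the parse_option() caller are all assumptions made up for illustration; only the "return detailed_error(-EINVAL, ...)" call shape and the source tags come from the messages above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical per-thread slot holding the most recent detailed message. */
static _Thread_local char last_error_detail[256];

/*
 * Hypothetical helper: store "tag: message" and pass the errno-style code
 * through unchanged, so "return detailed_error(...)" still yields -EINVAL.
 */
static int detailed_error(int err, const char *tag, const char *msg)
{
    assert(tag != NULL && msg != NULL);
    snprintf(last_error_detail, sizeof(last_error_detail), "%s: %s", tag, msg);
    return err;
}

/* Hypothetical accessor a caller could consult after seeing a failure. */
static const char *last_error(void)
{
    return last_error_detail;
}

/* Illustrative caller: accepts only "acl" and explains any rejection. */
static int parse_option(const char *opt)
{
    if (strcmp(opt, "acl") != 0)
        return detailed_error(-EINVAL, "vfs", "unknown mount option");
    last_error_detail[0] = '\0';    /* success clears any stale detail */
    return 0;
}
```

In this sketch the caller still sees the plain error code, and the extra text is optional: code that only checks for -EINVAL keeps working, while a tool that wants more can call last_error(). How the string would actually cross the kernel/user boundary is exactly the open question in this thread and is not modelled here.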
David From mathieu.desnoyers at efficios.com Thu Jul 27 14:35:45 2017 From: mathieu.desnoyers at efficios.com (Mathieu Desnoyers) Date: Thu, 27 Jul 2017 14:35:45 +0000 (UTC) Subject: [Ksummit-discuss] [TECH TOPIC] Pulling away from the tracing ABI quicksands In-Reply-To: <20170629232016.4cde203e@gandalf.local.home> References: <20170629195537.534445e7@gandalf.local.home> <20170629212750.5c3542ee@gandalf.local.home> <20170629221245.489760b1@gandalf.local.home> <20170630025852.xjoif3aai6rny5a2@ast-mbp> <20170629230251.02f380cb@gandalf.local.home> <20170629232016.4cde203e@gandalf.local.home> Message-ID: <292206664.28374.1501166145927.JavaMail.zimbra@efficios.com> ----- On Jun 29, 2017, at 11:20 PM, rostedt rostedt at goodmis.org wrote: > On Thu, 29 Jun 2017 23:02:51 -0400 > Steven Rostedt wrote: > >> On Thu, 29 Jun 2017 19:58:54 -0700 >> Alexei Starovoitov wrote: >> >> >> > Also I'm not planning to fly to Prague just for tracing discussion. >> > There is netdev2.2 right after in Seoul. >> > And tracing microconf at plumbers in September which is imo better >> > suited to discuss tracing related topics. >> >> Which reminds me. The LPC Tracing Microconf WIKI has been stale, and not >> moving at all. If it is to be accepted, it needs some talk proposals, >> and fast! > > Also note, Mathieu has stated he wont be attending Plumbers, and I'm > not sure Peter will be either as he has smaller things to attend to. I take it back. Work permit delays postpone my conflicting house renovation work, so I will likely be able to make it to LPC finally. :) Thanks, Mathieu -- Mathieu Desnoyers EfficiOS Inc. 
http://www.efficios.com From rostedt at goodmis.org Thu Jul 27 15:57:20 2017 From: rostedt at goodmis.org (Steven Rostedt) Date: Thu, 27 Jul 2017 11:57:20 -0400 Subject: [Ksummit-discuss] [TECH TOPIC] Pulling away from the tracing ABI quicksands In-Reply-To: <292206664.28374.1501166145927.JavaMail.zimbra@efficios.com> References: <20170629195537.534445e7@gandalf.local.home> <20170629212750.5c3542ee@gandalf.local.home> <20170629221245.489760b1@gandalf.local.home> <20170630025852.xjoif3aai6rny5a2@ast-mbp> <20170629230251.02f380cb@gandalf.local.home> <20170629232016.4cde203e@gandalf.local.home> <292206664.28374.1501166145927.JavaMail.zimbra@efficios.com> Message-ID: <20170727115720.521f06aa@vmware.local.home> On Thu, 27 Jul 2017 14:35:45 +0000 (UTC) Mathieu Desnoyers wrote: > > Also note, Mathieu has stated he wont be attending Plumbers, and I'm > > not sure Peter will be either as he has smaller things to attend to. > > I take it back. Work permit delays postpone my conflicting house > renovation work, so I will likely be able to make it to LPC finally. :) Great! I see you updated the Wiki that you are attending as well. http://wiki.linuxplumbersconf.org/2017:tracing Thanks, looking forward in seeing you there. -- Steve From ebiederm at xmission.com Mon Jul 31 16:54:45 2017 From: ebiederm at xmission.com (Eric W. 
Biederman) Date: Mon, 31 Jul 2017 11:54:45 -0500 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170705130200.7c653f61@gandalf.local.home> (Steven Rostedt's message of "Wed, 5 Jul 2017 13:02:00 -0400") References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705112707.54d7f345@gandalf.local.home> <20170705130200.7c653f61@gandalf.local.home> Message-ID: <87zibkzgve.fsf@xmission.com> Steven Rostedt writes: > On Wed, 5 Jul 2017 09:48:31 -0700 > Guenter Roeck wrote: > >> On 07/05/2017 08:27 AM, Steven Rostedt wrote: >> > On Wed, 5 Jul 2017 08:16:33 -0700 >> > Guenter Roeck wrote: >> [ ... ] >> >> >> >> If we start shaming people for not providing unit tests, all we'll accomplish is >> >> that people will stop providing bug fixes. >> > >> > I need to be clearer on this. What I meant was, if there's a bug >> > where someone has a test that easily reproduces the bug, then if >> > there's not a test added to selftests for said bug, then we should >> > shame those into doing so. >> > >> >> I don't think that public shaming of kernel developers is going to work >> any better than public shaming of children or teenagers. >> >> Maybe a friendlier approach would be more useful ? > > I'm a friendly shamer ;-) > >> >> If a test to reproduce a problem exists, it might be more beneficial to suggest >> to the patch submitter that it would be great if that test would be submitted >> as unit test instead of shaming that person for not doing so. Acknowledging and >> praising kselftest submissions might help more than shaming for non-submissions. >> >> > A bug that is found by inspection or hard to reproduce test cases are >> > not applicable, as they don't have tests that can show a regression. 
>> > >> >> My concern would be that once the shaming starts, it won't stop. > > I think this is a communication issue. My word for "shaming" was to > call out a developer for not submitting a test. It wasn't about making > fun of them, or anything like that. I was only making a point > about how to teach people that they need to be more aware of the > testing infrastructure. Not about actually demeaning people. > > Lets take a hypothetical sample. Say someone posted a bug report with > an associated reproducer for it. The developer then runs the reproducer > sees the bug, makes a fix and sends it to Linus and stable. Now the > developer forgets this and continues on their merry way. Along comes > someone like myself and sees a reproducing test case for a bug, but > sees no test added to kselftests. I would send an email along the lines > of "Hi, I noticed that there was a reproducer for this bug you fixed. > How come there was no test added to the kselftests to make sure it > doesn't appear again?" There, I "shamed" them ;-) I just want to point out that kselftests are hard to build and run. As I was looking at another issue I found a bug in one of the tests. It had defined a constant wrong. I have a patch. It took me a week of poking at the kselftest code and trying one thing or another (between working on other things) before I could figure out which combination of things would let the test build and run. Until kselftests get easier to run I don't think they are something we want to push to hard. 
Eric From rostedt at goodmis.org Mon Jul 31 20:11:23 2017 From: rostedt at goodmis.org (Steven Rostedt) Date: Mon, 31 Jul 2017 16:11:23 -0400 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <87zibkzgve.fsf@xmission.com> References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705112707.54d7f345@gandalf.local.home> <20170705130200.7c653f61@gandalf.local.home> <87zibkzgve.fsf@xmission.com> Message-ID: <20170731161123.4d1e80ac@gandalf.local.home> On Mon, 31 Jul 2017 11:54:45 -0500 ebiederm at xmission.com (Eric W. Biederman) wrote: > Until kselftests get easier to run I don't think they are something we > want to push to hard. Then perhaps we should push making them easier to run. -- Steve From ebiederm at xmission.com Mon Jul 31 20:12:46 2017 From: ebiederm at xmission.com (Eric W. Biederman) Date: Mon, 31 Jul 2017 15:12:46 -0500 Subject: [Ksummit-discuss] [MAINTAINERS SUMMIT] & [TECH TOPIC] Improve regression tracking In-Reply-To: <20170731161123.4d1e80ac@gandalf.local.home> (Steven Rostedt's message of "Mon, 31 Jul 2017 16:11:23 -0400") References: <576cea07-770a-4864-c3f5-0832ff211e94@leemhuis.info> <20170703123025.7479702e@gandalf.local.home> <20170705084528.67499f8c@gandalf.local.home> <4080ecc7-1aa8-2940-f230-1b79d656cdb4@redhat.com> <20170705092757.63dc2328@gandalf.local.home> <20170705140607.GA30187@kroah.com> <20170705112707.54d7f345@gandalf.local.home> <20170705130200.7c653f61@gandalf.local.home> <87zibkzgve.fsf@xmission.com> <20170731161123.4d1e80ac@gandalf.local.home> Message-ID: <87o9s0z7pd.fsf@xmission.com> Steven Rostedt writes: > On Mon, 31 Jul 2017 11:54:45 -0500 > ebiederm at xmission.com (Eric W. 
Biederman) wrote: > >> Until kselftests get easier to run I don't think they are something we >> want to push to hard. > > Then perhaps we should push making them easier to run. Please. Eric