I use tuned for my Debian and Ubuntu VPSs that run real-time apps, and it seems to work well. Simpler than toggling kernel parameters (sysctl settings, a.k.a. kernel tunables) by hand.
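For anyone curious, the whole setup is roughly this (profile names can differ between tuned versions; latency-performance is the one I use):

    # see what's available, switch profile, confirm
    tuned-adm list
    tuned-adm profile latency-performance
    tuned-adm active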
I think it's worth noting briefly that almost everything discussed in TFA concerns bandwidth (network, disk, other I/O, CPU) rather than latency. That's understandable, because for a lot of people performance is about bandwidth. But there are some of us for whom bandwidth definitely takes a back seat, and you need a different set of tools for tuning latency in Linux.
That's a pretty reasonable point. It's much easier to increase bandwidth than to reduce latency too[1], so caring about latency can often be the important part.
0. Another thing people may want to optimise for is performance per watt, but I won’t say much more about it.
1. There are cases where bandwidth optimisations are latency optimisations, eg if you can fit more of your processes onto one box, you are reducing the average distance between the processes and whatever they talk to and hence average latency
2. A very obvious thing to do when optimising for latency is increase bandwidth enough that the bandwidth doesn’t throttle you
3. I feel like mostly if you are aggressively optimising latency, there isn't much Linux tuning to do. Maybe I'm wrong – I don't really know much about this – but I think it's mostly pinning to a core, running tickless, doing user-space networking (a rough sketch of the pinning/tickless part below), and then hardwarey things like tuning page size, SMT, power-saving settings, and other things like choice of hardware.
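To make the pinning/tickless part concrete, it usually looks something like this (a rough sketch, not a recipe; my_realtime_app is obviously a placeholder):

    # kernel command line (e.g. in GRUB_CMDLINE_LINUX): keep cores 2-5 away
    # from the general scheduler, the timer tick, and RCU callbacks
    isolcpus=2-5 nohz_full=2-5 rcu_nocbs=2-5

    # pin the latency-critical process to an isolated core,
    # optionally give it a real-time (SCHED_FIFO) priority
    taskset -c 2 ./my_realtime_app
    chrt -f -p 80 "$(pidof my_realtime_app)"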
Not the person you asked, but generally you might want to look at "frame-based" profilers. These are typically used in video games, but the concept is general, and can apply to other applications. The "frame" could also be something like a request or transaction being processed. I like Tracy[1], myself.
Another latency metric that you'll see, often w/respect to web apps and microservices is "P99" and similar. This is the amount of time in which 99% of requests get their response. For a higher percentile, you get a better idea of worst-case performance.
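If you have raw per-request latencies lying around (say one number per line in latencies.txt, a made-up filename), a quick-and-dirty nearest-rank P99 is just:

    sort -n latencies.txt | awk '{v[NR]=$1} END {print "p99:", v[int(NR*0.99)]}'

Anything real should compute this from a histogram rather than sorting everything, but it's fine for eyeballing a log.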
With the caution that if your goal is to reduce latency for the realtime properties of your system, chances are you can turn any number of different knobs on your out-of-the-box Linux distribution and still not end up with a satisfactory system.
As of Linux 6.5, the scheduler understands that when one SMT "core" is busy, it might not be the best idea to schedule something on the other "core", since it's really just a single core with a very low-cost context switch. This makes certain very-parallel things noticeably snappier for me, and I can see it on the CPU usage graphs.
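If you want to see which logical CPUs are actually SMT siblings on your own machine (handy for sanity-checking what the scheduler is doing), either of these works:

    lscpu --extended    # the CORE column shows which logical CPUs share a physical core
    cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list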
If you are of a mind to change a tuneable parameter yet cannot tell me why this tuneable will have the desired effect, or why it is so-set in the first place, then I will not allow you to change it (in prod).
Even in prod (sometimes) there is no other way to "tell me if it will have the desired effect". Changing something to see what happens, even in production, is not automatically wrong. It's not automatically right either. It's almost like there is no thoughtless simple rule that is always right.
Nicely done Brendan! Thank you. Knowing Brendan's work with eBPF, I take this as a way to more easily monitor and assess performance across different types of workloads. Tweaking/tuning comes with trade-offs, and I usually end up optimizing one thing to the detriment of others.
Side note: I've found btop a super useful replacement for glances for an all-in-one TUI view of system performance and load. I wonder how much those dev(s) are leveraging this, and whether anyone out there is motivated to build better TUI monitoring tools.
Every server I go on, first thing is, start up tmux, dedicate one window to btop.
For me, "tuning" linux for performance = disabling spectre/meltdown mitigations (in this case compute nodes are running in a VPC with no internet access, so seems pretty low risk)
Depends on what CPU you are running; on Zen 4 disabling the mitigations isn't supported and has caused bugs/crashes. I think they did fix that particular crash, but I'd still not recommend it. New CPUs from both AMD and Intel are designed to be run with at least the default mitigations on.
Bookmarked! This will be useful to me soon for something I'm working on.
I haven't read all the slides yet, but one thing I was wondering was whether you ever found any significant performance increases from kernel build options. In my Gentoo days, when I would play around with build flags, I would change the kernel Makefile to use -O3 and apply a patch for -march=native. In hindsight, looking at some Phoronix benchmarks, it appears this is actually harmful for a number of workloads. Curious if you ever found any cases otherwise.
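For what it's worth, the less invasive variant I switched to later was passing flags through KCFLAGS instead of patching the Makefile (the -O3 part still needed the Makefile/config change back then, so this only covers the -march=native half):

    # append extra compiler flags to the kernel build without editing the Makefile
    make KCFLAGS="-march=native" -j"$(nproc)"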
Great site! I kind of have a predisposition to summarizing Linux performance, be it tuning or monitoring, so, taking a deep breath…
This is such a deep subject, with a long and varied list of observability tools. At minimum, make sure you know uptime, dmesg, and iostat deeply. These are your friends, giving you a glimpse into various system aspects like load, memory, CPU, and more, enough for a diagnostic overview of system health. This is what I call the "let me take a look at it" checklist, 1st page of 100!
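Concretely, page one of that checklist looks something like this (flag spellings may vary a little across sysstat versions):

    uptime                # load averages: is the box even busy?
    dmesg -T | tail -50   # recent kernel messages: OOM kills, I/O errors, NIC resets
    iostat -xz 1          # per-device utilization, await, queue sizes, once per second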
When it comes to methodologies for performance analysis, I recommend careful benchmarking to holistically evaluate system behavior and workload characteristics, with before-and-after scenarios. Make smaller changes first, then gradually compound the ones you think will provide benefits. Remember, labs and production never behave the same.
This is where it gets tricky: CPU profiling with tools like "perf" and visual aids like flame graphs enables targeted analysis of CPU activity, along with tracking hardware events to optimize computational efficiency. You need to know more than "it's the app, man, it was fine until the latest release from development".
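The usual flame graph recipe, more or less straight from Brendan's own docs (assumes a local checkout of the FlameGraph scripts):

    # sample on-CPU stacks system-wide at 99 Hz for 30 seconds
    perf record -F 99 -a -g -- sleep 30
    perf script > out.perf
    ./stackcollapse-perf.pl out.perf > out.folded
    ./flamegraph.pl out.folded > flame.svg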
When you are the admin speaking to a developer, Linux tools like ftrace and BPF come into play, allowing detailed tracking of kernel function execution and system calls, which can be vital in troubleshooting and performance optimization. You can also be the developer verifying the admin's intuition… as the saying goes, trust but verify.
When it’s your code, then you better know BPF! It not only facilitates efficient in-kernel tracing but also propels the development of advanced custom profiling tools through bcc and bpftrace, offering deeper insights into system performance.
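A couple of bpftrace one-liners of the kind I mean, lifted from the standard tutorial (needs root; the args syntax shifts a little between bpftrace versions):

    # which processes are opening which files?
    bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'

    # histogram of read() return sizes, printed on Ctrl-C
    bpftrace -e 'tracepoint:syscalls:sys_exit_read /args->ret > 0/ { @bytes = hist(args->ret); }'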
Last comment, it’s %$$% hard! Tuning means you need to navigate through adjusting a myriad of system components and kernel parameters, from CPUs and memory to network settings, aiming to optimize performance and reliability across various system workloads, else you can blame it on the network! :D
Really, you need a good attitude toward change management, as chasing code or kernel parameters can be a daunting task that overwhelms everyone at a moment when you might be time-constrained, and the pressure can lead to a higher degree of human error.
Current kernel and current distro tuning is almost always folly unless there's a specific issue you're trying to work around.
Trying to squeeze a little more juice out of something is bound to come at the detriment of something else, or worse, break something else in unexpected ways.
Basically, if the tunables aren't obvious in whatever default config you're using, the issue isn't in that config, it's that you're asking too much of your hardware and just need better hardware.
That's... yeah, that's completely false. I can think of dozens of things that are not right out of the box on any distro, on any hardware, in common practice. For example, suppose you roll out Ubuntu on an EC2 instance, say a c6i.16xlarge, a 32C/64T single-socket x86 server with a Nitro ENA. Where are the network RX interrupts delivered? Is RSS on and working? RPS, XPS? Interrupt coalescing? The distro can't make optimal choices for all use cases, but what they ship by default is a config that's not optimal for any use case. Literally nobody would consciously choose the defaults after thinking it over.
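If anyone wants to check their own instance, this is the sort of thing I mean (ens5 is the typical ENA interface name on recent Ubuntu AMIs; substitute yours):

    # where are the NIC's RX interrupts landing, and are they spread across cores?
    grep ens5 /proc/interrupts

    ethtool -l ens5   # how many RSS queues the driver exposes vs. supports
    ethtool -c ens5   # interrupt coalescing settings

    # RPS steering mask for the first RX queue (all zeros = RPS off)
    cat /sys/class/net/ens5/queues/rx-0/rps_cpus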
That's the FreeBSD/OpenBSD way of thinking (from what I know), and I love that there are Linux people who think in those terms too. Not every year is the Year of Linux on the Desktop! :D
> Current kernel and current distro tuning is almost always folly unless there's a specific issue you're trying to work around.
Of course there's no reason to tune if stock works fine. Plenty of people buy or rent a reasonable computer and it has more than enough capacity for their work with default tuning. That's fine.
But when you run out of CPU or memory or X, it's often a good idea to see if there's reasonable things you can do to get more out of the hardware you already have. Depending on what you're doing, there's often a lot of room for improvement.
For some networking tasks, doing proper alignment of threads and work with Receive Side Scaling or similar can make a tremendous improvement in capacity versus naive threads. In some environments, the bandwidth costs when you're using enough capacity to see that mean that machine costs of doing it well versus naively don't matter, so you may as well do it naively and spend your engineering time elsewhere. In other environments, getting the same work done with 10% of the nodes is valuable.
Also, in many cases, better hardware needs more tuning, rather than less. You don't need to spend a lot of time avoiding cross core communication on an 8-core single socket machine. But if you get a dual-socket, 128-core per socket machine and you're not careful about cross socket communication, you'll spend a lot of CPU on memory arbitration (which you'll have to know or learn how to look for)
Unfortunately for me, I have suffered from performance issues a few times and haven't found a good, deep resource fast enough. For example, a few times I needed to recompile FFmpeg or Unreal Engine, and I spent weeks on things that should be done in hours on my hardware.
Bookmarked this immediately.
Haven't read it in depth yet, but at first glance it looks good!
Often, if you tune up your settings for performance and interactivity, I/O will suffer, and vice versa. Your beast serving/copying tons of data concurrently might not be the best machine to play that Vulkan/GL 4.5 game on without frequent slowdowns.
I wish there were optimization scripts based on your use case (web server, database, etc.) that would turn off unnecessary services and tune settings appropriately.
The issue with performance is that junior sysadmins and developers start flipping knobs thinking that the defaults are somehow holding them back.
The truth is usually there are tradeoffs and the defaults fit a broad general case.
If you want throughput, there are tunables for that; if you want low latency, those are usually inversely correlated. Same for tuning for low data loss after failure, and so on.
You have to spend time learning the tradeoffs, which sysadmins used to do; now nobody has time, as those roles have been munged into one at many places.
That broad general case is much broader than the typical servers Linux actually gets used for. A simple example is file access times on a (database) server: they're largely unnecessary. It's even rare for a desktop user to actually look at them.
In the past I've been a sr dev (but a jr sysadmin) and was tasked with improving the performance of an upgraded database server. The problem turned out to be with NUMA on the larger server which a combination of reading and semi-random config fiddling of both Linux and MySQL parameters (plus a bit of BIOS/CMOS tweaking) brought up to expected levels.
There's often no better way than learning on the job, as there is so much to know that you can't simply learn it all up front for when you'll need it. What we can do is learn what there is to know and remember to look into it when it seems relevant. I mean, everyone who's well experienced now probably started out config-twiddling somewhere to get there.
There's a right and a wrong way to learn, though. The wrong way, which is the one I most often see, is: smash keys until it appears to work, w/o ever bothering to try to understand the deeper problem or why some particular combination of key smashing appears to work. This becomes cargo culting over time. Stack Overflow and ChatGPT make this oh so much worse. Often the change then gets committed with a useless message such as "make thing work", w/o any explanation of what was broken, what the change fixed, and why or how. It becomes instant tech debt.
The right way is to follow the scientific method. Collect data. Make a hypothesis with a plausible mechanism of action. Test the hypothesis. Arrive at a solution. Record how you arrived at your new found knowledge so that those who follow you understand why you made the change. The person who follows you is your future self as often as not.
To give context for those who don't know: relatime is a relatively (I'm a dinosaur) newer introduction into the Linux kernel; mounting partitions noatime used to be the only alternative to the default which was updating the atime on every access, thus fiddling with that setting used to be important. Not any more.
BTW I used to continue mounting everything noatime anyway, since having the atime field set at file creation and never updated afterwards was a way to get file creation time, and I found that more useful than the access time. That isn't necessary anymore either, since the introduction of an actual file creation time.
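For completeness, the old incantation and the newer alternative (the UUID and filename are placeholders):

    # /etc/fstab: kill atime updates entirely on a data volume
    UUID=xxxx-xxxx  /data  ext4  defaults,noatime  0  2

    # file creation (birth) time, where the filesystem records it
    stat --format='born: %w' somefile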
The truth is that often nobody thinks too much of defaults, unless they are horribly wrong. So there are two good things about defaults: they're probably not horribly wrong and they don't require any additional work.
Some defaults are just historical curiosities, some were configured 20 years ago and nobody has taken up the crusade to update them, and some might be bad, but changing them would break too much stuff in the wild.
Now I'm not suggesting that everyone should change everything. I almost never change defaults myself. But I just don't agree that defaults are good. They're probably not bad, and that's about it.
Not GP, but no, it doesn't. It appears to work, but then the side effects get much worse. Taking the time to understand and tweak the right settings (to be able to explain them, I mean) is much better in the long run.
Though I don't know much about it, I run Linux, and most of the documents I come across about tuning edge cases are now either avoided or boilerplated with a strong word of caution. For example: 10Gb Ethernet used to require a litany of sysctl.conf chicanery to even approach half the line speed. That's not the case anymore, and most of the old kernel 2.4 optimisations are either nonsensical in 2023 or actively worsen the performance of the interface.
> The truth is usually there are tradeoffs and the defaults fit a broad general case.
Part of the problem is that reasonable defaults for performance are a somewhat new phenomenon. It used to be that the defaults for Linux kernel settings, Apache, MySQL, etc. were terrible for production use.
So there's a lot of history of "you have to change them" burned into people's minds, documents, etc.
Yeah, they fit the general case of a MIPS R4400 with a 1mbps network adapter, a situation that nobody faces today. I think the most glaring example is `rmem_max` which is never sufficient to support a coast-to-coast 1gbps flow, and every individual Linux user in history has needed to independently discover this stupid sysctl.
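To put numbers on it: 1 Gbit/s at ~70 ms coast-to-coast RTT is a bandwidth-delay product of about 125 MB/s x 0.07 s ≈ 8.75 MB of in-flight data, which is why everyone ends up with some variant of this (values illustrative, not gospel):

    sysctl -w net.core.rmem_max=16777216
    sysctl -w net.core.wmem_max=16777216
    sysctl -w net.ipv4.tcp_rmem="4096 131072 16777216"
    sysctl -w net.ipv4.tcp_wmem="4096 16384 16777216"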
I wouldn't say that's a performance-tuning, seniority, or even smart-people problem. I've seen smart people blindly tuning knobs. Similarly, switching to an expensive O(1) algorithm for a tiny problem space. It beats me, but I think the causes are closer to laziness or trying to discredit something.
Gregg's book "Systems Performance" was a real game changer for me. Helped me understand how Linux internals and system performance inter-relate. I love how he's able to take these pretty esoteric concepts and flesh them out. Truly one of the Linux GOATs
He also wrote a lot about Solaris, but I won't hold that against him /s.
https://access.redhat.com/documentation/en-us/red_hat_enterp...