
Pro Audio Pop!

ayoungethan

Member
Jun 19, 2019
Hi folks,

I just bought a Darter Pro, which I use primarily for writing and pro audio work. I figure Pop!_Planet might be a great place to share what I've learned about optimizing Pop!_OS for pro audio (low latency) work, and perhaps contribute some viable suggestions to the OS roadmap to improve the "out of the box" experience for others.

I have been with Ubuntu Linux and various related derivatives since 2009, when my Dell XPS m1330 suffered from Windows Vista cannibalizing itself. I was already running OpenOffice, and I decided to take a deep dive into FLOSS at the OS level. I have been in over my head ever since, and remain very excited about the collaborative and performance potential that open source software and its community structure provide.

Please don't hesitate to reach out to me if you are interested in collaborating on improving the low latency desktop experience for Pop!_OS users.
 

ayoungethan

Member
Jun 19, 2019
Thanks! Would love to know what you have found...I have most of my stuff organized and will start posting it shortly.

UPDATE: still waiting on ability to edit the wiki
 
Last edited:

ayoungethan

Member
Jun 19, 2019
NOTE: My original intention was to post this to the Pop!_Planet wiki, but I have been unable to edit the wiki due to an unresolved technical issue: https://pop-planet.info/forums/threads/unable-to-edit-wiki.251/ It is posted in 2 parts due to character restrictions.

My current thought is to divide this article into four major sections:
1. concepts: basic ideas behind low latency audio performance optimization
2. setup process: concrete steps users can take to optimize their systems
3. System76 recommendations: based on aggregate user experience, research and market analysis, specific recommendations for System76 to implement to enhance low latency performance or make the setup process more accessible and streamlined.
4. additional resources (further reading)

The article currently lacks #2, the setup process. I have this written for my system (a System76 Darter Pro) and will post it shortly.

==Concepts==
This guide will cover how to make the most essential optimizations in Pop!_OS for low latency professional audio performance on a modern system, while still leaving the system in good shape to perform admirably with regards to battery life and throughput for more standard desktop tasks.

Many, if not most, of these concepts also apply to other operating systems. The performance optimizations will also apply to most GNU/Linux-based operating systems.

If you don't care about understanding the concepts underlying the performance optimization of a digital audio workstation (DAW), feel free to skip to the section with the specific performance tuning steps.

This tutorial was written by a lay person for lay people. There are gross generalizations that some experts may dispute. It is my hope that those generalizations are still practical and relevant to the task of having a basic understanding of latency and tuning a system for better latency.

===Computer performance: An introduction===

When we talk about desktop computer performance, we generally discuss two things:
Throughput: how much data or how many computations a computer can handle over a given timeframe
Energy efficiency: how much energy the computer uses to do its work

We typically see throughput vs energy efficiency as a tradeoff, and this is true for systems under constant heavy load (such as servers). However, for systems under highly-dynamic loads (heavy one moment, light the next), the "race to sleep" philosophy of optimizing power saving and performance comes into play, whereby a computer's ability to maximize its performance for discrete (dynamic) workloads actually saves power by allowing it to maximize the time spent in low power states (see https://en.wikichip.org/wiki/race-to-sleep).

However, in systems that handle digital audio (recording, playback and processing; often called Digital Audio Workstations, or DAWs), audio latency becomes an issue. We can define latency as the amount of time it takes for a signal or data to travel through a path, usually measured in milliseconds. The word "jitter" describes variations in latency over time. Professional audio systems not only need low latency (generally defined as sub-10ms latency paths), but also precise latency with negligible jitter. For instance, consider two computers with "avg 10ms latency," with 5 measurements taken over the course of a second (1000ms). Computer 1 might measure 9ms, 9ms, 10ms, 11ms, 11ms, which averages to 10ms, but with 2ms (+/- 1ms) of jitter. Computer 2 might measure latencies at 5ms, 5ms, 10ms, 10ms, 20ms, which also averages 10ms, but with 15ms of jitter. This jitter makes the signal highly unpredictable, which means it becomes very difficult or impossible to precisely synchronize, route and process various audio streams together, and can cause buffer overruns or underruns (collectively, xruns) and data loss. To my knowledge there is no way to prevent this data loss in the face of an xrun. The CPU delivers whatever data it was able to successfully process in the given latency timeframe, but the rest just gets dropped. While it may be possible to backload more data processing in the next buffer cycle, this will increase the likelihood of another xrun and may distort any data that depends on processing timestamps.

A buffer underrun occurs when an audio program fails to deliver information to the buffer quickly enough, subtracting from the amount of time the CPU has to do calculations and effectively "starving" the CPU of time and data to crunch within the expected latency window. A buffer overrun occurs when data backs up because the CPU has too much data to process in too short a time. Either way, the CPU fails to crunch the data within the expected timeframe. Jitter is a byproduct of an inconsistent amount of time to process chunks of data. It occurs when the system either dynamically increases or decreases its buffer size according to demand or, in the case of a larger buffer setting, when the CPU is "allowed" to finish a processing task early (for the sake of increasing overall throughput) so it can move on to other tasks (context switching) before it has to come back and process the next buffer. That's good for throughput, but bad for stable latency. This is also why pre-emption for audio processing is so important: if a CPU has excess processing capacity relative to the workload, it still needs to feed, process and route buffer data on a strict, consistent schedule. If a CPU has limited processing capacity, it must strictly deprioritize other tasks as needed to keep that schedule.

DAWs need consistent latency while performing recording, playback and processing tasks to keep everything in sync and ensure that no data gets lost. Stable latency is much more important for recording than low (but unstable) latency. It doesn't matter if the data takes a little longer to get recorded, as long as it all gets recorded consistently. Low latency is much more important for live response. Likewise, a high (but constant, low jitter) latency is not always a bad thing, for example, when playing back audio or mixing audio together, because it can all remain in sync. However, low latency is necessary for live, real-time control of digital instruments or plugins, multitracking via overdubs, and for monitoring live mixes in real time without perceptible lag or echo. Unless you are doing one of these three tasks, you probably don't need a low latency setup.

Thus, for pro audio and other related multimedia applications, we have three performance variables to optimize and balance: throughput, energy efficiency, and latency (average and jitter, both) for three digital audio tasks: recording, processing and playback, via three processing tasks: feeding the buffer, crunching the numbers, and routing the results to the appropriate destination. These tasks are not mutually exclusive and can occur individually or in any combination, but have different performance requirements. Processing alone depends on throughput. Recording depends on low jitter data synchronization. But when you combine playback and recording with the processing of that data (e.g., via plugins, or virtual instruments), you need both a stable and a minimal latency. Low-latency monitoring of recording is important for overdubs, but latency only needs to be within reasonable ranges of delay that musicians are accustomed to experiencing acoustically.

Sound moves relatively slowly, about 340 meters per second (roughly 1,100 feet per second). That means it takes sound roughly 3 seconds to travel 1 km; likewise, it takes sound about 3 ms to travel 1 m. Musicians are used to playing with delays of up to 10 milliseconds (roughly the acoustic delay of standing a few meters apart), which is a good general latency goal for anyone who needs to overdub (which requires low-latency monitoring of both playback and recording) or play or record a virtual instrument in real time. Any of these tasks take up precious time in the CPU, but processing tasks (digital-analog or analog-digital conversion and signal processing for effects or sound generation from virtual instruments) demand the most CPU time.
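The ~10 ms target above maps directly onto audio buffer settings: one buffer's worth of latency is simply frames divided by sample rate. A minimal arithmetic sketch (the buffer sizes and sample rates are illustrative):

```shell
# One-way buffer latency in milliseconds: frames / sample_rate * 1000.
# 128 frames at 48 kHz is well under the ~10 ms overdub target;
# 1024 frames at 44.1 kHz is not.
awk 'BEGIN { printf "%.1f ms\n", 128 * 1000 / 48000 }'    # ~2.7 ms
awk 'BEGIN { printf "%.1f ms\n", 1024 * 1000 / 44100 }'   # ~23.2 ms
```

Note that the real round-trip latency also includes the number of buffer periods and any converter or interface delay, so the measured figure will be somewhat higher than this back-of-the-envelope number.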

Outboard dedicated audio cards that can handle, at a minimum, the DAC and ADC, free up the CPU to focus on digital signal processing (DSP) as well as the more fundamental tasks of feeding data into and routing data from the buffer. An audio card that can also handle the DSP as well as DAC and ADC means that the computer CPU only needs to focus on consistently feeding data into and routing data from the buffer, giving it a greater amount of overhead. That is one reason why dedicated USB audio interfaces often allow for lower latencies and greater stability and reliability in audio processing. Another reason is that they are often designed from the ground up in the hardware for low latency performance. A computer system, likewise, can be designed in this way, for example, by selecting higher quality components that are designed to work consistently at low latencies with little jitter, and ensuring that they are connected together in the computer busses without shared hardware interrupts (which, as I understand it, is becoming an obsolete practice anyway).

===Optimization===
Relatively large optimizations in latency and jitter can be made with relatively small compromises in throughput and/or energy efficiency, allowing for well-rounded system performance in a variety of contexts. We just need to start considering latency (amount and jitter) alongside throughput and energy performance variables. Apart from making systems more versatile, optimizing performance along these three parameters also further improves multimedia performance, as professional audio work can often make significant demands of throughput performance in a low latency context.

For example, hyperthreading represents a classic tradeoff in throughput vs latency. According to Intel, "overall processing latency is significantly increased due to hyper-threading, with the negative effects becoming smaller as there are more simultaneous threads that can effectively use the additional hardware resource utilization provided by hyper-threading." Hyperthreading may increase context-switching, which is resource- and time-intensive. By disabling the "virtual cores", the system schedules work only across the physical cores available, which decreases context switching and latency, but also lowers potential throughput. See http://techblog.cloudperf.net/2016/07/measuring-intel-hyper-thread-overhead.html, which concludes "HT improved overall throughput by 25% but at a cost of higher latency" for each individual thread. More gets done, but each individual thread takes longer to complete: 8 threads in 1.6 seconds (HT) vs 8 threads in 2 seconds (non-HT), but 1.6 seconds per thread (HT) vs 1 second per thread (non-HT). Low latency currently depends on single thread throughput and response time rather than overall multi-threaded throughput, so there is a trade-off between responsiveness and throughput performance when dealing with highly time-sensitive applications, such as audio or multimedia throughput and synchronization.
CAVEAT: Hyperthreading technology also exposes computers to side-channel threats, and the mitigations for those threats reduce performance. By disabling hyperthreading, users can also safely disable the mitigations for side-channel timing attacks that exploit hyperthreading (https://threatpost.com/intel-zombieload-side-channel-attack-10-takeaways/144771/). In addition, disabling HT may reduce heat and power consumption. While certain workloads (such as transcoding, compiling, etc.) will take longer to complete, discrete tasks will complete more quickly, with less overhead. This absolutely applies to low latency throughput performance. For example, heat-limited CPUs (such as in laptops) will be able to maintain higher clock speeds for longer, which may translate into real-world low latency DSP performance gains, with higher throughput at lower latency.
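For experimentation before committing to a boot parameter, recent kernels expose a runtime SMT switch in sysfs. A sketch, assuming a kernel new enough to provide /sys/devices/system/cpu/smt (roughly 4.19 onward; verify the path exists on your system):

```shell
# Check whether SMT (hyperthreading) is currently enabled.
cat /sys/devices/system/cpu/smt/control    # prints "on", "off", or "notsupported"

# Disable SMT for the running session only (root required);
# this does NOT persist across reboots.
echo off | sudo tee /sys/devices/system/cpu/smt/control

# Verify: "Thread(s) per core" should now read 1.
lscpu | grep -i 'thread(s) per core'
```

This lets you A/B test DSP load and xrun behavior with and without hyperthreading before making the change permanent.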

The ELK Operating System (https://www.mindmusiclabs.com/) represents an extreme version of latency optimization for dedicated use in embedded hardware. In addition to optimizations such as those in this guide, ELK OS strips away many elements of a desktop operating system in order to achieve very low overheads, extremely low latencies and high priority of audio threads. This optimization also means greater throughput potential, as the OS does not "get in the way" of audio processing much. This is fantastic for dedicated, embedded hardware systems: it allows for long-service hardware, for example, that can be upgraded or modified via internet or other data connection. However, the sacrifices in the OS design also make it inappropriate for use in multi-tasking, e.g. desktop or laptop systems (see FAQ at https://www.mindmusiclabs.com/#collapse7).

===User Psychology===
Mac OS engineers learned early in the design of the Mac OS X audio system (around the year 2000) that humans are very sensitive to missed samples, because we are very sensitive (even hard-wired) to rhythm and rhythmic flow. You've all experienced it: The music plays, and then it stops. Or stutters. Or pops. It feels emotionally jarring. The same with laggy audio that runs smoothly, but out of sync with other audio or video. It feels confusing and distracting. It creates a significantly-negative emotional experience. Stable, reliable low latency setups are meant to prevent such occurrences at the source, both in the product (the audio itself) and in the process of producing that audio. Thus, there are three areas where glitch-free audio is really important:
1. the production process (so that recording and monitoring are glitch-free)
2. the product itself (so the finished audio is glitch-free)
3. playback

Glitch-free = no pauses, skips, drop-outs, pops, loss of synchronization or other unpleasant artifacts in the file, its creation process or its playback.

The first two are the concern of pro audio production. The last is the concern of everyone. In the Linux/Windows world, it has typically been solved with large and/or multiple buffers, resulting in high latencies inappropriate for pro audio work. This has created some fragmentation in computing markets. Apple historically capitalized on this fragmentation by prioritizing professional audio-visual production, which appears to monetize disproportionately. Rather than focusing on growing the biggest user base (Windows/Linux), Apple carefully chose to occupy and dominate a very valuable but relatively marginal niche, and grew from there to a dominant share of the computing market based on a market perception of "quality" user experience.

On the flip side, humans are relatively insensitive to small absolute differences in throughput, even if the differences are relatively significant. For example, it might sound good that a program loads or a task finishes "5 times faster." But 5 times faster than what? If the slower task completion time is 1 second, then the faster completion time is .2 seconds (200ms). After a certain threshold, such a difference in performance doesn't leave a significantly negative impression on a user in most cases. Those differences have significance only in specialized niches that benefit heavily from improved throughput. But many of the tradeoffs in performance tuning for latency are much less significant, along the lines of a difference of, say, a file encoding process taking 20 seconds instead of 18 seconds. An internet benchmark concerned only with throughput can obsess over and inflate the importance of this difference and turn it into a difference of socially-constructed importance, but the actual psychological impact on the user is minimal. Lastly, even in cases where the difference is not minimal, it is still not very noticeable to the end user. The user doesn't sit around twiddling their thumbs waiting for an intensive computing operation to finish. They walk away, do something else, and come back. Or they stay on the computer and multitask with something else while they wait. In this case, they want the computer to remain responsive and reliable. They want to have a pleasant, glitch-free experience while working and waiting for the other task to finish, and they want the other task to finish reliably, without errors or glitches.

At the heart of this, we tend to place a psychological priority on reliability, and a large part of our experience of reliability involves latency. A system that crashes, stutters or craps out unpredictably breaks our trust. And no amount of throughput performance can regain or re-establish broken trust. Imagine that you have a mechanical (powered) hammer capable of hammering several times faster than a manual hammer, and you are building a stick-framed house with nails. Now imagine yourself hammering away at those nails with that new gadget. Now imagine that the hammer head falls off at seemingly-random points in the hammering process. Sometimes it just falls off, sometimes it goes flying. Either way, it drastically slows your progress, because it interrupts your workflow and concentration, or leads you on a wild goose chase to fix the problem, and can even permanently mar a project with mistakes (mis-hit wood and nails), creating more work on the back-end. After a bit, you stop trusting the hammer. You feel uneasy and distracted around it, and your ability and willingness to use it effectively decreases. Now, imagine if you could turn down the speed of the hammer just a little bit, making the hammer much more reliable and preventing the problems and trust issues that arise. Even though it is "slower," you still finish the house faster than if you tried to keep hammering at top speed. Unfortunately, the most popular internet benchmarks do not take into account these factors of such immense real-world importance to end users, skewing and misdirecting computer hardware and operating system design and performance tuning. Apple succeeds in the marketplace in spite of regularly losing these so-called "performance tests" to similarly-spec'd Windows or Linux hardware.
 
Last edited:

ayoungethan

Member
Jun 19, 2019
===Low latency throughput===
We can't talk about latency or throughput without talking about their combined role in professional audio. A computer's throughput determines how much signal routing and processing it can do before overloading, which leads to audio glitches or dropouts (i.e., discarded audio signals that never made it from source to sink). A DAW will be limited in the number of audio streams it can record, process, route and mix at a given latency based on its digital signal processing (DSP) capacity. A DAW's DSP performance is heavily dependent on its CPU frequency: higher frequencies mean faster processing and more DSP capacity.

Traditionally, CPU performance scaling (sleep states and frequency modulation) allowed CPU performance to change dynamically based on demand. However, CPU performance scaling currently has three limitations that make it unsuitable to dynamically manage CPU performance for low latency contexts:
1. It is relatively insensitive to a processor's DSP load
2. It adds overhead (meaning it takes additional time, energy and CPU cycles to change between faster and slower frequencies, which lowers overall throughput capacity and can add latency and jitter)
3. It occurs much too slowly (e.g., in 10-30ms) to be of use in a context demanding throughput performance at low (<10ms) latencies.

For these reasons, by the time a CPU scales to a faster frequency based on demand, a DSP overload (and thus audio glitches such as pops, crackles, or dropouts) has likely already occurred. Said another way, the CPU scales to past, rather than current or future, demand, and fails to actually meet that demand. By extension, we can conclude that the reliable throughput performance of a DAW comes from its lowest sustainable operating frequency, not the fastest theoretical frequency that the CPU might ramp up to based on a past demand. For example, if a CPU is set to a baseline frequency of 1200 MHz but can scale up to 2400 MHz "on demand," the more accurate measure of low latency throughput is the 1200 MHz number, assuming that the computer can sustain that frequency indefinitely under load. CPU cooling matters: if a CPU begins to overheat, it will throttle its speed down. This is a common problem in high-spec'd "ultrabooks," which have impressive hardware capacities on paper and in transient load response benchmarks, but often struggle to sustain high throughput due to poor cooling. Cooling depends on the ability to remove heat, which depends on some combination of power (fan, water pump) and circulation (space). Ultrabooks often compromise cooling for light weight, small size and battery life. This is why desktops and larger laptops often perform better as DAWs than similarly-spec'd ultrabooks: they can sustain higher workloads or minimum frequencies for longer.

We can't currently depend on the "race to sleep" philosophy to provide both power savings and throughput performance within the minute tolerances of a system we are also asking to reliably handle (process and distribute) audio and related data streams at low latencies. Until CPU frequency scaling can occur in microseconds instead of milliseconds, and scaling can occur based on DSP (rather than overall CPU) load, the only way to have both high throughput and reliable low latency is to increase a processor's baseline frequency, which decreases its energy efficiency. This is not needed in all contexts -- only when relying heavily on DSP, e.g., when running many plugins or high quality digital instruments at high polyphony and high sample rates in real time.
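Raising the baseline frequency as described above can be done from userspace with the cpupower utility (packaged as linux-tools on Ubuntu-based systems). A sketch, where the 2400 MHz value is illustrative; pick a frequency your cooling can actually sustain:

```shell
# Inspect the current frequency policy, governor and hardware limits.
cpupower frequency-info

# Raise the minimum frequency toward a sustainable baseline
# (2400 MHz here is an example value, not a recommendation).
sudo cpupower frequency-set --min 2400MHz

# Or simply select the "performance" governor for a recording session,
# and switch back to the default afterwards to recover battery life.
sudo cpupower frequency-set -g performance
```

These settings do not persist across reboots, which is arguably a feature: you can run at a pinned frequency only while tracking or mixing, and let the default scaling behavior return for everyday desktop use.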

Unfortunately, a user must either anticipate the amount of DSP capacity needed, and set their lowest CPU frequency accordingly (all the way up to maximum frequency), or leave the baseline frequency "as is" and stay within the limits of the current baseline frequency to avoid audio glitches by reducing the CPU load for the session (reduce the number of plugins, audio streams or related quality settings). See https://github.com/falkTX/Cadence/issues/250
Caveat: Disabling hyperthreading may make CPU scaling more relevant. The reason is that DSP is done by the CPU's floating point unit (FPU), and each core has one FPU. Hyperthreading can cause two threads to share a single FPU, and this sharing can cause a mild resource conflict and context switching in the FPU between the two threads. Meanwhile, the CPU may report that it is under-utilized even though the FPU is utilized to its maximum potential, causing overloads and dropouts (stealth spikes in FPU latency). Disabling hyperthreading reduces the CPU to one FPU per core (vs one FPU per two [virtual] cores). Since a single core without hyperthreading doesn't need to share FPU resources, and that context switching won't occur, DSP utilization may be reported more accurately as overall CPU utilization, causing CPU scaling to respond more proactively to DSP load. But it may still occur too slowly to be of real-time use.

Some operations require only low latency and minimal CPU processing capacity, such as recording raw streams of high quality audio. The CPU has to work very little in this situation because it is mostly directing and distributing data streams, rather than processing (changing) them in real time. In such a situation, a relatively slow (energy efficient) CPU speed will suffice. Likewise, high quality digital signal processing of even a few data streams can easily maximize the use of a fast CPU operating at peak frequencies. In such situations, a DSP limit means a tradeoff between either (high) audio quality or (glitch-free) audio reliability, and benefits heavily from maximized DSP capacity.

===Realtime Priorities===
Several system configuration steps contribute immensely to low latency performance, regardless of CPU performance or settings.

Low-latency kernel configuration: The kernel is the core of the operating system. It is one of the most fundamental software interfaces with the hardware. GNU/Linux (including Pop!_OS), Windows (NT) and Mac (Mach) all use kernels. Pop!_OS is just one of many GNU/Linux operating systems. The kernel defines some of the most fundamental performance characteristics and focuses of the operating system.

Windows and macOS each use a single kernel. macOS is already heavily optimized to prioritize glitch-free audio, in large part because it was designed around that niche market as Apple's last remaining user base in 1998, giving the userland audio library team priority in setting the OS development agenda. The Windows NT kernel and the Linux kernel had no such focus. However, Linux kernel development occurs relatively rapidly. Because it is open source, it gets regularly forked from the "mainline" version and modified for specific operating conditions and parameters. Sometimes those modifications are merged back into the mainline when they are perceived as generally beneficial. Many operating systems based on the Linux kernel (such as Pop!_OS) maintain more than one kernel variant at a time, for use in different circumstances. In the specific case of Pop!_OS, we have both the linux-generic and linux-lowlatency kernels to consider.

The current linux-lowlatency kernel line is thought to contain the most important optimizations for latency-sensitive computing, still making it appropriate for most general computing circumstances. The differences between the -lowlatency and -generic kernel lines are few but key [insert kernel_diff.txt].

Specifically, the -lowlatency kernel allows full pre-emption of "lower priority" threads or processes by "higher priority" threads or processes. In lay terms, threads with a high priority attached to them can "cut in line" to ensure that they get executed on demand. But this cutting in line also carries administrative overhead, which lowers overall throughput (the number of threads that can get through the queue in a given timeframe). In the -generic kernel, pre-emption is merely "voluntary," meaning that "the running process declares points where it can be preempted (where otherwise it would run until completion)" (https://stackoverflow.com/a/5741721/11705382).

Second, the -lowlatency kernel operates at a higher tick rate: 1000 Hz vs 250 Hz (ticks per second). That means the -lowlatency kernel has four times as many opportunities per second to interrupt its current queue and allow higher priority processes to cut in line. This again makes the system more responsive to those higher priorities, but at the cost of throughput and energy efficiency. The "NO_HZ" parameter means that a CPU can stop its tick when idle, which allows the CPU to rest in a lower power state. Both kernel lines have this feature enabled by default.
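You can verify these differences yourself on any installed kernel, since Ubuntu-based systems such as Pop!_OS ship the build configuration in /boot. A sketch (exact CONFIG_ values may vary by kernel release):

```shell
# Show the tick rate and preemption model of the running kernel.
grep -E 'CONFIG_HZ=|CONFIG_PREEMPT' /boot/config-$(uname -r)
# On a -lowlatency kernel, expect CONFIG_HZ=1000 and CONFIG_PREEMPT=y;
# on -generic, expect CONFIG_HZ=250 and CONFIG_PREEMPT_VOLUNTARY=y.

# Check the tickless (NO_HZ) configuration as well.
grep 'CONFIG_NO_HZ' /boot/config-$(uname -r)
```

To compare the two kernel lines side by side, run the same grep against each installed config file (e.g., /boot/config-*-generic and /boot/config-*-lowlatency).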

In the near future, no_hz_full (adaptive ticks) may allow additional performance improvements (greater power saving and throughput with less jitter in low latency contexts).

====RTIRQ script====
By using a -lowlatency (or similarly optimized) kernel line instead of the -generic kernel line, we can then program the operating system to make use of those configurations to label certain threads with high priority, interruptable, "cut-in-line" privileges. In our case, we want to configure the computer to prioritize audio and related data threads and processes. We should note that this does not raise the priority of an entire audio-based software program. For example, the user interface and any non-audio data threads and processes should still operate at nominal priority, which means that the system will temporarily ignore them, if necessary to ensure that any audio-related processing gets completed on schedule. Some visual indicators may lag or even freeze to retain priority and glitch-free reliability of audio streams. This is not a malfunction, merely the computer maintaining appropriate priorities when its processing resources are under high demand. It makes sense: do you want a temporary and relatively inconsequential display glitch, or a permanent glitch in audio that might completely ruin a take or track?

There are many ways to accomplish the task of elevating privileges and priority of audio threads. RTIRQ (https://www.rncbc.org/drupal/node/1979) plus a pre-emptable kernel is probably the most accessible and widely-used strategy. The startup script (which also runs on resume from suspend) simply reorders all audio-related IRQ threads at high priority on the realtime scheduler. It does not work with the voluntary pre-emption of the -generic kernel.
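After installing the rtirq-init package, its behavior is controlled by a defaults file. The excerpt below is a sketch based on the common upstream defaults; check /etc/default/rtirq on your own system, as variable names and values may differ between versions:

```shell
# /etc/default/rtirq -- excerpt (common default values shown).
# Threaded IRQs whose names match this list are re-prioritized,
# highest priority first: "snd" covers most sound cards, "usb"
# covers USB audio interfaces.
RTIRQ_NAME_LIST="snd usb i8042"

# Highest realtime priority to assign.
RTIRQ_PRIO_HIGH=90

# Priority decrement applied to each successive entry in the list.
RTIRQ_PRIO_DECR=5
```

After editing, restart the service (e.g., sudo systemctl restart rtirq) and inspect the resulting IRQ thread priorities with "rtirq status" or by looking at the RTPRIO column of ps for irq/ threads.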

As a result of the above -lowlatency RTIRQ configuration, systems can often safely have background processes, such as WiFi, ethernet or bluetooth connections, active and running without fear that they will "fight" for priority with audio threads and processes. Audio will still come first, which might throttle and slow down the efficiency of the lower priority threads and processes, but still allow them to function. Without this step, users can notice severe audio performance degradation while running network services, for example, under voluntary pre-emption or without audio threads gaining high priority (just under system critical threads). Likewise, turning off or disabling networking services and other background processes can yield significant performance increases without needing to further tweak the setup. This is more of an ad hoc approach to low latency configuration.

====Userspace configuration====
The Cadence suite helps with some ad hoc configuration, as well as JACK server configuration. Both qjackctl and Cadence automatically install jack2 (aka jackdmp). When JACK is installed, it automatically modifies /etc/security/limits.d/ to allow audio threads to access preemptive realtime priorities with both the -generic and -lowlatency kernel lines.
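The limits file JACK installs typically looks like the following (the filename and exact values vary by distro version; these are common defaults, not Pop!_OS-verified):

```shell
# /etc/security/limits.d/audio.conf -- grants members of the audio group
# permission to request realtime scheduling and to lock memory.
@audio   -   rtprio    95
@audio   -   memlock   unlimited
```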

Both processes and threads can be given elevated privileges and priorities: https://stackoverflow.com/a/200543/11705382
In Linux, we can elevate the priority of audio threads via the audio group: https://wiki.ubuntu.com/Audio/TheAudioGroup
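A quick way to check, and if needed grant, that access for the current user; the group name assumes the standard Ubuntu/Pop!_OS audio group:

```shell
# Is the current user in the audio group?
id -nG | tr ' ' '\n' | grep -x audio

# What realtime priority may this shell request? (0 means none)
ulimit -r

# Add yourself to the group if missing (takes effect on next login):
sudo usermod -aG audio "$USER"
```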

You can set the realtime priority of the JACK sound server via its runtime configuration.
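As a sketch, the same settings can be passed on the command line when starting the server by hand; the flag values below are illustrative, not recommendations:

```shell
# Illustrative invocation (requires an ALSA device), where -R enables
# realtime scheduling and -P sets the server's realtime priority:
#   jackd -R -P 70 -d alsa -d hw:0 -r 48000 -p 128 -n 2
#
# The resulting one-way buffer latency is frames * periods / sample rate:
awk 'BEGIN { printf "%.0f us\n", 128 * 2 / 48000 * 1e6 }'
```

At 128 frames, 2 periods, and 48 kHz, that works out to roughly 5.3 ms of buffer latency per direction, before accounting for converter and driver overhead.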

==Performance Tuning Steps==
These are actual steps that users can take to optimize their standard Pop!_OS desktop for multimedia production.

Disable hyperthreading for the -lowlatency kernel by booting with either:
maxcpus=4 (set to your number of physical cores)
or nosmt
https://coreos.com/os/docs/latest/disabling-smt.html

"I highly recommend disabling hyperthreading if one wants work to be done in the order it's queued with minimal overhead." This applies to all time-sensitive processing tasks, such as low-latency DSP.
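On a running system, SMT can also be toggled without a BIOS visit, and on Pop!_OS's systemd-boot setup the parameter can be made persistent with kernelstub. Treat this as a sketch and check `man kernelstub` before using it:

```shell
# 1 = SMT currently on, 0 = off:
cat /sys/devices/system/cpu/smt/active

# Turn SMT off until the next boot (root required):
echo off | sudo tee /sys/devices/system/cpu/smt/control

# Make it persistent by adding the kernel parameter via kernelstub:
sudo kernelstub --add-options nosmt
```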
 

ayoungethan

Thank you! One of the Pop!_OS developers suggested pulling actionable items out into separate bug reports. That will be my next focus; any suggestions or help are appreciated. My first focus will be on suggesting two strategies to make the -lowlatency kernel line more accessible: either allow it as an alternative dependency for the system76-driver and other Pop!_OS-specific packages (which currently have the -generic kernel line as a hard dependency; uninstalling the -generic kernel also uninstalls many other core OS packages), and/or broaden the scope of kernelstub to gracefully handle more than one simultaneously installed kernel line. It currently works gracefully with only one kernel line; any additional kernel line (necessary because of the aforementioned hard dependency on -generic) creates an almost completely manual kernel-management workflow, which is highly inconvenient for anyone who updates software with any frequency. For a dedicated DAW this isn't much of an issue, as such computers are often highly specialized and sometimes even lack the internet connections that make security vulnerabilities a problem. But for mixed-use systems, as is common in today's musical context, it makes sense to maintain regular security updates. And that means a bit of hassle, and lots of opportunity for potentially catastrophic error, with each kernel update cycle, which occurs fairly frequently.
 

ayoungethan

Here are four potential/priority bug reports I have started to compile to submit to S76/Pop!_OS devs. Any comments are much appreciated.

1. Make linux-lowlatency or another lowlatency kernel (such as Liquorix) available to install without needing any command-line configuration or breaking the system.
Option a. Problem: Currently, kernelstub can only handle a single kernel line without becoming a mess. While it is technically possible to install a second kernel, doing so destroys the functionality of the boot menu, and the kernel that gets booted is whichever was updated last. Solution: Make kernelstub handle more than one kernel gracefully; the user can then choose the latest version of -generic or -lowlatency (or Liquorix, etc.) via a menu at boot.
Option b. Problem: Currently, S76 and Pop!_OS packages have hard dependencies on the -generic kernel line, making it impossible to uninstall the -generic kernel without breaking the system. Solution: Make the -lowlatency alternative a drop-in replacement for the -generic kernel, as an alternate dependency on the core Pop!_OS and S76 components that currently depend strictly on the -generic kernel.
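For reference, the -lowlatency line is already packaged in the Ubuntu repositories that Pop!_OS pulls from, so the install step itself is a one-liner; the caveat is the boot-management problem described above:

```shell
# Install the lowlatency kernel alongside -generic:
sudo apt install linux-lowlatency

# Confirm which kernel images are now installed:
dpkg --list 'linux-image-*' | grep '^ii'
```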

2. Make the Linux audio subsystem and latency (both average and jitter) a performance tuning priority.
Currently, performance is defined almost entirely by a reductionist focus on throughput. But audio latency tuning is critical to key multimedia user experiences, such as audio/video stream synchronization and network/controller responsiveness while gaming. A well-tuned system is not merely optimized for maximum throughput; it makes small tradeoffs in throughput for relatively important gains in latency. This involves tuning PulseAudio for better low-latency performance out of the box, but also giving audio interrupts permanently higher priority in kernel scheduling, as per the Mac OS design. Ref: Godfrey van der Linden, Mac OS X IOAudio presentation from 2010:
https://www.cse.unsw.edu.au/~cs9242/10/lectures/09-OSXAudiox4.pdf
https://www.cse.unsw.edu.au/~cs9242/10/lectures/09-OSXAudio.pdf

We really need to look into the extent to which these or analogous engineering decisions are feasible in the GNU/Linux paradigm. That is beyond my knowledge and capacity, but I am happy to coordinate with someone on this and learn as we go.

3. Simplify and optimize the S76 power/performance selections: Balanced is a great default for almost every situation. "Battery life" has no real-life use scenario that I am aware of; it mostly reduces performance and makes battery life worse, so the "battery life" setting could be removed. The final "high performance" setting should raise the minimum frequency of the CPU to its maximum sustainable frequency. Currently, the raw "performance" governor and the System76 "performance" governor accomplish different, overlapping tasks, with different HWP_Request QoS (min) frequency values.

Example with my processor (Intel i5-8265U) under the default "balance_performance" EPP (x86_energy_perf_policy) setting:
a. "Balanced" and "battery life" settings in the S76 governor set QoS minimums at 400 MHz, whereas
b. "High performance" sets minimums at 2000 MHz.
c. "Battery life" caps performance at 1600 MHz, whereas
d. "Balanced" and "high performance" have no hard cap (up to the maximum burst speed of 3900 MHz).

The system76 driver does not currently access or influence the EPP value, but it should. When set to "high performance", the S76-power driver should change EPP from its default "balance_performance" to raw "performance" mode. Even though there are two other modes (balance_power and power), they, like the S76-power "battery life" setting, don't necessarily accomplish their goal; instead they reduce both system performance and battery life by violating the "race to sleep" design. While raw "performance" mode also violates "race to sleep", it does so for a clear low-latency performance gain.
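For anyone wanting to experiment before such integration exists, the EPP hint is exposed per CPU through sysfs by the intel_pstate driver. A minimal sketch (root required; Intel HWP-capable systems only):

```shell
# Read the current hint for CPU 0:
cat /sys/devices/system/cpu/cpu0/cpufreq/energy_performance_preference

# List the hints this hardware accepts:
cat /sys/devices/system/cpu/cpu0/cpufreq/energy_performance_available_preferences

# Switch every CPU from balance_performance to performance:
echo performance | sudo tee \
  /sys/devices/system/cpu/cpu*/cpufreq/energy_performance_preference
```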

As per the attached table, there are three potentially-useful settings:
1. A "balanced" mode that adheres most closely to "race to sleep" and allows the widest frequency-scaling range possible (lowest min and highest max). This is what people will select for most non-multimedia applications. It is based on the "balance_performance" EPP and the S76-power "Balanced" profile, which is a good default. Then there are two multimedia-specific settings that both minimize jitter:
2. A "conservative" mode that locks the CPU to a single, lower min/max frequency. On my computer, setting EPP to "performance" and S76-power to "battery life" puts the min and max frequencies both at 1600 MHz, for example. But this would be much more meaningful to the user as a selectable rather than arbitrary frequency.
3. A "performance" mode that locks the CPU into a single, higher min/max frequency. This can occur by changing EPP from "balance_performance" to "performance" without needing to touch the S76-power setting.

Exactly like the "conservative" mode, "performance" mode would be much more meaningful to the user as a selectable rather than arbitrary frequency. In fact, #2 and #3 above can be set via the same process. By setting the min and max frequencies to the same value, you effectively give the CPU a "target frequency" to operate at, whatever is appropriate for the circumstances. For example, audio recording is not particularly CPU-intensive. If someone is using the computer to record on battery, they can set a low target frequency to gain more dependable performance with lower power consumption per unit of time, at the cost of throughput. In contrast, if someone's workload demands both latency and throughput performance, such as virtual instruments or realtime effects processing, they may want a higher target frequency at the cost of energy consumption per unit of time. Ultimately, the ability for the user to quickly set a stable "target frequency" is the most important factor, and that can be done with a simple slider in the System76 power interface. The low (conservative) and high (performance) endpoints of the slider are based on the range of user-selectable HWP_Request min/max frequencies the hardware exposes through the kernel.
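Pending such a slider, the same effect can be had from a shell. This sketch pins every CPU to a single target frequency via the standard cpufreq sysfs interface; the 1.6 GHz value is illustrative:

```shell
# Pin min = max to give the CPU a fixed "target frequency" (root required).
TARGET_KHZ=1600000   # 1.6 GHz, in kHz as cpufreq expects
for policy in /sys/devices/system/cpu/cpufreq/policy*; do
  echo "$TARGET_KHZ" | sudo tee "$policy/scaling_max_freq" > /dev/null
  echo "$TARGET_KHZ" | sudo tee "$policy/scaling_min_freq" > /dev/null
done
```

Writing max before min avoids a transient state where the requested minimum would exceed the current maximum and be rejected.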

The end result is that the user can switch back and forth between "balanced" and "low latency" modes, and while in the "low latency" mode they can select from a low (conservative) or high (performance) value, or anywhere in between, according to the demands of the tasks they are doing.

Some people might find a "min" and a "max" both set at the CPU's burst frequency alarming. But it is important to note that the CPU's internal power management will not let it overheat; it will simply throttle down to the highest thermally sustainable frequency, which depends on ambient temperature, thermal design, and the effectiveness of the CPU's cooling system.

NOTE: It is an open question to me whether disabling turbo boost provides additional low-latency performance by minimizing the overhead and jitter of all frequency scaling, including scaling to burst speeds. It couldn't hurt to have a "disable turbo boost" option.
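On Intel systems using the intel_pstate driver, turbo can be toggled at runtime for exactly this kind of experiment. A sketch (root required; systems on acpi-cpufreq expose /sys/devices/system/cpu/cpufreq/boost instead):

```shell
# 1 disables Turbo Boost, 0 re-enables it (intel_pstate driver):
echo 1 | sudo tee /sys/devices/system/cpu/intel_pstate/no_turbo

# Verify:
cat /sys/devices/system/cpu/intel_pstate/no_turbo
```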

4. Add and maintain the latest version of RTIRQ in the repositories, and make sure it gets installed as a dependency of, or a Recommends for, the lowlatency kernel.
 

Attachments


ayoungethan

Remove irqbalance from default desktop installs (perhaps keep it for server installs, though even that appears to be under heavy debate)
 

ayoungethan

I updated the above posts (mostly the "Computer Performance: An Introduction" section) and had to move the end sections here due to space limits:

==Software Options==

The Pop!_OS repositories already contain a lot of great multimedia production tools, by way of the Ubuntu repositories. However, many of those packages are outdated. This occurs in three ways: 1. A package was abandoned upstream but is still included in the repositories. 2. A package is still maintained upstream but is not up to date in the repositories. 3. A newer piece of software doesn't make it into the repositories for quite a while.

This is not a problem specific to Pop!_OS or Ubuntu, but one shared by the entire GNU/Linux world. Maintaining a repository against a system architecture and a distribution's packaging standard requires a lot of work; worse, it is heavily duplicated work. For example, Ubuntu and Fedora are large distributions with lots of software, but they maintain completely parallel repositories under different packaging standards. This means a program is often compiled from source and packaged several times, across several different GNU/Linux systems. That is wasted effort that could be spent elsewhere, such as providing support, bug fixes, features, and documentation for software that is not a core, integrated part of the OS. On top of this, the extra effort means that software slips through the cracks or even outright conflicts with core OS dependencies, and cannot be included or updated. However, some have hypothesized that this packaging variability also makes GNU/Linux less susceptible to security issues such as various forms of malware, since a Fedora-packaged program will not run in a Debian-packaged environment in some of the same ways that a Mac OS program will not run in a Windows environment. I don't know to what extent a unified software infrastructure would pose a real-world, practical threat to GNU/Linux security.

KXStudio repositories: This probably represents the best effort in the Debian world to provide an up-to-date multimedia software repository. Even still, it is maintained by one person, who also maintains an entire multimedia distribution (KXStudio), and suffers from the same challenges as other repositories.

Future: Flatpak

The move of desktop software to flatpak would drastically shrink the size and number of repositories needed. It would also shrink the amount of time and effort in maintaining those repositories, restricting it to what makes the OS unique. Any "add-on" software that could run on any OS doesn't necessarily need tight, strict integration into the core OS. That means software could be included and updated as it becomes available, without the overhead or challenges of aligning package dependencies, or compiling packages for different distributions. If this does not pose significant real-world security or performance concerns, it appears to be a great opportunity to solve many practical issues with software. The additional time and effort saved in software packaging and maintenance could be used for OS or software development and maintenance, improving the overall OS quality.

==System76 Recommendations==

See https://pop-planet.info/forums/threads/pro-audio-suggestions.250/ as they follow this discussion. There is ample reason to believe that multimedia-tuned computing products monetize disproportionately. Combined with dissatisfaction with Apple and advances in available software and hardware for GNU/Linux systems, it may make sense for System76 to invest some minimal resources so that it can promote itself as "multimedia workstation ready."

Hardware: see http://manual.ardour.org/setting-up-your-system/the-right-computer-system-for-digital-audio/

There is a lot of opportunity to optimize hardware based on component design, selection and driver development, especially as System76 gains greater control over its hardware selection and design.

Make it easier to disable hyperthreading. This would be especially beneficial if the configuration could be tied to the kernel (i.e., instead of a universal BIOS setting). This way, a user can easily switch at boot time between a system optimized for low latency performance and a system optimized for generic desktop use and race to sleep/maximum throughput.

Linux kernel: technically linux-lowlatency is available in the repositories. But kernelstub needs work to accommodate more than one kernel at once. ALTERNATIVE: allow users to swap out linux-generic for linux-lowlatency by making the System76 driver dependent on either package.

===Software===

Maintain up-to-date packages for core software: the Cadence suite, RTIRQ, JACK2, and perhaps some core DAW software such as Qtractor, Ardour* (see http://manual.ardour.org/setting-up-your-system/platform-specifics/ubuntu-linux/), Non-DAW, and the Calf plugins.

Transition to Flatpak to offload the burden of packaging and repository maintenance and ultimately give users greater choice in software options.

In the meantime, poach from up-to-date repositories such as KXStudio, and/or create a volunteer-driven community repository that allows volunteers to contribute to the maintenance of non-essential packages for Pop!_OS.

System76-power: https://github.com/falkTX/Cadence/issues/250

System76-power configurations do not make sense to me. "Battery Life" is merely a "conservative" governor that violates "race to sleep", so it often results in relatively poor battery life. Really only two configurations are needed:
1. Fully automatic performance regulation at the left (i.e., a combination of TLP and the powersave governor with full CPU frequency scaling, the same as the System76-power "balanced" setting with TLP), and
2. Defeat of certain power-saving features at the right, with a performance governor that raises the minimum frequency, optimized for low-latency throughput.

The third "battery life" option is really superfluous except in marginal situations where a laptop is under constant high load on battery power, such as from a runaway process. This is really just compensation for a poorly tuned system, or for misbehaving, buggy, or poorly optimized software.

General system tuning: Consider latency (especially audio latency!) and user psychology as important factors alongside throughput and energy efficiency! This general philosophy is very important for quality user experience and to maintain user trust in their computing systems.

https://access.redhat.com/sites/default/files/attachments/201501-perf-brief-low-latency-tuning-rhel7-v1.1.pdf (see in particular p. 8 about performance profiles: "Because tuning for throughput often at odds with tuning for latency, profiles have been split along those boundaries as well providing a “balanced” profile").
--
==Other Resources==

===Pop!_OS Resources===


Mattermost Pro Audio Channel

Telegram: Linux Audio

Pop! Planet


(included in the above post)


===Generic Resources===

http://www.jackaudio.org/ -- good general information about pro audio setup on Linux-based operating systems

linuxaudio.org


ardour user manual: http://manual.ardour.org/

specifically http://manual.ardour.org/recording/monitoring/latency-considerations/
 

ayoungethan

The benefits of disabling logical cores? Very workload-specific: if you cannot risk two executions occurring on the same core where one task is high priority and the other low or idle, the only way to guarantee CPU time gets distributed accurately is to disable logical cores. With logical cores enabled, CPU time will be distributed evenly between both executions regardless of priority.

Throughput will drop, but performance will be more predictable. That is the tradeoff you get when disabling hyperthreading, or the equivalent technology AMD uses on its Bulldozer cores and newer.
(It also improves security.)
 

ayoungethan

A description of the differences between kernels: https://askubuntu.com/a/1244714/672975

Liquorix is really compelling. I am learning more about it here: https://techpatterns.com/forums/forum-34.html (specific post will be linked when available)

Hello,

I've been researching performance tuning of Pop!_OS for lowlatency audio production, which you can somewhat follow here: https://pop-planet.info/forums/threads/pro-audio-pop.249

This is the first time I've seen anyone else discuss ALL THREE real-world performance parameters of energy efficiency, throughput AND latency (pick any two), and the compromises between them.

My understanding of Liquorix is that it is the first (and only?) kernel to do what Mac OS X set out to accomplish in its earliest days (https://www.cse.unsw.edu.au/~cs9242/10/lectures/09-OSXAudiox4.pdf) by making small sacrifices in throughput performance to create relatively significant gains in latency performance, without necessarily sacrificing energy efficiency. Is this correct?

I believe the -generic kernel line is slowly (and somewhat-begrudgingly) headed in this direction.

If this is all true, it sure is nice to see this sort of leadership that finally prioritizes user experience and psychology over the reductionist throughput benchmark wars that seem to be driving system design and optimization!

Are there any distros that use Liquorix by default? Any meaningful benchmarks or performance notes and comparisons compared to the -generic and -lowlatency lines? I've already seen https://www.phoronix.com/scan.php?page=article&item=linux414-lowlatency-liquorix&num=1 and it seems as irrelevant as most other benchmarks

I am making some recommendations to System76 and am wondering the extent to which Liquorix or a similarly-tuned kernel could replace linux-generic AND linux-lowlatency until mainstream kernel development and system performance tuning catches up to the philosophy that "latency matters, too!" and that small sacrifices in throughput result in big gains for MOST user experiences and workloads in MOST cases, as Mac engineers and users have enjoyed for 2 decades now.
 

ayoungethan

Bugs submitted:

Eliminate "battery life" mode

Add "target frequency" slider to "performance mode"

Kernelstub fails with more than one kernel line installed

Include Latest version of RTIRQ in repositories

These four bugs take care of most of the most difficult issues with configuring Pop!_OS, and also provide some benefits to general users in terms of prioritizing user experience over slight gains in throughput performance.
 

ayoungethan

A great article on latency: http://whirlwindusa.com/support/tech-articles/opening-pandoras-box/

Everyone heard the initial 20 ms delay as a very short echo or “doubled” sound.

Then, the delay was gradually reduced from 20 ms. The subjects were told to stop us when the delay seemed to disappear. Then this was repeated while the person spoke short, sharp syllables like “check, check!”

Every person tested seemed to think the echo disappeared somewhere between 10 ms to 15 ms. I personally found it to be a rather dramatic change too - as if someone had suddenly bypassed the delay unit.
In my guitar experiment, it seems that the delay isn't noticeable at all up to about 10 ms (again). It becomes slightly noticeable between 10 ms to 15 ms almost like it's not really an echo - just “something's there,” but I could still play in time. The delay started to get difficult to contend with somewhere around 15 ms to 20 ms, and above 20 ms I really struggled with timing.
Comb filtering is particularly problematic at all latencies when combining acoustic ambient sounds, due to phase cancellations. Manual adjustments to the system to minimize the comb-filtering effect are necessary in situ. But in general, the target latency is under 15 ms at most, and preferably under 10 ms.
 

ayoungethan

Mixbus got back to me with their idea of what an OS would need to do to provide a great low latency experience:
1. SMI - Since System76 selects the hardware, making sure that the motherboard does not have SMI issues would be great.

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_for_real_time/7/html/reference_guide/system_management_interrupts

They are a huge issue when trying to create high-performance audio systems, but there is no documentation or specs on these, so you have no idea whether they are going to be a problem until you purchase the model of hardware and test it. They can steal tons of CPU time when you consider the sample rate of audio.

Knowing beforehand that this is not going to be a problem would be awesome.


2. Minimize kernel latency - Use the rt-tests tools (e.g., cyclictest)

These are test tools that measure how well the kernel and underlying hardware are responding to realtime events. Having a system with very good results on these tests would be great.

http://people.redhat.com/williams/latency-howto/rt-latency-howto.txt


3. Proper system tuning
threadirqs enabled
Correct power-management handling (i.e., make sure CPU frequency scaling is off when doing audio work)
Proper interrupt management (rtirq or similar)

Good tuning information here.

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux_for_real_time/8/html/tuning_guide/index
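Of these, threadirqs is a plain kernel boot parameter; on Pop!_OS's systemd-boot setup it would be added with kernelstub. Treat this as a sketch and verify against `man kernelstub`:

```shell
# Add the parameter, then reboot:
sudo kernelstub --add-options threadirqs

# After reboot, confirm it took effect:
grep -o threadirqs /proc/cmdline
```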
I asked them specifically what range of latency test results (overall/average latency as well as jitter) they would consider optimal.

All-in-all this project is shaping up as follows:

The SMI hardware component piece is huge and fundamental, even essential. After that, on the software configuration side (kernel and OS) I am working on prioritizing recommendations that will
  1. provide better low latency performance out of the box with no notable general user performance regression (slight tradeoffs in throughput are OK, as Mac OS X has demonstrated)
  2. enhance general user experience (eg responsiveness at the sacrifice of small amounts of throughput) or
  3. make it easier for users to do any remaining system tuning themselves (odds and ends).
In addition to making a clear and concise case for balancing throughput with low latency, I'll then need to categorize performance optimizations based on the above categories.

Elementary OS is also in cahoots with System76. They happen to be my two favorite distributions. So if either one is receptive, that would be a dream and go a long way toward enhancing Linux as a desktop/laptop software, generally, and for multimedia production, specifically.
 

ayoungethan

More great info from Mixbus about latency and SMIs:
Most of our tuning experience is based on our high end Xengine systems which have to achieve single sample buffer sizes. So take that into consideration when looking at the numbers we are going after.

The hardware SMI issues can be huge. We have not tested a large number of systems, but I have seen systems that lose 15us of processing time every second to SMI interrupts. That is completely unacceptable. The ideal is to lose nothing to SMI, but I don't see that existing in modern hardware. A good system example would be 4 or 5 SMIs a minute with a maximum length of under 5us each.
[note: to put these two examples on equal terms, the "bad" example equates to about 900us of processing time lost per minute, or ~180 SMIs per minute assuming an average length of 5us per SMI. The "good" example is 20-25us per minute: a factor of 36-45 less interruption of CPU function!]

Latency is also all over the place. Here is a good resource.


Ideally you want a system that has a low max latency number and is overall consistent. For exact numbers, I would look at the maximum audio latency you want to achieve and try to get the system latency numbers well under that. For example, if you want to be able to run your DAW at a 32-sample buffer size at 96 kHz, then your max cyclictest latency would best be well under 333us. We are generally looking for something under 20us.

We have not tried a Liquorix kernel. We just built a stock RT kernel with our own config. I have heard good things about the Liquorix stuff in the past.
I think this gives me some good information to approach System76 and ask how they currently manage and tune SMIs and whether they do any cyclictest sort of analyses on their hardware (both desktops and laptops).
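The 333 us figure above follows directly from the buffer math, and cyclictest (from the rt-tests package) is the standard way to measure whether a system stays under such a budget. The flags shown are common usage, not a prescription:

```shell
# Scheduling budget for a 32-frame buffer at 96 kHz: frames / rate.
awk 'BEGIN { printf "%.0f us\n", 32 / 96000 * 1e6 }'

# Measure worst-case wakeup latency for 60 seconds at high realtime
# priority, one thread per CPU (requires root and the rt-tests package):
#   sudo cyclictest --mlockall --smp --priority=95 --duration=60
```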
 
