Linux Audio

Updated April 2004

Introduction

This document outlines the present state of audio on Linux. It is not intended to be a comprehensive report on technology or applications. Instead, the goal is to provide a lightweight technological overview, then direct the reader toward some of the more commonly used audio tools. Links to additional resources are provided at the end of the document.

Audio Core

The audio core consists of (1) the Linux kernel drivers that talk directly to the audio hardware, and (2) a programming interface (API) that developers use to communicate with these drivers.

The audio core is split into these two layers so that someone writing an audio application does not have to write code that talks directly to the hardware; instead they write code that talks to the drivers, and then the drivers talk to the hardware. This is necessary because the "language" used to communicate with different hardware (usually sound cards) changes from model to model, and it would be a great burden if every time a new piece of software were written the developers had to write code to specifically address dozens of different models.
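
The layering described above can be sketched in a few lines of Python. This is only an illustration: the class names (SoundDriver, CardModelA, CardModelB) and the two byte-level "hardware languages" are invented for this example and do not correspond to any real driver, nor to the ALSA or OSS APIs.

```python
from abc import ABC, abstractmethod


class SoundDriver(ABC):
    """Hypothetical common interface that every driver exposes to applications."""

    @abstractmethod
    def write(self, samples):
        """Accept a list of integer samples; return the bytes sent to the card."""


class CardModelA(SoundDriver):
    # Invented card that expects 16-bit little-endian samples.
    def write(self, samples):
        return b"".join(s.to_bytes(2, "little", signed=True) for s in samples)


class CardModelB(SoundDriver):
    # Invented card that expects 16-bit big-endian samples.
    def write(self, samples):
        return b"".join(s.to_bytes(2, "big", signed=True) for s in samples)


def play_beep(driver):
    """Application code: written once, it works with any driver."""
    return driver.write([0, 1000, 0, -1000])
```

The application function never needs to know which card is installed; only the driver classes differ from model to model.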

There are two core audio solutions commonly used under Linux: ALSA and OSS. This happens to be a time of transition for the "standard" Linux audio core, with ALSA gradually replacing OSS.

Sound Servers

A sound server is a software layer that sits atop the audio core. Analogous to the way the kernel's audio API allows developers to avoid talking directly to the hardware, a sound server provides another layer to allow developers to avoid talking directly to the kernel's audio API. Though a performance hit comes with adding this additional layer, the benefits can include a simpler API, software-based sample mixing, and network transparency.

Software-based sample mixing makes it possible to simultaneously play multiple sounds on a single sound card, even if the card does not natively support this behavior. For example, it is possible for one application to sound a warning beep at the same time another application is playing music. Most modern sound cards are capable of doing this internally, but older cards are often unable to play more than one sound at a time. The sound server solves this problem by intercepting all of the sounds that are being sent to the card, mixing them together using the system CPU, and then sending the mixed output to the card. In this way multiple applications can simultaneously output sound and still be heard on a card that only accepts sound from a single source.
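
The mixing step itself is simple arithmetic. The following is a minimal sketch, assuming signed 16-bit samples held in plain Python lists; the function name mix and the sample values are made up for illustration, and real sound servers use their own (more elaborate) mixing code.

```python
def mix(streams, lo=-32768, hi=32767):
    """Mix several lists of signed 16-bit samples into one list."""
    length = max((len(s) for s in streams), default=0)
    mixed = []
    for i in range(length):
        # Sum the samples that every still-playing stream has at position i.
        total = sum(s[i] for s in streams if i < len(s))
        # Clip so the result still fits in a signed 16-bit sample.
        mixed.append(max(lo, min(hi, total)))
    return mixed


music = [1000, 2000, 30000, -30000]
beep = [500, -500, 10000, -10000, 250]
print(mix([music, beep]))  # prints [1500, 1500, 32767, -32768, 250]
```

Clipping is the crudest way to handle sums that overflow the sample range; it is used here only to keep the sketch short.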

Network transparency makes it possible to play sounds on one machine and have them be heard on another machine. For example, suppose you have two computers: one hosts a music collection, but lacks a sound card and speakers. A second sits in another room, has no music stored on it, but does have a sound card and speakers. If the machine with the sound card is running a sound server, it is possible to play music on the machine without the speakers and have the sound output on the machine with the speakers.
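
The idea can be sketched as follows. This is an illustration under stated assumptions, not the wire protocol of any actual sound server: the length-prefixed message format and the function names (send_audio, receive_audio, recv_exact) are invented here, and socket.socketpair() stands in for a real network connection between the two machines.

```python
import socket
import struct


def send_audio(conn, samples):
    """Machine without speakers: pack samples as 16-bit little-endian and send."""
    payload = struct.pack("<%dh" % len(samples), *samples)
    # Prefix the payload with its size so the receiver knows how much to read.
    conn.sendall(struct.pack("<I", len(payload)) + payload)


def recv_exact(conn, count):
    """Read exactly `count` bytes from the connection."""
    data = b""
    while len(data) < count:
        chunk = conn.recv(count - len(data))
        if not chunk:
            raise ConnectionError("connection closed early")
        data += chunk
    return data


def receive_audio(conn):
    """Machine with speakers: read one message and return the samples to play."""
    (size,) = struct.unpack("<I", recv_exact(conn, 4))
    payload = recv_exact(conn, size)
    return list(struct.unpack("<%dh" % (size // 2), payload))


# socketpair() stands in for a real connection between the two machines.
music_host, speaker_host = socket.socketpair()
send_audio(music_host, [0, 1200, -1200, 300])
received = receive_audio(speaker_host)
music_host.close()
speaker_host.close()
```

In a real setup the receiving side would hand the samples to the local audio core instead of returning them as a list.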

Sound Server Software

EsounD

EsounD (ESD) is the sound server currently used by the GNOME project. It is important to note that the EsounD home page has not been updated since March 2000, and the original author apparently stopped development at version 0.2.8. The GNOME project has taken over EsounD development (they are now on version 0.2.34) but unfortunately they have not taken over maintenance of the EsounD home page, thus leading to a potential source of confusion in the developer community. The latest standalone EsounD package is from March 2004.

JACK

The Jack sound server has been getting a lot of attention recently, with developer support steadily increasing over the past year. Jack first gained popularity amongst those writing professional audio software requiring high performance and low latency; now we're beginning to see Jack support in common end-user applications as well. Jack is being actively developed, has a well-maintained website and good documentation, and is released as a standalone package.

From the Jack home page:

[JACK] has been designed from the ground up to be suitable for professional audio work. This means that it focuses on two key areas: synchronous execution of all clients, and low latency operation.

Others

There are a few other sound servers floating around, though Jack and EsounD are the most likely to be relevant in the near future. Three others warrant brief mention, but they are really only being presented for the sake of curiosity:

Sound Servers and the Audio Core: A Brief Commentary

We're basically down to two sound servers of relevance now: Jack and EsounD. Support for aRts is fading, and aside from developers building KDE applications it is now largely irrelevant. Jack and EsounD satisfy slightly different needs, and the right one to use depends on the application. Jack is appropriate for a high-end workstation running professional applications, while EsounD is appropriate for a low-end machine that either lacks a multichannel sound card or needs to play audio over a network. For the average desktop system, however, neither sound server is necessary. It is less hassle to simply buy a multichannel sound card and use the native ALSA drivers.

As for the audio core: ALSA is the future and OSS is on the way out. Since ALSA provides an OSS compatibility layer, users should simply migrate to ALSA and abandon OSS altogether. The situation is a little stickier for developers, however, as there will likely be a significant number of legacy systems around without ALSA support for a few more years.

Playback Applications

XMMS

XMMS is one of the most popular graphical audio players for Linux. It supports a wide variety of audio formats, as well as four different plugin types (visualization, effects, input, and output).

AlsaPlayer

AlsaPlayer plays audio and CDs, and can be run in command-line, GUI, or daemon mode.

MPlayer

MPlayer plays both audio and video in many formats. It can be run either from the command line or in GUI mode.

Xine

Xine is a graphical audio and video player, very similar to MPlayer.

Ripping/Encoding Applications

CDparanoia

Cdparanoia is the de-facto standard CD "ripper" for Linux. It consists of a set of libraries and a command-line interface. Most graphical programs for Linux that rip CDs actually use cdparanoia behind the scenes. Cdparanoia is fairly sophisticated, and consistently gives high-quality results even from damaged discs.

Ogg Vorbis

Ogg Vorbis is an open-source alternative to the popular mp3 format (strictly speaking, Ogg is the container format and Vorbis is the audio codec). The Ogg Vorbis project provides libraries and a basic tool set for creating and playing Ogg Vorbis files ("oggenc" and "ogg123" respectively).

FLAC

FLAC provides lossless audio compression. From the FLAC home page:

FLAC stands for Free Lossless Audio Codec. Grossly oversimplified, FLAC is similar to MP3, but lossless.

Speex

Speex is a speech encoder. From the Speex home page:

The Speex project aims to build a patent-free, Open Source/Free Software voice codec. Unlike other codecs like MP3 and Ogg Vorbis, Speex is designed to compress voice at bitrates in the 2-45kbps range. Possible applications include VoIP, Internet audio streaming, archiving of speech data (e.g. voice mail), and audio books. In some sense, it is meant to be complementary to the Ogg Vorbis codec.

LAME

LAME is the de-facto standard mp3 encoder under Linux.

Audio Editors

Audacity

From the Audacity home page:

Audacity is a free audio editor. You can record sounds, play sounds, import and export WAV, AIFF, and MP3 files, and more. Use it to edit your sounds using Cut, Copy and Paste (with unlimited Undo), mix tracks together, or apply effects to your recordings. It also has a built-in amplitude envelope editor, a customizable spectrogram mode and a frequency analysis window for audio analysis applications. Built-in effects include Bass Boost, Wahwah, and Noise Removal, and it also supports VST plug-in effects.

Ardour

From the Ardour home page:

Ardour is a multichannel hard disk recorder (HDR) and digital audio workstation (DAW). It is capable of simultaneous recording 24 or more channels of 32 bit audio at 48kHz. Ardour is intended to function as a "professional" HDR system, replacing dedicated hardware solutions such as the Mackie HDR, the Tascam 2424 and more traditional tape systems like the Alesis ADAT series. It is also intended to provide the same or better functionality as software systems such as ProTools, Samplitude, Logic Audio, Nuendo and Cubase VST.

Ecasound

Ecasound is a command-line audio processor, making it ideal for scripting repetitive tasks. From the ecasound home page:

Ecasound is a software package designed for multitrack audio processing. It can be used for simple tasks like audio playback, recording and format conversions, as well as for multitrack effect processing, mixing, recording and signal recycling. Ecasound supports a wide range of audio inputs, outputs and effect algorithms. Effects and audio objects can be combined in various ways, and their parameters can be controlled by operator objects like oscillators and MIDI-CCs.

GLAME

From the GLAME (GNU/Linux Audio Mechanics) home page:

One day, we hope to present you with a powerful, fast, stable and easily extensible sound editor for Linux and compatible systems. GLAME is targeted to be the GIMP for audio processing.

Snd

From the Snd home page:

Snd is a sound editor modeled loosely after Emacs and an old, sorely-missed PDP-10 sound editor named Dpysnd. It can accommodate any number of sounds each with any number of channels, and can be customized and extended using either Guile or Ruby.

SoX

From the SoX home page:

SoX is a command line utility that can convert various formats of computer audio files in to other formats. It can also apply various effects to these sound files during the conversion.

Protux

From the Protux home page:

Protux aims to be the most practical and one of the most powerful audio tools for GNU/Linux. Protux will allow you to use the power of keyboard+mouse combination (with no clicks) to vastly speed up the process of audio production. This concept we call "Jog-Mouse-Board" or JMB, for short.

ReZound

ReZound aims to be a stable, open source, and graphical audio file editor primarily for but not limited to the Linux operating system.

GNUsound

From the GNUsound home page:

GNUsound is a sound editor for Linux. It supports multiple tracks, multichannel output, and 8, 16, or 24/32 bit samples. It can read a number of audio formats through libaudiofile, and saves them as WAV. GNUsound supports a large number of high-quality audio effects through the LADSPA plugin architecture.

Pd (Pure Data)

From the Pd home page:

"Pd" stands for "pure data". Pd is a real-time software system for live musical and multimedia performances.

Sweep

From the Sweep home page:

Sweep is an audio editor and live playback tool for GNU/Linux, BSD and compatible systems. It supports many music and voice formats including WAV, AIFF, Ogg Vorbis, Speex and MP3, with multichannel editing and LADSPA effects plugins.

SpiralSynth Modular

From the SpiralSynth Modular home page:

SpiralSynth Modular is an object orientated music studio with an emphasis on live use. You can use it in a fairly straight forward way to make tracks with, or get very experimental. Audio or control data can be freely passed between the plugins. Data can also be fed back on itself for chaotic effects.

Drum Machines

Hydrogen

Hydrogen is an advanced drum machine for GNU/Linux. Its main goal is to bring professional yet simple and intuitive pattern-based drum programming.

Sound Trackers

SoundTracker

SoundTracker is a music tracking tool for Unix / X11 similar in design to the DOS program FastTracker and the Amiga legend ProTracker. Samples can be lined up on tracks and patterns which are then arranged to a song. Supported module formats are XM and MOD; the player code is the one from OpenCP. A basic sample recorder and editor is also included.

CheeseTronic

Cheesetracker is a portable Impulse Tracker clone. It supports all Impulse Tracker features except a few. For now the main goal is to remain at IT Feature set level, but very soon we might be adding new features to it.

Software Synthesizers

ZynAddSubFX

ZynAddSubFX is an open-source software synthesizer capable of making a countless number of instruments, from common sounds heard on expensive hardware to more unusual and interesting sounds.

AlsaModularSynth

AlsaModularSynth is a realtime modular synthesizer and effect processor. It features MIDI controlled modular software synthesis, realtime effect processing with input capture, full control of all synthesis and effect parameters via MIDI, and more.

FluidSynth

FluidSynth is a real-time software synthesizer based on the SoundFont 2 specifications. FluidSynth can read MIDI events from a MIDI input device and render them to an audio device using SoundFont instruments. SoundFont files are composed of digital audio "samples" and additional instrument parameters. These files can be created or downloaded off the Internet. FluidSynth also has support for controlling effects in real time and can play MIDI files.

MIDI Applications

MusE

From the MusE (Linux Music Editor) home page:

MusE is a MIDI/Audio sequencer with recording and editing capabilities.

Rosegarden

From the Rosegarden home page:

Rosegarden-4 is an attractive, user-friendly MIDI and audio sequencer, notation editor, and general-purpose music composition and editing application for Unix and Linux. It is currently somewhere approaching beta quality, following two years of sustained active development.

Seq24

From the Seq24 home page:

Seq24 is a real-time midi sequencer. It was created to provide a very simple interface for editing and playing midi 'loops'. After searching for a software based sequencer that would provide the functionality needed for a live techno performance, such as the Akai MPC line, the Kawai Q80 sequencer, or the popular Alesis MMT-8, I found nothing similar in the software realm. I set out to create a very minimal sequencer that excludes the bloated features of the large software sequencers, and includes a small subset of features that I have found usable in performing.

Timidity

From the Timidity website:

TiMidity++ is a software synthesizer. It can play MIDI files by converting them into PCM waveform data; give it a MIDI data along with digital instrument data files, then it synthesizes them in real-time, and plays. It can not only play sounds, but also can save the generated waveforms into hard disks as various audio file formats.
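
To give a feel for what "converting MIDI into PCM waveform data" means, here is a minimal sketch. It is not how TiMidity++ works internally: TiMidity++ uses digital instrument data files, whereas this example substitutes a plain sine wave. The function names (note_frequency, render_note) are made up for illustration; the note-to-frequency conversion is the standard equal-temperament formula with A4 (MIDI note 69) at 440 Hz.

```python
import math


def note_frequency(note):
    """Frequency in Hz of a MIDI note number (A4 = note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)


def render_note(note, seconds, rate=44100, amplitude=0.5):
    """Render one note as a list of signed 16-bit PCM samples (plain sine wave)."""
    freq = note_frequency(note)
    count = round(seconds * rate)
    return [
        int(amplitude * 32767 * math.sin(2 * math.pi * freq * i / rate))
        for i in range(count)
    ]


pcm = render_note(69, 0.01)  # 10 ms of A4 at 44.1 kHz
```

The resulting list could be written to a WAV file or handed to a sound card; a real synthesizer would also handle note velocity, envelopes, and many simultaneous notes.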

PMIDI

The pmidi program is a straightforward command line program to play midi files through the ALSA sequencer. As you can specify the client and port to connect to on the command line it is also useful for testing ALSA or clients that need to receive sequencer events.

Software Synthesis Language

Csound

From the Csound website:

Csound is a programming language designed and optimized for sound rendering and signal processing... [It] is an incredibly powerful and versatile software synthesis program. Drawing from a toolkit of over 450 signal processing modules, one can use Csound to model virtually any commercial synthesizer or multi-effects processor.

RTcmix

From the RTcmix website:

CMIX is a package of sound-processing, synthesizing, modification and mixing programs that you can use to do virtually anything with any pre-recorded sound, or with which you can synthesize any sound you can imagine. Its workings are very flexible and relatively simple, and, in the right combinations, any musical project can be realized.

CLM

From the CLM home page:

Common Lisp Music is a music synthesis and signal processing package in the Music V family.

SuperCollider

From the SuperCollider home page:

SuperCollider is a state of the art, realtime sound synthesis server as well as an interpreted Object Oriented language which is based on Smalltalk but with C language family syntax. The language functions as a network client to the sound synthesis server.

Other Applications

LilyPond

From the LilyPond home page:

LilyPond prints beautiful sheet music. It produces music notation from a description file. It excels at typesetting classical music, but you can also print pop-songs.

Swami (formerly Smurf)

From the Swami home page:

An instrument editor and so much more. Create and edit sample based instruments in SoundFonts for composing computer music; easily manage and connect instruments to MIDI sequencers; with plans for Python scriptability and multi-peer internet jam sessions.

NoteEdit

NoteEdit is a graphical musical score editor. It provides a GUI for LilyPond (above).

LADSPA

Many audio synthesis and recording packages are in use or in development on Linux. These work in many different ways. LADSPA provides a standard way for 'plugin' audio processors to be used with a wide range of these packages. For instance, this allows a developer to make a reverb program and bundle it into a LADSPA 'plugin library.' Ordinary users can then use this reverb within any LADSPA-friendly audio application.

[Note: LADSPA enables support for VST/VSTi plugins.]
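
The plugin idea described above can be sketched as follows. This is only an illustration of the architecture, not the LADSPA API itself (which is a C interface): the class names (Plugin, Gain, Invert) and the function host_process are hypothetical.

```python
class Plugin:
    """Hypothetical plugin interface: a name and one processing function."""
    name = "plugin"

    def run(self, samples):
        raise NotImplementedError


class Gain(Plugin):
    """Scale every sample by a constant factor."""
    name = "gain"

    def __init__(self, factor):
        self.factor = factor

    def run(self, samples):
        return [s * self.factor for s in samples]


class Invert(Plugin):
    """Flip the sign of every sample."""
    name = "invert"

    def run(self, samples):
        return [-s for s in samples]


def host_process(samples, chain):
    """Any host can apply any plugin, knowing only the shared interface."""
    for plugin in chain:
        samples = plugin.run(samples)
    return samples
```

For example, host_process([1.0, -0.5], [Gain(0.5), Invert()]) halves each sample and then inverts it. The host code does not change when a new plugin is written, which is the point of having a shared standard.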


Professional Resources

Audio professionals may be interested in the following resources:

Developer Resources

Audio application developers may be interested in the following: