                                                        JACK user documentation
                                               Last update Tuesday 25 February 2003 14:42


What is JACK?

JACK is a low-latency audio server, written primarily for the GNU/Linux
operating system. It can connect a number of different applications to an audio
device, as well as allowing them to share audio between themselves. Its clients
can run in their own processes (i.e. as normal applications), or they can
run within the JACK server (i.e. as a "plugin").

JACK is different from other audio server efforts in that it has been designed
from the ground up to be suitable for professional audio work. This means that
it focuses on two key areas: synchronous execution of all clients, and low
latency operation.

This diagram, using ardour as an example, will give you an overview of how a
JACKed Linux audio system works.

JACK has two sets of options. The first set is specific to running the JACK
server itself. The second set consists of run time options for how JACK
interfaces with the sound driver, which is currently only ALSA.

The easiest way to start jack is to run this command:

jackd -d alsa -d hw:0

Of course that gives you very little control over what jack does to the audio
stream and which device you use. You can specify a card name by setting up an
.asoundrc file. Visit the online ALSA docs for your card/device to get one.

There are many useful options which can be found by typing

jackd -h

or

jackd -d alsa -h

Example commandlines

Many people with Sound Blaster Live cards find that more appropriate settings
are:

*jackd -v -d alsa -d (cardnamehere) -p 512

People using RME cards have reported success with:

*jackd -v -d alsa -d (cardnamehere) -p 64

A commandline for starting at 44100 Hz with verbosity, realtime scheduling,
hardware monitoring, and shaped dither enabled:

*jackd -v -R -d alsa -d (cardnamehere) -r 44100 -H -z s
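
If you start jack from a script, the same command line can be assembled from
named settings. The following Python sketch is only an illustration: the
function jackd_command and its parameter names are made up for this example,
it uses only flags that appear in this document, and it uses hw:0 in place of
your card name. It prints the command rather than running it.

```python
import shlex

def jackd_command(device="hw:0", rate=None, period=None, nperiods=None,
                  dither=None, verbose=False, realtime=False,
                  hw_monitor=False):
    """Build a jackd argument list (hypothetical helper, ALSA driver only)."""
    argv = ["jackd"]
    # Server options come before the driver name.
    if verbose:
        argv.append("-v")
    if realtime:
        argv.append("-R")
    # Everything after "-d alsa" is a driver option.
    argv += ["-d", "alsa", "-d", device]
    if rate is not None:
        argv += ["-r", str(rate)]
    if period is not None:
        argv += ["-p", str(period)]
    if nperiods is not None:
        argv += ["-n", str(nperiods)]
    if hw_monitor:
        argv.append("-H")
    if dither is not None:
        argv += ["-z", dither]
    return argv

cmd = jackd_command(rate=44100, verbose=True, realtime=True,
                    hw_monitor=True, dither="s")
print(shlex.join(cmd))
```

Keeping the arguments as a list until the last moment avoids quoting problems
if a device name ever contains unusual characters.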

You can use ecasound to generate a pure sine wave tone for testing the sound
quality of your device.

*ecasound -f:32,1,48000 -i null -o jack_alsa,myport -b:1024 -el:sine_fcac,440,1

There are a few other options which you will find useful.

JACK specific options

The default settings for JACK are to run at 48000 Hz with a period size of
1024 frames and 2 periods per hardware buffer. JACK currently supports two
sample formats. JACK's ALSA driver/client first tries to use
SND_PCM_FORMAT_S32_LE, which is the format used by all current 24 bit audio
cards except for some USB interfaces that actually use packed 24 bit samples
rather than 24 bits packed in 32 bits. If the device can't do that, it tries
SND_PCM_FORMAT_S16_LE, which every audio interface should support. True 24 bit
format wouldn't be a lot of work to support, but it's not trivial either.

The buffer size determines the latency between when the sound is received by
JACK and when it is sent to the pcm device (the card output). The smaller the
buffer size, the closer to realtime the response will be. Many people have
found that for general purpose use the default setting is more than adequate,
but when you are recording you should set the buffer size as low as your
card/device can handle without causing sound dropouts (xruns). Some people
advocate using higher latency for recording to ensure smooth audio. This is a
tradeoff between realtime response for monitoring and audio quality. It is
recommended that you test your card and system to find out what the best
setting is for your setup. 64 frames per interrupt is the lowest currently
possible in any PC audio hardware. Buffer sizes should be powers of 2, so
double the number of frames at each step, starting at 64.

For example: 64, 128, 256, 512, 1024, 2048, 4096, 8192....
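
To get a feel for what these sizes mean in time, the following Python sketch
prints how long one period lasts for each size in the list. It assumes the
default rate of 48000 Hz; the function name period_ms is made up for this
example and is not part of jack.

```python
SAMPLE_RATE = 48000  # Hz, the default rate mentioned in this document

def period_ms(frames, rate=SAMPLE_RATE):
    """Time covered by one period of `frames` frames, in milliseconds."""
    return 1000.0 * frames / rate

# Start at 64 and double each time: 64, 128, ... 8192
sizes = [64 * 2 ** i for i in range(8)]

for frames in sizes:
    print(f"{frames:5d} frames -> {period_ms(frames):7.2f} ms")
```

This only covers the time one period represents. The real delay you hear also
depends on the number of periods and on the hardware itself.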

jackd -v -a -R -P -d

-v means verbose. It will output the actions that jack is performing to a
console. This is very useful for debugging.
-a means to use the inbuilt ASIO support. This can only be enabled on cards
that support ASIO. ASIO is a protocol developed by Steinberg, the makers of
many audio applications for Microsoft Windows. It allows for much lower latency
performance internal to the soundcard/device.
-R means realtime. This allows you to take full advantage of the low latency
patches for the Linux kernel. You should enable this if you are doing master
recordings or want to ensure the applications will receive the audio stream as
quickly as possible.
-P means priority. This is used in addition to the -R flag and allows the
priority of jack to be set, up to the maximum available. Also useful when you
need low latency.
-d means driver. This sets the sound driver which jack interfaces with.
Currently "alsa" is the only option.

Driver specific options

jackd -d alsa -d -r -p -n -H -C -D -P -z

Currently jack only has support for alsa as a sound driver. In the future there
may be more driver options although it is not very likely.

-d means device. This allows you to specify a device other than hw:0.
-r means sample rate. Use this to set the number of samples per second that the
audio is streamed at. 44100 Hz is CD quality, 48000 Hz (the default) is DAT
quality, and anything between 44100 Hz and 192000 Hz is DVD quality. The higher the
sample rate the more audio data you capture per second and therefore the more
space you use on your HDD. For many people CD quality is fine. The debate rages
as to whether sample rates higher than 44100Hz provide better sound quality or
not. Currently it is at a standoff until someone conducts conclusive double
blind tests in the tradition of Pepsi vs Coke.

Many people only work at 44100 Hz because resampling down from a higher sample
rate is known to degrade the audio quality when compared to recording at
44100 Hz originally. It is also highly likely that sample libraries you may
want to use are only available at 44100 Hz. That said, most people agree that
acoustic recordings do generally sound better when recorded at higher sample
rates. Unfortunately CDs are not going to disappear soon and DVD-RWs remain
expensive, so if you want to distribute your recordings it is more than likely
that they will be shipped at 44100 Hz.
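
To put rough numbers on the disk space point made for -r, the following Python
sketch estimates how much raw audio one minute of recording produces at a few
sample rates. The figures assume stereo (2 channels) and 16 bit (2 byte)
samples with no file header or compression. Those are assumptions for this
example only, and the function name bytes_per_minute is made up.

```python
def bytes_per_minute(rate, channels=2, bytes_per_sample=2):
    """Raw audio data produced in 60 seconds, in bytes."""
    return rate * channels * bytes_per_sample * 60

for rate in (44100, 48000, 96000, 192000):
    mib = bytes_per_minute(rate) / (1024 * 1024)
    print(f"{rate:6d} Hz -> {mib:6.1f} MiB per minute")
```

Pass bytes_per_sample=4 to estimate the cost of 32 bit samples instead.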

-p means frames per period. This is the buffer size which JACK will stream
audio at. See above for an explanation of what this means.
-n means periods per hardware buffer. This sets the number of periods that
make up the hardware buffer ALSA uses for your device. Most cards use two
periods but some use 3, 4 or even 8 or 16 (delta 10/10).

What is the exact purpose of the p and n parameters?

There are several kinds of latency:

    input latency
    output latency
    through (or "roundtrip") latency

    p affects input latency: how long it takes from when a piece of data
    arrives at the audio interface connectors until user space software can
    use it.

    p*n affects output latency: how long it takes from when a piece of data is
    delivered by user space software until it leaves the audio interface
    connectors.

    Roundtrip latency is the combination of these two.

Conventional low latency systems (e.g. ASIO) use n=2 all the time. ALSA is
rather unusual in allowing other values.
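
As a worked example of the p and n relationship, the following Python sketch
turns the two values into milliseconds. It is a simplification: it treats
input latency as exactly p frames and output latency as exactly p*n frames,
and it ignores any extra delay inside the converters or the driver. The
function name latencies_ms is made up for this example.

```python
def latencies_ms(p, n, rate):
    """Return (input, output, roundtrip) latency estimates in milliseconds."""
    input_ms = 1000.0 * p / rate
    output_ms = 1000.0 * p * n / rate
    return input_ms, output_ms, input_ms + output_ms

# The defaults from this document, then a small period size at the same rate.
for p, n, rate in ((1024, 2, 48000), (64, 2, 48000)):
    i, o, rt = latencies_ms(p, n, rate)
    print(f"p={p:4d} n={n} r={rate}: in {i:.2f} ms, out {o:.2f} ms, "
          f"roundtrip {rt:.2f} ms")
```

With n=2, halving p halves all three figures, which is why the period size is
the first thing to experiment with when you need lower latency.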

-H means hardware monitoring. This is only available with cards/devices that
support this feature. Usually cards that support ASIO will support hardware
monitoring. It allows you to hear the audio stream flowing through the pcm
in/outs at that very moment. This is very good for hearing what you are
recording as you are recording it.
-C means capture only. This opens the ALSA driver in read only mode which is
useful for people who only want to record audio and don't have a need to hear
what they are recording.
-D means duplex. This opens the ALSA driver in read/write mode which means that
you can play and record at the same time. This is the default mode and is what
most people will want to use.
-P means playback only. This opens the ALSA driver in write only mode which is
useful for people who have no inputs or only want to play audio not record. It
can also reduce latency.

-z means dither. There are currently four options to the dither flag.
-z r means rectangular dither.
-z t means triangular dither.
-z s means shaped dither.
-z - means no dither (the default).

Dither is used to make the audio cleaner. The best way to describe it is to
imagine a painting with many dots. If you view it up close you can see each dot
and the image is not very clear. If you view it from far away the image becomes
clearer because your eyes/brain dither the dots to smooth out the image. It is
a murky subject and obviously a very personal choice as to what dither is the
best. For most people it is just plain magic. Anyone running at 16 bit who
cares about quality or has CPU cycles to spare should run with dither.
Triangular is probably the best compromise of quality vs CPU cost (it's very
fast), but shaped gives the best quality.
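
For the curious, the following Python sketch shows the general idea behind
rectangular and triangular dither when reducing a sample to 16 bits. It is a
textbook illustration and not jack's own code. The function name
quantize_16bit and the string values it accepts are made up for this example,
and shaped dither is left out because it needs a noise shaping filter.

```python
import random

def quantize_16bit(sample, dither="none", rng=random):
    """Quantize a float sample in [-1.0, 1.0] to a 16 bit integer."""
    scaled = sample * 32767.0
    if dither == "rectangular":
        # Uniform noise spanning one quantization step (+/- 0.5 step).
        scaled += rng.uniform(-0.5, 0.5)
    elif dither == "triangular":
        # Two uniform values added together give a triangular
        # distribution spanning +/- 1 step.
        scaled += rng.uniform(-0.5, 0.5) + rng.uniform(-0.5, 0.5)
    elif dither != "none":
        raise ValueError("unknown dither type: " + dither)
    value = int(round(scaled))
    # Clamp to the 16 bit range in case the noise pushed us past the edge.
    return max(-32768, min(32767, value))

rng = random.Random(0)   # fixed seed so the run is repeatable
quiet = 0.4 / 32767.0    # a level smaller than half of one 16 bit step

plain = [quantize_16bit(quiet) for _ in range(10000)]
dithered = [quantize_16bit(quiet, "triangular", rng) for _ in range(10000)]

print(sum(plain) / len(plain))        # every sample rounds to 0
print(sum(dithered) / len(dithered))  # the average lands near 0.4
```

Without dither the quiet signal disappears entirely. With dither each sample
is noisier, but on average the original level survives, which is the same
effect as the dots in the painting blending together at a distance.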

Document prepared by Patrick Shirkey <pshirkey_at_boosthardware.com>
Thanks to everyone who contributes, wittingly or not...

