public class AudioFormat : Object
The AudioFormat class is used to access a number of audio format and channel configuration constants. They are used, for instance, in AudioTrack and AudioRecord as valid values for individual constructor parameters. For example, in the six-parameter AudioTrack constructor (stream type, sample rate, channel configuration, audio format, buffer size, mode), the fourth parameter is one of the AudioFormat.ENCODING_* constants. The AudioFormat constants are also used in MediaFormat to specify audio-related values commonly used in media, such as for MediaFormat.KeyChannelMask.
The AudioFormat.Builder class can be used to create instances of the AudioFormat class. Refer to AudioFormat.Builder for documentation on the mechanics of configuring and building such instances. Here we describe the main concepts that the AudioFormat class allows you to convey in each instance. They are: sample rate, encoding, and channel masks.
Closely associated with the AudioFormat is the notion of an audio frame, which is used throughout the documentation to represent the minimum size complete unit of audio data.
Expressed in Hz, the sample rate in an AudioFormat instance is the number of audio samples for each channel per second in the content you are playing or recording. It is not the sample rate at which content is rendered or produced. For instance, a sound at a media sample rate of 8000Hz can be played on a device operating at a sample rate of 48000Hz; the platform handles the sample rate conversion automatically, so the sound will not play at 6x speed.
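The arithmetic behind the 8000Hz example can be sketched as follows. This is only an illustration: SampleRateMath and its methods are hypothetical helpers, not part of any Android or Mono.Android API, and the platform's actual resampler is not shown.

```java
// Hypothetical helper for illustration; not part of any Android API.
final class SampleRateMath {
    // Duration in milliseconds of frameCount frames at the given sample rate.
    static long durationMillis(long frameCount, int sampleRateHz) {
        return frameCount * 1000L / sampleRateHz;
    }

    // Number of frames needed at deviceRateHz so that content recorded at
    // contentRateHz keeps its original duration after conversion.
    static long framesAfterConversion(long frameCount, int contentRateHz, int deviceRateHz) {
        return frameCount * deviceRateHz / contentRateHz;
    }
}
```

One second of 8000Hz content (8000 frames) corresponds to 48000 frames on a 48000Hz device, so the duration is unchanged; only the number of frames differs.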
As of API VERSION_CODES.M, sample rates up to 192kHz are supported for AudioRecord and AudioTrack, with sample rate conversion performed as needed. To improve efficiency and avoid lossy conversions, it is recommended to match the sample rate for AudioRecord and AudioTrack to the endpoint device sample rate, and limit the sample rate to no more than 48kHz unless there are special device capabilities that warrant a higher rate.
Audio encoding is used to describe the bit representation of audio data, which can be either linear PCM or compressed audio, such as AC3 or DTS.
For linear PCM, the audio encoding describes the sample size (8 bits, 16 bits, or 32 bits) and the sample representation (integer or float).
- AudioFormat.ENCODING_PCM_8BIT: The audio sample is an 8 bit unsigned integer in the range [0, 255], with a 128 offset for zero. This is typically stored as a Java byte in a byte array or ByteBuffer. Since the Java byte is signed, be careful with math operations and conversions as the most significant bit is inverted.
- AudioFormat.ENCODING_PCM_16BIT: The audio sample is a 16 bit signed integer typically stored as a Java short in a short array, but when the short is stored in a ByteBuffer, it is native endian (as compared to the default Java big endian). The short has full range from [-32768, 32767], and is sometimes interpreted as fixed point Q.15 data.
- AudioFormat.ENCODING_PCM_FLOAT: Introduced in
API VERSION_CODES.Lollipop, this encoding specifies that
the audio sample is a 32 bit IEEE single precision float. The sample can be
manipulated as a Java float in a float array, though within a ByteBuffer
it is stored in native endian byte order.
The nominal range of ENCODING_PCM_FLOAT audio data is [-1.0, 1.0].
It is implementation dependent whether the positive maximum of 1.0 is included
in the interval. Values outside of the nominal range are clamped before
sending to the endpoint device. Beware that
the handling of NaN is undefined; subnormals may be treated as zero; and
infinities are generally clamped just like other values for AudioTrack
– try to avoid infinities because they can easily generate a NaN.
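The three linear PCM representations above can be related to each other with a few conversions. The sketch below is illustrative only: PcmSamples and its methods are hypothetical names, the scaling by 128 and 32768 is one common convention rather than something this documentation prescribes, and mapping NaN to zero is simply what this code happens to do.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper for illustration; not part of any Android API.
final class PcmSamples {
    // 8 bit PCM: unsigned [0, 255] with 128 as zero, held in a signed Java byte.
    static float fromPcm8(byte sample) {
        int unsigned = sample & 0xFF;       // undo Java's sign extension
        return (unsigned - 128) / 128.0f;   // result lies in [-1.0, 1.0)
    }

    // 16 bit PCM: signed short interpreted as Q.15 fixed point.
    static float fromPcm16(short sample) {
        return sample / 32768.0f;
    }

    // Float PCM to 16 bit PCM, clamping values outside the nominal range.
    static short toPcm16(float sample) {
        float clamped = Math.max(-1.0f, Math.min(1.0f, sample));
        int scaled = Math.round(clamped * 32768.0f); // Math.round(NaN) is 0
        return (short) Math.min(scaled, Short.MAX_VALUE);
    }

    // Packs shorts into a ByteBuffer using native byte order instead of
    // Java's default big endian order.
    static ByteBuffer packPcm16(short[] samples) {
        ByteBuffer buffer = ByteBuffer.allocate(samples.length * 2)
                .order(ByteOrder.nativeOrder());
        for (short s : samples) {
            buffer.putShort(s);
        }
        buffer.flip();
        return buffer;
    }
}
```

Note the mask with 0xFF in fromPcm8: without it, the byte value for 128 (the zero level) would be read as -128.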
To achieve higher audio bit depth than a signed 16 bit integer short, it is recommended to use ENCODING_PCM_FLOAT for audio capture, processing, and playback. Floats are efficiently manipulated by modern CPUs, have greater precision than 24 bit signed integers, and have greater dynamic range than 32 bit signed integers. AudioRecord as of API VERSION_CODES.M and AudioTrack as of API VERSION_CODES.Lollipop support ENCODING_PCM_FLOAT.
For compressed audio, the encoding specifies the method of compression, for example AudioFormat.ENCODING_AC3 and AudioFormat.ENCODING_DTS. The compressed audio data is typically stored as bytes in a byte array or ByteBuffer. When a compressed audio encoding is specified for an AudioTrack, it creates a direct (non-mixed) track for output to an endpoint (such as HDMI) capable of decoding the compressed audio. For (most) other endpoints, which are not capable of decoding such compressed audio, you will need to decode the data first, typically by creating a MediaCodec. Alternatively, one may use MediaPlayer for playback of compressed audio files or streams.
When compressed audio is sent out through a direct AudioTrack, it need not be written in exact multiples of the audio access unit; this differs from MediaCodec input buffers.
Channel masks are used in AudioTrack and AudioRecord to describe
the samples and their arrangement in the audio frame. They are also used in the endpoint (e.g.
a USB audio interface, a DAC connected to headphones) to specify allowable configurations of a particular device.
As of API VERSION_CODES.M, there are two types of channel masks: channel position masks and channel index masks.
Channel position masks
Channel position masks are the original Android channel masks, and have been used since API VERSION_CODES.BASE. For input and output, they imply a positional nature: the location of a speaker or a microphone for recording or playback.
For a channel position mask, each allowed channel position corresponds to a bit in the channel mask. If that channel position is present in the audio frame, that bit is set, otherwise it is zero. The order of the bits (from lsb to msb) corresponds to the order of that position's sample in the audio frame.
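The canonical channel position masks by channel count are as follows:
- 1 channel: AudioFormat.CHANNEL_OUT_MONO
- 2 channels: AudioFormat.CHANNEL_OUT_STEREO
- 3 channels: AudioFormat.CHANNEL_OUT_STEREO | AudioFormat.CHANNEL_OUT_FRONT_CENTER
- 4 channels: AudioFormat.CHANNEL_OUT_QUAD
- 5 channels: AudioFormat.CHANNEL_OUT_QUAD | AudioFormat.CHANNEL_OUT_FRONT_CENTER
- 6 channels: AudioFormat.CHANNEL_OUT_5POINT1
- 7 channels: AudioFormat.CHANNEL_OUT_5POINT1 | AudioFormat.CHANNEL_OUT_BACK_CENTER
- 8 channels: AudioFormat.CHANNEL_OUT_7POINT1_SURROUND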
These masks are an ORed composite of individual channel masks. For example AudioFormat.CHANNEL_OUT_STEREO is composed of AudioFormat.CHANNEL_OUT_FRONT_LEFT and AudioFormat.CHANNEL_OUT_FRONT_RIGHT.
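The composition of position masks, and the rule that bit order from lsb to msb gives sample order in the frame, can be sketched as below. The constants are redefined locally so that the example runs without the Android SDK; their numeric values are believed to match android.media.AudioFormat but are an assumption here and should be checked against the SDK. PositionMasks, channelCount and sampleIndex are hypothetical names.

```java
// Hypothetical helper for illustration; not part of any Android API.
final class PositionMasks {
    // Local copies of individual position bits (assumed values, verify against the SDK).
    static final int CHANNEL_OUT_FRONT_LEFT = 0x4;
    static final int CHANNEL_OUT_FRONT_RIGHT = 0x8;
    static final int CHANNEL_OUT_BACK_LEFT = 0x40;
    static final int CHANNEL_OUT_BACK_RIGHT = 0x80;

    // A composite mask is the OR of the individual position bits.
    static final int CHANNEL_OUT_STEREO =
            CHANNEL_OUT_FRONT_LEFT | CHANNEL_OUT_FRONT_RIGHT;
    static final int CHANNEL_OUT_QUAD =
            CHANNEL_OUT_STEREO | CHANNEL_OUT_BACK_LEFT | CHANNEL_OUT_BACK_RIGHT;

    // The channel count is the number of set bits in the mask.
    static int channelCount(int positionMask) {
        return Integer.bitCount(positionMask);
    }

    // Index of a position's sample within the frame: the number of set bits
    // below it in the mask. Returns -1 if the position is not in the mask.
    static int sampleIndex(int positionMask, int positionBit) {
        if ((positionMask & positionBit) == 0) {
            return -1;
        }
        return Integer.bitCount(positionMask & (positionBit - 1));
    }
}
```

With these values, the front right sample is the second sample of a stereo frame, and the back right sample is the fourth sample of a quad frame.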
Channel index masks
Channel index masks were introduced in API VERSION_CODES.M. They allow the selection of a particular channel from the source or sink endpoint by number, i.e. the first channel, the second channel, and so forth. This avoids problems with artificially assigning positions to channels of an endpoint, or figuring out what the ith position bit is within an endpoint's channel position mask.
Here's an example where channel index masks address this confusion: dealing with a 4 channel USB device. Using a position mask, and based on the channel count, this would be an AudioFormat.CHANNEL_OUT_QUAD device, but really one is only interested in channel 0 through channel 3. The USB device would then have the following individual bit channel masks: AudioFormat.CHANNEL_OUT_FRONT_LEFT, AudioFormat.CHANNEL_OUT_FRONT_RIGHT, AudioFormat.CHANNEL_OUT_BACK_LEFT and AudioFormat.CHANNEL_OUT_BACK_RIGHT. But which is channel 0 and which is channel 3?
For a channel index mask, each channel number is represented as a bit in the mask, from the lsb (channel 0) upwards to the msb; numerically, this bit value is 1 << channelNumber. A set bit indicates that the channel is present in the audio frame; otherwise it is cleared. The order of the bits also corresponds to that channel number's sample order in the audio frame.
For the previous 4 channel USB device example, the device would have a channel index mask 0xF. Suppose we wanted to select only the first and the third channels; this would correspond to a channel index mask 0x5 (the first and third bits set). If an AudioTrack uses this channel index mask, the audio frame would consist of two samples, the first sample of each frame routed to channel 0, and the second sample of each frame routed to channel 2. The canonical channel index masks by channel count are given by the formula (1 << channelCount) - 1.
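The index mask arithmetic from the USB example can be sketched as follows, using the bit value 1 << channelNumber for each channel and (1 << channelCount) - 1 for the canonical mask. IndexMasks and its methods are hypothetical names used only for illustration.

```java
// Hypothetical helper for illustration; not part of any Android API.
final class IndexMasks {
    // Canonical index mask for the first channelCount channels.
    // Only meaningful for channelCount in [0, 31] with a 32 bit int.
    static int canonicalMask(int channelCount) {
        return (1 << channelCount) - 1;
    }

    // Builds an index mask by setting bit 1 << n for each channel number n.
    static int maskOf(int... channelNumbers) {
        int mask = 0;
        for (int n : channelNumbers) {
            mask |= 1 << n;
        }
        return mask;
    }

    // Endpoint channel numbers in the order their samples appear in a frame,
    // i.e. from the lsb upwards.
    static int[] channelsInFrameOrder(int indexMask) {
        int[] channels = new int[Integer.bitCount(indexMask)];
        int next = 0;
        for (int n = 0; n < Integer.SIZE; n++) {
            if ((indexMask & (1 << n)) != 0) {
                channels[next++] = n;
            }
        }
        return channels;
    }
}
```

For mask 0x5, channelsInFrameOrder yields channels 0 and 2, which matches the routing described above: the first sample of each frame goes to channel 0 and the second to channel 2.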
Use cases
- Channel position mask for an endpoint: CHANNEL_OUT_FRONT_LEFT, CHANNEL_OUT_FRONT_CENTER, etc. for HDMI home theater purposes.
- Channel position mask for an audio stream: Creating an AudioTrack to output movie content, where 5.1 multichannel output is to be written.
- Channel index mask for an endpoint: USB devices for which input and output do not correspond to left or right speaker or microphone.
- Channel index mask for an audio stream: An AudioRecord may only want the third and fourth audio channels of the endpoint (i.e. the second channel pair), and not care about the position they correspond to, in which case the channel index mask is 0xC. Multichannel AudioRecord sessions should use channel index masks.
For linear PCM, an audio frame consists of a set of samples captured at the same time, whose count and channel association are given by the channel mask, and whose sample contents are specified by the encoding. For example, a stereo 16 bit PCM frame consists of two 16 bit linear PCM samples, with a frame size of 4 bytes. For compressed audio, an audio frame may alternately refer to an access unit of compressed data bytes that is logically grouped together for decoding and bitstream access (e.g. MediaCodec), or a single byte of compressed data (e.g. AudioTrack.BufferSizeInFrames), or the linear PCM frame result from decoding the compressed data (e.g. AudioTrack.PlaybackHeadPosition), depending on the context where audio frame is used.
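The linear PCM frame size in the stereo 16 bit example can be worked out as below. FrameSize and its methods are hypothetical names for illustration; the calculation applies only to linear PCM, not to the compressed cases described above.

```java
// Hypothetical helper for illustration; not part of any Android API.
final class FrameSize {
    // Size in bytes of one linear PCM frame: one sample per channel.
    static int bytesPerFrame(int channelCount, int bitsPerSample) {
        if (channelCount <= 0 || bitsPerSample <= 0 || bitsPerSample % 8 != 0) {
            throw new IllegalArgumentException("unsupported frame layout");
        }
        return channelCount * (bitsPerSample / 8);
    }

    // Data rate in bytes per second for a linear PCM stream.
    static long bytesPerSecond(int sampleRateHz, int channelCount, int bitsPerSample) {
        return (long) sampleRateHz * bytesPerFrame(channelCount, bitsPerSample);
    }
}
```

A stereo 16 bit frame is 2 * 2 = 4 bytes, so a 48000Hz stereo 16 bit stream carries 192000 bytes per second.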
Assembly: Mono.Android (in Mono.Android.dll)
Assembly Versions: 0.0.0.0
Since: Added in API level 3
The members of Android.Media.AudioFormat are listed below.
See Also: Object