Reggae tutorial: Saving audio in user selected format

From MorphOS Library

Grzegorz Kraszewski

Introduction

Besides automatic media decoding, Reggae also supports automatic encoding. There is one fundamental difference between the two, however. When decoding, the media format, codec parameters, stream parameters and metadata all come from the decoded datastream. When media are encoded, the format, codec parameters and metadata have to be set by the application.

Usually an application wants to offer all available formats to the user. This means the application author has to maintain a GUI for all codecs and their parameters, and update it with every newly released codec. Reggae changes this and makes the application programmer's life easier. The main rule is that multiplexer and encoder classes provide their own GUIs. Reggae gathers those GUIs in a single object and returns it to the application, which can then embed this compound GUI object into its own interface. The user uses this object to select an encoder and its parameters. This is not all, however.

After the user selects an output format and its parameters, Reggae can create an encoder-muxer pair, read parameters and metadata from the GUI, and apply them to the proper objects. The application then connects the data source at the input and an output stream class at the output. After processing is triggered with the MMM_Play() method on the output stream object, audio data is written.

With the Reggae media save API, the application programmer need not care which formats Reggae supports. What is more, even if new codecs are released after the application itself, the application will use them without needing an update.

Preparing Source Data

Source audio data may be either prepared as a static buffer or generated on demand. The static buffer technique is simpler and is described here. Realtime generation may be accomplished by writing a custom Reggae class (see the source code of U1Synth), or by using the datapush.stream class. Source data may also come from Reggae itself, for example from a decoded audio file. This is the case for the audio converter Zormanita, which is also open source.

Data in a buffer should be in one of the Reggae common formats. There are three common formats for audio: INT16, INT32 and FLOAT32. The choice depends on purpose and required processing quality. INT16 is the fastest format and is best suited for playback. FLOAT32 has roughly 24-bit resolution and may be handy for many synthesis algorithms, which are often easier to implement with floating point math. INT32 is the slowest one, but provides the best quality, usually higher than that of further processing stages. All three formats use signed numbers in host native byte order (which is big endian for PowerPC). The floating point range is [−1.0; +1.0]. While it may be temporarily exceeded, any Reggae class is allowed to clip data to this range. If audio is multichannel, channels are interleaved (for stereo the order is L, R).
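The clipping and interleaving rules can be illustrated with a short portable C sketch. It is not Reggae code: the function names are made up for this example, and the scaling factor of 32767 is an assumption borrowed from the sine example in this tutorial, not a documented Reggae conversion rule.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Clip a FLOAT32 sample to [-1.0, +1.0], then scale it to a 16-bit integer.
   The factor 32767 is an assumption of this sketch. */
int16_t float_to_int16(float sample)
{
    if (sample > 1.0f) sample = 1.0f;
    if (sample < -1.0f) sample = -1.0f;
    return (int16_t)(sample * 32767.0f);   /* conversion truncates toward zero */
}

/* Interleave two mono channels into one stereo buffer: L0 R0 L1 R1 ...
   `out` must have room for 2 * frames samples. */
void interleave_stereo(const float *left, const float *right,
                       int16_t *out, size_t frames)
{
    size_t i;

    assert(left != NULL && right != NULL && out != NULL);
    for (i = 0; i < frames; i++) {
        out[2 * i]     = float_to_int16(left[i]);    /* left channel first */
        out[2 * i + 1] = float_to_int16(right[i]);   /* then right channel */
    }
}
```

An out-of-range sample such as 1.5 is simply limited to full scale here, which is what any Reggae class is allowed to do with it.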

The start address of the buffer should be AltiVec aligned. While this is not a requirement for data fetched via a memory.stream object (more about it later), it may speed processing up a bit. The easiest way to get this alignment is to allocate the buffer with the MediaAllocVec() function of multimedia.class. The buffer size, on the other hand, need not be aligned to anything other than the size of a single audio frame (the set of samples of all channels for one point in time). Filling the buffer with data is of course up to the application. In the example code below a buffer is created for 1 second of monophonic audio sampled at 44.1 kHz. It contains 44 100 INT16 audio samples, so the buffer size is 88 200 bytes.

WORD *Buffer;

Buffer = (WORD*)MediaAllocVec(88200);
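The size arithmetic and the alignment requirement can be written down as two small helpers. This is a portable sketch with hypothetical function names, not part of multimedia.class; it assumes that "AltiVec aligned" means a 16-byte boundary, the size of one AltiVec vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes needed for `frames` audio frames. One frame holds one sample
   for every channel, so its size is channels * bytes_per_sample. */
size_t audio_buffer_bytes(size_t frames, size_t channels, size_t bytes_per_sample)
{
    assert(channels > 0 && bytes_per_sample > 0);
    return frames * channels * bytes_per_sample;
}

/* Non-zero when the address is a multiple of 16 bytes
   (assumed AltiVec alignment). */
int is_altivec_aligned(const void *address)
{
    return ((uintptr_t)address % 16u) == 0;
}
```

For the buffer above, audio_buffer_bytes(44100, 1, 2) gives the 88 200 bytes passed to MediaAllocVec(). The same second of audio in stereo INT32 would need 352 800 bytes.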

Then the buffer is filled with sound. As an example, here is how it can be filled with a 1 kHz sine wave.

LONG i;

for (i = 0; i < 44100; i++) Buffer[i] = (WORD)(sin(2000 * M_PI * i / 44100) * 32767.0);

The code is totally unoptimized, but has been left as such to be easy to read.

Creating Format Selection GUI

Creating a MUI object containing a format selector and GUIs for all encoders is very easy. It takes just one call to the MediaGetGuiTagList() function, or to MediaGetGuiTags(), its variadic form used in the example. As the name suggests, a taglist can be used to control the GUI creation. Here is an example of it:

Object *FormatSelector;

FormatSelector = MediaGetGuiTags(
  MGG_Type, MGG_Type_Muxers,
  MGG_Media, MMT_SOUND,
  MGG_Selector, MGG_Selector_List,
  MUIA_Frame, MUIV_Frame_Group,
  MUIA_Background, MUII_GroupBack,
TAG_END);

Let's discuss the tags. The first one, MGG_Type, selects the type of Reggae classes queried for GUI. There are currently two possibilities. MGG_Type_Muxers is used when one encodes and saves a media stream. There is also MGG_Type_Filters, which may be used to create a Reggae filter selector for media processing; it will be covered in other articles. The next tag, MGG_Media, defines the kind of media being saved. When one has audio data it makes no sense to display image or video formats, and the value MMT_SOUND makes sure only audio codecs are shown. The third tag, MGG_Selector, influences the visual appearance of the whole GUI object. Graphical interfaces provided by codecs are placed as pages of a MUI page group, so one needs a gadget for flipping those pages. It may be either a list or a cycle gadget, as shown below:

[Figures: Mediaformat1.png (left), Mediaformat2.png (right)]

The form shown on the left uses a cycle gadget for format selection. This style is more compact. The list selector, on the other hand, has the advantage of showing the choice of formats immediately, while with a cycle gadget the user has to click it to get a popup menu. The cycle selector is recommended only where space for the GUI is limited (for example in custom filerequesters).

The object returned is an instance of a subclass of the MUI Group class. As such it can have a frame and background assigned if needed. In the example code it has the standard group frame and group background (the frame is not shown in the pictures above). The object is usually added to the application's GUI as a value of some MUIA_Group_Child tag. In most cases it is created statically at application initialization, as Zormanita and U1Synth do. It can also be created dynamically, for example as an element of a dynamic window. All typical MUI techniques apply.

Building Reggae Processing Chain

Memory Buffer as Data Source

Data contained in a memory buffer needs some work before it can be used as a Reggae source. Two Reggae objects are needed to handle it. The first is an instance of the memory.stream class. At its output Reggae sees a stream of bytes, without further meaning or structure. Then we have to tell Reggae that it is not just a stream of bytes, but a stream of audio samples with a given type, number of channels, sampling rate and so on. This is done with a rawaudio.filter object.

While this looks unnecessarily complicated, it allows for more freedom. By changing memory.stream to file.stream one can fetch audio data from mass storage, and so is not limited by available memory. It should also be noted that both memory.stream and rawaudio.filter are just wrappers. They do not introduce any processing overhead or unnecessary data copying. A Reggae stream using a buffer in memory is created as follows:

Object *stream;
QUAD stream_length = 88200;

stream = NewObject(NULL, "memory.stream",
  MMA_Stream_Handle, (IPTR)Buffer,
  MMA_Stream_Length, (IPTR)&stream_length,
TAG_END);

As the MMA_Stream_Length attribute takes a QUAD number, it has to be passed via a pointer. The length must be specified for memory.stream, as it has no natural end, unlike for example file.stream. Of course the memory.stream class has to be opened beforehand with OpenLibrary(), as described here.

The next step is rawaudio.filter. Its creation is pretty straightforward:

Object *raw_audio;

raw_audio = NewObject(NULL, "rawaudio.filter",
 MMA_Sound_Channels, 1,
 MMA_Sound_SampleRate, 44100,
TAG_END);

MediaSetPort(raw_audio, 1, MMA_Port_Format, MMFC_AUDIO_INT16);

Note that the audio format is not specified as a tag for the object constructor, but is set on the output port instead. As the last step in this section, we connect these two objects together:

MediaConnectTagList(stream, 0, raw_audio, 0, NULL);

Stream output is now connected to rawaudio.filter input.

Encoder, Muxer and their Setup

This step is automatically performed by Reggae. To set up the encoder and multiplexer duo, Reggae needs two sources of information:

  • GUI object with all selections made by user.
  • Source data stream.

Both are provided as arguments to MediaBuildFromGuiTags() function:

Object *codec;

codec = MediaBuildFromGuiTags(FormatSelector, raw_audio, 1, TAG_END);

The function performs the complex task of setting up a media codec. Let's analyse it step by step:

  • Reggae checks in the GUI which format has been selected by the user. It opens the appropriate multiplexer class and creates its instance.
  • Based on detailed GUI selections, Reggae determines which encoder will cooperate with the multiplexer selected before. Again its class is opened and an object created.
  • The encoder is connected to the data source passed in arguments (object and port). Encoder parameters are set according to the GUI and source data parameters.
  • The multiplexer is connected to the encoder. Again, multiplexer parameters (if any) are set according to the GUI and parameters of the stream.
  • Both objects (plus any auxiliary objects Reggae may create silently) are packed into a single compound object and returned as such.

At the end of this step, we have an almost complete and connected Reggae processing chain for saving audio. The output port of the codec is not connected yet; it is waiting for a writing stream object.

Data Output

The last thing we need is an object writing encoded data. In most cases it is a file.output instance. One creates it with the following code:

Object *writer;

writer = NewObject(NULL, "file.output",
  MMA_StreamName, (IPTR)"somefile",
TAG_END);

In this example the file name is hardcoded, which is of course bad style. For shell based programs the file name comes from commandline arguments; for GUI programs it comes from a filerequester or a MUI Popasl object. The file.output class accepts complete paths, both absolute (starting with a volume name) and relative to the current program directory. After creation, the output object is connected to the codec output:

MediaConnectTagList(codec, 1, writer, 0, NULL);

Now the whole chain is ready to save data. We just need to pull a trigger...

Saving Audio

An object of the file.output class creates a subprocess. All the work related to encoding media and writing it to disk is done by this subprocess, so the application is not blocked while writing. Because of this it may be a good idea to disable the gadgets that trigger media saving for the time of writing.

As saving is asynchronous, there must be a way to notify the application when it is finished. This is done exactly the same way as for audio playback; in fact one even uses the same methods. We can get either a signal or a reply to a message when saving is finished. Waiting for such a signal or message may be added to the MUI event loop.

Notification on end of saving is usually set before starting it. Minimalistic code in a shell based application may look like this:

DoMethod(writer, MMM_SignalAtEnd, (IPTR)FindTask(NULL), SIGBREAKB_CTRL_C);
DoMethod(writer, MMM_Play);
Wait(SIGBREAKF_CTRL_C);

Note that while Wait() takes a signal mask, MMM_SignalAtEnd() takes a signal number. Do not confuse them. Of course such code blocks the application for the duration of saving. A more advanced application may, for example, check progress periodically and display some messages. Zormanita, for instance, updates a progress bar while saving.
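The relation between a signal number and a signal mask is a single shift, as the following portable sketch shows. The two DEMO_ constants repeat the values that AmigaOS-style dos/dos.h headers give for SIGBREAKB_CTRL_C and SIGBREAKF_CTRL_C; they are redefined here under different names only so that the sketch compiles without system headers, and the helper function is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values mirrored from dos/dos.h (an assumption of this sketch):
   the B constant is a bit number, the F constant is the matching mask. */
#define DEMO_SIGBREAKB_CTRL_C 12u
#define DEMO_SIGBREAKF_CTRL_C (UINT32_C(1) << DEMO_SIGBREAKB_CTRL_C)

/* Convert a signal number (0..31) to the mask expected by Wait(). */
uint32_t signal_mask_from_number(unsigned number)
{
    assert(number < 32);
    return UINT32_C(1) << number;
}
```

Passing the mask where the number is expected (or the other way round) compiles without complaint, because both are plain integers. That is why the mistake is easy to make.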

While saving media may be aborted at any time by executing the MMM_Stop() method on the writer object, it is not recommended. Currently it is only safe for the raw format (rawaudio.muxer). AIFF and WAVE multiplexers write headers at start and do not update them when writing is stopped. Aborting the writer will therefore leave saved files incomplete, with header values not matching the data actually stored. This behaviour may or may not be improved in the future. It is then up to the application to delete such broken files (this also includes cases where writing is aborted due to an I/O error).
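An application that wants to detect such a broken file could compare a header size field with the real length. The sketch below does this for the outer RIFF chunk of a WAVE file held in memory, relying only on the general RIFF rule that the size field at offset 4 is little endian and counts everything after the first 8 bytes. Which header fields Reggae multiplexers actually write is not stated in this tutorial, so treat this as a rough plausibility check with hypothetical function names, not as a description of the WAVE multiplexer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read an unsigned 32-bit little endian value. */
uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns 1 when `data` starts with a RIFF header whose size field
   agrees with the actual length, 0 otherwise. */
int riff_size_is_consistent(const uint8_t *data, size_t length)
{
    assert(data != NULL);
    if (length < 12) return 0;                    /* too short for a RIFF header */
    if (memcmp(data, "RIFF", 4) != 0) return 0;   /* not a RIFF file at all */
    return (size_t)read_le32(data + 4) + 8 == length;
}
```

A file cut short by MMM_Stop() would still carry the size written at start, so the comparison fails and the application can delete the file.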

Cleanup

During cleanup the application has to close all classes it opened and dispose all objects it created. There are some exceptions, however. Once the MUI object returned by MediaGetGuiTags() is added to a MUI application, it need not be disposed separately. It will be disposed automatically when the MUI Application object is disposed.

Reggae objects should be disposed in order from output to source:

DisposeObject(writer);
DisposeObject(codec);
DisposeObject(raw_audio);
DisposeObject(stream);

This order is safe even if the writer object is running when being disposed. If the order is not maintained and a running chain is being disposed, it is theoretically possible that Reggae accesses already freed memory.

The Reggae classes used should also be closed, except for the muxer and encoder classes. As these classes are opened automatically by Reggae, they are also closed automatically. In the example, then, only the memory.stream, rawaudio.filter and file.output classes must be closed by the application. To sum up, whatever is explicitly opened or created by the application must be explicitly closed or disposed. Anything created automatically by Reggae is handled by Reggae. Finally, MUI objects are disposed automatically by MUI.

Saver Control Without GUI

While the GUI object provided by Reggae for choosing an encoder suits the needs of most applications, there are cases where some lower level control is needed. Some examples include (but are not limited to):

  • shell commands
  • scripting interface
  • application log/undo
  • applications using fixed output format

This lower level control is implemented as an argument string. The string contains the audio format specification in textual, human readable form, very similar to the arguments of MorphOS shell commands. In fact a Reggae argument string is parsed with the same function as shell arguments, namely ReadArgs() from dos.library. What is more, even when an application builds a Reggae saver chain from the GUI, an argument string is used internally as an intermediate stage.

[Figure: Saver API.png]

How does one build a Reggae saver from an argument string? There is a function in multimedia.class named MediaBuildFromArgsTags(). It works exactly the same as MediaBuildFromGuiTags(); the only difference is that it takes an argument string instead of a GUI object. But what does the argument string look like? Let's look at a simple example:

WAVE BITS=24

Every Reggae saver specification consists of two parts. The first part is the multiplexer class name, without the ".muxer" extension. It is capitalized here, but Reggae argument strings are case insensitive. The second part is the rest of the string (after skipping whitespace following the multiplexer name), so it can contain multiple arguments. In some cases the second part may be empty, as most formats have default values for encoder parameters. For example one can simply use

AIFF

to have data stored as a 16-bit AIFF file. Of course every multiplexer has its own set of parameters, described in its autodoc.
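The two-part structure described above can be sketched in portable C. The function below is hypothetical and is not how Reggae parses the string (Reggae uses ReadArgs() internally). It only separates the multiplexer name from the remaining arguments and, as an assumption based on class names such as rawaudio.muxer, rebuilds a lower-case class name with the ".muxer" extension.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Splits "WAVE BITS=24" into a lower-case class name ("wave.muxer") and a
   pointer to the remaining arguments ("BITS=24"). Returns 0 on success,
   -1 if the specification is empty or the name does not fit the buffer. */
int split_saver_spec(const char *spec, char *class_name, size_t class_size,
                     const char **args)
{
    static const char suffix[] = ".muxer";
    size_t n = 0;
    size_t i;

    assert(spec != NULL && class_name != NULL && args != NULL);

    while (isspace((unsigned char)*spec)) spec++;        /* leading whitespace */
    while (spec[n] != '\0' && !isspace((unsigned char)spec[n])) n++;
    if (n == 0 || n + sizeof(suffix) > class_size) return -1;

    for (i = 0; i < n; i++)                              /* case insensitive */
        class_name[i] = (char)tolower((unsigned char)spec[i]);
    memcpy(class_name + n, suffix, sizeof(suffix));      /* copies the NUL too */

    spec += n;
    while (isspace((unsigned char)*spec)) spec++;        /* gap before arguments */
    *args = spec;                                        /* may be an empty string */
    return 0;
}
```

For "AIFF" the second part comes back as an empty string, which corresponds to using the default encoder parameters.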

Applications that want to log operations done with the GUI, or to create some undo history, can also retrieve the argument string from the GUI object with MediaGetArgsFromGui(). It generates the argument string based on the current states of the gadgets.

Compared to GUI, the textual interface to Reggae savers has two shortcomings:

  • The user (or the application, if the argument string is not given by the user) has to know which multiplexers are installed in the system and has to know the encoder parameters.
  • The GUI is designed in a way that does not allow the user to enter senseless or conflicting combinations of parameters (for example the rawaudio.muxer GUI automatically disables the "Byte Order" gadget when 8-bit samples are selected). For a textual definition, parameter validation is done when the Reggae chain is built, and MediaBuildFromArgsTags() simply fails when the arguments make no sense.

Yet Another Example

The third example application using the Reggae saver API is Zgrzytor. Unlike Zormanita and U1Synth, it is a shell command using the textual saver description explained in the previous section. Zgrzytor is a simple noise generator using a shift register with EXOR feedback. This is a well known technique for generating periodic sounds (with a short register) or pseudorandom noise (with a long register). The user can specify the taps (bits) of the shift register to which the EXOR gate feeding back the register is connected. The sampling rate of the output, the output file name and the output format can also be specified. An example Zgrzytor call may be:

Zgrzytor 3 11 22050 somenoise.wav WAVE BITS=32

This will attach the EXOR gate to bits 3 and 11 of the shift register (bits are counted from the feed side, starting from 1), generate 5 seconds of noise (the time is hardcoded inside the program), set the sampling rate to 22.05 kHz and save the result as 32-bit WAVE (Zgrzytor works with 32-bit integers, so it can produce "real" 32-bit sound). The last part of the call, WAVE BITS=32, is just a Reggae saver argument string. More about Zgrzytor can be found on its homepage. Of course the full source code of Zgrzytor is added to the archive.
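The shift register technique itself fits in a few lines of portable C. The sketch below is not taken from the Zgrzytor source; the function names, the 32-bit register width, the placement of the feed side at the least significant bit and the mapping of the feedback bit to a sample value are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One step of a 32-bit shift register with EXOR feedback. Taps are counted
   from 1 at the feed side, which this sketch places at the least
   significant bit. Returns the bit that was fed back. */
uint32_t lfsr_step(uint32_t *reg, unsigned tap_a, unsigned tap_b)
{
    uint32_t feedback;

    assert(reg != NULL);
    assert(tap_a >= 1 && tap_a <= 32 && tap_b >= 1 && tap_b <= 32);

    feedback = ((*reg >> (tap_a - 1)) ^ (*reg >> (tap_b - 1))) & 1u;
    *reg = (*reg << 1) | feedback;      /* shift away from the feed side */
    return feedback;
}

/* Fill `out` with two-level noise: +amplitude for a fed back 1,
   -amplitude for a fed back 0. The register must start non-zero,
   otherwise it stays at zero forever. */
void lfsr_fill(int32_t *out, size_t count, uint32_t *reg,
               unsigned tap_a, unsigned tap_b, int32_t amplitude)
{
    size_t i;

    assert(out != NULL);
    for (i = 0; i < count; i++)
        out[i] = lfsr_step(reg, tap_a, tap_b) ? amplitude : -amplitude;
}
```

With taps 3 and 4 only the lowest four bits take part in the feedback, and the sequence repeats after 15 steps, which is the "periodic sound" case. Taps placed further apart, such as 3 and 11 in the call above, give a much longer and more noise-like sequence.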