This version (2017/11/17 12:20) was approved by stohrendorf.The Previously approved version (2017/10/06 22:34) is available.

Music and Sound

As it was described in the beginning, all sound in TR engines can be separated into two distinctive sections — audio tracks and sounds.

Audio tracks are long separate files which are loaded by streaming, and usually they contain background ambience, music or voiceovers for cutscenes.

Sounds are short audio samples, which are frequently played on eventual basis. While engine can usually play one audio track at a time, it can play lots of sounds in the same time, and also play numerous copies of the same sound (for example, when two similar enemies are roaming around).

Audio Tracks

Audio tracks can be looped or one-shot. Looped tracks are usually contain background ambience (these creepy sounds heard in the beginning of “Caves” and so on), but occasionally they can use music (e. g., “Jeep theme” from TR4). One-shot tracks are used for musical pieces which are usually triggered on certain event, and also for “voice chatting” (e. g. approaching the monk from “Diving Area” in TR2).

As both looped and one-shot tracks use the same audio playing routine, there’s no chance both looped and one-shot tracks could be played simultaneously. This is the reason why background ambience stops and restarts every time another (one-shot) track is triggered. However, this limitation was lifted in TREP and TRNG.

The audio tracks are stored in different fashions in the various versions of the TR series:

TR1 and TR2

TR1 and TR2 used CD-Audio tracks for in-game music, and therefore, they needed auxiliary CD-audio fed into the soundcard. That’s the reason why most contemporary PCs have issues with audiotrack playback in these games — such setup is no longer supported in modern CD/DVD/BD drives, and digital pipeline is not always giving the same result. Currently, various modernized game repacks (such as Steam or GOG releases) officially features no-CD cracks with embedded MP3 player, which takes place of deprecated CD-Audio player.

In the Macintosh versions, the CD audio tracks are separate files in AIFF format.

TR3

In TR3, we have somewhat special audiotrack setup. Audio format was changed to simple WAV (MS-ADPCM codec), but all tracks were embedded into CDAUDIO.WAD file, which also contained a header with a list of all tracks and their durations. So, when game requests an audiotrack to play, it takes info on needed track from CDAUDIO.WAD header, and then goes straight to an offset for this track into it. The format of CDAUDIO.WAD header entry is:

struct tr3_cdaudio_entry    // 0x10C bytes
{
char Name[260];     // C string with track name
uint32_t WavLength;     // Wave file size
uint32_t WavOffset;     // Absolute offset in CDAUDIO.WAD
};

The number of header entries is always 130 (meaning the whole size of header should be 0x10C * 130). Header is immediately followed by embedded audio files in WAV format.

In the Macintosh version of TR3, these tracks are separate files in WAV format. The Macintosh version of TR3 contains an additional file, CDAudio.db, which contains the names of all the track files as 32-byte zero-padded C strings with no extra contents.

TR4 and TR5

In TR4-5, track format remained the same (MS-ADPCM), but tracks were no longer embedded into CDAUDIO.WAD. Instead, each track was saved as simple .WAV file, and file names themselves were embedded into executable. Hence, when TR4-5 plays an audiotracks, it refers to internal filename table, and then loads an audiotrack with corresponding name.

Sounds

In TR engines, sounds appear in a variety of contexts.

They can be either continuous or triggered. Continuous ones are usually produced by sound source object, which make sound in a range around some specific point (range appears to be hardcoded, and is equal to around 8 sectors). Likewise, triggered ones can be triggered by a variety of events. The triggering can be hardcoded in the engine (for example, gunshots) or by reaching some animation frame (footsteps, Lara’s somewhat unladylike sounds). Flipeffects can also be used to play certain sound sample in specific circumstances, and either called by triggers or from animations.

Sounds may be looped or one shot, with looped ones playing until specifically untriggered by some in-game event. For example, Uzi and M16 sounds are looped and will be played until Lara stops firing these weapons — in such case, engine sends a command to stop this particular sound.

Sounds may be global or local. Global sounds have no 3D positioning in world space, they always have constant volume and usually not linked to any object in level. These are primarily menu sounds. Local sounds are usually produced by entities or sound source objects. They have certain coordinates in space, and could be affected by their position and environment. Local sounds, when emitted by entities, may travel along with these entities.

Sounds are referred to by an internal sound index; this is translated into which sound sample with the help of three layers of indexing, to allow for a suitable degree of abstraction. Internal sound indices for various sounds are consistent across all the level files in a game; a gunshot or a passport opening in one level file will have the same internal sound index as in all the others. The highest level of these is the SoundMap[] array, which translates the internal sound index into an index into SoundDetails[]. Each SoundDetails record contains such details as the sound volume, pitch and volume randomization, looping configuration, how many samples to select from, and an index into SampleIndices[]. This allows for selecting among multiple samples to produce variety; that index is the index to the SampleIndices[] value of first of these, with the rest of them being having the next indices in series of that array. Thus, if the number of samples is 4, then the TR engine looks in SampleIndices[] locations Index, Index+1, Index+2, and Index+3. Finally, the SampleIndices[] array references some arrays of sound samples.

Wave format used in TR1/TR2 and TR3/TR4/TR5 is different. While TR1 and TR2 used 8-bit 11 kHz data, TR3 onwards switched to 16-bit 22 kHz data. However, PlayStation versions of TR1 and TR2 used 16-bit samples as well, which generally made PlayStation version sound quality better, without dithering artifacts.

Additionally, TR4 and TR5 introduced usage of MS-ADPCM codec to store sample data, which was compressed and loaded into buffers on level loading. However, TR4 engine version bundled with TRLE used uncompressed wave data, as in TR1-TR3.

In TR1, TR4 and TR5 samples themselves are embedded in the level files. In TR1, SampleIndices[] array contains the displacements of each sample in bytes from the beginning of that embedded block. In TR4 and TR5, way to access sample data was changed — each sample data is preceded by uncompressed size and compressed size uint32_t values, which are used to extract given sample and load it into DirectSound buffer.

struct tr4_sample // (variable length)
{
uint32_t UncompSize;
uint32_t CompSize;
uint8_t SoundData[CompSize]; // zlib-compressed sound data (CompSize bytes)
};
While compressed size defines the whole size of embedded .WAV file, uncompressed size defines the size of raw PCM data size in 16-bit, 22050 kHz, mono format. However, uncompressed size does not necessarily equal to a value derived from compressed size by wave type conversion, because MS-ADPCM codec tends to leave a bit of silence in the end of the file (which produces audible interruption in looped samples).

In TR2 and TR3, these samples are concatenated in the file MAIN.SFX with no additional information; SampleIndices[] contains sequence numbers (0, 1, 2, 3, …) in MAIN.SFX. Finally, the samples themselves are all in Microsoft WAVE format.

Sound Data Structures

Sound Sources

This structure contains the details of continuous-sound sources. Although a SoundSource object has a position, it has no room membership; the sound seems to propagate omnidirectionally for about 8 horizontal-grid sizes without regard for the presence of walls.

struct tr_sound_source // 16 bytes
{
int32_t x;         // absolute X position of sound source (world coordinates)
int32_t y;         // absolute Y position of sound source (world coordinates)
int32_t z;         // absolute Z position of sound source (world coordinates)
uint16_t SoundID;   // internal sound index
uint16_t Flags;     // 0x40, 0x80, or 0xC0
};

Flags field defines sound source behaviour, if it was placed in room which can be flipped to alternate room:

• 0x40: Play sound only when room is in alternate state.
• 0x80: Play sound only when room is in original state.
• 0xC0: Always play sound (in both original and alternate rooms).

Sound Map

SoundMap is used for mapping from internal-sound index to SoundDetails index; it is 256 int16_t in TR1, 370 int16_t in TR2, TR3 and TR4, and 450 int16_t in TR5. A value of -1 (0xFFFF) indicates “none”, meaning the sample is not used in current level.

Each SoundDetails entry can be described as such:

struct tr_sound_details // 8 bytes
{
uint16_t Sample; // (index into SampleIndices)
uint16_t Volume;
uint16_t Chance; // If !=0 and ((rand()&0x7fff) > Chance), this sound is not played
uint16_t Characteristics;
};

Characteristics is a packed field containing various options for this particular sound detail:

Bit Hex Flag Description
0 0x0003 Looping behaviour either normal playback (value 00), one-shot rewound (01 in TR1/TR2, 10 in other games), meaning the sound will be rewound if triggered again, or looped (value 10 in TR1, 11 in other games), meaning the sound will be looped until strictly stopped by an engine event. Since TR3, one-shot wait mode is introduced (value 10), meaning the same sound will be ignored until current one stops
1
2 0x00FC Number of sound samples in this group If there are more than one samples, then engine will select one to play based on randomizer (for example, listen to Lara footstep sounds)
3
4
5
6
7
8
9
10
11
12 0x1000 No panoramic Disables panoramic calculation of a given sample (always plays at center).
13 0x2000 Randomize pitch When this flag is set, sound pitch will be slightly varied with each playback event
14 0x4000 Randomize gain When this flag is set, sound volume (gain) will be slightly varied with each playback event
15

In TR3 onwards, [tr_sound_details] structure was rearranged:

struct tr3_sound_details  // 8 bytes
{
uint16_t Sample; // (index into SampleIndices)
uint8_t Volume;
uint8_t Range;
uint8_t Chance;
uint8_t Pitch;
int16_t Characteristics;
};

Range now defines radius (in sectors), on which this sound can be heard. Previously (in TR1 and TR2), each sound had a predefined range about 8 sectors.

Pitch specifies absolute pitch volume for this sound (may be also varied by bit 13 of Characteristics). Mainly, this value was used to “speed-up” certain samples, thus allowing to keep high-quality samples with lower sample rate, or on contrary, “slow down” sample, making it sound longer than its native sample rate, thus conserving some memory.

Sample Indices

In TR1, this is an offset value array, each offset of which points into the embedded sound-samples block, which follows this array in the level file. In TR2 and TR3, this is a list of indices into the file ‘MAIN.SFX` file; the indices are the index numbers of that file’s embedded sound samples, rather than the samples’ starting locations. That file itself is a set of concatenated sound files with no catalogue info present.

Sample indices are not used in TR4 and TR5, and this data block is missing from level file, replaced by six zero bytes, which finalize zlib compressed level block followed by embedded sound sample data.