This commit is contained in:
Shadowfacts 2022-05-15 21:18:57 -04:00
parent 8d73d7de4a
commit d9b98519cd
1 changed files with 3 additions and 3 deletions

View File

@ -12,7 +12,7 @@ Armed with the ID3 decoder from my [last post](/2020/parsing-id3-tags/), we can
First, a overview of the anatomy of an MP3 file. As we went through in the last post, an MP3 file may optionally start with an ID3 tag containing metadata about the track. After that comes the meat of the file: a sequence of MP3 frames. Each frame contains a specific amount of encoded audio data (the actual amount is governed by a number of properties encoded in the frame header). The total duration of the file is simply the sum of the durations of the individual frames. Each MP3 frame starts with a marker sequence of 11 1-bits followed by three bytes of flags describing the contents of the frame and then the audio data itself.
Based on this information, many posts on various internet formus suggest inspecting the first frame to find its bitrate, and then dividing the bit size of the entire file by the bitrate to get the total duration. There are a few problems with this though. The biggest is that whole MP3 files can generally be divided into two groups: constant bitrate (CBR) files and variable bitrate (VBR) ones. Bitrate refers to the number of bits used to represent audio data for a certain time interval. As the name suggests, files encoded with a constant bitrate use the same number of bits per second to represent audio data throughout the entire file. For the naive length estimation method, this is okay (though not perfect, because it doesn't account for the frame headers and any potential unused space in between frames). In variable bitrate MP3s though, each frame can have a different bitrate, which allows the encoder to work more space-efficiently (because portions of the audio that are less complex can be encoded at a lower bitrate). Because of this, the naive estimation doesn't work at all (unless, by coincidence, the bitrate of the first frame happens to be close to the average bitrate for the entire file). In order to accurately get the duration for a VBR file, we need to go through every single frame in the file and sum their individual durations. So that's what we're gonna do.
Based on this information, many posts on various internet forums suggest inspecting the first frame to find its bitrate, and then dividing the bit size of the entire file by the bitrate to get the total duration. There are a few problems with this though. The biggest is that whole MP3 files can generally be divided into two groups: constant bitrate (CBR) files and variable bitrate (VBR) ones. Bitrate refers to the number of bits used to represent audio data for a certain time interval. As the name suggests, files encoded with a constant bitrate use the same number of bits per second to represent audio data throughout the entire file. For the naive length estimation method, this is okay (though not perfect, because it doesn't account for the frame headers and any potential unused space in between frames). In variable bitrate MP3s though, each frame can have a different bitrate, which allows the encoder to work more space-efficiently (because portions of the audio that are less complex can be encoded at a lower bitrate). Because of this, the naive estimation doesn't work at all (unless, by coincidence, the bitrate of the first frame happens to be close to the average bitrate for the entire file). In order to accurately get the duration for a VBR file, we need to go through every single frame in the file and sum their individual durations. So that's what we're gonna do.
The overall structure of the MP3 decoder is going to be fairly similar to the ID3 one. We can even take advantage of the existing ID3 decoder to skip over the ID3 tag at the beginning of the file, thereby avoiding any false syncs (the `parse_tag` function needs to be amended to return the remaining binary data after the tag in order to do this). From there, it's simply a matter of scanning through the file looking for the magic sequence of 11 1-bits that mark the frame synchronization point and repeating that until we reach the end of the file.
@ -130,7 +130,7 @@ def parse_frame(...) do
end
```
We call the individual lookup functions for each of the pieces of data we need from the header. Using Elixir's `with` statement lets us pattern match on a bunch of values together. If any of the functions return `:invalid`, the pattern match will fail and it will fall through to the `else` part of thewith statement that matches anything. If that happens, we skip the first byte from the binary and recurse, looking for the next potential sync point.
We call the individual lookup functions for each of the pieces of data we need from the header. Using Elixir's `with` statement lets us pattern match on a bunch of values together. If any of the functions return `:invalid`, the pattern match will fail and it will fall through to the `else` part of the with statement that matches anything. If that happens, we skip the first byte from the binary and recurse, looking for the next potential sync point.
Inside the main body of the with statement, we need to find the number of samples in the frame.
@ -183,7 +183,7 @@ defp lookup_slot_size(:layer1), do: 4
defp lookup_slot_size(l) when l in [:layer2, :layer3], do: 1
```
One thing to note is that we `floor` the size before returning it. All of the division changes the value into a floating point, albeit one for which we know the decimal component will be zero. `floor`ing it turns it into an actual integer (`1` rather than `1.0`) because using a floating point value in a binary battern will cause an error.
One thing to note is that we `floor` the size before returning it. All of the division changes the value into a floating point, albeit one for which we know the decimal component will be zero. `floor`ing it turns it into an actual integer (`1` rather than `1.0`) because using a floating point value in a binary pattern will cause an error.
With that implemented, we can call it in the `parse_frame` implementation to get the number of bytes that we need to skip. We also perform the same calculation to get the frame duration. Then we can skip the requisite number of bytes of data, add the values we calculated to the various accumulators and recurse.