Debugging FlacLibSharp

I maintain a .NET library to read and write FLAC metadata, FlacLibSharp which some people find useful.

There's a bug that I'd like to fix. I thought it could be interesting to document the process of finding and - hopefully - solving the problem.

The Bug

FlacLibSharp allows a user to add a "Vorbis Comment". Which is a "human readable name-value pair". For example:

ARTIST = Kraftwerk
ALBUM = The Man Machine

The Vorbis Comment format allows you to define more than one ARTIST tag:

ARTIST = Kraftwerk
ARTIST = Coldplay

I hadn't realized this until a user pointed out they were unable to do this with the library. So I added this feature.

But during some recent testing, I noticed I was able to write two tags, but when reading them back in I only got 1 of the two values back.

The Process

Step 1: A reproducable case

A bug is reproducable when there is a situation where the bug shows up every time. A bug that is not reproducable is really hard to fix because you don't know if it's fixed or just not showing up in your tests.

It helps to have a simple case that demonstrates the bug reliably. Luckily I have this.

I add a unit test (that will fail) to my test suite:

[TestMethod, TestCategory("VorbisCommentTests")]
public void WritingTwoArtistsShouldResultInTwoArtistsRead()
{
    string origFile = Path.Combine("Data", "testfile5.flac");
    string newFile = Path.Combine("Data", "testfile5_temp.flac");
    FileHelper.GetNewFile(origFile, newFile);

    using (FlacFile file = new FlacFile(Path.Combine("Data", "testfile5_temp.flac")))
    {
        var vorbisComment = new VorbisComment();

        vorbisComment["ARTIST"] = new VorbisCommentValues(new string[] { "Artist A", "Artist B" });

        file.Metadata.Add(vorbisComment);

        file.Save();
    }

    using (FlacFile file = new FlacFile(Path.Combine("Data", "testfile5_temp.flac")))
    {
        Assert.IsNotNull(file.VorbisComment);
        var artistValues = file.VorbisComment["ARTIST"];
        Assert.AreEqual(2, artistValues.Count);
        Assert.AreEqual("Artist A", artistValues[0]);
        Assert.AreEqual("Artist B", artistValues[1]);
    }
}

In this code, testfile5 contains no metadata yet. The test adds the ARTIST vorbis comment and writes it to the file. Then the file is opened again and the test will check if both artists are read.

When running the test it fails with the following error:

Message: Assert.AreEqual failed. Expected:<2>. Actual:<1>. 

A few things can be learned already from this result:

  • The newly created Vorbis Comment metadata block is added to the file (because the file had none to start with)
  • The Vorbis Comment block is found and read
  • Only one of the two tags is read
  • I discovered another bug that I won't discuss further (Issue 46)

Step 2: Excluding possible causes

A big question is: does the problem occur when we're writing the metadata to the file, or is it in the code that reads the metadata?

To answer that, I can open the temp file the test created and have a look at the metadata with another viewer. In Windows 10, I can right click on a flac file and go to "Properties", which will list the ARTIST values.

Interestingly, this only lists 1 Artist. But it could be Windows only supports one ARTIST value. To be sure, I use the "official" tool metaflac to double check the metadata.

Metaflac indeed confirms I only have one artist in the file:

metaflac --list testfile5_temp.flac
METADATA block #0
  type: 0 (STREAMINFO)
  is last: false
  length: 34
  minimum blocksize: 4096 samples
  maximum blocksize: 4096 samples
  minimum framesize: 1427 bytes
  maximum framesize: 7211 bytes
  sample_rate: 44100 Hz
  channels: 1
  bits-per-sample: 16
  total samples: 1703592
  MD5 signature: 1d2e54a059ea776787ef66f1f93d3e34
METADATA block #1
  type: 4 (VORBIS_COMMENT)
  is last: true
  length: 46
  vendor string:
  comments: 1
	comment[0]: ARTIST=Artist A

Now I know I'll have to focus on the code that writes the metadata.

Step 3: Finding the code

The code is organized so that each type of metadata block (like "VORBIS_COMMENT") has it's own class. All of those classes inherit from the base class MetadataBlock. Each metadata class must override the WriteBlockData method:

/// <summary>
/// When overridden in a derived class, will write the data describing this metadata block to the given stream.
/// </summary>
/// <param name="targetStream">Stream to write the data to.</param>
public abstract void WriteBlockData(Stream targetStream);

So I'm pretty sure now the problem is somewhere in the WriteBlockData method of the VorbisComment class.

The code that writes the tags looks as follows:

foreach (var comment in this.comments)
{
    foreach (var value in comment.Value)
    {
        string commentText = string.Format("{0}={1}", comment.Key, value);
        byte[] commentData = System.Text.Encoding.UTF8.GetBytes(commentText);
        number = BinaryDataHelper.GetBytesUInt32((uint)commentData.Length);
        targetStream.Write(BinaryDataHelper.SwitchEndianness(number, 0, 4), 0, 4);
        targetStream.Write(commentData, 0, commentData.Length);
        totalLength += 4 + (uint)commentData.Length;
    }
}

When just reading it, it looks ok to me. For each comment, it writes each value to the output stream.

Step 4: Stepping through the code

Visual Studio comes with a step-by-step debugger. It can put a breakpoint in the comments loop and step through the loop.

To run the code, I can run the unit test in "Debug" mode: "Debug Test".

I can see in the code two values being written, in this order:

 ARTIST=Artist A
 ARTIST=Artist B

Also, the "total length" is updated twice by 15 (length of each tag) + 4 (length of the number indicating the length of the tag).

the "total length" before the loop was 8, after the loop it was 46. Which is indeed 8 + 15*2 + 4*2 = 8 + 30 + 8 = 46.

Interestingly, this is also the length indicated by the metaflac output:

METADATA block #1
  type: 4 (VORBIS_COMMENT)
  is last: true
  length: 46
  vendor string:
  comments: 1
	comment[0]: ARTIST=Artist A

Now I'm thinking of two possible causes:

  • Option A: FLAC doesn't support more than one tag with the same key
  • Option B: I somehow never write the second ARTIST value to the file

Step 5: Excluding Option A

Does FLAC not support multiple tags?

On the flac website describing the format we can read the following:

VORBIS_COMMENT: This block is for storing a list of human-readable name/value pairs. Values are encoded using UTF-8. It is an implementation of the Vorbis comment specification (without the framing bit). This is the only officially supported tagging mechanism in FLAC. There may be only one VORBIS_COMMENT block in a stream. In some external documentation, Vorbis comments are called FLAC tags to lessen confusion. - xiph.org

Nothing special is being said about tags, and it is stated they use the Vorbis Comment specification, but without the framing bit.

Looking at the Vorbis Comment specification, I read the following:

Field names are not required to be unique (occur once) within a comment header. As an example, assume a track was recorded by three well know artists; the following is permissible, and encouraged:
ARTIST=Dizzy Gillespie
ARTIST=Sonny Rollins
ARTIST=Sonny Stitt

Given that this practice is even "encouraged", I think it's unlikely FLAC doesn't support it. But just to be sure, I try and add two artists using metaflac:

metaflac --set-tag="ARTIST=Artist A" --set-tag="ARTIST=Artist B" testfile5_temp.flac

The metadata then looks like this:

METADATA block #1
  type: 4 (VORBIS_COMMENT)
  is last: true
  length: 78
  vendor string: reference libFLAC 1.3.2 20170101
  comments: 2
    comment[0]: ARTIST=Artist A
    comment[1]: ARTIST=Artist B

So we can exclude the possibility that Flac doesn't support multiple tags, because we see that it works fine.

Note the length of 78 instead of 46. This is because the vendor string "reference libFLAC 1.3.2 20170101" is 32 bytes long. FlacLibSharp leaves it empty.

Step 6: Excluding Option B

Option B: I somehow never write the second ARTIST value to the file

This also seems strange, because I can see the code that writes the data run live. To investigate this, I can open the flac file in a text editor. I could use a "hex editor", to look at binary data, but in the case of FLAC metadata, the Vorbis Comments are text anyway. So opening it in a text editor should be sufficient.

When opening the file that was written to with metaflac, I can see the following:

With the file when it's written to by FlacLibSharp it looks like this:

Indeed very similar. But we can see I do in fact write that second tag to the file.

At this moment, I think the problem may be that I am not writing the tags correctly according to the Vorbis Comment format: Option C.

Step 7: Excluding Option C

Option C: I am not writing the tags correctly according to the Vorbis Comment format.

To investigate, I reconsider this loop:

foreach (var comment in this.comments)
{
    foreach (var value in comment.Value)
    {
        string commentText = string.Format("{0}={1}", comment.Key, value);
        byte[] commentData = System.Text.Encoding.UTF8.GetBytes(commentText);
        number = BinaryDataHelper.GetBytesUInt32((uint)commentData.Length);
        targetStream.Write(BinaryDataHelper.SwitchEndianness(number, 0, 4), 0, 4);
        targetStream.Write(commentData, 0, commentData.Length);
        totalLength += 4 + (uint)commentData.Length;
    }
}

I'm pretty sure everything works when I add two different tags, instead of twice the same tag. For example ARTIST and TITLE.

If we look at this loop, the resulting data for this situation should be the same. The code in the second foreach is run for each tag value, but the outer foreach has no other code.

To test, I change the test code to add two comments, each with one value:

vorbisComment["TITLE_"] = new VorbisCommentValues("Artist A");
vorbisComment["ARTIST"] = new VorbisCommentValues(new string[] { "Artist B" });

I add TITLE_ with an additional underscore because then everything remains the same length. In theory, this should be the same as having two values for ARTIST, except the first value has a different key now.

Metaflac reflects this correctly:

METADATA block #1
  type: 4 (VORBIS_COMMENT)
  is last: true
  length: 46
  vendor string:
  comments: 2
    comment[0]: TITLE_=Artist A
    comment[1]: ARTIST=Artist B

In the text editor, it looks like this:

The two pieces of binary content look the same. I can go a step further and change the "TITLE_" in the file to "ARTIST", directly in the text editor. And sure enough I then see the following metadata:

METADATA block #1
  type: 4 (VORBIS_COMMENT)
  is last: true
  length: 46
  vendor string:
  comments: 2
    comment[0]: ARTIST=Artist A
    comment[1]: ARTIST=Artist B

The binary data looks the same, almost the same. I noticed one subtle different, highlighted below:

! C:\Users\user\Documents\Projects\Blog\FlacLibSharpDebug\2018-04-04 01_32_14-FlacLibSharp.Test.Core_b.png !

The first image is the one that doesn't work, the second is the one that works.

A byte is SOH when it should be STX. To know what that character actually is, I use a plugin I have in Notepad++ called "Convert" which allows me to convert a selected character "to hex" or "to ascii". When converting the STX to hex, the value is 02.

Converting hex 01 to ascii however, results in SOH.

It looks like this by indicates how many tags should be expected: the correct answer is 2.

Step 8: Localizing the bug

I can now conclude that my issue is the following: when I have a single tag with two values, FlacLibSharp incorrectly writes to the flac file that there is "1 comment", when in fact there are 2.

In FlacLibSharp, you give multiple values to a single key, but in FLAC that translates to 2 comments that happen to have the same key.

So where I now write how many vorbis comments there are, I have to write how many values there are in total, over all comments.

The code looks like this, currently:

number = BinaryDataHelper.GetBytesUInt32((uint)this.comments.Count);
targetStream.Write(BinaryDataHelper.SwitchEndianness(number, 0, 4), 0, 4);

Here we can see I just count the amount of comments, forgetting the fact that a single comment with 10 values actually results in 10 comments in the FLAC format.

Step 9: Solving the bug

Now that I know the problem, and I know the code that has the problem, solving the actual bug seems less intimidating.

I already have a test that will see if my fix does the trick.

What I can do is count the total number of values in all comments combined. I'm chosing to do this with Linq since I think the performance impact will be small and looks ok:

var totalValueCount = this.comments.Sum(comment => comment.Value.Count);
number = BinaryDataHelper.GetBytesUInt32((uint)totalValueCount);

For each comment, I count the number of values and keep a sum of all those counts.

This indeed makes the test pass and also metaflac shows two artists now. However, when rebuilding the project itself, I get an error.

I forgot for a second that I also still target .NET Framework 2.0, for some reason. This doesn't support Linq. I should probably move away from supporting that, but I won't do it now. So I'll have to rewrite the code again:

var totalValueCount = 0;
foreach(var comment in this.comments)
{
    foreach (var value in comment.Value)
    {
        totalValueCount++;
    }
}
number = BinaryDataHelper.GetBytesUInt32((uint)totalValueCount);

This is a little more verbose. Even better would be to count while we are writing values, but then the code would get more complicated because we'd have to navigate back in the stream to update the "total comments" value.

Now everything builds and all other tests still pass.

Conclusion

Sometimes you immediatly know where a bug is, but sometimes you have to eliminate options until you narrow it down to the actual cause of the issue.

This is something I can really enjoy, but I rarely take the time to document in such detail all the steps I do during a debugging session.

Also, the above illustrates that even though a bugfix is just a few lines of code the hardest part is finding what is actually causing the problem.

A reproducable case helps a lot. If you can put the case into a Unit Test, that's even better. This will prevent the bug from re-appearing later when other changes are being made.

Unit tests can help you be more confident that a bugfix doesn't introduce "regression" issues: you break something unrelated that used to work.

But Unit Tests don't always cover everything: some time ago I added this support for multiple tags, but no unit test existed to test for this super important use-case. During the debug session, I also uncovered another bug, but that one is less severe.

Knowing what tools to use can help a lot. In my case I got away with using Notepad++ and a plugin there, and the metaflac tool. But every bug is different and may require other tools.