Get the original length from a Base64 string

A question was posed today: can I get the exact length in bytes of the input data, if I have a Base64 string?

The answer is yes.

Summary (TL;DR)

Base64 encodes three bytes to four characters. Sometimes, padding is added in the form of one or two '=' characters.

To get the length in bytes of the original data, we use the formula:

(3 * (LengthInCharacters / 4)) - (numberOfPaddingCharacters)

Here's a function in C#:

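using System;
using System.Linq;

// Assumes standard, padded Base64 without whitespace or line breaks.
public int GetOriginalLengthInBytes(string base64string)
{
    if (string.IsNullOrEmpty(base64string)) { return 0; }

    var characterCount = base64string.Length;

    // Padding can only appear in the last two characters. Skip() is used
    // instead of Substring() so a string shorter than two characters does not throw.
    var paddingCount = base64string.Skip(Math.Max(0, characterCount - 2))
                                   .Count(c => c == '=');
    return (3 * (characterCount / 4)) - paddingCount;
}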

Base64

Base64 is a way to encode binary data as ASCII text. The goal is to make the data easier to transfer. Putting raw binary data inside an XML file, for example, can break the file, because arbitrary bytes are not always valid text. A Base64 string behaves just like normal text.

A Base64 string looks like this:

TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGludWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRoZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=

Length of data

To know the length of the original data, we must understand how Base64 works.

The input is a set of bytes: [10, 12, 13, 14]

The output is an ASCII string.
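As a quick illustration, the sample input above can be run through .NET's built-in Convert.ToBase64String. This is only a minimal sketch; the class name EncodeSample and its Encode helper are made up for this example.

```csharp
using System;

public static class EncodeSample
{
    // Hypothetical helper: wraps the framework's built-in Base64 encoder.
    public static string Encode(byte[] data) => Convert.ToBase64String(data);

    public static void Main()
    {
        // The sample input from the text.
        var input = new byte[] { 10, 12, 13, 14 };
        Console.WriteLine(Encode(input)); // prints "CgwNDg=="
    }
}
```

The four input bytes become eight characters, two of which are '=' padding.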

Base64 uses 4 ASCII characters to encode 24 bits (3 bytes) of data.

To encode, it splits the three bytes into four 6-bit numbers. A 6-bit number can represent 64 possible values.

Each possible value corresponds to an ASCII character. For example: 0 = 'A'; 1 = 'B'; and so on ...
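The splitting step can be sketched by hand. The code below is an illustration rather than how any particular library implements it: it packs three bytes into one 24-bit integer, cuts that into four 6-bit values, and looks each one up in the standard Base64 alphabet. The names SixBitSplit, Split and EncodeGroup are invented for this example.

```csharp
using System;

public static class SixBitSplit
{
    // The standard Base64 alphabet: index 0 is 'A', index 63 is '/'.
    private const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Splits exactly three bytes into four 6-bit values (each 0..63).
    public static int[] Split(byte b0, byte b1, byte b2)
    {
        int bits = (b0 << 16) | (b1 << 8) | b2; // 24 bits in one integer
        return new[]
        {
            (bits >> 18) & 0x3F,
            (bits >> 12) & 0x3F,
            (bits >> 6) & 0x3F,
            bits & 0x3F
        };
    }

    // Maps the four 6-bit values to their alphabet characters.
    public static string EncodeGroup(byte b0, byte b1, byte b2)
    {
        var values = Split(b0, b1, b2);
        var chars = new char[4];
        for (int i = 0; i < 4; i++) { chars[i] = Alphabet[values[i]]; }
        return new string(chars);
    }

    public static void Main()
    {
        // 'M' = 77, 'a' = 97, 'n' = 110
        Console.WriteLine(string.Join(", ", Split(77, 97, 110))); // 19, 22, 5, 46
        Console.WriteLine(EncodeGroup(77, 97, 110));              // TWFu
    }
}
```

Three bytes go in and exactly four characters come out, which is the 3-to-4 ratio the length calculation relies on.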

So four characters of Base64 represent 3 bytes of data.

To calculate the length of the original data, you must count how many sets of 4 characters are in the Base64 string. For each of those sets, you know you have 3 bytes of original data.

3 * (LengthInCharacters / 4) = length in bytes

Padding

This is almost correct, except: what if the length of the data isn't a multiple of 3? Then the last few bytes cannot fill a complete set of four 6-bit numbers.

In that case you need to add padding. Base64 adds padding using the '=' character.

The final group of input bytes can be in one of three cases:

  • It has three bytes: no padding
  • It has two bytes: three characters carry data, and the fourth is set to '='
  • It has one byte: two characters carry data, and the third and fourth are both set to '='
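The three cases can be seen by encoding inputs of three, two and one byte with .NET's Convert.ToBase64String. This is a small sketch; the class name PaddingCases and the Encode helper are assumptions made for the example, and the inputs are plain ASCII text so the bytes are easy to read.

```csharp
using System;
using System.Text;

public static class PaddingCases
{
    // Hypothetical helper: encodes the ASCII bytes of a string as Base64.
    public static string Encode(string text) =>
        Convert.ToBase64String(Encoding.ASCII.GetBytes(text));

    public static void Main()
    {
        Console.WriteLine(Encode("Man")); // TWFu  (3 bytes, no padding)
        Console.WriteLine(Encode("Ma"));  // TWE=  (2 bytes, one '=')
        Console.WriteLine(Encode("M"));   // TQ==  (1 byte, two '=')
    }
}
```

In every case the output is four characters long, so the number of '=' characters is the only thing that tells the three cases apart.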

So now our formula is:

(3 * (LengthInCharacters / 4)) - (numberOfPaddingCharacters) = length in bytes
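One way to gain confidence in the formula is to compare it with an actual decode for many input lengths. The sketch below does that with Convert.ToBase64String and Convert.FromBase64String. The names LengthCheck, OriginalLength and AgreesWithDecoder are invented for this example, and it assumes standard, padded Base64 without whitespace or line breaks.

```csharp
using System;

public static class LengthCheck
{
    // Applies the formula from the text. Assumes standard, padded Base64
    // with no whitespace or line breaks.
    public static int OriginalLength(string base64)
    {
        if (string.IsNullOrEmpty(base64)) { return 0; }

        int padding = 0;
        if (base64.EndsWith("==", StringComparison.Ordinal)) { padding = 2; }
        else if (base64.EndsWith("=", StringComparison.Ordinal)) { padding = 1; }

        return 3 * (base64.Length / 4) - padding;
    }

    // Returns true when the formula agrees with an actual decode
    // for every input length from 0 to maxLength bytes.
    public static bool AgreesWithDecoder(int maxLength)
    {
        for (int n = 0; n <= maxLength; n++)
        {
            string encoded = Convert.ToBase64String(new byte[n]);
            if (OriginalLength(encoded) != Convert.FromBase64String(encoded).Length)
            {
                return false;
            }
        }
        return true;
    }

    public static void Main()
    {
        Console.WriteLine(AgreesWithDecoder(100)); // True
    }
}
```

The benefit of the formula over decoding is that it only looks at the string length and its last two characters, so it does not need to allocate the decoded byte array.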