Get the original length from a Base64 string
A question was posed today: can I get the exact length in bytes of the original data if I only have the Base64 string?
The answer is yes.
Summary (TL;DR)
Base64 encodes every three bytes as four characters. When the input length is not a multiple of three, one or two '=' padding characters are added at the end.
To get the length in bytes, we use the formula (the division is an integer division):
(3 * (LengthInCharacters / 4)) - (numberOfPaddingCharacters)
Here's a function in C#:
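// Requires: using System;
// Assumes standard, padded Base64 without whitespace or line breaks.
public int GetOriginalLengthInBytes(string base64string)
{
    if (string.IsNullOrEmpty(base64string)) { return 0; }

    var characterCount = base64string.Length;

    // Padding is at most two '=' characters, and only at the very end.
    // EndsWith is safe for strings shorter than two characters, unlike Substring.
    var paddingCount = 0;
    if (base64string.EndsWith("==", StringComparison.Ordinal)) { paddingCount = 2; }
    else if (base64string.EndsWith("=", StringComparison.Ordinal)) { paddingCount = 1; }

    return (3 * (characterCount / 4)) - paddingCount;
}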
Base64
Base64 is a way to encode binary data as ASCII text. The goal is to make the data easier to transfer. Raw binary data inside an XML file, for example, can break the file, but a Base64 string behaves just like normal text.
A Base64 string looks like this:
TWFuIGlzIGRpc3Rpbmd1aXNoZWQsIG5vdCBvbmx5IGJ5IGhpcyByZWFzb24sIGJ1dCBieSB0aGlzIHNpbmd1bGFyIHBhc3Npb24gZnJvbSBvdGhlciBhbmltYWxzLCB3aGljaCBpcyBhIGx1c3Qgb2YgdGhlIG1pbmQsIHRoYXQgYnkgYSBwZXJzZXZlcmFuY2Ugb2YgZGVsaWdodCBpbiB0aGUgY29udGludWVkIGFuZCBpbmRlZmF0aWdhYmxlIGdlbmVyYXRpb24gb2Yga25vd2xlZGdlLCBleGNlZWRzIHRoZSBzaG9ydCB2ZWhlbWVuY2Ugb2YgYW55IGNhcm5hbCBwbGVhc3VyZS4=
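To make the round trip concrete, here is a minimal sketch that uses the framework's own Convert.ToBase64String and Convert.FromBase64String. The class name Base64Demo and its two helper methods are made up for this illustration.

```csharp
using System;
using System.Text;

public static class Base64Demo
{
    // Encodes ASCII text to Base64 with the framework's built-in encoder.
    public static string EncodeAscii(string text)
    {
        byte[] bytes = Encoding.ASCII.GetBytes(text);
        return Convert.ToBase64String(bytes);
    }

    // Decodes a Base64 string back to the original bytes.
    public static byte[] Decode(string base64)
    {
        return Convert.FromBase64String(base64);
    }
}
```

For example, Base64Demo.EncodeAscii("Man") returns "TWFu", which is also how the sample string above begins, and decoding "TWFu" gives back three bytes.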
Length of data
To know the length of the original data, we must understand how Base64 works.
The input is a sequence of bytes: [10, 12, 13, 14]
The output is an ASCII string.
Base64 uses 4 ASCII characters to encode 24 bits (3 bytes) of data.
To encode, it splits the three bytes into four 6-bit numbers. A 6-bit number can represent 64 possible values.
Each possible value corresponds to an ASCII character. For example: 0 = 'A'; 1 = 'B'; and so on.
So four characters of Base64 represent 3 bytes of data.
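The splitting step can be sketched by hand. This is an illustration only, not how the framework implements it; the class SixBitSplitter and the method EncodeTriple are hypothetical names, and the alphabet is the standard Base64 one ('A'-'Z', 'a'-'z', '0'-'9', '+', '/').

```csharp
using System;

public static class SixBitSplitter
{
    // The standard Base64 alphabet: index 0 is 'A', index 1 is 'B', ..., index 63 is '/'.
    private const string Alphabet =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

    // Encodes exactly three bytes as four Base64 characters.
    public static string EncodeTriple(byte b0, byte b1, byte b2)
    {
        // Pack the three bytes into one 24-bit number.
        int bits = (b0 << 16) | (b1 << 8) | b2;

        // Slice it into four 6-bit numbers, most significant first.
        int n0 = (bits >> 18) & 0x3F;
        int n1 = (bits >> 12) & 0x3F;
        int n2 = (bits >> 6) & 0x3F;
        int n3 = bits & 0x3F;

        // Look up the character for each 6-bit number.
        return new string(new[] { Alphabet[n0], Alphabet[n1], Alphabet[n2], Alphabet[n3] });
    }
}
```

With the bytes of "Man" (0x4D, 0x61, 0x6E), the four 6-bit numbers are 19, 22, 5 and 46, which map to 'T', 'W', 'F' and 'u'.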
To calculate the length of the original data, you count how many sets of 4 characters are in the Base64 string. For each of those sets, you know you have 3 bytes of original data:
3 * (LengthInCharacters / 4) = length in bytes
Padding
This is almost correct, but what if the length of the data is not a multiple of 3? Then the last group of bytes does not fill a complete set of four 6-bit numbers.
In that case you need to add padding. Base64 adds padding using the '=' character.
The last group of bytes can be in one of three cases:
- You have three bytes: no padding
- You have two bytes: you use three characters, and the fourth is set to '='
- You have one byte: you use two characters, and the third and fourth are both set to '='
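The three cases can be seen by encoding one, two, and three bytes with the framework encoder. This is a small sketch; the class PaddingDemo and the method EncodePrefix are names invented for the example.

```csharp
using System;
using System.Text;

public static class PaddingDemo
{
    // Returns the Base64 encoding of the first `count` bytes of "Man" (count from 0 to 3).
    public static string EncodePrefix(int count)
    {
        byte[] bytes = Encoding.ASCII.GetBytes("Man".Substring(0, count));
        return Convert.ToBase64String(bytes);
    }
}
```

EncodePrefix(3) returns "TWFu" (no padding), EncodePrefix(2) returns "TWE=" (one padding character), and EncodePrefix(1) returns "TQ==" (two padding characters).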
So now our formula is:
(3 * (LengthInCharacters / 4)) - (numberOfPaddingCharacters) = length in bytes
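One way to gain confidence in the formula is to compare it with the length the framework decoder actually produces. The sketch below assumes standard, padded Base64 without whitespace or line breaks; LengthFormulaCheck, OriginalLength and MatchesDecoder are hypothetical names used only here.

```csharp
using System;

public static class LengthFormulaCheck
{
    // Applies the formula: 3 * (characters / 4) - padding characters.
    // Assumes standard, padded Base64 with no whitespace or line breaks.
    public static int OriginalLength(string base64)
    {
        if (string.IsNullOrEmpty(base64)) { return 0; }

        int padding = 0;
        if (base64.EndsWith("==", StringComparison.Ordinal)) { padding = 2; }
        else if (base64.EndsWith("=", StringComparison.Ordinal)) { padding = 1; }

        return 3 * (base64.Length / 4) - padding;
    }

    // Returns true when the formula agrees with the decoder
    // for every input length from 0 up to and including maxLength.
    public static bool MatchesDecoder(int maxLength)
    {
        for (int length = 0; length <= maxLength; length++)
        {
            // The byte values do not matter here, only how many there are.
            string encoded = Convert.ToBase64String(new byte[length]);
            int decodedLength = Convert.FromBase64String(encoded).Length;
            if (OriginalLength(encoded) != decodedLength) { return false; }
        }
        return true;
    }
}
```

Calling LengthFormulaCheck.MatchesDecoder(100) checks the formula for every input length from 0 to 100 bytes. Strings that omit the padding, or that contain line breaks, fall outside the assumption and would need to be normalized first.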