Decoding base64 in Python

This is a small tutorial for beginners on how to decode base64 text strings in Python3. While Python does have a function to directly encode and decode base64, it is always good practice to try and write one yourself if you are a new programmer.

Head over to Wikipedia to see how base64 is decoded. First, each character in the encoded string is assigned a number according to the base64 table. This number is then translated to 6 bit binary.

You can do this in Python by including the base64 table as a dictionary, and iterating through all characters in the encoded string, like this:

for char in string:
    if char in index.keys():
        bin_string += 
            "{0:06b}".format(index.get(char))

The last line formats the decimal value to 6-bit binary ({0:06b}). As the highest value in the base64 table is 63 (for the / character), 6 bits are exactly enough to hold it and no information will be lost (0b111111 = 63).

The above loop creates one long binary string (bin_string) which should now be partitioned in bytes (8 bits) and converted to ASCII. This can be accomplished with the following loop:

while len(bin_string) >= 8:
    byte = bin_string[1:8]
    char = chr(int(byte, 2))
    output += char
    bin_string = bin_string[8:]

This code iterates through the binary string until it’s less than 8 bits long (which should be the end), takes the first byte and converts it to ASCII (actually, Unicode) using the chr function. This could also be done by including an ASCII table and using that to convert, just like we did in the beginning with the base64 table. I’ll leave that as an exercise.
Finally, it removes the first byte from the binary string which we just converted, and the loop continues to the next byte.

If you use print(output) now, it should display the decoded string.

To make the script a little more intuitive you can incorporate the argparse module which allows the coded string to be included as an argument while running the script, like this:

python base64_decode.py --TWFu

Where ‘TWFu’ is an encoded string.

The code for argparse is as follows:

import argparse

parser = argparse.ArgumentParser(Description="")
parser.add_argument('--string', dest='string',     
    required=True)
args = parser.parse_args()

The string to decode then becomes args.string when called in the subsequent code.

Happy coding!

The full script can be found at github.

Leave a Reply

Your email address will not be published.