Intel's Ronler Acres Plant

Silicon Forest
If the type is too small, Ctrl+ is your friend

Sunday, October 26, 2008

Sarah Base64

A couple of days ago I was talking about my Aunt Lucy and base64 encoding. I started looking into this, and what I wanted to know was the nuts and bolts (or bits and bytes) of how this encoding is done. Found many items purporting to explain all, but you know, I don't really want a dissertation. Finally found the source code for a program that will do encoding or decoding. Was able to compile it on my Windows system, ran it against the data from the file attached to my Aunt's email, and lo and behold. Maybe being a flaming liberal runs in the family.

Now that I know the program works, let's see what it's doing. Hoo boy, what a lot of complicated bull puckey. Cut all that out and what we are left with is a program very similar to uuencode in that it:
  • takes (3) 8-bit bytes from the file to be encoded,
  • cuts them into (4) 6-bit values,
  • uses some magic rule to generate a printable character for each of those 6-bit values,
  • and writes them to the encoded file.
The difference between base64 and uuencode is in the magic rule. What it finally comes down to is this:
"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
uuencode uses a different set of characters. Decoding is the inverse. Take a letter and find its position in the string. That position will be a number from 0..63, which can be represented in 6 bits. Decode four letters, pack them together in 24 bits, and then cut them into (3) 8-bit bytes.
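
To make the bit shuffling concrete, here is a minimal sketch in Python of one encode step and one decode step. It is not the program linked below, just an illustration of the rule described above. The function names encode_group and decode_group are made up for this example, and it only handles complete groups; real base64 also pads a short final group with "=" characters, which this sketch leaves out.

```python
# The "magic rule": position in this string <-> 6-bit value.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"

def encode_group(three_bytes):
    """Turn exactly 3 bytes into 4 base64 characters (hypothetical helper)."""
    b0, b1, b2 = three_bytes
    bits = (b0 << 16) | (b1 << 8) | b2            # pack 3 bytes into 24 bits
    # Cut the 24 bits into four 6-bit values, most significant first.
    return "".join(ALPHABET[(bits >> shift) & 0x3F] for shift in (18, 12, 6, 0))

def decode_group(four_chars):
    """Turn exactly 4 base64 characters back into 3 bytes (hypothetical helper)."""
    bits = 0
    for ch in four_chars:
        bits = (bits << 6) | ALPHABET.index(ch)   # position in string = 6-bit value
    # Cut the 24 bits back into three 8-bit bytes.
    return bytes([(bits >> 16) & 0xFF, (bits >> 8) & 0xFF, bits & 0xFF])

print(encode_group(b"\xff\xd8\xff"))   # /9j/
print(decode_group("/9j/").hex())      # ffd8ff
```

The bytes FF D8 FF are how a JPEG file starts, which is why an encoded JPEG begins with "/9j/".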

Source code here. (The link now points to GitHub.)

Because I am trying to learn something about Linux, I copied it to my Linux system. It compiled and ran fine, but I did learn a few things.
The file I received from my Aunt Lucy was full of HTML and email gobbledygook, until you scroll down a couple of pages and come to a point where the "readable" text stops and you start getting line after line of real gobbledygook. It continues on in this vein for another thousand lines or so. This block of mystery text is the encoded image. Cut off everything before it and the few lines of garbage at the end, and you have an encoded image that you can feed to base64.

--0-1766255347-1224876755=:38851
Content-Type: image/jpeg; name="image.jpg"
Content-Transfer-Encoding: base64
Content-Id: 2165402007

/9j/4AAQSkZJRgABAQAAAQABAAD/2wBDAAUDBAQEAwUEBAQFBQUGBwwIBwcHBw8LCwkMEQ8SEhEP
ERETFhwXExQaFRERGCEYGh0dHx8fExciJCIeJBweHx7/2wBDAQUFBQcGBw4ICA4eFBEUHh4eHh4e
Hh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh4eHh7/wAARCAINAaQDASIA
AhEBAxEB/8QAHwAAAQUBAQEBAQEAAAAAAAAAAAECAwQFBgcICQoL/8QAtRAAAgEDAwIEAwUFBAQA
AAF9AQIDAAQRBRIhMUEGE1FhByJxFDKBkaEII0KxwRVS0fAkM2JyggkKFhcYGRolJicoKSo0NTY3
ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqDhIWGh4iJipKTlJWWl5iZmqKjpKWm
p6ipqrKztLW2t7i5usLDxMXGx8jJytLT1NXW19jZ2uHi4+Tl5ufo6erx8vP09fb3+Pn6/8QAHwEA
AwEBAQEBAQEBAQAAAAAAAAECAwQFBgcICQoL/8QAtREAAgECBAQDBAcFBAQAAQJ3AAECAxEEBSEx
BhJBUQdhcRMiMoEIFEKRobHBCSMzUvAVYnLRChYkNOEl8RcYGRomJygpKjU2Nzg5OkNERUZHSElK
U1RVVldYWVpjZGVmZ2hpanN0dXZ3eHl6goOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3
uLm6wsPExcbHyMnK0tPU1dbX2Nna4uPk5ebn6Onq8vP09fb3+Pn6/9oADAMBAAIRAxEAPwDABPXH
UemKTcQc96kYH1BGPWoyvOcn25r7tnjk6H5Rk0u45xTBkAcdhS496oB+73zmlB74GP5UzkfKB1NI
AeCeeaCSTI7c+/pRkDBpuSO9BHcc0AKcH8aTaPpSEc5waM0AKvAGSOfenZA6ce9NBz0NGM//AK6A
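
I trimmed the file by hand, but the same cut can be sketched with Python's standard email module, which finds the part headers shown above and undoes the base64 for you. This is an alternative to what I actually did, not a description of it. The function name extract_images, the boundary "XYZ", and the tiny sample message are invented for the example; the sample holds only the first 16 characters of the encoded data, so it decodes to the first 12 bytes of the picture and nothing more.

```python
import email

def extract_images(raw_message):
    """Return (filename, decoded bytes) for every image part (hypothetical helper)."""
    msg = email.message_from_string(raw_message)
    images = []
    for part in msg.walk():
        if part.get_content_maintype() == "image":
            # decode=True reverses the Content-Transfer-Encoding (base64 here).
            images.append((part.get_filename(), part.get_payload(decode=True)))
    return images

# Made-up miniature message; a real one would carry the full encoded block.
SAMPLE = """\
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

--XYZ
Content-Type: text/plain

hello
--XYZ
Content-Type: image/jpeg; name="image.jpg"
Content-Transfer-Encoding: base64

/9j/4AAQSkZJRgAB
--XYZ--
"""

for name, data in extract_images(SAMPLE):
    print(name, data[:4].hex())   # image.jpg ffd8ffe0
```

With a complete message you would write each decoded block of bytes out to a file opened in binary mode.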


Update December 2016: replaced missing picture and updated the source file link. Fixed broken HTML on Content-Id. The Id is probably no longer correct, but who cares?
