Feike IT Consulting Munich

Decoding base64-encoded #cdata sections in XML streams and saving them to binary files using Java

These days XML is the first choice when defining a protocol syntax. To transmit binary data such as PNG or JPEG images inside XML, the binary data is encoded as base64 text blocks. The same encoding is used for email attachments (e.g. PDF files, which are binary).
This is a short article about how to extract base64 data from XML files and store the decoded binary data in a file. You will find a complete Java program at the end.


Loading the XML document and parsing it into a DOM

To avoid re-inventing the wheel, we use the JDOM (org.jdom) library here. To keep the point clear, the snippets do not handle any exceptions (the complete Java program at the end does, of course).
	SAXBuilder builder = new SAXBuilder();
	Document document = builder.build( args[0] );
SAXBuilder and Document are JDOM classes (org.jdom.input.SAXBuilder and org.jdom.Document). We expect the name of the XML file as the first command-line parameter (args[0]). If the file could be read and is well-formed (otherwise build() throws an IOException or a JDOMException), the XML document is now in the "document" variable.
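If JDOM is not on the classpath, the same loading step can be sketched with the DOM parser that ships with the JDK (javax.xml.parsers). This is a swapped-in technique, not the JDOM code used in this article, and it yields an org.w3c.dom.Document rather than a JDOM Document. The class name ParseSketch and the method parse are made up for illustration, and the XML is read from a string instead of a file so the example is self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the original program.
class ParseSketch {

    // Parses an XML string into a W3C DOM document using the JDK's own parser.
    // A SAXException signals XML that is not well-formed.
    static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<ROOT><PERSONAL/></ROOT>");
        System.out.println(doc.getDocumentElement().getNodeName()); // prints ROOT
    }
}
```

To load a file instead, DocumentBuilder also offers parse(java.io.File).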

Getting the right Element from the DOM document

As the third parameter of this program we expect the path to the XML element that holds the base64 data. For example, given a structure like this:
<ROOT>
   <PERSONAL>
      <EMPLOYE>
         <PHOTO>
            ... G8+ucqUNx4DzTJ4vv1xz8QFnRVB9gj85He3jeve3p6MNc ...
         </PHOTO>
      </EMPLOYE>
   </PERSONAL>
</ROOT>
To extract the photo inside the <PHOTO> tag, the user must specify the path as "PERSONAL/EMPLOYE/PHOTO".
It is a path relative to the root element, so the root element itself is not part of it. The code to traverse the XML document to the matching element can look like this:
	String path[] = args[2].split("/");
	Element element = document.getRootElement();
	int i = 0;
	while ((null != element) && (i < path.length)) {
		element = element.getChild(path[i++]);
	}
After that, "element" is either null (if a step of the path was not found) or holds the DOM element that contains the base64 data (in XML Schema this type is called xs:base64Binary). Note that getChild() returns only the first child with the given name, so if elements on your path have a cardinality other than 1, you must rethink this. But that is not the topic here.
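The traversal described above can also be tried without JDOM. The following is a minimal sketch on top of the JDK's W3C DOM API, where a small helper stands in for JDOM's getChild(). The names PathWalkSketch, firstChild, find and parse are hypothetical, and the sample document carries a tiny made-up payload ("QUJD") instead of a real photo.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original program.
class PathWalkSketch {

    // Returns the first direct child element with the given name, or null if there is none.
    static Element firstChild(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return (Element) n;
            }
        }
        return null;
    }

    // Follows a slash-separated path below the root element.
    // Returns null as soon as one step of the path is missing.
    static Element find(Document doc, String path) {
        Element element = doc.getDocumentElement();
        for (String step : path.split("/")) {
            if (element == null) {
                break;
            }
            element = firstChild(element, step);
        }
        return element;
    }

    // Parses an XML string so the example needs no input file.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse(
                "<ROOT><PERSONAL><EMPLOYE><PHOTO>QUJD</PHOTO></EMPLOYE></PERSONAL></ROOT>");
        Element photo = find(doc, "PERSONAL/EMPLOYE/PHOTO");
        System.out.println(photo.getTextContent());                // prints QUJD
        System.out.println(find(doc, "PERSONAL/MISSING/PHOTO"));   // prints null
    }
}
```

Like the JDOM loop, this sketch always takes the first matching child, so repeated elements on the path would need an index or a different lookup.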

Decoding the xs:base64Binary with the Sun decoder

There is already a decoder available in the Java world. It comes from Sun Microsystems, and the class is sun.misc.BASE64Decoder. So we use it (take care, it is not part of the standard API and may not be present in every JDK). While using the decoder you may experience some problems, especially with blanks. In that case, write your own or use the decoder from the Apache project.
Anyway, here is the code (it needs exception handling, of course):
	String base64 = element.getTextNormalize();
	byte decoded[] = new sun.misc.BASE64Decoder().decodeBuffer(base64);
Now byte[] decoded holds the decoded binary data. All that is missing is storing it.
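On Java 8 or newer, the standard library has its own decoder in java.util.Base64, which may be an alternative when sun.misc.BASE64Decoder is unavailable or stumbles over blanks. The sketch below uses the MIME variant, which skips characters outside the base64 alphabet such as spaces and line breaks. It is not the decoder this article is built on. The class name MimeDecodeSketch and the sample payload are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper class, not part of the original program.
class MimeDecodeSketch {

    // Decodes base64 text that may contain blanks or line breaks.
    // The MIME decoder ignores characters that are not in the base64 alphabet.
    static byte[] decode(String base64) {
        return Base64.getMimeDecoder().decode(base64);
    }

    public static void main(String[] args) {
        // Sample input with a blank and a line break inside the encoded text.
        byte[] decoded = decode("SGVs bG8s\nIFhN TA==");
        System.out.println(new String(decoded, StandardCharsets.US_ASCII)); // prints Hello, XML
    }
}
```

With this decoder the element text could be passed in as returned by getTextNormalize(), since the blanks left between the lines are simply skipped.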

Storing the binary data in byte[] to a file on disk

That is trivial in Java (we expect the output file name as the second parameter on the command line):
	FileOutputStream fos = new FileOutputStream(args[1]);
	fos.write(decoded);
	fos.close();
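The snippet above leaves the stream open if write() throws. One way to guard against that, sketched here under the assumption of Java 7 or newer, is a try-with-resources block, which closes the stream in every case. The class name SaveSketch and the method save are hypothetical, and the example writes a few dummy bytes to a temporary file instead of args[1].

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper class, not part of the original program.
class SaveSketch {

    // Writes the bytes to the named file, creating or overwriting it.
    // The stream is closed automatically, even if write() fails.
    static void save(String fileName, byte[] data) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(fileName)) {
            fos.write(data);
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("decoded", ".bin");
        save(tmp.toString(), new byte[] {1, 2, 3, 4});
        System.out.println(Files.size(tmp)); // prints 4
        Files.delete(tmp);
    }
}
```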

And here is the complete (and working) programming example


import java.io.FileOutputStream;
import java.io.IOException;
import java.util.logging.Logger;

import org.jdom.Document;
import org.jdom.Element;
import org.jdom.JDOMException;
import org.jdom.input.SAXBuilder;

public class Base64DecodeFromXML {

	static public Logger logger = Logger.getLogger("Base64DecodeFromXML");
	static private Document document = null;  // the doc

	/**
	 * @param args inputFile outputFile "path/to/base64element"
	 */
	public static void main(String[] args) {
		// startup
		logger.info("Base64DecodeFromXML 1.0 - starting up");
		if (args.length < 3) {
			logger.severe("Base64DecodeFromXML did not get enough parameters - filenames needed");
			System.err.println("Usage: Base64DecodeFromXML inputFile outputFile path/to/base64element");
			System.exit(-1);
		}
		// load xml-doc into document object
		logger.info("loading input file " + args[0]);
		SAXBuilder builder = new SAXBuilder();
		// try to build the doc
		try {
			Base64DecodeFromXML.document = builder.build( args[0] );
		} catch (JDOMException e) {
			e.printStackTrace();
			document = null;
		} catch (IOException e) {
			e.printStackTrace();
			document = null;
		}
		if (null == Base64DecodeFromXML.document) {
			logger.severe("Base64DecodeFromXML got bad XML file - not existing or not well-formed");
			System.exit(-1);
		}
		// extract the path to element that contains the base64 encoded binary from the commandline
		String path[] = args[2].split("/");
		// get the root element of the XML Doc
		Element ele = Base64DecodeFromXML.document.getRootElement();
		// descend into the XML structure step by step until we reach the base64 element
		int i = 0;
		while ((null != ele) && (i < path.length)) {
			logger.info("iterating into " + path[i]);
			ele = ele.getChild(path[i]);
			if (null == ele) {
				logger.severe("Element not found: " + path[i]);
				break;
			}
			++i;
		}
		if (null != ele) {
			// get the base64 content into String
			String base64 = ele.getTextNormalize();
			
			// and decode the content
			try {
				byte decoded[] = new sun.misc.BASE64Decoder().decodeBuffer(base64);
				// save it to a binary file; try-with-resources closes the stream
				// even if write() fails, and a failed open is reported by the
				// outer catch instead of causing a NullPointerException
				logger.info("saving output file " + args[1]);
				try (FileOutputStream fos = new FileOutputStream(args[1])) {
					fos.write(decoded);
				}
			} catch (IOException e1) {
				e1.printStackTrace();
			}
		}
	}

}