2 - Getting Started with Mini-XML

This chapter describes how to write programs that use Mini-XML to access data in an XML file.

The Basics

Mini-XML provides a single header file which you include:

    #include <mxml.h>

The Mini-XML library is linked with your program using the -lmxml option:

    gcc -o myprogram myprogram.c -lmxml

If you have the pkg-config(1) software installed, you can use it to determine the proper compiler and linker options for your installation:

    pkg-config --cflags mxml
    pkg-config --libs mxml

Nodes

Every piece of information in an XML file (elements, text, numbers) is stored in memory in "nodes". Nodes are defined by the mxml_node_t structure. The type member defines the node type (element, integer, opaque, real, or text) and determines which member of the value union holds the node's data.

New nodes can be created using the mxmlNewElement(), mxmlNewInteger(), mxmlNewOpaque(), mxmlNewReal(), and mxmlNewText() functions. Only elements can have child nodes, and the top node must be an element, usually "?xml".

Each node has pointers for the node above (parent), below (child), to the left (prev), and to the right (next) of the current node. If you have an XML file like the following:

    <?xml version="1.0"?>
    <data>
        <node>val1</node>
        <node>val2</node>
        <node>val3</node>
        <group>
            <node>val4</node>
            <node>val5</node>
            <node>val6</node>
        </group>
        <node>val7</node>
        <node>val8</node>
        <node>val9</node>
    </data>

the node tree returned by mxmlLoadFile() would look like the following in memory:

    ?xml
      |
    data
      |
    node - node - node - group - node - node - node
      |      |      |      |       |      |      |
    val1   val2   val3     |     val7   val8   val9
                           |
                         node - node - node
                           |      |      |
                         val4   val5   val6

where "-" is a pointer to the next node and "|" is a pointer to the first child node.

Once you are done with the XML data, use the mxmlDelete() function to recursively free the memory that is used for a particular node or the entire tree:

    mxmlDelete(tree);

Loading XML

You load an XML file using the mxmlLoadFile() function:

    FILE *fp;
    mxml_node_t *tree;

    fp = fopen("filename.xml", "r");
    tree = mxmlLoadFile(NULL, fp, MXML_NO_CALLBACK);
    fclose(fp);

The first argument specifies an existing XML parent node, if any. Normally you will pass NULL for this argument unless you are combining multiple XML sources. The XML file must contain a complete XML document including the ?xml element if the parent node is NULL.

The second argument specifies the stdio file to read from, as opened by fopen() or popen(). You can also use stdin if you are implementing an XML filter program.

The third argument specifies a callback function which returns the value type of the immediate children for a new element node: MXML_INTEGER, MXML_OPAQUE, MXML_REAL, or MXML_TEXT. Load callbacks are described in detail in Chapter 3. The example code uses the MXML_NO_CALLBACK constant which specifies that all data nodes in the document contain whitespace-separated text values.

The mxmlLoadString() function loads XML node trees from a string:

    char buffer[8192];
    mxml_node_t *tree;

    ...
    tree = mxmlLoadString(NULL, buffer, MXML_NO_CALLBACK);

The first and third arguments are the same as used for mxmlLoadFile(). The second argument specifies the string or character buffer to load and must be a complete XML document including the ?xml element if the parent node is NULL.

Saving XML

You save an XML file using the mxmlSaveFile() function:

    FILE *fp;
    mxml_node_t *tree;

    fp = fopen("filename.xml", "w");
    mxmlSaveFile(tree, fp, MXML_NO_CALLBACK);
    fclose(fp);

The first argument is the XML node tree to save. It should normally be a pointer to the top-level ?xml node in your XML document.

The second argument is the stdio file to write to, as opened by fopen() or popen(). You can also use stdout if you are implementing an XML filter program.

The third argument is the whitespace callback to use when saving the file. Whitespace callbacks are covered in detail in Chapter 3. The example code above uses the MXML_NO_CALLBACK constant to specify that no special whitespace handling is required.

The mxmlSaveAllocString() and mxmlSaveString() functions save XML node trees to strings:

    char buffer[8192];
    char *ptr;
    mxml_node_t *tree;

    ...
    mxmlSaveString(tree, buffer, sizeof(buffer), MXML_NO_CALLBACK);

    ...
    ptr = mxmlSaveAllocString(tree, MXML_NO_CALLBACK);

The first and last arguments are the same as used for mxmlSaveFile(). The mxmlSaveString() function takes pointer and size arguments for saving the XML document to a fixed-size buffer, while mxmlSaveAllocString() returns a string buffer that was allocated using malloc().

Finding and Iterating Nodes

The mxmlWalkPrev() and mxmlWalkNext() functions can be used to iterate through the XML node tree:

    mxml_node_t *node = mxmlWalkPrev(current, tree, MXML_DESCEND);

    mxml_node_t *node = mxmlWalkNext(current, tree, MXML_DESCEND);

In addition, you can find a named element/node using the mxmlFindElement() function:

    mxml_node_t *node = mxmlFindElement(tree, tree, "name", "attr",
                                        "value", MXML_DESCEND);

The name, attr, and value arguments can be passed as NULL to act as wildcards, e.g.:

    /* Find the first "a" element */
    node = mxmlFindElement(tree, tree, "a", NULL, NULL, MXML_DESCEND);

    /* Find the first "a" element with "href" attribute */
    node = mxmlFindElement(tree, tree, "a", "href", NULL, MXML_DESCEND);

    /* Find the first "a" element with "href" to a URL */
    node = mxmlFindElement(tree, tree, "a", "href",
                           "http://www.easysw.com/~mike/mxml/", MXML_DESCEND);

    /* Find the first element with a "src" attribute */
    node = mxmlFindElement(tree, tree, NULL, "src", NULL, MXML_DESCEND);

    /* Find the first element with "src" = "foo.jpg" */
    node = mxmlFindElement(tree, tree, NULL, "src", "foo.jpg", MXML_DESCEND);

You can also iterate with the same function:

    mxml_node_t *node;

    for (node = mxmlFindElement(tree, tree, "name", NULL, NULL, MXML_DESCEND);
         node != NULL;
         node = mxmlFindElement(node, tree, "name", NULL, NULL, MXML_DESCEND))
    {
      ... do something ...
    }

The MXML_DESCEND argument can actually be one of three constants: