2 Getting Started with Mini-XML

This chapter describes how to write programs that use Mini-XML to access data in an XML file. Mini-XML provides the following functionality:

Mini-XML doesn't do validation or other types of processing on the data based upon schema files or other sources of definition information, nor does it support character entities other than those required by the XML specification.

The Basics

Mini-XML provides a single header file which you include:

    #include <mxml.h>

The Mini-XML library is linked with your program using the -lmxml option:

    gcc -o myprogram myprogram.c -lmxml

If you have the pkg-config(1) software installed, you can use it to determine the proper compiler and linker options for your installation:

    pkg-config --cflags mxml
    pkg-config --libs mxml

Nodes

Every piece of information in an XML file is stored in memory in "nodes". Nodes are defined by the mxml_node_t structure. Each node has a typed value, optional user data, a parent node, sibling nodes (previous and next), and potentially child nodes.

For example, if you have an XML file like the following:

    <?xml version="1.0" encoding="utf-8"?>
    <data>
        <node>val1</node>
        <node>val2</node>
        <node>val3</node>
        <group>
            <node>val4</node>
            <node>val5</node>
            <node>val6</node>
        </group>
        <node>val7</node>
        <node>val8</node>
    </data>

the node tree for the file would look like the following in memory:

    ?xml version="1.0" encoding="utf-8"?
      |
    data
      |
    node - node - node - group - node - node
      |      |      |      |       |      |
    val1   val2   val3     |     val7   val8
                           |
                         node - node - node
                           |      |      |
                         val4   val5   val6

where "-" is a pointer to the sibling node and "|" is a pointer to the first child or parent node.
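
The pointer layout in the diagram can be modelled with a small stand-alone structure. This is only a sketch: toy_node_t, toy_add, and toy_count_children are hypothetical names invented for illustration, and the struct is a simplified stand-in rather than the real mxml_node_t definition.

```c
#include <stddef.h>

/* Simplified stand-in for a tree node; NOT the real mxml_node_t layout. */
typedef struct toy_node_s
{
  const char        *label;   /* Element name or value text */
  struct toy_node_s *parent;  /* "|" link pointing up */
  struct toy_node_s *child;   /* "|" link pointing down to the first child */
  struct toy_node_s *prev;    /* "-" link to the previous sibling */
  struct toy_node_s *next;    /* "-" link to the next sibling */
} toy_node_t;

/* Append child to the end of parent's list of children. */
void toy_add(toy_node_t *parent, toy_node_t *child)
{
  toy_node_t *last = parent->child;

  child->parent = parent;
  child->next   = NULL;

  if (!last)
  {
    /* First child: the parent points straight at it */
    child->prev   = NULL;
    parent->child = child;
    return;
  }

  while (last->next)
    last = last->next;

  last->next  = child;
  child->prev = last;
}

/* Count the direct children of a node by following the sibling links. */
int toy_count_children(const toy_node_t *parent)
{
  int               count = 0;
  const toy_node_t *node;

  for (node = parent->child; node; node = node->next)
    count ++;

  return (count);
}
```

Only the first child is reachable directly from its parent; every other child is reached by stepping along the sibling links, which is why the diagram shows a single "|" under each parent.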

The mxmlGetType function gets the type of a node, one of MXML_CUSTOM, MXML_ELEMENT, MXML_INTEGER, MXML_OPAQUE, MXML_REAL, or MXML_TEXT. The parent and sibling nodes are accessed using the mxmlGetParent, mxmlGetNextSibling, and mxmlGetPrevSibling functions. The mxmlGetUserData function gets any user data associated with the node.

CDATA Nodes

CDATA (MXML_ELEMENT) nodes are created using the mxmlNewCDATA function. The mxmlGetCDATA function retrieves the CDATA string pointer for a node.

Note:

CDATA nodes are currently stored in memory as special elements. This will be changed in a future major release of Mini-XML.

Custom Nodes

Custom (MXML_CUSTOM) nodes are created using the mxmlNewCustom function or using a custom load callback specified using the mxmlSetCustomHandlers function. The mxmlGetCustom function retrieves the custom value pointer for a node.

Comment Nodes

Comment (MXML_ELEMENT) nodes are created using the mxmlNewElement function. The mxmlGetElement function retrieves the comment string pointer for a node, including the surrounding "!--" and "--" characters.

Note:

Comment nodes are currently stored in memory as special elements. This will be changed in a future major release of Mini-XML.

Element Nodes

Element (MXML_ELEMENT) nodes are created using the mxmlNewElement function. The mxmlGetElement function retrieves the element name, the mxmlElementGetAttr function retrieves the value string for a named attribute associated with the element, and the mxmlGetFirstChild and mxmlGetLastChild functions retrieve the first and last child nodes for the element, respectively.

Integer Nodes

Integer (MXML_INTEGER) nodes are created using the mxmlNewInteger function. The mxmlGetInteger function retrieves the integer value for a node.

Opaque Nodes

Opaque (MXML_OPAQUE) nodes are created using the mxmlNewOpaque function. The mxmlGetOpaque function retrieves the opaque string pointer for a node. Opaque nodes are like text nodes but preserve all whitespace between nodes.

Text Nodes

Text (MXML_TEXT) nodes are created using the mxmlNewText and mxmlNewTextf functions. Each text node consists of a text string and (leading) whitespace value - the mxmlGetText function retrieves the text string pointer and whitespace value for a node.

Processing Instruction Nodes

Processing instruction (MXML_ELEMENT) nodes are created using the mxmlNewElement function. The mxmlGetElement function retrieves the processing instruction string for a node, including the surrounding "?" characters.

Note:

Processing instruction nodes are currently stored in memory as special elements. This will be changed in a future major release of Mini-XML.

Real Number Nodes

Real number (MXML_REAL) nodes are created using the mxmlNewReal function. The mxmlGetReal function retrieves the real number value for a node.

XML Declaration Nodes

XML declaration (MXML_ELEMENT) nodes are created using the mxmlNewXML function. The mxmlGetElement function retrieves the XML declaration string for a node, including the surrounding "?" characters.

Note:

XML declaration nodes are currently stored in memory as special elements. This will be changed in a future major release of Mini-XML.

Creating XML Documents

You can create and update XML documents in memory using the various mxmlNew functions. The following code will create the XML document described in the previous section:

    mxml_node_t *xml;    /* <?xml ... ?> */
    mxml_node_t *data;   /* <data> */
    mxml_node_t *node;   /* <node> */
    mxml_node_t *group;  /* <group> */

    xml = mxmlNewXML("1.0");

    data = mxmlNewElement(xml, "data");

        node = mxmlNewElement(data, "node");
        mxmlNewText(node, 0, "val1");
        node = mxmlNewElement(data, "node");
        mxmlNewText(node, 0, "val2");
        node = mxmlNewElement(data, "node");
        mxmlNewText(node, 0, "val3");

        group = mxmlNewElement(data, "group");

            node = mxmlNewElement(group, "node");
            mxmlNewText(node, 0, "val4");
            node = mxmlNewElement(group, "node");
            mxmlNewText(node, 0, "val5");
            node = mxmlNewElement(group, "node");
            mxmlNewText(node, 0, "val6");

        node = mxmlNewElement(data, "node");
        mxmlNewText(node, 0, "val7");
        node = mxmlNewElement(data, "node");
        mxmlNewText(node, 0, "val8");

We start by creating the declaration node common to all XML files using the mxmlNewXML function:

    xml = mxmlNewXML("1.0");

We then create the <data> node used for this document using the mxmlNewElement function. The first argument specifies the parent node (xml) while the second specifies the element name (data):

    data = mxmlNewElement(xml, "data");

Each <node>...</node> in the file is created using the mxmlNewElement and mxmlNewText functions. The first argument of mxmlNewText specifies the parent node (node). The second argument specifies whether whitespace appears before the text - 0 or false in this case. The last argument specifies the actual text to add:

    node = mxmlNewElement(data, "node");
    mxmlNewText(node, 0, "val1");

The resulting in-memory XML document can then be saved or processed just like one loaded from disk or a string.

Loading XML

You load an XML file using the mxmlLoadFile function:

    FILE *fp;
    mxml_node_t *tree = NULL;

    fp = fopen("filename.xml", "r");
    if (fp != NULL)
    {
      tree = mxmlLoadFile(NULL, fp,
                          MXML_TEXT_CALLBACK);
      fclose(fp);
    }

The first argument specifies an existing XML parent node, if any. Normally you will pass NULL for this argument unless you are combining multiple XML sources. The XML file must contain a complete XML document including the ?xml element if the parent node is NULL.

The second argument specifies the stdio file to read from, as opened by fopen() or popen(). You can also use stdin if you are implementing an XML filter program.

The third argument specifies a callback function which returns the value type of the immediate children for a new element node: MXML_CUSTOM, MXML_IGNORE, MXML_INTEGER, MXML_OPAQUE, MXML_REAL, or MXML_TEXT. Load callbacks are described in detail in Chapter 3. The example code uses the MXML_TEXT_CALLBACK constant which specifies that all data nodes in the document contain whitespace-separated text values. Other standard callbacks include MXML_IGNORE_CALLBACK, MXML_INTEGER_CALLBACK, MXML_OPAQUE_CALLBACK, and MXML_REAL_CALLBACK.

The mxmlLoadString function loads XML node trees from a string:

    char buffer[8192];
    mxml_node_t *tree;

    ...
    tree = mxmlLoadString(NULL, buffer,
                          MXML_TEXT_CALLBACK);

The first and third arguments are the same as used for mxmlLoadFile(). The second argument specifies the string or character buffer to load and must be a complete XML document including the ?xml element if the parent node is NULL.

Saving XML

You save an XML file using the mxmlSaveFile function:

    FILE *fp;
    mxml_node_t *tree;

    fp = fopen("filename.xml", "w");
    if (fp != NULL)
    {
      mxmlSaveFile(tree, fp, MXML_NO_CALLBACK);
      fclose(fp);
    }

The first argument is the XML node tree to save. It should normally be a pointer to the top-level ?xml node in your XML document.

The second argument is the stdio file to write to, as opened by fopen() or popen(). You can also use stdout if you are implementing an XML filter program.

The third argument is the whitespace callback to use when saving the file. Whitespace callbacks are covered in detail in Chapter 3. The previous example code uses the MXML_NO_CALLBACK constant to specify that no special whitespace handling is required.

The mxmlSaveAllocString and mxmlSaveString functions save XML node trees to strings:

    char buffer[8192];
    char *ptr;
    mxml_node_t *tree;

    ...
    mxmlSaveString(tree, buffer, sizeof(buffer),
                   MXML_NO_CALLBACK);

    ...
    ptr = mxmlSaveAllocString(tree, MXML_NO_CALLBACK);

The first and last arguments are the same as used for mxmlSaveFile(). The mxmlSaveString function takes pointer and size arguments for saving the XML document to a fixed-size buffer, while mxmlSaveAllocString() returns a string buffer that was allocated using malloc() and must be released with free() when you are done with it.

Controlling Line Wrapping

When saving XML documents, Mini-XML normally wraps output lines at column 75 so that the text is readable in terminal windows. The mxmlSetWrapMargin function overrides the default wrap margin:

    /* Set the margin to 132 columns */
    mxmlSetWrapMargin(132);

    /* Disable wrapping */
    mxmlSetWrapMargin(0);

Memory Management

Once you are done with the XML data, use the mxmlDelete function to recursively free the memory that is used for a particular node or the entire tree:

    mxmlDelete(tree);

You can also use reference counting to manage memory usage. The mxmlRetain and mxmlRelease functions increment and decrement a node's use count, respectively. When the use count goes to 0, mxmlRelease will automatically call mxmlDelete to actually free the memory used by the node tree. New nodes automatically start with a use count of 1.

Finding and Iterating Nodes

The mxmlWalkPrev and mxmlWalkNext functions can be used to iterate through the XML node tree:

    mxml_node_t *node;
    
    node = mxmlWalkPrev(current, tree,
                        MXML_DESCEND);

    node = mxmlWalkNext(current, tree,
                        MXML_DESCEND);

In addition, you can find a named element/node using the mxmlFindElement function:

    mxml_node_t *node;
    
    node = mxmlFindElement(tree, tree, "name",
                           "attr", "value",
                           MXML_DESCEND);

The name, attr, and value arguments can be passed as NULL to act as wildcards, e.g.:

    /* Find the first "a" element */
    node = mxmlFindElement(tree, tree, "a",
                           NULL, NULL,
                           MXML_DESCEND);
    /* Find the first "a" element with "href"
       attribute */
    node = mxmlFindElement(tree, tree, "a",
                           "href", NULL,
                           MXML_DESCEND);
    /* Find the first "a" element with "href"
       to a URL */
    node = mxmlFindElement(tree, tree, "a",
                           "href",
                           "http://www.easysw.com/",
                           MXML_DESCEND);
    /* Find the first element with a "src"
       attribute */
    node = mxmlFindElement(tree, tree, NULL,
                           "src", NULL,
                           MXML_DESCEND);
    /* Find the first element with a "src"
       = "foo.jpg" */
    node = mxmlFindElement(tree, tree, NULL,
                           "src", "foo.jpg",
                           MXML_DESCEND);

You can also iterate with the same function:

    mxml_node_t *node;

    for (node = mxmlFindElement(tree, tree,
                                "name",
                                NULL, NULL,
                                MXML_DESCEND);
         node != NULL;
         node = mxmlFindElement(node, tree,
                                "name",
                                NULL, NULL,
                                MXML_DESCEND))
    {
      ... do something ...
    }

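The MXML_DESCEND argument can actually be one of three constants: MXML_NO_DESCEND, which looks only at sibling and parent nodes and never at child nodes; MXML_DESCEND_FIRST, which descends to the first child of a node but no further; and MXML_DESCEND, which keeps descending until the bottom of the tree is reached.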

Finding Specific Nodes

You can find specific nodes in the tree using the mxmlFindPath function, for example:

    mxml_node_t *value;

    value = mxmlFindPath(tree, "path/to/*/foo/bar");

The second argument is a "path" to the parent node. Each component of the path is separated by a slash (/) and represents a named element in the document tree or a wildcard (*) path representing 0 or more intervening nodes.