mxml/www/docfiles/advanced.html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<HTML>
<HEAD>
<TITLE>Mini-XML Programmers Manual, Version 2.7</TITLE>
<META NAME="author" CONTENT="Michael R. Sweet">
<META NAME="copyright" CONTENT="Copyright 2003-2011">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=iso-iso-8859-1">
<LINK REL="Start" HREF="index.html">
<LINK REL="Contents" HREF="index.html">
<LINK REL="Prev" HREF="basics.html">
<LINK REL="Next" HREF="mxmldoc.html">
<STYLE TYPE="text/css"><!--
BODY { font-family: sans-serif }
H1 { font-family: sans-serif }
H2 { font-family: sans-serif }
H3 { font-family: sans-serif }
H4 { font-family: sans-serif }
H5 { font-family: sans-serif }
H6 { font-family: sans-serif }
SUB { font-size: smaller }
SUP { font-size: smaller }
PRE { font-family: monospace }
A { text-decoration: none }
--></STYLE>
</HEAD>
<BODY>
<A HREF="index.html">Contents</A>
<A HREF="basics.html">Previous</A>
<A HREF="mxmldoc.html">Next</A>
<HR NOSHADE>
<H1 align="right"><A name="ADVANCED"><IMG align="right" alt="3" height="100"
hspace="10" src="3.gif" width="100"></A>More Mini-XML Programming
 Techniques</H1>
<P>This chapter shows additional ways to use the Mini-XML library in
 your programs.</P>
<H2><A name="LOAD_CALLBACKS">Load Callbacks</A></H2>
<P><A href="#LOAD_XML">Chapter 2</A> introduced the <A href="reference.html#mxmlLoadFile">
<TT>mxmlLoadFile()</TT></A> and <A href="reference.html#mxmlLoadString"><TT>
mxmlLoadString()</TT></A> functions. The last argument to these
 functions is a callback function which is used to determine the value
 type of each data node in an XML document.</P>
<P>Mini-XML defines several standard callbacks for simple XML data
 files:</P>
<UL>
<LI><TT>MXML_INTEGER_CALLBACK</TT> - All data nodes contain
 whitespace-separated integers.</LI>
<LI><TT>MXML_OPAQUE_CALLBACK</TT> - All data nodes contain opaque
 strings (&quot;CDATA&quot;).</LI>
<LI><TT>MXML_REAL_CALLBACK</TT> - All data nodes contain
 whitespace-separated floating-point numbers.</LI>
<LI><TT>MXML_TEXT_CALLBACK</TT> - All data nodes contain
 whitespace-separated strings.</LI>
</UL>
<P>You can provide your own callback functions for more complex XML
 documents. Your callback function will receive a pointer to the current
 element node and must return the value type of the immediate children
 for that element node: <TT>MXML_INTEGER</TT>, <TT>MXML_OPAQUE</TT>, <TT>
MXML_REAL</TT>, or <TT>MXML_TEXT</TT>. The function is called<I> after</I>
 the element and its attributes have been read, so you can look at the
 element name, attributes, and attribute values to determine the proper
 value type to return.</P>

<!-- NEED 2in -->
<P>The following callback function looks for an attribute named &quot;type&quot;
 or the element name to determine the value type for its child nodes:</P>
<PRE>
    mxml_type_t
    type_cb(mxml_node_t *node)
    {
      const char *type;

     /*
      * You can lookup attributes and/or use the
      * element name, hierarchy, etc...
      */

      type = mxmlElementGetAttr(node, &quot;type&quot;);
      if (type == NULL)
	type = mxmlGetElement(node);

      if (!strcmp(type, &quot;integer&quot;))
	return (MXML_INTEGER);
      else if (!strcmp(type, &quot;opaque&quot;))
	return (MXML_OPAQUE);
      else if (!strcmp(type, &quot;real&quot;))
	return (MXML_REAL);
      else
	return (MXML_TEXT);
    }
</PRE>
<P>To use this callback function, simply use the name when you call any
 of the load functions:</P>
<PRE>
    FILE *fp;
    mxml_node_t *tree;

    fp = fopen(&quot;filename.xml&quot;, &quot;r&quot;);
    tree = mxmlLoadFile(NULL, fp, <B>type_cb</B>);
    fclose(fp);
</PRE>
<H2><A name="SAVE_CALLBACKS">Save Callbacks</A></H2>
<P><A href="#LOAD_XML">Chapter 2</A> also introduced the <A href="reference.html#mxmlSaveFile">
<TT>mxmlSaveFile()</TT></A>, <A href="reference.html#mxmlSaveString"><TT>
mxmlSaveString()</TT></A>, and <A href="reference.html#mxmlSaveAllocString">
<TT>mxmlSaveAllocString()</TT></A> functions. The last argument to these
 functions is a callback function which is used to automatically insert
 whitespace in an XML document.</P>
<P>Your callback function will be called up to four times for each
 element node with a pointer to the node and a &quot;where&quot; value of <TT>
MXML_WS_BEFORE_OPEN</TT>, <TT>MXML_WS_AFTER_OPEN</TT>, <TT>
MXML_WS_BEFORE_CLOSE</TT>, or <TT>MXML_WS_AFTER_CLOSE</TT>. The callback
 function should return <TT>NULL</TT> if no whitespace should be added
 and the string to insert (spaces, tabs, carriage returns, and newlines)
 otherwise.</P>
<P>The following whitespace callback can be used to add whitespace to
 XHTML output to make it more readable in a standard text editor:</P>
<PRE>
    const char *
    whitespace_cb(mxml_node_t *node,
                  int where)
    {
      const char *name;

     /*
      * We can conditionally break to a new line
      * before or after any element. These are
      * just common HTML elements...
      */

      name = mxmlGetElement(node);

      if (!strcmp(name, &quot;html&quot;) ||
          !strcmp(name, &quot;head&quot;) ||
          !strcmp(name, &quot;body&quot;) ||
	  !strcmp(name, &quot;pre&quot;) ||
          !strcmp(name, &quot;p&quot;) ||
	  !strcmp(name, &quot;h1&quot;) ||
          !strcmp(name, &quot;h2&quot;) ||
          !strcmp(name, &quot;h3&quot;) ||
	  !strcmp(name, &quot;h4&quot;) ||
          !strcmp(name, &quot;h5&quot;) ||
          !strcmp(name, &quot;h6&quot;))
      {
       /*
	* Newlines before open and after
        * close...
	*/

	if (where == MXML_WS_BEFORE_OPEN ||
            where == MXML_WS_AFTER_CLOSE)
	  return (&quot;\n&quot;);
      }
      else if (!strcmp(name, &quot;dl&quot;) ||
               !strcmp(name, &quot;ol&quot;) ||
               !strcmp(name, &quot;ul&quot;))
      {
       /*
	* Put a newline before and after list
        * elements...
	*/

	return (&quot;\n&quot;);
      }
      else if (!strcmp(name, &quot;dd&quot;) ||
               !strcmp(name, &quot;dt&quot;) ||
               !strcmp(name, &quot;li&quot;))
      {
       /*
	* Put a tab before &lt;li&gt;'s, * &lt;dd&gt;'s,
        * and &lt;dt&gt;'s, and a newline after them...
	*/

	if (where == MXML_WS_BEFORE_OPEN)
	  return (&quot;\t&quot;);
	else if (where == MXML_WS_AFTER_CLOSE)
	  return (&quot;\n&quot;);
      }

     /*
      * Return NULL for no added whitespace...
      */

      return (NULL);
    }
</PRE>
<P>To use this callback function, simply use the name when you call any
 of the save functions:</P>
<PRE>
    FILE *fp;
    mxml_node_t *tree;

    fp = fopen(&quot;filename.xml&quot;, &quot;w&quot;);
    mxmlSaveFile(tree, fp, <B>whitespace_cb</B>);
    fclose(fp);
</PRE>

<!-- NEED 10 -->
<H2><A NAME="4_3">Custom Data Types</A></H2>
<P>Mini-XML supports custom data types via global load and save
 callbacks. Only a single set of callbacks can be active at any time,
 however your callbacks can store additional information in order to
 support multiple custom data types as needed. The <TT>MXML_CUSTOM</TT>
 node type identifies custom data nodes.</P>
<P>The load callback receives a pointer to the current data node and a
 string of opaque character data from the XML source with character
 entities converted to the corresponding UTF-8 characters. For example,
 if we wanted to support a custom date/time type whose value is encoded
 as &quot;yyyy-mm-ddThh:mm:ssZ&quot; (ISO format), the load callback would look
 like the following:</P>
<PRE>
    typedef struct
    {
      unsigned      year,    /* Year */
                    month,   /* Month */
                    day,     /* Day */
                    hour,    /* Hour */
                    minute,  /* Minute */
                    second;  /* Second */
      time_t        unix;    /* UNIX time */
    } iso_date_time_t;

    int
    load_custom(mxml_node_t *node,
                const char *data)
    {
      iso_date_time_t *dt;
      struct tm tmdata;

     /*
      * Allocate data structure...
      */

      dt = calloc(1, sizeof(iso_date_time_t));

     /*
      * Try reading 6 unsigned integers from the
      * data string...
      */

      if (sscanf(data, &quot;%u-%u-%uT%u:%u:%uZ&quot;,
                 &amp;(dt-&gt;year), &amp;(dt-&gt;month),
                 &amp;(dt-&gt;day), &amp;(dt-&gt;hour),
                 &amp;(dt-&gt;minute),
                 &amp;(dt-&gt;second)) != 6)
      {
       /*
        * Unable to read numbers, free the data
        * structure and return an error...
        */

        free(dt);

        return (-1);
      }

     /*
      * Range check values...
      */

      if (dt-&gt;month &lt;1 || dt-&gt;month &gt; 12 ||
          dt-&gt;day  &lt;1 || dt-&gt;day &gt; 31 ||
          dt-&gt;hour  &lt;0 || dt-&gt;hour &gt; 23 ||
          dt-&gt;minute  &lt;0 || dt-&gt;minute &gt; 59 ||
          dt-&gt;second  &lt;0 || dt-&gt;second &gt; 59)
      {
       /*
        * Date information is out of range...
        */

        free(dt);

        return (-1);
      }

     /*
      * Convert ISO time to UNIX time in
      * seconds...
      */

      tmdata.tm_year = dt-&gt;year - 1900;
      tmdata.tm_mon  = dt-&gt;month - 1;
      tmdata.tm_day  = dt-&gt;day;
      tmdata.tm_hour = dt-&gt;hour;
      tmdata.tm_min  = dt-&gt;minute;
      tmdata.tm_sec  = dt-&gt;second;

      dt-&gt;unix = gmtime(&amp;tmdata);

     /*
      * Assign custom node data and destroy
      * function pointers...
      */

      mxmlSetCustom(node, data, destroy);

     /*
      * Return with no errors...
      */

      return (0);
    }
</PRE>
<P>The function itself can return 0 on success or -1 if it is unable to
 decode the custom data or the data contains an error. Custom data nodes
 contain a <TT>void</TT> pointer to the allocated custom data for the
 node and a pointer to a destructor function which will free the custom
 data when the node is deleted.</P>

<!-- NEED 15 -->
<P>The save callback receives the node pointer and returns an allocated
 string containing the custom data value. The following save callback
 could be used for our ISO date/time type:</P>
<PRE>
    char *
    save_custom(mxml_node_t *node)
    {
      char data[255];
      iso_date_time_t *dt;


      dt = (iso_date_time_t *)mxmlGetCustom(node);

      snprintf(data, sizeof(data),
               &quot;%04u-%02u-%02uT%02u:%02u:%02uZ&quot;,
               dt-&gt;year, dt-&gt;month, dt-&gt;day,
               dt-&gt;hour, dt-&gt;minute, dt-&gt;second);

      return (strdup(data));
    }
</PRE>
<P>You register the callback functions using the <A href="reference.html#mxmlSetCustomHandlers">
<TT>mxmlSetCustomHandlers()</TT></A> function:</P>
<PRE>
    mxmlSetCustomHandlers(<B>load_custom</B>,
                          <B>save_custom</B>);
</PRE>

<!-- NEED 20 -->
<H2><A NAME="4_4">Changing Node Values</A></H2>
<P>All of the examples so far have concentrated on creating and loading
 new XML data nodes. Many applications, however, need to manipulate or
 change the nodes during their operation, so Mini-XML provides functions
 to change node values safely and without leaking memory.</P>
<P>Existing nodes can be changed using the <A href="reference.html#mxmlSetElement">
<TT>mxmlSetElement()</TT></A>, <A href="reference.html#mxmlSetInteger"><TT>
mxmlSetInteger()</TT></A>, <A href="reference.html#mxmlSetOpaque"><TT>
mxmlSetOpaque()</TT></A>, <A href="reference.html#mxmlSetReal"><TT>
mxmlSetReal()</TT></A>, <A href="reference.html#mxmlSetText"><TT>
mxmlSetText()</TT></A>, and <A href="reference.html#mxmlSetTextf"><TT>
mxmlSetTextf()</TT></A> functions. For example, use the following
 function call to change a text node to contain the text &quot;new&quot; with
 leading whitespace:</P>
<PRE>
    mxml_node_t *node;

    mxmlSetText(node, 1, &quot;new&quot;);
</PRE>
<H2><A NAME="4_5">Formatted Text</A></H2>
<P>The <A href="reference.html#mxmlNewTextf"><TT>mxmlNewTextf()</TT></A>
 and <A href="reference.html#mxmlSetTextf"><TT>mxmlSetTextf()</TT></A>
 functions create and change text nodes, respectively, using <TT>printf</TT>
-style format strings and arguments. For example, use the following
 function call to create a new text node containing a constructed
 filename:</P>
<PRE>
    mxml_node_t *node;

    node = mxmlNewTextf(node, 1, &quot;%s/%s&quot;,
                        path, filename);
</PRE>
<H2><A NAME="4_6">Indexing</A></H2>
<P>Mini-XML provides functions for managing indices of nodes. The
 current implementation provides the same functionality as <A href="reference.html#mxmlFindElement">
<TT>mxmlFindElement()</TT></A>. The advantage of using an index is that
 searching and enumeration of elements is significantly faster. The only
 disadvantage is that each index is a static snapshot of the XML
 document, so indices are not well suited to XML data that is updated
 more often than it is searched. The overhead of creating an index is
 approximately equal to walking the XML document tree. Nodes in the
 index are sorted by element name and attribute value.</P>
<P>Indices are stored in <A href="reference.html#mxml_index_t"><TT>
mxml_index_t</TT></A> structures. The <A href="reference.html#mxmlIndexNew">
<TT>mxmlIndexNew()</TT></A> function creates a new index:</P>
<PRE>
    mxml_node_t *tree;
    mxml_index_t *ind;

    ind = mxmlIndexNew(tree, &quot;element&quot;,
                       &quot;attribute&quot;);
</PRE>
<P>The first argument is the XML node tree to index. Normally this will
 be a pointer to the <TT>?xml</TT> element.</P>
<P>The second argument contains the element to index; passing <TT>NULL</TT>
 indexes all element nodes alphabetically.</P>
<P>The third argument contains the attribute to index; passing <TT>NULL</TT>
 causes only the element name to be indexed.</P>
<P>Once the index is created, the <A href="reference.html#mxmlIndexEnum">
<TT>mxmlIndexEnum()</TT></A>, <A href="reference.html#mxmlIndexFind"><TT>
mxmlIndexFind()</TT></A>, and <A href="reference.html#mxmlIndexReset"><TT>
mxmlIndexReset()</TT></A> functions are used to access the nodes in the
 index. The <A href="reference.html#mxmlIndexReset"><TT>mxmlIndexReset()</TT>
</A> function resets the &quot;current&quot; node pointer in the index, allowing
 you to do new searches and enumerations on the same index. Typically
 you will call this function prior to your calls to <A href="reference.html#mxmlIndexEnum">
<TT>mxmlIndexEnum()</TT></A> and <A href="reference.html#mxmlIndexFind"><TT>
mxmlIndexFind()</TT></A>.</P>
<P>The <A href="reference.html#mxmlIndexEnum"><TT>mxmlIndexEnum()</TT></A>
 function enumerates each of the nodes in the index and can be used in a
 loop as follows:</P>
<PRE>
    mxml_node_t *node;

    mxmlIndexReset(ind);

    while ((node = mxmlIndexEnum(ind)) != NULL)
    {
      // do something with node
    }
</PRE>
<P>The <A href="reference.html#mxmlIndexFind"><TT>mxmlIndexFind()</TT></A>
 function locates the next occurrence of the named element and attribute
 value in the index. It can be used to find all matching elements in an
 index, as follows:</P>
<PRE>
    mxml_node_t *node;

    mxmlIndexReset(ind);

    while ((node = mxmlIndexFind(ind, &quot;element&quot;,
                                 &quot;attr-value&quot;))
                != NULL)
    {
      // do something with node
    }
</PRE>
<P>The second and third arguments represent the element name and
 attribute value, respectively. A <TT>NULL</TT> pointer is used to
 return all elements or attributes in the index. Passing <TT>NULL</TT>
 for both the element name and attribute value is equivalent to calling <TT>
mxmlIndexEnum</TT>.</P>
<P>When you are done using the index, delete it using the <A href="#mxmlIndexDelete()">
<TT>mxmlIndexDelete()</TT></A> function:</P>
<PRE>
    mxmlIndexDelete(ind);
</PRE>
<H2><A NAME="4_7">SAX (Stream) Loading of Documents</A></H2>
<P>Mini-XML supports an implementation of the Simple API for XML (SAX)
 which allows you to load and process an XML document as a stream of
 nodes. Aside from allowing you to process XML documents of any size,
 the Mini-XML implementation also allows you to retain portions of the
 document in memory for later processing.</P>
<P>The <A href="#mxmlSAXLoad"><TT>mxmlSAXLoadFd</TT></A>, <A href="reference.html#mxmlSAXLoadFile">
<TT>mxmlSAXLoadFile</TT></A>, and <A href="reference.html#mxmlSAXLoadString">
<TT>mxmlSAXLoadString</TT></A> functions provide the SAX loading APIs.
 Each function works like the corresponding <TT>mxmlLoad</TT> function
 but uses a callback to process each node as it is read.</P>
<P>The callback function receives the node, an event code, and a user
 data pointer you supply:</P>
<PRE>
    void
    sax_cb(mxml_node_t *node,
           mxml_sax_event_t event,
           void *data)
    {
      ... do something ...
    }
</PRE>
<P>The event will be one of the following:</P>
<UL>
<LI><TT>MXML_SAX_CDATA</TT> - CDATA was just read</LI>
<LI><TT>MXML_SAX_COMMENT</TT> - A comment was just read</LI>
<LI><TT>MXML_SAX_DATA</TT> - Data (custom, integer, opaque, real, or
 text) was just read</LI>
<LI><TT>MXML_SAX_DIRECTIVE</TT> - A processing directive was just read</LI>
<LI><TT>MXML_SAX_ELEMENT_CLOSE</TT> - A close element was just read (<TT>
&lt;/element&gt;</TT>)</LI>
<LI><TT>MXML_SAX_ELEMENT_OPEN</TT> - An open element was just read (<TT>
&lt;element&gt;</TT>)</LI>
</UL>
<P>Elements are<EM> released</EM> after the close element is processed.
 All other nodes are released after they are processed. The SAX callback
 can<EM> retain</EM> the node using the <A href="reference.html#mxmlRetain">
<TT>mxmlRetain</TT></A> function. For example, the following SAX
 callback will retain all nodes, effectively simulating a normal
 in-memory load:</P>
<PRE>
    void
    sax_cb(mxml_node_t *node,
           mxml_sax_event_t event,
           void *data)
    {
      if (event != MXML_SAX_ELEMENT_CLOSE)
        mxmlRetain(node);
    }
</PRE>
<P>More typically the SAX callback will only retain a small portion of
 the document that is needed for post-processing. For example, the
 following SAX callback will retain the title and headings in an XHTML
 file. It also retains the (parent) elements like <TT>&lt;html&gt;</TT>, <TT>
&lt;head&gt;</TT>, and <TT>&lt;body&gt;</TT>, and processing directives like <TT>
&lt;?xml ... ?&gt;</TT> and <TT>&lt;!DOCTYPE ... &gt;</TT>:</P>

<!-- NEED 10 -->
<PRE>
    void
    sax_cb(mxml_node_t *node,
           mxml_sax_event_t event,
           void *data)
    {
      if (event == MXML_SAX_ELEMENT_OPEN)
      {
       /*
        * Retain headings and titles...
        */

        char *name = mxmlGetElement(node);

        if (!strcmp(name, &quot;html&quot;) ||
            !strcmp(name, &quot;head&quot;) ||
            !strcmp(name, &quot;title&quot;) ||
            !strcmp(name, &quot;body&quot;) ||
            !strcmp(name, &quot;h1&quot;) ||
            !strcmp(name, &quot;h2&quot;) ||
            !strcmp(name, &quot;h3&quot;) ||
            !strcmp(name, &quot;h4&quot;) ||
            !strcmp(name, &quot;h5&quot;) ||
            !strcmp(name, &quot;h6&quot;))
          mxmlRetain(node);
      }
      else if (event == MXML_SAX_DIRECTIVE)
        mxmlRetain(node);
      else if (event == MXML_SAX_DATA)
      {
        if (mxmlGetRefCount(mxmlGetParent(node)) &gt; 1)
        {
         /*
          * If the parent was retained, then retain
          * this data node as well.
          */

          mxmlRetain(node);
        }
      }
    }
</PRE>
<P>The resulting skeleton document tree can then be searched just like
 one loaded using the <TT>mxmlLoad</TT> functions. For example, a filter
 that reads an XHTML document from stdin and then shows the title and
 headings in the document would look like:</P>
<PRE>
    mxml_node_t *doc, *title, *body, *heading;

    doc = mxmlSAXLoadFd(NULL, 0,
                        MXML_TEXT_CALLBACK,
                        <B>sax_cb</B>, NULL);

    title = mxmlFindElement(doc, doc, &quot;title&quot;,
                            NULL, NULL,
                            MXML_DESCEND);

    if (title)
      print_children(title);

    body = mxmlFindElement(doc, doc, &quot;body&quot;,
                           NULL, NULL,
                           MXML_DESCEND);

    if (body)
    {
      for (heading = mxmlGetFirstChild(body);
           heading;
           heading = mxmlGetNextSibling(heading))
        print_children(heading);
    }
</PRE>
<HR NOSHADE>
<A HREF="index.html">Contents</A>
<A HREF="basics.html">Previous</A>
<A HREF="mxmldoc.html">Next</A>
</BODY>
</HTML>