mxml/www/docfiles/advanced.html
2009-05-17 17:17:46 +00:00

579 lines
20 KiB
HTML

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<HTML>
<HEAD>
<TITLE>Mini-XML Programmers Manual, Version 2.6</TITLE>
<META NAME="author" CONTENT="Michael R. Sweet">
<META NAME="copyright" CONTENT="Copyright 2003-2009">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=iso-iso-8859-1">
<LINK REL="Start" HREF="index.html">
<LINK REL="Contents" HREF="index.html">
<LINK REL="Prev" HREF="basics.html">
<LINK REL="Next" HREF="mxmldoc.html">
<STYLE TYPE="text/css"><!--
BODY { font-family: sans-serif }
H1 { font-family: sans-serif }
H2 { font-family: sans-serif }
H3 { font-family: sans-serif }
H4 { font-family: sans-serif }
H5 { font-family: sans-serif }
H6 { font-family: sans-serif }
SUB { font-size: smaller }
SUP { font-size: smaller }
PRE { font-family: monospace }
A { text-decoration: none }
--></STYLE>
</HEAD>
<BODY>
<A HREF="index.html">Contents</A>
<A HREF="basics.html">Previous</A>
<A HREF="mxmldoc.html">Next</A>
<HR NOSHADE>
<H1 align="right"><A name="ADVANCED"><IMG align="right" alt="3" height="100"
hspace="10" src="3.gif" width="100"></A>More Mini-XML Programming
Techniques</H1>
<P>This chapter shows additional ways to use the Mini-XML library in
your programs.</P>
<H2><A name="LOAD_CALLBACKS">Load Callbacks</A></H2>
<P><A href="#LOAD_XML">Chapter 2</A> introduced the <A href="reference.html#mxmlLoadFile">
<TT>mxmlLoadFile()</TT></A> and <A href="reference.html#mxmlLoadString"><TT>
mxmlLoadString()</TT></A> functions. The last argument to these
functions is a callback function which is used to determine the value
type of each data node in an XML document.</P>
<P>Mini-XML defines several standard callbacks for simple XML data
files:</P>
<UL>
<LI><TT>MXML_INTEGER_CALLBACK</TT> - All data nodes contain
whitespace-separated integers.</LI>
<LI><TT>MXML_OPAQUE_CALLBACK</TT> - All data nodes contain opaque
strings (&quot;CDATA&quot;).</LI>
<LI><TT>MXML_REAL_CALLBACK</TT> - All data nodes contain
whitespace-separated floating-point numbers.</LI>
<LI><TT>MXML_TEXT_CALLBACK</TT> - All data nodes contain
whitespace-separated strings.</LI>
</UL>
<P>You can provide your own callback functions for more complex XML
documents. Your callback function will receive a pointer to the current
element node and must return the value type of the immediate children
for that element node: <TT>MXML_INTEGER</TT>, <TT>MXML_OPAQUE</TT>, <TT>
MXML_REAL</TT>, or <TT>MXML_TEXT</TT>. The function is called<I> after</I>
the element and its attributes have been read, so you can look at the
element name, attributes, and attribute values to determine the proper
value type to return.</P>
<!-- NEED 2in -->
<P>The following callback function looks for an attribute named &quot;type&quot;
or the element name to determine the value type for its child nodes:</P>
<PRE>
mxml_type_t
type_cb(mxml_node_t *node)
{
const char *type;
/*
* You can lookup attributes and/or use the
* element name, hierarchy, etc...
*/
type = mxmlElementGetAttr(node, &quot;type&quot;);
if (type == NULL)
type = node-&gt;value.element.name;
if (!strcmp(type, &quot;integer&quot;))
return (MXML_INTEGER);
else if (!strcmp(type, &quot;opaque&quot;))
return (MXML_OPAQUE);
else if (!strcmp(type, &quot;real&quot;))
return (MXML_REAL);
else
return (MXML_TEXT);
}
</PRE>
<P>To use this callback function, simply use the name when you call any
of the load functions:</P>
<PRE>
FILE *fp;
mxml_node_t *tree;
fp = fopen(&quot;filename.xml&quot;, &quot;r&quot;);
tree = mxmlLoadFile(NULL, fp, <B>type_cb</B>);
fclose(fp);
</PRE>
<H2><A name="SAVE_CALLBACKS">Save Callbacks</A></H2>
<P><A href="#LOAD_XML">Chapter 2</A> also introduced the <A href="reference.html#mxmlSaveFile">
<TT>mxmlSaveFile()</TT></A>, <A href="reference.html#mxmlSaveString"><TT>
mxmlSaveString()</TT></A>, and <A href="reference.html#mxmlSaveAllocString">
<TT>mxmlSaveAllocString()</TT></A> functions. The last argument to these
functions is a callback function which is used to automatically insert
whitespace in an XML document.</P>
<P>Your callback function will be called up to four times for each
element node with a pointer to the node and a &quot;where&quot; value of <TT>
MXML_WS_BEFORE_OPEN</TT>, <TT>MXML_WS_AFTER_OPEN</TT>, <TT>
MXML_WS_BEFORE_CLOSE</TT>, or <TT>MXML_WS_AFTER_CLOSE</TT>. The callback
function should return <TT>NULL</TT> if no whitespace should be added
and the string to insert (spaces, tabs, carriage returns, and newlines)
otherwise.</P>
<P>The following whitespace callback can be used to add whitespace to
XHTML output to make it more readable in a standard text editor:</P>
<PRE>
const char *
whitespace_cb(mxml_node_t *node,
int where)
{
const char *name;
/*
* We can conditionally break to a new line
* before or after any element. These are
* just common HTML elements...
*/
name = node-&gt;value.element.name;
if (!strcmp(name, &quot;html&quot;) ||
!strcmp(name, &quot;head&quot;) ||
!strcmp(name, &quot;body&quot;) ||
!strcmp(name, &quot;pre&quot;) ||
!strcmp(name, &quot;p&quot;) ||
!strcmp(name, &quot;h1&quot;) ||
!strcmp(name, &quot;h2&quot;) ||
!strcmp(name, &quot;h3&quot;) ||
!strcmp(name, &quot;h4&quot;) ||
!strcmp(name, &quot;h5&quot;) ||
!strcmp(name, &quot;h6&quot;))
{
/*
* Newlines before open and after
* close...
*/
if (where == MXML_WS_BEFORE_OPEN ||
where == MXML_WS_AFTER_CLOSE)
return (&quot;\n&quot;);
}
else if (!strcmp(name, &quot;dl&quot;) ||
!strcmp(name, &quot;ol&quot;) ||
!strcmp(name, &quot;ul&quot;))
{
/*
* Put a newline before and after list
* elements...
*/
return (&quot;\n&quot;);
}
else if (!strcmp(name, &quot;dd&quot;) ||
!strcmp(name, &quot;dt&quot;) ||
!strcmp(name, &quot;li&quot;))
{
/*
* Put a tab before &lt;li&gt;'s, * &lt;dd&gt;'s,
* and &lt;dt&gt;'s, and a newline after them...
*/
if (where == MXML_WS_BEFORE_OPEN)
return (&quot;\t&quot;);
else if (where == MXML_WS_AFTER_CLOSE)
return (&quot;\n&quot;);
}
/*
* Return NULL for no added whitespace...
*/
return (NULL);
}
</PRE>
<P>To use this callback function, simply use the name when you call any
of the save functions:</P>
<PRE>
FILE *fp;
mxml_node_t *tree;
fp = fopen(&quot;filename.xml&quot;, &quot;w&quot;);
mxmlSaveFile(tree, fp, <B>whitespace_cb</B>);
fclose(fp);
</PRE>
<!-- NEED 10 -->
<H2><A NAME="4_3">Custom Data Types</A></H2>
<P>Mini-XML supports custom data types via global load and save
callbacks. Only a single set of callbacks can be active at any time,
however your callbacks can store additional information in order to
support multiple custom data types as needed. The <TT>MXML_CUSTOM</TT>
node type identifies custom data nodes.</P>
<P>The load callback receives a pointer to the current data node and a
string of opaque character data from the XML source with character
entities converted to the corresponding UTF-8 characters. For example,
if we wanted to support a custom date/time type whose value is encoded
as &quot;yyyy-mm-ddThh:mm:ssZ&quot; (ISO format), the load callback would look
like the following:</P>
<PRE>
typedef struct
{
unsigned year, /* Year */
month, /* Month */
day, /* Day */
hour, /* Hour */
minute, /* Minute */
second; /* Second */
time_t unix; /* UNIX time */
} iso_date_time_t;
int
load_custom(mxml_node_t *node,
const char *data)
{
iso_date_time_t *dt;
struct tm tmdata;
/*
* Allocate data structure...
*/
dt = calloc(1, sizeof(iso_date_time_t));
/*
* Try reading 6 unsigned integers from the
* data string...
*/
if (sscanf(data, &quot;%u-%u-%uT%u:%u:%uZ&quot;,
&amp;(dt-&gt;year), &amp;(dt-&gt;month),
&amp;(dt-&gt;day), &amp;(dt-&gt;hour),
&amp;(dt-&gt;minute),
&amp;(dt-&gt;second)) != 6)
{
/*
* Unable to read numbers, free the data
* structure and return an error...
*/
free(dt);
return (-1);
}
/*
* Range check values...
*/
if (dt-&gt;month &lt;1 || dt-&gt;month &gt; 12 ||
dt-&gt;day &lt;1 || dt-&gt;day &gt; 31 ||
dt-&gt;hour &lt;0 || dt-&gt;hour &gt; 23 ||
dt-&gt;minute &lt;0 || dt-&gt;minute &gt; 59 ||
dt-&gt;second &lt;0 || dt-&gt;second &gt; 59)
{
/*
* Date information is out of range...
*/
free(dt);
return (-1);
}
/*
* Convert ISO time to UNIX time in
* seconds...
*/
tmdata.tm_year = dt-&gt;year - 1900;
tmdata.tm_mon = dt-&gt;month - 1;
tmdata.tm_day = dt-&gt;day;
tmdata.tm_hour = dt-&gt;hour;
tmdata.tm_min = dt-&gt;minute;
tmdata.tm_sec = dt-&gt;second;
dt-&gt;unix = gmtime(&amp;tmdata);
/*
* Assign custom node data and destroy
* function pointers...
*/
node-&gt;value.custom.data = dt;
node-&gt;value.custom.destroy = free;
/*
* Return with no errors...
*/
return (0);
}
</PRE>
<P>The function itself can return 0 on success or -1 if it is unable to
decode the custom data or the data contains an error. Custom data nodes
contain a <TT>void</TT> pointer to the allocated custom data for the
node and a pointer to a destructor function which will free the custom
data when the node is deleted.</P>
<P>The save callback receives the node pointer and returns an allocated
string containing the custom data value. The following save callback
could be used for our ISO date/time type:</P>
<PRE>
char *
save_custom(mxml_node_t *node)
{
char data[255];
iso_date_time_t *dt;
dt = (iso_date_time_t *)node-&gt;custom.data;
snprintf(data, sizeof(data),
&quot;%04u-%02u-%02uT%02u:%02u:%02uZ&quot;,
dt-&gt;year, dt-&gt;month, dt-&gt;day,
dt-&gt;hour, dt-&gt;minute, dt-&gt;second);
return (strdup(data));
}
</PRE>
<P>You register the callback functions using the <A href="reference.html#mxmlSetCustomHandlers">
<TT>mxmlSetCustomHandlers()</TT></A> function:</P>
<PRE>
mxmlSetCustomHandlers(<B>load_custom</B>,
<B>save_custom</B>);
</PRE>
<!-- NEED 20 -->
<H2><A NAME="4_4">Changing Node Values</A></H2>
<P>All of the examples so far have concentrated on creating and loading
new XML data nodes. Many applications, however, need to manipulate or
change the nodes during their operation, so Mini-XML provides functions
to change node values safely and without leaking memory.</P>
<P>Existing nodes can be changed using the <A href="reference.html#mxmlSetElement">
<TT>mxmlSetElement()</TT></A>, <A href="reference.html#mxmlSetInteger"><TT>
mxmlSetInteger()</TT></A>, <A href="reference.html#mxmlSetOpaque"><TT>
mxmlSetOpaque()</TT></A>, <A href="reference.html#mxmlSetReal"><TT>
mxmlSetReal()</TT></A>, <A href="reference.html#mxmlSetText"><TT>
mxmlSetText()</TT></A>, and <A href="reference.html#mxmlSetTextf"><TT>
mxmlSetTextf()</TT></A> functions. For example, use the following
function call to change a text node to contain the text &quot;new&quot; with
leading whitespace:</P>
<PRE>
mxml_node_t *node;
mxmlSetText(node, 1, &quot;new&quot;);
</PRE>
<H2><A NAME="4_5">Formatted Text</A></H2>
<P>The <A href="reference.html#mxmlNewTextf"><TT>mxmlNewTextf()</TT></A>
and <A href="reference.html#mxmlSetTextf"><TT>mxmlSetTextf()</TT></A>
functions create and change text nodes, respectively, using <TT>printf</TT>
-style format strings and arguments. For example, use the following
function call to create a new text node containing a constructed
filename:</P>
<PRE>
mxml_node_t *node;
node = mxmlNewTextf(node, 1, &quot;%s/%s&quot;,
path, filename);
</PRE>
<H2><A NAME="4_6">Indexing</A></H2>
<P>Mini-XML provides functions for managing indices of nodes. The
current implementation provides the same functionality as <A href="reference.html#mxmlFindElement">
<TT>mxmlFindElement()</TT></A>. The advantage of using an index is that
searching and enumeration of elements is significantly faster. The only
disadvantage is that each index is a static snapshot of the XML
document, so indices are not well suited to XML data that is updated
more often than it is searched. The overhead of creating an index is
approximately equal to walking the XML document tree. Nodes in the
index are sorted by element name and attribute value.</P>
<P>Indices are stored in <A href="reference.html#mxml_index_t"><TT>
mxml_index_t</TT></A> structures. The <A href="reference.html#mxmlIndexNew">
<TT>mxmlIndexNew()</TT></A> function creates a new index:</P>
<PRE>
mxml_node_t *tree;
mxml_index_t *ind;
ind = mxmlIndexNew(tree, &quot;element&quot;,
&quot;attribute&quot;);
</PRE>
<P>The first argument is the XML node tree to index. Normally this will
be a pointer to the <TT>?xml</TT> element.</P>
<P>The second argument contains the element to index; passing <TT>NULL</TT>
indexes all element nodes alphabetically.</P>
<P>The third argument contains the attribute to index; passing <TT>NULL</TT>
causes only the element name to be indexed.</P>
<P>Once the index is created, the <A href="reference.html#mxmlIndexEnum">
<TT>mxmlIndexEnum()</TT></A>, <A href="reference.html#mxmlIndexFind"><TT>
mxmlIndexFind()</TT></A>, and <A href="reference.html#mxmlIndexReset"><TT>
mxmlIndexReset()</TT></A> functions are used to access the nodes in the
index. The <A href="reference.html#mxmlIndexReset"><TT>mxmlIndexReset()</TT>
</A> function resets the &quot;current&quot; node pointer in the index, allowing
you to do new searches and enumerations on the same index. Typically
you will call this function prior to your calls to <A href="reference.html#mxmlIndexEnum">
<TT>mxmlIndexEnum()</TT></A> and <A href="reference.html#mxmlIndexFind"><TT>
mxmlIndexFind()</TT></A>.</P>
<P>The <A href="reference.html#mxmlIndexEnum"><TT>mxmlIndexEnum()</TT></A>
function enumerates each of the nodes in the index and can be used in a
loop as follows:</P>
<PRE>
mxml_node_t *node;
mxmlIndexReset(ind);
while ((node = mxmlIndexEnum(ind)) != NULL)
{
// do something with node
}
</PRE>
<P>The <A href="reference.html#mxmlIndexFind"><TT>mxmlIndexFind()</TT></A>
function locates the next occurrence of the named element and attribute
value in the index. It can be used to find all matching elements in an
index, as follows:</P>
<PRE>
mxml_node_t *node;
mxmlIndexReset(ind);
while ((node = mxmlIndexFind(ind, &quot;element&quot;,
&quot;attr-value&quot;))
!= NULL)
{
// do something with node
}
</PRE>
<P>The second and third arguments represent the element name and
attribute value, respectively. A <TT>NULL</TT> pointer is used to
return all elements or attributes in the index. Passing <TT>NULL</TT>
for both the element name and attribute value is equivalent to calling <TT>
mxmlIndexEnum</TT>.</P>
<P>When you are done using the index, delete it using the <A href="#mxmlIndexDelete()">
<TT>mxmlIndexDelete()</TT></A> function:</P>
<PRE>
mxmlIndexDelete(ind);
</PRE>
<H2><A NAME="4_7">SAX (Stream) Loading of Documents</A></H2>
<P>Mini-XML supports an implementation of the Simple API for XML (SAX)
which allows you to load and process an XML document as a stream of
nodes. Aside from allowing you to process XML documents of any size,
the Mini-XML implementation also allows you to retain portions of the
document in memory for later processing.</P>
<P>The <A href="#mxmlSAXLoad"><TT>mxmlSAXLoadFd</TT></A>, <A href="reference.html#mxmlSAXLoadFile">
<TT>mxmlSAXLoadFile</TT></A>, and <A href="reference.html#mxmlSAXLoadString">
<TT>mxmlSAXLoadString</TT></A> functions provide the SAX loading APIs.
Each function works like the corresponding <TT>mxmlLoad</TT> function
but uses a callback to process each node as it is read.</P>
<P>The callback function receives the node, an event code, and a user
data pointer you supply:</P>
<PRE>
void
sax_cb(mxml_node_t *node,
mxml_sax_event_t event,
void *data)
{
... do something ...
}
</PRE>
<P>The event will be one of the following:</P>
<UL>
<LI><TT>MXML_SAX_CDATA</TT> - CDATA was just read</LI>
<LI><TT>MXML_SAX_COMMENT</TT> - A comment was just read</LI>
<LI><TT>MXML_SAX_DATA</TT> - Data (custom, integer, opaque, real, or
text) was just read</LI>
<LI><TT>MXML_SAX_DIRECTIVE</TT> - A processing directive was just read</LI>
<LI><TT>MXML_SAX_ELEMENT_CLOSE</TT> - A close element was just read (<TT>
&lt;/element&gt;</TT>)</LI>
<LI><TT>MXML_SAX_ELEMENT_OPEN</TT> - An open element was just read (<TT>
&lt;element&gt;</TT>)</LI>
</UL>
<P>Elements are<EM> released</EM> after the close element is processed.
All other nodes are released after they are processed. The SAX callback
can<EM> retain</EM> the node using the <A href="reference.html#mxmlRetain">
<TT>mxmlRetain</TT></A> function. For example, the following SAX
callback will retain all nodes, effectively simulating a normal
in-memory load:</P>
<PRE>
void
sax_cb(mxml_node_t *node,
mxml_sax_event_t event,
void *data)
{
if (event != MXML_SAX_ELEMENT_CLOSE)
mxmlRetain(node);
}
</PRE>
<P>More typically the SAX callback will only retain a small portion of
the document that is needed for post-processing. For example, the
following SAX callback will retain the title and headings in an XHTML
file. It also retains the (parent) elements like <TT>&lt;html&gt;</TT>, <TT>
&lt;head&gt;</TT>, and <TT>&lt;body&gt;</TT>, and processing directives like <TT>
&lt;?xml ... ?&gt;</TT> and <TT>&lt;!DOCTYPE ... &gt;</TT>:</P>
<!-- NEED 10 -->
<PRE>
void
sax_cb(mxml_node_t *node,
mxml_sax_event_t event,
void *data)
{
if (event == MXML_SAX_ELEMENT_OPEN)
{
/*
* Retain headings and titles...
*/
char *name = node-&gt;value.element.name;
if (!strcmp(name, &quot;html&quot;) ||
!strcmp(name, &quot;head&quot;) ||
!strcmp(name, &quot;title&quot;) ||
!strcmp(name, &quot;body&quot;) ||
!strcmp(name, &quot;h1&quot;) ||
!strcmp(name, &quot;h2&quot;) ||
!strcmp(name, &quot;h3&quot;) ||
!strcmp(name, &quot;h4&quot;) ||
!strcmp(name, &quot;h5&quot;) ||
!strcmp(name, &quot;h6&quot;))
mxmlRetain(node);
}
else if (event == MXML_SAX_DIRECTIVE)
mxmlRetain(node);
else if (event == MXML_SAX_DATA &amp;&amp;
node-&gt;parent-&gt;ref_count &gt; 1)
{
/*
* If the parent was retained, then retain
* this data node as well.
*/
mxmlRetain(node);
}
}
</PRE>
<P>The resulting skeleton document tree can then be searched just like
one loaded using the <TT>mxmlLoad</TT> functions. For example, a filter
that reads an XHTML document from stdin and then shows the title and
headings in the document would look like:</P>
<PRE>
mxml_node_t *doc, *title, *body, *heading;
doc = mxmlSAXLoadFd(NULL, 0,
MXML_TEXT_CALLBACK,
<B>sax_cb</B>, NULL);
title = mxmlFindElement(doc, doc, &quot;title&quot;,
NULL, NULL,
MXML_DESCEND);
if (title)
print_children(title);
body = mxmlFindElement(doc, doc, &quot;body&quot;,
NULL, NULL,
MXML_DESCEND);
if (body)
{
for (heading = body-&gt;child;
heading;
heading = heading-&gt;next)
print_children(heading);
}
</PRE>
<HR NOSHADE>
<A HREF="index.html">Contents</A>
<A HREF="basics.html">Previous</A>
<A HREF="mxmldoc.html">Next</A>
</BODY>
</HTML>