mirror of
https://github.com/michaelrsweet/mxml.git
synced 2025-05-11 15:32:08 +00:00
165 lines
5.7 KiB
HTML
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<HTML>
<HEAD>
<TITLE>Mini-XML Programmers Manual, Version 2.3</TITLE>
<META NAME="author" CONTENT="Michael R. Sweet">
<META NAME="copyright" CONTENT="Copyright 2003-2007">
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; CHARSET=iso-8859-1">
<LINK REL="Start" HREF="index.html">
<LINK REL="Contents" HREF="index.html">
<LINK REL="Prev" HREF="Indexing.html">
<LINK REL="Next" HREF="UsingthemxmldocUtility.html">
<STYLE TYPE="text/css"><!--
BODY { font-family: sans-serif }
H1 { font-family: sans-serif }
H2 { font-family: sans-serif }
H3 { font-family: sans-serif }
H4 { font-family: sans-serif }
H5 { font-family: sans-serif }
H6 { font-family: sans-serif }
SUB { font-size: smaller }
SUP { font-size: smaller }
PRE { font-family: monospace }
A { text-decoration: none }
--></STYLE>
</HEAD>
<BODY>
<A HREF="index.html">Contents</A>
<A HREF="Indexing.html">Previous</A>
<A HREF="UsingthemxmldocUtility.html">Next</A>
<HR NOSHADE>
<H2><A NAME="4_7">SAX (Stream) Loading of Documents</A></H2>
<P>Mini-XML supports an implementation of the Simple API for XML (SAX),
which allows you to load and process an XML document as a stream of
nodes. Aside from allowing you to process XML documents of any size,
the Mini-XML implementation also allows you to retain portions of the
document in memory for later processing.</P>
<P>The <A href="#mxmlSAXLoad"><TT>mxmlSAXLoadFd</TT></A>, <A href="MiniXML23mxmlSAXLoadFile.html#mxmlSAXLoadFile">
<TT>mxmlSAXLoadFile</TT></A>, and <A href="MiniXML23mxmlSAXLoadString.html#mxmlSAXLoadString">
<TT>mxmlSAXLoadString</TT></A> functions provide the SAX loading APIs.
Each function works like the corresponding <TT>mxmlLoad</TT> function
but uses a callback to process each node as it is read.</P>
<P>The callback function receives the node, an event code, and a user
data pointer you supply:</P>
<PRE>
    void
    sax_cb(mxml_node_t *node,
           mxml_sax_event_t event,
           void *data)
    {
      ... do something ...
    }
</PRE>
<P>The event will be one of the following:</P>
<UL>
<LI><TT>MXML_SAX_CDATA</TT> - CDATA was just read</LI>
<LI><TT>MXML_SAX_COMMENT</TT> - A comment was just read</LI>
<LI><TT>MXML_SAX_DATA</TT> - Data (custom, integer, opaque, real, or
text) was just read</LI>
<LI><TT>MXML_SAX_DIRECTIVE</TT> - A processing directive was just read</LI>
<LI><TT>MXML_SAX_ELEMENT_CLOSE</TT> - A close element was just read (<TT>
&lt;/element&gt;</TT>)</LI>
<LI><TT>MXML_SAX_ELEMENT_OPEN</TT> - An open element was just read (<TT>
&lt;element&gt;</TT>)</LI>
</UL>
<P>Elements are <EM>released</EM> after the close element is processed.
All other nodes are released after they are processed. The SAX callback
can <EM>retain</EM> the node using the <A href="MiniXML23mxmlRetain.html#mxmlRetain">
<TT>mxmlRetain</TT></A> function. For example, the following SAX
callback will retain all nodes, effectively simulating a normal
in-memory load:</P>
<PRE>
    void
    sax_cb(mxml_node_t *node,
           mxml_sax_event_t event,
           void *data)
    {
      if (event != MXML_SAX_ELEMENT_CLOSE)
        mxmlRetain(node);
    }
</PRE>
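<P>The retain and release behavior described above is ordinary reference
counting. The following self-contained sketch models the idea with a
hypothetical <TT>demo_node_t</TT> type and hypothetical <TT>demo_new</TT>,
<TT>demo_retain</TT>, and <TT>demo_release</TT> helpers. It does not use or
reproduce Mini-XML's internals; it only assumes that a node starts with a
single reference, which the loader drops once the callback returns:</P>

```c
#include <stdlib.h>

/* Hypothetical stand-in for a node; this is NOT the Mini-XML node type. */
typedef struct demo_node_s
{
  int ref_count;                /* Number of outstanding references */
} demo_node_t;

/* Create a node that holds the loader's single initial reference. */
demo_node_t *
demo_new(void)
{
  demo_node_t *node = calloc(1, sizeof(demo_node_t));

  if (node)
    node->ref_count = 1;

  return (node);
}

/* Add a reference, as a callback would do to keep the node. */
int
demo_retain(demo_node_t *node)
{
  return (++node->ref_count);
}

/* Drop a reference and free the node when none remain.
 * Returns the number of references still outstanding. */
int
demo_release(demo_node_t *node)
{
  int remaining = --node->ref_count;

  if (remaining == 0)
    free(node);

  return (remaining);
}
```

<P>In this model, a node that the callback retains survives the loader's
release because its count only drops from 2 to 1, while a node that is not
retained drops to 0 and is freed.</P>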
<P>More typically, the SAX callback will retain only the small portion of
the document that is needed for post-processing. For example, the
following SAX callback will retain the title and headings in an XHTML
file. It also retains parent elements such as <TT>&lt;html&gt;</TT>, <TT>
&lt;head&gt;</TT>, and <TT>&lt;body&gt;</TT>, and processing directives such as <TT>
&lt;?xml ... ?&gt;</TT> and <TT>&lt;!DOCTYPE ... &gt;</TT>:</P>

<!-- NEED 10 -->
<PRE>
    #include &lt;string.h&gt;   /* strcmp() */

    void
    sax_cb(mxml_node_t *node,
           mxml_sax_event_t event,
           void *data)
    {
      if (event == MXML_SAX_ELEMENT_OPEN)
      {
       /*
        * Retain headings and titles...
        */

        char *name = node-&gt;value.element.name;

        if (!strcmp(name, "html") ||
            !strcmp(name, "head") ||
            !strcmp(name, "title") ||
            !strcmp(name, "body") ||
            !strcmp(name, "h1") ||
            !strcmp(name, "h2") ||
            !strcmp(name, "h3") ||
            !strcmp(name, "h4") ||
            !strcmp(name, "h5") ||
            !strcmp(name, "h6"))
          mxmlRetain(node);
      }
      else if (event == MXML_SAX_DIRECTIVE)
        mxmlRetain(node);
      else if (event == MXML_SAX_DATA &amp;&amp;
               node-&gt;parent &amp;&amp;
               node-&gt;parent-&gt;ref_count &gt; 1)
      {
       /*
        * If the parent was retained, then retain
        * this data node as well.  The parent check
        * is a defensive guard for parentless data.
        */

        mxmlRetain(node);
      }
    }
</PRE>
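<P>The chain of <TT>strcmp</TT> calls in the callback above can also be
written as a table lookup, which may make the set of retained elements
easier to extend. This is a minimal sketch; <TT>is_retained_element</TT>
and the <TT>retained_names</TT> table are hypothetical names introduced
here for illustration and are not part of Mini-XML:</P>

```c
#include <stddef.h>
#include <string.h>

/* Element names to keep; a hypothetical table mirroring the callback. */
static const char * const retained_names[] =
{
  "html", "head", "title", "body",
  "h1", "h2", "h3", "h4", "h5", "h6"
};

/* Return 1 if "name" is in the table, 0 otherwise (case-sensitive). */
int
is_retained_element(const char *name)
{
  size_t i;

  if (!name)
    return (0);

  for (i = 0; i < sizeof(retained_names) / sizeof(retained_names[0]); i ++)
  {
    if (!strcmp(name, retained_names[i]))
      return (1);
  }

  return (0);
}
```

<P>With such a helper, the <TT>MXML_SAX_ELEMENT_OPEN</TT> branch of the
callback would reduce to a single test of <TT>is_retained_element(name)</TT>
before calling <TT>mxmlRetain</TT>.</P>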
<P>The resulting skeleton document tree can then be searched just like
one loaded using the <TT>mxmlLoad</TT> functions. For example, a filter
that reads an XHTML document from stdin and then shows the title and
headings in the document would look like:</P>
<PRE>
    mxml_node_t *doc, *title, *body, *heading;

    doc = mxmlSAXLoadFd(NULL, 0,
                        MXML_TEXT_CALLBACK,
                        <B>sax_cb</B>, NULL);

    title = mxmlFindElement(doc, doc, "title",
                            NULL, NULL,
                            MXML_DESCEND);

    if (title)
      print_children(title);

    body = mxmlFindElement(doc, doc, "body",
                           NULL, NULL,
                           MXML_DESCEND);

    if (body)
    {
      for (heading = body-&gt;child;
           heading;
           heading = heading-&gt;next)
        print_children(heading);
    }
</PRE>
<HR NOSHADE>
<A HREF="index.html">Contents</A>
<A HREF="Indexing.html">Previous</A>
<A HREF="UsingthemxmldocUtility.html">Next</A>
</BODY>
</HTML>