From bf73da67822076ab6246871a7b15c778a11b0712 Mon Sep 17 00:00:00 2001
From: Michael R Sweet
 All of the examples so far have concentrated on creating and
@@ -483,5 +484,147 @@ function:
 mxmlIndexDelete(ind);
+Mini-XML supports an implementation of the Simple API for XML
+(SAX) which allows you to load and process an XML document as a
+stream of nodes. Aside from allowing you to process XML documents of
+any size, the Mini-XML implementation also allows you to retain
+portions of the document in memory for later processing.
+
+The mxmlSAXLoadFd, mxmlSAXLoadFile, and mxmlSAXLoadString functions
+provide the SAX loading APIs. Each function works like the
+corresponding mxmlLoad function but uses a callback to
+process each node as it is read.
+
+The callback function receives the node, an event code, and
+a user data pointer you supply:
+
+
+    void
+    sax_cb(mxml_node_t *node, mxml_sax_event_t event,
+           void *data)
+    {
+      ... do something ...
+    }
+
+
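+The event will be one of the following:
+
+    MXML_SAX_CDATA - CDATA was just read
+    MXML_SAX_COMMENT - A comment was just read
+    MXML_SAX_DATA - Data (custom, integer, opaque, real, or text) was just read
+    MXML_SAX_DIRECTIVE - A processing directive was just read
+    MXML_SAX_ELEMENT_CLOSE - A close element was just read
+    MXML_SAX_ELEMENT_OPEN - An open element was just read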
+
+Elements are released after the close element is
+processed. All other nodes are released after they are processed.
+The SAX callback can retain the node using the mxmlRetain function. For example,
+the following SAX callback will retain all nodes, effectively
+simulating a normal in-memory load:
+
+
+    void
+    sax_cb(mxml_node_t *node, mxml_sax_event_t event,
+           void *data)
+    {
+      if (event != MXML_SAX_ELEMENT_CLOSE)
+        mxmlRetain(node);
+    }
+
+
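As a rough illustration of the retain/release behavior described above, the following self-contained C sketch models a reference count with a stand-in `demo_node_t` type. The `demo_*` names are hypothetical and are not part of Mini-XML. The sketch only assumes what the text states: the loader holds one reference, a callback may add another, and a node is freed once no references remain.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a node; NOT Mini-XML's mxml_node_t. */
typedef struct demo_node_s
{
  int ref_count;                        /* Outstanding references */
} demo_node_t;

static int demo_live_nodes = 0;         /* Nodes currently allocated */

/* Create a node that starts with the loader's single reference. */
static demo_node_t *
demo_new(void)
{
  demo_node_t *node = calloc(1, sizeof(demo_node_t));

  assert(node != NULL);
  node->ref_count = 1;
  demo_live_nodes ++;

  return (node);
}

/* Add a reference, as a callback would to keep a node; returns the new count. */
static int
demo_retain(demo_node_t *node)
{
  assert(node != NULL);

  return (++ node->ref_count);
}

/* Drop a reference and free the node when none remain; returns the new count. */
static int
demo_release(demo_node_t *node)
{
  int count;

  assert(node != NULL && node->ref_count > 0);

  count = -- node->ref_count;

  if (count == 0)
  {
    free(node);
    demo_live_nodes --;
  }

  return (count);
}
```

In this model a node that the callback retained has a count greater than 1 while the loader still holds its reference, and it survives the loader's release with a count of 1.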
+More typically, the SAX callback will retain only the small portion
+of the document that is needed for post-processing. For example, the
+following SAX callback will retain the title and headings in an
+XHTML file. It also retains the (parent) elements like <html>, <head>, and <body>, and processing
+directives like <?xml ... ?> and <!DOCTYPE ... >:
+
+
+    void
+    sax_cb(mxml_node_t *node,
+           mxml_sax_event_t event,
+           void *data)
+    {
+      if (event == MXML_SAX_ELEMENT_OPEN)
+      {
+       /*
+        * Retain headings and titles...
+        */
+
+        const char *name = node->value.element.name;
+
+        if (!strcmp(name, "html") ||
+            !strcmp(name, "head") ||
+            !strcmp(name, "title") ||
+            !strcmp(name, "body") ||
+            !strcmp(name, "h1") ||
+            !strcmp(name, "h2") ||
+            !strcmp(name, "h3") ||
+            !strcmp(name, "h4") ||
+            !strcmp(name, "h5") ||
+            !strcmp(name, "h6"))
+          mxmlRetain(node);
+      }
+      else if (event == MXML_SAX_DIRECTIVE)
+        mxmlRetain(node);
+      else if (event == MXML_SAX_DATA &&
+               node->parent &&
+               node->parent->ref_count > 1)
+      {
+       /*
+        * If the parent exists and was retained, then
+        * retain this data node as well.
+        */
+
+        mxmlRetain(node);
+      }
+    }
+
+
+The resulting skeleton document tree can then be searched just
+like one loaded using the mxmlLoad functions. For example,
+a filter that reads an XHTML document from stdin and then shows the
+title and headings in the document would look like:
+
+
+    mxml_node_t *doc, *title, *body, *heading;
+
+    doc = mxmlSAXLoadFd(NULL, 0,
+                        MXML_TEXT_CALLBACK,
+                        sax_cb, NULL);
+
+    title = mxmlFindElement(doc, doc, "title",
+                            NULL, NULL,
+                            MXML_DESCEND);
+
+    if (title)
+      print_children(title);
+
+    body = mxmlFindElement(doc, doc, "body",
+                           NULL, NULL,
+                           MXML_DESCEND);
+
+    if (body)
+    {
+      for (heading = body->child;
+           heading;
+           heading = heading->next)
+        print_children(heading);
+    }
+
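The filter above calls `print_children`, which this section does not define. One plausible shape for such a helper is to walk the retained text fragments under an element and join them with single spaces. The sketch below shows that idea against a hypothetical stand-in type, `demo_text_t`, rather than Mini-XML's real node structure, and it writes into a buffer instead of printing so the result can be checked. All `demo_*` names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a text node; NOT Mini-XML's mxml_node_t. */
typedef struct demo_text_s
{
  struct demo_text_s *next;             /* Next sibling, or NULL */
  int                whitespace;        /* Non-zero if whitespace precedes the text */
  const char         *string;           /* Text fragment */
} demo_text_t;

/* Join a sibling list of fragments into buffer; returns the length written. */
static size_t
demo_join_text(const demo_text_t *first, char *buffer, size_t bufsize)
{
  size_t            used = 0;           /* Bytes written so far */
  const demo_text_t *node;              /* Current fragment */

  assert(buffer != NULL && bufsize > 0);
  buffer[0] = '\0';

  for (node = first; node != NULL; node = node->next)
  {
    /* Separate fragments with one space, but never start with one. */
    int n = snprintf(buffer + used, bufsize - used, "%s%s",
                     (node->whitespace && used > 0) ? " " : "",
                     node->string);

    if (n < 0 || (size_t)n >= bufsize - used)
    {
      /* The fragment does not fit: discard the partial copy and stop. */
      buffer[used] = '\0';
      break;
    }

    used += (size_t)n;
  }

  return (used);
}
```

A real helper would iterate over the element's child nodes from the loaded tree and would likely print each fragment directly; the buffer here is only a convenience for checking the output.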