
SAX (Stream) Loading of Documents

Mini-XML supports an implementation of the Simple API for XML (SAX) which allows you to load and process an XML document as a stream of nodes. Aside from allowing you to process XML documents of any size, the Mini-XML implementation also allows you to retain portions of the document in memory for later processing.

The mxmlSAXLoadFd, mxmlSAXLoadFile, and mxmlSAXLoadString functions provide the SAX loading APIs. Each function works like the corresponding mxmlLoad function but uses a callback to process each node as it is read.

The callback function receives the node, an event code, and a user data pointer you supply:

    void
    sax_cb(mxml_node_t *node,
           mxml_sax_event_t event,
           void *data)
    {
      /* ... do something ... */
    }

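The event will be one of the following: MXML_SAX_CDATA (CDATA was just read), MXML_SAX_COMMENT (a comment was just read), MXML_SAX_DATA (data such as custom, integer, opaque, real, or text was just read), MXML_SAX_DIRECTIVE (a processing directive was just read), MXML_SAX_ELEMENT_CLOSE (a close element was just read), or MXML_SAX_ELEMENT_OPEN (an open element was just read).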

Elements are released after the close element is processed. All other nodes are released after they are processed. The SAX callback can retain the node using the mxmlRetain function. For example, the following SAX callback will retain all nodes, effectively simulating a normal in-memory load:

    void
    sax_cb(mxml_node_t *node,
           mxml_sax_event_t event,
           void *data)
    {
      if (event != MXML_SAX_ELEMENT_CLOSE)
        mxmlRetain(node);
    }

More typically the SAX callback will only retain a small portion of the document that is needed for post-processing. For example, the following SAX callback will retain the title and headings in an XHTML file. It also retains the (parent) elements like <html>, <head>, and <body>, and processing directives like <?xml ... ?> and <!DOCTYPE ... >:

    void
    sax_cb(mxml_node_t *node,
           mxml_sax_event_t event,
           void *data)
    {
      if (event == MXML_SAX_ELEMENT_OPEN)
      {
       /*
        * Retain headings and titles...
        */

        char *name = node->value.element.name;

        if (!strcmp(name, "html") ||
            !strcmp(name, "head") ||
            !strcmp(name, "title") ||
            !strcmp(name, "body") ||
            !strcmp(name, "h1") ||
            !strcmp(name, "h2") ||
            !strcmp(name, "h3") ||
            !strcmp(name, "h4") ||
            !strcmp(name, "h5") ||
            !strcmp(name, "h6"))
          mxmlRetain(node);
      }
      else if (event == MXML_SAX_DIRECTIVE)
        mxmlRetain(node);
      else if (event == MXML_SAX_DATA &&
               node->parent &&
               node->parent->ref_count > 1)
      {
       /*
        * If the parent was retained, then retain
        * this data node as well.
        */

        mxmlRetain(node);
      }
    }

The resulting skeleton document tree can then be searched just like one loaded using the mxmlLoad functions. For example, a filter that reads an XHTML document from stdin and then shows the title and headings in the document would look like:

    mxml_node_t *doc, *title, *body, *heading;

    doc = mxmlSAXLoadFd(NULL, 0,
                        MXML_TEXT_CALLBACK,
                        sax_cb, NULL);

    title = mxmlFindElement(doc, doc, "title",
                            NULL, NULL,
                            MXML_DESCEND);

    if (title)
      print_children(title);

    body = mxmlFindElement(doc, doc, "body",
                           NULL, NULL,
                           MXML_DESCEND);

    if (body)
    {
      for (heading = body->child;
           heading;
           heading = heading->next)
        print_children(heading);
    }
