cpp-peglib

C++11 header-only PEG (Parsing Expression Grammars) library.

cpp-peglib tries to provide a more expressive parsing experience than common regular expression libraries such as std::regex. The library consists of a single header file, so you can start using it right away just by including peglib.h in your project.

The PEG syntax is well described on page 2 in the document.

How to use

What if we want to extract only the tag names in brackets from [tag1] [tag2] [tag3] [tag4]...? This is a bit hard to do with std::regex, since it doesn't support repeated captures: a capture group inside a repetition keeps only its last match. PEG, however, handles the repetition easily.
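To make the limitation concrete, here is a small sketch that uses only the standard library. The function name last_captured_tag and the regular expression are illustrative choices, not something taken from cpp-peglib.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: matches the whole input against a pattern whose
// capture group sits inside a repetition, then returns what group 1 holds.
// Only the last iteration of the group is kept by std::regex.
std::string last_captured_tag(const std::string& text) {
    static const std::regex re(R"(\s*(?:\[([^\]]+)\]\s*)*)");
    std::smatch m;
    if (!std::regex_match(text, m, re)) return "";
    return m[1].str();
}
```

Calling last_captured_tag("[tag1] [tag2] [tag3]") yields only the final tag name; the earlier ones are not retained. Collecting every tag with std::regex needs a separate search loop, for example with std::sregex_iterator.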

A PEG grammar for this task could look like this:

ROOT      <-  _ ('[' TAG_NAME ']' _)*
TAG_NAME  <-  (!']' .)+
_         <-  [ \t]*

Here is how to parse text with the PEG syntax and retrieve tag names:

// (1) Include the header file
#include "peglib.h"
#include <cassert>
#include <string>
#include <vector>

int main() {
    // (2) Make a parser
    auto parser = peglib::make_parser(R"(
        ROOT      <-  _ ('[' TAG_NAME ']' _)*
        TAG_NAME  <-  (!']' .)+
        _         <-  [ \t]*
    )");

    // (3) Setup an action
    std::vector<std::string> tags;
    parser["TAG_NAME"] = [&](const char* s, size_t l) {
        tags.push_back(std::string(s, l));
    };

    // (4) Parse
    auto ret = parser.parse(" [tag1] [tag:2] [tag-3] ");

    assert(ret     == true);
    assert(tags[0] == "tag1");
    assert(tags[1] == "tag:2");
    assert(tags[2] == "tag-3");
}

You may have a question regarding '(3) Setup an action'. When the parser recognizes the definition 'TAG_NAME', it calls the action [&](const char* s, size_t l), where s and l give the start and the length of the matched string, so that the user can do something with that string.
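To illustrate the callback idea without the library, the following is a minimal hand-written sketch of the same grammar. The names TagAction, parse_tags and collect_tags are hypothetical and not part of peglib. The sketch also assumes that leftover input counts as a failure, which may differ from what the library does.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper, not part of peglib: a hand-written matcher for
//   ROOT <- _ ('[' TAG_NAME ']' _)*   TAG_NAME <- (!']' .)+   _ <- [ \t]*
// It calls `action` with (pointer, length) each time TAG_NAME matches.
using TagAction = std::function<void(const char* s, size_t l)>;

bool parse_tags(const std::string& text, const TagAction& action) {
    const char* p = text.data();
    const char* end = p + text.size();

    auto skip_blanks = [&] {                      // _ <- [ \t]*
        while (p < end && (*p == ' ' || *p == '\t')) ++p;
    };

    skip_blanks();
    while (p < end && *p == '[') {                // ('[' TAG_NAME ']' _)*
        const char* start = ++p;
        while (p < end && *p != ']') ++p;         // TAG_NAME <- (!']' .)+
        if (p == start || p == end) return false; // empty name or missing ']'
        action(start, static_cast<size_t>(p - start));
        ++p;                                      // consume ']'
        skip_blanks();
    }
    return p == end;  // assumption: succeed only if all input was consumed
}

// Convenience wrapper that gathers the matched names into a vector.
std::vector<std::string> collect_tags(const std::string& text) {
    std::vector<std::string> tags;
    parse_tags(text, [&](const char* s, size_t l) {
        tags.push_back(std::string(s, l));
    });
    return tags;
}
```

The action receives a pointer and a length instead of a std::string, so the matcher does not allocate unless the action decides to copy the text.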

We can do more with actions.

#include <peglib.h>
#include <assert.h>

using namespace peglib;
using namespace std;

int main(void) {
  auto syntax = R"(
  # Grammar for Calculator...
  Additive  <- Multitive '+' Additive / Multitive
  Multitive <- Primary '*' Multitive / Primary
  Primary   <- '(' Additive ')' / Number
  Number    <- [0-9]+
  )";

  auto parser = make_parser(syntax);

  parser["Additive"] = {
    nullptr, // Default action
    [](const vector<Any>& v) { return (int)v[0] + (int)v[1]; }, // For 1st choice
    [](const vector<Any>& v) { return v[0]; } // For 2nd choice
  };
  parser["Multitive"] = {
    nullptr,
    [](const vector<Any>& v) { return (int)v[0] * (int)v[1]; },
    [](const vector<Any>& v) { return v[0]; }
  };
  parser["Primary"] = [](const vector<Any>& v) { return v.size() == 1 ? v[0] : v[1]; };
  parser["Number"] = [](const char* s, size_t l) { return stoi(string(s, l), nullptr, 10); };

  int val;
  parser.parse("1+2*3", val);

  assert(val == 7);
}

Here is a more complex example:

// Calculator example
using namespace peglib;
using namespace std;

auto parser = make_parser(R"(
    # Grammar for Calculator...
    EXPRESSION       <-  TERM (TERM_OPERATOR TERM)*
    TERM             <-  FACTOR (FACTOR_OPERATOR FACTOR)*
    FACTOR           <-  NUMBER / '(' EXPRESSION ')'
    TERM_OPERATOR    <-  [-+]
    FACTOR_OPERATOR  <-  [/*]
    NUMBER           <-  [0-9]+
)");

auto reduce = [](const vector<Any>& v) -> long {
    long ret = v[0].get<long>();
    for (auto i = 1u; i < v.size(); i += 2) {
        auto num = v[i + 1].get<long>();
        switch (v[i].get<char>()) {
            case '+': ret += num; break;
            case '-': ret -= num; break;
            case '*': ret *= num; break;
            case '/': ret /= num; break;
        }
    }
    return ret;
};

parser["EXPRESSION"]      = reduce;
parser["TERM"]            = reduce;
parser["TERM_OPERATOR"]   = [](const char* s, size_t l) { return (char)*s; };
parser["FACTOR_OPERATOR"] = [](const char* s, size_t l) { return (char)*s; };
parser["NUMBER"]          = [](const char* s, size_t l) { return stol(string(s, l), nullptr, 10); };

long val;
auto ret = parser.parse("1+2*3*(4-5+6)/7-8", val);

assert(ret == true);
assert(val == -3);

It may be helpful to keep in mind that the action behavior is similar to the YACC semantic action model ($$, $1, $2, ...).

In this example, the actions return values. These semantic values are pushed up to the parent definition, where they can be referred to in the parent action [](const vector<Any>& v). In other words, when a definition has been accepted, all semantic values associated with its child definitions are available in const vector<Any>& v. The values are wrapped in the peglib::Any class, which is similar to boost::any. We can retrieve a value with the get<T> method, where T is the actual type of the value. If an action returns no value, an undefined Any is pushed up to the parent. Finally, the resulting value of the root definition is received in the out parameter of the parser's parse method; long val is the resulting value in this case.
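The way values travel from child to parent can be sketched with a hand-written evaluator for the same calculator grammar. This does not use peglib; the names Calc and evaluate are hypothetical, and plain long values stand in for peglib::Any. Each member function plays the role of one definition and hands its result back to its caller, much as an action hands a semantic value to the parent definition.

```cpp
#include <cstddef>
#include <string>

// Hypothetical hand-written evaluator; peglib is not used here.
// Division by zero is not guarded, as in the example above.
struct Calc {
    const std::string& src;
    size_t pos;

    // NUMBER <- [0-9]+
    bool number(long& out) {
        size_t start = pos;
        while (pos < src.size() && src[pos] >= '0' && src[pos] <= '9') ++pos;
        if (pos == start) return false;
        out = std::stol(src.substr(start, pos - start));
        return true;
    }

    // FACTOR <- NUMBER / '(' EXPRESSION ')'
    bool factor(long& out) {
        if (number(out)) return true;
        size_t save = pos;
        if (pos < src.size() && src[pos] == '(') {
            ++pos;
            if (expression(out) && pos < src.size() && src[pos] == ')') {
                ++pos;
                return true;
            }
        }
        pos = save;  // backtrack when the second choice fails
        return false;
    }

    // TERM <- FACTOR (FACTOR_OPERATOR FACTOR)*
    bool term(long& out) {
        if (!factor(out)) return false;
        for (;;) {
            size_t save = pos;
            if (pos >= src.size() || (src[pos] != '*' && src[pos] != '/')) return true;
            char op = src[pos++];
            long rhs = 0;
            if (!factor(rhs)) { pos = save; return true; }
            out = (op == '*') ? out * rhs : out / rhs;
        }
    }

    // EXPRESSION <- TERM (TERM_OPERATOR TERM)*
    bool expression(long& out) {
        if (!term(out)) return false;
        for (;;) {
            size_t save = pos;
            if (pos >= src.size() || (src[pos] != '+' && src[pos] != '-')) return true;
            char op = src[pos++];
            long rhs = 0;
            if (!term(rhs)) { pos = save; return true; }
            out = (op == '+') ? out + rhs : out - rhs;
        }
    }
};

// Evaluates the whole input; fails if any input is left over.
bool evaluate(const std::string& text, long& result) {
    Calc c{text, 0};
    return c.expression(result) && c.pos == text.size();
}
```

The loops in term and expression fold operands left to right, which corresponds to what the reduce action does with the alternating values and operators in v.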

The following user action signatures are available:

[](const char* s, size_t l, const std::vector<peglib::Any>& v, const std::vector<std::string>& n)
[](const char* s, size_t l, const std::vector<peglib::Any>& v)
[](const char* s, size_t l)
[](const std::vector<peglib::Any>& v, const std::vector<std::string>& n)
[](const std::vector<peglib::Any>& v)
[]()

const std::vector<std::string>& n holds the names of the child definitions, which is helpful when we want to check which child definitions were actually matched.

Make a parser with parser operators

Instead of making a parser by parsing PEG syntax text, we can also construct a parser by hand with parser operators. Here is an example:

using namespace peglib;
using namespace std;

vector<string> tags;

Definition ROOT, TAG_NAME, _;
ROOT     = seq(_, zom(seq(chr('['), TAG_NAME, chr(']'), _)));
TAG_NAME = oom(seq(npd(chr(']')), any())), [&](const char* s, size_t l) { tags.push_back(string(s, l)); };
_        = zom(cls(" \t"));

auto ret = ROOT.parse(" [tag1] [tag:2] [tag-3] ");

In fact, the PEG parser generator itself is built with the parser operators. You can see the code in the make_peg_grammar function in peglib.h.

The following operators are available:

Operator  Description
seq       Sequence
cho       Prioritized Choice
grp       Grouping
zom       Zero or More
oom       One or More
opt       Optional
apd       And predicate
npd       Not predicate
lit       Literal string
cls       Character class
chr       Character
any       Any character
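To give a feel for how operators like these can compose, here is a small self-contained sketch. It borrows a few operator names from the table but is not peglib's implementation. The Parser type, the convention of returning the matched length or -1, and the match_tags helper are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Illustrative sketch only, NOT peglib's implementation.
// A Parser returns the number of characters matched, or -1 on failure.
using Parser = std::function<long(const char* s, size_t n)>;

Parser chr(char c) {  // a single given character
    return [c](const char* s, size_t n) -> long { return (n > 0 && *s == c) ? 1 : -1; };
}
Parser any() {        // any single character
    return [](const char*, size_t n) -> long { return n > 0 ? 1 : -1; };
}
Parser cls(std::string chars) {  // one character out of a set
    return [chars](const char* s, size_t n) -> long {
        return (n > 0 && chars.find(*s) != std::string::npos) ? 1 : -1;
    };
}
Parser seq(std::vector<Parser> ps) {  // all parsers, one after another
    return [ps](const char* s, size_t n) -> long {
        size_t used = 0;
        for (const auto& p : ps) {
            long m = p(s + used, n - used);
            if (m < 0) return -1;
            used += static_cast<size_t>(m);
        }
        return static_cast<long>(used);
    };
}
Parser zom(Parser p) {  // zero or more
    return [p](const char* s, size_t n) -> long {
        size_t used = 0;
        for (;;) {
            long m = p(s + used, n - used);
            if (m <= 0) return static_cast<long>(used);  // failure or empty match ends the loop
            used += static_cast<size_t>(m);
        }
    };
}
Parser oom(Parser p) {  // one or more
    Parser rest = zom(p);
    return [p, rest](const char* s, size_t n) -> long {
        long first = p(s, n);
        if (first < 0) return -1;
        return first + rest(s + first, n - static_cast<size_t>(first));
    };
}
Parser npd(Parser p) {  // not predicate: succeeds without consuming input
    return [p](const char* s, size_t n) -> long { return p(s, n) < 0 ? 0 : -1; };
}

// Hypothetical helper: length matched by ROOT for the tag grammar, or -1.
long match_tags(const std::string& text) {
    Parser blanks   = zom(cls(" \t"));
    Parser tag_name = oom(seq({npd(chr(']')), any()}));
    Parser root     = seq({blanks, zom(seq({chr('['), tag_name, chr(']'), blanks}))});
    return root(text.data(), text.size());
}
```

Because zom stops at the first failing iteration and keeps what was matched so far, an unterminated tag at the end of the input is simply left unconsumed instead of failing the whole match.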

Sample codes

Tested Compilers

  • Visual Studio 2013
  • Clang 3.5

TODO

  • Linear-time parsing (Packrat parsing)
  • Optimization of grammars
  • Unicode support

Other C++ PEG parser libraries

Thanks to the authors of the libraries that inspired cpp-peglib.

  • Boost Spirit X3 - A set of C++ libraries for parsing and output generation implemented as Domain Specific Embedded Languages (DSEL) using Expression templates and Template Meta-Programming
  • PEGTL - Parsing Expression Grammar Template Library
  • lars::Parser - A header-only linear-time c++ parsing expression grammar (PEG) parser generator supporting left-recursion and grammar ambiguity

License

MIT license (© 2015 Yuji Hirose)