cpp-peglib
==========

C++11 header-only [PEG](http://en.wikipedia.org/wiki/Parsing_expression_grammar) (Parsing Expression Grammars) library.

*cpp-peglib* tries to provide a more expressive parsing experience in a simple way. The library consists of a single header file, so you can start using it right away just by including `peglib.h` in your project.

The PEG syntax is well described on page 2 in the [document](http://www.brynosaurus.com/pub/lang/peg.pdf). *cpp-peglib* also supports the following additional syntax for now:

  * `<` ... `>` (Token boundary operator)
  * `~` (Ignore operator)
  * `\x20` (Hex number char)
  * `$<` ... `>` (Capture operator)
  * `$name<` ... `>` (Named capture operator)

This library also supports linear-time parsing, known as [*Packrat*](http://pdos.csail.mit.edu/~baford/packrat/thesis/thesis.pdf) parsing.
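
To illustrate the idea behind packrat parsing, here is a minimal, self-contained sketch that does not use *cpp-peglib* at all. It hand-codes a recognizer for the calculator grammar used later in this document and memoizes each `(rule, position)` result, so a rule that is retried at the same position after backtracking is answered from the cache. The names `PackratRecognizer`, `kFail`, and `cache_hits` are hypothetical and exist only for this example; the library's internal implementation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Sentinel returned by a rule that fails to match (hypothetical name).
const std::size_t kFail = static_cast<std::size_t>(-1);

// Recognizer for:
//   Additive  <- Multitive '+' Additive / Multitive
//   Multitive <- Primary '*' Multitive / Primary
//   Primary   <- '(' Additive ')' / Number
//   Number    <- [0-9]+
// Each rule returns the position just past the match, or kFail.
class PackratRecognizer {
public:
    explicit PackratRecognizer(const std::string& input) : in_(input), hits_(0) {}

    // True if the whole input matches Additive.
    bool accepts() { return additive(0) == in_.size(); }

    // Number of times a memoized result was reused.
    std::size_t cache_hits() const { return hits_; }

private:
    enum Rule { ADDITIVE, MULTITIVE, PRIMARY };
    typedef std::size_t (PackratRecognizer::*RuleFn)(std::size_t);

    // Look up (rule, pos) in the cache; evaluate and store it on a miss.
    std::size_t memo(Rule rule, std::size_t pos, RuleFn fn) {
        assert(pos <= in_.size());
        const std::pair<int, std::size_t> key(rule, pos);
        auto it = cache_.find(key);
        if (it != cache_.end()) { ++hits_; return it->second; }
        const std::size_t end = (this->*fn)(pos);
        cache_[key] = end;
        return end;
    }

    std::size_t additive(std::size_t p)  { return memo(ADDITIVE, p, &PackratRecognizer::additive_impl); }
    std::size_t multitive(std::size_t p) { return memo(MULTITIVE, p, &PackratRecognizer::multitive_impl); }
    std::size_t primary(std::size_t p)   { return memo(PRIMARY, p, &PackratRecognizer::primary_impl); }

    std::size_t additive_impl(std::size_t p) {
        // Alternative 1: Multitive '+' Additive
        const std::size_t m = multitive(p);
        if (m != kFail && m < in_.size() && in_[m] == '+') {
            const std::size_t a = additive(m + 1);
            if (a != kFail) return a;
        }
        // Alternative 2: Multitive, retried at the same position (served from the cache)
        return multitive(p);
    }

    std::size_t multitive_impl(std::size_t p) {
        // Alternative 1: Primary '*' Multitive
        const std::size_t m = primary(p);
        if (m != kFail && m < in_.size() && in_[m] == '*') {
            const std::size_t r = multitive(m + 1);
            if (r != kFail) return r;
        }
        // Alternative 2: Primary, retried at the same position
        return primary(p);
    }

    std::size_t primary_impl(std::size_t p) {
        // Alternative 1: '(' Additive ')'
        if (p < in_.size() && in_[p] == '(') {
            const std::size_t a = additive(p + 1);
            if (a != kFail && a < in_.size() && in_[a] == ')') return a + 1;
        }
        // Alternative 2: Number
        return number(p);
    }

    // Number <- [0-9]+ (cheap enough that it is not memoized here)
    std::size_t number(std::size_t p) const {
        std::size_t q = p;
        while (q < in_.size() && in_[q] >= '0' && in_[q] <= '9') ++q;
        return q > p ? q : kFail;
    }

    std::string in_;
    std::size_t hits_;
    std::map<std::pair<int, std::size_t>, std::size_t> cache_;
};
```

Constructing `PackratRecognizer r("(1+2)*3")` and calling `r.accepts()` recognizes the input, and `r.cache_hits()` then reports how often backtracking was answered from the cache instead of re-parsing.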

How to use
----------

This is a simple calculator sample. It shows how to define a grammar, associate semantic actions with it, and handle semantic values.

```cpp
// (1) Include the header file
#include <peglib.h>
#include <assert.h>

using namespace peg;
using namespace std;

int main(void) {
    // (2) Make a parser
    auto syntax = R"(
        # Grammar for Calculator...
        Additive  <- Multitive '+' Additive / Multitive
        Multitive <- Primary '*' Multitive / Primary
        Primary   <- '(' Additive ')' / Number
        Number    <- [0-9]+
    )";

    parser parser(syntax);

    // (3) Setup an action
    parser["Additive"] = [](const SemanticValues& sv) {
        switch (sv.choice) {
        case 0:  // "Multitive '+' Additive"
            return sv[0].get<int>() + sv[1].get<int>();
        default: // "Multitive"
            return sv[0].get<int>();
        }
    };

    parser["Multitive"] = [](const SemanticValues& sv) {
        switch (sv.choice) {
        case 0:  // "Primary '*' Multitive"
            return sv[0].get<int>() * sv[1].get<int>();
        default: // "Primary"
            return sv[0].get<int>();
        }
    };

    parser["Number"] = [](const SemanticValues& sv) {
        return stoi(sv.str(), nullptr, 10);
    };

    // (4) Parse
    parser.packrat_parsing(); // Enable packrat parsing.

    int val;
    parser.parse("(1+2)*3", val);

    assert(val == 9);
}
```

The following action signatures are available:

```cpp
[](const SemanticValues& sv, any& dt)
[](const SemanticValues& sv)
```

`const SemanticValues& sv` contains semantic values. The `SemanticValues` structure is defined as follows.

```cpp
struct SemanticValue {
    any         val;  // Semantic value
    const char* name; // Definition name for the semantic value
    const char* s;    // Token start for the semantic value
    size_t      n;    // Token length for the semantic value

    // Cast semantic value
    template <typename T> T& get();
    template <typename T> const T& get() const;

    // Get token
    std::string str() const;
};

struct SemanticValues : protected std::vector<SemanticValue>
{
    const char* s;      // Token start
    size_t      n;      // Token length
    size_t      choice; // Choice number (0 based index)

    // Get token
    std::string str() const;

    // Transform the semantic value vector to another vector
    template <typename T> std::vector<T> transform(size_t beg = 0, size_t end = -1) const;
};
```

The `peg::any` class is very similar to [boost::any](http://www.boost.org/doc/libs/1_57_0/doc/html/any.html). You can obtain a value by casting it to its actual type. To determine the actual type, check the return type of the child action that produced the semantic value.
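
The following sketch shows why the caller has to know the actual type. It uses a small hypothetical stand-in named `tiny_any`, which is not the library's `peg::any`; it only mimics the general type-erasure idea in plain C++11. A value is stored behind a common base class, and `get<T>()` succeeds only when `T` is exactly the stored type.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <typeinfo>

// Hypothetical stand-in for an any-like container (not peg::any itself).
class tiny_any {
    struct base { virtual ~base() {} };

    template <typename T>
    struct holder : base {
        explicit holder(const T& v) : value(v) {}
        T value;
    };

public:
    tiny_any() {}

    // Store a copy of any copyable value.
    template <typename T>
    tiny_any(const T& v) : held_(std::make_shared<holder<T>>(v)) {}

    // Retrieve the value; throws std::bad_cast if T is not the stored type.
    template <typename T>
    const T& get() const {
        const holder<T>* p = dynamic_cast<const holder<T>*>(held_.get());
        if (!p) throw std::bad_cast();
        return p->value;
    }

private:
    std::shared_ptr<base> held_;
};

// Usage: the reader of the value must name the type the producer stored.
inline void tiny_any_example() {
    tiny_any number(42);                  // e.g. what a "Number" action might return
    assert(number.get<int>() == 42);      // correct type: succeeds

    bool threw = false;
    try {
        number.get<long>();               // wrong type: int was stored, not long
    } catch (const std::bad_cast&) {
        threw = true;
    }
    assert(threw);
}
```

In the calculator sample, this is the reason the `Additive` action calls `get<int>()`: the child actions return `int`.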

`const char* s, size_t n` gives a pointer to the matched string and its length. These are the same as `sv.s` and `sv.n`.

`any& dt` is a data object which can be used by the user for whatever purposes.

The following example uses the `<` ... `>` operators, which are the *token boundary* operators. Each token boundary operator creates a semantic value that holds a `const char*` pointing to the matched position. This is useful for excluding unnecessary characters, such as trailing whitespace, from a token.

```cpp
auto syntax = R"(
    ROOT  <- _ TOKEN (',' _ TOKEN)*
    TOKEN <- < [a-z0-9]+ > _
    _     <- [ \t\r\n]*
)";

peg::parser pg(syntax);

pg["TOKEN"] = [](const SemanticValues& sv) {
    // 'token' doesn't include trailing whitespace
    auto token = sv.str();
};

auto ret = pg.parse(" token1, token2 ");
```

We can omit unnecessary semantic values from the list by using the `~` operator.

```cpp
peg::parser parser(
    "  ROOT  <-  _ ITEM (',' _ ITEM _)*  "
    "  ITEM  <-  ([a-z])+                "
    "  ~_    <-  [ \t]*                  "
);

parser["ROOT"] = [&](const SemanticValues& sv) {
    assert(sv.size() == 2); // should be 2 instead of 5.
};

auto ret = parser.parse(" item1, item2 ");
```

The following grammar is the same as the one above.

```cpp
peg::parser parser(
    "  ROOT  <-  ~_ ITEM (',' ~_ ITEM ~_)*  "
    "  ITEM  <-  ([a-z])+                   "
    "  _     <-  [ \t]*                     "
);
```

*Semantic predicate* support is available. A semantic action can reject a match by throwing a `peg::parse_error` exception.

```cpp
peg::parser parser("NUMBER  <-  [0-9]+");

parser["NUMBER"] = [](const SemanticValues& sv) {
    auto val = stol(sv.str(), nullptr, 10);
    if (val != 100) {
        throw peg::parse_error("value error!!");
    }
    return val;
};

long val;
auto ret = parser.parse("100", val);
assert(ret == true);
assert(val == 100);

ret = parser.parse("200", val);
assert(ret == false);
```

*before* and *after* actions are also available.

```cpp
parser["RULE"].before = [](any& dt) {
    std::cout << "before" << std::endl;
};

parser["RULE"] = [](const SemanticValues& sv, any& dt) {
    std::cout << "action!" << std::endl;
};

parser["RULE"].after = [](any& dt) {
    std::cout << "after" << std::endl;
};
```

Simple interface
----------------

*cpp-peglib* provides a simple `std::regex`-like interface for trivial tasks.

`peg::peg_match` tries to capture strings matched by the `$< ... >` operator and stores them in a `peg::match` object.

```cpp
peg::match m;

auto ret = peg::peg_match(
    R"(
        ROOT      <-  _ ('[' $< TAG_NAME > ']' _)*
        TAG_NAME  <-  (!']' .)+
        _         <-  [ \t]*
    )",
    " [tag1] [tag:2] [tag-3] ",
    m);

assert(ret == true);
assert(m.size() == 4);
assert(m.str(1) == "tag1");
assert(m.str(2) == "tag:2");
assert(m.str(3) == "tag-3");
```

It also supports named capture with the `$name<` ... `>` operator.

```cpp
peg::match m;

auto ret = peg::peg_match(
    R"(
        ROOT      <-  _ ('[' $test< TAG_NAME > ']' _)*
        TAG_NAME  <-  (!']' .)+
        _         <-  [ \t]*
    )",
    " [tag1] [tag:2] [tag-3] ",
    m);

auto cap = m.named_capture("test");

assert(ret == true);
assert(m.size() == 4);
assert(cap.size() == 3);
assert(m.str(cap[2]) == "tag-3");
```

There are several ways to *search* for a PEG pattern in a document.

```cpp
using namespace peg;

auto syntax = R"(
    ROOT <- '[' $< [a-z0-9]+ > ']'
)";

auto s = " [tag1] [tag2] [tag3] ";

// peg::peg_search
parser pg(syntax);
size_t pos = 0;
auto n = strlen(s);
match m;
while (peg_search(pg, s + pos, n - pos, m)) {
    cout << m.str()  << endl; // entire match
    cout << m.str(1) << endl; // submatch #1
    pos += m.length();
}

// peg::peg_token_iterator
peg_token_iterator it(syntax, s);
while (it != peg_token_iterator()) {
    cout << it->str()  << endl; // entire match
    cout << it->str(1) << endl; // submatch #1
    ++it;
}

// peg::peg_token_range
for (auto& m: peg_token_range(syntax, s)) {
    cout << m.str()  << endl; // entire match
    cout << m.str(1) << endl; // submatch #1
}
```

Make a parser with parser combinators
-------------------------------------

Instead of making a parser by parsing PEG syntax text, we can also construct a parser by hand with *parser combinators*. Here is an example:

```cpp
using namespace peg;
using namespace std;

vector<string> tags;

Definition ROOT, TAG_NAME, _;
ROOT     <= seq(_, zom(seq(chr('['), TAG_NAME, chr(']'), _)));
TAG_NAME <= oom(seq(npd(chr(']')), dot())), [&](const SemanticValues& sv) {
                tags.push_back(sv.str());
            };
_        <= zom(cls(" \t"));

auto ret = ROOT.parse(" [tag1] [tag:2] [tag-3] ");
```

The following operators are available:

| Operator |     Description       |
| :------- | :-------------------- |
| seq      | Sequence              |
| cho      | Prioritized Choice    |
| zom      | Zero or More          |
| oom      | One or More           |
| opt      | Optional              |
| apd      | And predicate         |
| npd      | Not predicate         |
| lit      | Literal string        |
| cls      | Character class       |
| chr      | Character             |
| dot      | Any character         |
| tok      | Token boundary        |
| ign      | Ignore semantic value |
| cap      | Capture character     |
| usr      | User defined parser   |
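
To give a feel for how such operators compose, here is a small self-contained sketch of a few of them built on `std::function`. This is only an illustration under simplifying assumptions: everything lives in a hypothetical `sketch` namespace, `seq` and `cho` take exactly two arguments, there are no semantic values, and none of it is the library's actual implementation. A parser here is a function that returns the number of characters it consumed, or `fail`.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>

namespace sketch {

const std::size_t fail = static_cast<std::size_t>(-1);

// A parser consumes a prefix of [s, s + n) and returns its length, or `fail`.
typedef std::function<std::size_t(const char*, std::size_t)> Parser;

// chr: match exactly one given character.
inline Parser chr(char c) {
    return [c](const char* s, std::size_t n) -> std::size_t {
        return (n > 0 && s[0] == c) ? 1 : fail;
    };
}

// cls: match one character out of a set.
inline Parser cls(const std::string& set) {
    return [set](const char* s, std::size_t n) -> std::size_t {
        return (n > 0 && set.find(s[0]) != std::string::npos) ? 1 : fail;
    };
}

// dot: match any single character.
inline Parser dot() {
    return [](const char*, std::size_t n) -> std::size_t { return n > 0 ? 1 : fail; };
}

// seq: match a, then b right after it.
inline Parser seq(Parser a, Parser b) {
    return [a, b](const char* s, std::size_t n) -> std::size_t {
        const std::size_t la = a(s, n);
        if (la == fail) return fail;
        assert(la <= n); // a parser must not consume past the end
        const std::size_t lb = b(s + la, n - la);
        return lb == fail ? fail : la + lb;
    };
}

// cho: prioritized choice; b is tried only if a fails.
inline Parser cho(Parser a, Parser b) {
    return [a, b](const char* s, std::size_t n) -> std::size_t {
        const std::size_t la = a(s, n);
        return la != fail ? la : b(s, n);
    };
}

// zom: zero or more repetitions; greedy, and it never fails.
inline Parser zom(Parser p) {
    return [p](const char* s, std::size_t n) -> std::size_t {
        std::size_t total = 0;
        for (;;) {
            const std::size_t l = p(s + total, n - total);
            if (l == fail || l == 0) break; // stop on failure or an empty match
            total += l;
        }
        return total;
    };
}

// oom: one or more repetitions.
inline Parser oom(Parser p) { return seq(p, zom(p)); }

// npd: not-predicate; succeeds without consuming input when p fails.
inline Parser npd(Parser p) {
    return [p](const char* s, std::size_t n) -> std::size_t {
        return p(s, n) == fail ? 0 : fail;
    };
}

// The tag-list grammar from the example above, without semantic actions.
inline Parser tag_list() {
    Parser ws = zom(cls(" \t"));
    Parser tag_name = oom(seq(npd(chr(']')), dot()));
    Parser item = seq(seq(chr('['), tag_name), seq(chr(']'), ws));
    return seq(ws, zom(item));
}

// True if p consumes the whole text.
inline bool matches(const Parser& p, const std::string& text) {
    return p(text.data(), text.size()) == text.size();
}

} // namespace sketch
```

With these definitions, `sketch::matches(sketch::tag_list(), " [tag1] [tag:2] [tag-3] ")` accepts the same input as the `ROOT` definition above, while an unterminated input such as `" [tag1"` is rejected.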

Adjust definitions
------------------

It's possible to add/override definitions.

```cpp
auto syntax = R"(
    ROOT <- _ 'Hello' _ NAME '!' _
)";

Rules additional_rules = {
    {
        "NAME", usr([](const char* s, size_t n, SemanticValues& sv, any& c) -> size_t {
            static vector<string> names = { "PEG", "BNF" };
            for (const auto& name: names) {
                if (name.size() <= n && !name.compare(0, name.size(), s, name.size())) {
                    return name.size(); // processed length
                }
            }
            return -1; // parse error
        })
    },
    {
        "~_", zom(cls(" \t\r\n"))
    }
};

auto g = parser(syntax, additional_rules);

assert(g.parse(" Hello BNF! "));
```

Unicode support
---------------

Since cpp-peglib only accepts 8-bit characters, it can probably handle UTF-8 text. However, `.` matches a single byte, not a Unicode character. Also, it doesn't support `\u????`.
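
The difference between a byte and a Unicode character can be made concrete with a short standalone sketch. The helpers below (`utf8_sequence_length` and `count_code_points`, both hypothetical names that are not part of the library) decode only the UTF-8 lead byte to find how many bytes a character occupies. The check is simplified and does not reject every invalid sequence. A byte-oriented `.` would need to match several times to step over one such multi-byte character.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Length in bytes of the UTF-8 sequence that starts with `lead`,
// or 0 if `lead` cannot start a sequence (e.g. a continuation byte).
// Simplified: overlong or out-of-range lead bytes are not rejected.
inline std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;            // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2;  // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110xxx
    return 0;                             // 10xxxxxx or 11111xxx
}

// Count characters the way a Unicode-aware `.` would step through the text.
// Malformed or truncated sequences are counted one byte at a time.
inline std::size_t count_code_points(const std::string& s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size();) {
        std::size_t len = utf8_sequence_length(static_cast<unsigned char>(s[i]));
        if (len == 0 || i + len > s.size()) len = 1;
        i += len;
        ++count;
    }
    return count;
}

// Usage: U+3042 (HIRAGANA LETTER A) takes three bytes in UTF-8.
inline void utf8_example() {
    const std::string text = "\xE3\x81\x82" "b";
    assert(text.size() == 4);               // bytes: what a byte-wise `.` sees
    assert(count_code_points(text) == 2);   // characters: what a reader sees
}
```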

Sample codes
------------

  * [Calculator](https://github.com/yhirose/cpp-peglib/blob/master/example/calc.cc)
  * [Calculator (with parser operators)](https://github.com/yhirose/cpp-peglib/blob/master/example/calc2.cc)
  * [Calculator (AST version)](https://github.com/yhirose/cpp-peglib/blob/master/example/calc3.cc)
  * [PEG syntax Lint utility](https://github.com/yhirose/cpp-peglib/blob/master/lint/cmdline/peglint.cc)
  * [PL/0 Interpreter](https://github.com/yhirose/cpp-peglib/blob/master/language/pl0/pl0.cc)

Tested compilers
----------------

  * Visual Studio 2015
  * Visual Studio 2013 with Update 5
  * Clang 3.5

TODO
----

  * Semantic predicate (`&{ expr }` and `!{ expr }`)
  * Unicode support (`.` matches a Unicode char. `\u????`, `\p{L}`)
  * Ignore white spaces after string literals and tokens
  * Allow `←` and `ε`

License
-------

MIT license (© 2015 Yuji Hirose)
-												Uploaded files.

											
										
										
											9 years ago
+								cpp-peglib
 								==========
 								C++11 header-only [PEG](http://en.wikipedia.org/wiki/Parsing_expression_grammar) (Parsing Expression Grammars) library.
-												Updated documentation.

											
										
										
											9 years ago
+								*cpp-peglib* tries to provide more expressive parsing experience in a simple way. This library depends on only one header file. So, you can start using it right away just by including `peglib.h` in your project.
-												Uploaded files.

											
										
										
											9 years ago
-												Updated README.

											
										
										
											9 years ago
+								The PEG syntax is well described on page 2 in the [document](http://www.brynosaurus.com/pub/lang/peg.pdf). *cpp-peglib* also supports the following additional syntax for now:
-												Updated documentation.

											
										
										
											9 years ago
-												Changed namespace/class names.

											
										
										
											9 years ago
+								  * `<` ... `>` (Token boundary operator)
-												Added 'ignore' operator.

											
										
										
											9 years ago
+								  * `~` (Ignore operator)
-												Updated README.

											
										
										
											9 years ago
+								  * `\x20` (Hex number char)
-												Updated README.

											
										
										
											9 years ago
+								  * `$<` ... `>` (Capture operator)
 								  * `$name<` ... `>` (Named capture operator)
-												Uploaded files.

											
										
										
											9 years ago
-												Updated public interface.

											
										
										
											9 years ago
+								This library also supports the linear-time parsing known as the [*Packrat*](http://pdos.csail.mit.edu/~baford/packrat/thesis/thesis.pdf) parsing.
-												Moved 'choice' property to SemanticValues.

											
										
										
											9 years ago
-												Uploaded files.

											
										
										
											9 years ago
+								How to use
 								----------
-												Added simple interface.

											
										
										
											9 years ago
+								This is a simple calculator sample. It shows how to define grammar, associate samantic actions to the grammar and handle semantic values.
-												Uploaded files.

											
										
										
											9 years ago
-												Fixed README.

											
										
										
											9 years ago
+								```cpp
-												Added simple interface.

											
										
										
											9 years ago
+								// (1) Include the header file
-												Updated README.

											
										
										
											9 years ago
+								#include <peglib.h>
-												Updated documentation and examples.

											
										
										
											9 years ago
+								#include <assert.h>
-												Updated README.

											
										
										
											9 years ago
-												Changed namespace/class names.

											
										
										
											9 years ago
+								using namespace peg;
-												Updated README.

											
										
										
											9 years ago
+								using namespace std;
 								int main(void) {
-												Added simple interface.

											
										
										
											9 years ago
+								    // (2) Make a parser
-												Updated documentation.

											
										
										
											9 years ago
+								    auto syntax = R"(
 								        # Grammar for Calculator...
 								        Additive  <- Multitive '+' Additive / Multitive
 								        Multitive <- Primary '*' Multitive / Primary
 								        Primary   <- '(' Additive ')' / Number
 								        Number    <- [0-9]+
 								    )";
-												Changed namespace/class names.

											
										
										
											9 years ago
+								    parser parser(syntax);
-												Updated documentation.

											
										
										
											9 years ago
-												Added simple interface.

											
										
										
											9 years ago
+								    // (3) Setup an action
-												Simplefied code.

											
										
										
											9 years ago
+								    parser["Additive"] = [](const SemanticValues& sv) {
 								        switch (sv.choice) {
 								        case 0:  // "Multitive '+' Additive"
 								            return sv[0].get<int>() + sv[1].get<int>();
 								        default: // "Multitive"
 								            return sv[0].get<int>();
 								        }
-												Updated documentation.

											
										
										
											9 years ago
+								    };
-												Moved 'choice' property to SemanticValues.

											
										
										
											9 years ago
+								    parser["Multitive"] = [](const SemanticValues& sv) {
 								        switch (sv.choice) {
-												Simplefied code.

											
										
										
											9 years ago
+								        case 0:  // "Primary '*' Multitive"
-												Name refactoring.

											
										
										
											9 years ago
+								            return sv[0].get<int>() * sv[1].get<int>();
-												Simplefied code.

											
										
										
											9 years ago
+								        default: // "Primary"
-												Name refactoring.

											
										
										
											9 years ago
+								            return sv[0].get<int>();
-												Moved 'choice' property to SemanticValues.

											
										
										
											9 years ago
+								        }
-												Updated documentation.

											
										
										
											9 years ago
+								    };
-												Simplefiled API.

											
										
										
											9 years ago
+								    parser["Number"] = [](const SemanticValues& sv) {
-												Added str() in SemanticValues.

											
										
										
											9 years ago
+								        return stoi(sv.str(), nullptr, 10);
-												Updated documentation.

											
										
										
											9 years ago
+								    };
-												Added simple interface.

											
										
										
											9 years ago
+								    // (4) Parse
-												Simplefied code.

											
										
										
											9 years ago
+								    parser.packrat_parsing(); // Enable packrat parsing.
-												Updated documentation and examples.

											
										
										
											9 years ago
-												Updated documentation.

											
										
										
											9 years ago
+								    int val;
-												Name refactoring.

											
										
										
											9 years ago
+								    parser.parse("(1+2)*3", val);
-												Updated documentation.

											
										
										
											9 years ago
-												Fixed sample.

											
										
										
											9 years ago
+								    assert(val == 9);
-												Updated README.

											
										
										
											9 years ago
+								}
 								```
-												Uploaded files.

											
										
										
											9 years ago
-												Simplefiled API.

											
										
										
											9 years ago
+								Here are available actions:
-												Added simple interface.

											
										
										
											9 years ago
-												Fixed README.

											
										
										
											9 years ago
+								```cpp
-												Changed the semantic values interface.

											
										
										
											9 years ago
+								[](const SemanticValues& sv, any& dt)
 								[](const SemanticValues& sv)
-												Added simple interface.

											
										
										
											9 years ago
+								```
-												Changed the semantic values interface.

											
										
										
											9 years ago
+								`const SemanticValues& sv` contains semantic values. `SemanticValues` structure is defined as follows.
-												Added 'const SemanticValues&` action.

											
										
										
											9 years ago
-												Fixed README.

											
										
										
											9 years ago
+								```cpp
-												Changed the semantic values interface.

											
										
										
											9 years ago
+								struct SemanticValue {
-												Corrected README.

											
										
										
											9 years ago
+								    any         val;  // Semantic value
 								    const char* name; // Definition name for the sematic value
-												Changed the semantic values interface.

											
										
										
											9 years ago
+								    const char* s;    // Token start for the semantic value
-												Name refactoring.

											
										
										
											9 years ago
+								    size_t      n;    // Token length for the semantic value
-												Updated documentation.

											
										
										
											9 years ago
-												Added str() in SemanticValues.

											
										
										
											9 years ago
+								    // Cast semantic value
-												Updated documentation.

											
										
										
											9 years ago
+								    template <typename T> T& get();
 								    template <typename T> const T& get() const;
-												Added str() in SemanticValue.

											
										
										
											9 years ago
 								    // Get token
 								    std::string str() const;
-												Added 'const SemanticValues&` action.

											
										
										
											9 years ago
+								};
-												Changed the semantic values interface.

											
										
										
											9 years ago
 								struct SemanticValues : protected std::vector<SemanticValue>
 								{
-												Moved 'choice' property to SemanticValues.

											
										
										
											9 years ago
+								    const char* s;      // Token start
-												Name refactoring.

											
										
										
											9 years ago
+								    size_t      n;      // Token length
-												Moved 'choice' property to SemanticValues.

											
										
										
											9 years ago
+								    size_t      choice; // Choice number (0 based index)
-												Updated documentation.

											
										
										
											9 years ago
-												Added str() in SemanticValues.

											
										
										
											9 years ago
+								    // Get token
 								    std::string str() const;
-												Updated documentation.

											
										
										
											9 years ago
+								    // Transform the semantic value vector to another vector
-												Corrected README.

											
										
										
											9 years ago
+								    template <typename T> vector<T> transform(size_t beg = 0, size_t end = -1) const;
-												Changed the semantic values interface.

											
										
										
											9 years ago
+								}
-												Added 'const SemanticValues&` action.

											
										
										
											9 years ago
+								```
-												Changed namespace/class names.

											
										
										
											9 years ago
+								`peg::any` class is very similar to [boost::any](http://www.boost.org/doc/libs/1_57_0/doc/html/any.html). You can obtain a value by castning it to the actual type. In order to determine the actual type, you have to check the return value type of the child action for the semantic value.
-												Changed the semantic values interface.

											
										
										
											9 years ago
-												Name refactoring.

											
										
										
											9 years ago
+								`const char* s, size_t n` gives a pointer and length of the matched string. This is same as `sv.s` and `sv.n`.
-												Changed the semantic values interface.

											
										
										
											9 years ago
 								`any& dt` is a data object which can be used by the user for whatever purposes.
-												Changed namespace/class names.

											
										
										
											9 years ago
+								The following example uses `<` ... ` >` operators. They are the *token boundary* operators. Each token boundary operator creates a semantic value that contains `const char*` of the position. It could be useful to eliminate unnecessary characters.
-												Added 'anchor' support. Removed implecit cast operators from 'any'.

											
										
										
											9 years ago
-												Fixed README.

											
										
										
											9 years ago
+								```cpp
-												Added 'anchor' support. Removed implecit cast operators from 'any'.

											
										
										
											9 years ago
+								auto syntax = R"(
 								    ROOT  <- _ TOKEN (',' _ TOKEN)*
 								    TOKEN <- < [a-z0-9]+ > _
 								    _     <- [ \t\r\n]*
 								)";
 								peg pg(syntax);
-												Simplefiled API.

											
										
										
											9 years ago
+								pg["TOKEN"] = [](const SemanticValues& sv) {
-												Changed the capture operator and made the anchor operator.

											
										
										
											9 years ago
+								    // 'token' doesn't include trailing whitespaces
-												Added str() in SemanticValues.

											
										
										
											9 years ago
+								    auto token = sv.str();
-												Added 'anchor' support. Removed implecit cast operators from 'any'.

											
										
										
											9 years ago
+								};
 								auto ret = pg.parse(" token1, token2 ");
 								```
-												Added 'ignore' operator.

											
										
										
											9 years ago
+								We can ignore unnecessary semantic values from the list by using `~` operator.
-												Fixed README.

											
										
										
											9 years ago
+								```cpp
-												Changed namespace/class names.

											
										
										
											9 years ago
+								peg::pegparser parser(
-												Added more information about the ignore operator.

											
										
										
											9 years ago
+								    "  ROOT  <-  _ ITEM (',' _ ITEM _)*  "
 								    "  ITEM  <-  ([a-z])+                "
 								    "  ~_    <-  [ \t]*                  "
-												Added 'ignore' operator.

											
										
										
											9 years ago
+								);
-												Changed the semantic values interface.

											
										
										
											9 years ago
+								parser["ROOT"] = [&](const SemanticValues& sv) {
 								    assert(sv.size() == 2); // should be 2 instead of 5.
-												Added 'ignore' operator.

											
										
										
											9 years ago
+								};
 								auto ret = parser.parse(" item1, item2 ");
 								```
-												Added more information about the ignore operator.

											
										
										
											9 years ago
+								The following grammar is same as the above.
-												Fixed README.

											
										
										
											9 years ago
+								```cpp
-												Changed namespace/class names.

											
										
										
											9 years ago
+								peg::parser parser(
-												Added more information about the ignore operator.

											
										
										
											9 years ago
+								    "  ROOT  <-  ~_ ITEM (',' ~_ ITEM ~_)*  "
 								    "  ITEM  <-  ([a-z])+                   "
 								    "  _     <-  [ \t]*                     "
 								);
 								```
-												Changed namespace/class names.

											
										
										
											9 years ago
+								*Semantic predicate* support is available. We can do it by throwing a `peg::parse_error` exception in a semantic action.
-												Added semantic predicate support.

											
										
										
											9 years ago
-												Fixed README.

											
										
										
											9 years ago
+								```cpp
-												Changed namespace/class names.

											
										
										
											9 years ago
+								peg::parser parser("NUMBER  <-  [0-9]+");
-												Added semantic predicate support.

											
										
										
											9 years ago
-												Simplefiled API.

											
										
										
											9 years ago
+								parser["NUMBER"] = [](const SemanticValues& sv) {
-												Added str() in SemanticValues.

											
										
										
											9 years ago
+								    auto val = stol(sv.str(), nullptr, 10);
-												Added semantic predicate support.

											
										
										
											9 years ago
+								    if (val != 100) {
-												Changed namespace/class names.

											
										
										
											9 years ago
+								        throw peg::parse_error("value error!!");
-												Added semantic predicate support.

											
										
										
											9 years ago
+								    }
 								    return val;
 								};
 								long val;
 								auto ret = parser.parse("100", val);
 								assert(ret == true);
 								assert(val == 100);
 								ret = parser.parse("200", val);
 								assert(ret == false);
 								```
-												Fixed README.

											
										
										
											9 years ago
+								*before* and *after* actions are also avalable.
 								```cpp
 								parser["RULE"].before = [](any& dt) {
 								    std::cout << "before" << std::cout;
 								};
 								parser["RULE"] = [](const SemanticValues& sv, any& dt) {
 								    std::cout << "action!" << std::cout;
 								};
 								parser["RULE"].after = [](any& dt) {
 								    std::cout << "after" << std::cout;
 								};
 								```
-												Added simple interface.

											
										
										
											9 years ago
+								Simple interface
 								----------------
 								*cpp-peglib* provides std::regex-like simple interface for trivial tasks.
-												Changed namespace/class names.

											
										
										
											9 years ago
+								`peg::peg_match` tries to capture strings in the `$< ... >` operator and store them into `peg::match` object.
-												Added simple interface.

											
										
										
											9 years ago
-												Fixed README.

											
										
										
											9 years ago
+								```cpp
-												Changed namespace/class names.

											
										
										
											9 years ago
+								peg::match m;
-												Added the named capture explanation in README.

											
										
										
											9 years ago
-												Changed namespace/class names.

											
										
										
											9 years ago
+								auto ret = peg::peg_match(
-												Added simple interface.

											
										
										
											9 years ago
+								    R"(
-												Changed the capture operator and made the anchor operator.

											
										
										
											9 years ago
+								        ROOT      <-  _ ('[' $< TAG_NAME > ']' _)*
-												Added simple interface.

											
										
										
											9 years ago
+								        TAG_NAME  <-  (!']' .)+
 								        _         <-  [ \t]*
 								    )",
 								    " [tag1] [tag:2] [tag-3] ",
 								    m);
 								assert(ret == true);
 								assert(m.size() == 4);
 								assert(m.str(1) == "tag1");
 								assert(m.str(2) == "tag:2");
 								assert(m.str(3) == "tag-3");
 								```
-												Added the named capture explanation in README.

											
										
										
											9 years ago
+								It also supports named capture with the `$name<` ... `>` operator.
-												Fixed README.

											
										
										
											9 years ago
+								```cpp
-												Changed namespace/class names.

											
										
										
											9 years ago
+								peg::match m;
-												Added the named capture explanation in README.

											
										
										
											9 years ago
-												Changed namespace/class names.

											
										
										
											9 years ago
+								auto ret = peg::peg_match(
-												Updated README.

											
										
										
											9 years ago
+								    R"(
 								        ROOT      <-  _ ('[' $test< TAG_NAME > ']' _)*
 								        TAG_NAME  <-  (!']' .)+
 								        _         <-  [ \t]*
 								    )",
-												Added the named capture explanation in README.

											
										
										
											9 years ago
+								    " [tag1] [tag:2] [tag-3] ",
 								    m);
 								auto cap = m.named_capture("test");
 								REQUIRE(ret == true);
 								REQUIRE(m.size() == 4);
 								REQUIRE(cap.size() == 3);
 								REQUIRE(m.str(cap[2]) == "tag-3");
 								```

There are several ways to *search* for a PEG pattern in a document.

```cpp
using namespace peg;
using namespace std;

auto syntax = R"(
    ROOT <- '[' $< [a-z0-9]+ > ']'
)";

auto s = " [tag1] [tag2] [tag3] ";

// peg::peg_search
parser pg(syntax);
size_t pos = 0;
auto n = strlen(s);
match m;
while (peg_search(pg, s + pos, n - pos, m)) {
    cout << m.str()  << endl; // entire match
    cout << m.str(1) << endl; // submatch #1
    pos += m.length();
}

// peg::peg_token_iterator
peg_token_iterator it(syntax, s);
while (it != peg_token_iterator()) {
    cout << it->str()  << endl; // entire match
    cout << it->str(1) << endl; // submatch #1
    ++it;
}

// peg::peg_token_range
for (auto& m: peg_token_range(syntax, s)) {
    cout << m.str()  << endl; // entire match
    cout << m.str(1) << endl; // submatch #1
}
```
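
Conceptually, searching amounts to retrying an anchored match at successive offsets until one succeeds. The sketch below illustrates that idea for the pattern `'[' [a-z0-9]+ ']'` in plain C++. It does not use cpp-peglib and makes no claim about how `peg_search` is implemented; `match_tag`, `find_tags` and `kNoMatch` are hypothetical names introduced only for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Sentinel for "no match" (an assumption of this sketch).
const std::size_t kNoMatch = static_cast<std::size_t>(-1);

// Anchored match of  '[' [a-z0-9]+ ']'  at the start of s[0..n).
// Returns the matched length, or kNoMatch.
std::size_t match_tag(const char* s, std::size_t n) {
    if (n == 0 || s[0] != '[') return kNoMatch;
    std::size_t i = 1;
    while (i < n && ((s[i] >= 'a' && s[i] <= 'z') || (s[i] >= '0' && s[i] <= '9'))) {
        ++i;
    }
    // Need at least one body character and a closing bracket.
    if (i == 1 || i >= n || s[i] != ']') return kNoMatch;
    return i + 1;
}

// Search: retry the anchored match at each offset and collect the tag bodies.
std::vector<std::string> find_tags(const std::string& text) {
    std::vector<std::string> tags;
    std::size_t pos = 0;
    while (pos < text.size()) {
        std::size_t len = match_tag(text.data() + pos, text.size() - pos);
        if (len == kNoMatch) {
            ++pos;          // no match here, try the next offset
            continue;
        }
        tags.push_back(text.substr(pos + 1, len - 2)); // text between the brackets
        pos += len;         // resume right after the match
    }
    return tags;
}
```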

Make a parser with parser combinators
-------------------------------------

Instead of making a parser by parsing PEG syntax text, we can also construct a parser by hand with *parser combinators*. Here is an example:

```cpp
using namespace peg;
using namespace std;

vector<string> tags;

Definition ROOT, TAG_NAME, _;
ROOT     <= seq(_, zom(seq(chr('['), TAG_NAME, chr(']'), _)));
TAG_NAME <= oom(seq(npd(chr(']')), dot())), [&](const SemanticValues& sv) {
                tags.push_back(sv.str());
            };
_        <= zom(cls(" \t"));

auto ret = ROOT.parse(" [tag1] [tag:2] [tag-3] ");
```

The following operators are available:

| Operator | Description           |
| :------- | :-------------------- |
| seq      | Sequence              |
| cho      | Prioritized Choice    |
| zom      | Zero or More          |
| oom      | One or More           |
| opt      | Optional              |
| apd      | And predicate         |
| npd      | Not predicate         |
| lit      | Literal string        |
| cls      | Character class       |
| chr      | Character             |
| dot      | Any character         |
| tok      | Token boundary        |
| ign      | Ignore semantic value |
| cap      | Capture character     |
| usr      | User-defined parser   |
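
To give a feel for how such operators compose, here is a toy re-implementation of four of them (`chr`, `seq`, `cho`, `zom`) in plain C++. It is only a sketch of the general combinator technique, not cpp-peglib's actual implementation: the `Parser` type, the `kFail` sentinel and the `toy_*` names are all assumptions made for this example, and unlike the library's `seq` these versions take exactly two arguments.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// A toy parser takes the input and a start offset and returns the number
// of bytes it consumed, or kFail if the input does not match there.
using Parser = std::function<std::size_t(const std::string&, std::size_t)>;
const std::size_t kFail = static_cast<std::size_t>(-1);

// chr: match exactly one given character.
Parser toy_chr(char c) {
    return [c](const std::string& in, std::size_t pos) -> std::size_t {
        return (pos < in.size() && in[pos] == c) ? std::size_t(1) : kFail;
    };
}

// seq: match `a`, then `b` immediately after it.
Parser toy_seq(Parser a, Parser b) {
    return [a, b](const std::string& in, std::size_t pos) -> std::size_t {
        std::size_t la = a(in, pos);
        if (la == kFail) return kFail;
        std::size_t lb = b(in, pos + la);
        if (lb == kFail) return kFail;
        return la + lb;
    };
}

// cho: prioritized choice; `b` is tried only if `a` fails.
Parser toy_cho(Parser a, Parser b) {
    return [a, b](const std::string& in, std::size_t pos) -> std::size_t {
        std::size_t la = a(in, pos);
        return la != kFail ? la : b(in, pos);
    };
}

// zom: match `p` zero or more times; this never fails.
Parser toy_zom(Parser p) {
    return [p](const std::string& in, std::size_t pos) -> std::size_t {
        std::size_t total = 0;
        for (;;) {
            std::size_t len = p(in, pos + total);
            if (len == kFail || len == 0) break; // stop on failure or empty match
            total += len;
        }
        return total;
    };
}
```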

Adjust definitions
------------------

It's possible to add/override definitions.

```cpp
auto syntax = R"(
    ROOT <- _ 'Hello' _ NAME '!' _
)";

Rules additional_rules = {
    {
        "NAME", usr([](const char* s, size_t n, SemanticValues& sv, any& c) -> size_t {
            static vector<string> names = { "PEG", "BNF" };
            for (const auto& name: names) {
                if (name.size() <= n && !name.compare(0, name.size(), s, name.size())) {
                    return name.size(); // processed length
                }
            }
            return -1; // parse error
        })
    },
    {
        "~_", zom(cls(" \t\r\n"))
    }
};

auto g = parser(syntax, additional_rules);

assert(g.parse(" Hello BNF! "));
```
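
The callback passed to `usr` above returns the number of bytes it consumed, or `-1` on failure. Because its return type is `size_t`, that `-1` converts to the largest `size_t` value. The stand-alone sketch below mirrors the `NAME` matcher without cpp-peglib so the contract can be checked in isolation; `match_name` and `kParseError` are hypothetical names used only here.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// The value that `return -1;` becomes in a function returning size_t.
const std::size_t kParseError = static_cast<std::size_t>(-1);

// Stand-alone version of the NAME matcher: succeeds if s[0..n) starts
// with one of the known names and returns how many bytes were consumed.
std::size_t match_name(const char* s, std::size_t n) {
    static const std::vector<std::string> names = { "PEG", "BNF" };
    for (const auto& name : names) {
        // Compare the whole name against the first name.size() bytes of s.
        if (name.size() <= n && name.compare(0, name.size(), s, name.size()) == 0) {
            return name.size(); // processed length
        }
    }
    return kParseError; // no name matched
}
```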

Unicode support
---------------

Since cpp-peglib only accepts 8-bit characters, it should accept UTF-8 text. However, `.` matches only a single byte, not a Unicode character. Also, it doesn't support `\u????`.
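
The difference between bytes and characters matters when a grammar is written against UTF-8 input: a single non-ASCII character spans two to four bytes, so a byte-wise `.` needs several matches to consume it. The following sketch shows the distinction in plain C++. It is independent of cpp-peglib, assumes well-formed UTF-8, and the helper names `utf8_sequence_length` and `utf8_code_points` are made up for this example.

```cpp
#include <cstddef>
#include <string>

// Number of bytes in the UTF-8 sequence that starts with `lead`.
// Returns 0 for a continuation byte or a byte that cannot start a sequence.
// This only inspects the lead byte; it does not validate the sequence.
std::size_t utf8_sequence_length(unsigned char lead) {
    if (lead < 0x80) return 1;           // 0xxxxxxx: ASCII
    if ((lead & 0xE0) == 0xC0) return 2; // 110xxxxx
    if ((lead & 0xF0) == 0xE0) return 3; // 1110xxxx
    if ((lead & 0xF8) == 0xF0) return 4; // 11110xxx
    return 0;
}

// Count code points by counting every byte that is not a
// continuation byte (10xxxxxx).
std::size_t utf8_code_points(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++count;
    }
    return count;
}
```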

Sample code
-----------

  * [Calculator](https://github.com/yhirose/cpp-peglib/blob/master/example/calc.cc)
  * [Calculator (with parser operators)](https://github.com/yhirose/cpp-peglib/blob/master/example/calc2.cc)
  * [Calculator (AST version)](https://github.com/yhirose/cpp-peglib/blob/master/example/calc3.cc)
  * [PEG syntax Lint utility](https://github.com/yhirose/cpp-peglib/blob/master/lint/cmdline/peglint.cc)
  * [PL/0 Interpreter](https://github.com/yhirose/cpp-peglib/blob/master/language/pl0/pl0.cc)

Tested compilers
----------------

  * Visual Studio 2015
  * Visual Studio 2013 with Update 5
  * Clang 3.5

TODO
----

  * Semantic predicate (`&{ expr }` and `!{ expr }`)
  * Unicode support (`.` matches a Unicode char. `\u????`, `\p{L}`)
  * Ignore whitespace after string literals and tokens
  * Allow `←` and `ε`

License
-------

MIT license (© 2015 Yuji Hirose)