PHP-Yacc

This is a port of kmyacc into PHP. It is a parser-generator, meaning it takes a YACC grammar file and generates a parser file.

A Direct Port (For Now)

Right now, this is a direct port, meaning that it works exactly like kmyacc. Looking in the examples, you can see that this means you must supply a "parser template" in addition to the grammar.

Longer term, we want to add simplifying functionality. We will always support providing a template, but we will offer a series of default templates for common use-cases.

What can I do with this?

You can parse most structured and unstructured grammars. There are some gotchas to LALR(1) parsers that you need to be aware of (for example, Shift/Reduce conflicts and Reduce/Reduce conflicts). But those are beyond this simple intro.
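To give a feel for what a conflict looks like, here is the classic "dangling else" grammar in generic yacc syntax. It is an illustrative sketch, not taken from this repository's examples; the token names (`IF`, `ELSE`, `EXPR`, `OTHER`) are made up for the illustration.

```yacc
/* Hypothetical tokens, for illustration only. */
%token IF ELSE EXPR OTHER

%%

stmt
    : IF EXPR stmt              /* if without else */
    | IF EXPR stmt ELSE stmt    /* if with else    */
    | OTHER
    ;

%%
```

After reading `IF EXPR stmt` with `ELSE` as the next token, the parser can either shift the `ELSE` (attaching it to the nearest `IF`) or reduce the shorter rule. Yacc-style generators typically report this as one Shift/Reduce conflict and resolve it by shifting, which is usually the behavior you want here.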

How does it work?

I don't know. I just ported the code until it worked correctly.

YACC Grammar

That's way beyond the scope of this documentation, but check out the YACC page for some info.

Over time we will document the grammar more...
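As a rough orientation until then, a yacc-style grammar file has a declarations part, a rules part, and an optional trailing code part, separated by `%%`. The sketch below is a generic calculator grammar in standard yacc syntax. It is not one of this project's examples, and how the actions in braces end up in the generated parser depends on the parser template you supply.

```yacc
/* Declarations: tokens and operator precedence (lowest first). */
%token NUMBER
%left '+' '-'
%left '*' '/'

%%

/* Rules: each alternative may carry an action in braces.
   $$ is the value of the rule, $1..$n are the values of its parts. */
expr
    : expr '+' expr   { $$ = $1 + $3; }
    | expr '-' expr   { $$ = $1 - $3; }
    | expr '*' expr   { $$ = $1 * $3; }
    | expr '/' expr   { $$ = $1 / $3; }
    | '(' expr ')'    { $$ = $2; }
    | NUMBER          { $$ = $1; }
    ;

%%
```

The `%left` lines are what keep this grammar free of conflicts: without them, `expr '+' expr` is ambiguous about both precedence and associativity.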

How do I use it?

For now, check out the examples folder. The current state of the CLI tool will change, so if you use it today, please provide feedback and use-cases so that we can better design the tooling support.

Why did you do this?

Many projects have the need for parsers (and therefore parser-generators). Nikita's PHP-Parser is one tool that uses kmyacc to generate its parser. There are many other projects out there that either use hand-written parsers, or use kmyacc or another parser-generator.

Unfortunately, not many parser-generators exist for PHP. And those that do exist I have found to be rigid or not powerful enough to parse PHP itself.

This project aims to resolve that.

Performance

There are a TON of performance optimizations possible here. The original code was a direct port, so some structures are definitely sub-optimal. Over time we will improve the performance.

However, this will always be at least a slightly slow process. Generating a parser requires a lot of resources, so it should never happen inside of a web request.

Using the generated parser, however, should be quite fast (the generated parser is fairly well optimized already).

What's left to do?

A bunch of things. Here's the wishlist:

  • Refactor to make conventions consistent (some parts currently use camelCase, some parts use snake_case, etc).
  • Performance tuning
  • Unit test as much as possible
  • Document as much as possible (It's a complicated series of algorithms with no source documentation in either project).
  • Redesign the CLI binary and how it operates
  • Decide whether multi-language support is worthwhile, or if we should just move to only PHP codegen support.
  • Add default templates and parser implementations
    • At least one of which generates an "AST" by default, similar to Ruby's Treetop library
  • Build a reasonably performant lexer-generator (very likely as a separate project)
  • A lot of debugging (though we don't know of any bugs, they are there)
  • Build out features we didn't need for the initial go (for example, support for %union, etc).

And a lot more.

Contributing
