ptilde: regex engine and scripting language

Discuss issues relating to the new Java-like scripting language and regex engine called P~, or ptilde. The P~ regex grammar has many novel powers not available in other regex engines, such as side-effects produced by inserting general statements into a regex. P~ permits readable solutions to difficult problems, with document-level regexes that run as fast as or faster than solutions built on other regex engines. Version 0.9 of the engine is available at ptilde.pbwiki.com.

Wednesday, February 6, 2008

ptilde performance enhancement

A new version of P~ 0.9, containing important performance enhancements, has been released and posted on the website. The enhancements are benchmarked at this link

Saturday, December 29, 2007

Version 0.9 of P~ released

Well, after several years of hard work, we've finished the first version of a new language. P~, or ptilde, is both a Java-like general-purpose scripting language and a powerful regex engine and grammar. It can be used either in standalone mode to run scripts or called from a Java application to run scriptlets.

As a scripting language, P~ lets a programmer who is comfortable with Java-like syntax quickly create, edit, test and run scripts that execute in the JVM and have full access to any public classes in the classpath. Thus your scripts can use and incorporate existing Java libraries. Yes, this is also what Groovy offers, but I think you will find that the P~ syntax is a lot closer to Java than even that of Groovy.

P~ specializes in some very powerful and novel regex grammars, employing a DFA engine to make your document-manipulating solutions fast, even when they are grappling with difficult matching or transformation problems. I think the Java programmer will finally be able to compete on an equal footing with Perl experts when it comes to solving problems of transformation, search, and extraction.
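
To make the "DFA engine" idea concrete, here is a minimal sketch in plain Java of what DFA-style matching looks like: one state transition per input character and no backtracking. It is an illustration only. The class name DfaSketch, the state numbering, and the sample regex [0-9]+(\.[0-9]+)? are assumptions made for this example and are not taken from the P~ implementation.

```java
// Hand-built DFA for the regex [0-9]+(\.[0-9]+)? ; illustrative only,
// not taken from the P~ implementation.
class DfaSketch {
    // States: 0 = start, 1 = integer digits (accepting),
    //         2 = just saw '.', 3 = fraction digits (accepting), -1 = dead.
    static int step(int state, char c) {
        boolean digit = c >= '0' && c <= '9';
        switch (state) {
            case 0: return digit ? 1 : -1;
            case 1: return digit ? 1 : (c == '.' ? 2 : -1);
            case 2: return digit ? 3 : -1;
            case 3: return digit ? 3 : -1;
            default: return -1;
        }
    }

    // Each character is examined at most once, so the cost is linear
    // in the input length regardless of how the pattern is written.
    static boolean matches(String input) {
        int state = 0;
        for (int i = 0; i < input.length() && state != -1; i++) {
            state = step(state, input.charAt(i));
        }
        return state == 1 || state == 3;
    }

    public static void main(String[] args) {
        System.out.println(matches("3.14")); // prints "true"
        System.out.println(matches("3."));   // prints "false"
    }
}
```

A real engine would generate the transition table from the regex instead of hard-coding it, but the matching loop keeps the same shape.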

Some of the regex features:
  • algebraic composition using the standard C operators, resulting in much more readable, though verbose, regexes. This is a stated goal of Perl 6.
  • all capture is "named-capture", again enhancing readability
  • the DoPattern, which allows you to insert arbitrary statements into your regex that execute in addition to the match, but if and only if the sub-pattern they wrap is part of the match
  • a more powerful transformation syntax than offered in any other regex engine, allowing any combination of stripping, insertion, and replacement in the output stream
  • the ability to parameterize your functions that return a Pattern/regex, so that even side-effects (DoPattern, named-capture) can be argument based
  • virtual Pattern functions that allow side-effects to access the instance variable "this", which opens the door to polymorphic regexes
  • "match at the same time" qualification, which opens the door to boolean query scan and much more
  • the combination of all of the above, which allows you to create a document-level regex that incorporates the solution semantics as side-effects, so you don't intertwine parsing logic and fine-grained expressions
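
For comparison, here is roughly how named capture and match-driven side-effects look in standard java.util.regex. This is a baseline sketch in plain Java, not P~ syntax. The class name NamedCaptureSketch, the pattern, and the sample input are assumptions made for this example. Note that the side-effect (adding to a list) has to live in a loop outside the regex, which is the intertwining of parsing logic and expressions that the DoPattern feature described above is meant to avoid.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain java.util.regex, NOT P~ syntax: a baseline for comparison only.
class NamedCaptureSketch {
    // Named groups (Java 7+) label each captured piece instead of numbering it.
    static final Pattern ENTRY =
        Pattern.compile("(?<key>[A-Za-z_]+)\\s*=\\s*(?<value>\\d+)");

    // Collects "key:value" strings. The list update is the side-effect,
    // and in standard Java it must sit outside the regex itself.
    static List<String> extract(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = ENTRY.matcher(text);
        while (m.find()) {
            out.add(m.group("key") + ":" + m.group("value"));
        }
        return out;
    }

    public static void main(String[] args) {
        // prints "[width:10, height:20]"; "name = x" has no numeric value
        System.out.println(extract("width = 10; height=20; name = x"));
    }
}
```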

About Me

Java and C++ engineer since 1995. Began working on a new approach to regular expressions in 2002, which has evolved to become the P~ scripting language.