TkGen - A Lexical Analyzer

What is tkgen?

TkGen is a lexical analyzer generator (also known as a scanner generator) for C++, written in C++.

Input format

Basically, the input file is a list of token names and regular expressions, one token per line.

Input file sample:

    NUMBER [0-9]+(\.[0-9]+)?
    SYMBOL [a-zA-Z]+([0-9]+[a-zA-Z]+)?
    BLANKS [\ \n]+
    OPEN \(
    CLOSE \)
    MULTI \*
    DIV \/
    PLUS \+
    MINUS \-

The output is a C++ header file containing the transition information of a DFA (deterministic finite automaton). The generated file is combined with two template classes to create the final scanner, which recognizes tokens from a source of characters such as a file or a string.

Try tkgen online

http://www.thradams.com/webtkgen.aspx

How to use the generated code?

To create a Tokenizer you will need two more classes: Tokenizer and InputStream. Both can be found in tokenizer.h (the download link is in the sample below).

Complete sample

    #include "stdafx.h"
    #include <iostream>
    #include <fstream>
    #include <string>

    // Download it from http://www.thradams.com/codeblog/tkgencode.htm
    #include "tokenizer.h"

    // Generated by tkgen. Copy it from the online tkgen and paste it into your file.
    #include "statemachine.h"

    int _tmain(int argc, _TCHAR* argv[])
    {
        // The input file name is expected as the first argument.
        if (argc < 2)
        {
            std::wcerr << L"usage: sample <input file>" << std::endl;
            return 1;
        }

        // Wrap the file in the stream class that the Tokenizer reads from.
        std::wifstream ss(argv[1]);
        FileTokenizerStream<wchar_t> fileStream(ss);
        Tokenizer<StateMachine, FileTokenizerStream<wchar_t> > tk(fileStream);

        std::wstring lexeme;
        Tokens token;

        // Print each token name followed by its lexeme.
        while (tk.NextToken(lexeme, token))
        {
            std::wcout << TokensToString(token) << L": '" << lexeme << L"'" << std::endl;
        }

        return 0;
    }

Input file details

Tkgen accepts the following regex syntax:

    ?    optional
    +    one or more
    *    zero or more
    .    any char
    []   or-groups
    \    escape
    0-9  range inside groups

(Note: ^ is not yet supported.)
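To give an idea of what a DFA for one of these rules looks like, here is a minimal hand-written sketch for the NUMBER rule from the input file sample, [0-9]+(\.[0-9]+)?. This is not the code tkgen generates, and the names State, Next, IsAccepting and MatchNumber are made up for this illustration. It only shows the general technique: a transition function plus a set of accepting states, scanned with a "longest match" loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical hand-written DFA for NUMBER: [0-9]+(\.[0-9]+)?
// Illustration only; the header produced by tkgen will look different.
enum State { Start, Integer, Dot, Fraction, Dead };

inline bool IsDigit(char c) { return c >= '0' && c <= '9'; }

// Transition function: next state for a (state, character) pair.
inline State Next(State s, char c)
{
    switch (s)
    {
    case Start:    return IsDigit(c) ? Integer : Dead;
    case Integer:  return IsDigit(c) ? Integer : (c == '.' ? Dot : Dead);
    case Dot:      return IsDigit(c) ? Fraction : Dead;
    case Fraction: return IsDigit(c) ? Fraction : Dead;
    default:       return Dead;
    }
}

// "12" and "12.5" are numbers, "12." is not, so Dot is not accepting.
inline bool IsAccepting(State s) { return s == Integer || s == Fraction; }

// Returns the length of the longest NUMBER lexeme starting at pos,
// or 0 if no NUMBER starts there.
inline std::size_t MatchNumber(const std::string& text, std::size_t pos)
{
    assert(pos <= text.size());

    State s = Start;
    std::size_t longest = 0;
    for (std::size_t i = pos; i < text.size(); ++i)
    {
        s = Next(s, text[i]);
        if (s == Dead)
            break;                    // no longer a prefix of any NUMBER
        if (IsAccepting(s))
            longest = i - pos + 1;    // remember the last accepting position
    }
    return longest;
}
```

For example, MatchNumber("12.5+3", 0) consumes "12.5" and stops at the '+', while for "12." it falls back to the last accepting position and reports only "12". A generated scanner usually merges the rules of all tokens into a single automaton, so take this per-rule version only as a way to read the regex syntax above.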
Download sample
References
Acknowledgments
History