I am starting to write a Java library that implements high-performance finite state machines. I know there are a lot of libraries out there, but I want to write my own from scratch, as almost all of the existing libraries construct automatons optimized for handling only one at a time.
I would like to know what the people in the SO community who have dabbled in state machine design feel are the most important or best design principles when implementing high-performance libraries like this one.
Considerations
- The automatons generated are typically not massive (~100-500 states).
- The implementation should be able to scale though.
- The implementation should enable fast transformations (minimization, determinization etc.).
- Looking to implement DFA, NFA, GNFA, PDA and possibly Tree Automata, ideally under a single interface.
- Should have a good balance between memory use and performance.
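Since determinization is one of the transformations that should be fast, here is a minimal sketch of the classic subset construction over a purely int-indexed NFA. It assumes that NFA states and symbols are numbered from 0, that there are no epsilon transitions, and that the NFA's transitions are stored as a `BitSet[][]`; the class and method names are hypothetical, not part of any existing library.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: subset construction over an int-indexed NFA with no epsilon moves. */
final class SubsetConstruction {

    /** Resulting DFA: delta[state * alphabetSize + symbol] is the target, or -1 if undefined. */
    static final class Dfa {
        final int stateCount;
        final int alphabetSize;
        final int[] delta;
        final BitSet accepting;

        Dfa(int stateCount, int alphabetSize, int[] delta, BitSet accepting) {
            this.stateCount = stateCount;
            this.alphabetSize = alphabetSize;
            this.delta = delta;
            this.accepting = accepting;
        }
    }

    /**
     * @param nfaDelta     nfaDelta[state][symbol] is the successor set, or null if empty
     * @param start        the NFA start state
     * @param nfaAccepting the accepting NFA states
     * @param alphabetSize number of symbols; symbols are 0 .. alphabetSize - 1
     */
    static Dfa determinize(BitSet[][] nfaDelta, int start, BitSet nfaAccepting, int alphabetSize) {
        Map<BitSet, Integer> ids = new HashMap<>(); // subset of NFA states -> DFA state id
        List<BitSet> subsets = new ArrayList<>();   // DFA state id -> subset
        List<int[]> rows = new ArrayList<>();       // DFA state id -> outgoing transitions
        Deque<BitSet> work = new ArrayDeque<>();    // FIFO, so subsets are processed in id order

        BitSet initial = new BitSet();
        initial.set(start);
        ids.put(initial, 0);
        subsets.add(initial);
        work.add(initial);

        while (!work.isEmpty()) {
            BitSet current = work.poll();
            int[] row = new int[alphabetSize];
            for (int a = 0; a < alphabetSize; a++) {
                BitSet next = new BitSet();
                // Union of the a-successors of every NFA state in the current subset.
                for (int s = current.nextSetBit(0); s >= 0; s = current.nextSetBit(s + 1)) {
                    BitSet succ = nfaDelta[s][a];
                    if (succ != null) {
                        next.or(succ);
                    }
                }
                if (next.isEmpty()) {
                    row[a] = -1; // no transition on this symbol
                    continue;
                }
                Integer id = ids.get(next);
                if (id == null) { // first time this subset is seen: give it a new DFA state
                    id = subsets.size();
                    ids.put(next, id);
                    subsets.add(next);
                    work.add(next);
                }
                row[a] = id;
            }
            rows.add(row); // rows.get(i) belongs to DFA state i because work is FIFO
        }

        int n = subsets.size();
        int[] delta = new int[n * alphabetSize];
        BitSet accepting = new BitSet(n);
        for (int i = 0; i < n; i++) {
            System.arraycopy(rows.get(i), 0, delta, i * alphabetSize, alphabetSize);
            if (subsets.get(i).intersects(nfaAccepting)) {
                accepting.set(i); // a subset accepts if it contains any accepting NFA state
            }
        }
        return new Dfa(n, alphabetSize, delta, accepting);
    }
}
```

Representing each subset as a `BitSet` keeps the union and equality checks cheap for automatons in the 100-500 state range, and the output uses a flat `int[]` transition table, so later transformations and matching can work on the same compact representation.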
My current design questions are:
- Should classes for `State`, `Symbol` and `Transition` be defined, or should a "hidden" internal structure be used? Personally I feel that using such classes would waste a lot of memory, since the same information can be stored in a much more condensed form. But does this enable faster transformations? Does it have any other pros / cons?
- What would be the best way to store the data internally? Using data structures like `HashMap` and `HashSet` enables amortized constant-time lookups, but there is some overhead involved. Is this the best way? Storing the transition information as a primitive (or not) array seems to waste quite a bit of memory, especially when the library needs to handle a lot of automatons at a time. What are the pros / cons of the different data structures?
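One common way to approach both questions is to keep states and symbols as plain ints hidden behind the library's API. A minimal sketch, assuming a dense alphabet of symbol indices 0 .. k-1 and a hypothetical `DenseDfa` class, stores the whole transition function of a DFA in one flat `int[]`:

```java
import java.util.Arrays;
import java.util.BitSet;

/**
 * Hypothetical sketch of a "hidden" internal representation: states and symbols are plain
 * ints, and the transition function is one flat int[] indexed by state * alphabetSize + symbol.
 */
final class DenseDfa {
    static final int NO_TRANSITION = -1;

    private final int alphabetSize;
    private final int start;
    private final int[] delta;      // delta[state * alphabetSize + symbol] = target state
    private final BitSet accepting; // one bit per state

    DenseDfa(int stateCount, int alphabetSize, int start) {
        this.alphabetSize = alphabetSize;
        this.start = start;
        this.delta = new int[stateCount * alphabetSize];
        Arrays.fill(delta, NO_TRANSITION); // every transition starts out undefined
        this.accepting = new BitSet(stateCount);
    }

    void addTransition(int from, int symbol, int to) {
        delta[from * alphabetSize + symbol] = to;
    }

    void setAccepting(int state) {
        accepting.set(state);
    }

    /** Runs the DFA over a word of symbol indices; each step is a single array read. */
    boolean accepts(int[] word) {
        int state = start;
        for (int symbol : word) {
            state = delta[state * alphabetSize + symbol];
            if (state == NO_TRANSITION) {
                return false; // undefined transition: reject immediately
            }
        }
        return accepting.get(state);
    }

    /** Bytes used by the transition table's elements (4 per int, array header not counted). */
    long transitionTableBytes() {
        return 4L * delta.length;
    }
}
```

With 500 states and 26 symbols the table holds 500 * 26 = 13,000 ints, i.e. 52,000 bytes, and a lookup involves no hashing or boxing. A `HashMap`-based layout instead allocates an entry object plus boxed keys and values per transition, which typically costs noticeably more memory per transition on common JVMs. The trade-off is that the dense table wastes space when most entries are `NO_TRANSITION`; for sparse automatons, a compressed-row layout (per-state offsets into shared symbol and target arrays) is one possible middle ground.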
I appreciate any input. Thanks!