Probabilistic Pattern Matching and the Genetic Programming of Stochastic Regular Expressions
Brian J. Ross: Probabilistic Pattern Matching and the Genetic Programming of Stochastic Regular Expressions. Applied Intelligence, v. 13, n. 3, Nov/Dec 2000, pp.285-300.
The use of genetic programming for probabilistic pattern matching is investigated. A stochastic regular expression language is used. The language features a statistically sound semantics, as well as a syntax that promotes efficient manipulation by genetic programming operators. An algorithm for efficient string recognition based on approaches in conventional regular language recognition is used. When attempting to recognize a particular test string, the recognition algorithm computes the probabilities of generating that string and all its prefixes with the given stochastic regular expression. To promote efficiency, intermediate computed probabilities that exceed a given cut-off value will pre-empt particular interpretation paths, and hence prune unconstructive interpretation. A few experiments in recognizing stochastic regular languages are discussed. Application of the technology in bioinformatics is in progress.
1. D. Angluin, “Computational learning theory: survey and se- lected bibliography,” in Proceedings of the 24th Annual ACM Symposium on the Theory of Computing, ACM Press: New York, 1992, pp. 351–369.
2. K.S. Fu and T.L. Booth, “Grammatical inference: introduction and survey—Part I,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 5, no. 1, pp. 95–111, January 1975.
3. K.S. Fu and T.L. Booth, “Grammatical inference: introduction and survey—Part II,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 5, no. 4, pp. 409–423, July 1975.
4. Y. Sakakibara, “Recent advances of grammatical inference,” Theoretical Computer Science, vol. 185, pp. 15–45, 1997.
5. K.S. Fu and T. Huang, “Stochastic grammars and languages,” International Journal of Computer and Information Sciences, vol. 1, no. 2, pp. 135–170, 1972.
6. K.S. Fu, Syntactic Pattern Recognition and Applications, Prentice-Hall: Englewood Cliffs, NJ, 1982.
7. K. Lari and S.J. Young, “The estimation of stochastic context- free grammars using the Inside-Outside algorithm,” Computer Speech and Language, vol. 4, pp. 35–56, 1990.
8. E. Charniak, Statistical Language Learning, MIT Press: Cambridge, MA, 1993.
9. J.E. Hopcroft and J.D. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison Wesley: Reading, MA, 1979.
10. R.C. Carrasco and M.L. Forcada, “Inferring stochastic regular grammars with recurrent neural networks,” in Proceedings 3rd International Colloqium on Grammatical Inference (ICGI96), edited by I. Miclet and C. de la Huguera, Springer-Verlag: Berlin 1996, pp. 274–286. LNAI 1147.
11. R.C. Carrasco and J. Oncina, “Learning deterministic regular grammars from stochastic samples in polynomial time,” Tech- nical Report DLSI-96-01, Universidad de Alicante, April 1998.
12. F.J. Maryanski and T.L. Booth, “Inference of finite-state proba- bilistic grammars,” IEEE Transactions on Computers, vol. C26, pp. 521–536, 1977.
13. A. van der Mude and A. Walker, “On the inference of stochastic regular grammars,” Information and Control, vol. 38, pp. 310– 329, 1978.
14. L.R. Rabiner and B.H. Juang, “An introduction to Hidden markov models,” in IEEE ASSP Magazine, pp. 4–16, January 1986.
15. L.R. Rabiner, “A tutorial on Hidden markov models and selected applications in speech recognition,” in Proceedings of the IEEE, vol. 77, no. 2, pp. 257–286, February 1989.
16. V.K. Garg, R. Kumar, and S.I. Marcus, “Probabilistic lan- guage framework for stochastic discrete event systems,” Tech- nical Report 96-18, Institute for Systems Research, April 1996. http://www.isr.umc.edu/.
17. H. Zhou and J.J. Grefenstette, “Induction of finite automata by genetic algorithms,” in Proc. 1986 IEEE Intl. Conference on Systems, Man, and Cybernetics, Atlanta, GA, IEEE Press: New York, 1986, pp. 170–174.
18. B.D. Dunay, F.E. Petry, and B.P. Buckles, “Regular language induction with genetic programming,” in Proc. 1st IEEE Con- ference on Evolutionary Computation, 1994, pp. 396–400.
19. P. Dupont, “Regular grammatical inference from positive and negative samples by genetic search: the GIG method,” in 2nd Intl. Coll. on Grammatical Inference and Applications, Springer- Verlag: Berlin, 1994, pp. 236–245.
M. Tomita, “Dynamic construction of finite automata from ex- amples using hill-climbing,” in 4th Annual Conf. of the Cognitive Science Soc., 1982, pp. 105–108.
S. Brave, “Evolving deterministic finite automata using cellular encoding,” in Proc. Genetic Programming 1997, Stanford Uni- versity, CA, USA, edited by J.R. Koza et al., Morgan Kaufmann: Los Altos, CA, 1997, pp. 39–44.
T. Longshaw, “Evolutionary learning of large grammars,” in Proc. Genetic Programming 1997, Stanford University, CA, USA, edited by J.R. Koza et al., Morgan Kaufmann, Los Altos, CA, 1997, pp. 406–409.
B. Svingen, “Learning regular languages using genetic pro- gramming,” in Proc. Genetic Programming 1998, edited by J.R. Koza et al., Morgan Kaufmann: Los Altos, CA, 1998, pp. 374– 376.
P. Wyard, “Context free grammar induction using genetic algo- rithms,” in Proceedings 4th International Conference on Genetic Algorithms, 1991, pp. 514–518.
M.M. Lankhorst, “Grammatical inference with a genetic algo- rithm,” in Proceedings of the 1994 EUROSIM Conference on Massively Parallel Processing Applications and Development, 1994, pp. 423–430.
S. Lucas, “Structuring chromosomes for context-free grammar evolution,” in Proceedings 1st International Conference on Evo- lutionary Computation, IEEE Press: New York, 1994, pp. 130– 135.
S. Sen and J. Janakiraman, “Learning to construct pushdown automata for accepting deterministic context-free languages,” Applications of Artificial Intelligence X: Knowledge-Based Systems, vol. 1707, pp. 207–213, 1992.
M.M. Lankhorst, “A genetic algorithm for the induction of push- down automata,” in 1995 IEEE Intl. Conference on Evolutionary Computation, IEEE Press: New York, 1995.
B.D. Dunay and F.E. Petry, “Solving complex problems with genetic algorithms,” in Proc. 6th Intl. Conf. on Genetic Algo- rithms, edited by L. Eshelman, Morgan Kaufmann: Los Altos, CA, 1995.
M. Schwehm and A. Ost, “Inference of stochastic regular gram- mars by massively parallel genetic algorithms,” in Proc. 6th Intl. Conf. on Genetic Algorithms, Morgan-Kaufmann: Los Altos, CA, 1995.
31. T.E. Kammeyer and R.K. Belew, “Stochastic context-free gram- mar induction with a genetic algorithm using local search,” in Foundations of Genetic Algorithms IV, edited by R.K. Belew and M. Vode, Morgan-Kaufmann: Los Altos, CA, 1997.
32. J. Stoy, Denotational Semantics, MIT Press: Cambridge, MA, 1977.
33. K. Subrahmaniam, A Primer in Probability, Marcel Dekker: New York, 1979.
34. M. Sipser, Introduction to the Theory of Computation, PWS Pub. Co., 1996.
35. M. Hennessy, The Semantics of Programming Languages—An Elementary Introduction Using Structural Operational Seman- tics, John Wiley and Sons: New York, 1990.
36. W.F. Clocksin and C.S. Mellish, Programming in Prolog, 4th ed, Springer-Verlag: Berlin, 1994.
37. B.J. Ross, “Logic-based genetic programming with definite clause translation grammars,” in Proc. GECCO-99, edited by J.R. Koza et al., 1999.
38. P.A. Whigham, “Grammatically-based genetic programming,” in Proceedings Workshop on Genetic Programming: From The- ory to Real-World Applications, edited by J.P. Rosca, 1995, pp. 31–41.
39. M.L. Wong and K.S. Leung, “Learning programs in differ- ent paradigms using genetic programming,” in Proceedings 4th Congress of the Italian Association for AI, 1995, pp. 353–364.
40. A. Geyer-Shulz, “The next 700 programming languages for genetic programming,” in Proc. Genetic Programming 1997, Stanford University, CA, USA, edited by J.R. Koza et al., Morgan Kaufmann: Los Altos, CA, 1997, pp. 128–136. 41. H. Abramson and V. Dahl, Logic Grammars, Springer-Verlag, 1989.
42. W.H. Press, S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, Numerical Recipes in C, 2nd edn. Cambridge University Press: Cambridge, 1992.
43. A. Brazma, I. Jonassen, I. Eidhammer, and D. Gilbert, “Ap- proaches to the automatic discovery of patterns in biosequences,” Technical Report 113, Department of Informatics, University of Bergen, Norway, December 1995.
44. K. Hofmann, P. Bucher, L. Falquet, and A. Bairoch, “The PROSITE database, its status in 1999,” Nucleic Acids Research, vol. 27, no. 1, pp. 215–219, 1999.