In a response to Noam Chomsky, Peter Norvig wrote a lengthy essay on his blog. In it he described and engaged with Chomsky's criticism of "researchers in machine learning who use purely statistical methods to produce behavior that mimics something in the world, but who don't try to understand the meaning of that behavior."

In so doing, Norvig provided some great examples of what statistical models can and cannot do. Before he approached the ultimate question, "are models that try to approximate meaning in language using statistics useful?", he gave a clear explanation of the ways in which statistical models are used. Some key passages are below.

While talking about the inferential power of statistical models, Norvig took the example of the ideal gas law (P = N k T / V). About this model Norvig says:

"This model ignores that complexity and summarizes our uncertainty about the location of individual molecules. Thus, even though it is statistical and probabilistic, even though it does not completely model reality, it does provide both good predictions and insight—insight that is not available from trying to understand the true movements of individual molecules."

I think this is a great way to separate what models can and cannot do.
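To make the formula concrete, here is a minimal Python sketch of P = N k T / V. The function name and the sample inputs (one mole of gas at 273.15 K in 22.414 liters) are my own choices for illustration, not something taken from Norvig's essay.

```python
# Physical constants (exact values under the 2019 SI definitions).
BOLTZMANN_K = 1.380649e-23   # joules per kelvin
AVOGADRO = 6.02214076e23     # molecules per mole


def ideal_gas_pressure(n_molecules, temperature_k, volume_m3):
    """Return pressure in pascals from the ideal gas law P = N k T / V."""
    return n_molecules * BOLTZMANN_K * temperature_k / volume_m3


# Illustrative inputs: one mole of gas at 273.15 K in 22.414 liters.
pressure = ideal_gas_pressure(AVOGADRO, 273.15, 0.022414)
print(f"{pressure:.0f} Pa")
```

The result comes out at roughly one atmosphere, and the calculation never looks at the position or velocity of a single molecule. That is the sense in which the model summarizes the complexity instead of describing it.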

Later, Norvig makes an engineering, or performance-based, case for statistical linguistic models instead of logical ones, which I thought was interesting but somewhat missed the point of Chomsky's original complaint. Nonetheless, he finishes the blog post by engaging with the scientific premise upon which Chomsky built his argument.

Norvig suggests that Chomsky's theory is categorical and cannot account for gradience the way a statistical model can. Even a naive Markov-chain model can grade a sentence or phrase by how probable or how extremely improbable it is. This gives far more power for insight than the type of logical model that Chomsky might embrace. Norvig says, "Chomsky's theory, being categorical, cannot make this distinction; all it can distinguish is grammatical/ungrammatical."
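As a rough illustration of that gradience, here is a minimal sketch of a first-order Markov (bigram) model in Python. The toy corpus, the function names, and the add-one smoothing are all my own assumptions for the example; none of it comes from Norvig's essay.

```python
import math
from collections import Counter

# Toy corpus, invented purely for illustration.
CORPUS = [
    "the dog chased the cat",
    "the cat chased the mouse",
    "the mouse ate the cheese",
    "the dog ate the food",
]


def train_bigrams(sentences):
    """Count how often each word follows each context word."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent (previous, next) pairs
    vocab = {w for s in sentences for w in s.split()} | {"</s>"}
    return contexts, bigrams, vocab


def log_prob(sentence, contexts, bigrams, vocab):
    """Log-probability of a sentence under the add-one-smoothed bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    total = 0.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen word pairs improbable but not impossible.
        total += math.log((bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab)))
    return total


contexts, bigrams, vocab = train_bigrams(CORPUS)
plausible = log_prob("the dog chased the cat", contexts, bigrams, vocab)
scrambled = log_prob("cat the chased dog the", contexts, bigrams, vocab)
print(plausible, scrambled)
```

Both sentences get a finite score, and the scrambled one scores lower. A categorical grammar would only say "grammatical" or "ungrammatical", whereas this model places every word sequence somewhere on a continuous scale.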

The final portion of Norvig's essay is the most interesting to me. In it, Norvig describes and gives some of his thoughts on the two cultures of modeling: data modeling and algorithmic modeling. The section is too long to quote here, but I highly recommend it.