What are the challenges of Natural Language Processing?
Natural Language Processing is the field of designing methods and algorithms that take as input, or produce as output, unstructured natural language data. Human language is highly ambiguous (consider the sentence "I ate pizza with friends" and compare it to "I ate pizza with olives"), and also highly variable (the core message of "I ate pizza with friends" can also be expressed as "friends and I shared some pizza"). It is also ever-changing and evolving. People are great at producing and understanding language, and are capable of expressing, perceiving, and interpreting very elaborate and nuanced meanings. At the same time, while we humans are great users of language, we are also very poor at formally understanding and describing the rules that govern it.
Understanding and producing language using computers is thus highly challenging. Indeed, the best known set of methods for handling language data relies on supervised machine learning algorithms, which try to infer usage patterns and regularities from a set of pre-annotated input and output pairs. Consider, as an example, the task of classifying a document into one of four categories: Sports, Politics, Gossip, and Economy.
The words within the documents provide very strong hints, but which words provide what hints? Writing up rules for this task is rather challenging. However, readers can easily categorize a document into its topic, and then, based on a few hundred human-categorized examples in each category, a supervised machine learning algorithm can come up with the patterns of word usage that help categorize the documents. Machine learning methods excel in problem domains where a good set of rules is very hard to define but annotating the expected output for a given input is relatively simple.
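To make the document classification example concrete, here is a minimal sketch of a supervised classifier that learns word-usage patterns from annotated examples. It uses a multinomial Naive Bayes model with add-one smoothing, which is only one possible choice of algorithm and is not prescribed by the text above. The eight training sentences, the whitespace tokenizer, and the function names `tokenize`, `train`, and `classify` are all illustrative assumptions; a realistic setup would use a few hundred labeled documents per category.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive assumption: lowercase and split on whitespace.
    return text.lower().split()


def train(examples):
    """Count words per category from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    doc_counts = Counter()              # label -> number of documents
    vocab = set()
    for text, label in examples:
        tokens = tokenize(text)
        word_counts[label].update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest Naive Bayes log-score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    tokens = [t for t in tokenize(text) if t in vocab]  # skip unseen words
    best_label, best_score = None, float("-inf")
    for label in doc_counts:
        # Log prior of the category.
        score = math.log(doc_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for tok in tokens:
            # Add-one (Laplace) smoothed word likelihood.
            score += math.log(
                (word_counts[label][tok] + 1) / (total_words + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set: two hand-labeled sentences per category.
examples = [
    ("the team won the match with a late goal", "Sports"),
    ("the striker scored twice in the final game", "Sports"),
    ("parliament passed the election reform bill", "Politics"),
    ("the senator announced her campaign for president", "Politics"),
    ("the actor was spotted dating a famous singer", "Gossip"),
    ("rumors swirl about the celebrity couple's breakup", "Gossip"),
    ("the central bank raised interest rates to curb inflation", "Economy"),
    ("unemployment fell as markets and exports grew", "Economy"),
]

model = train(examples)
print(classify("the goalkeeper saved the match in the final game", model))
print(classify("the bank raised rates", model))
```

Note that no rule such as "the word match suggests Sports" is written by hand. The association between words and categories comes entirely from the counts gathered over the annotated examples, which is the trade-off the paragraph above describes: labeling outputs is easy, while writing the rules is hard.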