What’s new in Stanford NLP and Stanza
In this talk, I will discuss updates to Stanza, our Python natural language processing toolkit supporting 70 human languages. Compared to existing widely used toolkits, Stanza features a language-agnostic fully neural pipeline for text analysis, including tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition.
I will talk about Stanza’s neural architectural design, its simple user interface, and its improved performance against existing toolkits over a range of datasets covering 70 languages. Our latest updates include NER support for a variety of new languages, a CNN-based sentiment model, a constituency parser, and a language detection model.
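As a rough illustration of the interface, the analysis steps named above map onto Stanza processor names, which are passed to a single `Pipeline` object. The sketch below is a minimal example under stated assumptions, not the talk's own code: the helper names `TASK_TO_PROCESSOR`, `analyze`, and `describe` are hypothetical, and the set of processors available (for example `mwt` or `ner`) can vary with the language and Stanza version.

```python
# Task name (as worded in the abstract) -> Stanza processor name.
# The dict itself is an illustrative helper, not part of Stanza.
TASK_TO_PROCESSOR = {
    "tokenization": "tokenize",
    "multi-word token expansion": "mwt",
    "part-of-speech and morphological feature tagging": "pos",
    "lemmatization": "lemma",
    "dependency parsing": "depparse",
    "named entity recognition": "ner",
}

# Stanza expects a comma-separated string of processor names.
PROCESSORS = ",".join(TASK_TO_PROCESSOR.values())


def analyze(text, lang="en"):
    """Run the pipeline on `text` (hypothetical helper)."""
    # Imported lazily so the mapping above can be inspected
    # even where Stanza is not installed.
    import stanza

    stanza.download(lang)  # fetches models for the language if missing
    nlp = stanza.Pipeline(lang=lang, processors=PROCESSORS)
    return nlp(text)


def describe(doc):
    """Collect per-word annotations and entity spans from a Stanza Document."""
    rows = []
    for sentence in doc.sentences:
        for word in sentence.words:
            rows.append(
                (word.text, word.lemma, word.upos, word.feats, word.head, word.deprel)
            )
    entities = [(ent.text, ent.type) for ent in doc.ents]
    return rows, entities
```

A call such as `rows, entities = describe(analyze("Stanford University is in California."))` would return one tuple per word plus the recognized entity spans; switching `lang` is intended to be the only change needed for another supported language, assuming its models provide the same processors.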
Lastly, I will talk about Stanza’s Python interface to the widely used Stanford CoreNLP Java library, which extends Stanza’s functionality to an even richer range of tasks.
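For the CoreNLP interface, a minimal sketch might look like the following. It assumes a local CoreNLP installation (with the `CORENLP_HOME` environment variable set) and a Java runtime; the annotator list, memory setting, and the helper name `annotate_with_corenlp` are illustrative choices rather than recommended defaults.

```python
# Annotators to request from the CoreNLP server (illustrative selection).
ANNOTATORS = ["tokenize", "ssplit", "pos", "lemma", "ner", "parse"]


def annotate_with_corenlp(text):
    """Send `text` to a locally started CoreNLP server (hypothetical helper)."""
    # Imported lazily so this module loads even without Stanza installed.
    from stanza.server import CoreNLPClient

    # The context manager starts the Java server and shuts it down on exit.
    with CoreNLPClient(
        annotators=ANNOTATORS,
        timeout=30000,   # milliseconds
        memory="4G",
        be_quiet=True,
    ) as client:
        return client.annotate(text)
```

The returned annotation object exposes CoreNLP's output, so tasks handled on the Java side become reachable from the same Python session as the neural pipeline.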
I will close with our future plans for the Stanza library.
John Bauer
Research Programmer at Stanford University
John Bauer holds a BS and an MS in Computer Science from Stanford University.
At the Stanford NLP Group, John has coauthored prominent parsing and deep learning research and has been a key long-term contributor to both the Stanford CoreNLP and Stanza toolkits, including authoring components such as their shift-reduce constituency parsers.
He is currently the lead maintainer of both packages.