What’s new in Stanford NLP and Stanza

In this talk, I will discuss updates to Stanza, our Python natural language processing toolkit supporting 70 human languages. Compared with other widely used toolkits, Stanza features a language-agnostic, fully neural pipeline for text analysis, covering tokenization, multi-word token expansion, lemmatization, part-of-speech and morphological feature tagging, dependency parsing, and named entity recognition.

I will talk about Stanza’s neural architecture, its simple user interface, and its improved performance over existing toolkits on datasets covering 70 languages. Our latest updates include NER support for a variety of new languages, a CNN-based sentiment model, a constituency parser, and a language detection model.

Lastly, I will talk about Stanza’s Python interface to the widely used Stanford CoreNLP Java library, which extends Stanza’s functionality to an even richer range of tasks.

I will close by discussing our future plans for the Stanza library.

About the speaker

John Bauer

Research Programmer at Stanford University

John Bauer holds a BS and an MS in Computer Science from Stanford University.

At the Stanford NLP Group, John has coauthored prominent research on parsing and deep learning. He has also been a key long-term contributor to both the Stanford CoreNLP and Stanza toolkits, authoring components such as their shift-reduce constituency parsers.

He is currently the lead maintainer of both packages.

When

Sessions: October 5 – 7
Trainings: October 4, 12 – 15

Contact

nlpsummit@johnsnowlabs.com
