Towards Effective Natural Language Application Development

Schreiber, Marc

dc.date.accessioned	2019-06-04T06:14:32Z
dc.date.available	2019-06-04T06:14:32Z
dc.date.issued	2019
dc.identifier	doi:10.17170/kobra-20190529539
dc.identifier.uri	http://hdl.handle.net/123456789/11255
dc.language.iso	eng
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Germany	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/de/	*
dc.subject	Natural Language Processing	eng
dc.subject	Natural Language Application Development	eng
dc.subject	NLP	eng
dc.subject	NLPf	eng
dc.subject	NLP Lean Programming framework	eng
dc.subject.ddc	004
dc.title	Towards Effective Natural Language Application Development	eng
dc.type	Dissertation
dcterms.abstract	There is a growing trend for computer programs to analyze written or spoken natural language. For example, DVAs, information extraction (IE) systems, machine translation systems, and many other types of programs process natural language in order to solve specific use cases when interacting with humans. Amazon, Google, and Mycroft AI are just some of the companies that have produced DVAs capable of interacting with humans via voice. Such natural language processing (NLP) applications use techniques from computer science and artificial intelligence to address their use cases. Additionally, many companies have begun to evaluate whether NLP applications can improve their business processes, for instance by automatically processing customer requests received via e-mail. Developing NLP applications requires years of experience in computer science, artificial intelligence, machine learning (ML), linguistics, and similar disciplines. As a result, development is accessible only to computer science experts with many years of experience in computer science and artificial intelligence: building an NLP application that, for instance, automatically processes customer e-mails requires years of training and experience. However, the demand for NLP applications continues to grow, while the number of such experts remains limited. To meet this demand, companies must be able to develop such applications with in-house developers who do not have years of training or a Ph.D. in computer science and artificial intelligence. Motivated by this limitation, this thesis identifies the main obstacles that developers without many years of experience in computer science encounter when creating NLP applications. These obstacles are identified through a research project named ETL Quadrat, which aims to build an IE system for gathering EC data from human-readable documents.
The development of the IE system is hindered by a number of obstacles:
- Developers require extensive knowledge of natural language, computational linguistics, statistics, ML, artificial intelligence, computer science, and NLP.
- NLP applications must preprocess natural language before addressing their use cases. Additionally, a wide variety of NLP tools is available, and it is impossible to judge in advance which set of NLP tools will perform best for a given application. This in turn makes the construction of NLP preprocessing pipelines extremely complex.
- Customizing NLP tools and models is often necessary to improve the quality of the tools’ outputs. This customization process is complex and requires a great deal of effort from domain experts and developers.
- Finally, the available tool stack for building custom NLP tools, models, and pipelines is complex and difficult to use.
Based on these and further obstacles, this thesis proposes a method based on continuous integration and continuous delivery (CI/CD) tools that supports developers and domain experts in building NLP applications more efficiently. This method is then implemented in the open-source project NLPf. The project is available on GitLab (https://gitlab.com/schrieveslaach/NLPf) and provides the following features to improve the development process of NLP applications:
- Building on Maven’s core features, NLPf enables quick project setup for creating a domain-specific corpus, which is then used to derive domain-specific NLP models from existing NLP tools.
- NLPf uses build automation to determine the best-performing NLP pipeline for a given NLP application. Additionally, NLPf measures and displays common metrics.
- NLPf enables domain experts to annotate the required training data easily with the annotation tool QPT and an Xbox 360 controller.
- NLPf makes the best-performing NLP pipeline available as a Maven artifact so that it can be integrated into any Maven project.
Additionally, developers can use a simple API to integrate the best-performing NLP pipeline into their program code.	eng
dcterms.accessRights	open access
dcterms.creator	Schreiber, Marc
dcterms.dateAccepted	2019-05-10
dcterms.extent	XI, 298 pages	eng
dc.contributor.corporatename	Kassel, Universität Kassel, Faculty of Electrical Engineering/Computer Science
dc.contributor.referee	Zündorf, Albert (Prof. Dr.)
dc.contributor.referee	Kraft, Bodo (Prof. Dr.)
dc.relation.issupplementedby	https://gitlab.com/schrieveslaach/NLPf
dc.relation.projectid	BMBF 01IS12019A
dc.subject.swd	Text understanding system (Textverstehendes System)	ger
dc.subject.swd	Natural language system (Natürlichsprachiges System)	ger
dc.title.subtitle	Foundations of NLP Lean Programming framework	eng
dc.type.version	publishedVersion

Files in this item

Name: DissertationMarcSchreiber.pdf
Size: 18.51 MB
Format: PDF

Name: license_rdf
Size: 811 bytes
Format: application/rdf+xml

This item appears in:

Dissertations

Unless otherwise indicated, this item's license is described as follows: Attribution-NonCommercial-NoDerivs 3.0 Germany