Towards Effective Natural Language Application Development

dc.contributor.corporatename: Kassel, Universität Kassel, Fachbereich Elektrotechnik/Informatik
dc.contributor.referee: Zündorf, Albert (Prof. Dr.)
dc.contributor.referee: Kraft, Bodo (Prof. Dr.)
dc.date.accessioned: 2019-06-04T06:14:32Z
dc.date.available: 2019-06-04T06:14:32Z
dc.date.issued: 2019
dc.identifier: doi:10.17170/kobra-20190529539
dc.identifier.uri: http://hdl.handle.net/123456789/11255
dc.language.iso: eng
dc.relation.issupplementedby: https://gitlab.com/schrieveslaach/NLPf
dc.relation.projectid: BMBF 01IS12019A
dc.rights: Attribution-NonCommercial-NoDerivs 3.0 Germany (CC BY-NC-ND 3.0 DE)
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/3.0/de/
dc.subject: Natural Language Processing (eng)
dc.subject: Natural Language Application Development (eng)
dc.subject: NLP (eng)
dc.subject: NLPf (eng)
dc.subject: NLP Lean Programming framework (eng)
dc.subject.ddc: 004
dc.subject.swd: Textverstehendes System (ger)
dc.subject.swd: Natürlichsprachiges System (ger)
dc.title: Towards Effective Natural Language Application Development (eng)
dc.title.subtitle: Foundations of NLP Lean Programming framework (eng)
dc.type: Dissertation
dc.type.version: publishedVersion
dcterms.abstract: A growing number of computer programs analyze written or spoken natural language. Digital voice assistants (DVAs), information extraction (IE) systems, machine translation systems, and many other types of programs process natural language in order to solve specific use cases when interacting with humans. Amazon, Google, and Mycroft AI are just some of the companies that have produced DVAs capable of interacting with humans via voice. Such NLP applications use techniques from computer science and artificial intelligence to address their use cases. Additionally, many companies have begun to evaluate the capacity of NLP applications to improve their business processes, for instance by automatically processing customer requests received via e-mail. The development of NLP applications requires years of experience in computer science, artificial intelligence, machine learning (ML), linguistics, and similar disciplines. As a result, development is effectively limited to computer science experts with many years of experience in these fields. However, the demand for NLP applications continues to grow, while the number of such experts remains limited. Companies must therefore be able to develop such applications with in-house developers who have neither years of specialized training nor a Ph.D. in computer science or artificial intelligence. Against this background, this thesis identifies the main obstacles encountered by developers without many years of experience in computer science when creating NLP applications. These obstacles were identified through a research project named ETL Quadrat, which aims at building an IE system for gathering EC data from human-readable documents.
The development of the IE system is hindered by a number of obstacles:
- Developers require extensive knowledge of natural language, computational linguistics, statistics, ML, artificial intelligence, computer science, and NLP.
- NLP applications must preprocess natural language before addressing the applications' use cases. A wide variety of NLP tools is available, and it is impossible to judge in advance which set of NLP tools will perform best for a given application. This makes the construction of preprocessing NLP pipelines extremely complex.
- Customizing NLP tools and models is often necessary in order to improve the quality of the tools' outputs. This customization process is complex and requires a great deal of effort from domain experts and developers.
- Finally, the available tool stack for building custom NLP tools, models, and pipelines is complex and difficult to use.

Based on these and further obstacles, this thesis proposes a method based on CI/CD tools that supports developers and domain experts in building NLP applications more efficiently. This method is then implemented through the open-source project NLPf, which is available on GitLab (https://gitlab.com/schrieveslaach/NLPf) and provides the following features to improve the development process of NLP applications:
- Building on Maven's core features, NLPf enables quick project setup to create a domain-specific corpus, which is then used to derive domain-specific NLP models based on existing NLP tools.
- NLPf uses build automation to determine the best-performing NLP pipeline for a given NLP application. Additionally, NLPf measures and displays common metrics.
- NLPf enables domain experts to easily annotate the required training data through the easy-to-use annotation tool QPT and an Xbox 360 controller.
- NLPf makes the best-performing NLP pipeline available as a Maven artifact, enabling it to be integrated into any Maven project.
Additionally, developers can use a simple API to integrate the best-performing NLP pipeline into their program code.
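As a purely illustrative sketch of the kind of "simple API" integration the abstract describes, the following self-contained Java example defines a minimal pipeline interface behind which a best-performing pipeline could be resolved. The names (NlpPipeline, WhitespaceTokenizerPipeline, annotate) are hypothetical illustrations and are not NLPf's actual API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a pipeline-integration API in the spirit of the
// abstract's description; these names are NOT NLPf's actual API.
interface NlpPipeline {
    // Returns one annotation layer for the input text; here simply tokens.
    List<String> annotate(String text);
}

// Toy stand-in for a "best-performing pipeline" that, in the described
// workflow, would be resolved from a Maven artifact.
class WhitespaceTokenizerPipeline implements NlpPipeline {
    @Override
    public List<String> annotate(String text) {
        // Trim, then split on runs of whitespace.
        return Arrays.asList(text.trim().split("\\s+"));
    }
}

public class PipelineDemo {
    public static void main(String[] args) {
        NlpPipeline pipeline = new WhitespaceTokenizerPipeline();
        List<String> tokens =
                pipeline.annotate("Natural language applications process text");
        System.out.println(tokens.size()); // prints 5
    }
}
```

The point of the sketch is the shape of the integration: application code depends only on a small interface, so the concrete pipeline artifact can be swapped without touching the calling code.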
dcterms.accessRights: open access
dcterms.creator: Schreiber, Marc
dcterms.dateAccepted: 2019-05-10
dcterms.extent: XI, 298 pages

Files

Original bundle

Name: DissertationMarcSchreiber.pdf
Size: 18.52 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 3.03 KB
Format: Item-specific license agreed upon to submission
