Towards Effective Natural Language Application Development

Schreiber, Marc

Dissertation

Abstract

A growing number of computer programs analyze written or spoken natural language. For example, DVAs, information extraction (IE) systems, machine translation systems, and many other types of programs process natural language to solve specific use cases when interacting with humans. Amazon, Google, and Mycroft AI are just some of the companies that have produced DVAs capable of interacting with humans via voice. Such natural language processing (NLP) applications use techniques from computer science and artificial intelligence to address their use cases. In addition, many companies have begun to evaluate whether NLP applications can improve their business processes, for instance by automatically processing customer requests received via e-mail.
The development of NLP applications requires years of experience in computer science, artificial intelligence, machine learning (ML), linguistics, and related disciplines. As a result, developing even an application that automatically processes customer e-mails is effectively restricted to experts with many years of training in computer science and artificial intelligence. However, the demand for NLP applications continues to grow, while the number of such experts remains limited. To meet this demand, companies must be able to develop such applications with in-house developers who do not have years of training or a Ph.D. in computer science and artificial intelligence.
Motivated by this limitation, this thesis identifies the main obstacles that developers without many years of experience in computer science encounter when creating NLP applications. These obstacles are identified through a research project named ETL Quadrat, which aims to build an IE system that gathers EC data from human-readable documents. The development of this IE system is hindered by a number of obstacles:
- Developers require extensive knowledge of natural language, computational linguistics, statistics, ML, artificial intelligence, computer science, and NLP.
- NLP applications must preprocess natural language before they can address their use cases. Moreover, a wide variety of NLP tools is available, and it is impossible to judge in advance which combination of NLP tools will perform best for a given application. This makes the construction of preprocessing NLP pipelines extremely complex.
- NLP tools and models often need to be customized to improve the quality of their output. This customization process is complex and requires a great deal of effort from domain experts and developers.
- Finally, the available tool stack for building custom NLP tools, models, and pipelines is complex and difficult to use.
Based on these and further obstacles, this thesis proposes a method based on continuous integration and continuous delivery (CI/CD) tools that supports developers and domain experts in building NLP applications more efficiently. This method is implemented in the open-source project NLPf (NLP Lean Programming framework), which is available on GitLab (https://gitlab.com/schrieveslaach/NLPf) and provides the following features to improve the development process of NLP applications:
- Building on Maven’s core features, NLPf enables a quick project setup for creating a domain-specific corpus, which is then used to derive domain-specific NLP models from existing NLP tools.
- NLPf uses build automation to determine the best-performing NLP pipeline for a given NLP application. In addition, NLPf measures and displays common evaluation metrics.
- NLPf enables domain experts to annotate the required training data with the easy-to-use annotation tool QPT and an Xbox 360 controller.
- NLPf publishes the best-performing NLP pipeline as a Maven artifact so that it can be integrated into any Maven project. In addition, developers can use a simple API to integrate this pipeline into their program code.
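The abstract does not describe how NLPf ranks candidate pipelines. As a minimal sketch, assuming each candidate pipeline has already been evaluated against a gold-standard test set and summarized by precision and recall, selecting the best-performing pipeline can be reduced to ranking the candidates by F1 score. The `PipelineResult` structure, the `best_pipeline` function, the pipeline names, and the scores below are all hypothetical and are not part of NLPf's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineResult:
    """Evaluation summary of one candidate pipeline (hypothetical structure, not NLPf's API)."""
    name: str
    precision: float
    recall: float

    @property
    def f1(self) -> float:
        # Harmonic mean of precision and recall; defined as 0 when both are 0.
        total = self.precision + self.recall
        return 0.0 if total == 0 else 2 * self.precision * self.recall / total


def best_pipeline(results: list[PipelineResult]) -> PipelineResult:
    """Return the candidate with the highest F1 score (hypothetical helper)."""
    if not results:
        raise ValueError("no candidate pipelines were evaluated")
    return max(results, key=lambda r: r.f1)


if __name__ == "__main__":
    # Hypothetical evaluation results for three candidate preprocessing pipelines.
    candidates = [
        PipelineResult("tokenizer-a + tagger-x", precision=0.90, recall=0.70),
        PipelineResult("tokenizer-b + tagger-x", precision=0.80, recall=0.85),
        PipelineResult("tokenizer-b + tagger-y", precision=0.95, recall=0.60),
    ]
    for result in candidates:
        print(f"{result.name}: F1 = {result.f1:.3f}")
    print("best:", best_pipeline(candidates).name)
```

Ranking by a single F1 value keeps the selection deterministic and easy to automate in a build; a real setup might instead weigh several metrics or per-task scores.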

Collection(s)

Dissertations (Software Engineering)

Cite

BibTeX

@phdthesis{doi:10.17170/kobra-20190529539,
   author={Schreiber, Marc},
   title={Towards Effective Natural Language Application Development},
   school={Kassel, Universität Kassel, Fachbereich Elektrotechnik/Informatik},
   year={2019}
}

0500 Oax
0501 Text $btxt$2rdacontent
0502 Computermedien $bc$2rdacarrier
1100 2019$n2019
1500 1/eng
2050 ##0##http://hdl.handle.net/123456789/11255
3000 Schreiber, Marc
4000 Towards Effective Natural Language Application Development / Schreiber, Marc
4030 
4060 Online-Ressource
4085 ##0##=u http://nbn-resolving.de/http://hdl.handle.net/123456789/11255=x R
4204 $dDissertation
4170 
5550 {{Textverstehendes System}}
5550 {{Natürlichsprachiges System}}
7136 ##0##http://hdl.handle.net/123456789/11255


<resource xsi:schemaLocation="http://datacite.org/schema/kernel-2.2 http://schema.datacite.org/meta/kernel-2.2/metadata.xsd">
2019-06-04T06:14:32Z
2019-06-04T06:14:32Z
2019
doi:10.17170/kobra-20190529539
http://hdl.handle.net/123456789/11255
eng
Attribution-NonCommercial-NoDerivs 3.0 Germany
http://creativecommons.org/licenses/by-nc-nd/3.0/de/
Natural Language Processing
Natural Language Application Development
NLP
NLPf
NLP Lean Programming framework
004
Towards Effective Natural Language Application Development
Dissertation
open access
Schreiber, Marc
2019-05-10
XI, 298 pages
Kassel, Universität Kassel, Fachbereich Elektrotechnik/Informatik
Zündorf, Albert (Prof. Dr.)
Kraft, Bodo (Prof. Dr.)
https://gitlab.com/schrieveslaach/NLPf
BMBF 01IS12019A
Textverstehendes System
Natürlichsprachiges System
Foundations of NLP Lean Programming framework
publishedVersion
</resource>

The following license terms are associated with this resource:

Creative Commons license

Unless otherwise indicated, the license is described as follows: Attribution-NonCommercial-NoDerivs 3.0 Germany
