Datum
2019Autor
Schreiber, MarcSupplement
https://gitlab.com/schrieveslaach/NLPfMetadata
Zur Langanzeige
Dissertation
Towards Effective Natural Language Application Development
Towards Effective Natural Language Application Development
Foundations of NLP Lean Programming framework
Zusammenfassung
There is a current trend that more and more computer programs analyze written or spoken natural language. For example, DVAs, IE systems, machine translation systems, and many other types of programs process natural language in order to solve specific use cases when interacting with humans via natural language. Amazon, Google, and Mycroft AI are just some of the companies that have produced DVAs capable of interacting with humans via voice. Such NLP applications use techniques from computer science and artificial intelligence to address their use cases. Additionally, many companies have begun to evaluate the capacity of NLP applications to improve their business processes, for instance by automatically processing customer requests received via e-mail.
The development of NLP applications requires years of experience in computer science, artificial intelligence, ML, linguistics, and similar disciplines. Due to this requirement, development is exclusively available to computer science experts with many years of experience in computer science and artificial intelligence. Years of training and experience are therefore required in order to develop an NLP application capable of, for instance, automatically processing customer e-mails. However, the demand for NLP applications continues to grow, while the quantity of such computer science experts remains limited. Due to this growing demand, companies must be able to develop such applications using in-house developers without years of training or Ph. D. in computer science and artificial intelligence.
Based on this limitation, this thesis identifies the main obstacles encountered by developers without many years of experience in computer science when creating NLP applications. These obstacles are identified through a research project named ETL Quadrat, which aims at building an IE system for gathering EC data from human-readable documents. The development of the IE system is hindered by a number of obstacles:
- Developers require extensive knowledge of natural language, computer linguistics, statistics, ML, artificial intelligence, computer science, and NLP.
- NLP applications must preprocess natural language before addressing the applications’ use cases. Additionally, a wide variety of NLP tools is available, and it is impossible to judge which set of NLP tools will perform best for a given application. This in turn makes the construction of preprocessing NLP pipelines extremely complex.
- Often, customizing NLP tools and models is necessary in order to improve the quality of the tools’ outputs. This customization process is complex and requires a great deal of effort from domain experts and developers.
- Finally, the available tool stack for building custom NLP tools, models, and pipelines is complex and difficult to use.
Based on these and further obstacles, this thesis suggests a method based on CICD tools for supporting developers and domain experts to build NLP applications more efficiently. This method it then implemented through the open-source project NLPf. This project is available on GitLab (https://gitlab.com/schrieveslaac/NLPf) and provides the following features to improve the development process of NLP applications:
- Based on Maven’s core features, NLPf enables quick project setup to create a domain-specific corpus which is then used to derive domain-specific NLP models based on existing NLP tools.
- NLPf uses build automation to determine the best-performing NLP pipeline for a given NLP application. Additionally, NLPf measures and displays common metrics.
- NLPf enables domain experts to easily annotate required training data through the easy-to-use annotation tool QPT and an Xbox 360 controller.
- NLPf makes the best-performing NLP pipeline available as a Maven artifact, enabling it to be integrated in any Maven project. Additionally, developers can use a simple API to integrate the best-performing NLP pipeline into their program code.
The development of NLP applications requires years of experience in computer science, artificial intelligence, ML, linguistics, and similar disciplines. Due to this requirement, development is exclusively available to computer science experts with many years of experience in computer science and artificial intelligence. Years of training and experience are therefore required in order to develop an NLP application capable of, for instance, automatically processing customer e-mails. However, the demand for NLP applications continues to grow, while the quantity of such computer science experts remains limited. Due to this growing demand, companies must be able to develop such applications using in-house developers without years of training or Ph. D. in computer science and artificial intelligence.
Based on this limitation, this thesis identifies the main obstacles encountered by developers without many years of experience in computer science when creating NLP applications. These obstacles are identified through a research project named ETL Quadrat, which aims at building an IE system for gathering EC data from human-readable documents. The development of the IE system is hindered by a number of obstacles:
- Developers require extensive knowledge of natural language, computer linguistics, statistics, ML, artificial intelligence, computer science, and NLP.
- NLP applications must preprocess natural language before addressing the applications’ use cases. Additionally, a wide variety of NLP tools is available, and it is impossible to judge which set of NLP tools will perform best for a given application. This in turn makes the construction of preprocessing NLP pipelines extremely complex.
- Often, customizing NLP tools and models is necessary in order to improve the quality of the tools’ outputs. This customization process is complex and requires a great deal of effort from domain experts and developers.
- Finally, the available tool stack for building custom NLP tools, models, and pipelines is complex and difficult to use.
Based on these and further obstacles, this thesis suggests a method based on CICD tools for supporting developers and domain experts to build NLP applications more efficiently. This method it then implemented through the open-source project NLPf. This project is available on GitLab (https://gitlab.com/schrieveslaac/NLPf) and provides the following features to improve the development process of NLP applications:
- Based on Maven’s core features, NLPf enables quick project setup to create a domain-specific corpus which is then used to derive domain-specific NLP models based on existing NLP tools.
- NLPf uses build automation to determine the best-performing NLP pipeline for a given NLP application. Additionally, NLPf measures and displays common metrics.
- NLPf enables domain experts to easily annotate required training data through the easy-to-use annotation tool QPT and an Xbox 360 controller.
- NLPf makes the best-performing NLP pipeline available as a Maven artifact, enabling it to be integrated in any Maven project. Additionally, developers can use a simple API to integrate the best-performing NLP pipeline into their program code.
Zitieren
@phdthesis{doi:10.17170/kobra-20190529539,
author={Schreiber, Marc},
title={Towards Effective Natural Language Application Development},
school={Kassel, Universität Kassel, Fachbereich Elektrotechnik/Informatik},
year={2019}
}
0500 Oax 0501 Text $btxt$2rdacontent 0502 Computermedien $bc$2rdacarrier 1100 2019$n2019 1500 1/eng 2050 ##0##http://hdl.handle.net/123456789/11255 3000 Schreiber, Marc 4000 Towards Effective Natural Language Application Development / Schreiber, Marc 4030 4060 Online-Ressource 4085 ##0##=u http://nbn-resolving.de/http://hdl.handle.net/123456789/11255=x R 4204 \$dDissertation 4170 5550 {{Textverstehendes System}} 5550 {{Natürlichsprachiges System}} 7136 ##0##http://hdl.handle.net/123456789/11255
2019-06-04T06:14:32Z 2019-06-04T06:14:32Z 2019 doi:10.17170/kobra-20190529539 http://hdl.handle.net/123456789/11255 eng Namensnennung-NichtKommerziell-KeineBearbeitung 3.0 Deutschland http://creativecommons.org/licenses/by-nc-nd/3.0/de/ Natural Language Processing Natural Language Application Development NLP NLPf NLP Lean Programming framework 004 Towards Effective Natural Language Application Development Dissertation There is a current trend that more and more computer programs analyze written or spoken natural language. For example, DVAs, IE systems, machine translation systems, and many other types of programs process natural language in order to solve specific use cases when interacting with humans via natural language. Amazon, Google, and Mycroft AI are just some of the companies that have produced DVAs capable of interacting with humans via voice. Such NLP applications use techniques from computer science and artificial intelligence to address their use cases. Additionally, many companies have begun to evaluate the capacity of NLP applications to improve their business processes, for instance by automatically processing customer requests received via e-mail. The development of NLP applications requires years of experience in computer science, artificial intelligence, ML, linguistics, and similar disciplines. Due to this requirement, development is exclusively available to computer science experts with many years of experience in computer science and artificial intelligence. Years of training and experience are therefore required in order to develop an NLP application capable of, for instance, automatically processing customer e-mails. However, the demand for NLP applications continues to grow, while the quantity of such computer science experts remains limited. Due to this growing demand, companies must be able to develop such applications using in-house developers without years of training or Ph. D. in computer science and artificial intelligence. Based on this limitation, this thesis identifies the main obstacles encountered by developers without many years of experience in computer science when creating NLP applications. These obstacles are identified through a research project named ETL Quadrat, which aims at building an IE system for gathering EC data from human-readable documents. The development of the IE system is hindered by a number of obstacles: - Developers require extensive knowledge of natural language, computer linguistics, statistics, ML, artificial intelligence, computer science, and NLP. - NLP applications must preprocess natural language before addressing the applications’ use cases. Additionally, a wide variety of NLP tools is available, and it is impossible to judge which set of NLP tools will perform best for a given application. This in turn makes the construction of preprocessing NLP pipelines extremely complex. - Often, customizing NLP tools and models is necessary in order to improve the quality of the tools’ outputs. This customization process is complex and requires a great deal of effort from domain experts and developers. - Finally, the available tool stack for building custom NLP tools, models, and pipelines is complex and difficult to use. Based on these and further obstacles, this thesis suggests a method based on CICD tools for supporting developers and domain experts to build NLP applications more efficiently. This method it then implemented through the open-source project NLPf. This project is available on GitLab (https://gitlab.com/schrieveslaac/NLPf) and provides the following features to improve the development process of NLP applications: - Based on Maven’s core features, NLPf enables quick project setup to create a domain-specific corpus which is then used to derive domain-specific NLP models based on existing NLP tools. - NLPf uses build automation to determine the best-performing NLP pipeline for a given NLP application. Additionally, NLPf measures and displays common metrics. - NLPf enables domain experts to easily annotate required training data through the easy-to-use annotation tool QPT and an Xbox 360 controller. - NLPf makes the best-performing NLP pipeline available as a Maven artifact, enabling it to be integrated in any Maven project. Additionally, developers can use a simple API to integrate the best-performing NLP pipeline into their program code. open access Schreiber, Marc 2019-05-10 XI, 298, S Seiten Kassel, Universität Kassel, Fachbereich Elektrotechnik/Informatik Zündorf, Albert (Prof. Dr.) Kraft, Bodo (Prof. Dr.) https://gitlab.com/schrieveslaach/NLPf BMBF 01IS12019A Textverstehendes System Natürlichsprachiges System Foundations of NLP Lean Programming framework publishedVersion
Die folgenden Lizenzbestimmungen sind mit dieser Ressource verbunden: