Using automatic text categorization technologies in the modern educational process

  • Anna Glazkova

Abstract

Active progress in natural language text processing opens many opportunities for the development of educational technologies. Participants in the modern educational process often need to review and classify large volumes of text documents quickly. This problem arises everywhere: when searching the Internet or digital libraries for information, and when working with text databases and other elements of the educational process. Improving text categorization technologies can make information retrieval more efficient and thus help learners find the information they need more quickly.

The research addresses the problem of text categorization, using as an example the assignment of a text to a particular age audience. First of all, solving this problem makes it possible to improve the relevance of information retrieval. It also allows better mechanisms for excluding unwanted items from search results (such as websites whose content is designed for another age category).

The authors of the research are developing approaches to mathematical modeling of the text categorization task. These approaches are implemented in a prototype software system for automatic text categorization based on the age of the text's audience. The system is being developed using texts in Russian, but the proposed methods are universal and can be applied to other related tasks.



Published
2016-06-03