Integrating Students’ Process and Textual Data for Measuring the Interdependency of Domain-Specific and Generic Critical Online Reasoning

The state-of-the-art approach to assessing learning outcomes conceives of assessment as a process of reasoning from the necessarily limited evidence of what students do to claims about what they know and can do in the real world. In contrast, the analysis of process and text data generated by students during learning, understood as uninterrupted behavior, is considered a more authentic alternative. Together, these process and text data form multimodal data, which have the potential to create a more complete picture of critical online reasoning (COR) processes and can be analyzed with data science methods. Thus, the question arises to what extent data science methods compare to state-of-the-art assessments in studying COR processes.

C08 has three main objectives in advancing the field of educational research. First, C08 will provide an authentic digital assessment and learning environment in the AZURE cloud where students can behave as they do on their own computers. Second, C08 will capture student activities by integrating multimodal textual and response process data in a research infrastructure called the Multimodal Learning Data Science System (MLDS). MLDS will allow for the examination of students' process data (e.g., webpage scrolling, time spent) and textual data (e.g., websites processed, text written) in generic (GEN) and domain-specific (DOM) COR tasks. Third, C08 will analyze and explore its multimodal data set to uncover latent relationships between the text data processed or written by students and their behavioral response data (e.g., browsing histories, duration) while solving COR tasks.
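To illustrate how process and textual data might be integrated per student and task, the following sketch defines a minimal multimodal record. All type and field names are hypothetical assumptions for illustration; they are not the actual MLDS schema.

```python
from dataclasses import dataclass, field

# Hypothetical record types; field names are illustrative, not the MLDS schema.
@dataclass
class ProcessEvent:
    timestamp_ms: int   # when the event occurred
    kind: str           # e.g. "scroll", "click", "page_visit"
    payload: dict       # event-specific details (URL, scroll offset, ...)

@dataclass
class MultimodalRecord:
    student_id: str
    task_id: str                                  # GEN- or DOM-COR task identifier
    events: list = field(default_factory=list)    # behavioral process data
    texts: list = field(default_factory=list)     # websites processed, text written

    def time_on_task_ms(self) -> int:
        """Total task duration from first to last recorded event."""
        if not self.events:
            return 0
        ts = [e.timestamp_ms for e in self.events]
        return max(ts) - min(ts)

# Example: two events recorded 4.5 seconds apart
rec = MultimodalRecord("s01", "GEN-03")
rec.events.append(ProcessEvent(1000, "page_visit", {"url": "https://example.org"}))
rec.events.append(ProcessEvent(5500, "scroll", {"offset": 1200}))
print(rec.time_on_task_ms())  # → 4500
```

Keeping events and texts in one record per student and task is what makes joint analyses of behavior and written output possible later on.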

C08 will provide an authentic digital assessment and learning environment in the AZURE cloud that emulates a Windows PC. This environment will be used for assessments in real Internet scenarios and the related simulations. The COR tasks will be implemented in close collaboration with the A- and B-projects. C08 will capture textual and process data of student activities in its MLDS research infrastructure to make them available to all research unit (FOR) projects. It will investigate the role and interaction of text and process data in successful COR task performance and how they are linked to students' domain knowledge and personal traits.

C08 tests the significance of data science methods in the field of education. It identifies the added value and limitations of these methods for processing multimodal text and process data generated in GEN- and DOM-COR assessments, thereby contributing new insights and methods to educational science.

C08 will collaborate with all FOR projects to create and evaluate a unique big data set for GEN- and DOM-COR research, and will develop an infrastructure to analyze and explore these data. While it contributes data science expertise to the FOR, it requires the expertise of educational scientists to customize and calibrate its methods.

C08 uses natural language processing (NLP) methods and tools to analyze and process text and multimodal data. To enable efficient and automatic analysis, DUUI is being developed: a system that scales through the use of clusters, provides compatible and interchangeable NLP tools, supports reproducibility, and is easy to use. The use of standardized data formats enables integration into MLDS and the system landscape of C08.
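The idea of compatible, interchangeable NLP tools operating on a standardized document format can be sketched as a composable pipeline. The class and function names below are illustrative assumptions in the spirit of such a system; they are not the actual DUUI API.

```python
from typing import Callable

# Illustrative sketch of a composable NLP pipeline; names are hypothetical,
# not the actual DUUI API.
class Pipeline:
    def __init__(self):
        self.components: list[Callable[[dict], dict]] = []

    def add(self, component):
        # Every component reads and writes the same standardized document
        # format (here: a plain dict), so components stay interchangeable.
        self.components.append(component)
        return self

    def run(self, doc: dict) -> dict:
        for component in self.components:
            doc = component(doc)
        return doc

def tokenize(doc):
    # Naive whitespace tokenizer standing in for a real NLP tool.
    doc["tokens"] = doc["text"].split()
    return doc

def count_tokens(doc):
    doc["n_tokens"] = len(doc["tokens"])
    return doc

result = Pipeline().add(tokenize).add(count_tokens).run(
    {"text": "critical online reasoning"}
)
print(result["n_tokens"])  # → 3
```

Because each component only depends on the shared document format, one tool can be swapped for another without changing the rest of the pipeline, which is the property that makes cluster-scale, reproducible processing practical.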

In addition to automated annotation, C08 provides browser-based tools for all projects to simplify manual annotation and rating tasks. These include, for example, a tool for rating participants' answers, a tool for classifying websites, and a tool for annotating linguistic structures. These tools are provided within the TextAnnotator and are based on the same standardized formats, which enables direct exchange and reuse.

The combination of pre-processing using DUUI and the annotation tools in the TextAnnotator constitutes an annotation cycle: manually created annotations enable the iterative improvement of NLP tools, so that reliance on manual annotation decreases over time as the automated processes are enhanced and validated.
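One pass of such a cycle can be sketched as follows: confident automatic annotations are accepted, the rest are routed to human annotators, and the resulting gold data can later retrain the automatic annotator. The function names and the confidence model are illustrative assumptions, not the project's implementation.

```python
# Hedged sketch of one annotation-cycle pass; all names are hypothetical.
def annotation_cycle(documents, auto_annotate, manual_annotate, threshold=0.9):
    """Accept confident automatic annotations; route the rest to humans."""
    gold = {}
    for_review = []
    for doc in documents:
        label, confidence = auto_annotate(doc)
        if confidence >= threshold:
            gold[doc] = label             # accepted automatically
        else:
            for_review.append(doc)        # needs manual annotation
    for doc in for_review:
        gold[doc] = manual_annotate(doc)  # human decision; later retrains the model
    return gold, len(for_review)

# Toy example: the "model" is only confident about short documents.
auto = lambda d: ("credible", 0.95 if len(d) < 10 else 0.5)
manual = lambda d: "not credible"
gold, n_manual = annotation_cycle(["short", "a much longer document"], auto, manual)
print(n_manual)  # → 1
```

As the automatic annotator improves on the growing gold data, fewer documents fall below the confidence threshold, which is exactly the shrinking manual workload the cycle aims for.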

The processed data and generated analyses are easily accessible to all projects via a web-based tool and an API of the MLDS.
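A client might query such an API as sketched below. The base URL, endpoint, and parameter names are hypothetical placeholders, not the actual MLDS interface; only the request URL is constructed here, no network call is made.

```python
from urllib.parse import urlencode, urljoin

# Hypothetical base URL of the MLDS API (placeholder, not the real service).
BASE_URL = "https://mlds.example.org/api/"

def build_query(endpoint: str, **params) -> str:
    """Compose a request URL for one MLDS endpoint with query parameters."""
    return urljoin(BASE_URL, endpoint) + "?" + urlencode(sorted(params.items()))

url = build_query("annotations", task="GEN-03", student="s01")
print(url)  # → https://mlds.example.org/api/annotations?student=s01&task=GEN-03
```

Sorting the parameters makes the generated URLs deterministic, which helps when requests are cached or logged for reproducibility.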