
appRaiseVR – An Evaluation Framework for Immersive Experiences

  • Carolin Wienrich

    Carolin Wienrich is Juniorprofessor for Human-Technique-Systems at the University of Würzburg and co-leader of the XR HUB Würzburg. She graduated in psychology at the University of Halle/Wittenberg. In 2015, she finished her PhD at the TU Berlin. Her research interests focus on interaction paradigms between humans and digital entities as well as change experiences during and after digital interventions. Her team explores the antecedents, potentials, and risks of digital interactions and experiences, since digital entities and digital interventions accompany humans in many contexts. Participative and human-centered research, theoretical concepts, and multi-methods stemming from psychology and computer science define her qualification in the field of human-computer interaction.

  • Johanna Gramlich

    Johanna Gramlich received her Bachelor of Science (B. Sc.) degree in Human-Computer Systems. She is currently completing her Master’s degree in Human-Computer Interaction at the University of Würzburg. Her research interests are user-centered design and experience evaluation.

Published/Copyright: August 6, 2020

Abstract

Objective. VR is evolving into everyday technology. For all diverse application areas, it is essential to understand the user’s condition to ensure a safe, pleasant, and meaningful VR experience. However, VR experience evaluation is still in its infancy. The present paper takes up this research desideratum by conflating diverse expertise and learnings about experience evaluation in general and VR experiences in particular into a systematic evaluation framework (appRaiseVR).

Method. To capture diverse expertise, we conducted two focus groups (bottom-up approach) with experts working in different fields of experience evaluation (e. g., Movie Experience, Theatre Experiences). First, we clustered the results of both focus groups. Then, we conflated those results and the learnings about experience evaluation stemming from the field of user experience into the final framework (top-down approach).

Results. The framework includes five steps providing high-level guidance through the VR evaluation process. The first three steps support the definition of the experience and evaluation conditions (setting, level, plausibility). The last two steps guide the selection of an appropriate time course and appropriate measurement tools.

Conclusion. appRaiseVR offers high-level guidance for evaluators with different expertise and contexts. Finally, establishing similar evaluation procedures might contribute to safe, pleasant, and meaningful VR experiences.

1 Introduction

Over recent years, virtual reality (VR) has received more and more attention through its use in entertainment and gaming but also in areas such as healthcare, therapy, collaborative work, and education. Also, due to the availability of affordable products, VR is evolving into everyday technology.

As early as 2017, Cher Wang (Chairwoman and CEO, HTC) stated: “The potential for Virtual Reality to help us learn, understand, and transform the world is limitless. VR for impact is a challenge to the VR community and content developers across the globe to help drive awareness and to solve the biggest challenges of mankind.” However, before researchers and practitioners can address serious problems with VR applications, they have to understand how VR affects users. Evaluating VR experiences is one method to acquire this understanding, and of course, many variables and measurements have been proposed in the past to evaluate VR experiences. Often, those evaluations addressed more or less observable phenomena (e. g., sense of presence, sense of virtual body ownership) emerging from features or, rather, possibilities of the VR technology (e. g., immersion; changing the representation of the user by manipulating the corresponding avatar). Evaluating experiences, however, is a broad field and anchored in many disciplines (e. g., user experience, game experience, theatre experience, experience design).

The present paper presents a systematic evaluation framework (appRaiseVR), contributing a high-level categorization of evaluation steps in the field of VR experience evaluation. It conflates knowledge about experience evaluation in general, stemming from different fields of research, and about VR experience evaluation in particular. The framework aims for a systematic overview and top-down guidance through the evaluation process. It supports the development of good, safe, and meaningful VR experiences and provides the comparability needed to address serious problems with VR applications confidently.

The related work section discusses diverse concepts of user experience evaluation in general and of VR experience evaluation in particular to provide an overview of dimensions, levels, time points, and kinds of measures (top-down approach). The method section provides insights from two focus groups conducted with 19 experts working in different fields of experience evaluation. From the experts’ diverse views, specific requirements that are key to evaluating VR experiences have been elaborated (bottom-up approach). The result section presents the framework that conflates the learnings from both the top-down and the bottom-up approach. The discussion section sums up the contribution of the framework, limitations, and future work.

2 Related Work

In our everyday life, as well as in many research disciplines (e. g., theater studies, interaction design), experience is a multifaceted construct. A very general definition might understand experience as the stream of feelings, thoughts, and actions (e. g., Kahneman, Diener & Schwarz [19]) that are perceived, understood, and remembered (https://www.dictionary.com/browse/experience?s=t). Similarly, Wright & McCarthy [41] described experience as an interplay between sensation, emotion, intellect, and action situated in a particular place and time. They also noted that the experience of the current moment is conditioned by our past experiences and by other people. Furthermore, they emphasized the importance of a comprehensive picture of experience which takes into account the richness of human experience.

Because of this, the full complexity of the term “experience” is out of the scope of this paper. The present paper therefore focuses, first, on the field of user experience evaluation in general (the discipline of human-computer interaction), which reveals valuable insights for VR experience evaluations in particular. Second, we briefly present the state of the art of evaluating VR experiences. In addition, the section describes different dimensions, levels, time points, and kinds of measures, serving later as categories of the framework.

2.1 Evaluating User Experience (UX) in Human-Computer Interaction (HCI)

User experience evaluations describe how users interact with technology in different dimensions. The evolution of the term shows that the underlying theoretical concept determines the popularity of dimensions that constitute the term of user experience and the corresponding aspects of this interaction. Hornbæk and Oulasvirta [14] showed how different concepts of interaction are associated with different scopes and ways of construing the causal relationships between the human and the computer. Figure 1 shows the seven identified core concepts of interaction in the field of HCI, understanding interaction as an ongoing experience that refers to different dimensions, levels, and time points of that experience.

Figure 1

The figure shows Table 1 from Hornbæk & Oulasvirta [14, p. 5042], which gives an overview of core concepts of interaction in the HCI literature.

2.1.1 Dimensions of Evaluation

Pragmatic and Hedonic Dimensions

In the field of human-computer interaction (HCI), the evaluation of the user’s experience has gained importance (Diefenbach and Kolb [7]). “User experience differs from ‘experiences in a general sense’, in that it explicitly refers to the experience(s) derived from encountering systems.” (Roto, Law, Vermeeren & Hoonhout [29, p. 6]). The umbrella term “user experience” (UX) covers perception and behavior while interacting with products or technical systems (ISO-Norm BS EN ISO 9241-210), including, among others, the functionality, the content, and the aesthetics of a product, the context of use, and the user’s perception of and emotions towards the product (a collection of further definitions can be found at http://www.allaboutux.org/). The model of UX proposed by Hassenzahl & Tractinsky [12] differentiates between the objective qualities of a product and the user’s perception of them. Following this model, pragmatic qualities relate to the degree to which the product enables goal achievement, i. e., to usability and usefulness, while the hedonic quality relates to the psychological needs and the emotional experience of the user (Hassenzahl [11]; Law et al. [23]). Similar to Hassenzahl’s pragmatic quality, the instrumental quality of the Components of User Experience model (CUE model; Minge & Riedel [25]; Thüring & Mahlke [35]) addresses usability and usefulness. The hedonic quality of the CUE model, however, is divided into different modules: non-instrumental product perception (visual aesthetics, commitment, and status), emotions, and consequences of use (product loyalty, intention to use).

In sum, these models included the hedonic quality to shift the understanding of UX from the product and product-centered evaluation to humans and their feelings, i. e., to the so-called experiential evaluation in the moment of product use. While the hedonic quality serves as a motivator for creating positive experiences, the pragmatic quality is seen as a hygiene factor enabling need fulfillment, “but not the source of positive experience itself” (Hassenzahl, Diefenbach & Görlitz [10, p. 360]).

Anticipating the evaluation of VR experiences, the pragmatic quality was, for a long time, focused on achieving low latencies and high resolutions to avoid simulator sickness, or on reducing the weight and increasing the field of view of head-mounted displays to avoid discomfort. In line with technological development and the diversity of applications, hedonic aspects have come to the fore (more details in Section 2.2).

Eudaimonic Dimensions

Coming back to the field of HCI and UX evaluation, the concepts of pragmatic and hedonic quality focus on the moment of product use. Recently, researchers have investigated the role of eudaimonia in UX research (e. g., Huta & Ryan [15]; Mekler & Hornbæk [24]). In contrast to hedonic concepts, eudaimonia was strongly correlated with the experience of lasting meaning and with importance for future interaction. While hedonic UX is about momentary pleasures directly derived from technology use, eudaimonic UX is about meaning from need fulfillment (Mekler & Hornbæk [24]). Botella, Riva, Gaggioli, Wiederhold, Alcañiz, and Baños [6] have embraced the importance of eudaimonia in the field of positive technology. They emphasized the value of technology that fosters self-empowerment and personal growth, mainly including technologies that support the training process, technologies that assist in withstanding future stressors, or technologies that enable social relations.

Although many VR experiences aim for meaningful impact on human behavior (e. g., in therapy or learning domains), eudaimonia-related concepts have rarely been covered by usual VR experience evaluations up until now.

Social Dimensions

While Botella et al. [6] have connected eudaimonia-related concepts with social aspects, Mekler & Hornbæk [24] revealed in an empirical study that neither hedonia nor eudaimonia was associated with the need for relatedness, indicating that these qualities do not cover social aspects. Battarbee [2] pointed out that models of user experience mostly do not take into account what happens to user experiences when people start collaborating, communicating, and doing things together. As a consequence, she extended the concept of user experience to co-experience, defined as the user experience that is created in social interaction (Battarbee [1]). Based on the idea of symbolic interaction, co-experience contains lifting experiences to shared attention, reciprocating experiences (e. g., acknowledging, accepting), and rejecting experiences (e. g., ignoring, downplaying). Although co-experience has played a minor role in the evaluation of user experience so far, a summary of modern interaction concepts further revealed that coupling with the (social) environment is one integral aspect of attaining a “good” experience (Hornbæk & Oulasvirta [14]). Figure 1 shows the seven identified core concepts of interaction in the field of HCI, including “embodiment”, described as coupling with a (social) environment, and “tool use”, understood as acting in the (social) environment. While the former emphasizes the feeling of being situated in an environment, the latter refers to activity theory and the assumption that technologies are tools used for completing activities and that the experience of users (e. g., good feelings) is accompanied by the status of the activity (e. g., success) (Kaptelinin & Nardi [20]).

Recently, Grundgeiger, Hurtienne & Happel [9] applied those interaction concepts to the safety-critical domain of healthcare. They convincingly demonstrated that not the user alone, but the interaction with the socio-technical system (i. e., including the social environment) determines the experience of users.

Anticipating the evaluation of VR experiences, some interaction concepts are mirrored in the VR community without any reference to the concepts from the field of UX. Being situated and embodied in virtuality are two crucial aspects of VR experience evaluations, referring to the concept of presence and the sense of virtual body ownership. Connecting tool use, the status of the activity, and the quality of experience is another integral aspect, addressed in the evaluation of wayfinding (e. g., by teleportation), navigation (e. g., by controllers), and object manipulation (e. g., by virtual hands) (Boletsis [5]; LaViola, Kruijff, McMahan, Bowman & Poupyrev [18]; Wienrich, Dollinger, Kock & Gramann [36]). Since those concepts have a particular meaning, they are revisited in Section 2.2.

2.1.2 Levels, Time Points and Kinds of Evaluation

2.1.2.1 Level of Evaluation

In the field of human-computer interaction and user experience, evaluation procedures address experiences stemming from the system (e. g., enjoying the interaction with a computer), from individual sub-modules (e. g., enjoying the interaction with a specific program), or from single interactive elements (e. g., enjoying the interaction with a controller) (Reinhardt, Haesler, Hurtienne & Wienrich [26]). Depending on the level, measures can be implemented at different time points.

2.1.2.2 Time Points of Evaluation

Quantitative measurements can be taken during and after the interaction with technological devices, encompassing log data or questionnaires, for example. Qualitative measurements are often observed during the interaction or assessed via interviews after the experience. Pre-post designs focus on the detection of changes. Thus, measurements are taken prior to the intervention and after the intervention, and differences between the time points are reported. Measurements at multiple time points, often collected several weeks or months after the interaction or intervention, investigate longer-term effects. The number of time points shapes the evaluation procedure enormously. For example, if evaluators use a pre-post design or multiple assessments over time, they can use the same test procedure each time. However, different tests measuring the same construct are often needed to avoid primacy, recency, or anchoring effects.
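For illustration, here is a minimal Python sketch of one way to rotate parallel test forms across time points so that no participant sees the same form twice; the form names, time points, and Latin-square-style rotation are our own illustrative assumptions, not a prescription from the literature.

```python
from itertools import cycle, islice

# Hypothetical parallel forms of a test measuring the same construct.
FORMS = ["form_A", "form_B", "form_C"]
TIME_POINTS = ["pre", "post", "follow_up"]

def assign_forms(participant_id: int) -> dict:
    """Rotate the form order per participant so that each participant
    sees a different form at every time point and the starting form
    varies across participants (Latin-square style)."""
    start = participant_id % len(FORMS)
    rotated = list(islice(cycle(FORMS), start, start + len(FORMS)))
    return dict(zip(TIME_POINTS, rotated))

for pid in range(3):
    print(pid, assign_forms(pid))
# 0 {'pre': 'form_A', 'post': 'form_B', 'follow_up': 'form_C'}
# 1 {'pre': 'form_B', 'post': 'form_C', 'follow_up': 'form_A'}
# 2 {'pre': 'form_C', 'post': 'form_A', 'follow_up': 'form_B'}
```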

2.1.2.3 Kinds of Evaluation

In addition to the choice of level or time point, evaluators have to select methods that fulfill different quality criteria. Many criteria exist that determine the quality of measures. Wierwille & Eggemeier [40] proposed seven criteria in the field of workload assessment, including validity, reliability, and objectivity as well as diagnosticity (identifying changes and the reasons for changes), intrusiveness (measures should not interfere with the primary task), implementation requirements (the time, software, and instruments used for evaluation), and subject acceptability (the subjective perception of the reliability and usefulness of a measurement). The additional criteria widen the perspective for identifying appropriate measures, particularly in more applied fields of evaluation. Most notably, intrusiveness seems very important when considering evaluations of VR experiences (see Section 2.2 for further discussion).
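To make such a multi-criteria choice concrete, the following sketch scores hypothetical candidate measures against a weighted subset of these criteria; the candidates, ratings, and weights are invented for illustration and would have to be justified for a real study.

```python
# Hypothetical candidate measures rated 1-5 on a subset of the criteria.
# Intrusiveness is stored inverted (non_intrusiveness) so higher is always better.
candidates = {
    "post_questionnaire": {"validity": 4, "non_intrusiveness": 5, "diagnosticity": 2},
    "in_vr_probe":        {"validity": 4, "non_intrusiveness": 2, "diagnosticity": 4},
    "skin_conductance":   {"validity": 3, "non_intrusiveness": 4, "diagnosticity": 3},
}
weights = {"validity": 0.5, "non_intrusiveness": 0.3, "diagnosticity": 0.2}

def score(ratings: dict) -> float:
    """Weighted sum over the quality criteria."""
    return sum(weights[criterion] * rating for criterion, rating in ratings.items())

for name, ratings in candidates.items():
    print(f"{name}: {score(ratings):.2f}")
print("highest-scoring candidate:", max(candidates, key=lambda n: score(candidates[n])))
```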

2.2 Evaluating Experiences in Virtual Realities

Due to the wide range of VR applications, experiences in VR can engender pragmatic, hedonic, eudaimonic, and social experiences (e. g., Botella et al. [6]; Roth and Koenitz [28]). However, evaluations of VR experiences mainly focus on dimensions that influence human performance in VR or observable phenomena emerging due to the possibilities of VR technologies.

2.2.1 Dimension of Evaluation

Stanney, Mourant & Kennedy [34] pointed out several issues that are likely to influence human performance in VR, including psychophysical preconditions, task features, user characteristics, and issues concerning multi-modal interaction. Five years later, they developed a structured approach, called MAUVE (Multi-criteria Assessment of Usability for Virtual Environments), to assess the usability of virtual environments concerning multiple criteria (Stanney, Mollaghasemi, Reeves, Breaux & Graeber [33]). MAUVE differentiates between two interface components: (1) a virtual environment system interface and (2) a virtual environment user interface. The usability of the first component depends on the adequacy of interaction paradigms, which are specific for virtual worlds, such as navigation, wayfinding, and object manipulation. Moreover, its usability is affected by the quality of the visual, auditory, and haptic output that the user receives. The usability of the second component is influenced by the degree of user engagement that is accomplished concerning presence and immersion. Additionally, it is susceptible to side effects, e. g., to the degree of comfort the user experiences, or the occurrence of simulator sickness. More details and discussions of what is not captured by this assessment approach can be found elsewhere (e. g., Wienrich, Döllinger, Kock, Schindler & Traupe [37]; Wienrich, Noller & Thüring [38]).

Other researchers focused on observable phenomena emerging from features or rather possibilities of VR technology. Since much attention has been paid to the topics of immersion and presence, self-representation and virtual embodiment, other-representation, and social aspects, we briefly revisit them in the following.

In contrast to other fields of research (e. g., media communication), researchers in the field of VR agreed that immersion stands for what the technology delivers in all sensory and tracking modalities and that it can be objectively assessed. Presence, in contrast, is defined as a human reaction corresponding to a certain level of immersion and thus describes a subjective state (Slater [31], [32]). Different aspects of presence have been identified, such as social presence (explained below), telepresence, spatial presence, or being there, also called place illusion, and plausibility illusion (for a systematic review see Skarbez, Brooks & Whitton [30]).

One exclusive feature of VR is the representation of the user appearing as a virtual alter ego – an avatar. Corresponding effects are mainly based on the illusion that the virtual representation is part of one’s biological body, called the sense of embodiment (e. g., Kilteni, Groten & Slater [21]) or the illusion of virtual body ownership (IVBO). The sense of embodiment includes the phenomenological senses of self-location (i. e., the spatial experience of being inside a body), agency (i. e., the experience of controlling a body), and ownership (i. e., self-attribution of a body). Recently, the sense of change was added to describe the IVBO further (Roth, Lugrin, Latoschik & Huber [27]).

Similarly to self-representation, other users can be virtually embodied and varied in their appearance and behavior as social partners. The representation of real or artificial others gains relevance when users start to interact with virtual others. Consequently, not only the appearance and behavior of interaction partners are crucial for evaluating a VR experience, but also the representation of interaction (e. g., the number of interaction partners, presenting similarities or differences, features of group processes, social status, social interdependence) (Wienrich, Schindler, Döllinger, Kock & Traupe [39]). Similarly to the definition of spatial presence as a consequence of immersion, the inclusion and representation of others can lead to a sense of social presence [16] or co-presence (as defined above in Section 2.1).

2.2.2 Levels, Time Points and Kinds of Evaluation

2.2.2.1 Levels of Evaluation

Similarly to evaluating experiences in the field of UX evaluation, the whole VR experience can be evaluated on a system level (e. g., enjoying an adventure), or it can be assessed on a task level (e. g., enjoying the mastering of a quest) or on an element level (e. g., enjoying the haptic feeling of mixed reality elements).

2.2.2.2 Time Points of Evaluation and Kind of Evaluation

In a similar way, the evaluation can be conducted at different time points. However, VR evaluations have to recognize that users mostly wear a head-mounted display (HMD) and that the experience is situated in virtuality, not in reality (Wienrich, Döllinger, et al. [37]). Thus, the above-mentioned criterion of intrusiveness, or rather non-intrusiveness, is particularly essential. Post-experience measures (often questionnaires) mainly capture the experience on the system level, inducing retrospective biases (e. g., the recency effect). In-experience measures allow for evaluations on finer levels (e. g., a single task in a stream of tasks), but they can lead to breaks in presence or in the narrative. Thus, they should have plausible ties to the narrative and should be relatively short (Freytag & Wienrich [8]). Physiological measurements can help to overcome the constraints of intrusiveness. However, VR experiences are mostly interactive and include many movements. Body movements are well known to generate significant artifacts, and accordingly, physiological measures can easily be confounded by movements (Young, Brookhuis, Wickens & Hancock [42]). Using VR itself as an assessment instrument is rarely reported, although it holds much potential (e. g., Benford, Giannachi, Koleva & Rodden [3]). Obviously, finding an appropriate evaluation procedure for VR experiences is a huge challenge (for an exhaustive discussion, see Reinhardt et al. [26]).
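As a minimal sketch of this confounding problem (assuming synchronized, entirely hypothetical skin-conductance and head-tracking recordings), one could mask physiological samples collected during strong movement before aggregating:

```python
import numpy as np

# Hypothetical, synchronized 10 Hz recordings from one VR session.
rng = np.random.default_rng(0)
skin_conductance = rng.normal(5.0, 0.5, size=600)    # microsiemens
head_speed = np.abs(rng.normal(0.2, 0.3, size=600))  # m/s, from HMD tracking

MOVEMENT_THRESHOLD = 0.5  # assumed cutoff above which samples are suspect

# Keep only samples recorded while the head was (nearly) still, since
# strong movements generate artifacts in physiological signals.
still = head_speed < MOVEMENT_THRESHOLD
clean_mean = skin_conductance[still].mean()
print(f"{still.mean():.0%} of samples usable, mean SCL = {clean_mean:.2f} µS")
```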

2.3 Learnings about Experience Evaluations and Outline of Present Contribution

In sum, the evaluation of experiences reveals many dimensions (e. g., pragmatic, hedonic, eudaimonic, social) and can be accomplished on different levels (e. g., system, task, element), at different time points (e. g., pre, during, post, over time), and with different kinds of measures (e. g., explicit, implicit). In comparison to other human-computer interactions, VR experiences include specific factors (e. g., simulator sickness, presence, virtual body ownership). Of course, VR experience evaluations have already been conducted. However, VR research lacks a systematic evaluation framework that includes knowledge from different fields of experience evaluation in general and the specific requirements of VR experiences in particular.

The authors take up this research desideratum by conflating the expertise of researchers from diverse fields of experience evaluation (e. g., theatre, games, VR) and learnings about experience evaluation stemming from the field of user experience evaluation (the discipline of HCI) into a systematic evaluation framework (appRaiseVR) that considers the wide range of VR applications. The framework development includes a bottom-up and a top-down approach. As a result of two focus groups, important factors and issues concerning experience evaluation in general and VR experiences in particular were compiled (bottom-up approach). After systematic clustering, the factors and issues identified by the experts were classified according to the dimensions, levels, time points, and kinds of measures extracted from the literature reported above (top-down approach).

3 Method

3.1 Bottom-Up Approach – The Focus Groups

The bottom-up approach aimed to find topics, examples, and exemplary items for evaluating VR experience from the view of experts working in different fields of experience evaluation either as researchers or as practitioners.

To capture diverse expertise, we conducted two focus groups with researchers and practitioners working in different fields of experience evaluation. A focus group is a moderated group discussion with selected participants on a specific topic, usually with five to ten persons. It is recommended to conduct several focus groups on one topic in order to avoid effects of the specific characteristics of a single group. The discussion centers on a previously defined question that is relevant in the context of product development or research. New aspects resulting from the participants’ comments in discussions can also be taken up. The advantage, compared to individual interviews, is that participants may offer views that differ from those of other participants. Participants can present their point of view in detail, justify it, and defend it to the others. In the resulting discussions, critical points can be examined from different angles. Focus groups are very well suited for generating ideas and getting a broad overview of a question (Kuniavsky [22]).

We conducted two focus groups, both titled “Evaluation of VR Experiences”. The procedure in both groups was identical, but the participants differed. The focus groups were conducted independently, and the second focus group did not receive input about the first in order to avoid interference and priming effects.

The first focus group focused on the domains of User Experience, Game Experience, and Movie Experience. The second focus group focused on VR Experiences, Learning Experiences, and Theatre Experiences.

In the following, we first give an overview of the experts, followed by the procedure and analyzing methods.

3.1.1 Experts

To gain diverse insights, we invited 19 experts from different disciplines that address experience evaluation. Participants were between 27 and 50 years old (M = 37.6; SD = 8.3); 14 were male and 5 female. The majority ranked their experience between much and very much on a five-point Likert scale ranging from little to very much experience. Only a few had little or some experience.

Eight experts attended the first focus group. Four participants were researchers at a university in the field of human factors and media informatics, one as a professor, another as a postdoc, and two as Ph. D. students. Two participants worked in a company for human factors, one in a media innovation center and one in the field of immersive media.

Twelve experts participated in the second focus group; one of them had already taken part in the first focus group. Eleven participants were researchers at a university in the fields of human-computer interaction, games engineering, education, media informatics, media psychology, and dramatic arrangement. Two of them worked as professors, three as postdocs, and five as Ph. D. students. Another participant worked in a company for VR setups.

3.1.2 Procedure

3.1.2.1 Focus Group Preparation

Beforehand, it is essential to create an appropriate atmosphere. This includes a neutral venue in order to lower participants’ inhibitions. We decided to host both focus groups in rooms at a university, as this was a familiar environment for most participants. Additionally, we tried to avoid distractions and disturbances, provided snacks and drinks for the participants, and paid attention to regular breaks and variety in the program.

With the consent of the participants, we recorded the focus groups with a microphone and a camera and placed the corresponding technical equipment naturally and unobtrusively in the room.

We decided the seating arrangement beforehand and ensured equal positions and a good view for all participants by seating them in a circle. In addition, we tried to create a pleasant discussion atmosphere by strategically arranging the participants according to their previous knowledge and profession.

The authors, one acting as a moderator and the other as an assistant, led the focus groups. The moderator provided substantial input during the focus groups and mainly moderated the discussions without directing the content. The assistant mainly took care of the organization regarding time and documentation.

The program and discussion guide were constructed in advance and used by the moderator as a script in both focus groups. This ensured the comparability of both focus groups. The discussion guide included a warm-up phase with an introduction and the creation of a respectful and constructive discussion atmosphere. The topic was introduced, participants introduced each other, and discussion rules were laid down. In the teamwork phase, participants worked in groups on different aspects around the question of experience evaluation in VR. In the composition of the groups, we also paid attention to diverse professions and different perspectives emerging from different age groups, hierarchy levels, and a combination of researchers and practitioners. In the cool-down phase, the participants presented and discussed their results. The moderator also summarized the most critical points. Each focus group lasted three hours in total.

3.1.2.2 Focus Group Procedure
a) Screening Prior to the Focus Group Meeting

The invitation contained a short explanation of the topic. After confirming they would take part, participants received more information about the procedure. They were asked to complete a screening questionnaire assessing their demographics, ranking their experience with virtual reality applications, naming important criteria for experience evaluation, and giving their profession. In addition, the experts were asked via mail to answer the question “How do you ascertain a good experience?”. The answers were revisited in the warm-up phase. Table 1 shows the procedure and explains what was done and why.

Table 1

The table shows the procedure and explains what was done and why it was done during the focus groups.

Phase | Duration | What was done? | Why was it done?
Warm-up | 10 min | Introduction of moderator, topic & discussion rules | Participants know what to do and why they are participating
| 35 min | Introduction of the participants to each other by using the quotes of another participant | Participants introduce themselves and get to know the diverse disciplines and views on experience definitions
| 15 min | Group discussion on the question: “What do you consider to create a good experience?” | Participants discover different approaches to define different aspects of what accounts for a “good” experience
| 10 min | Break |
Teamwork | 25 min | Teamwork 1: Experts discuss how they would approach an assessment of the experience of the shown or similar VR experiences | Participants from different professions brainstorm and discuss different aspects of VR experience evaluation in small, diverse teams and thus find a broad range of essential factors, variables, or measurements
| 20 min | Teamwork 2: New teams apply and explore the previously developed questions | Participants test and evaluate previously developed questions to identify advantages, limitations, and missing aspects
| 20 min | Presentation of teamwork: All teams present their results | All participants are informed about the diverse perspectives
| 10 min | Break |
Cool-down | 20 min | Group discussion | Participants discuss their results and identify similarities and differences, giving an exhaustive picture of possibilities and challenges of evaluating VR experiences
| 15 min | Conclusion | The focus group is summed up and finished

b) Focus Group – Warm-Up Phase

The focus group started with a short introduction by the moderator. The introduction included motivation and a short presentation about the topic experience evaluation. In addition, some rules for constructive discussion were given.

After the introduction, participants introduced each other by presenting the answer of another participant to the question mentioned above. To this end, we printed each expert’s answer on a note and stuck the notes to a wall. One after the other, participants took a note (not their own), briefly described it, and guessed what kind of expert could have written it. Afterward, they briefly introduced themselves. The next person in the round was the person who had written that answer. After summarizing the core of the answer, he/she continued by choosing another note. This procedure allowed the participants to introduce themselves to the group, get to know each other, and get an impression of the different perspectives on the topic of experience evaluation.

The warm-up phase ended with a group discussion about the question, “What do you consider to create a good experience?”. Answers were written on a flipchart.

The warm-up phase took about one hour. After a break of ten minutes, the teamwork phase started.

c) Focus Group – Teamwork Phase

After the break, the moderator presented a short video demonstrating the diversity of VR experiences and applications. After the two team tasks had been explained, the teams were formed.

For the first piece of teamwork, we split the participants into small groups with two to four experts from mixed professions, who had differing views. Every group received flipchart paper and pens. With the impressions from the video, we asked them to discuss how they would approach an assessment of the experience of the shown or similar VR experiences: “Which questions would you ask?”; “How would you make the experience ascertainable?”; “How would you assess the experiences?”. We emphasized that each expert should discuss from his/her point of view. After 25 minutes, every team agreed on five to ten questions and documented them on the flipchart.

For the second piece of teamwork, we changed the team members to create new groups, again including diverse professions and views. The new task required one expert to interview a new team member using the questions from his/her first teamwork. The interviewed member was asked to think about his/her last VR experience or to imagine being one of the protagonists in the video. Then, the roles switched. Thus, the second team discussed questions from different primary teams. After 10 minutes of interviews, the team members spent 10 minutes noting the advantages, limitations, and missing aspects of the questions. The results of the discussion were recorded on flipcharts.

d) Focus Group – Cool-Down Phase

After a break of ten minutes, the second team presented the questions and interview results. Team members from the first team could make additions. The presentations lasted 20 minutes in total.

The presentations led directly into the second group discussion, moderated and visualized by the moderator, about the results of the teamwork, essential questions, surprising aspects, and possible problems or limitations.

After 20 minutes of discussion, the moderator recapitulated all the discussions, including the introduction and discussion at the beginning. Finally, we thanked everybody for their participation and finished the focus groups.

3.1.3 Focus Group – Analyses

The data analyses included the data of both focus groups. The results of the teamwork phase stemmed from four teams from the first focus group and four teams from the second focus group. Some teams had written a short questionnaire or collected different aspects that should be considered, and others noted various methods of recording the VR experience.

Table 2

The Table shows the clusters and their assigned items as well as some exemplary questions that evolved from the focus group participants.

Clusters with their assigned items and exemplary questions:

1. Preconditions
   Items: demographic data; physical limitations
   Exemplary questions: “Age?”; “Gender?”; “Profession?”; “Visual impairment?”

2. Frame
   Items: context; purpose; expectation; motivation; previous experience
   Exemplary questions: “First experience with VR?”; “What experiences have you already had with VR?”; “What expectations do you have toward the experience?”; “What is the purpose of the VR experience?”; “What is the reason for your participation?”

3. Usability
   Items: functionality; efficiency; effectiveness; engagement
   Exemplary questions: “Was technology blocking you?”; “Have you achieved your goal?”; “Were you able to complete your task in VR?”; “Did you feel a part of what happened?”

4. Comfort
   Items: physiological comfort; psychological comfort
   Exemplary questions: “How easy is it to return to reality?”; “Have you felt sick?”; “Have you felt comfortable?”

5. Story & Narrative
   Items: suspense; credibility; directing attention; clarity
   Exemplary questions: “How is the rhythm of the narrative?”; “How is the tension?”; “Self-explanatory story/experience?”; “Who am I? How do I interact?”; “How coherent is the experience?”

6. Emotion
   Items: fun; fear; horror; frustration; well-being; intensity
   Exemplary questions: “Did you have fun?”; “Were you afraid?”; “Did something frustrate you?”; “How intense was the experience?”; “Are you all right?”

7. Presence
   Items: involvement; control; embodiment; break of presence
   Exemplary questions: “At what point did you feel involved?”; “How convincing was the experience?”; “What did you find convincing about the experience?”; “Has reality been completely faded out?”; “Were there moments when the illusion was disturbed?”; “Were you able to influence your environment with your actions?”

8. Social
   Items: connection; empathy; team spirit; interaction
   Exemplary questions: “Did you feel connected?”; “Did you feel alone/in company?”; “At what point did you feel involved?”

9. After-Effect
   Items: short- & long-term reminder; learning effect; consequence
   Exemplary questions: “Would you like to repeat the experience?”; “How relevant is the experience to you?”; “How did the VR experience impact your life?”; “How is the experience reported to others?”; “What do you remember in particular?”; “How is the learning progression?”
The analysis started with a clustering of the teamwork results by applying the Affinity Diagram method (Holtzblatt & Beyer [13]). With the Affinity Diagram method, data can be classified on the basis of their relationships; it is often used to classify data gathered in focus groups or brainstorming sessions. Our analysis followed a usual Affinity Diagram process, which includes five steps: (1) abstract the collected data onto post-it notes; (2) cluster the post-its by asking: is this post-it similar to or different from the existing groups of post-its? (3) name the clusters; (4) rank the clusters; (5) find relations between clusters and name them. Consequently, we (1) noted the results of the teamwork on post-its and collected them on a whiteboard. Then, (2) we started to cluster them by finding relationships between them. Some notes contained different aspects, so we categorized them under multiple topics to make sure their importance and complexity were taken into account. (3) Through this procedure, we identified nine clusters that we named: preconditions, frame, usability, comfort, story/narrative, emotion, presence, social, and after-effect (for more information see Table 2). (4) We ranked the clusters chronologically according to their appearance during the VR experience. (5) We looked for relationships between items of different clusters and decided on the final clustering of the teamwork data.
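As a toy illustration of steps (1) to (3), the following sketch shows how notes that touch several aspects can be assigned to multiple clusters; the note texts and assignments are invented for illustration.

```python
# Hypothetical post-it notes from the teamwork phase, each assigned to
# one or more topics (multi-assignment preserves importance and complexity).
notes = {
    "Did you feel sick?": ["Comfort"],
    "Were you able to complete your task in VR?": ["Usability"],
    "Did the story keep you engaged?": ["Story & Narrative", "Emotion"],
}

clusters = {}
for note, topics in notes.items():
    for topic in topics:
        clusters.setdefault(topic, []).append(note)

for name, members in clusters.items():
    print(f"{name}: {members}")
```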

After the clustering of the teamwork data, we incorporated additional aspects stemming from the group discussions at the beginning and the end of the focus groups. To retain the diversity of results, neither the frequency of occurrence of the individual items nor the degree of agreement on single items was reflected in the diagram.

Table 2 lists the clusters and all the assigned items stemming from the bottom-up approach. The items were developed from the data of the focus groups, and we also list some interesting exemplary questions which were worked out in the teamwork for every cluster, i. e., statements stemming from the participants.

3.2 Top-Down Approach – Assignment to High-Level Evaluation Steps

The aim of the top-down approach was to conflate the results of the focus groups and the learnings about experience evaluation stemming from the field of user experience in general and VR-specific evaluations in particular into the final framework presented below. The results of the focus groups (bottom-up approach) were related to the dimensions, levels, time points, and kinds of measures stemming from the corresponding literature (compare Section 2). Again, we used the Affinity Diagram method (Holtzblatt & Beyer [13]) to restructure and conflate the findings from the focus groups and the literature. The top-down procedure led to five high-level evaluation steps that, on the one hand, support the definition of the evaluation conditions and, on the other hand, target the selection of appropriate evaluation measures. The evaluation steps are presented in the following section (see Table 4 for an overview).

4 Results

The framework presents the result of the top-down approach. The findings from the focus groups (bottom-up approach) were related to dimensions, levels, time points, and kinds of measures stemming from the corresponding literature (compare Section 2). The clusters stemming from the focus groups and the dimensions and levels stemming from the literature have been incorporated in steps 1 to 3, which characterize the definition of the evaluation conditions. Time points and kinds of measures are addressed by steps 4 and 5, which characterize the selection of appropriate evaluation measures. To illustrate the usage of the framework, one example specifying each step accompanies the description of the steps. Our example intends to evaluate the impact of virtual reality therapy on the reduction of exam anxiety.

Tables 3.1 to 3.5 refer to the five steps. Table 4 sums up the steps and categories at a glance.

Step 1: Defining the Setting (see Table 3.1). In line with a user-centered design process in the field of UX evaluation, the evaluator should start by determining the evaluation setting. Thus, the first step comprises the system specifications used (e. g., fully immersive); the analysis of the context, or rather the kind of VR application (e. g., entertainment, therapy); and reflections about the user or users (e. g., experience with VR). Each subcomponent of the setting determines the frame of the evaluation and therewith the following steps of the framework. The subcomponents of the setting step are not intended to be an “all or nothing” selection. Instead, the evaluator is supposed to choose the degree of the different subcomponents, which can result in mixed forms like a learning context with gamification elements.

Example.

Our use case example intends to evaluate a fully immersive HMD VR system in the context of exam anxiety therapy with a population of students. Participants and other actors are full-body embodied by virtual agents. The students should not have any other disorders, and it is expected that they have no or only little experience with VR.

Step 2: Defining the Level (see Table 3.2). The second step, i. e., the evaluation level, refers to the levels described in the related work section (see Section 2.1.2.1), complemented by the findings from the focus groups. Consequently, it incorporates factors of the system or its elements (e. g., tools used to navigate); features of the tasks (e. g., utilitarian); characteristics of the user (e. g., emotions); features related to other users (e. g., social aspects); and characteristics of the narrative (e. g., coherence). Each subcomponent covers many features and characteristics; Table 2 lists those named by the experts in the focus groups. In sum, the level determines the units of evaluation. For example, when a questionnaire is used for evaluation, the units are the scales. When a video observation is used, the units are the observation categories. Of course, a complete list seems impossible, so again, we do not claim completeness.

Table 3.1

The table shows the first step (i. e., defining the setting) of the framework appRaiseVR. Three subcomponents – system, context, and user – are described. Further, the chosen settings for the example are depicted. Note that not all subcomponents are covered by the example.

Define experience and evaluation conditions: Step 1: Define the setting of the evaluation.

System – determine the degree of: immersion; self-representation; other-representation.
Context – determine the degree of: entertainment; gaming; learning; prototyping; teaching; therapy; training; simulation.
User – define the personas: expectations; goals; motivation; needs; degree of experience in context; degree of experience in VR.

Choosing the settings for the example:
System: high degree of immersion; high degree of self-representation; high degree of other-representation.
Context: context of therapy.
User: low degree of experience in context; low degree of experience in VR.

Table 3.2

The table shows the second step (defining the level) of the framework appRaiseVR. Five subcomponents – system, task, narrative, user-self, and user-others – are described. Further, the chosen levels for the example are depicted. Note that not all subcomponents are covered by the example.

Define experience and evaluation conditions: Step 2: Define the level of the evaluation.

System – determine: hardware/software; environmental elements of VR; elements of manipulation tools in VR.
Task – determine the degree of: eudaimonism; hedonism; pragmatism; social aspects; utilitarianism; curiosity; influencing perception; role identification.
Narrative – determine the degree of: clarity; coherence; plausibility; believability; suspense; rhythm.
User-Self – determine the degree of: acceptance; cognitive impact; emotional impact; impact on behavior; learning; memory; need satisfaction; well-being; physiological comfort/discomfort; psychological comfort/discomfort.
User-Others – determine the degree of: connectedness; interdependence; humanity; and determine the degree of similarity between: self [in VR] – others [in VR]; self [in VR] – others [outside VR]; self [in VR] – object(s) [in VR]; self [in VR] – object(s) [outside VR].

Choosing the levels for the example:
System: elements of virtual hands (e. g., latency, fidelity).
Task: tasks might include a high degree of eudaimonism, social aspects, and utilitarianism.
Narrative: expect a high degree of believability of the cover story.
User-Self: expect a negative emotional valence and a high degree of arousal; expect a high degree of learning.
User-Others: self and others in VR (hierarchy, familiarity).

Table 3.3

The table shows the third step (i. e., defining the plausibility) of the framework appRaiseVR. Five subcomponents – rebuild reality, alternative reality, extended reality, plausibility of self-representation, and plausibility of others’ representation – are described. Further, the chosen plausibilities for the example are depicted. Note that the example does not cover all subcomponents.

Define experience and evaluation conditions: Step 3: Define the plausibility of the experience.

Rebuild Reality – determine plausibility in relation to the physics in reality.
Alternative Reality – determine plausibility in relation to “new” physics in virtuality.
Extended Reality – determine plausibility in relation to both the physics in reality and virtuality.
Plausibility of Self-Representation – determine the degree of: self-location; virtual body ownership; agency; change.
Plausibility of Others’-Representation – determine the degree of: others’ location; anthropomorphism; agency (avatars, agents); change.

Choosing the plausibilities for the example:
Plausibility: expect plausibility in relation to real therapy features.
Self-Representation: expect a high degree of virtual body ownership.
Others’-Representation: expect a high level of anthropomorphism.

Example.

In our example, participants stand in front of an examination committee and have to pass a series of cognitive tasks using controllers and virtual hands, which is important to gain access to the next course (cover story). After the completion of five tasks, a virtual trainer appears and teaches some anxiety-reducing skills to the participant. Participants, examiners, and the trainer are full-body embodied by virtual agents. There is no communication or interaction with the real laboratory environment while being immersed in VR. Where does this leave the evaluation? It might be important to assess whether the virtual hands work as tools for manipulating virtual objects. Task-related measures like time to completion or the percentage of correctly solved tasks might help to quantify performance, or even the success of the intervention could be assessed. Perhaps the believability of the cover story is the most crucial characteristic of the narrative. Regarding the user, the emotions, including arousal and valence, might be integral to the evaluation. Regarding the relation between the user and others in the virtual environment, the hierarchy and familiarity of others in comparison to the student might be relevant (e. g., peers, known examiners, unfamiliar examiners, examiners who have tested the participant before).

Step 3: Defining the Plausibility (see Table 3.3). Since immersive VR experiences and the associated sense of presence are among the most integral determinants of a high-quality VR experience, the third step primarily addresses the concept of presence in terms of plausibility. The plausibility of a virtual experience can be assessed in comparison to the physics in reality (i. e., rebuilt reality), in comparison to “new” physics in virtuality (i. e., alternative reality), or in comparison to both reality and virtuality (i. e., extended reality). In addition to the pure sense of presence, the plausibility of self-representation (e. g., the sense of virtual embodiment) as well as the plausibility of others’ representation (e. g., anthropomorphic form) might be important factors for evaluating the quality of VR experiences.

Table 3.4

The table shows the fourth step (i. e., selecting the time point) of the framework appRaiseVR. Four subcomponents – pre-experience, in-experience, post-experience, and overtime-experience – are described. Further, the chosen time points for the example are depicted. Note that the example does not cover all subcomponents.

Select appropriate measures: Step 4: Select the time point of measure.

Pre-Experience – measures taken before the VR experience starts.
In-Experience – measures taken during the VR experience, while the degree of immersion should not change.
Post-Experience – measures taken between different VR experiences after changing the degree of immersion; taken directly after the experience after changing the degree of immersion; or taken directly after the experience while the degree of immersion did not change.
Overtime-Experience – measures taken as a follow-up after the experience (must not be on the same day).

Choosing the time points for the example:
Pre-Experience: measure the degree of anxiety prior to the experience.
In-Experience: monitor the participants continuously during the experience.
Post-Experience: measure the degree of anxiety directly after the experience in real life.
Overtime-Experience: measure the degree of anxiety 4 weeks after the experience in real life.

Example.

Since virtual therapy aims at positive effects for participants in real-life situations, the VR intervention rebuilds a real situation. Thus, the sense of presence might be measured in terms of plausibility in comparison to the real world by including measures of the degree of realism or the amount of anxiety in comparison to real exposure. Regarding the plausibility of self-representation, the sense of virtual body ownership might be integral to the evaluation. Regarding the representation of the examiners and the trainer, the physical appearance (degree of anthropomorphism) could also be important.

While the first three steps lead to the definition of the evaluation conditions, the last two steps target the selection of appropriate evaluation measures.

Step 4: Selecting the Time Point (see Table 3.4). The fourth step, i. e., the time point of measure, comprises measures taken prior to or during the VR experience, between or directly after VR experiences, and measures taken after a more extended period.

Example.

Training effects are usually measured with a pre-post design, meaning that for our use case the evaluation should include the same measures prior to the virtual exposure and after the exposure (e. g., level of anxiety, task performance) to assess the success of the intervention. In addition, the state of the participants could be monitored continuously during the exposure. Finally, the transfer of the effect to real behavior might be essential to measure a number of weeks after the virtual exposure.
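Here is a minimal sketch of how such a pre-post design with a follow-up could be scored; the participants, anxiety scores, and scale are invented for illustration.

```python
# Hypothetical anxiety scores (0-100 scale) for three participants.
scores = {
    "p1": {"pre": 72, "post": 55, "follow_up": 50},
    "p2": {"pre": 64, "post": 60, "follow_up": 58},
    "p3": {"pre": 80, "post": 62, "follow_up": 66},
}

for pid, s in scores.items():
    immediate = s["pre"] - s["post"]      # effect directly after the exposure
    retained = s["pre"] - s["follow_up"]  # transfer measured weeks later
    print(f"{pid}: immediate reduction {immediate}, retained reduction {retained}")
```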

Step 5: Selecting the Measurement Tools (see Table 3.5). Finally, the fifth step, i. e., choosing tools, determines the choice of the evaluation measures, including the surrounding (e. g., duration, location), directness (e. g., explicit, implicit), and kind (e. g., questionnaire).

Example.

For evaluating the success of the intervention and the state of participants, the evaluator might choose a multi-method approach. While monitoring the arousal continuously through physiological measures (e. g., skin conductance), the valence of emotion might be assessed by anxiety questionnaires. In addition, an implicit association task might assess the level of subconscious anxiety.

Table 3.5

The table shows the fifth step (i. e., selecting the measure tools) of the framework appRaiseVR. Three subcomponents – surrounding, directness, and kind – are described. Further, the chosen tools for the example are depicted. Note that the example does not cover all subcomponents.

Select appropriate measures: Step 5: Select the tools of measure.

Surrounding – determine the: location; duration.
Directness – determine the degree of: explicit measures; implicit measures.
Kind – determine the portion of: behavior measures; physiological measures; questionnaire measures; task measures.

Choosing the tools for the example:
Surrounding: laboratory; duration of usual exposure in real-life therapy.
Directness: choose explicit and implicit measures.
Kind: choose physiological, questionnaire, and task measures.
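To show how the five steps could travel with a study as a single, reviewable artifact, the following sketch encodes the exam-anxiety example as a structured evaluation plan; the class and field names are our own illustrative choices, not part of the framework itself.

```python
from dataclasses import dataclass, field

@dataclass
class AppRaiseVRPlan:
    setting: dict = field(default_factory=dict)       # step 1
    level: dict = field(default_factory=dict)         # step 2
    plausibility: dict = field(default_factory=dict)  # step 3
    time_points: list = field(default_factory=list)   # step 4
    tools: dict = field(default_factory=dict)         # step 5

plan = AppRaiseVRPlan(
    setting={"immersion": "high", "context": "therapy",
             "user": "students, low VR experience"},
    level={"system": "virtual hands", "task": "eudaimonic, social, utilitarian",
           "narrative": "believable cover story", "user_self": "anxiety, arousal"},
    plausibility={"reference": "rebuilt reality", "self": "virtual body ownership",
                  "others": "anthropomorphism"},
    time_points=["pre", "in", "post", "follow_up_4_weeks"],
    tools={"surrounding": "laboratory", "directness": ["explicit", "implicit"],
           "kind": ["physiological", "questionnaire", "task"]},
)
print(plan)
```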

5 Discussion

5.1 Motivation

VR is evolving into an everyday technology with applications beyond gaming, including the fields of therapy and training. Independently of the application context, it is essential to ensure the quality of the VR experience and to understand the user’s cognition, emotion, and behavior so that the experience is safe, pleasant, and meaningful. However, VR research lacks a systematic evaluation framework that includes the knowledge from diverse disciplines of experience evaluation and the specific requirements of VR experiences. The present paper takes up this research desideratum by conflating the expertise of researchers from diverse fields of experience evaluation and knowledge about experience evaluation from the domain of user experience evaluation into appRaiseVR, i. e., a systematic evaluation framework considering the wide range of VR applications. As a result of two focus groups (bottom-up approach) and of identifying general and VR-specific dimensions, levels, time points, and kinds of measures from the literature on user experience evaluation (top-down approach), the framework includes five steps (see Table 4 for an overview). The first three steps support the definition of the experience and evaluation conditions by determining the setting and level of the evaluation and the plausibility of the experience. Considering these conditions prior to the evaluation might help to plan the evaluation and to cover the most important factors for the corresponding experience context. The last two steps guide the selection of a suitable time course and appropriate measurement tools.

Table 4

The table summarizes the steps and subcomponents of the framework appRaiseVR.

Define experience and evaluation conditions
– Step 1: define setting of evaluation – system, context, user
– Step 2: define level of evaluation – system, task, narrative, user-self, user-others
– Step 3: define plausibility of experience – rebuild reality, alternative reality, extend reality, plausibility of self-representation, plausibility of other's-representation

Select appropriate evaluation measures
– Step 4: select time point of measure – pre-experience, in-experience, post-experience, over time-experience
– Step 5: select tool of measure – surrounding, directness, kind
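
For evaluators who prefer a machine-readable checklist, the table's content could be encoded as follows. This is our own Python sketch; the framework itself prescribes no particular data format.

    # The five appRaiseVR steps as a plain checklist (our own sketch).
    APPRAISE_VR_STEPS = {
        "define experience and evaluation conditions": {
            "step 1: setting of evaluation": ["system", "context", "user"],
            "step 2: level of evaluation": ["system", "task", "narrative",
                                            "user-self", "user-others"],
            "step 3: plausibility of experience": [
                "rebuild reality", "alternative reality", "extend reality",
                "plausibility of self-representation",
                "plausibility of other's-representation",
            ],
        },
        "select appropriate evaluation measures": {
            "step 4: time point of measure": ["pre-experience", "in-experience",
                                              "post-experience", "over time-experience"],
            "step 5: tool of measure": ["surrounding", "directness", "kind"],
        },
    }

    # Walk the checklist when planning an evaluation:
    for phase, steps in APPRAISE_VR_STEPS.items():
        print(phase)
        for step, subcomponents in steps.items():
            print(f"  {step}: {', '.join(subcomponents)}")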

5.2 Contribution

The contribution of the framework lies in the high-level categorization of user evaluations in the field of VR experiences. Its value is manifold.

(1) Conflating and categorizing the knowledge from the diverse fields engaged in evaluating experiences can guide evaluators systematically. By addressing the five steps proposed by the framework, researchers can determine the evaluation, experience, and measurement conditions of their VR experience prior to the evaluation. Most steps allow gradation rather than a cut-off decision, and these gradations support researchers in their decision and planning process. In particular, the framework can guide researchers from different fields: researchers from the field of VR might benefit from the content condensed from the domains of experience evaluation (see Section 2.1), while researchers from other disciplines who use VR to create experiences are pointed to requirements that are specific to VR experience evaluations (e. g., plausibility of narrative). In conclusion, appRaiseVR might guide experienced and inexperienced evaluators alike.

(2) The categorization of subcomponents accommodates both the demands and the limitations of diverse evaluation contexts, enabling the selection of an appropriate depth of evaluation, ranging from quick assessments (e. g., single-item questions) to exhaustive evaluations (e. g., a pre-post design). The possibility to grade or mix categories meets the requirements of complex applications that resist clear-cut classification.

(3) Establishing similar evaluation procedures using the framework might increase the comparability between different VR applications and experiences within the same context.

5.3 Limitations and Future Work

Although the framework gives a systematic overview and top-down guidance, further elaboration will increase its value and applicability, particularly regarding bottom-up guidance. In cooperation with interested experts from the focus groups, the authors will amplify the level and tool components by proposing frequently applied measures. Consider, for example, the subcomponent concerning the user: emotions can be assessed by questionnaires, physiological measures, or observations, and different questionnaires, physiological measures, or observation methods can assess the same affective state. However, different measures have different qualities and are validated for different populations in different contexts. The experience-evaluation literature contains a considerable corpus of knowledge about such measures and their qualities that should be related to the subcomponents of the framework. For most subcomponents, detailed knowledge and expertise can be added. In our opinion, giving evaluators an idea of how to assess the subcomponents in detail, i. e., linking them to concrete measures and tools, would improve the framework's bottom-up guidance. In addition, for some subcomponents a lack of appropriate measures will be identified, which will encourage researchers to develop new methods. One possible shape of such guidance is sketched below.
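
A minimal sketch of what such a subcomponent-to-measure registry could look like follows; the entries are illustrative examples assembled by us from measures mentioned in this paper, not the authors' validated catalogue.

    # Hypothetical registry linking framework subcomponents to candidate
    # measures; entries are illustrative, not a validated catalogue.
    MEASURE_REGISTRY = {
        "user: emotion": ["anxiety questionnaire", "skin conductance", "observation"],
        "user-self: plausibility of self-representation": ["virtual body ownership scale"],
        "system: workload": ["controller-movement entropy", "workload questionnaire"],
    }

    def suggest_measures(subcomponent):
        """Return candidate measures; an empty list marks exactly the kind of
        gap that future iterations of the framework should expose."""
        return MEASURE_REGISTRY.get(subcomponent, [])

    print(suggest_measures("user: emotion"))
    print(suggest_measures("context: social setting"))  # gap -> []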

Another important task for future work will be the modularization of the framework. Each component already represents a kind of module, and using any module independently of the other components should be feasible. The more independently the components can be used, the wider the framework's range of application. Of course, applying all components thoroughly would result in a high-quality evaluation; however, if evaluators only have time for a quick assessment, it should be possible to focus on a few specific aspects within the framework.

Finally, the field of experience evaluation is large, and the authors do not claim completeness. Thus, the steps and subcomponents will be advanced continuously, employing feedback from experts and practitioners. In particular, specialized subdomains of the user experience domain can reveal valuable additions. As mentioned above (see Section 2.1), user experience has gained importance in safety-critical domains like healthcare (Grundgeiger et al. [9]). Features specific to safety-critical domains might receive particular attention in evaluations in the context of virtual therapy or training.

Another important subdomain is game experience. User experience in games has been evaluated using a variety of concepts, several of which are also important in VR experiences: among the most frequently mentioned are immersion, fun, presence, involvement, engagement, flow, play, playability, and social play (Bernhaupt, Ijsselsteijn, Mueller, Tscheligi & Wixon [4]). Ijsselsteijn, de Kort, Poels, Jurgelionis & Bellotti [17] have recognized that the evaluation of game experience lacks a coherent and fine-grained set of methods and tools and that "it is impossible to come up with a single word or concept that embraces what people feel or experience when playing digital games" (p. 83). These authors also employed the focus group methodology to gain in-depth, contextual, and motivational insights into the specific experiences of different types of gamers. They organized four focus groups with gamers differing on several variables (e. g., gaming frequency). Five game researchers then combined knowledge and insights from both theoretical findings and the focus group explorations into a comprehensive categorization of game experience dimensions: enjoyment, flow, imaginative immersion, sensory immersion, suspense, competence, negative affect, control, and social presence – all concepts that are closely tied to components of VR experience evaluations. In particular, for finding established measures substantiating appRaiseVR, measures from the subdomain of game experience should be considered.

In summary, the next iteration of the framework should add suggestions for measures and tools complementing the top-down categorization of the present version. In other words, appRaiseVR should be developed from a systematic to a sophisticated evaluation framework.


Acknowledgment

We would like to thank all experts for their participation in our focus groups and for sharing their experiences and needs.

References

[1] Battarbee, K. (2003). Co-experience: the social user experience. In Conference on Human Factors in Computing Systems – Proceedings (pp. 730–731). doi:10.1145/765891.765956.

[2] Battarbee, K. (2007). Co-experience: product experience as social interaction. In Product Experience. Elsevier. doi:10.1016/B978-008045089-6.50022-8.

[3] Benford, S., Giannachi, G., Koleva, B., & Rodden, T. (2009). From interaction to trajectories: designing coherent journeys through user experiences. In CHI 2009: Proceedings of the 27th Annual CHI Conference on Human Factors in Computing Systems, Vols. 1–4 (pp. 709–718). doi:10.1145/1518701.1518812.

[4] Bernhaupt, R., Ijsselsteijn, W., Mueller, F., Tscheligi, M., & Wixon, D. (2008). Evaluating user experiences in games. In Conference on Human Factors in Computing Systems – Proceedings (pp. 3905–3908). doi:10.1145/1358628.1358953.

[5] Boletsis, C. (2017). The new era of virtual reality locomotion: a systematic literature review of techniques and a proposed typology. Multimodal Technologies and Interaction, 1(4), 24. doi:10.3390/mti1040024.

[6] Botella, C., Riva, G., Gaggioli, A., Wiederhold, B. K., Alcaniz, M., & Baños, R. M. (2012). The present and future of positive technologies. Cyberpsychology, Behavior and Social Networking, 15(2), 78. doi:10.1089/cyber.2011.0140.

[7] Diefenbach, S., Kolb, N., & Hassenzahl, M. (2014). The 'hedonic' in human-computer interaction: history, contributions, and future research directions. doi:10.1145/2598510.2598549.

[8] Freytag, S. C., & Wienrich, C. (2017). Evaluation of a virtual gaming environment designed to access emotional reactions while playing. In 2017 9th International Conference on Virtual Worlds and Games for Serious Applications, VS-Games 2017 – Proceedings (pp. 145–148). doi:10.1109/VS-GAMES.2017.8056585.

[9] Grundgeiger, T., Hurtienne, J., & Happel, O. (2019). Why and how to approach user experience in safety-critical domains: the example of health care. Human Factors. doi:10.1177/0018720819887575.

[10] Hassenzahl, M., Diefenbach, S., & Göritz, A. (2010). Needs, affect, and interactive products – facets of user experience. Interacting with Computers, 22(5), 353–362. doi:10.1016/j.intcom.2010.04.002.

[11] Hassenzahl, M. (2010). Experience design: technology for all the right reasons. Synthesis Lectures on Human-Centered Informatics, 3(1), 1–95. doi:10.2200/s00261ed1v01y201003hci008.

[12] Hassenzahl, M., & Tractinsky, N. (2006). User experience – a research agenda. Behaviour and Information Technology, 25(2), 91–97. doi:10.1080/01449290500330331.

[13] Holtzblatt, K., & Beyer, H. (2017). The affinity diagram. In Contextual Design (pp. 127–146). doi:10.1016/b978-0-12-800894-2.00006-5.

[14] Hornbæk, K., & Oulasvirta, A. (2017). What is interaction? In Conference on Human Factors in Computing Systems – Proceedings (pp. 5040–5052). Association for Computing Machinery. doi:10.1145/3025453.3025765.

[15] Huta, V., & Ryan, R. M. (2010). Pursuing pleasure or virtue: the differential and overlapping well-being benefits of hedonic and eudaimonic motives. Journal of Happiness Studies, 11(6), 735–762. doi:10.1007/s10902-009-9171-4.

[16] IJsselsteijn, W. A., de Kort, Y. A. W., & Poels, K. (2015). Game Experience Questionnaire. FUGA – The fun of gaming: measuring the human experience of media enjoyment. Retrieved from https://pure.tue.nl/ws/files/21666907/Game_Experience_Questionnaire_English.pdf.

[17] Ijsselsteijn, W., de Kort, Y., Poels, K., Jurgelionis, A., & Bellotti, F. (2007). Characterising and measuring user experiences in digital games. In ACE Conference '07. Retrieved from http://www.academia.edu/download/39634534/IJsselsteijn_et_al_2007_Characterising_and_Measuring_User_Experiences_ACE_2007_workshop.pdf.

[18] LaViola, J. J. Jr., Kruijff, E., McMahan, R. P., Bowman, D., & Poupyrev, I. (2017). 3D user interfaces: theory and practice.

[19] Kahneman, D., Diener, E., & Schwarz, N. (1999). Well-being: foundations of hedonic psychology. Russell Sage Foundation.

[20] Kaptelinin, V., & Nardi, B. (2012). Activity theory in HCI: fundamentals and reflections. Synthesis Lectures on Human-Centered Informatics, 5(1), 1–105. doi:10.2200/s00413ed1v01y201203hci013.

[21] Kilteni, K., Groten, R., & Slater, M. (2012). The sense of embodiment in virtual reality. Presence: Teleoperators and Virtual Environments, 21(4), 373–387. doi:10.1162/PRES_a_00124.

[22] Kuniavsky, M. (2003). Observing the user experience: a practitioner's guide to user research. Morgan Kaufmann.

[23] Law, E., Vermeeren, A., Hassenzahl, M., & Blythe, M. (Eds.) (2007). Towards a UX manifesto. COST294-MAUSE affiliated workshop (pp. 205–206).

[24] Mekler, E. D., & Hornbæk, K. (2016). Momentary pleasure or lasting meaning? Distinguishing eudaimonic and hedonic user experiences. In Conference on Human Factors in Computing Systems – Proceedings (pp. 4509–4520). Association for Computing Machinery. doi:10.1145/2858036.2858225.

[25] Minge, M., & Riedel, L. (2013). meCUE – Ein modularer Fragebogen zur Erfassung des Nutzungserlebens [meCUE – a modular questionnaire for assessing the user experience]. In Mensch und Computer 2013, Munich (pp. 89–98). doi:10.1524/9783486781229.89.

[26] Reinhardt, D., Haesler, S., Hurtienne, J., & Wienrich, C. (2019). Entropy of controller movements reflects mental workload in virtual reality. In 26th IEEE Conference on Virtual Reality and 3D User Interfaces, VR 2019 – Proceedings (pp. 802–808). doi:10.1109/VR.2019.8797977.

[27] Roth, D., Lugrin, J.-L., Latoschik, M. E., & Huber, S. (2017). Alpha IVBO – construction of a scale to measure the illusion of virtual body ownership. In Proceedings of the 35th Annual ACM Conference on Human Factors in Computing Systems. doi:10.1145/3027063.3053272.

[28] Roth, C., & Koenitz, H. (2016). Evaluating the user experience of interactive digital narrative. In Proceedings of the 1st International Workshop on Multimedia Alternate Realities (pp. 31–36). doi:10.1145/2983298.2983302.

[29] Roto, V., Law, E., Vermeeren, A., & Hoonhout, J. (2011). User experience white paper – bringing clarity to the concept of user experience.

[30] Skarbez, R., Brooks, F. P., & Whitton, M. C. (2017). A survey of presence and related concepts. ACM Computing Surveys, 50(6), 96. doi:10.1145/3134301.

[31] Slater, M. (2009). Place illusion and plausibility can lead to realistic behaviour in immersive virtual environments. Philosophical Transactions of the Royal Society B: Biological Sciences, 364(1535), 3549–3557. doi:10.1098/rstb.2009.0138.

[32] Slater, M. (1999). Measuring presence: a response to the Witmer and Singer presence questionnaire. Presence: Teleoperators and Virtual Environments, 8(5), 560–565. doi:10.1162/105474699566477.

[33] Stanney, K. M., Mollaghasemi, M., Reeves, L., Breaux, R., & Graeber, D. A. (2003). Usability engineering of virtual environments (VEs): identifying multiple criteria that drive effective VE system design. International Journal of Human Computer Studies, 58(4), 447–481. doi:10.1016/S1071-5819(03)00015-6.

[34] Stanney, K. M., Mourant, R. R., & Kennedy, R. S. (1998). Human factors issues in virtual environments: a review of the literature. Presence: Teleoperators and Virtual Environments, 7(4), 327–351. doi:10.1162/105474698565767.

[35] Thüring, M., & Mahlke, S. (2007). Usability, aesthetics and emotions in human-technology interaction. International Journal of Psychology, 42(4), 253–264. doi:10.1080/00207590701396674.

[36] Wienrich, C., Döllinger, N., Kock, S., & Gramann, K. (2019). User-centered extension of a locomotion typology: movement-related sensory feedback and spatial learning. In 26th IEEE Conference on Virtual Reality and 3D User Interfaces, VR 2019 – Proceedings (pp. 690–698). doi:10.1109/VR.2019.8798070.

[37] Wienrich, C., Döllinger, N., Kock, S., Schindler, K., & Traupe, O. (2018). Assessing user experience in virtual reality – a comparison of different measurements. In International Conference of Design, User Experience, and Usability (pp. 573–589). doi:10.1007/978-3-319-91797-9_41.

[38] Wienrich, C., Noller, F., & Thüring, M. (2017). Design principles for VR interaction models: an empirical pilot study. In Dörner, R., Kruse, R., Mohler, B., & Weller, R. (Eds.), Virtuelle und erweiterte Realitäten: 14. Workshop der GI-Fachgruppe VR/AR (pp. 162–171).

[39] Wienrich, C., Schindler, K., Döllinger, N., Kock, S., & Traupe, O. (2018). Social presence and cooperation in a large-scale, multi-user virtual reality – an empirical evaluation of a location-based adventure. In IEEE VR. doi:10.1109/VR.2018.8446575.

[40] Wierwille, W. W., & Eggemeier, F. T. (1993). Recommendations for mental workload measurement in a test and evaluation environment. Human Factors, 35(2), 263–281. doi:10.1177/001872089303500205.

[41] Wright, P., & McCarthy, J. (2010). Experience-centered design: designers, users, and communities in dialogue. Synthesis Lectures on Human-Centered Informatics, 3(1), 1–123. doi:10.2200/s00229ed1v01y201003hci009.

[42] Young, M. S., Brookhuis, K. A., Wickens, C. D., & Hancock, P. A. (2015). State of science: mental workload in ergonomics. Ergonomics, 58(1), 1–17. doi:10.1080/00140139.2014.956151.

Published Online: 2020-08-06
Published in Print: 2020-08-26

© 2020 Walter de Gruyter GmbH, Berlin/Boston
