Abstract
Telemedicine has become a valuable asset in emergency response, assisting paramedics in decision making and first-contact treatment. Paramedics in unfamiliar environments or time-critical situations often encounter complications for which they require external advice. Modern ambulance vehicles are equipped with microphones, cameras, and vital-sign sensors, which allow experts to remotely join the local team. However, the visual channels are rarely used since the statically installed cameras only allow broad views of the patient. They neither provide a close-up view nor a dynamic viewpoint controlled by the remote expert. In this paper, we present EyeRobot, a concept that enables dynamic viewpoints for telepresence using the intuitive control of the user’s head motion. In particular, EyeRobot utilizes the 6-degrees-of-freedom pose estimation capabilities of modern head-mounted displays and applies the estimated pose in real time to a robot arm. A stereo camera, installed on the end-effector of the robot arm, serves as the eyes of the remote expert at the local site. We put forward an implementation of EyeRobot and present the results of our pilot study, which indicate its intuitive control.
1 Introduction
Communication is a key aspect of successful medical treatment. Due to the fast modernization of connected devices, establishing a communication line between two parties often requires mere seconds. With the ongoing digitization of the healthcare sector, telemedicine is quickly gaining importance in patient treatment. Telemedicine describes the practice of utilizing digital long-distance communication to connect patients and doctors anywhere in the world; it is a rapidly growing technology beneficial for inpatient, outpatient, and remote care [1]. It naturally increases the availability of expert knowledge at a given time, reduces travel expenses, and synergizes with digital patient data acquisition.
During a mass-casualty incident, telemedicine can be life-saving. Such incidents involve a large number of injured people but, equally important, a large number of regional paramedics. If the impact of the incident overstrains the capacity of paramedical first response, even novices in medical practice are eligible to be deployed. To reduce the workload of paramedics and to aid less experienced health workers, remote experts from geographically distant locations connect to them via local telecommunication channels. Modern ambulance vehicles are equipped with microphones and cameras which are accessible to remote experts. Contrary to the intuition that the video captured from inside the vehicle provides most of the relevant information for consultation, it is the audio channel that is used most frequently. The cameras in those vehicles are statically installed on the rooftop. Naturally, from these viewpoints, the cameras can only capture a broad view of the patient. The remote experts are neither able to change their viewpoint nor to obtain a close-up view of the patient. Instead, they have to instruct the paramedics to manually move a mobile camera to their desired view [2]. In addition, the view of the camera can be occluded by local workers. We address these problems with our concept of EyeRobot in section 2. Furthermore, we show in section 3 that spatial exploration with EyeRobot can be done faster with the head-mounted display than with keyboard and mouse.
1.1 Related works
Several concepts have been published for digitally immersing distant persons in a remote location.
The first approach to enable telepresence with dynamic viewpoints uses human surrogates. Kasahara et al. [3] propose their JackIn Head system, in which one user wears a 360° camera while a second person spectates the video stream using a head-mounted display. They reported that sharing the first-person view is a promising method. The human surrogate created dynamic viewpoints; however, these did not correspond to the movement of the second user. This mismatch between the visual input and vestibular cues (i.e. the sense of balance) induced symptoms of cybersickness in the spectator [4, p. 49].
The second approach to enable telepresence utilizes RGB-D cameras, which capture both the color and depth information of the scene. Orts-Escolano et al. [5] set up a dedicated area for their Holoportation system that is fully captured by multiple RGB-D cameras. They combined the data of all RGB-D cameras to generate a fully 3D-reconstructed scene that is transmitted to the Mixed Reality head-mounted display HoloLens worn by the remote user. This allowed the remote user to freely inspect the 3D-reconstructed scene from arbitrary angles. Fuchs et al. [6] utilize stereoscopic projectors to project the 3D-reconstructed scene onto a flat surface, effectively creating the illusion of a window into the remote room. The local user perceives the projection in 3D by wearing shutter glasses calibrated and synchronized to the projector beforehand. These approaches require high network bandwidth and strong computational power.
The third approach utilizes robotic components coupled with a camera and a display. Higuchi et al. [7] use a drone synchronized with the user’s head motion. Their Flying Head successfully translated three-dimensional head motion into the drone’s movement and yielded faster navigation compared to joystick controls. The AESOP surgical system is a robotic camera holder for laparoscopes during minimally invasive surgery, which motivated the ZEUS Robotic Surgical System and the da Vinci Surgical System. The ZEUS and da Vinci systems share the concept of a remote console controlling several robot arms [8]. Kristoffersson et al. [9] published a thorough review of mobile robotic telepresence at industrial and consumer levels. The systems listed in the review were perceived positively; however, they required unoccupied space at floor level to change their viewpoint and were limited in their degrees of freedom.
2 Method
EyeRobot emulates the paradigm of telepresence by using a robot arm as a surrogate for the user. This is accomplished by synchronizing the movement of a head-mounted display (HMD) with the movement of the robot arm. A stereo camera, mounted on the end-effector of the robot arm, provides immediate visual feedback from the perspective of the robot arm. Lastly, by displaying the stereoscopic video stream on the stereo displays inside the HMD, the user sees the local scene through the perspective of the robot arm.
Due to this setup, we expect the benefits of EyeRobot to be manifold: (a) the utilization of network bandwidth can be optimized rather easily thanks to existing, well-optimized real-time video streaming libraries; (b) by capturing stereoscopic images with a baseline similar to the average interpupillary distance of a human, 3D perception is immediately intuitive; (c) the control of the viewing angle and position of the robot promises to be intuitive since the end-effector of the robot arm mimics the movement of the user’s head; and (d) the remote expert gains an exceptional 3D spatial understanding through the two visual cues of motion parallax and stereopsis.
A possible hardware combination for EyeRobot uses the Panda soft robot arm by Franka Emika GmbH, the ZED Mini stereoscopic camera by Stereolabs Inc., and a virtual reality headset with inside-out tracking such as a Windows Mixed Reality headset by Acer (Figure 1). Alternative HMDs include optical see-through HMDs such as the Microsoft HoloLens, or VR HMDs with outside-in tracking such as the Oculus Rift series. EyeRobot is realizable with any HMD capable of real-time 6 DoF spatial tracking, i.e. providing both translational and rotational information.
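For illustration, the following minimal sketch shows how a side-by-side stereo frame could be grabbed with the ZED Python SDK (pyzed); the SDK version (3.x), resolution, and frame rate are assumptions, not necessarily the configuration of our setup.

```python
# A minimal capture sketch with the ZED Python SDK (pyzed); SDK version
# (3.x), resolution, and frame rate are assumptions, not our exact setup.
import pyzed.sl as sl

def grab_stereo_frame():
    cam = sl.Camera()
    init = sl.InitParameters()
    init.camera_resolution = sl.RESOLUTION.HD720  # 720p per eye (assumed)
    init.camera_fps = 60                          # assumed frame rate
    if cam.open(init) != sl.ERROR_CODE.SUCCESS:
        raise RuntimeError("could not open the ZED Mini")
    frame, runtime = sl.Mat(), sl.RuntimeParameters()
    try:
        if cam.grab(runtime) == sl.ERROR_CODE.SUCCESS:
            # left and right images side by side, ready for encoding
            cam.retrieve_image(frame, sl.VIEW.SIDE_BY_SIDE)
            return frame.get_data().copy()  # numpy BGRA array
        return None
    finally:
        cam.close()
```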

Figure 1: Depiction of the real-time update loop establishing the paradigm of telepresence in EyeRobot. Both components, HMD and robot arm, synchronize at an initiation step (the initial poses are depicted transparently). As in a traditional master-slave system, the HMD controls the robot arm by sending its pose relative to the initiated pose. In return, the robot arm sends the stereoscopic video, captured by a binocular camera mounted on the end-effector, back to the remote user as a replacement for their sight.
2.1 Implementation
EyeRobot involves several transformations. The notation ${}^{B}T_{A}$ denotes the rigid 6 DoF transformation of frame $A$ expressed in frame $B$. At the initiation step, the HMD stores its current pose in the tracking world frame $W$ as the default pose ${}^{W}T_{H_0}$. On the same note, the robot takes in the default pose and stores the transformation to its end-effector as ${}^{R}T_{E_0}$, where $R$ denotes the robot base frame and $E$ the end-effector. At every frame $i$, the relative transformation of the HMD with respect to its default pose,

$$\Delta T_i = \left({}^{W}T_{H_0}\right)^{-1} \, {}^{W}T_{H_i},$$

applied to the default end-effector pose,

$${}^{R}T_{E_i} = {}^{R}T_{E_0} \, \Delta T_i,$$

computes the new pose of the end-effector at frame $i$. The calculation and control of the final robot inverse kinematics are handled by the robot interface. In order to allow the operation of EyeRobot in a relatively tight space at the local site, the control is designed as an active impedance controller in Cartesian space. This decision is crucial to the system since it allows human intervention on the robot and contributes to the safety of EyeRobot towards nearby people at the local site. An additional spring-damper system, relating the final control signal $U_i$ to the desired pose $x_{E_i}$ and the current position $x_i$ of the end-effector with stiffness $K_s$ and damping $K_d$,

$$U_i = K_s \left(x_{E_i} - x_i\right) - K_d \, \dot{x}_i,$$

completes the control scheme of the robot arm.
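As a minimal sketch of the update step described above, the following snippet computes the relative HMD transformation and the resulting end-effector target together with the spring-damper control law; the 4×4 homogeneous-matrix representation, the function names, and the gain values are illustrative assumptions, not the exact implementation.

```python
# A minimal sketch of the pose-update step, assuming 4x4 homogeneous
# matrices (numpy arrays) delivered by the HMD tracking API; all names
# and gains are illustrative, not the exact implementation.
import numpy as np

def relative_transform(T_W_H0: np.ndarray, T_W_Hi: np.ndarray) -> np.ndarray:
    """Relative HMD motion since initiation: dT_i = (W_T_H0)^-1 @ W_T_Hi."""
    return np.linalg.inv(T_W_H0) @ T_W_Hi

def target_pose(T_R_E0: np.ndarray, dT_i: np.ndarray) -> np.ndarray:
    """Apply the HMD's relative motion to the default end-effector pose."""
    return T_R_E0 @ dT_i

def control_signal(x_desired: np.ndarray, x_current: np.ndarray,
                   v_current: np.ndarray,
                   k_s: float = 200.0, k_d: float = 30.0) -> np.ndarray:
    """Cartesian spring-damper law: U_i = K_s (x_Ei - x_i) - K_d * dx_i."""
    return k_s * (x_desired - x_current) - k_d * v_current
```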
To ensure the safety of the local workers, EyeRobot contains several abort conditions that cause the robot arm to immediately cease motion and hold its position once met. The abort conditions are met if (a) the velocity of the joints or the end-effector surpasses a threshold value, (b) the desired end position is outside the operating range of the robot, or (c) an external force is applied to the end-effector, e.g. when colliding with an object or person.
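A minimal sketch of how these three abort conditions could be checked each control cycle is given below; the threshold values and the interface through which velocities, workspace bounds, and external forces are obtained are illustrative assumptions, not a specific vendor API.

```python
# A sketch of the three abort conditions checked every control cycle; the
# thresholds and the way velocities, workspace bounds, and external forces
# are obtained are illustrative assumptions, not a specific vendor API.
import numpy as np

MAX_JOINT_VEL = 1.0    # rad/s, illustrative threshold
MAX_EE_VEL = 0.5       # m/s, illustrative threshold
MAX_EXT_FORCE = 15.0   # N, illustrative threshold

def should_abort(joint_vel: np.ndarray, ee_vel: np.ndarray,
                 target_pos: np.ndarray, ws_min: np.ndarray,
                 ws_max: np.ndarray, ext_force: np.ndarray) -> bool:
    # (a) joint or end-effector velocity surpasses its threshold
    if np.max(np.abs(joint_vel)) > MAX_JOINT_VEL:
        return True
    if np.linalg.norm(ee_vel) > MAX_EE_VEL:
        return True
    # (b) desired end position lies outside the operating range
    if np.any(target_pos < ws_min) or np.any(target_pos > ws_max):
        return True
    # (c) external force on the end-effector, e.g. a collision
    return bool(np.linalg.norm(ext_force) > MAX_EXT_FORCE)
```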
The network communication between the remote user and the local site amounts to the transmission of the relative transformation $\Delta T_i$ and of the stereoscopic video stream. Due to the small encoded size of the transformation, the User Datagram Protocol (UDP) is a suitable candidate. The transmission of the stereoscopic frames requires a more sophisticated approach due to the amount of data. A suitable approach for transmitting the video stream utilizes GPU-accelerated H.264 compression, followed by transmission to the remote user via the Real-Time Streaming Protocol (RTSP).
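The following sketch illustrates how the relative transformation could be packed into a single UDP datagram; the address, port, and float32 row-major encoding are illustrative assumptions.

```python
# A sketch of sending the relative transformation dT_i as a single UDP
# datagram; address, port, and the float32 row-major encoding are
# illustrative assumptions.
import socket
import numpy as np

LOCAL_SITE = ("192.0.2.1", 5005)  # placeholder address of the local site

def send_transform(sock: socket.socket, dT: np.ndarray) -> None:
    # 4x4 float32 matrix -> 64 bytes, far below a single datagram
    sock.sendto(np.ascontiguousarray(dT, dtype="<f4").tobytes(), LOCAL_SITE)

def receive_transform(sock: socket.socket) -> np.ndarray:
    data, _addr = sock.recvfrom(64)
    return np.frombuffer(data, dtype="<f4").reshape(4, 4)

# usage sketch:
# sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# send_transform(sock, np.eye(4))
```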
3 Study on spatial navigation
In this section, we evaluate EyeRobot with respect to intuitiveness and precision of control. We prepared an environment containing five printed letters, as seen in Figure 2. The task of the participants was to position the camera, and with it the robot arm, to face every letter sequentially in a timely manner. Each participant performed the task twice: first using the HMD to control EyeRobot, and a second time using keyboard and mouse. The ‘WASD’ keys controlled translation based on the camera’s view direction, ‘Q’ and ‘E’ controlled upward and downward movement, and the mouse controlled pitch and yaw rotation; a sketch of this mapping is given below. The time was measured for the entire procedure. We recruited n=5 participants for our pilot study. None of them had experience with EyeRobot or a similar system.
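For illustration, a minimal sketch of the keyboard/mouse baseline mapping follows; the speed and sensitivity constants as well as the axis conventions are assumptions, not the exact values used in the study.

```python
# An illustrative sketch of the keyboard/mouse baseline mapping; speed,
# sensitivity, and axis conventions (y-up) are assumptions, not the exact
# values used in the study.
import numpy as np

SPEED = 0.25   # m/s translation speed (assumed)
TURN = 0.002   # rad per mouse count (assumed)

def update_camera(pos, yaw, pitch, keys, mouse_dx, mouse_dy, dt):
    # mouse controls yaw and pitch rotation
    yaw += TURN * mouse_dx
    pitch += TURN * mouse_dy
    # camera axes from yaw/pitch (right-handed, y-up)
    forward = np.array([np.sin(yaw) * np.cos(pitch),
                        -np.sin(pitch),
                        np.cos(yaw) * np.cos(pitch)])
    right = np.array([np.cos(yaw), 0.0, -np.sin(yaw)])
    up = np.array([0.0, 1.0, 0.0])
    # 'WASD' translates along the view direction, 'Q'/'E' move down/up
    move = np.zeros(3)
    if "w" in keys: move += forward
    if "s" in keys: move -= forward
    if "d" in keys: move += right
    if "a" in keys: move -= right
    if "e" in keys: move += up
    if "q" in keys: move -= up
    return pos + SPEED * dt * move, yaw, pitch
```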

Figure 2: Setup of the pilot study comparing EyeRobot with keyboard/mouse navigation of the robot arm. The goal is to navigate the camera such that each letter is well visible.
3.1 Results and discussion
All of the participants were faster at finding the letters with EyeRobot. On average, they were about twice as fast, requiring 20 s compared to 35 s using keyboard and mouse, as seen in Figure 3. One participant reported symptoms of cybersickness.

Figure 3: Box plot of the time in seconds for exploring the test scene via the robot arm: control with EyeRobot vs. keyboard/mouse control.
This result suggests that users can quickly adapt to the control of EyeRobot, since all participants were able to control the system immediately without previous experience. Due to the physical properties of the robot arm, the working area is limited by its reach and self-collision. The area may be enlarged by mounting the robot arm on the ceiling and choosing a robot arm with longer links. A larger user study needs to be conducted to validate the added value of EyeRobot and the overall user experience for telemedicine.
4 Conclusion
We presented the concept of EyeRobot, a telepresence system with a remotely controllable, dynamic viewpoint and stereo vision in an AR/VR HMD. A remote expert using EyeRobot gains an in-depth spatial understanding through the two visual cues of motion parallax and stereopsis. Furthermore, the direct translation of the movement of the HMD to the remote camera suggests an intuitive control with a shallow learning curve. We believe EyeRobot is a valuable asset for collaborative environments. In the future, we want to evaluate EyeRobot in a large collaborative setting with local paramedics and a remote expert in a realistic ambulance scenario. In particular, we are interested in the sense of presence and the acceptance of both local and remote users.
Funding source: German Ministry of Education and Research (BMBF)
Award Identifier / Grant number: 16SV8088
Research funding: This work has been developed in the project ArtekMed. ArtekMed (reference number: 16SV8088) is funded by the German Ministry of Education and Research (BMBF).
Author contributions: All authors have accepted responsibility for the entire content of this manuscript and approved its submission.
Conflict of interest: Authors state no conflict of interest.
Informed consent: Informed consent has been obtained from all individuals included in this study.
Ethical approval: The research related to human use complies with all the relevant national regulations, institutional policies and was performed in accordance with the tenets of the Helsinki Declaration, and has been approved by the authors’ institutional review board or equivalent committee.
References
1. Coombes, CE, Gregory, ME. The current and future use of telemedicine in infectious diseases practice. Curr Infect Dis Rep 2019;21:41. https://doi.org/10.1007/s11908-019-0697-2.
2. Kurillo, G, Yang, AY, Shia, V, Bair, A, Bajcsy, R. New emergency medicine paradigm via augmented telemedicine. In: International conference on virtual, augmented and mixed reality. Springer; 2016, pp. 502–11. https://doi.org/10.1007/978-3-319-39907-2_48.
3. Kasahara, S, Rekimoto, J. JackIn head: immersive visual telepresence system with omnidirectional wearable camera for remote collaboration. In: Proceedings of the 21st ACM symposium on virtual reality software and technology; 2015, pp. 217–25. https://doi.org/10.1145/2821592.2821608.
4. LaViola, JJ Jr, Kruijff, E, McMahan, RP, Bowman, D, Poupyrev, I. 3D user interfaces: theory and practice. Addison-Wesley Professional; 2017.
5. Orts-Escolano, S, Rhemann, C, Fanello, S, Chang, W, Kowdle, A, Degtyarev, Y, et al. Holoportation: virtual 3D teleportation in real-time. In: Proceedings of the 29th annual symposium on user interface software and technology; 2016, pp. 741–54. https://doi.org/10.1145/2984511.2984517.
6. Fuchs, H, State, A, Bazin, J-C. Immersive 3D telepresence. Computer 2014;47:46–52. https://doi.org/10.1109/mc.2014.185.
7. Higuchi, K, Rekimoto, J. Flying head: a head motion synchronization mechanism for unmanned aerial vehicle control. In: CHI’13 extended abstracts on human factors in computing systems; 2013, pp. 2029–38. https://doi.org/10.1145/2468356.2468721.
8. Pugin, F, Bucher, P, Morel, P. History of robotic surgery: from AESOP® and ZEUS® to da Vinci®. J Visc Surg 2011;148:e3–8. https://doi.org/10.1016/j.jviscsurg.2011.04.007.
9. Kristoffersson, A, Coradeschi, S, Loutfi, A. A review of mobile robotic telepresence. Adv Hum Comput Interact 2013;2013:902316. https://doi.org/10.1155/2013/902316.
© 2020 Kevin Yu et al., published by De Gruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution 4.0 International License.