Investigating the graphical IEC 61131-3 language impact on test case design and evaluation of mechatronic apprentices
Kathrin Land and Birgit Vogel-Heuser
Abstract
Mechatronic on-site technicians are responsible for maintenance and thus adequate testing of automated production systems. Hence, they must derive test cases from the IEC 61131-3 control code, which requires systematic testing skills. But despite the industry’s high demand for such skills, testing is not established in mechatronic apprentices’ educational curriculum. For the design of a teaching strategy for mechatronic apprentices on testing, this paper investigates the impact of the IEC 61131-3 language on test case design and evaluation. Comparing Sequential Function Chart (SFC) and Function Block Diagram, a trend towards SFC in apprentices’ self-perceived and real competence is shown.
Summary (English translation of the German Kurzzusammenfassung)
Mechatronics technicians are responsible for the maintenance and testing of automated production systems. To do so, they must derive adequate test cases from the IEC 61131-3 control code, which requires a systematic approach. However, the urgently needed testing competence is not anchored in the curricula for mechatronics technicians. To develop a strategy for teaching testing to mechatronic apprentices, this paper investigates the influence of the IEC 61131-3 language on test case design and evaluation. Comparing Sequential Function Chart (SFC) and Function Block Diagram (FBD), both the apprentices’ perceived and their actual testing competence are better with SFC.
1 Challenges in code coverage assessment for mechatronic apprentices
In the field of production automation, the number of functionalities realized in software increased significantly [1] to meet the industrial need for automated production systems (aPS) to be flexible and adaptable during their decades-long lifetime [2]. However, flexible system adaptations and the resulting higher complexity of aPS software pose a challenge to the quality assurance of such systems. Testing and test coverage of aPS software thus gain importance to increase software quality and reduce system downtimes or broken equipment. The importance of adequate testing is particularly evident after software changes on-site [2], where testing is often neglected due to lack of time or experience. On-site technicians, who are mainly responsible for start-up and maintenance, face the challenge of identifying and executing test cases that can detect errors that may have resulted from the changes.
In computer science, code coverage assessment is a well-established method and an important indicator to determine whether available test cases are adequate to verify a given control code and to derive new test cases to cover yet untested system behavior. Adequate test cases are those that cover a significant portion of the control code, address the system behavior expected according to the requirements, and are clearly documented (i.e. input and expected output values). Despite the increasing research on code coverage for IEC 61131-3 programming languages in the field of production automation ([3] shows a research trend in code-coverage-based test case design and evaluation in over 10 different domains, including aPS), most studies focus on either tools (e.g. [4], [5]), metrics (e.g. [6]), or procedures (e.g. [7]) for code coverage instead of code coverage education. To the best of the authors’ knowledge, the training or educational aspect has not yet been studied despite the relevance introduced above.
Prospective start-up and maintenance personnel are educated in at least one (preferably graphical) IEC 61131-3 programming language at vocational school during their (mechatronic) apprenticeship [8]. However, their curricula typically do not contain testing or code coverage assessment of test cases. Consequently, they must be trained in the necessary testing skills throughout their careers. As the control code plays a significant role in code-based testing, this paper investigates whether and how mechatronic apprentices’ learning success and competence are affected by the graphical IEC 61131-3 programming language used for test case design and evaluation. This paper focuses on the IEC 61131-3 programming languages Function Block Diagram (FBD), as it is widely used in Europe and thus the language mainly taught in the mechatronic apprenticeship [8], and Sequential Function Chart (SFC) because of its similarity to control flow graphs, which were found to be helpful support for code coverage assessment in computer science, especially for novices [5]. The following research questions are regarded:
Research question 1 (RQ1): Does it make a difference whether the graphical IEC 61131-3 programming language SFC or FBD is used to teach mechatronic apprentices test case design and evaluation, in terms of their comprehension and ability to create adequate test cases, especially as beginners?
Assuming a difference in RQ1, research question 2 (RQ2) is as follows: Which of the two graphical IEC 61131-3 programming languages, SFC or FBD, is better suited for teaching test case design and evaluation to mechatronic apprentices in terms of their comprehension and ability to create adequate test cases, especially as beginners?
Research questions RQ1 and RQ2 are investigated qualitatively in an experiment with mechatronic apprentices from a vocational school. Therefore, the following constraints and requirements are defined:
Constraint 1 (C1): The experiment is aimed at apprentices educated in mechatronics vocational schools because they represent the majority of prospective start-up and maintenance personnel in the field of production automation in Germany [8]. In the following, German mechatronic apprentices are abbreviated as “apprentices”.
Vocational schools in Germany offer a dual training program: apprentices are taught the theory hands-on at the vocational school and apply their knowledge in a company, thus gaining more practical insights. Apprentices usually start their dual training at the age of 16 and are thus younger than the average university student.
Constraint 2 (C2): To address the practical orientation of apprentices, a mechatronic use case is required to teach test case design and evaluation.
Constraint 3 (C3): As stated above, this paper focuses solely on the comparison of the two graphical IEC 61131-3 programming languages, Function Block Diagram (FBD) and Sequential Function Chart (SFC).
To determine the qualitative impact of the graphical IEC 61131-3 programming language on apprentices’ test case design and evaluation, two different levels of competence, perceived [9] and real (evaluated) competence, are studied. This leads to the following requirements for the experiment:
Requirement 1 (R1): Apprentices’ self-perceived competence (self-efficacy) must be recorded on a measurable scale. R1 aims to ensure that learners have a positive attitude, meaning high self-efficacy, toward the subject matter to ensure that they use it in the future [9], [10].
Requirement 2 (R2): Real competence must be recorded on a measurable scale to ensure that learners understand the subject regardless of their self-perceptions and are able to apply it.
The constraints and requirements above provide the boundaries for the following concept and respective experimental design to investigate the impact of the two IEC 61131-3 programming languages, SFC and FBD, on the test case design and evaluation. The paper’s main contribution is an experiment concept for such comparison measurements and a recommendation on whether and which of the two IEC 61131-3 programming languages should be selected to teach test case design to apprentices. The remainder of the paper is structured as follows: Section 2 presents related work in teaching testing and code coverage assessment. The concept by which to investigate the impact of the IEC language is introduced in Section 3, followed by the experiment design and methods used. The experiment’s results are presented in Section 4 and discussed in Section 5. Section 6 concludes the paper and provides an outlook on future research.
2 Related work in test teaching
This section introduces related work in teaching testing, especially test case evaluation and design using code coverage assessment. As discussed in Section 1, start-up and maintenance personnel hardly know or use test case assessment in production automation or for IEC 61131-3 control code. Reasons for this are, among others, the difficulty of understanding code coverage, especially for novice testers [5], [8], and the lack of training for prospective maintenance personnel with these methods. “The main hindrance [to use systematic testing approaches from literature in the industry] is skills and incentive to do so for the developer, as well as a lack of well-educated testers” [11]. Test coverage assessment was rated as one of the most challenging testing topics in a study with 230 computer science students [12]. Start-up and maintenance personnel are required to know how to apply code coverage, understand how test cases are generated to maximize the amount of code covered, and understand the results of code coverage assessment tools, e.g. the meaning of 80 % coverage, which is especially challenging for novice testers [5], [8]. As these abilities are crucial for adequate testing, a potential influence of the IEC 61131-3 language used for teaching code-based test case design and test case evaluation using code coverage assessment is investigated in this paper.
The impact of programming languages or code features on code quality or comprehension of programming structures has been explored in various papers. For example, block-based visual languages such as Scratch were found to be superior in teaching programming structures to pupils over text-based languages [13]. Additionally, code readability was highlighted as one crucial factor for its comprehension [14]. A recent case study found that the IEC 61131-3 language “Ladder Diagram” [15] impacts the productivity of end-user programmers, being more error-prone and less scalable than text-based programming languages. Thus, it can be assumed that the programming language also influences test case design. The following subsections focus on related work in code coverage assessment and testing. Section 2.1 provides a brief overview of code coverage assessment in computer science and IEC 61131-3 control software, respectively. Section 2.2 introduces existing lectures and assistance approaches for code coverage in testing.
2.1 Code coverage assessment in IEC 61131-3 control software
Code coverage assessment is a white-box test strategy to assess the test adequacy using the control code of the system under test [3], [10]. The parts of the control code activated during test execution are tracked to detect software parts not covered by the available test cases. The analysis of the code coverage of existing test cases is used in computer science to assess the adequacy of these test cases [16], as a basis to design new test cases for non-tested software parts [17], or to select test cases based on their coverage of changed software parts [6]. Statement and branch coverage are the most common code coverage criteria in test adequacy assessment. Whereas statement coverage focuses on executing “every statement in the program at least once” [16], the stronger branch coverage additionally requires executing all (decision) branches within the program [5].
The control code of aPS is mainly realized in one of the five programming languages of the international standard IEC 61131-3 [18]. IEC 61131-3 defines two text-based languages (Structured Text [ST] and Instruction List [IL]) and three graphical languages (Function Block Diagram [FBD], Sequential Function Chart [SFC], and Ladder Diagram [LD]). Some approaches exist to apply code coverage criteria from computer science to the IEC 61131-3 languages. Ulewicz et al. [6] estimate code coverage for ST and SFC in aPS by inserting trackers into the respective control code. Hao et al. [19] generated test cases for ST using a control flow graph of the software, which allowed them to apply branch coverage criteria from computer science. Bohlender et al. [4] generated test cases for ST using symbolic execution. To assess code coverage for the graphical programming language FBD, researchers transformed the FBD control code to C code [20], to Java code (for IEC 61499 FBD [21]), or to timed automata [5] to apply computer science metrics. Jee et al. [6] introduced d-path coverage as a criterion for code coverage assessment for FBDs without transforming them. With the d-path criterion, every data flow path within the FBD should be tested at least once to obtain full code coverage. This approach is similar to applying branch coverage to control flow graphs in computer science. As the code structure of SFC is similar to control flow graphs, branch coverage can be applied similarly: each control flow path within the SFC must be executed at least once to obtain full code coverage. For better comparability, this paper uses the d-path criterion for FBD and the control flow path assessment for SFC in the following.
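The difference between statement and branch coverage, and the idea of inserting trackers into the code, can be illustrated with a minimal Python sketch. The toy program, the tracker names (`s1`, `if_true`, ...) and the helper `coverage` are assumptions made for illustration only; they are not taken from any of the cited tools or from IEC 61131-3 control code.

```python
# Hypothetical toy program instrumented with trackers (illustration only).
ALL_STATEMENTS = {"s1", "s2", "s3"}
ALL_BRANCHES = {"if_true", "if_false"}


def run_with_trackers(x, covered):
    """Execute the toy program and record which statements/branches ran."""
    covered["statements"].add("s1")
    y = 0
    if x > 0:
        covered["branches"].add("if_true")
        covered["statements"].add("s2")
        y = 1
    else:
        # The 'false' branch contains no statement of its own,
        # so statement coverage cannot see whether it was taken.
        covered["branches"].add("if_false")
    covered["statements"].add("s3")
    return y


def coverage(test_inputs):
    """Return (statement coverage, branch coverage) for a list of inputs."""
    covered = {"statements": set(), "branches": set()}
    for x in test_inputs:
        run_with_trackers(x, covered)
    stmt = len(covered["statements"]) / len(ALL_STATEMENTS)
    branch = len(covered["branches"]) / len(ALL_BRANCHES)
    return stmt, branch


print(coverage([5]))      # (1.0, 0.5): all statements, but only one branch
print(coverage([5, -1]))  # (1.0, 1.0): both branches exercised
```

A single test case already reaches full statement coverage here, while full branch coverage needs a second test case that takes the other decision outcome.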
2.2 Code coverage teaching and assistance in testing
Systematic testing approaches are introduced in research to train or assist engineers in quality assurance. For example, testers are assisted in test adequacy assessment during start-up and maintenance with a visualization of the code coverage achieved by executed test cases. Aniche et al. [17] observed in a study with 84 professional software developers that they strongly rely on the source code and code coverage to design new test cases and manually check their test adequacy. They suggested (visually) connecting test cases to the code statements they cover to assist testers in test case selection for changed code statements. In a study with 30 professional software developers, Lawrence et al. [22] highlighted the code covered by test cases in C# code. However, this measure did not result in more test cases or faults detected. The developers tended to misjudge their level of test adequacy, meaning they focused more on achieving a high code coverage instead of creating test cases able to detect distinct fault types. On the contrary, Berner et al. [23] showed with an 8-person team that code coverage visualization increases the system’s robustness and is beneficial, especially for novices, and when testers are aware of possible misjudging. Rahmani et al. [24] used control flow graphs (CFGs) to visually represent the code and highlight covered code segments, thus achieving increased productivity of their 32 study participants. For IEC control code, experts in the automation domain rated the visual representation of the code segments covered in the graphical language FBD (Tool FBDTestMeasurer) as useful in achieving systematic and adequate testing [7].
To support engineers in the long term, testing and code coverage should be taught early during their education [17]. According to Bandura [9], a person’s ability and willingness to use learned skills does not only correspond to their real competence but also their self-perceived competence to succeed in a specific task (self-efficacy or perceived competence) [10]. Ribeiro et al. [25] investigated the self-efficacy concept in software engineering and recommended including it in future teaching strategies. In computer science, testing is included in the university curricula as a “key skill”, and teaching various testing techniques is proposed. For example, teaching programming and testing jointly is suggested to teach students code validation using code coverage early on alongside programming [26]. Code coverage visualization and tool support assist students in reaching high code coverage levels for their test cases [27]. Early test education or even test-driven development is further suggested to improve the programming style of students [28], leading to a more efficient testing phase and less effort for testers.
The aPS domain lacks concrete teaching concepts for code coverage assessment or test case generation based on code coverage, as introduced in the state of the art. In this paper, factors that influence education, as well as the testing strategies of vocational school apprentices, shall be investigated. As professional software developers strongly rely on source code and code coverage to design new test cases [17], the impact of the programming language, which is used as a basis for code coverage education, shall be analyzed.
3 Concept to investigate IEC 61131-3 programming language impact
This section presents the overall concept by which to investigate the impact of the graphical IEC 61131-3 programming language on the test case design and evaluation of mechatronic apprentices. The conceptual procedure (cf. Figure 1) is designed to investigate research questions RQ1 (Difference between SFC and FBD in testing) and RQ2 (Preference for SFC or FBD in testing) while considering the requirements and constraints defined in Section 1.

Figure 1: Concept to investigate IEC 61131-3 programming language impact.
A learned skill will only be used later if learners can apply it and feel capable of doing so (self-efficacy [9]). As apprentices (C1) learn testing and code coverage, their learning is assessed at two levels of competence: perceived competence (R1) and real competence (R2). Perceived competence reflects the apprentices’ subjective assessment and self-confidence in the subject taught, while real competence reflects their results on hands-on tasks. To better convey the subject to practically oriented apprentices, it is taught using a mechatronic use case (C2). After the learning success has been assessed at the two levels of competence (R1, R2), the results are evaluated separately for each level. The following findings (F) can be derived from the concept and later be used to evaluate whether and how the research questions can be answered:
(F1) The preference, if any, for FBD or SFC concerning the apprentices’ perceived competence.
(F2) The preference, if any, for FBD or SFC concerning the apprentices’ real competence.
Comparing the results for perceived and real competence, finding F3 can be derived as follows:
(F3) The alignment, if any, between the preferences in perceived (F1) and real (F2) competence.
Since the literature assumes a strong mutually dependent relationship between perceived and real competence [10], the preferences in F1 and F2 are expected to match. If there is a deviation, either the experiment may not be suitable, the sample size of apprentices may be too small, or other disturbing factors prevent a qualitative answer to the two research questions, RQ1 and RQ2. If the preferences in perceived and real competence align (F3), it is checked whether SFC and FBD are perceived as equivalent at both levels of competence or whether there is a general preference for one over the other for test case evaluation and design (F4). If SFC and FBD are equally suitable for test case design and evaluation according to perceived and real competence, research question RQ1 is answered with “No difference”. If either SFC or FBD is preferred in both, an impact of the graphical IEC 61131-3 programming language on apprentices’ test case design and evaluation can be assumed (RQ1), and a recommendation for one of the two languages, SFC or FBD, can be derived for future teaching (RQ2).
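The decision logic described above can be summarized in a few lines of Python. This is only a sketch of the reasoning in the text: the function name `evaluate_findings`, the string labels, and the return format are assumptions introduced for illustration.

```python
def evaluate_findings(f1_preference, f2_preference):
    """Map the findings F1 (perceived) and F2 (real) to answers for RQ1/RQ2.

    Each preference is one of "SFC", "FBD", or "equal" (hypothetical labels).
    Returns a tuple (answer to RQ1, recommended language or None).
    """
    allowed = {"SFC", "FBD", "equal"}
    if f1_preference not in allowed or f2_preference not in allowed:
        raise ValueError("preference must be 'SFC', 'FBD', or 'equal'")

    # F3: do the preferences in perceived and real competence align?
    if f1_preference != f2_preference:
        # Deviation: no qualitative answer to RQ1 and RQ2 is possible.
        return ("inconclusive", None)

    if f1_preference == "equal":
        return ("No difference", None)

    # Same language preferred at both levels: impact assumed (RQ1),
    # and that language is recommended for teaching (RQ2).
    return ("difference", f1_preference)


print(evaluate_findings("SFC", "SFC"))      # ('difference', 'SFC')
print(evaluate_findings("equal", "equal"))  # ('No difference', None)
print(evaluate_findings("SFC", "FBD"))      # ('inconclusive', None)
```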
In the following subsections, the concept procedure parts (cf. Figure 1) are detailed for a concrete experiment execution. Section 3.1 explains the experiment setup and hypotheses for the learning success investigation, focusing on the two requirements. Hypotheses are formulated to evaluate the experiment and to answer the two research questions (RQs). Subsequently, Section 3.2 presents the implementation of the experiment, introducing the sample of apprentices who participated in the actual experiment and showing which results are expected for the individual implementation phases. The experiment is conducted on the mechatronic use case presented in Section 3.3.
3.1 Hypotheses to investigate apprentices’ learning success
Apprentices’ learning success is measured on two levels of competence: the apprentices’ perceived and real competence. Perceived competence [10] can be decomposed into several factors such as “comprehension” (1), “self-assessment of performance” (2), and “explanatory ability”, meaning the self-confidence to explain a given subject (3). The three factors of perceived competence are evaluated based on the fulfillment of the following hypotheses.
As Vogel-Heuser et al. showed [8], mechanical engineering students prefer to model the behavioral perspective of a system. SFC clearly depicts the procedure of the program and provides its functional view. On the contrary, FBD is a data-flow language that shows the logic combination of single elements on the hardware level, like Boolean algebra. Assuming that apprentices also prefer the behavioral perspective, the first hypothesis H1 is as follows:
Mechatronic apprentices rate SFC to be more comprehensible than FBD.
Apprentices’ comprehension (1) is hereby assessed in three areas:
Basic understanding of test case design based on code coverage assessment (H1.1: Mechatronic apprentices are basically able to derive test cases based on SFC and FBD code)
Ability to critically examine their test cases (H1.2: Mechatronic apprentices are able to critically reflect on the test cases they derived based on SFC and FBD code)
Classification of the two languages regarding their informativeness (H1.3: Mechatronic apprentices rate SFC program code of equal functionality as more informative than FBD program code.)
The evaluation of the basic understanding (1.1) furthermore validates the suitability of the two graphical IEC 61131-3 programming languages for code-based test case design. Based on the results of the comparison of the apprentices’ self-assessment of hypotheses H1.1 and H1.2, it can be determined whether there is a difference in the apprentices’ comprehension (1) depending on the language SFC or FBD (cf. RQ1).
After learning and applying code-based test case design, apprentices are supposed to self-assess their performance (2) as part of their perceived competence. As apprentices are assumed to comprehend SFC better than FBD (H1), they are expected to self-assess that they perform better using SFC. Performance means here that they can derive more non-equivalent test cases and achieve higher code coverage using SFC than FBD in a given time, resulting in hypothesis H2:
Mechatronic apprentices self-perceive their performance to be better using SFC instead of FBD code.
The last factor for perceived competence is the apprentices’ “explanatory ability” (3), which requires comprehension of the subject to be explained, but also the confidence and ability to adjust the explanation depending on the addressee and to adapt the complexity of the linguistic level respectively [10]. Apprentices are expected to feel more confident in one of the two languages and thus favor SFC or FBD when explaining the subject to different addressees, here a fellow apprentice and an experienced technician. This leads to the following hypothesis H3:
Mechatronic apprentices have a language preference when explaining their test case design.
To evaluate the real competence of the apprentices in test case design and evaluation using SFC and FBD, the results of the task they are supposed to work on during the experiment are analyzed. Due to the better comprehension expected (cf. H1) and the strong mutually dependent relationship between perceived and real competence assumed in literature [10], apprentices are expected to perform better with SFC than with FBD despite their prior knowledge of FBD, leading to the following hypothesis H4:
Mechatronic apprentices derive more non-equivalent, adequate test cases in a limited time using SFC instead of FBD code.
3.2 Experiment implementation
The experimental focus was on a preliminary, qualitative study of the hypotheses regarding the IEC 61131-3 language impact to enable the derivation of a teaching strategy for a future empirical study. For this preliminary, qualitative study, obtaining a homogeneous sample of apprentices was crucial. To ensure this, the experiment was conducted as a one-day workshop with a typical class size of 20 apprentices from a mechatronic vocational school. All 20 apprentices were male, between 16 and 20 years old, and about half had an Abitur (German high school diploma). All apprentices were from the same class in the 2nd year of apprenticeship, so an equal level of mechatronic education and knowledge can be assumed. They already learned and practiced the graphical IEC 61131-3 programming language FBD at school in a 2-week programming course. According to the apprentices’ teachers, none of the apprentices had prior knowledge of programming in general, SFC programming, modeling, or testing outside of vocational school.
To avoid bias due to the order in which test case design is taught (starting either with SFC or FBD), the group is divided into two equally sized subgroups, “FBD subgroup” and “SFC subgroup”, before the experiment (cf. Figure 2). Two of the apprentices’ teachers, who trained them in the last two years of apprenticeship, divided them into equal subgroups regarding prior knowledge (e.g. type of their dual training company and whether they use FBD there), performance (e.g. grade), degree of education (e.g. Abitur), and motivation (subjective teacher assessment based on usual classroom collaboration and interest in software-related topics), in this order, as these factors could bias the self-efficacy results at the end of the experiment. The FBD subgroup is taught testing and code coverage with FBD first and then with SFC (cf. upper part of Figure 2). Conversely, the SFC subgroup starts with SFC and then learns to test with FBD (cf. lower part of Figure 2). Immediately before learning testing and code coverage in the previously unknown language (here: SFC), both subgroups are introduced to that language and its programming. All teaching units comprised a 20-minute PowerPoint lecture followed by a 70-minute exercise. Each exercise consisted of two tasks: first, the apprentices received test cases and the task “Determine the code coverage of the test cases given”, followed by the second task, “Create all test cases required to achieve 100 % code coverage”. The exercises were solved on paper, and the results were collected and then discussed. In the “learn SFC programming” unit, the 70-minute exercise was a practical programming exercise on small laboratory plants normally used for IEC 61131-3 programming education. The small sample size of a typical vocational school class allows individual counseling to ensure that everyone understands the topic before moving on to the next phase of the experiment.
The table at the bottom of Figure 2 depicts the expected knowledge per subgroup after each experiment phase. Assuming that, after the separate experiment phases, both subgroups have the same knowledge of FBD, SFC, and testing using both languages, the subgroups are reunited for an assessment on code coverage and test case design in FBD and SFC and a final questionnaire. The questionnaire addresses the hypotheses introduced in Section 3.1. The final evaluation of results (cf. Figure 1, center box) is based on the tasks and questionnaires completed during the separate experiment phases, as well as the final assessment and the final questionnaires.

Figure 2: Execution of the experiment with knowledge differences between the two subgroups (FBD and SFC) after each experiment phase (cf. bottom table).
To answer hypotheses H1–H3, questionnaires are used because they are suitable methods for subjective assessments such as self-perceived competence. The apprentices’ self-assessed performance and comprehension (cf. Section 3.1) are assessed with paper questionnaires consisting of questions Q1–Q3 for both programming languages, SFC and FBD:
Q1: How confident do you feel that your test cases are sufficient to test the SFC/FBD code given?
Q2: How confident do you feel about creating at least one suitable test case using SFC/FBD?
Q3: How confident do you feel about finding all test cases required for 100 % code coverage using SFC/FBD?
The questions are answered on a 5-point Likert scale with response options ranging from 1 – “not at all confident” to 5 – “extremely confident”. Question Q2 tests the apprentices’ basic understanding of whether they are able to create basic test cases using SFC and FBD. Question Q3 aims at apprentices’ self-confidence in their test case design skills to achieve complete coverage based on the respective IEC 61131-3 programming language. The results of the questionnaire (self-assessed performance and comprehension) are compared with the real performance of the apprentices, which is determined by the number of adequate, non-equivalent test cases they created in SFC and FBD, respectively.
Preferences for either SFC or FBD in terms of comprehension and self-confidence to explain the test case design are assessed with a paper questionnaire consisting of questions Q4–Q6. For each question, the apprentices had to choose one of the three answer options (Equal, SFC, FBD) and briefly reason their choice.
Q4: Which language would you rate more informative for test case design and evaluation?
Q5: Which language would you choose to explain code-based test case design to a fellow apprentice?
Q6: Which language would you choose to explain code-based test case design to an experienced technician?
3.3 Mechatronic use case
As a mechatronic use case (cf. C2) for the apprentices’ code coverage training, a screw gripper is chosen, which is part of a pick-and-sorting unit for different screw types. The gripper has three fingers to grip screws (cf. Figure 3, left). Four different shapes of screw heads are to be picked: triangle, hexagon, dodecagon, and circle (cf. Figure 3, center). For secure gripping, the screws must be gripped by their screw heads along the edges and not at the corners (cf. Figure 3, right). For the sake of simplicity, it is assumed that the screw gripper can only rotate in steps of 30° and must therefore perform a certain number of 30° rotation steps, depending on its initial position, to reach the correct position to grip the respective screw. The screw gripper use case represents a typical mechatronic application that is easily understandable within the given time but is sufficiently complex that not all test cases are immediately obvious. The different screw types and possible initial rotation angles require the apprentices to consider multiple conditions and testing scenarios to achieve thorough testing and to develop comprehensive test cases.

Figure 3: Screw gripper example – (a) side view (left), (b) top view (middle), (c) rotation required for gripping (right).
The number of rotations required is determined by an IEC 61131-3 control code part, realized in FBD and SFC (cf. Figure 4). The screw types (triangle, hexagon, dodecagon, and circle) are defined with the numbers 0 to 3. Dodecagon and circle do not require the gripper to rotate as they can be gripped from any angle. The hexagon-shaped screw needs the gripper to be rotated at most once, whereas the triangle-shaped screw requires at most three rotations based on the gripper’s initial rotation angle. While it is intuitive to test each screw type, considering all possible initial rotation angles is not immediately evident while creating possible test cases solely based on the text description.

Figure 4: Excerpt of control code for gripper rotation in FBD (left) and SFC (right).
The code excerpts (cf. Figure 4) were designed so that the program paths and corresponding test cases are equally difficult to detect from a testing perspective. SFC appears to be more intuitive at first glance, owing to its structured and sequential nature. On the contrary, possible input parameters and paths based on the initial angle (0, 30, 60, 90) are more clearly visible in FBD, attributed to the use of the modulo operator. Additionally, the apprentices lack familiarity with the SFC language whereas they are acquainted with FBD and can thus handle more complex examples in FBD.
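Since the control code of Figure 4 is not reproduced in the text, the following Python sketch only approximates the described behavior and the two exercise tasks (determining the coverage of given test cases and finding the missing ones). Everything beyond the text is an assumption: the symmetry angles per screw type, the rotation direction, the formula with the modulo operator, the definition of a "path" as the pair (screw type, number of rotations), and the rule that two test cases are equivalent if they exercise the same path.

```python
STEP_DEG = 30  # the gripper rotates in 30° steps

# Screw types 0..3 as in the text; the symmetry angles are an assumption:
# a shape looks the same again after rotating by this angle.
SYMMETRY_DEG = {
    0: 120,  # triangle
    1: 60,   # hexagon
    2: 30,   # dodecagon
    3: 30,   # circle (any angle works; 30° means "never rotate")
}


def rotations_needed(screw_type, initial_angle_deg):
    """Number of 30° steps until the gripper is aligned (assumed formula)."""
    sym = SYMMETRY_DEG[screw_type]
    offset = initial_angle_deg % sym
    return ((sym - offset) % sym) // STEP_DEG


def path_of(test_case):
    """Assumed path identifier: (screw type, number of rotations)."""
    screw_type, angle = test_case
    return (screw_type, rotations_needed(screw_type, angle))


# All paths reachable with initial angles that are multiples of 30°.
ALL_PATHS = {
    path_of((t, a)) for t in SYMMETRY_DEG for a in range(0, 360, STEP_DEG)
}


def path_coverage(test_cases):
    """Task 1: coverage of the given test cases; also returns missing paths."""
    covered = {path_of(tc) for tc in test_cases}
    return len(covered) / len(ALL_PATHS), ALL_PATHS - covered


# Hypothetical test suite: (screw type, initial angle in degrees).
given = [(0, 0), (0, 120), (1, 30), (2, 90)]
ratio, missing = path_coverage(given)
print(ratio)            # 0.375 (3 of 8 paths; (0, 0) and (0, 120) are equivalent)
print(sorted(missing))  # paths still needed for 100 % coverage (task 2)
```

Under these assumptions, the triangle needs zero to three rotations and the hexagon at most one, matching the description of the use case, and the sketch shows why test cases that differ only in an equivalent initial angle do not increase coverage.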
4 Results of the experiment
In this section, the results of the experiment are presented. Based on their grouping, the apprentices designed test cases for the mechatronic use case using either SFC or FBD. After learning testing and code coverage in their respective groups, all apprentices were able to design new test cases (cf. Figure 5, left) using the respective IEC languages. The SFC subgroup identified more non-equivalent, adequate test cases than the FBD subgroup. One apprentice from the FBD subgroup created only a non-adequate test case (non-suitable input), thus counting as “zero” in Figure 5 (left). However, both subgroups stated that the control code helped them to understand the behavior of the mechatronic use case in more detail and to derive test cases more confidently than from text-based requirements alone. Regarding the self-confidence of the apprentices in their test case sufficiency (cf. Q1 in Section 3.2), the FBD subgroup was, on average, less confident than the SFC subgroup (cf. Figure 5, right). Most of the SFC subgroup apprentices were “neutral” or “confident”, whereas most FBD subgroup apprentices tended to be “not confident (at all)”. The apprentices in the SFC subgroup argued, without knowing code coverage in FBD, that using SFC for code coverage assessment is intuitive, leading to higher average confidence. The apprentices of the FBD subgroup did not make similar statements regarding code coverage using FBD.

Figure 5: Left: number of non-equivalent, adequate test cases using either FBD or SFC in the respective groups; Right: comparison of apprentices’ confidence in their own test cases’ sufficiency (N = 10 apprentices per subgroup).
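The count of non-equivalent, adequate test cases reported in Figure 5 can be illustrated with a small Python sketch. The scoring rule used here is an assumption, not the authors' evaluation procedure: two test cases are treated as equivalent when they exercise the same program path, and a test case with non-suitable input is treated as non-adequate. The names `count_non_equivalent_adequate` and `example_path`, the symmetry periods, and the sample suite are hypothetical.

```python
from typing import Callable, Hashable, Iterable, Optional, Tuple

Case = Tuple[int, int]  # (screw type, initial angle in degrees)


def count_non_equivalent_adequate(
    test_cases: Iterable[Case],
    path_of: Callable[[int, int], Optional[Hashable]],
) -> int:
    """Count distinct program paths exercised by adequate test cases.

    `path_of` returns a path signature, or None for non-suitable input.
    """
    paths = set()
    for screw_type, angle in test_cases:
        path = path_of(screw_type, angle)
        if path is not None:  # None marks a non-adequate test case
            paths.add(path)
    return len(paths)


def example_path(screw_type: int, angle: int) -> Optional[Hashable]:
    """Hypothetical path signature: screw type plus folded initial angle."""
    periods = {0: 120, 1: 60, 2: 30, 3: 30}  # assumed symmetry periods
    if screw_type not in periods or angle < 0 or angle % 30 != 0:
        return None
    return (screw_type, angle % periods[screw_type])


if __name__ == "__main__":
    # Made-up suite: (0, 0) and (0, 120) follow the same path,
    # and (7, 15) is non-suitable input.
    suite = [(0, 0), (0, 120), (0, 30), (1, 30), (7, 15)]
    print(count_non_equivalent_adequate(suite, example_path))
```

With this made-up suite, five submitted test cases reduce to three non-equivalent, adequate ones, and a suite consisting only of non-suitable inputs scores zero.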
As depicted in Figure 2, the two subgroups were joined for the final assignment and questionnaire (1-h phase), the results of which are presented next. In the final assignment, the apprentices directly compared SFC control code and FBD control code to assess the code coverage of the test cases and create test cases. The apprentices were asked to find at least one additional test case for the uncovered code. They began to solve the task without hesitation even though they faced unfamiliar, complex code. They then rated their confidence in finding at least one new test case on a five-point Likert scale (cf. Figure 6, left), as well as their confidence in finding all the test cases needed to cover the entire control code based on the given programming language (cf. Figure 6, right). For clarity, the answers “extremely confident” and “confident” are consolidated to “good” in Figure 6, whereas the answers “not confident” and “not confident at all” are consolidated to “low”. For the SFC control code, most apprentices rated their confidence in finding at least one additional test case as “good”. The FBD subgroup’s rating was almost evenly split between “good” and “medium”, while almost one-third of the SFC subgroup’s rating was “good” (see group hatching within columns in Figure 6). For the FBD control code, apprentices’ ratings in both groups were almost evenly distributed and averaged “medium”.

Figure 6: Self-assessment of ability to find at least one additional test case for uncovered code (left) or all test cases to cover the whole control code (right) given SFC control code (SFC) or FBD control code (FBD).
Regarding the apprentices’ subjective confidence in finding all test cases necessary to cover the entire control code (cf. Figure 6, right), the apprentices’ rating for the SFC control code was mixed (average at “medium”) with a slight tendency of the SFC subgroup to “good” and a slight tendency of the FBD subgroup to “medium”. Regarding the FBD control code, both subgroups rated their ability to find all test cases similarly low. Apprentices cited a lack of routine in evaluating code coverage as the main reason for the mixed ratings, as the topic was new to them.
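The consolidation of the five-point scale used for Figure 6 can be sketched in Python as follows. The mapping of the two outer answer pairs follows the text; the label of the middle answer (“neutral”) and its mapping to “medium” are assumptions, and the sample answers are made up for illustration only, not taken from the study data.

```python
from collections import Counter
from typing import Iterable

# Outer pairs as described in the text; the middle entry is an assumption.
CONSOLIDATION = {
    "extremely confident": "good",
    "confident": "good",
    "neutral": "medium",
    "not confident": "low",
    "not confident at all": "low",
}


def consolidate(answers: Iterable[str]) -> Counter:
    """Collapse five-point Likert answers into good / medium / low counts."""
    return Counter(CONSOLIDATION[a.strip().lower()] for a in answers)


if __name__ == "__main__":
    # Hypothetical answers, not data from the experiment.
    sample = [
        "confident",
        "neutral",
        "extremely confident",
        "not confident",
        "Neutral",
    ]
    summary = consolidate(sample)
    for level in ("good", "medium", "low"):
        print(level, summary[level])
```

Collapsing the scale makes small groups easier to compare, at the cost of hiding the difference between, for example, “confident” and “extremely confident”.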
In the final questionnaire, the apprentices had to state their preferences for either of the two IEC programming languages (cf. Section 3.2, questions Q4–6). Regardless of the prior group allocation, most apprentices rated SFC as more informative for test case design and evaluation than FBD, justifying their choice by SFC being more intuitive for test case design and evaluation (cf. Figure 7, left). Regarding the apprentices’ preference to explain code-based test case design to either a fellow apprentice (Q5) or an experienced technician (Q6), most apprentices preferred SFC over FBD (cf. Figure 7, center and right). Most apprentices who chose SFC argued that SFC control code is more transparent and easier to comprehend and thus to explain than FBD, especially when deriving test cases. They described SFC as more intuitive regarding code coverage assessment than FBD due to its control-flow-graph-like structure. In contrast, the few apprentices who chose FBD argued that they were more familiar with FBD and that it was therefore easier for them to explain the FBD control code, especially to an experienced technician.

Figure 7: Apprentices’ subjective rating of more informative IEC language (left) and their language preference to explain test case design to a fellow apprentice (middle) or to an experienced technician (right).
5 Discussion of the experiment’s results
The experiment was conducted successfully and received positive responses from both teachers and apprentices. They found the topic interesting and expressed their willingness to participate in similar workshops in the future. Their only negative comment was that they would have liked more time to learn SFC. Additionally, the teachers indicated they would consider including SFC and testing in their future curriculum, reflecting the positive impact and relevance of the experiment. In the following, the fulfillment of the hypotheses H1–H4 stated in Section 3.1 is assessed based on the experiment results (cf. Section 4). The section concludes with a consideration of the threats to validity.
5.1 Hypotheses assessment
H1: Mechatronic apprentices rate SFC to be more comprehensible than FBD.
Basic Understanding
Apprentices were, in general, able to derive test cases based on SFC and FBD code (H1.1, cf. Figure 5 left) and rated their confidence in finding at least one test case in both languages at least as medium (cf. Figure 6, left), thus showing their basic understanding as well as the general suitability of the two languages for code-based test case design. The comparison of apprentices’ average perception regarding SFC and FBD showed, in general, a slight trend towards SFC, as apprentices were, on average, “confident” of finding at least one test case. In contrast, the perceived competence in FBD was only “medium”. Considering the previous subgroups separately, the FBD subgroup was less confident in finding at least one test case using SFC than the SFC subgroup. This lower confidence could be explained by the smaller amount of routine and practice they had with code coverage in SFC. On the other hand, no positive effect of more practice could be observed for FBD. Both subgroups rated their confidence in finding at least one test case within FBD as similarly poor, even though the apprentices of the FBD subgroup were expected to be more familiar with FBD code coverage assessment than the SFC subgroup. The FBD subgroup did not show a preference for FBD in the final questionnaire even though they first learned code coverage with FBD and thus practiced it more than the SFC subgroup.
Critical Evaluation
Apprentices were able to evaluate their test cases (H1.2, cf. Figure 6 right). In the early experiment phases, apprentices were already slightly more confident in the sufficiency of their test cases with SFC than with FBD (cf. Figure 5 right). Regarding apprentices’ confidence in finding all test cases required for 100 % code coverage using SFC/FBD (cf. Figure 6, right), apprentices were, on average, more confident with SFC than with FBD. The results show that apprentices were able to reflect critically on their test cases in both languages (H1.2). However, a preference towards SFC is visible in the results (cf. Figure 6 right).
Classification of the two languages regarding their informativeness
As expected in H1.3, apprentices of both subgroups rated SFC as more informative than FBD (cf. Figure 7, left). Apprentices named the clearer code structure as the main reason for preferring SFC.
Since all three comprehension areas (1.1–1.3) were confirmed and showed a preference towards SFC instead of FBD control code for test case design and evaluation, hypothesis H1 is considered confirmed.
H2: Mechatronic apprentices self-perceive their performance to be better using SFC instead of FBD code.
As assumed in hypothesis H2, the SFC subgroup using SFC perceived their own performance (cf. Figure 5 right) as better than the FBD subgroup using FBD did. Despite being more experienced in FBD programming, the FBD subgroup was, on average, “not confident”, whereas the SFC subgroup was, on average, “neutral”. In conclusion, a preference for SFC in test case design was already visible when comparing the subgroups’ subjective ratings while they were still separated.
H3: Mechatronic apprentices have a language preference when explaining their test case design.
The final questionnaire examined whether apprentices have a language preference when explaining their test case design to (a) a fellow apprentice or (b) an experienced technician. The results of both questions show a clear preference for SFC, thus confirming hypothesis H3 that there is a preference and further indicating that perceived competence is higher with SFC.
In conclusion, the evaluation of the perceived competence shows that apprentices were more confident using SFC than FBD for their code-based test case design, which yields a subjective preference for SFC as finding F1. Next, the objective preference regarding the apprentices’ real competence is discussed.
H4: Mechatronic apprentices derive more non-equivalent, adequate test cases in a limited time using SFC instead of FBD code.
Regarding their real competence (cf. Figure 5), the apprentices were able to derive more non-equivalent, adequate test cases using SFC than FBD (finding F2), thus confirming hypothesis H4. Even though only a slight trend is visible, it matches the perceived competence of the apprentices (finding F3). Due to this match in finding F3, the research questions RQ1 and RQ2 are considered answerable. The results indicate a difference in test case design and evaluation depending on the graphical IEC 61131-3 programming language (SFC or FBD) used by the apprentices (RQ1). Due to the apprentices’ preferences and statements in favor of SFC and the better performance results, SFC is recommended for teaching testing to apprentices (RQ2).
Although the apprentices had no prior knowledge of SFC programming, the 1.5-h block on SFC introduction and programming was sufficient for them to understand the control code and to apply code-based test case design and evaluation. A new programming language thus does not hinder testers, such as maintenance personnel, from learning and applying new testing techniques. According to the apprentices’ reasons for preferring SFC, a clear code structure is crucial to assess the code coverage and to derive new test cases faster. The results (cf. Figures 6 and 7) also confirm that a clear code structure facilitates finding more or even all test cases. Given this feedback, a stronger focus on code structure while programming is a promising way to improve the subsequent testing process. Thus, it is recommended to teach testing and code coverage while teaching programming so that developers become aware of the impact of their programming style on the later testing phase and on the personnel responsible for testing. In conclusion, the experiment results show a trend toward SFC in apprentices’ perceived and real competence. Further, apprentices are more likely to test intuitively rather than systematically when applying code coverage.
5.2 Threats to validity
This section discusses the potential threats to the validity of the study. The paper proposes a method to assess language impact on test case design and evaluation, using perceived and real competence to derive the experiment design. The structure (cf. Figure 1), which is initially independent of the application, enhances transferability to similar studies. However, generalizability is potentially compromised as the questionnaires used to measure self-efficacy are task- and domain-specific, following Bandura’s [9] recommendation. Additionally, follow-up studies with larger sample sizes would improve the generalizability of the results obtained. Considering construct validity, meaning the validity of the measures taken to answer the research questions, the tasks within each group phase were identical in both content and structure, except for the IEC language, to enhance comparability. The use of the same example in both cases introduces a potential bias, as different IEC languages are more suitable for specific use cases (e.g. SFC for procedural programs and FBD for logic combination [8]). The code excerpts of the use case were chosen to be equally difficult from a testing perspective (cf. Section 3.3). Nevertheless, as the use case itself reflects a sequential decision process, a potential residual bias remains. A follow-up investigation of the impact of the IEC 61131-3 language based on the code construct and problem scenario (e.g. static and/or sequential process) at a granular level is therefore recommended. Regarding the reliability of the results, meaning the researcher’s impact on the data and the analysis, the final questionnaires and task results of all apprentices were evaluated blindly, without knowledge of their respective groups. The link between an apprentice’s evaluation result and their group was recovered afterward using unique apprentice identifiers. The division of apprentices into two subgroups was conducted not by the researcher but by the apprentices’ teachers. This division was based on three objective criteria, such as degree of education, along with the teachers’ subjective assessment of the apprentices’ motivation as a fourth criterion (cf. Section 3.2). The subjective assessment served only as the final tiebreaker in group allocation in case of uniformity in the previous categories. However, there remains the possibility of misjudgment by the teachers, potentially resulting in an uneven distribution of groups. Additionally, the apprentices’ condition on the day of the experiment may also impact the comparability of the groups. These threats are hardly avoidable and have a greater impact on small sample sizes than on large ones. In a future empirical study with a large sample size, the impact of such effects could be investigated. Regarding the accuracy of the conclusions (internal validity), it should be noted that this paper presents the results of a preliminary study, deliberately reflecting only qualitative tendencies, which will be used to derive a teaching strategy on testing whose statistical significance is to be investigated in empirical follow-up studies.
6 Conclusion and outlook
The increasing number of software functionalities in automated production systems leads to a growing need for methods to ensure the quality of the corresponding software. However, especially after software changes on-site, testing is often neglected due to lack of time or experience. While most studies addressing inadequate testing and test coverage focus on tools, metrics, and procedures, this paper examines the training of mechatronic apprentices, who represent future on-site technicians, in assessing test coverage. The paper compares the qualitative impact of the graphical IEC 61131-3 programming languages FBD and SFC on test case design and evaluation of mechatronic apprentices, considering their self-perceived and real competence. FBD is chosen because it is widely used in Europe and typically taught in mechatronics education. SFC resembles control flow graphs, which were helpful in assessing test coverage in computer science, especially for novices.
The experiment design presented for investigating the IEC languages’ impact was conducted with a typical 2nd-year vocational school class of 20 mechatronic apprentices. The experiment results indicate that the language used to teach and apply test adequacy assessment impacts mechatronic apprentices’ test case design and evaluation. The IEC 61131-3 programming languages SFC and FBD have both proven suitable for teaching test adequacy assessment and test case design to apprentices. The comparison of SFC and FBD shows a preference in favor of SFC. Despite their prior knowledge of FBD programming, apprentices tend to be more confident using SFC and show a slightly better performance in test case design and evaluation than with FBD.
As future work, a teaching strategy on testing, focusing especially on test adequacy assessment and design but also programming with awareness for the later test phases, can be derived from the results (e.g. language preference) of this preliminary study. The teaching strategy derived can be evaluated with a larger group to investigate the empirical significance and degree of impact of the two programming languages. The follow-up empirical study can further compare different problem scenarios to investigate the impact of the level of static and sequential logic elements to obtain language preferences at a granular level for different types of code constructs. A guideline could be derived based on these results, recommending the IEC 61131-3 programming language for different problem scenarios, such as more complex or combined code constructs, from a testing perspective. Further, the experiment design can be used to compare other IEC 61131-3 programming languages. In particular, comparing a graphical programming language (e.g. SFC) with a text-based programming language (e.g. Structured Text) could yield promising results and show possible synergies with test adequacy assessment from computer science.
About the authors

Kathrin Land received an M.Sc. in Electrical Engineering from the University of Stuttgart in 2017. She is pursuing a Ph.D. at the Institute of Automation and Information Systems at TUM. Her main research interests include model-based testing of automated production systems, test case management, and test education.

Univ.-Prof. Dr.-Ing. Birgit Vogel-Heuser received a Diploma degree in Electrical Engineering and a Ph.D. in Mechanical Engineering from RWTH Aachen. Since 2009, she has been full professor and director of the Institute of Automation and Information Systems at the Technical University of Munich (TUM). Her current research focuses on systems and software engineering. She is a member of acatech (German National Academy of Science and Engineering), editor of IEEE T-ASE, an IEEE Fellow, and a member of the science board of MIRMI at TUM.
- Research ethics: Not applicable.
- Author contributions: The authors have accepted responsibility for the entire content of this manuscript and approved its submission.
- Competing interests: The authors state no conflict of interest.
- Research funding: None declared.
- Data availability: The raw data can be obtained on request from the corresponding author.
References
[1] V. Vyatkin, “Software engineering in industrial automation: state-of-the-art review,” IEEE Trans. Ind. Inf., vol. 9, no. 3, pp. 1234–1249, 2013. https://doi.org/10.1109/tii.2013.2258165.
[2] B. Vogel-Heuser, A. Fay, I. Schaefer, and M. Tichy, “Evolution of software in automated production systems: challenges and research directions,” J. Syst. Software, vol. 110, pp. 54–84, 2015. https://doi.org/10.1016/j.jss.2015.08.026.
[3] V. Garousi, M. Felderer, C. M. Karapıçak, and U. Yılmaz, “Testing embedded software: a survey of the literature,” Inf. Software Technol., vol. 104, pp. 14–45, 2018. https://doi.org/10.1016/j.infsof.2018.06.016.
[4] D. Bohlender, H. Simon, N. Friedrich, S. Kowalewski, and S. Hauck-Stattelmann, “Concolic test generation for PLC programs using coverage metrics,” in IFAC WODES, 2016, pp. 432–437. https://doi.org/10.1109/WODES.2016.7497884.
[5] E. Enoiu, A. Čaušević, T. Ostrand, E. Weyuker, D. Sundmark, and P. Pettersson, “Automated test generation using model checking: an industrial evaluation,” Int. J. Software Tool. Technol. Tran., vol. 18, pp. 335–353, 2016. https://doi.org/10.1007/s10009-014-0355-9.
[6] S. Ulewicz and B. Vogel-Heuser, “Increasing system test coverage in production automation systems,” Control Eng. Pract., vol. 73, no. 1, pp. 171–185, 2018. https://doi.org/10.1016/j.conengprac.2018.01.010.
[7] E. Jee, S. Kim, S. Cha, and I. Lee, “Automated test coverage measurement for reactor protection system software implemented in function block diagram,” Lect. Notes Comput. Sci., vol. 6351, pp. 223–236, 2010. https://doi.org/10.1007/978-3-642-15651-9_17.
[8] B. Vogel-Heuser, M. Obermeier, S. Braun, K. Sommer, F. Jobst, and K. Schweizer, “Evaluation of a UML-based versus an IEC 61131-3-based software engineering approach for teaching PLC programming,” IEEE Trans. Educ., vol. 56, no. 3, pp. 329–335, 2012. https://doi.org/10.1109/te.2012.2226035.
[9] A. Bandura, “Guide for constructing self-efficacy scales,” in Self-Efficacy Beliefs of Adolescents, F. M. Pajares and T. Urdan, Eds., Greenwich, Information Age Publishing, 2006, pp. 307–337.
[10] L. Baartman and L. Ruijs, “Comparing students’ perceived and actual competence in higher vocational education,” Assess Eval. High Educ., vol. 36, no. 4, pp. 385–398, 2011. https://doi.org/10.1080/02602938.2011.553274.
[11] S. Eldh, “On technical debt in software testing – observations from industry,” Lect. Notes Comput. Sci., vol. 13702, pp. 301–323, 2022. https://doi.org/10.1007/978-3-031-19756-7_17.
[12] M. Aniche, F. Hermans, and A. van Deursen, “Pragmatic software testing education,” in ACM SIGCSE’19, 2019, pp. 414–420. https://doi.org/10.1145/3287324.3287461.
[13] M. Mladenović, S. Mladenovic, and Ž. Žanko, “Impact of used programming language for K-12 students’ understanding of the loop concept,” Int. J. Technol. Enhanc. Learn., vol. 12, no. 1, pp. 79–98, 2019. https://doi.org/10.1504/ijtel.2020.103817.
[14] Y. Tashtoush, Z. Odat, I. Alsmadi, and M. Yatim, “Impact of programming features on code readability,” Int. J. Software Eng. Appl., vol. 7, no. 6, pp. 441–458, 2013. https://doi.org/10.14257/ijseia.2013.7.6.38.
[15] F. Fronchetti, et al., “Language impact on productivity for industrial end users: a case study from programmable logic controllers,” J. Comput. Lang., vol. 69, pp. 2590–1184, 2022. https://doi.org/10.1016/j.cola.2021.101087.
[16] H. Zhu, P. Hall, and J. May, “Software unit test coverage and adequacy,” ACM Comput. Surv., vol. 29, no. 4, pp. 366–427, 1997. https://doi.org/10.1145/267580.267590.
[17] M. Aniche, C. Treude, and A. Zaidman, “How developers engineer test cases: an observational study,” IEEE Trans. Software Eng., vol. 48, no. 12, p. 1, 2021. https://doi.org/10.1109/tse.2021.3129889.
[18] IEC, “IEC 61131-3 programmable controllers – part 3: programming languages,” in IEC Std, 2013.
[19] L. Hao, J. Shi, T. Su, and Y. Huang, “Automated test generation for IEC 61131-3 ST programs via dynamic symbolic execution,” in 2019 International Symposium on Theoretical Aspects of Software Engineering (TASE), 2019, pp. 200–207. https://doi.org/10.1109/TASE.2019.00004.
[20] K. Doganay, M. Bohlin, and O. Sellin, “Search-based testing of embedded systems implemented in IEC 61131-3: an industrial case study,” in IEEE ICST, 2013, pp. 425–432. https://doi.org/10.1109/ICSTW.2013.78.
[21] I. Buzhinsky, V. Ulyantsev, J. Veijalainen, and V. Vyatkin, “Evolutionary approach to coverage testing of IEC 61499 function block applications,” in IEEE INDIN, 2015, pp. 1213–1218. https://doi.org/10.1109/INDIN.2015.7281908.
[22] J. Lawrence, S. Clarke, M. Burnett, and G. Rothermel, “How well do professional developers test with code coverage visualizations? An empirical study,” in IEEE VL/HCC’05, 2005, pp. 53–60. https://doi.org/10.1109/VLHCC.2005.44.
[23] S. Berner, R. Weber, and R. K. Keller, “Enhancing software testing by judicious use of code coverage information,” in IEEE ICSE, 2007, pp. 612–620. https://doi.org/10.1109/ICSE.2007.34.
[24] A. Rahmani, J. L. Min, and A. Maspupah, “An evaluation of code coverage adequacy in automatic testing using control flow graph visualization,” in IEEE ISCAIE, 2020, pp. 239–244. https://doi.org/10.1109/ISCAIE47305.2020.9108838.
[25] D. Ribeiro, R. Lima, C. Franca, A. Souza, I. Silva, and G. Pinto, “Understanding self-efficacy in software engineering industry: an interview study,” in ACM EASE, 2023. https://doi.org/10.1145/3593434.3593467.
[26] J. Carver and N. Kraft, “Evaluating the testing ability of senior-level computer science students,” in IEEE CSEE&T, 2011, pp. 169–178. https://doi.org/10.1109/CSEET.2011.5876084.
[27] S. Edwards and Z. Shams, “Comparing test quality measures for assessing student-written tests,” in ACM ICSE, 2014, pp. 354–363. https://doi.org/10.1145/2591062.2591164.
[28] W. Sheikh, “Teaching C++ programming using automated unit testing and test-driven development—design and efficacy study,” Comput. Appl. Eng. Educ., vol. 30, no. 3, pp. 821–851, 2022. https://doi.org/10.1002/cae.22488.
© 2024 the author(s), published by De Gruyter, Berlin/Boston
This work is licensed under the Creative Commons Attribution 4.0 International License.