Display Clutter in Advanced Head-up Display


In 2011, Kim, Prinzel, Kaber, Alexander, Stelzer, Kaufmann & Veil conducted a study to assess the influence of head-up display (HUD) configuration on perceptions of display clutter, workload and flight performance. This article summarises and reanalyses the results to provide more usable findings.

Perception of Multidimensional Measure of Display Clutter

The study achieved its objectives of developing a new multidimensional measure of display clutter and investigating the relationships among perceived and objective measures of clutter, workload, and flight performance.

Table 1: Interaction among Measurements

| Measurements | Pilot Flight Experience | Workload Level | Display Configuration | Flight Performance | Flight Segment | Overall Clutter Ratings |
| --- | --- | --- | --- | --- | --- | --- |
| Pilot Flight Experience | | Marginally Significant | Significant | Not Significant | Not Significant | Significant |
| Workload Level | | | Marginally Significant | Not Mentioned | Not Mentioned | Significant |
| Display Configuration | | | | Significant | Marginally Significant | Significant |
| Flight Performance | | | | | Significant | Not Mentioned |
| Flight Segment | | | | | | Not Mentioned |
| Overall Clutter Ratings | | | | | | |

Table 1 shows the significant interactions among pilot flight experience, workload level, HUD configuration, flight performance, flight segment and pilot perceptions of overall display clutter. These results indicate that clutter is an actual quality of displays that may lead to human factors problems.

Table 2: Analysis Results for Hypotheses

| Variables compared | Statistic and p-value | Interpretation | Relevant Hypothesis |
| --- | --- | --- | --- |
| Calculated clutter scores & Overall perceived clutter ratings | R=0.77, P<0.0001 | Highly Significant | H1 |
| Workload ratings by NASA-TLX scores & Pilot experience | F(2,45)=2.929, P=0.064 | Marginally Significant | H2 |
| Workload ratings by NASA-TLX scores & Display configuration | F(2,45)=7.911, P=0.001 | Significant | H3 & 4 |
| Pilot experience & Workload level | F(2,45)=3.023, P=0.059 | Marginally Significant | H5 |
| Basic visual display properties & Calculated clutter scores | R²=0.33, P<0.0001 (low workload); R²=0.18, P<0.0001 (high workload) | Significant | H6 |
| Calculated clutter scores & Performance measures (LOC control and G/S control) | R²=0.037, P=0.018 (vertical deviation RMSE); R²=0.033, P=0.005 (horizontal deviation RMSE) | Significant | H7 |
| Log-transformed RMSE responses & Workload ratings by NASA-TLX scores | R²=0.024, P=0.024 (vertical deviation); R²=0.01, P=0.068 (horizontal deviation) | Significant | H7 |

The study group proposed seven hypotheses for the research. Table 2 shows that most of the results are significant and support the hypotheses. The exception is H2: although the researchers describe it as marginally significant, its p-value of 0.064 exceeds the 0.05 significance level.
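The reading of Table 2 against the 0.05 significance level (see Footnote 1) can be sketched as follows. This is only an illustration, not the researchers' procedure: the `classify` function and the 0.10 cut-off for "marginally significant" are assumptions introduced here, while the p-values are those reported for H2, H3 & 4 and H5 in Table 2.

```python
# Illustrative only: labels p-values from Table 2 against the 0.05 level.
# The 0.10 "marginal" cut-off is an assumption, not stated in the source.
ALPHA = 0.05
MARGINAL = 0.10  # hypothetical upper bound for "marginally significant"


def classify(p_value: float) -> str:
    """Return a verbal label for a p-value."""
    if p_value < ALPHA:
        return "significant"
    if p_value < MARGINAL:
        return "marginally significant"
    return "not significant"


# p-values copied from Table 2
table2_p_values = {"H2": 0.064, "H3 & 4": 0.001, "H5": 0.059}

labels = {h: classify(p) for h, p in table2_p_values.items()}
for hypothesis, label in labels.items():
    print(f"{hypothesis}: {label}")
```

Under these assumed cut-offs, only the H3 & 4 result falls below the 0.05 level, while H2 and H5 both land in the marginal band.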

Table 2 shows that calculated clutter scores were highly correlated with overall perceived clutter ratings. This supports the hypothesis (H1) that the new multidimensional measure of clutter would have internal consistency. In agreement with H5, Table 2 shows that high-experience pilots were less sensitive to the workload manipulation according to NASA-TLX scores. To address H6, multiple linear regression models of clutter scores based on HUD visual properties were developed. Software-based analysis of HUD images yielded visual property results that were predictive of clutter scores. Table 2 therefore indicates that pilot perceptions of clutter in new HUD designs can be predicted, in part, from low-level display characteristics. Finally, Table 2 indicates that, as expected under H7, both normalized clutter and NASA-TLX scores were significant predictors of pilot performance (vertical and horizontal deviation measures) in the various segments of the landing approach.

Table 3: Clutter ratings by HUD configuration and pilot experience

| Pilot Experience | Low Clutter HUD: mean | Low Clutter HUD: interpretation | Medium Clutter HUD: mean | Medium Clutter HUD: interpretation | High Clutter HUD: mean | High Clutter HUD: interpretation |
| --- | --- | --- | --- | --- | --- | --- |
| Low Experience | 51 | High | 41 | Low | 50 | Low |
| Medium Experience | 36 | Low | 53 | Medium | 50 | Low |
| High Experience | 44 | Medium | 57 | High | 65 | High |

Contrary to H2, Table 3 indicates that high-experience pilots were more sensitive to display clutter and were more accurate and consistent in judging the occurrence of clutter (imagery obscuring or confusing other information). This also suggests that flight experience may help pilots to extract relevant information from displays and to judge when information is extraneous (i.e., clutter).

Table 4: NASA-TLX scores of workload ratings by HUD configuration and pilot experience

| HUD configuration | Low Experience: mean | Low Experience: interpretation | Medium Experience: mean | Medium Experience: interpretation | High Experience: mean | High Experience: interpretation |
| --- | --- | --- | --- | --- | --- | --- |
| Low Clutter | 53 | Medium | 61 | High | 62 | Medium |
| Medium Clutter | 48 | Low | 52 | Low | 50 | Low |
| High Clutter | 56 | High | 60 | Medium | 66 | High |

Table 4 shows that workload ratings were higher for both the low and the high clutter displays than for the medium clutter displays. Negative effects of low and high clutter displays were found across workload and performance measures, which indicates that some optimal amount of HUD information may exist, balancing information overload against support for flight path control. This was predicted by H3 and H4.


Research approach

This was an exploratory study into the effects of clutter on flight experience as perceived by commercial airline pilots.


  • 18 current commercial airline pilots with no prior HUD experience.
  • The sample comprised 16 male and 2 female pilots, aged 23 to 51 years (mean = 40.4 years), with total flight hours ranging from 1500 to 20,900 (mean = 8947.8 h).


Criterion (dependent) variables

  • Pilot rankings of the various dimensions of clutter for describing HUDs were collected after pilot training on the IFD and flight scenario.
  • Subjective ratings of overall perceived display clutter on a scale from 1 = “low” to 20 = “high” and ratings on the underlying dimensions of clutter were collected at the close of each flight segment.
  • Clutter scores were calculated by rank-weighted sums of ratings across the six clutter sub-scales (redundancy, colorfulness, salience, dynamics, variability and density).
  • Basic visual properties of HUDs (contrast, occlusion, display density and luminance) were calculated to predict the calculated clutter scores resulting from the multidimensional subjective measure.
  • Pilot ratings of workload were recorded by using NASA-TLX.
  • Pilot performance was recorded in each flight segment.
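A rank-weighted sum of the kind described in the list above could be computed as in the following sketch. The source does not give the exact weighting scheme, so the one used here (weights proportional to reversed rank and summing to 1) is an assumption, and the function name, ratings and ranks are all hypothetical.

```python
# Sketch of a rank-weighted clutter score; NOT the authors' exact formula.
SUBSCALES = ["redundancy", "colorfulness", "salience",
             "dynamics", "variability", "density"]


def clutter_score(ratings: dict, ranks: dict) -> float:
    """Weighted sum of sub-scale ratings.

    ranks: 1 = most important dimension, 6 = least important.
    Assumed weighting: weight = (n + 1 - rank) / (1 + 2 + ... + n),
    so that the six weights sum to 1.
    """
    n = len(SUBSCALES)
    total = n * (n + 1) / 2
    return sum(ratings[s] * (n + 1 - ranks[s]) / total for s in SUBSCALES)


# Hypothetical data for one pilot (ratings on an arbitrary 0-100 scale).
ratings = {"redundancy": 40, "colorfulness": 20, "salience": 60,
           "dynamics": 50, "variability": 30, "density": 70}
ranks = {"density": 1, "salience": 2, "dynamics": 3,
         "redundancy": 4, "variability": 5, "colorfulness": 6}

score = clutter_score(ratings, ranks)
print(round(score, 2))
```

Because the assumed weights sum to 1, the score stays on the same scale as the individual ratings, which makes scores comparable across pilots who ranked the dimensions differently.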

Predictor (independent) variables

Between-subject variables

  • Three experience groups (‘Low’, ‘Medium’ and ‘High’) were formed based on pilot total flight experience.
  • Three HUD configuration sets (‘high clutter’, ‘medium clutter’, and ‘low clutter’) were presented. Three target displays were selected to represent unique HUD feature sets within each group for a total of nine test displays.

Within-subject variables

  • Two levels of flight workload (‘High workload’ under crosswind condition and ‘Low workload’ under no wind condition) were used.
  • Each flight was divided into three legs (segments).


Data were collected from an experiment comprising a total of 108 trials across all pilots, yielding 324 observations on perceived workload, ratings of the dimensions of clutter, and overall perceived clutter.
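These counts are consistent with the design described above, as the following arithmetic sketch shows. It assumes (an inference, not stated explicitly in the source) that each pilot flew the three target displays of one HUD configuration set under both workload levels, and that ratings were collected once per flight segment.

```python
# Consistency check of the reported counts. The per-pilot breakdown is an
# assumption inferred from the design description, not stated in the source.
pilots = 18
displays_per_pilot = 3   # assumed: three target displays in one HUD set
workload_levels = 2      # high (crosswind) and low (no wind)
flight_segments = 3      # legs of the flight

trials_per_pilot = displays_per_pilot * workload_levels
total_trials = pilots * trials_per_pilot
total_observations = total_trials * flight_segments

print(total_trials, total_observations)
```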

Data analysis

1. For H1, correlation analyses (Pearson coefficients) were conducted to identify whether pilot ratings on the underlying dimensions of clutter were consistent with overall perceived clutter ratings.
2. A series of repeated measures ANOVAs was conducted to assess the effects of pilot experience level (low, medium, or high), HUD configuration (low, medium, or high clutter), flight task workload level (low, high), and flight segment (1–3) on the overall clutter ratings, NASA-TLX scores, and a subset of the flight performance measures.
3. Post hoc tests were conducted to assess the HUD configuration effect.
4. Regression analysis (t-tests) showed that all model parameters were significant predictors (P<0.05) of clutter score, except for occlusion under the high flight workload condition.
5. Regression models (multiple linear regression model, step-wise regression model and best-fit regression model) were developed to predict the flight performance measures in terms of display clutter and TLX scores. The scores were converted to standardized z-scores. For each regression model, graphical analysis and diagnostic tests were conducted on the residuals to assess the normality assumption.
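The standardization and regression step can be illustrated with a minimal sketch. It fits a one-predictor least-squares line rather than the multiple, step-wise or best-fit models used in the study, and all data values and function names are hypothetical.

```python
# Minimal sketch: z-score a predictor and fit a simple least-squares line.
# Data are made up for illustration; the study used multi-predictor models.
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def fit_line(x, y):
    """Return slope, intercept and R^2 of the least-squares line y = a*x + b."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return slope, intercept, 1 - ss_res / ss_tot


clutter = [30, 40, 45, 50, 55, 60, 70]                 # hypothetical clutter scores
log_rmse = [0.10, 0.18, 0.15, 0.25, 0.22, 0.30, 0.35]  # hypothetical log RMSE

z_clutter = z_scores(clutter)
slope, intercept, r_squared = fit_line(z_clutter, log_rmse)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r_squared:.3f}")
```

With a standardized predictor, the intercept equals the mean of the response, and the slope is the change in the response per standard deviation of clutter score.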

Generalization Potential

  • For evaluation of a range of aviation system display concepts beyond SVS and EVS HUDs.
  • For other researchers working on pilot performance measures in other flight simulation studies.
  • For airlines as a basis for new avionics display certification and systems acquisitions.
  • For evaluating air traffic management support display technologies for the occurrence of clutter and to assess the reliability of the measurement outcomes.


  1. Sang-Hwan Kim, Lawrence J. Prinzel, David B. Kaber, Amy L. Alexander, Emily M. Stelzer, Karl Kaufmann, and Theo Veil (2011). Multidimensional Measure of Display Clutter and Pilot Performance for Advanced Head-up Display. Aviat Space Environ Med 2011, 82:1–10.
  2. Sang-Hwan Kim, Karl Kaufmann, and Simon Hsiang (2008). Perceived Clutter in Advanced Cockpit Displays: Measurement and Modeling with Experienced Pilots. Aviat Space Environ Med 2008, 79:1–12.

Footnote 1: A p-value of 0.05 is used as the level of significance in this research.


Captain Jay
