
The use of anthropometric proportion indices in the measurement of facial attractiveness

Raymond Edler, Pragati Agarwal, David Wertheim, Darrel Greenhill
DOI: http://dx.doi.org/10.1093/ejo/cji098; pp. 274-281; first published online: 13 January 2006

Abstract

This study used anthropometric data in the form of Farkas' proportion indices in order to quantify facial attractiveness, and to relate measured change through surgery to clinical judgement. Standardized photographs of 15 orthognathic patients were used in album form and rated by 10 experienced clinicians: album 1 for facial attractiveness (before surgery) and album 2 for improvement in facial attractiveness (before and after surgery). Twenty-five proportion indices were selected and linear measurements recorded from the pre- and post-surgical photographs. The corresponding changes in indices and in clinicians' scores were compared.

The clinicians' assessment of the degree of improvement in facial appearance achieved through surgery related closely to the scores produced by the change in proportion indices (r = 0.698, P = 0.004). Clinical assessment demonstrated a clear inverse relationship between initial attractiveness rating and the degree of improvement achieved through orthognathic surgery (r = –0.781, P = 0.001). The results showed good repeatability in terms of clinical assessment, photography and digitization. The method would appear to have potential for further development, possibly into a ‘facial attractiveness index’ for the purpose of quantifying improvement achieved through treatment.

Introduction

The ability to quantify facial appearance, and any improvement achieved through treatment, would clearly help in the objective assessment of treatment quality. It would also be useful in monitoring growth changes. Currently, attempts at the objective assessment of facial appearance involve panels, in which rating or ranking by a group of professional or lay individuals is undertaken. The validity of this method depends on the notion that the panel will reach a reasonable level of agreement; fortunately, there is convincing evidence that high levels of agreement amongst panellists are usually obtained (Thakera and Iwawaki, 1979; Bernstein et al., 1982; Maret and Harling, 1985; Patzer, 1985, 1994). Good levels of agreement have also been reported in studies of attractiveness amongst patients with a cleft lip and palate deformity (Tobiasen and Hiebert, 1988). However, whether clinical judgement is assessed using a Likert scale (Okkerse et al., 2001; Ritter et al., 2002) or a visual analogue scale (VAS; Russell et al., 2001), or whether ranking rather than rating methods are used (Phillips et al., 1992; Roberts-Harry et al., 1992), the organizational difficulties involved in assembling panels make this an impractical method for everyday clinical use. Additionally, several of the variables involved in establishing panels have been seen to influence the assessments: the gender of the raters or of those being rated (Richardson, 1970; Tobiasen, 1987; Okkerse et al., 2001) and the age of the raters and/or those rated (Richardson, 1970; Tobiasen, 1987; Tobiasen and Hiebert, 1988) have been found to produce different results in some studies. Whether the assessment panel should consist of professionals or members of the public can also be an issue, as some studies have found differences in levels of judgement (Lines et al., 1978; Cochrane et al., 1999). Other investigations involving dental appearance have shown that the socio-economic status of the raters can also have an effect (Howells and Shaw, 1985). These difficulties, together with the problem of assessing the consistency of attractiveness ratings at one sitting, make it desirable to find a less cumbersome, and perhaps more valid, method of assessment.

It has been shown that the common denominator shared by those individuals consistently judged to be the most attractive is that their facial proportions tend to be relatively near to the mean of the population, within their racial group (Symons, 1979; Langlois and Roggman, 1990; Strzalko and Kaszycka, 1991; Grammer and Thornhill, 1994). Whilst ‘averageness’ is not the sole factor involved in attractiveness (Cunningham et al., 1990; Alley and Cunningham, 1991; Perrett et al., 1994), it would appear to be one of the most important factors. Accordingly, if averageness is important in facial aesthetics, then average facial proportions could provide the basis for a quantitative assessment of aesthetics. A great deal of highly relevant data representing mean proportions is available through the extensive work of Farkas (1994) and Farkas and Munro (1987). This evidence derives from the Toronto anthropometric growth study, involving manual measurements, carried out during the years 1967 to 1984 and involving over 2500 individuals (Farkas and Munro, 1987). The resultant data has been used to provide a ‘battery’ of individual facial ratios, presented as proportion indices (Farkas and Munro, 1987), each simply involving two linear measurements, the smaller expressed as a percentage of the larger. One hundred and sixty-seven such indices, involving cranial and facial measurements, are incorporated in this work. Manual anthropometry requires a considerable level of proficiency based on experience (Kolar and Salter, 1997) and is rather time-consuming, so alternative methods have been sought, primarily involving the use of photography (Vegter et al., 1997; Becker and Svensson, 1998; Berger et al., 1999; Vegter and Hage, 2001). Such photographic techniques have been shown to be entirely valid, providing the points chosen are readily identifiable (Farkas et al., 1980) and the correct photographic technique is used. Vegter et al. (1997) demonstrated the way in which changes in proportion indices effected by treatment can be used to record aesthetic improvement. It would seem that the potential exists for developing this principle further.

Accordingly, the aim of this study was to identify whether there might be a use for Farkas' proportion indices as an indicator of facial appearance:

  1. To quantify the overall change produced by surgical-orthodontic treatment in a selected group of patients, by identifying pre- and post-treatment changes in a battery of proportion indices.

  2. To relate the magnitude of change (identified by the pre- and post-treatment proportion indices scores) to clinical judgement.

  3. To identify a possible facial index for ‘quantifying’ changes in attractiveness, based on the proportion indices data.

Subjects and methods

Facial photographs of orthognathic patients formerly treated in the maxillofacial unit at Queen Mary's Hospital, Roehampton, London, UK were identified. Standardized full face and profile photographic prints of 15 Caucasian patients were used (9 male, 6 female; age range 20–44 years, median 22 years), the facial images having been carefully selected to present a range of facial imbalance. The photographs had all been taken by the same medical photographer, using a standardized approach (Bengel, 1985; Claman et al., 1990; Edler et al., 2001).

Clinical assessment

Two albums were constructed. Album 1 consisted of the 15 patients' pre-treatment photographs, i.e. full face and profile (4 × 6 inch). The photographs were placed in a random sequence and a VAS of 100 mm was used for the assessment of the attractiveness of the individual. Ten experienced clinicians, comprising five orthodontists (3 male, 2 female) and five maxillofacial surgeons (4 male, 1 female), were asked to rate the series of photographs for facial attractiveness, using the VAS, which extended from ‘very unattractive’ (low) to ‘very attractive’ (high). The clinicians were deliberately given ample time to study the images. After a two-week period, the same clinicians were asked to repeat their ratings. Album 2 was presented to the same panellists several months later. This album consisted of the same 15 patients' images, but this time post-treatment as well as pre-treatment images were incorporated, and an alternative VAS extending from ‘extreme worsening’ (low) to ‘extreme improvement’ (high) was used to enable the clinicians to assess the degree of improvement in attractiveness following orthognathic surgery. Again, the process was repeated two weeks later. All the pre- and post-treatment photographs in album 2 were deliberately presented in black and white format, to prevent bias through artificial alteration in colour.

In order to investigate the relationship between pre-treatment attractiveness as rated in album 1 and the subsequent degree of improvement as rated in album 2, the scores for each patient were ranked and the median rankings of all the patients in albums 1 and 2 correlated. Ranked data was investigated instead of actual VAS scores, since it has been shown that clinicians may vary in their use of the full VAS range (Phillips et al., 1992). This allowed relative changes, rather than absolute values, to be investigated.

Proportion indices

Twenty-five of the proportion indices published by Farkas and Munro (1987) were used. The basis for selection was that the landmarks should be reliably identified on photographs (Farkas et al., 1980) and the resultant indices be potentially changed by the effects of orthognathic surgery (Table 1).

Table 1

Proportion indices used in the study (Farkas and Munro, 1987).

| Index number | Name of index | Description: numerator | Description: denominator |
|---|---|---|---|
| 1 | Upper face-face height index | Nasion–stomion | Nasion–gnathion |
| 2 | Lower face-face height index | Subnasale–gnathion | Nasion–gnathion |
| 3 | Mandibulo-face height index | Stomion–gnathion | Nasion–gnathion |
| 4 | Mandibulo-upper face height index | Stomion–gnathion | Nasion–stomion |
| 5 | Mandibulo-lower face height index | Stomion–gnathion | Subnasale–gnathion |
| 6 | Upper-middle third face depth index | Tragion (l)–nasion | Tragion (l)–subnasale |
| 7 | Middle-lower third face depth index | Tragion (l)–subnasale | Tragion (l)–gnathion |
| 8 | Nasal index | Alare (r)–alare (l) | Nasion–subnasale |
| 9 | Upper lip height-mouth width index | Subnasale–stomion | Chelion (r)–chelion (l) |
| 10 | Cutaneous-total upper lip height index | Subnasale–labiale superius | Subnasale–stomion |
| 11 | Vermilion-total upper lip height index | Labiale superius–stomion | Subnasale–stomion |
| 12 | Vermilion-cutaneous upper lip height index | Labiale superius–stomion | Subnasale–labiale superius |
| 13 | Upper lip vertical contour index | Subnasale–stomion | Subnasale–labiale superius plus labiale superius–stomion |
| 14 | Vermilion height index | Labiale superius–stomion | Stomion–labiale inferius |
| 15 | Chin-mandible height index | Sublabiale–gnathion | Stomion–gnathion |
| 16 | Upper face height-biocular width index | Nasion–stomion | Exocanthion (r)–exocanthion (l) |
| 17 | Intercanthal-nasal width index | Endocanthion (r)–endocanthion (l) | Alare (r)–alare (l) |
| 18 | Nose-face height index | Nasion–subnasale | Nasion–gnathion |
| 19 | Nose-mouth width index | Alare (r)–alare (l) | Chelion (r)–chelion (l) |
| 20 | Upper lip-upper face height index | Subnasale–stomion | Nasion–stomion |
| 21 | Upper lip-mandible height index | Subnasale–stomion | Stomion–gnathion |
| 22 | Upper lip-nose height index | Subnasale–stomion | Nasion–subnasale |
| 23 | Lower lip-face height index | Stomion–sublabiale | Subnasale–gnathion |
| 24 | Lower lip-mandible height index | Stomion–sublabiale | Stomion–gnathion |
| 25 | Lower lip-chin height index | Stomion–sublabiale | Sublabiale–gnathion |

The images were imported as bitmap files to an on-screen digitizing program (IPTool; Greenhill et al., 2000) and those measurements relevant to the proportion indices were digitized (Table 1). The digitizations were performed by one operator (PA), taking approximately 15 minutes for each patient's pair of photographs. The program has a resolution of one pixel for linear measurements.

The clinicians' assessment rankings were compared with the measurement of change in Farkas' proportion index scores. For each patient, the changes in value for each of the 25 proportion indices were calculated as shown in Table 2. The resulting change scores were ranked and correlated with the ranked clinicians' assessments of treatment change in three ways.

  1. For each patient, the overall change of each of the 25 indices (relative to the mean) was totalled, then the 15 patients' total scores were ranked and compared with their corresponding ranked clinical assessments.

  2. For each patient, the change in index was first divided by the mean value of that index and then all 25 values were totalled and ranked and then compared with the clinical assessments. This allowed the relationship between the mean value and the expected improvement to be assessed, i.e. to ascertain whether the size of the index itself might relate to the size of the change.

  3. For each patient, the change in each index was divided by the standard deviation of that index and then all 25 values were totalled and ranked, prior to comparison with the clinicians' rankings. This provided a way of reducing the influence of those particular indices with a wide standard deviation.

Table 2

An example of the proportion indices calculations for one of the 15 patients.

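| Index number | Pre-surgery value | Post-surgery value | Matched Farkas mean | Matched Farkas SD | Difference: pre-surgery/Farkas mean | Difference: post-surgery/Farkas mean | Change; nearer (+) further (−) to Farkas mean |
|---|---|---|---|---|---|---|---|
| 1 | 60.88 | 62.93 | 61 | 2 | 0.12 | 1.93 | −1.81 |
| 2 | 55.59 | 54.89 | 59.2 | 2.7 | 3.61 | 4.31 | −0.7 |
| 3 | 40 | 38.51 | 41.2 | 2.3 | 1.2 | 2.69 | −1.49 |
| 4 | 65.7 | 61.19 | 67.7 | 5.3 | 2 | 6.51 | −4.51 |
| 5 | 71.96 | 70.16 | 69.6 | 2.7 | 2.36 | 0.56 | 1.8 |
| 6 | 94.91 | 95.68 | 98.3 | 2.4 | 3.39 | 2.62 | 0.77 |
| 7 | 78.13 | 84.79 | 90.7 | 2.9 | 12.57 | 5.91 | 6.66 |
| 8 | 54.72 | 58.79 | 65.8 | 6.8 | 11.08 | 7.01 | 4.07 |
| 9 | 44 | 48.8 | 41.1 | 5.4 | 2.9 | 7.7 | −4.8 |
| 10 | 72.73 | 67.21 | 66.4 | 6.8 | 6.33 | 0.81 | 5.52 |
| 11 | 25.45 | 27.87 | 41.1 | 7.5 | 15.65 | 13.23 | 2.42 |
| 12 | 35 | 41.46 | 64.8 | 13.6 | 29.8 | 23.34 | 6.46 |
| 13 | 101.85 | 105.17 | 92.2 | 6.2 | 9.65 | 12.97 | −3.32 |
| 14 | 45.16 | 48.57 | 87.8 | 18.5 | 42.64 | 39.23 | 3.41 |
| 15 | 52.21 | 56.72 | 61.9 | 5.8 | 9.69 | 5.18 | 4.51 |
| 16 | 96.38 | 98.27 | 82.9 | 7.9 | 13.48 | 15.37 | −1.89 |
| 17 | 86.21 | 77.32 | 95.1 | 2.5 | 8.89 | 17.78 | −8.89 |
| 18 | 44.71 | 46.26 | 43.7 | 4.9 | 1.01 | 2.56 | −1.55 |
| 19 | 69.6 | 77.6 | 65.3 | 4.6 | 4.3 | 12.3 | −8 |
| 20 | 27.05 | 26.03 | 29.5 | 3 | 2.45 | 3.47 | −1.02 |
| 21 | 41.18 | 42.54 | 43.8 | 4.1 | 2.62 | 1.26 | 1.36 |
| 22 | 36.84 | 35.4 | 41.3 | 4.6 | 4.46 | 5.9 | −1.44 |
| 23 | 34.39 | 30.37 | 26.8 | 3 | 7.59 | 3.57 | 4.02 |
| 24 | 47.79 | 43.28 | 38.7 | 4.1 | 9.09 | 4.58 | 4.51 |
| 25 | 91.55 | 76.32 | 63.7 | 13.9 | 27.85 | 12.62 | 15.23 |
| Total | | | | | | | 21.32 |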
  • SD, standard deviation.
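
The arithmetic behind Table 2 and the three scoring methods can be sketched in a few lines of Python. This is an illustrative reconstruction rather than the authors' software: the function name `change_towards_mean` is hypothetical, the change is assumed to be the absolute pre-surgery deviation from the matched mean minus the absolute post-surgery deviation, and the second method is assumed to be expressed as a percentage (scaled by 100), because those assumptions reproduce the values tabulated for this patient.

```python
# Rows of Table 2: (pre-surgery, post-surgery, matched Farkas mean, matched Farkas SD)
# for proportion indices 1 to 25 of the example patient.
TABLE_2 = [
    (60.88, 62.93, 61.0, 2.0), (55.59, 54.89, 59.2, 2.7), (40.0, 38.51, 41.2, 2.3),
    (65.7, 61.19, 67.7, 5.3), (71.96, 70.16, 69.6, 2.7), (94.91, 95.68, 98.3, 2.4),
    (78.13, 84.79, 90.7, 2.9), (54.72, 58.79, 65.8, 6.8), (44.0, 48.8, 41.1, 5.4),
    (72.73, 67.21, 66.4, 6.8), (25.45, 27.87, 41.1, 7.5), (35.0, 41.46, 64.8, 13.6),
    (101.85, 105.17, 92.2, 6.2), (45.16, 48.57, 87.8, 18.5), (52.21, 56.72, 61.9, 5.8),
    (96.38, 98.27, 82.9, 7.9), (86.21, 77.32, 95.1, 2.5), (44.71, 46.26, 43.7, 4.9),
    (69.6, 77.6, 65.3, 4.6), (27.05, 26.03, 29.5, 3.0), (41.18, 42.54, 43.8, 4.1),
    (36.84, 35.4, 41.3, 4.6), (34.39, 30.37, 26.8, 3.0), (47.79, 43.28, 38.7, 4.1),
    (91.55, 76.32, 63.7, 13.9),
]


def change_towards_mean(pre, post, farkas_mean):
    """Positive when surgery moved the index nearer to the matched mean (assumed definition)."""
    return abs(pre - farkas_mean) - abs(post - farkas_mean)


changes = [change_towards_mean(pre, post, m) for pre, post, m, _ in TABLE_2]

# Method 1: plain total of the 25 changes.
total_change = sum(changes)

# Method 2: each change divided by the matched mean (assumed to be a percentage), then totalled.
change_over_mean = sum(100.0 * c / m for c, (_, _, m, _) in zip(changes, TABLE_2))

# Method 3: each change divided by the matched SD, then totalled.
change_over_sd = sum(c / sd for c, (_, _, _, sd) in zip(changes, TABLE_2))

print(round(total_change, 2), round(change_over_mean, 2), round(change_over_sd, 2))
```

Under these assumptions the three totals come out close to the figures reported for patient 1 in Table 5 (21.32, 41.83 and −0.27), which suggests that the sketch follows the same arithmetic.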

Repeatability

Clinicians' ratings: To assess inter-examiner agreement for both albums, the clinicians individually rated each image on two separate occasions and the mean rating was then ranked with those of the other clinicians. For album 1, each examiner's median scores were ranked from 1–15; 1 being the most attractive individual and 15 the least attractive (Table 3a). The same procedure was adopted for assessing inter-examiner agreement for album 2, involving improvement in appearance (Table 3b).

Table 3a

Ranked data for the 15 patients' photographs (album 1) from all 10 clinicians' assessments.

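| Patient | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | Median | ± 2.5 | ± 3.5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 4 | 7 | 8 | 7 | 8 | 12 | 10 | 9 | 7 | 2 | 7.5 | 7 | 8 |
| 2 | 11 | 12 | 2 | 8 | 9 | 11 | 13 | 11 | 10 | 7 | 10.5 | 8 | 9 |
| 3 | 7 | 3 | 5 | 5 | 7 | 8 | 6 | 4 | 5 | 5 | 5 | 9 | 10 |
| 4 | 13 | 10 | 13 | 13 | 14 | 10 | 14 | 14 | 14 | 12 | 13 | 8 | 10 |
| 5 | 12 | 8 | 14 | 12 | 11 | 5 | 9 | 5 | 12 | 13 | 11.5 | 7 | 8 |
| 6 | 1 | 6 | 4 | 4 | 3 | 3 | 2 | 1 | 2 | 4 | 3 | 9 | 10 |
| 7 | 7 | 9 | 6 | 11 | 11 | 7 | 6 | 7 | 11 | 10 | 8 | 7 | 10 |
| 8 | 10 | 14 | 9 | 10 | 6 | 9 | 5 | 13 | 8 | 11 | 9.5 | 6 | 8 |
| 9 | 3 | 4 | 6 | 3 | 4 | 2 | 3 | 6 | 4 | 6 | 4 | 10 | 10 |
| 10 | 6 | 5 | 3 | 6 | 5 | 4 | 4 | 8 | 6 | 1 | 5 | 8 | 9 |
| 11 | 2 | 1 | 2 | 1 | 1 | 1 | 1 | 2 | 1 | 8 | 1 | 9 | 9 |
| 12 | 14 | 15 | 12 | 14 | 10 | 14 | 12 | 12 | 13 | 15 | 13.5 | 9 | 10 |
| 13 | 9 | 11 | 10 | 9 | 13 | 13 | 11 | 10 | 9 | 9 | 10 | 8 | 10 |
| 14 | 15 | 13 | 15 | 15 | 15 | 15 | 15 | 15 | 15 | 14 | 15 | 10 | 10 |
| 15 | 5 | 2 | 1 | 2 | 2 | 5 | 8 | 3 | 2 | 3 | 2.5 | 9 | 9 |
| Median | | | | | | | | | | | | 8 | 10 |

  • C1 to C10, clinicians 1 to 10.
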
Table 3b

Ranked data for the 15 patients' photographs (album 2) from all 10 clinicians' assessments.

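| Patient | C1 | C2 | C3 | C4 | C5 | C6 | C7 | C8 | C9 | C10 | Median | ± 2.5 | ± 3.5 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 5 | 6 | 9 | 7 | 10 | 13 | 5 | 11 | 8 | 11 | 8.5 | 7 | 9 |
| 2 | 7 | 2 | 3 | 8 | 4 | 11 | 6 | 8 | 5 | 7 | 6.5 | 7 | 8 |
| 3 | 9 | 10 | 12 | 11 | 13 | 10 | 10 | 9 | 11 | 10 | 10 | 9 | 10 |
| 4 | 1 | 1 | 1 | 1 | 4 | 1 | 1 | 1 | 8 | 1 | 1 | 8 | 9 |
| 5 | 4 | 9 | 6 | 5 | 6 | 12 | 8 | 4 | 6 | 6 | 6 | 8 | 9 |
| 6 | 14 | 11 | 14 | 13 | 12 | 8 | 12 | 14 | 12 | 15 | 12.5 | 9 | 9 |
| 7 | 10 | 12 | 8 | 15 | 14 | 9 | 8 | 12 | 13 | 9 | 11 | 6 | 9 |
| 8 | 10 | 2 | 5 | 3 | 5 | 2 | 4 | 5 | 4 | 2 | 4 | 9 | 9 |
| 9 | 10 | 5 | 2 | 9 | 9 | 7 | 3 | 6 | 9 | 3 | 6.5 | 6 | 9 |
| 10 | 8 | 14 | 11 | 5 | 8 | 14 | 10 | 10 | 7 | 12 | 10 | 6 | 7 |
| 11 | 13 | 8 | 13 | 13 | 11 | 6 | 15 | 13 | 14 | 13 | 13 | 8 | 8 |
| 12 | 2 | 7 | 10 | 2 | 2 | 1 | 6 | 2 | 3 | 1 | 2 | 7 | 7 |
| 13 | 6 | 2 | 3 | 3 | 7 | 5 | 2 | 3 | 2 | 5 | 3 | 8 | 9 |
| 14 | 3 | 12 | 7 | 10 | 2 | 3 | 13 | 7 | 10 | 4 | 7 | 2 | 5 |
| 15 | 15 | 15 | 15 | 12 | 15 | 15 | 14 | 15 | 15 | 14 | 15 | 9 | 10 |
| Median | | | | | | | | | | | | 8 | 9 |

  • C1 to C10, clinicians 1 to 10.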

Photographic technique: Six volunteers (2 male, 4 female, age range 27–30 years) were photographed on six occasions, with a two-week interval between each set of photographs (full-face and profile). The photographs were analysed using IPTool, in order to compare five linear measurements and one proportion index (Table 4a).

Table 4a

An example of linear measurements (cm) taken of one of six individuals for photographic repeatability.

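| Measurement | Photograph 1 | Photograph 2 | Photograph 3 | Photograph 4 | Photograph 5 | Photograph 6 | Mean | Range | Median | Range/median (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Nasion–subnasale | 6.13 | 6.15 | 6.19 | 6.41 | 6.32 | 6.24 | 6.24 | 0.28 | 6.215 | 4.51 |
| Tragion–nasion | 10.67 | 10.64 | 10.47 | 10.64 | 10.68 | 10.63 | 10.62 | 0.21 | 10.64 | 1.97 |
| Alare–alare | 3.47 | 3.36 | 3.4 | 3.48 | 3.44 | 3.48 | 3.48 | 0.12 | 3.455 | 3.47 |
| Exocanthion–exocanthion | 9.25 | 9.46 | 9.45 | 9.29 | 9.45 | 9.5 | 9.5 | 0.25 | 9.45 | 2.65 |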

Digitization: The photographs of six patients from the study were randomly selected and digitized on six occasions, with at least a two-week interval between each digitization. On each occasion, six linear measurements (Table 4b) and one proportion index (Table 4c) were calculated.

Table 4b

An example of linear measurements (pixels) taken of one patient for digitizing repeatability.

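| Measurement | Digitization 1 | Digitization 2 | Digitization 3 | Digitization 4 | Digitization 5 | Digitization 6 | Mean | SD | Median | Range difference | Range/median (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Nasion–gnathion | 349 | 349 | 347 | 348 | 347 | 346 | 347.67 | 1.21 | 347.5 | 3 | 0.86 |
| Tragion–nasion | 256 | 261 | 256 | 262 | 255 | 259 | 258.17 | 2.93 | 257.5 | 7 | 2.72 |
| Sublabiale–gnathion | 82 | 78 | 79 | 81 | 76 | 73 | 78.167 | 3.31 | 78.5 | 9 | 11.5 |
| Alare–alare | 89 | 87 | 89 | 89 | 87 | 88 | 88.167 | 0.98 | 88.5 | 2 | 2.26 |
| Exocanthion–exocanthion | 220 | 225 | 221 | 222 | 221 | 223 | 222 | 1.79 | 221.5 | 5 | 2.26 |
| Chelion–chelion | 128 | 124 | 126 | 125 | 124 | 126 | 125.5 | 1.52 | 125.5 | 4 | 3.19 |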
  • SD, standard deviation.
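
The summary columns of Table 4b can be recomputed from the six repeated digitizations. The sketch below is illustrative only: the helper name `repeatability_summary` is hypothetical, and the SD is assumed to be the sample standard deviation (n − 1 denominator), since that assumption matches the tabulated value for nasion–gnathion.

```python
from statistics import mean, median, stdev

# Six repeated digitizations of nasion–gnathion, in pixels (first row of Table 4b).
nasion_gnathion = [349, 349, 347, 348, 347, 346]


def repeatability_summary(values):
    """Mean, sample SD, median, range and range as a percentage of the median."""
    value_range = max(values) - min(values)
    mid = median(values)
    return {
        "mean": mean(values),
        "sd": stdev(values),  # sample SD (n - 1), an assumption
        "median": mid,
        "range": value_range,
        "range_over_median_pct": 100.0 * value_range / mid,
    }


summary = repeatability_summary(nasion_gnathion)
for name, value in summary.items():
    print(name, round(value, 2))
```

Expressing the range as a percentage of the median makes measurements of very different size comparable, which is why the short sublabiale–gnathion distance shows a larger relative spread than the long nasion–gnathion distance for a similar pixel error.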

Table 4c

An example of the proportion index of one patient for digitizing repeatability.

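| Proportion index | Farkas mean and SD | Index | Attempt 1 | Attempt 2 | Attempt 3 | Attempt 4 | Attempt 5 | Attempt 6 | Mean | Median | SD | Range |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| (Alare–alare) × 100/(chelion–chelion) | 65.3 (SD 5.0) | 19 | 69.53 | 70.2 | 70.63 | 71.2 | 70 | 69.84 | 70 | 70.1 | 0.592 | 1.67 |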
  • SD, standard deviation.
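
A proportion index is simply one linear measurement expressed as a percentage of another, as the formula in Table 4c indicates. The following minimal sketch applies that formula to the alare–alare and chelion–chelion pixel measurements listed in Table 4b; the function name `proportion_index` is hypothetical, and it is an assumption that Tables 4b and 4c describe the same set of digitizations.

```python
def proportion_index(numerator, denominator):
    """Numerator measurement expressed as a percentage of the denominator measurement."""
    return 100.0 * numerator / denominator


# Pixel measurements from Table 4b, digitizations 1 to 6.
alare_alare = [89, 87, 89, 89, 87, 88]
chelion_chelion = [128, 124, 126, 125, 124, 126]

# Index 19 (nose-mouth width index) for each digitization.
nose_mouth_width = [
    proportion_index(al, ch) for al, ch in zip(alare_alare, chelion_chelion)
]

print([round(value, 2) for value in nose_mouth_width])
```

Because both measurements come from the same image, the index is independent of image scale, so pixel values can be used directly without conversion to millimetres.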

Results

Clinicians' assessments

Inter-examiner agreement: Tables 3a and 3b present the ranked data for all 15 patients from the 10 clinicians' assessments, for both albums. For album 1, the rankings of a median of 8 out of the 10 clinicians lay within 2.5 rankings of each patient's median rank, and within 3.5 rankings the median was 10, i.e. complete agreement. For album 2, the corresponding medians were 8 and 9 clinicians, respectively.
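
The agreement counts in the last two columns of Tables 3a and 3b can be illustrated as follows. This is a sketch under stated assumptions: the helper name `agreement_count` is hypothetical, and a clinician is assumed to agree when his or her rank lies within the tolerance of the patient's median rank, boundary included, which is the reading that reproduces the tabulated counts for patient 1 in album 1.

```python
from statistics import median

# Ranks given to patient 1 in album 1 by clinicians 1 to 10 (first row of Table 3a).
patient_1_album_1 = [4, 7, 8, 7, 8, 12, 10, 9, 7, 2]


def agreement_count(ranks, tolerance):
    """Number of clinicians whose rank lies within `tolerance` of the median rank."""
    centre = median(ranks)
    return sum(1 for rank in ranks if abs(rank - centre) <= tolerance)


median_rank = median(patient_1_album_1)
within_2_5 = agreement_count(patient_1_album_1, 2.5)
within_3_5 = agreement_count(patient_1_album_1, 3.5)

print(median_rank, within_2_5, within_3_5)
```

Repeating this for every patient and taking the median of the 15 counts gives the summary figure quoted for each album.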

The result of comparing the clinicians' initial rating of attractiveness (album 1) with the degree of improvement through treatment (album 2) is shown in Figure 1. There was a strong inverse correlation, indicating that those clinically judged to be the least attractive showed the greatest improvement (r = –0.781, P = 0.001).

Figure 1

Clinicians' ratings of attractiveness (album 1) versus degree of improvement following surgery (album 2); r = –0.781, P = 0.001.

Proportion indices change versus clinical assessment of change (album 2)

Table 2 provides the data for one of the 15 patients as an example, showing changes in the 25 indices as a result of treatment and their relationship to the Farkas mean, matched for age and gender. Table 5 shows the three different ways in which the proportion indices changes were calculated, and Figure 2 shows the correlation (Spearman's rank) according to the first of the methods (r = 0.698, P = 0.004), indicating a good level of agreement between the change of proportion index and clinically judged improvement in appearance. Correlation as assessed by the second method of presenting the changes in indices (involving the mean) was r = 0.608, P = 0.016; taking the standard deviation of the indices into account (the third method) resulted in a correlation of r = 0.603, P = 0.017.

Table 5

Treatment change in proportion indices (PI), relative to mean data, matched for age and gender (Farkas and Munro, 1987).

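| Patient | Total change score (a) | Change/mean, totalled (b) | Change/SD, totalled (c) |
|---|---|---|---|
| 1 | 21.32 | 41.83 | −0.27 |
| 2 | 52.97 | 94.61 | 4.29 |
| 3 | −14.7 | 9.4 | −10.14 |
| 4 | 93.06 | 149.5 | 8.57 |
| 5 | 19.73 | 51.54 | 4.38 |
| 6 | 15.2 | 46.04 | 4.25 |
| 7 | 28.9 | 59.35 | 2.2 |
| 8 | 81.05 | 135.66 | 8.3 |
| 9 | 113.86 | 192.1 | 16.33 |
| 10 | 2.79 | 11.43 | −9.08 |
| 11 | −30.91 | −15.93 | 3.8 |
| 12 | 15.92 | 12.59 | 8.5 |
| 13 | 92.99 | 159.18 | 2.2 |
| 14 | 81.83 | 175.11 | 11.63 |
| 15 | −16.2 | −8.74 | −0.72 |
| r | 0.698 | 0.608 | 0.603 |
| P | 0.004 | 0.016 | 0.017 |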
  • a Overall change in PI values pre- and post-treatment, relative to matched means, totalled.

  • b Change in PI value divided by matched mean, then totalled.

  • c Change in PI value divided by standard deviation (SD) of matched mean, then totalled.

Figure 2

Rank of clinicians' assessment of degree of change versus rank of total change of proportion indices (pre- and post-treatment); r = 0.698, P = 0.004. The total change in proportion indices was ranked, with 1 being the most positive and 15 the most negative change.
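
The rank correlations reported for Figures 1 and 2 can be approximated from the tabulated data. The sketch below is an illustration, not the authors' analysis: the function names are hypothetical, Spearman's coefficient is computed as Pearson's correlation on average ranks (tied values sharing the mean of their positions), and the inputs are assumed to be the clinicians' median ranks from Tables 3a and 3b and the total change scores from Table 5.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from smallest (1) to largest; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of the 1-based positions pos+1 .. end+1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rank correlation as Pearson's r of the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Median ranks per patient (patients 1 to 15), Tables 3a and 3b.
album_1_median = [7.5, 10.5, 5, 13, 11.5, 3, 8, 9.5, 4, 5, 1, 13.5, 10, 15, 2.5]
album_2_median = [8.5, 6.5, 10, 1, 6, 12.5, 11, 4, 6.5, 10, 13, 2, 3, 7, 15]

# Total change scores per patient, Table 5 (first method).
total_change = [21.32, 52.97, -14.7, 93.06, 19.73, 15.2, 28.9, 81.05,
                113.86, 2.79, -30.91, 15.92, 92.99, 81.83, -16.2]

# Figure 1: initial attractiveness versus clinically judged improvement.
r_figure_1 = spearman(album_1_median, album_2_median)

# Figure 2: rank 1 is the most positive change, so the scores are negated before ranking.
r_figure_2 = spearman([-score for score in total_change], album_2_median)

print(round(r_figure_1, 3), round(r_figure_2, 3))
```

Under these assumptions the two coefficients agree with the reported values of –0.781 and 0.698 to three decimal places. P values are not computed here.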

Repeatability

Table 4a shows the raw data obtained in assessing photographic repeatability. As an example, the standard deviation for repeated measurements of the index Al–Al/N–Sn was 0.9 (Farkas index = 5.0).

Digitizing repeatability is presented in Tables 4b and 4c. As an example, the standard deviation of repeated measurements of the proportion index Al–Al/Ch–Ch was 0.59 (Farkas mean = 65.3, SD = 5.0).

Discussion

As seen in Figure 1, according to clinical judgement, the least attractive faces subsequently showed the most improvement through treatment. These results, based purely on the clinicians' assessments (albums 1 and 2), suggest that individuals who are relatively more attractive to begin with have less to gain from treatment. These findings are to be expected, but do not appear to have been identified or reported hitherto. As may also be expected, the clinicians agreed with each other more closely about patients at the extreme ends of the range of attractiveness than about those who were nearer to the average.

The degree of improvement as assessed by the clinicians (album 2) correlated strongly with the changes in the selected anthropometric proportion indices. This was confirmed by all three methods of comparison. It is not possible to compare these findings with the work of others, since this type of research has not previously been undertaken. However, the principle of comparing patients' proportion indices with those representing the mean has previously been used by anthropologists (Ward et al., 1998, 2000). In their work, the indices were essentially used for syndrome diagnosis and the measurements were obtained by manual anthropometry, rather than by photogrammetry as in the present study. Using a conventional photographic approach has the benefit of wide availability, but the quality of results is very dependent on the protocol for standardization. Three-dimensional photographic techniques, such as stereo-photogrammetry, have the benefit of ease of image capture and also allow the use of further proportion indices, involving surface measurements; this approach is currently under development. It is also relevant to point out that facial attractiveness is dependent on a range of other factors that have not been considered here, including dynamic proportions, skin texture and colour, and dental appearance. Nevertheless, the topic considered, namely static facial morphology, is clearly a dominating factor. Caucasians living in the Toronto area provided the anthropometric data used as a basis for comparison, so limiting the ethnic sample of the album images. However, data involving other ethnic groups, for example black Americans (Ofodile et al., 1993), are gradually becoming available.

Conclusions

  1. There was a clear relationship between the attractiveness of the patients in album 1 and the degree of subsequent improvement in album 2, i.e. according to clinical judgement, the less attractive patients showed the greatest improvement.

  2. There was a strong correlation between proportion index changes and clinical assessment of improvement through surgery (album 2), using all three methods of comparison.

  3. There was good photographic and digitizing repeatability.

  4. The results were sufficiently encouraging to suggest that there may be a use for proportion indices in the objective quantification of facial attractiveness.

Acknowledgments

We are very grateful to the clinicians who provided the assessments; the orthodontists, Karen Clarke, Claire Hepworth, Allan Jones, Farhad Naini and Steve Newell, and the maxillo-facial surgeons, Malcolm Bailey, Peter Blenkinsopp, Andrew Stewart, Caroline Mills and Paul Robinson.
