Correlating Speech and Voice Features of Transgender Women with Ratings of Femininity and Gender

Purpose: This study investigated which acoustic features of the voices of transgender (trans) women correlate with self- and listener ratings of voice femininity and with listener perceptions of gender. Differences between trans and cisgender (cis) voices on these acoustic variables were also explored. Methods: Speech samples were collected from 12 trans women and 10 cis control subjects. The acoustic variables of speaking fundamental frequency (SFF), SFF variation, intensity, vowel formants, and correlates of breathiness were collected for each speaker. Speakers completed a self-evaluation of voice femininity on a five-point scale drawn from the Transsexual Voice Questionnaire for Male-to-Female Transsexuals. Excerpts of these speech samples were presented to blind listeners, who also evaluated the femininity of each voice and classified each speaker within a binary gender system. Correlations between the acoustic variables and self- and listener ratings of voice femininity and listener perceptions of gender were measured using Spearman’s rank-order coefficient. Results: Moderate-to-strong correlations were found between ratings of voice femininity and mean and maximum SFF, SFF variation, and mean intensity. These same four acoustic variables were moderately correlated with listener perceptions of gender. There were no consistent or significant correlations between voice femininity ratings or gender perceptions and minimum SFF, vowel formants, or breathiness measures. The analysis of differences between trans and cis speakers was limited by sample size. Results suggest that SFF, SFF variation, and intensity (that is, pitch, intonation, and loudness) are appropriate targets for evidence-based voice training of trans women.

Professional Association for Transgender Health (WPATH) Standards of Care, the foremost guide for physicians and allied health professionals, until its seventh edition (Coleman et al., 2012). Now, as cultures move through what some have coined the "gender revolution," clinicians and researchers in the field of speech-language pathology are challenged with building a sufficient base of knowledge to support effective, evidence-based treatment programs for trans clients.
The selection of treatment targets is a critical first step in implementing an effective speech and voice therapy protocol. Intervention should target the speech and voice features most likely to lead to the desired outcome of treatment. In the case of trans voice treatment, the desired outcome is a gender-congruent vocal presentation.
This outcome is often assessed using listeners' evaluations of the speaker's voice and perceptions of the speaker's gender. Trans speakers' self-evaluation of voice, however, is also an important outcome measure. But the clinician must first determine which speech and voice features should be targeted for the client to achieve a gender-congruent voice.

Purpose
The purpose of this study is to contribute to the data on evidence-based selection of treatment targets in trans speech and voice therapy. It does so by measuring the correlation of acoustic features of trans women's speech with self- and listener ratings of voice femininity and with listener perceptions of gender. This study poses four primary research questions:
1. Are the acoustic measures of speaking fundamental frequency (SFF), SFF variation, vowel formant frequencies, vocal intensity, and breathiness correlated with trans women's self-ratings of voice femininity?
2. Are these same measures correlated with listener ratings of trans women's voice femininity?
3. Are these measures correlated with listener perceptions of trans women's gender identity?
4. Are there statistically significant differences on these measures between trans women and cisgender (cis) individuals?

Significance
This study offers a unique contribution to the literature in several ways. First, it includes measures of speech and voice that have received minimal attention in the literature (e.g., intensity and breathiness). Second, it incorporates trans women's self-evaluations of voice, which have often been overlooked in studies of trans speech and voice treatment. Third, it assesses vocal characteristics in connected speech, wherever possible, to more closely approximate natural communicative contexts than isolated words or vowels would allow. It is therefore more relevant to treatment in which the goal is to achieve generalization of voice changes outside of the clinical environment.
Finally, this study seeks to identify significant differences between the voices of trans women and cis individuals. Identifying these differences is important for intervention, as they will inform the speech-language pathologist about which features of cis speech, if any, are appropriate treatment targets for trans women.

Sex-based speech and voice differences
The speech of men and women differs in several prominent characteristics. The best-documented difference is in average fundamental frequency. An adult female has an average speaking fundamental frequency (SFF) of 220 Hz, and an adult male 120 Hz (see Stoicheff, 1981; Hollien & Shipp, 1972; Titze, 1994). Average vowel formant frequencies also differ by sex, with vowel formants of male voices being at lower frequencies than those of female voices (Coleman, 1971). Other studies have documented sex-based differences in vocal intensity (Brockmann et al., 2011), intonation patterns (Brend, 1975), breathiness (Klatt & Klatt, 1990), and vocal quality (ibid.).
Taken together, this research suggests that a female voice is likely to have a higher fundamental frequency, higher vowel formant frequencies, more varied intonation with more upward inflections, and a greater degree of breathiness than a male voice. These norms inform the expectations a listener has about which characteristics a voice coming from a female body will have. These expectations, in turn, impact a listener's perception of the speaker's gender, as described in the research reviewed below.

Voice and gender perception
Voice is a salient marker of sex and gender. Previous studies have demonstrated differences between the speech of trans and cis individuals and between the masculine and feminine versions of a trans woman's voice (Coleman, 1983; Gelfer & Schofield, 2000). These voice differences offer the listener cues to the speaker's gender identity.
The question of which elements of a speaker's voice are most salient to gender perception, however, is not fully resolved.
Pitch has long been a central focus of many trans voice studies. Yet, even in early research, the primacy of pitch has been questioned. Coleman (1983) noted that pitch increase alone is insufficient to eliminate a persistently male voice quality in trans women. Gelfer and Schofield (2000) reached a similar conclusion in an experiment evaluating the pitch, intonation, and vowel formants of 15 trans women. Their results showed that speakers perceived as female had a significantly higher mean and maximum SFF than speakers perceived as male. However, in this study, some trans women with an SFF in the feminine range were still perceived as male, indicating, again, that pitch is not the sole determinant of voice-based gender perception.
Additional studies investigated the contributions of other speech and voice features to gender perception. Hillenbrand and Clark (2009) and Gelfer and Bennett (2013) both found correlations between vowel formants and gender perception. Hillenbrand and Clark (2009) electronically altered sentences spoken by male and female participants to evaluate the respective importance of fundamental frequency and vowel formants to listener perceptions of gender. The study demonstrated that manipulation of fundamental frequency or vowel formants alone was insufficient to elicit a change in gender perception. Only when both variables were adjusted to fall within the range of the opposite sex did a change in listener perceptions of gender follow. Gelfer and Bennett (2013) also used digital manipulation to evaluate the relative importance of fundamental frequency and vowel formants in gender perception. Their results showed correlations between both variables and gender perception. They concluded that vowel formants likely contributed to gender perceptions in their sample, especially for voices within a gender-ambiguous SFF range (i.e., 145-165 Hz).
Studies evaluating vowel formants in natural speech have not generated such conclusive evidence of a correlation with gender perception. Gelfer and Schofield's (2000) analysis of trans women's speech did not reveal a statistically significant link between vowel formants and gender perception. Hardy, Boliek, Wells, Dearden, Zalmanowitz, and Reiger (2016) found that the second vowel formant (F2), together with minimum frequency and shimmer percentage, correlated with ratings of speech naturalness only. Because the majority of their trans participants were perceived as male, no relationship could be established between vowel formants and gender perception or femininity ratings.
Intonation has also received some attention in the literature. Wolfe, Ratusnik, Smith, and Northrop (1990) found that trans women who were perceived by listeners to be female had more varied intonation, as measured by a greater percentage of both upward and downward inflections and a smaller percentage of level intonations, than those perceived to be male. This study evaluated the speech of trans women only, without comparison groups of cis controls. A more recent study by Hancock, Colton, and Douglas (2014) resulted in some of the same conclusions. Their study included trans men and women as well as cis control subjects. Hancock and colleagues found that speakers who were perceived as female spoke with more upward intonations and a greater semitone range, another way of measuring variation in intonational patterns.
Other voice features considered in the literature include breathiness, intensity, and glottal fry. Research on these variables suggests that increased breathiness (Gorham-Rowan & Morris, 2006), reduced intensity (Holmberg et al., 2010), and reduced use of glottal fry (ibid.) may be salient in the perception of speakers as female or as more feminine. Literature on these measures, however, is relatively sparse.

Self-evaluation of speech and voice
The conclusions derived from the research reviewed above rely exclusively on the perceptions of listeners. The speakers' beliefs about their own speech are largely unconsidered. Yet, the impact of these beliefs can be significant. For example, Kasama and Brasolotto (2007) found that, for individuals with voice disorders, self-evaluation of speech and voice correlated with the individual's quality-of-life rating. In this study, a correlation did not exist between listener evaluations and the speaker's quality-of-life rating.
The communication challenges facing trans women are complex, and while much attention has been paid to listener perceptions, self-evaluations of speech and voice are also important. In fact, trans women's responses to self-perception questionnaires reveal voice satisfaction to be an issue of primary concern (Pasricha et al., 2008). In another study, trans women ranked voice and non-verbal communication (e.g., gestures, laughter) as the second and third most important factors in successful gender presentation, just behind physical appearance (Kayajian, 2005). Hancock, Krissinger, and Owen (2010) found quality-of-life ratings to be more strongly tied to trans women's self-rating of their voice femininity and likability than to listener ratings on these scales.
Few published studies have explored the relationship between acoustic speech and voice features and trans women's self-evaluation of their voices. Owen and Hancock (2010) and McNeill, Wilson, Clark and Deakin (2008) found self-ratings of voice femininity to be strongly correlated with SFF. Owen and Hancock (2010) also found a strong correlation with semitone range. Others have noted conflicts between client satisfaction with voice and acoustic measures of voice (Dacakis, 2000). That is, trans women whose voices remained within male ranges of SFF reported high levels of voice satisfaction. These studies suggest that self-evaluation of voice has the potential to greatly impact the perceived success of any voice training for trans women.
The present study reflects the importance of self-evaluation by including both speaker and listener ratings of speech and voice in the analysis. Though researchers have found self-evaluation of voice to correlate with something as deeply significant as quality of life (see Kasama & Brasolotto, 2007; Hancock et al., 2010), few studies have incorporated the speaker's own ratings into investigations of trans speech and voice.
This study thus seeks to fill this gap in the literature and inform clinical decisions that affect the success of a voice treatment program.

Research design
This study used a correlational research design incorporating quantitative acoustic and perceptual analyses. The research protocol was approved by the University of Rhode Island Institutional Review Board, HU1718-002 and HU0809-139. The study was carried out in three phases: (1) the collection of speech samples, self-ratings, and personal data from speaker participants, (2) the evaluation of these speech samples by blind listeners, and (3) the statistical analysis of the data collected. A detailed description of each phase of the study is provided in the sections that follow.
The independent variables of this study included the following acoustic variables: speaking fundamental frequency (SFF), SFF variation, vowel formant frequencies, vocal intensity, soft phonation index (SPI), pitch period perturbation quotient (PPQ), and relative average perturbation (RAP). Previous research suggests that each of these measures may be relevant to perceptions of gender.
Fundamental frequency (F0) has been described as "the single most important acoustic variable for voice classification" (Titze, 1994, p. 169). It measures the number of vibratory cycles that the vocal folds complete in one second and is perceived by the listener as pitch. Of interest to the present study was the speaking fundamental frequency (SFF), a measure of vocal fold vibrations in connected speech. Included in the analysis were the mean, minimum, and maximum SFF, which quantify the speaker's habitual speaking pitch as well as pitch range. SFF variation was defined here as the standard deviation of SFF. It was included as a means of measuring intonation, or the degree to which a speaker varies pitch during connected speech. All SFF measures were collected using Praat, an open-source acoustic analysis program.
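The four SFF measures described above can be illustrated with a short sketch. It is not the Praat procedure used in this study; it assumes that an F0 track has already been exported as one value per analysis frame, with 0 marking unvoiced frames. The function name, the dictionary keys, and the example track are hypothetical.

```python
from statistics import mean, stdev


def summarize_sff(f0_track_hz):
    """Summarize a frame-by-frame F0 track in Hz.

    Assumption: frames with a value of 0 are unvoiced and are excluded.
    """
    voiced = [f for f in f0_track_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("need at least two voiced frames")
    return {
        "mean_sff": mean(voiced),
        "min_sff": min(voiced),
        "max_sff": max(voiced),
        # SFF variation is defined in the text as the standard deviation of SFF.
        "sff_variation": stdev(voiced),
    }


# Hypothetical track: two unvoiced frames and four voiced frames.
track = [0.0, 150.0, 160.0, 170.0, 0.0, 180.0]
summary = summarize_sff(track)
```

The sketch uses the sample standard deviation; whether the study's analysis used the sample or the population form is not stated.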
Formants are "the resonating frequencies of the vocal tract" (Owen & Hancock, 2010, p. 274). They are bands of acoustic energy concentrated at a particular frequency that reflect the shape of the vocal tract when producing a given sound. A vowel has many formants, but F1 and F2 are of greatest interest and were the formants included in this study. F1 correlates with tongue height, and F2 with tongue retraction. They are thus the formants that most clearly distinguish one vowel from another. For this study, F1 and F2 were measured in Praat during the production of /a, i, u/ in connected speech.
Vocal intensity is the physical correlate of loudness and varies as a function of subglottal pressure and vocal fold vibratory amplitude. It is measured in decibels of sound pressure level (dB SPL), and, in this study, was collected from connected speech samples using Praat.
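As an illustration of the unit, the following sketch converts a set of sound pressure samples to dB SPL using the standard reference pressure of 20 micropascals. It assumes calibrated pressure values in pascals, which is an assumption of the example and not a description of the recording chain used in this study. The function name is hypothetical.

```python
import math

REFERENCE_PRESSURE_PA = 20e-6  # standard reference pressure for dB SPL in air


def db_spl(pressures_pa):
    """Return the level in dB SPL of calibrated pressure samples (in pascals)."""
    if not pressures_pa:
        raise ValueError("need at least one sample")
    rms = math.sqrt(sum(p * p for p in pressures_pa) / len(pressures_pa))
    return 20 * math.log10(rms / REFERENCE_PRESSURE_PA)


# Hypothetical samples with an RMS pressure of 0.02 Pa.
level = db_spl([0.02, -0.02])
```

An RMS pressure of 0.02 Pa is 1000 times the reference, which corresponds to 60 dB SPL.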
SPI, PPQ, and RAP are all acoustic measures that reflect the degree to which a voice is perceived as breathy. SPI is a measure of vocal fold approximation, or the degree to which the vocal folds achieve full closure. A high SPI is suggestive of increased breathiness. PPQ and RAP assess jitter, or the variability of frequency from period to period. Jitter is commonly used to assess breathiness. Mean SPI, PPQ, and RAP were collected from samples of sustained /a/ using the Kay Pentax Computerized Speech Lab (CSL).
The dependent variables in this study included trans women's self-ratings of their voice femininity, listener ratings of the speaker's voice femininity, and listener perceptions of the speaker's gender. Self-ratings were collected during phase one of this study using the Transsexual Voice Questionnaire for Male-to-Female Transsexuals (TVQ MtF). Listener ratings and perceptions of gender were collected during phase two. The procedures for the listener component of this study are described in section 3.6.
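The two jitter measures named above can be sketched from a list of glottal period durations. The sketch follows the commonly used definitions: RAP compares each period with the mean of itself and its two neighbors, and PPQ does the same over a five-period window. It is an approximation for illustration and is not the CSL implementation used in this study, whose exact smoothing and period-detection settings are not described here. Function names and example values are hypothetical.

```python
def perturbation_quotient(periods, window):
    """Mean absolute deviation of each period from its local window mean,
    expressed as a percentage of the overall mean period."""
    if window < 3 or window % 2 == 0:
        raise ValueError("window must be an odd number of at least 3")
    n = len(periods)
    if n < window:
        raise ValueError("not enough periods for this window")
    half = window // 2
    deviations = []
    for i in range(half, n - half):
        local_mean = sum(periods[i - half:i + half + 1]) / window
        deviations.append(abs(periods[i] - local_mean))
    mean_deviation = sum(deviations) / len(deviations)
    mean_period = sum(periods) / n
    return 100 * mean_deviation / mean_period


def rap(periods):
    """Relative average perturbation: three-period window."""
    return perturbation_quotient(periods, 3)


def ppq(periods):
    """Pitch period perturbation quotient: five-period window (assumed)."""
    return perturbation_quotient(periods, 5)


# Hypothetical, exaggerated period sequence (arbitrary time units).
example_periods = [1.0, 2.0, 1.0, 2.0, 1.0]
example_rap = rap(example_periods)
example_ppq = ppq(example_periods)
```

A perfectly regular voice would yield 0 on both measures; real sustained vowels typically yield small percentages.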

Study sample: speakers
The study sample included 12 trans women and a control group of five cis women and five cis men. Participants were recruited by email announcement, recruitment flyer, and word of mouth. Trans participants had a mean age of 36.3 years (SD=10.6, range=25-56). Cis participants were age-matched to trans participants within seven years (mean=35.3, SD=13.3, range=21-61).
Exclusionary criteria for participants included: (1) younger than 18 or older than 78; (2) gender identity other than trans woman, cis woman, or cis man; and (3) past history of laryngeal surgery or trauma. Participant eligibility was determined by self-disclosure of age, gender identity, and relevant medical history. The gender criterion was included to ensure that results were not confounded by outlier voice characteristics associated with gender identities other than trans or cis. Similarly, age-related voice changes in the elderly and past laryngeal surgery or trauma could impact acoustic measures of voice and voice satisfaction ratings, thereby confounding the results of the study. Thus, eligibility was constrained by age and past medical history.
Trans participants covered a broad spectrum of experiences with gender presentation and gender transition. The study sample included participants who presented full-time as a woman, those who presented as male during at least part of most days (e.g., professionally), those whose personal and family relationships dictated their gender presentation, and those who presented as a woman only in their own homes. Time in transition ranged from two months to seven years (in months, mean=17.5, SD=20.8). All but one participant was taking female hormones at the time of participation. Seven participants had attempted voice modification strategies on their own, and four with a speech-language pathologist. Table 3.3 summarizes the characteristics of study participants.

Study sample: listeners
Twenty listeners participated in the listener evaluation phase of the study. Participants were recruited by email announcement, visits to university classes, and word of mouth.
Listeners had a mean age of 25.7 (SD=10.1, range=19-54). All listeners identified as female. Exclusionary criteria for the listener group included: (1) under 18 years old, (2) non-native speaker of English, and (3) hearing loss greater than 30 dB at 500, 1000, and 2000 Hz. Age and English-speaking status were self-disclosed by study participants. All participants were given a hearing screening before any voice recordings were presented. The exclusionary criterion regarding language status was chosen to avoid confounding factors such as cultural differences in gender-based communication norms. It was assumed for the purposes of this study that a native speaker of English living near the study site was fluent in the dominant culture of the area (i.e., mainstream American culture). The restriction on hearing status was applied to ensure reliable transmission of the auditory stimulus (i.e., the speaker's voice) to the listener, whose responses would be based on this stimulus.

Speech tasks and related acoustic variables
Speaker participants completed five speech tasks:
1. Vowel prolongation
2. Sentence reading
3. Reading of the Rainbow Passage
4. Picture description
5. Oral monologue in response to a prompt (e.g., What is your proudest moment?)
These tasks were completed twice, once in each of two evaluation sessions, to account for the day-to-day variability in an individual's voice caused by differences in hydration, diet, sleep, mood, room temperature, and other potential factors (Bough et al., 1996). The two sessions were completed within a 17-day period (mean=5.75, SD=4.32).
Participants wore a Countryman omnidirectional, head-mounted microphone at a standard distance of 8 cm from the center of the lips. Speech samples were recorded with GoldWave v6.21 software using a Universal Audio 4-710D preamplifier and an RME Fireface UC audio interface. Table 3.4 summarizes the speech tasks and the acoustic variables elicited from each for use in the subsequent statistical analyses.
Speech samples were edited to isolate the phonemes or speech of interest from each task. From Task 1 (vowel prolongation), six productions of /a/ were extracted for analysis of breathiness measures (i.e., SPI, PPQ, and RAP) in the Computerized Speech Lab. Each /a/ was analyzed independently, and the mean of the six productions was calculated for use in the subsequent statistical analysis.
From Task 2 (sentence reading), six productions each of /a, i, u/ were extracted separately. From each of these productions, F1 and F2 were measured in Praat. These frequencies were averaged so that the statistical analysis included a mean F1 and a mean F2 for each corner vowel.
Task 3 (Rainbow Passage) provided the excerpt that was presented to listeners during the second phase of the study. Along with Tasks 4 (picture description) and 5 (oral monologue), it also provided connected speech samples from which SFF, SFF variation, and intensity were calculated. All samples of connected speech were grouped together in a single audio file to measure these variables.
Note that Task 3 was a reading task, whereas Tasks 4 and 5 produced samples of spontaneous, connected speech. Research suggests that reading tasks and spontaneous speech may elicit different results on some measures of speech and voice (Hollien et al., 1997; Van Lancker et al., 2012). For most people, reading is a less cognitively demanding task than spontaneous speech, which requires the speaker to plan what to say while speaking. Without this additional cognitive load, reading allows speakers to attend more to their speech. This heightened attention can impact speech and voice characteristics. However, the literature provides no evidence that these differences are statistically significant for the measures of interest in this study. Furthermore, a pilot study of the present project produced no statistically significant differences between acoustic measures taken from spontaneous speech only and those taken from connected speech that included both reading and spontaneous speech tasks.
Of particular importance here is that the reading task in this study provided the speech sample that was presented to listeners. It was therefore deemed important that this excerpt be included in the larger connected speech sample that generated the measures used in the statistical analysis. Furthermore, the use of speech from a standard reading task allowed continuity of content across speech samples, with the goal of eliminating any influence semantic content might have had on listener perceptions.

Self-ratings
Participants completed the TVQ MtF during the first evaluation session. The TVQ MtF is the only self-report instrument that specifically addresses the voice-related concerns of trans individuals and that has been subjected to psychometric evaluation. The authors of the TVQ MtF have demonstrated high reliability by assessing internal consistency (Cronbach's α = 0.97) and test-retest reliability (intraclass correlation coefficient = 0.97). They have also presented evidence of content and construct validity (see Davies, 2015; Dacakis et al., 2017).
The questionnaire consists of 30 items describing perceptions and experiences related to voice (e.g., My voice doesn't match my physical appearance.). Respondents indicate how frequently these perceptions or experiences occur on a four-point scale.
An additional two items at the end of the form ask respondents to provide a general rating of their voice on a five-point scale ranging from "very female" to "very male." The penultimate rating applies to the respondent's voice at present, and the last to the respondent's ideal voice.
Trans participants completed the questionnaire in its entirety, while cis participants were asked only to answer the last two items. The wording of the 30 items is often inappropriate for assessing the voice-related perceptions and experiences of cis men because the questionnaire is written specifically for trans women (e.g., My voice makes me feel less feminine than I would like.). Furthermore, based on participant interviews and observations, there were no indications that any of the cis participants had a voice disorder that would lead to voice-related concerns such as those suggested by the questionnaire. Of primary interest to this study was simply each participant's self-rating of voice femininity, and thus these two items were the only ones administered across participant groups.

Listener ratings
The listener evaluation component of this study was conducted in a quiet room in the URI Speech and Hearing Center. Short audio-recordings of each participant reading the second through fourth sentences of The Rainbow Passage were played via closed headphones (Sennheiser HD280 Professional) to listeners who were blind to the identity of the speakers. This excerpt was chosen so that the influence of the sample content was controlled and so that each recording was a minimum of 15 seconds (range=17-30). The order of the recordings was randomized for each listener to control for order effects. Listeners were told that the purpose of this study was to evaluate the influence of certain voice features on listeners' judgments of the speaker, without any specific mention of sex or gender. Each listener heard and evaluated all speaker participants, and each recording was played only once.
The listener was asked to classify or rate the speaker on the following measures: age, gender, vocal quality, overall health, and femininity/masculinity. Gender classification and the femininity/masculinity rating were the only responses used in the statistical analysis; all other measures were included to prevent the listener from overly scrutinizing gender-based features of speech. Gender options included only "male" and "female." The femininity/masculinity scale was altered slightly from the TVQ MtF scale, which ranged from "very female" to "very male." Because listeners had already classified the speaker within a binary gender system, this rating had the potential to seem redundant to the listener and possibly confusing. Thus, listeners rated the speaker on an analogous five-point scale, which ranged from "very feminine" to "very masculine."
Five recordings were randomly selected for repetition and incorporated in the randomized order of voice samples to determine intra-rater reliability. Reliability was based on both listener perception of gender and rating of voice femininity. Adapting the listener reliability method employed by Owen and Hancock (2010), a listener was deemed reliable if, for at least four of the five repeated recordings, the gender classification was identical, and the femininity rating had a difference of no more than one point on the five-point scale of voice femininity.
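The reliability criterion described above can be expressed as a small check. The sketch assumes one reading of the rule: a repeated recording counts as consistent only when both conditions hold for that same recording (identical gender classification and a femininity rating within one point). The function name, the tuple layout, and the numeric coding of the five-point scale are hypothetical.

```python
def is_reliable(first_pass, second_pass, min_consistent=4, max_rating_gap=1):
    """Decide whether one listener meets the intra-rater reliability criterion.

    Each pass is a list of (gender_label, femininity_rating) tuples, one per
    repeated recording, in the same recording order.
    """
    if len(first_pass) != len(second_pass):
        raise ValueError("both passes must cover the same recordings")
    consistent = 0
    for (gender_1, rating_1), (gender_2, rating_2) in zip(first_pass, second_pass):
        same_gender = gender_1 == gender_2
        close_rating = abs(rating_1 - rating_2) <= max_rating_gap
        if same_gender and close_rating:
            consistent += 1
    return consistent >= min_consistent


# Hypothetical responses for the five repeated recordings (ratings coded 1-5).
first = [("female", 4), ("male", 2), ("male", 1), ("female", 5), ("male", 3)]
second = [("female", 5), ("male", 2), ("male", 3), ("female", 5), ("male", 2)]
listener_ok = is_reliable(first, second)
```

In this example the third recording differs by two points, so four of five recordings are consistent and the listener still meets the criterion.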

Statistical analyses
Research questions 1-3 of this study examined the associations between the dependent and independent variables. These associations suggested whether and to what degree the acoustic measures of interest correlated with self-ratings of voice femininity, listener ratings of voice femininity, and listener perceptions of gender. Correlations between these variables were calculated using Spearman's rank-order correlation coefficient, a non-parametric measure of association applicable to data that are not known to be normally distributed.
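For readers unfamiliar with the statistic, Spearman's coefficient can be computed by ranking each variable (tied values share the average of their ranks) and then taking the Pearson correlation of the ranks. The following is a minimal sketch of that procedure and not the statistical software used in this study. The example values and the coding of ratings (higher meaning more feminine) are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = shared_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rank-order correlation coefficient."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: mean SFF (Hz) and femininity ratings for five speakers.
mean_sff = [120, 145, 160, 180, 210]
ratings = [1, 2, 2, 4, 5]
rho = spearman(mean_sff, ratings)
```

Because only ranks enter the calculation, the coefficient is unaffected by the spacing between scale points, which suits ordinal ratings such as the five-point femininity scale.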
Differences between trans and cis voices were measured using independent sample t-tests to address research question 4. Participants were first divided into four groups: (1) trans women predominantly identified as female, (2) cis women predominantly identified as female, (3) trans women predominantly identified as male, and (4) cis men predominantly identified as male. Predominance was defined as occurring at least 75% of the time. Independent sample t-tests were conducted to compare differences between groups 1 and 2 and between groups 3 and 4 on the variables of SFF, SFF variation, vowel formants (F1 and F2), intensity, and breathiness measures (i.e., SPI, PPQ, and RAP).
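The grouping rule and the group comparison can be sketched as follows. The 75% threshold comes from the text; everything else is an assumption made for illustration. In particular, the text does not state whether equal variances were assumed, so the sketch uses the classic pooled-variance (Student's) form and returns only the t statistic and degrees of freedom. Function names and example values are hypothetical.

```python
import math
from statistics import mean, variance


def predominant_gender(listener_labels, threshold=0.75):
    """Return 'female' or 'male' if that label reaches the threshold share
    of listener classifications; otherwise return None."""
    for label in ("female", "male"):
        share = listener_labels.count(label) / len(listener_labels)
        if share >= threshold:
            return label
    return None


def student_t(sample_a, sample_b):
    """Pooled-variance t statistic and degrees of freedom for two
    independent samples (equal variances assumed)."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical classifications from 20 listeners for one speaker.
labels = ["female"] * 15 + ["male"] * 5
group = predominant_gender(labels)

# Hypothetical measurements for two small groups.
t_value, dof = student_t([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

A speaker who falls below the threshold for both labels is returned as None and would belong to none of the four groups under this reading of the rule.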

RESULTS
This chapter presents the results generated in each phase of this study. First, the data collected across these phases are summarized. These data include: (1) acoustic measures collected from speech samples of trans women, cis women, and cis men, (2) self- and listener ratings of voice femininity, and (3) listener perceptions of gender. This data summary is followed by the results of the listener reliability procedures.
Finally, the results related to each research question of this study are presented.

Summary of data
This study generated acoustic variables of voice, self-ratings of voice femininity, and listener ratings of voice femininity and gender for each participant. In the tables that follow, these data are summarized by participant group: trans women, cis women, and cis men. Table 4.1 summarizes SFF and intensity for each group. The values of these acoustic measures for cis women and cis men in the research sample were consistent with gender norms in the adult population. Trans women in this study tended to exhibit SFF values between those of cis women and cis men, while mean vocal intensity among the trans participants was the lowest of all groups. There was, however, very little variation across groups on minimum SFF.
Compared to the speakers' self-ratings, listener ratings were more widely distributed across the femininity scale for each participant group. One cis woman received at least one listener rating of a somewhat masculine voice, and three received at least one listener rating of a gender-neutral voice. Two cis men received at least one listener rating of a somewhat feminine voice, and three received at least one listener rating of a gender-neutral voice.
Listeners also tended to rate the voices of trans women as more feminine than the trans women rated their own voices. Only 17 percent of listener ratings characterized the voice of a trans participant as very masculine, while half of all trans participants rated their own voice as very masculine. Meanwhile, seven percent of listener ratings characterized the voice of a trans participant as very feminine, while none of the trans participants themselves did so.
Finally, Table 4.5 summarizes listener perceptions of gender for each participant group. Both cis women and cis men were largely identified as female and male, respectively, with very few outliers. As a group, trans participants were predominantly identified as male; three individual trans women accounted for all occasions in which a trans speaker was identified as female.

Research question 1: Acoustic variables and self-ratings
The first research question of this study asked whether the acoustic measures of SFF, SFF variation, vowel formant frequencies, vocal intensity, and breathiness correlate with trans women's self-ratings of voice femininity.
For comparison, Spearman's rank-order correlation coefficients were also calculated for the group of cis participants. As with trans women, mean SFF was strongly positively correlated with self-ratings of voice femininity (r = 0.731), and maximum SFF was moderately positively correlated with voice femininity (r = 0.699).
Beyond these similarities, correlations within the cis sample diverged from those of the trans sample. For example, SFF variation, a measure of intonation, showed a strong positive correlation with self-rating of voice femininity (r = 0.731) within the cis sample. In addition, all vowel formants except F1 of /a/ were strongly positively correlated with self-ratings of voice femininity. In summary, a higher mean SFF, higher maximum SFF, greater SFF variation, and higher vowel formant frequencies all correlated with a more feminine self-rating of voice for cis participants. In this group, there were no statistically significant correlations between self-ratings of voice femininity and vocal intensity or breathiness correlates.

Research question 2: Acoustic variables and listener ratings
The second research question of this study asked whether any of the acoustic variables considered above correlate with listener ratings of trans women's voice femininity.
Again, Spearman's rank-order correlation coefficient was calculated to measure associations between each acoustic variable and listener ratings of voice femininity, with the significance level set at 0.05.
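As an illustration of the statistic used throughout this chapter, the following is a minimal sketch of Spearman's rank-order coefficient, computed as the Pearson correlation of tie-averaged ranks. The data values, variable names, and the coding of the femininity scale (higher number = more feminine) are hypothetical and are not taken from the study. The sketch does not compute a p-value.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of tied values starting at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical example: mean SFF (Hz) and femininity ratings on a
# five-point scale, assumed here to be coded 1 = very masculine ... 5 = very feminine.
mean_sff_hz = [110.0, 125.0, 140.0, 155.0, 170.0, 185.0]
femininity = [1, 2, 2, 3, 4, 5]

rho = spearman_rho(mean_sff_hz, femininity)
print(f"rho = {rho:.3f}")
```

Because the coefficient depends only on ranks, it suits ordinal ratings such as a five-point femininity scale. The sign of the coefficient depends on how the scale is coded.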
Moderate positive correlations were found between listener ratings of trans women's voice femininity and mean SFF (r = 0.513), maximum SFF (r = 0.442), SFF variation (r = 0.431), and mean intensity (r = 0.455). F2 of /i/ and F1 of /u/ were also moderately positively correlated with listener ratings (r = 0.410; r = 0.431), as was SPI (r = 0.319). Thus, as the value of each of these variables increased, the speaker was rated as having a more feminine voice. Weak correlations were found between listener ratings and minimum SFF (r = -0.277), F1 of /a/ (r = 0.148), F1 of /i/ (r = 0.288), F2 of /u/ (r = 0.128), PPQ (r = -0.234), and RAP (r = -0.164). As these vowel formant frequencies increased, trans speakers' voices were rated as more feminine. As minimum SFF, PPQ, and RAP increased, trans speakers were rated as less feminine.
Only F2 of /a/ showed no statistically significant correlation with listener ratings of trans women's voice femininity.
An analysis of correlations within the cis sample again revealed some differences between participant groups. For cis participants, strong positive correlations were identified between listener ratings of voice femininity and mean SFF (r = 0.752), maximum SFF (r = 0.737), SFF variation (r = 0.706), F2 of /i/ (r = 0.781), and F1 of /u/ (r = 0.745). All other vowel formants (F1 of /a, i/, and F2 of /a, u/) were moderately positively correlated with listener ratings (see Table 4.7 for r values). So, as the value of each of these variables increased, cis speakers' voices were rated as more feminine. For the cis group, minimum SFF (r = 0.199), mean intensity (r = -0.268), SPI (r = -0.222), and PPQ (r = 0.179) were all weakly correlated with listener ratings of voice femininity. As minimum SFF and PPQ increased, listeners rated the speaker's voice as more feminine. With mean intensity and SPI, the relationship was reversed; as the value of these variables increased, speakers' voices were rated as more masculine. PPQ showed no statistically significant correlation with listener ratings of voice femininity.

Research question 3: Acoustic variables and gender perception
The third research question of this study asked whether any of the acoustic variables   were detected between all other vowel formants and listener perceptions of gender (see Table 4.8 for r values). As the value of these variables increased, speakers were more likely to be identified as female. Minimum SFF (r = -0.259), mean intensity (r = -0.208), SPI (r = -0.198), and RAP (r = 0.247) were all weakly correlated with listener perceptions of gender. As minimum SFF and RAP increased, the speaker was more likely to be identified as female. As intensity and SPI increased, however, the speaker was more likely to be identified as male.

Research question 4: Differences between trans- and cisgender voices
The fourth and final research question of this study asked whether there were any statistically significant differences between trans women and cis controls on any of the acoustic variables considered above. Participants were first divided into groups based on the predominant gender perceptions recorded by listeners. Thus, trans women identified as female would be compared with cis women, and trans women identified as male would be compared with cis men. The characteristics of the t-test groups are summarized in Table 4.9 below.
Only one trans woman was predominantly identified as female. Two other trans women were frequently identified as female but did not meet the criterion for predominance (i.e., at least 75% of the time). The sample size of group 1 was thus too small for any statistical testing. Independent t-tests were conducted, however, to compare the voices of trans women identified as male and cis men. At a significance level of 0.05, these comparisons revealed no statistically significant differences between the voices of trans women and cis men identified as male on the acoustic variables evaluated. At a significance level of 0.10, however, there was a statistically significant difference in mean SFF between the two groups.
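To illustrate how a group difference can reach significance at 0.10 but not at 0.05, the following is a minimal sketch of an independent-samples t-test on mean SFF. It assumes the pooled-variance (Student's) form of the test; the study does not state which variant was used. The group sizes and SFF values are hypothetical. The critical values are the standard two-tailed table values for 10 degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(group_a, group_b):
    """Return Student's independent-samples t statistic and its degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    df = na + nb - 2
    # Pooled variance weights each group's sample variance by its degrees of freedom.
    pooled_var = ((na - 1) * variance(group_a) + (nb - 1) * variance(group_b)) / df
    standard_error = sqrt(pooled_var * (1 / na + 1 / nb))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# Hypothetical mean SFF values in Hz (not data from this study).
trans_identified_male = [118.0, 125.0, 131.0, 122.0, 136.0, 128.0]
cis_men = [109.0, 122.0, 113.0, 127.0, 118.0, 125.0]

t_stat, df = pooled_t(trans_identified_male, cis_men)

# Two-tailed critical values of Student's t for df = 10 (standard table values).
CRITICAL_T_DF10 = {0.05: 2.228, 0.10: 1.812}
significant = {alpha: abs(t_stat) > crit for alpha, crit in CRITICAL_T_DF10.items()}

print(f"t({df}) = {t_stat:.3f}")
for alpha, flag in significant.items():
    print(f"alpha = {alpha:.2f}: {'significant' if flag else 'not significant'}")
```

With these illustrative numbers, the statistic falls between the two critical values, so the difference is significant at 0.10 but not at 0.05. This is the same pattern the study reports for mean SFF.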

Summary of results
These results show that, for trans women, mean SFF, maximum SFF, and mean intensity had moderate-to-strong correlations with self- and listener ratings of voice femininity as well as with listener perceptions of gender. These were the only acoustic variables to show a statistically significant correlation across all dependent variables: self-rating of voice femininity, listener rating of voice femininity, and listener perceptions of gender. Other acoustic variables, such as SFF variation, vowel formants, and breathiness correlates, showed associations with some but not all dependent variables.
These results also show different correlations between acoustic measures and dependent variables for cis controls. That is, in some cases, an acoustic measure that was not correlated with a dependent variable for trans speakers was correlated with it for cis controls.
And while independent t-tests could not be done to compare trans women identified as female with cis women, t-tests did show a statistically significant difference (at α = 0.10) between trans women and cis men identified as male on the measure of mean SFF.

DISCUSSION
The primary purpose of this study was to investigate the relationships between acoustic variables of speech and voice, self- and listener ratings of trans women's voice femininity, and listener perceptions of gender. It also sought evidence of statistically significant differences between the voices of trans women and cis individuals in an effort to identify potential targets for treatment. The results presented in Chapter 4 show that the voices of the cis women and cis men in this study were consistent with gender norms, while greater variation was observed among trans participants. These results provide support for the findings of past studies that identified acoustic correlates of pitch and loudness as salient markers of gender. They also include novel findings such as the correlation of maximum SFF with self-ratings of voice femininity. These results are discussed within the context of past research, future directions, and clinical decision-making in the sections that follow.

Acoustic variables and voice femininity
This study found that the voices of trans women that were rated as feminine also tended to be those perceived as quieter and higher-pitched. This finding is based on the moderate-to-strong correlations (0.442 ≤ r ≤ 0.712) that were identified between self- and listener ratings of voice femininity of trans women and mean SFF, maximum SFF, and vocal intensity. Several past studies have demonstrated relationships between voice femininity ratings and SFF (McNeill et al., 2008; Wolfe et al., 1990). The present findings thus provide further evidence that speakers and listeners define feminine voices in part by a relatively high SFF. Speech-language pathologists are cautioned, however, not to draw the conclusion that increasing SFF is the ultimate goal of treatment of transfeminine voices. While SFF appears to correlate with ratings of voice femininity, an SFF within a feminine range may be insufficient to achieve consistent identification as female (Coleman, 1983; Gelfer & Schofield, 2000).
Furthermore, achieving a feminine SFF does not necessarily lead to voice satisfaction, which is also an important goal of treatment (Dacakis, 2000; McNeill et al., 2008).
The correlation between maximum SFF and voice femininity ratings was also demonstrated by Gelfer and Schofield (2000). Their study, however, established a relationship with listener ratings only, while the current study found a correlation with self-ratings as well. This correlation between maximum SFF and speakers' self-assessment of voice femininity is therefore a new finding. It suggests that an SFF range that reaches higher frequencies may contribute to the perception of a feminine voice.
Interestingly, the correlations between voice femininity ratings and vocal intensity in this study differed depending on who rated the voice. Listeners rated voices with lower intensity as more feminine, whereas trans women with higher vocal intensity tended to rate their own voices as more feminine. The listener results support the findings of Holmberg and colleagues (2010), who concluded that reduced intensity may help trans women achieve a successful female presentation. This suggestion, however, was based on their observation of mean intensity in the speech of specific individuals in their study, rather than on a quantitative analysis. The findings of this study thus provide statistical support for their original conclusion that lower vocal intensity is associated with feminine speech.
Self-ratings of voice femininity in this study, on the other hand, contradict the notion that lower vocal intensity is a feature of a feminine voice. This unexpected finding may be a factor of speaker confidence. Confident speakers tend to speak with greater intensity (Kimble & Seidel, 1991). So, it may be that, in this sample, trans women who rated their voices as more feminine had more confidence in their vocal presentation and thus spoke with greater intensity.
An unexpected moderate correlation between F1 of /u/ and voice femininity ratings was also identified (self: r = 0.626; listener: r = 0.431). There is no evidence to suggest that this particular formant may have a special relationship with perceptions of voice femininity. Vowel formants in general often stand as an indirect measure of resonance, with women noted to use a "forward focus resonance" (Hirsch & Gelfer, 2012, p. 221). The typically higher vowel formants of female voices are partially explained by a more anterior tongue position (Carew et al., 2007). Because F2 correlates with tongue retraction, it is more closely associated with anterior resonance than F1. A correlation with only F1 of a given vowel is thus inconsistent with evidence-based assumptions about vowel formants, resonance, and gender norms. This correlation is likely a result of the inherent variability, and thus larger margin of error, in measuring F1 and F2 in connected speech.
No statistically significant correlation was found between voice femininity ratings and minimum SFF. While no past studies have considered a relationship between minimum SFF and self-ratings of voice femininity, Gelfer and Schofield (2000) evaluated whether this variable was related to listener ratings. They found no statistically significant correlation, and thus the results of the present study are consistent with their findings. Furthermore, minimum SFF within this study's sample and within participant groups was quite homogeneous, so the probability of finding a correlation with this variable was low.
Additional moderate correlations were found between listener ratings and SFF variation (r = -0.431), F2 of /i/ (r = -0.410), and SPI (r = 0.319). These variables did not correlate with self-ratings of voice femininity. This suggests that trans women may be sensitive to a different set of speech and voice features when assessing their own voices than are unfamiliar listeners. They may also apply a more rigid definition of voice femininity. This is an important consideration for speech-language pathologists seeking to help a client achieve a voice that meets the client's own voice goals while also being perceived as gender-congruent by others outside of the treatment environment.
The correlation found between listener ratings of voice femininity and SFF variation supports the results of past studies (e.g., Wolfe et al., 1990). These studies also demonstrated that feminine voices are characterized by more varied intonation. The present study used standard deviation of SFF as a measure of intonation, which is a common convention (Oates & Dacakis, 1997). Other studies have assessed intonation using semitone range and frequency of specific inflectional patterns (ibid.; Wolfe et al., 1990). The results presented here confirm that degree of intonational variation is relevant in perceptions of femininity, but they neither suggest nor refute that direction of inflections (i.e., upward or downward) is influential in such perceptions. Thus, targeting more varied intonation without a particular focus on upward inflections may be appropriate in a voice training program.
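The two intonation measures mentioned above can be sketched as follows, assuming a fundamental-frequency track that has already been extracted and contains voiced frames only. The function names and the sample values are hypothetical. The semitone conversion uses the standard relation of 12 semitones per doubling of frequency.

```python
from math import log2
from statistics import stdev


def sff_variation_hz(f0_hz):
    """Standard deviation of fundamental frequency in Hz (the convention used in this study)."""
    return stdev(f0_hz)


def semitone_range(f0_hz):
    """Distance between the lowest and highest F0 values, expressed in semitones."""
    return 12 * log2(max(f0_hz) / min(f0_hz))


# Hypothetical F0 track in Hz, voiced frames only (not data from this study).
f0_track_hz = [100.0, 120.0, 150.0, 200.0, 160.0, 130.0]

sd_hz = sff_variation_hz(f0_track_hz)
range_st = semitone_range(f0_track_hz)
print(f"SD of SFF: {sd_hz:.1f} Hz; range: {range_st:.1f} semitones")
```

The semitone measure is logarithmic, so the same range in Hz corresponds to fewer semitones at a higher mean SFF. For that reason the two measures need not rank speakers identically.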
In addition to F1 of /u/, listener ratings of voice femininity were also found to moderately correlate with F2 of /i/. A correlation with F2 of /i/ is less surprising than one with F1 of /u/, given that F2 is related to the degree of tongue retraction.
Furthermore, /i/ is a front vowel, meaning it is articulated in the front of the mouth. It may be that this vowel formant, being associated with anterior resonance, is uniquely related to perceptions of voice femininity. Further research is warranted, however, as this finding is limited to the present study, and the only other investigation into a relationship between vowel formants and voice femininity looked solely at the vowel /a/ (Hardy et al., 2016). And again, the inherent variability of measuring vowel formants in connected speech may have been a factor here.
A single variable associated with breathiness, SPI, was moderately correlated with listener ratings of voice femininity. Neither PPQ nor RAP showed such a relationship, and none of these variables were correlated with self-ratings. Breathiness has frequently been held as a characteristic of feminine voices (e.g., Gorham-Rowan & Morris, 2006). Assessing breathiness via objective acoustic measurements is not without its challenges, however. Past studies have taken different approaches to measuring breathiness (see ibid.; Hardy et al., 2016). And while higher SPI, PPQ, and RAP values tend to correlate with increased breathiness, it is not the case that a high SPI, PPQ, or RAP will always correspond to a perceptually breathy voice. It may be that other acoustic variables associated with breathiness, such as the voice turbulence index (VTI) or noise-to-harmonics ratio (NHR), would have correlated with voice femininity ratings in this study. It may also be beneficial, as suggested by Owen and Hancock (2010), to use perceptual ratings of breathiness (e.g., Likert or visual analog scales) in future correlational studies to more directly assess the relationship of breathiness and perception of voice femininity.
An interesting finding of this study was the tendency of trans women to rate their voices as more masculine than did listeners. Half of all trans women rated their voice as very masculine, whereas only 17 percent of listener ratings did so. Seven percent of listener ratings characterized the speaker's voice as very feminine, whereas not a single trans woman rated her voice as such. This again suggests that trans women may be sensitive to a different set of voice features than are listeners or may have a more rigid definition of voice femininity. These results may also suggest the presence of vocal dysphoria among the study sample, which may lead to hypercriticism of voice. It should also be noted that most of the trans women in this study were interested in receiving voice services, which would indicate some degree of dissatisfaction with voice that could have contributed to these self-ratings of voice femininity. Finally, these results indicate that achieving voice satisfaction for some trans women may be a more challenging goal than achieving a vocal presentation that others accept as congruent with the speaker's true gender.

Acoustic variables and gender perception
This study analyzed both femininity ratings and gender perceptions as a means of capturing the nuance of gender-based communication norms and the identity assumptions made within a binary system of gender. This approach reflects the complexity of gender identity and recognizes that female gender and femininity are not equivalent concepts. It thus allowed listeners to classify a speaker as male while also rating the speaker's voice as feminine, and vice versa. In fact, outcomes in which gender perception and femininity ratings were in conflict did occur in the data and are discussed in section 5.3.
The results identifying relationships between acoustic variables and gender perceptions were nevertheless very similar to those between acoustic variables and voice femininity. They showed that voices perceived to be quieter and higher-pitched with more varied intonation tended to be identified as voices of female speakers. That is, mean and maximum SFF, SFF variation, and intensity were found to moderately correlate with listener perceptions of gender (0.330 ≤ r ≤ 0.429). These results support the findings of past research into the importance of mean SFF (Gelfer & Schofield, 2000; Hardy et al., 2016; Holmberg et al., 2010), maximum SFF (Gelfer & Schofield, 2000), and SFF variation (Hancock et al., 2014; Wolfe et al., 1990) in perceptions of gender. They also provide further statistical support for Holmberg and colleagues' (2010) conclusion that intensity may contribute to a feminine vocal presentation.
Only the correlation of F1 of /u/ with gender perception (r = -0.448) represents a new finding, but there is no evidence in the literature to suggest that this particular vowel formant is uniquely associated with gender, and so this result may be considered anomalous.
The absence of a correlation with SPI (as well as the other breathiness measures) is a departure from the results measuring relationships with voice femininity ratings in this study. While SPI was moderately correlated with listener ratings of voice femininity, it was not correlated with listener perceptions of gender nor with self-ratings of voice. The discussion above regarding the challenges of using objective acoustic measures to quantify breathiness applies here and may explain this discrepancy.

Gender-femininity discrepancies
There were several instances in which a speaker was identified as male but rated as having a feminine voice. Two of these speakers were cis men, and four were trans women. The data for these speakers are summarized in Table 5.1, and a qualitative analysis of these results is provided below. The voice features of the cis men (Speakers 1 and 2) were consistent with the rest of the study sample and with gender norms. The mean SFF of these two participants fell below the expected mean of men in the general population (i.e., 120 Hz), suggesting that these participants had a perceptually deep voice. The perceived femininity of their voices may be related to some other speech characteristic not assessed in this study, perhaps rate or articulation.
The voices of the trans women in this subgroup were more varied. This subgroup included Speaker 5, whose acoustic measures were comparable to those of the two cis men. It also included Speaker 6, for whom the mean, maximum, and standard deviation of SFF were higher than these same values for the cis male group in this study, but still within norms for male speakers in general. The standard deviation of SFF for Speaker 6 came close to the mean value of this variable for cis women in this study but still fell within the range for cis male participants. For these speakers, listeners may have been responding to the same unknown features of speech that led them to characterize the cis men (Speakers 1 and 2) as feminine.
Speakers 3 and 4, on the other hand, had higher means, maximums, and standard deviations of SFF than those of the cis male group, those of the two cis men with gender-femininity discrepancies, and those of trans women identified as male.
These speakers were also more frequently rated as having a feminine voice than the other individuals listed in Table 5.1. This study found that these same acoustic variables (i.e., mean, maximum, and standard deviation of SFF) correlated with listener ratings of voice femininity. Thus, for Speakers 3 and 4, these voice features may have sent signals of femininity to the listener, but some other, again undetermined, speech characteristics led the listener to perceive the speaker as male.
Findings such as these demonstrate the complexity of gender perception and gender-based communication norms. As has been previously demonstrated in the literature, simply achieving an SFF within a feminine range or adopting other feminine speech characteristics is clearly not sufficient to be perceived as female. Identifying and quantifying relationships between acoustic voice features and perceptions of gender and femininity are challenging. These findings thus reveal the need for continued research into the identification of those speech characteristics (or combinations thereof) that are most salient to perceptions of gender and femininity.
As the speech of trans women remains understudied, there is thus insufficient evidence to determine which features of speech and voice contribute to the listener's perception of voice femininity.

Differences between trans- and cisgender voices
The identification of statistically significant differences in the speech and voice of trans and cis participants was limited by sample size. No comparisons could be made between cis women and trans women identified as female in this study because the latter group included a single individual. No statistically significant differences between cis men and trans women identified as male were found at the 0.05 significance level.
Some qualitative observations can be made, however, based on participants' self-ratings of voice femininity. All cis participants rated their ideal voice as having the same femininity rating as their current voice. Cis women who felt their voices were only somewhat female did not mark a very female voice as their ideal. The same pattern was observed among cis men. All but one trans woman, on the other hand, reported that their ideal voice was more feminine than their current voice, with most targeting a very female voice. This difference suggests that trans women may set their voice goals toward a hyper-feminine presentation, while cis women may be more content with a voice anywhere within the range of gender-congruence.

Limitations
The results of this study are based on a relatively small sample size. Given the probability of a type I error (i.e., a false positive), this study is therefore most reliable in identifying strong correlations between variables. Furthermore, correlational studies do not identify causation, so no conclusions can be made regarding what caused study participants to rate or classify voices in the way they did.
This study targeted the voice features and related perceptions of trans women.
Cis women and cis men were included in the study as control subjects, thus providing a broad range of voices in terms of femininity/masculinity. Nevertheless, the results may not apply to persons of other gender identities, including trans men.
This study did not assess effects of listener characteristics on gender perceptions or femininity ratings, though there is some evidence that certain characteristics, such as sexual orientation, may be influential (see Hancock & Pool, 2017). Only data on the age and gender of listeners were collected. Listeners covered a broad age range, comparable to that of the speaker sample, but all listeners were female. Thus, listener gender may have influenced the results.
The listener responses in this study were based on audio-recordings of speakers reading a passage. This differs from speech in most natural contexts in two ways. First, reading aloud likely accounts for a small percentage of the communication contexts encountered in daily life. It will rarely be the type of speech on which gender assumptions are made. Also, the speech produced while reading aloud has been shown to differ from spontaneous speech (Hollien et al., 1997; Van Lancker et al., 2012).
While steps were taken to minimize the effect of speech task on the results (see chapter 3, section 3.4), some caution is warranted in interpreting results based only on speech produced while reading.
Second, the context of listener responses differed from typical communication exchanges in that there was a lack of non-speech cues to gender identity. Such cues as non-verbal communication, physical appearance, and name may signal the speaker's gender and are present in many daily interactions. Van Borsel, De Cuypere, and Van den Berghe (2001), in fact, found physical appearance to have a significant effect on gender perceptions when they compared listener/viewer responses to video, audio, and audiovisual recordings of trans women. Anecdotally, one trans woman in this study reported that she is nearly always addressed with female honorifics in phone conversations at work. In this study, she was identified as female 50% of the time, less frequently than in her personal experience. The discrepancy may result from the fact that, in these exchanges, the listener is privy to additional details (e.g., name, position in the company, word choice) that lead them to classify a speaker with a gender-ambiguous voice as female. In short, the context in which listeners responded to speakers in this study differed in some important ways from the contexts in which daily communication may occur.
Finally, this study was a quantitative analysis of trans women's voices. While the data allowed for some subjective observations, this study did not pursue a deeper qualitative assessment of how the different experiences of the trans women in this study, changing cultural norms, or other sociolinguistic factors may have exerted influence on the speakers and listeners who participated. Further research in this area is warranted.

CONCLUSIONS
This study investigated the relationships between acoustic variables of speech and voice, self- and listener ratings of voice femininity, and listener perceptions of gender.
The results identified several acoustic variables of speech and voice as salient markers of gender and femininity. Specifically, they showed that achieving a higher mean and maximum SFF and using more varied intonation contributed to the perception of a feminine vocal presentation by unfamiliar listeners. They were also consistent with the assumption that lower intensity is a feature of voices perceived as female, though self-ratings of voice did not adhere to this convention. Speaker confidence may have been a confounding factor for this variable. In short, mean and maximum pitch, SFF variation, and intensity were identified as appropriate treatment targets in an evidence-based voice training program for trans women.
This study did not generate conclusive evidence regarding the respective roles of vowel formants or acoustic correlates of breathiness in perceptions of gender and femininity. Unfortunately, the question of whether these features should be included in a treatment plan designed to achieve a feminine voice thus remains unanswered. While many studies have concluded that reaching feminine ranges of important acoustic measures related to pitch is not sufficient for ensuring a gender-congruent voice, it remains unclear which other acoustic measures, if any, are necessary to effect this treatment outcome. Additional research is needed to guide clinical decision-making and treatment planning.
The pursuit of knowledge leading to effective treatments for trans clients will, like most scientific pursuits, be a continuing process. It is unlikely that the gender revolution currently underway (that is, the societal challenging of gender constructs and the expectations that arise from them) will soon change how listeners perceive the voices of others and make assumptions based on those perceptions. It is certain, however, that these changing cultural norms will hasten the call for speech-language pathologists to advance their knowledge of the clinical voice needs of clients across the gender spectrum. This study contributed data towards this goal and encourages future research to investigate the most salient features of voice and speech to target in treatment.