
ESL Saigon

Connected speech

Connected speech and learners of English

Learners of English are very often aware of, and self-critical about, the way they speak English. Most of them wish to speak "like a native speaker". More than that, they find it difficult to understand English speakers who talk at a natural pace. One of the main reasons is connected speech.

Showing students what happens in rapid speech, and designing tasks in which they identify different features of connected speech, can help them develop their understanding. This can be done from very low levels.

A very simple sentence such as "How old are you?" can provide examples of intrusive /w/, catenation, and a weak form. Vietnamese learners of English often hear "How are you?" instead of "How old are you?" and vice versa.



In rapid speech, a final consonant sound moves over to join the initial vowel sound of the next word. This is called catenation. Even fluent speakers of English often mishear phrases linked in this way.

Examples: An apple, get up, full on, beat it etc.


In rapid speech final /t/ and /d/ sounds often disappear. This is called elision.

Examples: Next day, last chance, must try, cold lunch etc.

Weak forms

Take a look at the phonetic transcription of the following words. They are pronounced differently in different situations.

Examples: Am /æm; unstressed əm, m/
What am I doing? /əm/
Yes, I am. /æm/
Are /ɑr; unstressed ər/
These cats are lovely. /ə/
Yes, they are. /ɑ:/
At /æt; unstressed ət, ɪt/
I will meet you at the school. /ət/
What are you looking at? /æt/
Can /kæn; unstressed kən/
I can speak English. /kən/
I know you can. /kæn/


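Assimilation happens across word boundaries in rapid speech: a sound changes to become more like the sound next to it, which makes the transition between sounds easier for the speaker. In the examples below, the final /t/ of "that" tends to sound like /p/ before the following /b/ or /m/.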

It’s in that box.
Can you ask that man?

Connected speech – Rhythm

People have long tried to develop machines that reproduce human speech. Early machines were supposed to join pre-recorded words together in order to form sentences. For short sentences this method worked well, but where long paragraphs had to be spoken the speech was practically unusable.

It is interesting to look at the difference between the way humans speak and the mechanical speech of such a machine. The comparison can teach us a good deal about learning and teaching pronunciation.

It is often said that English speech is rhythmical and that the rhythm is detectable in the regular occurrence of stressed syllables. This is called the stress-timed rhythm theory, and it says that stressed syllables tend to occur at regular intervals whether or not they are separated by unstressed syllables. In the mechanical speech of a machine that joins pre-recorded words, this will not be the case.

‘Walk ‘down the ‘path to the ‘end of the ca’nal.

In the example above the stressed syllables are marked with an apostrophe at the beginning of the syllable. The first two stressed syllables are not separated by any unstressed syllable. The second and the third stressed syllables are separated by one unstressed syllable, which is "the". The third and the fourth stressed syllables are separated by two unstressed syllables, which are "to" and "the". The fourth and the fifth stressed syllables are separated by three unstressed syllables, which are "of", "the", and "ca-".

The stress-timed rhythm theory says that the time from each stressed syllable to the next will be the same, irrespective of the number of unstressed syllables that separate them.

Some languages (e.g. Russian) have a stress-timed rhythm similar to that of English, and some languages (e.g. French) have a different rhythmical structure called syllable-timed rhythm. In syllable-timed rhythm, all syllables, whether stressed or unstressed, tend to occur at regular time intervals, and the time between stressed syllables will be shorter or longer according to the number of unstressed syllables that separate them.
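The arithmetic behind the two theories can be sketched in a few lines of Python. This is only an idealised illustration, not a description of measured speech: the function name, the two model labels and the durations in seconds are assumptions chosen for the example. The list [0, 1, 2, 3] holds the number of unstressed syllables between the stressed syllables of "Walk down the path to the end of the canal".

```python
def stress_to_stress_intervals(unstressed_counts, model, unit):
    """Return the time from each stressed syllable to the next one.

    unstressed_counts: number of unstressed syllables in each gap.
    model: "stress-timed" or "syllable-timed" (labels assumed for this sketch).
    unit: hypothetical duration in seconds; the length of a whole interval
          in the stress-timed model, the length of one syllable in the
          syllable-timed model.
    """
    if model == "stress-timed":
        # Every interval lasts the same time, whatever lies inside it.
        return [unit for _ in unstressed_counts]
    if model == "syllable-timed":
        # Every syllable lasts the same time, so the interval grows with
        # the stressed syllable plus each unstressed syllable after it.
        return [unit * (1 + count) for count in unstressed_counts]
    raise ValueError("model must be 'stress-timed' or 'syllable-timed'")


gaps = [0, 1, 2, 3]  # Walk | down the | path to the | end of the ca- | nal
print(stress_to_stress_intervals(gaps, "stress-timed", 0.5))
# [0.5, 0.5, 0.5, 0.5]
print(stress_to_stress_intervals(gaps, "syllable-timed", 0.25))
# [0.25, 0.5, 0.75, 1.0]
```

In the first model the unstressed syllables have to be squeezed into a fixed interval; in the second the interval stretches to make room for them.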

The foot as a unit of rhythm.

According to this theory "the foot" begins with a stressed syllable and includes all the unstressed syllables that follow it, up to (but not including) the next stressed syllable. Let's divide the following sentence into feet.

(1) ‘Walk | (2) ‘down the | (3) ‘path to the | (4) ‘end of the ca- | (5) ‘-nal
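The division above follows a simple rule that can be written as a short Python sketch. The function name and the input format are assumptions made for this example: the sentence is given as a list of syllables, and a stressed syllable is marked with a leading apostrophe. Unstressed syllables that come before the first stress are not covered by the definition above; here they are simply collected in a group of their own.

```python
def split_into_feet(syllables):
    """Group syllables into feet; a leading apostrophe marks a stressed syllable."""
    feet = []
    for syllable in syllables:
        if syllable.startswith("'") or not feet:
            # A stressed syllable (or the very first syllable) opens a new group.
            feet.append([syllable])
        else:
            # An unstressed syllable joins the foot that is currently open.
            feet[-1].append(syllable)
    return feet


sentence = ["'Walk", "'down", "the", "'path", "to", "the",
            "'end", "of", "the", "ca", "'nal"]

for number, foot in enumerate(split_into_feet(sentence), start=1):
    print(number, " ".join(foot))
# 1 'Walk
# 2 'down the
# 3 'path to the
# 4 'end of the ca
# 5 'nal
```

Note that a foot can cut through a word: "ca-" belongs to the fourth foot and "-nal" starts the fifth.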

Some theories of rhythm go even further than this simple division and say that some "feet" are stronger than others. These stronger "feet" produce strong-weak patterns in larger pieces of speech, above the level of the "foot".

The word "twenty" has one strong and one weak syllable, which can be represented graphically as shown in the image below.

[Image: strong-weak diagram of "twenty"]

The word "places" has the same strong-weak form as the word "twenty" and can be represented graphically as shown in the image below.

[Image: strong-weak diagram of "places"]

The phrase "twenty places" needs a more complex graphical representation, shown in the tree diagram below, because the word "places" carries stronger stress than the word "twenty".

[Image: tree diagram of "twenty places"]

Of course, we can go even further and look at an even more complex phrase, such as "twenty places further back". We will end up with a very complex tree structure.

[Image: tree diagram of "twenty places further back"]

The purpose of analyzing speech this way is to show the relationship between strong and weak elements and to show the different levels of stress we find. The more "s" symbols that occur above a syllable, the stronger it is. If we leave out the syllables that never receive stress at any level, we will have the following diagram.

[Image: tree diagram with the never-stressed syllables left out]

This pattern may be correct in the case of slow speech but in the case of normal speech things might be different.

Many speakers of English will agree that in "twenty places" the right-hand foot is stronger than the left one (the word "places" is stronger than the word "twenty"). When "twenty places further back" is spoken in a normal conversational style, however, "twenty" is stronger than "places". Why does that happen?

It is well known that English speech tends to alternate regularly between stronger and weaker syllables and adjusts the stress levels to make this possible. As a quick example, let’s consider the word "compact" and the phrase "compact disk". The second syllable of "compact" is stronger when the word is pronounced alone. In "compact disk", the first syllable of "compact" is the stronger one.

We can conclude that the stress is altered according to context. Why is that?

An additional factor is that we vary in how rhythmically we speak. Sometimes we speak very rhythmically (for example, when delivering a well-prepared speech) and sometimes we speak without rhythm (usually when we are hesitant or nervous). So we can say that stress-timed rhythm is not characteristic of English speech as a whole, but of one style of speaking.

Many foreign learners of English practice speaking English with a regular rhythm. I have seen teachers clapping their hands on the stressed syllables so that learners pronounce those syllables more strongly than the others. I must point out that there are many laboratory techniques for measuring time in speech, including the time between stressed syllables in connected English speech, but so far the results do not show a clear regularity. Also, using the same techniques on different languages, it has not yet been possible to show a real difference between stress-timed and syllable-timed languages. For some reason, humans tend to hear speech as more rhythmical than it actually is.

There are many arguments about rhythm, but they should not make us forget the importance of understanding the difference between strong and weak syllables. Learners whose native language does not have weak syllables as English does, such as Japanese speakers (and many others), might find exercises such as repeating strongly rhythmical utterances very useful, as long as they are not overdone.
