We know that "the" is the most frequently occurring word in the English language. I want to find out how many times 'the' normally occurs in a sentence. In a passage of 100 words, 'the' occurs about 7 times, so the probability that any given word is 'the' is 0.07. It is also found that a typical English sentence contains 19 words.
The probability of exactly two 'the' occurring in a sentence of 19 words is,
= C(19,2) * 0.07^2 * 0.93^17 = 0.244 = 24.4%
Let us understand the formula step by step.
1. The probability of a word not being 'the' is 1 - 0.07 = 0.93. The probability that none of the other 17 words is 'the' is 0.93^17.
2. The chance of 'the' appearing at two given positions is 0.07^2.
3. C(19,2) is called the combination or binomial coefficient. It gives the number of ways two 'the' and the remaining 17 words can be arranged, which here is 171. Refer to the footnote. All three factors must be multiplied to get the correct probability. It gives 0.244 or 24.4%.
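The three factors can be checked with a short Python sketch. The function name binomial_probability is only an illustrative choice, and the calculation assumes that each word is 'the' independently of the others with probability 0.07.

```python
from math import comb

def binomial_probability(n, k, p):
    """Probability of exactly k successes in n independent trials."""
    ways = comb(n, k)             # number of arrangements, C(n, k)
    hits = p ** k                 # k of the words are 'the'
    misses = (1 - p) ** (n - k)   # the other n - k words are not 'the'
    return ways * hits * misses

# Exactly two 'the' in a 19-word sentence, with p = 0.07
print(round(binomial_probability(19, 2, 0.07), 3))  # 0.244
```

Changing k to 1 or 0 gives the chance of 'the' appearing exactly once or not at all.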
Similarly, what is the chance of 'the' appearing exactly once in a 19-word sentence?
= C(19,1) * 0.07 * 0.93^18 = 0.36 = 36%
Again, the chance of 'the' not being present even once is
= C(19,0) * 0.07^0 * 0.93^19 = 0.252 = 25.2%
Add all three probabilities.
Zero times = 25.2%
One time = 36.0%
Two times = 24.4%
Total = 85.6%
Hence the chance of 'the' appearing zero, one or two times in a sentence is more than 85%. 'The' occurring more than two times in a normal sentence is uncommon, with a chance of about 14.4%.
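The addition can be repeated in Python as a rough check. This is a minimal sketch under the same assumptions as the text (19 words per sentence, probability 0.07 per word, words treated as independent); the variable names are only illustrative.

```python
from math import comb

n, p = 19, 0.07  # sentence length and chance that a word is 'the'

# P(exactly k 'the') for k = 0, 1, 2
probs = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(3)]

for k, prob in enumerate(probs):
    print(k, round(prob * 100, 1))        # percentage for each count

at_most_two = sum(probs)
print(round(at_most_two * 100, 1))        # chance of zero, one or two 'the'
print(round((1 - at_most_two) * 100, 1))  # chance of three or more 'the'
```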
In a similar way, we can analyze other features of English text. One more result is that the average word length is 5 letters. You can do text analysis using the 'advanced text analysis' feature on the website English.com.
--------------------------------------------------------------------------------------------------
Footnote:
In how many ways can you arrange one A and two B's?
ABB, BAB, BBA - 3 ways
Combination = factorial(3) / (factorial(1) * factorial(3-1)) = (3*2*1) / (1 * 2*1) = 3
This is how we find the number of ways of arranging two 'the' and 17 other words. Online calculators are available to find combinations.
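Instead of an online calculator, the count can also be checked with Python's standard library. The sketch below lists the arrangements of one A and two B's directly and compares the count with the factorial formula; the variable name arrangements is only illustrative.

```python
from itertools import permutations
from math import comb, factorial

# List the distinct orderings of one A and two B's
arrangements = sorted(set(permutations("ABB")))
print(["".join(a) for a in arrangements])  # ['ABB', 'BAB', 'BBA']

# The same count from the factorial formula
print(factorial(3) // (factorial(1) * factorial(3 - 1)))  # 3

# Ways of placing two 'the' among 19 word positions
print(comb(19, 2))  # 171
```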