
HOW MANY ENGLISH WORDS DO YOU USE IN YOUR WRITING?

   

         In a normal dictionary, the words are arranged in alphabetical order.  But we can arrange the words according to the frequency of occurrence.  That is, most used words appear first and less used words follow.  Then each word has a rank and a frequency.  If we multiply rank and frequency, we get a number.  That number is the same (more or less) for all the words, so it is a constant.  That is Zipf's law.
[In the above paragraph, 'the' appears five times in 75 words.  Hence its frequency percentage is 5*100/75 = 6.7%.]
     Frequency percentage * rank = constant.
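
     The frequency percentage of a word in a passage can be checked with a few lines of code.  The following is only a minimal sketch: the function names and the sample sentence are made up for illustration, and a "word" is assumed to be any run of letters or apostrophes.

```python
import re
from collections import Counter

def word_frequencies(text):
    """Return a Counter of the lowercase words found in text."""
    # Assumption: a word is any run of letters or apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

def frequency_percentage(text, word):
    """Percentage of all words in text that are equal to word."""
    counts = word_frequencies(text)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 100.0 * counts[word.lower()] / total

# Hypothetical sample sentence, not taken from any real corpus.
sample = "the cat sat on the mat and the dog sat by the door"

# List the words by rank (most frequent first) with their percentages.
counts = word_frequencies(sample)
total = sum(counts.values())
for rank, (word, count) in enumerate(counts.most_common(3), start=1):
    print(rank, word, round(100.0 * count / total, 1))
```

     A short sentence like this one is far too small to show Zipf's law; the pattern only appears in long texts.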
     The most used word in English is 'the'.  It occurs about 7 times in every 100 words; that is, its frequency percentage is about 7%.  A list is given below.

Rank R     Word     Frequency percentage F     Constant C = F * R

   1       the             6.8                 1 * 6.8 = 6.8
   2       of              3.1                 2 * 3.1 = 6.2
   3       to              2.7                 3 * 2.7 = 8.1
   4       and             2.6                 4 * 2.6 = 10.4
   5       in              1.8                 5 * 1.8 = 9.0
   6       is              1.2                 6 * 1.2 = 7.2
   7       for             1.0                 7 * 1.0 = 7.0
   8       that            0.8                 8 * 0.8 = 6.4

     The constant of proportionality for the English language is about 7.5%, or 0.075.
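
     The products in the list can be recomputed with a short script.  This is just a sketch using the eight frequency percentages quoted above; the names `table` and `rank_times_frequency` are made up for illustration.

```python
# Frequency percentages copied from the list above (rank 1 to 8).
table = [
    ("the", 6.8), ("of", 3.1), ("to", 2.7), ("and", 2.6),
    ("in", 1.8), ("is", 1.2), ("for", 1.0), ("that", 0.8),
]

def rank_times_frequency(rows):
    """Return (rank, word, frequency, rank * frequency) for each row."""
    return [
        (rank, word, freq, rank * freq)
        for rank, (word, freq) in enumerate(rows, start=1)
    ]

products = rank_times_frequency(table)
for rank, word, freq, c in products:
    print(f"{rank:>2}  {word:<5} {freq:>4.1f}  {c:>5.1f}")

# Average of the eight products, to compare with the quoted 7.5%.
average_c = sum(c for *_, c in products) / len(products)
print(f"average C = {average_c:.2f}")
```

     For these eight words the average comes out a little above 7.5, which agrees with the rounded constant.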

     In a nutshell: we can have an English frequency dictionary.  The frequency of a word is inversely proportional to its rank.  The constant of proportionality is about 0.075.

     Coming to our title question: let us find out how much of a typical English text is made up of the top 1000 words, that is, the 1000 most commonly used words.
     We would have to note down the frequency percentage of each word (ranked 1 to 1000) from the frequency dictionary and add up all of the percentages.  That gives the total percentage of English text that is written using the top 1000 words.
     There is a mathematical shortcut.  If you are allergic to math, you can skip this portion.

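     Frequency * rank = constant
     Frequency = constant / rank
     So the total frequency of the first 1000 words
           = 0.075/1 + 0.075/2 + ..... + 0.075/1000
           = 0.075 (1/1 + 1/2 + 1/3 + ..... + 1/1000)
           = 0.075 (ln(1000) + 0.58)     [math formula: 1/1 + 1/2 + ..... + 1/n is approximately ln(n) + 0.58, where ln is the natural logarithm]
           = 0.075 * 7.49
           = 0.56
           = 56%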
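
     The shortcut can be checked by adding up the 1000 terms directly.  The sketch below assumes the constant 0.075 from the text and uses the more precise value 0.5772... (the Euler-Mascheroni constant) in place of the rounded 0.58; the function names are made up for illustration.

```python
import math

C = 0.075                    # constant of proportionality from the text
EULER_GAMMA = 0.5772156649   # Euler-Mascheroni constant (0.58 when rounded)

def coverage_exact(n, c=C):
    """Sum c/1 + c/2 + ... + c/n term by term."""
    return c * sum(1.0 / rank for rank in range(1, n + 1))

def coverage_approx(n, c=C):
    """Shortcut: c * (ln(n) + gamma)."""
    return c * (math.log(n) + EULER_GAMMA)

print(f"exact sum : {coverage_exact(1000):.4f}")
print(f"shortcut  : {coverage_approx(1000):.4f}")
```

     Both values round to 0.56, so the shortcut and the direct sum agree on 56%.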

     That is, more than half of the English text that we write or read is made up of only 1000 words.  A layman or a learned man may mostly use about 3000 words.  Even Shakespeare is said to have known only around 30 thousand words.  But the mighty English language has about 300 thousand words.  How little we know!

     Zipf's law also holds good when ordering:
1. Companies by staff.
2. Universities by number of students.
3. Languages by number of speakers.
4. Websites by hits.
5. Cities by population.
6. Countries by area.
 and so on.

FOOTNOTE:
     Amazon e-books can be ranked according to daily sales, and Zipf's law can be applied to them.  Even though there are millions of e-books, only the top few hundred books make up more than 50% of the daily sales.
