Try creating a five-letter word at random:
tidkl, cewkx, dmwol,
vuptg, hvwjk, naqid
The words are not only meaningless but also difficult to pronounce.
In fact, how many five-letter words can you make from the English alphabet?
26*26*26*26*26 = 11881376 words,
because there are 26 choices for each position.
English has nearly 200000 words. Suppose, generously, that 40000 of them are real, meaningful five-letter words.
40000/11881376 = 0.0034, or about 0.3%
If you or a computer generates five-letter words at random, there is only about a 0.3% chance of getting a real word.
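The arithmetic above, and the purely random generator itself, can be sketched in a few lines of Python. This is only an illustration: the function name random_word is made up here, and the figure of 40000 is the rough estimate from the text, not a measured count.

```python
import random
import string

ALPHABET = string.ascii_lowercase  # the 26 letters a-z

def random_word(length=5, rng=random):
    """Return `length` letters, each drawn uniformly from a-z."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))

total = len(ALPHABET) ** 5          # 26*26*26*26*26
real_words = 40000                  # rough estimate from the text (assumption)

print(total)                        # 11881376
print(f"{real_words / total:.2%}")  # 0.34%
print([random_word() for _ in range(6)])
```

Running it a few times gives the same kind of unpronounceable strings as the examples at the top.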
You see, coining a real word or name is very difficult. People want new names for a newborn child, a medicine, fictional characters, etc. Let us try to get real new names.
1. Almost all English words have vowels. Hence, deliberately add a vowel or two to each word.
Now we get: actot, verrs, aglgo...
Some of these words can be pronounced. A little improvement.
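One possible way to do this, as a sketch: draw five random letters as before, then overwrite two randomly chosen positions with vowels. The name word_with_vowels and the choice of exactly two forced vowels are assumptions made for illustration.

```python
import random
import string

VOWELS = "aeiou"
ALPHABET = string.ascii_lowercase

def word_with_vowels(length=5, n_vowels=2, rng=random):
    """Uniform random letters, with n_vowels positions overwritten by vowels."""
    letters = [rng.choice(ALPHABET) for _ in range(length)]
    # Pick n_vowels distinct positions and force a vowel into each one.
    for pos in rng.sample(range(length), n_vowels):
        letters[pos] = rng.choice(VOWELS)
    return "".join(letters)

print([word_with_vowels() for _ in range(6)])
```

Every word now has at least two vowels, although the consonants around them are still chosen blindly.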
2. The frequency of each letter in the English language is known, and is given below.
a   b   c   d   e   f   g   h   i   j   k   l   m
82  15  28  42  127 22  20  61  70  2   8   40  24
n   o   p   q   r   s   t   u   v   w   x   y   z
67  75  19  1   60  63  90  27  10  24  2   20  1
That is, if a passage contains 1000 letters, 'a' is likely to appear 82 times, 'b' 15 times, 'z' once, and so on.
Imagine a die with 1000 faces. 'a' is engraved on 82 of its faces, 'b' on 15 faces, 'z' on only one face, and so on. Now roll this many-faced die 5 times. Each time, note down the letter that comes up, and so coin a five-letter word. (This thought experiment can easily be simulated on a computer.) Now we get words of this kind:
elnao, segty, least, soyie, laarm
More importantly, most of them can be pronounced, and some are close to real words. Some success.
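The 1000-faced die can be simulated with weighted random sampling. The sketch below uses the 26 counts listed above, in order from a to z; the function name weighted_word is made up for this example.

```python
import random
import string

LETTERS = string.ascii_lowercase
# Occurrences per 1000 letters, a through z, copied from the list above.
COUNTS = [82, 15, 28, 42, 127, 22, 20, 61, 70, 2, 8, 40, 24,
          67, 75, 19, 1, 60, 63, 90, 27, 10, 24, 2, 20, 1]

def weighted_word(length=5, rng=random):
    """Roll the '1000-faced die' `length` times and join the letters."""
    return "".join(rng.choices(LETTERS, weights=COUNTS, k=length))

print(sum(COUNTS))                          # 1000 faces in total
print([weighted_word() for _ in range(6)])
```

Here random.choices picks each letter with probability proportional to its count, which is the same as rolling a die that has that many faces per letter.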
3. A given letter is likely to be followed by certain letters. For example, 't' is mostly followed by 'h', and 'q' is almost always followed by 'u'.
For each of the 26 letters, we can find the most probable following letters (including the space).
Incorporating this idea into the computer algorithm, we get this result:
"the cur the bund hof arytowno...."
Now we get not words but sentences, though nonsense ones, of course.
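A minimal sketch of this step follows. The post does not give its table of following letters, so the code builds one from a tiny made-up passage (SAMPLE); a real experiment would need a much longer text. The names build_followers and generate are hypothetical, and the next letter is drawn in proportion to how often it follows the current one.

```python
import random
from collections import Counter, defaultdict

# Tiny stand-in passage (an assumption); a longer English text works better.
SAMPLE = ("the quick brown fox jumps over the lazy dog and then "
          "the other dog runs to the house on the hill ")

def build_followers(text):
    """Map each character to a Counter of the characters that follow it."""
    followers = defaultdict(Counter)
    for current, nxt in zip(text, text[1:]):
        followers[current][nxt] += 1
    return followers

def generate(followers, start="t", length=40, rng=random):
    """Grow a string one character at a time from the follower counts."""
    out = [start]
    for _ in range(length - 1):
        options = followers.get(out[-1])
        if not options:      # character never seen with a follower
            break
        chars, counts = zip(*options.items())
        out.append(rng.choices(chars, weights=counts, k=1)[0])
    return "".join(out)

followers = build_followers(SAMPLE)
print(followers["t"].most_common(1))  # 'h' is the top follower of 't' here
print(generate(followers))
```

Because the space is treated as just another character, the output breaks itself into word-like chunks, as in the result quoted above.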
4. Give a sample passage to the computer.
   1. The computer selects a letter at random, say 't'.
   2. Using the passage, the computer finds the letter most likely to follow 't'. It is 'h', as we know.
   3. Again using the same passage, the computer picks the letter most likely to follow the pair "th". It is 'e'.
   4. Next, the computer finds the letter most likely to follow the three-letter combination "the".
This process can be extended to four-letter or five-letter combinations.
The entire algorithm can be repeated again and again to coin new words. Here is the result:
"ther was just in time it all seemed quite natural."
Looks good. Some day a computer may write a poem or even an epic.
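Step 4 can be sketched as follows. This is an illustration under stated assumptions, not the program that produced the result above: SAMPLE is a short made-up passage, and build_model and generate are hypothetical names. The code also draws the next letter in proportion to how often it follows the current context, instead of always taking the single most likely letter, since always taking the top choice would give the same text on every run.

```python
import random
from collections import Counter, defaultdict

# Short made-up passage (an assumption); use a long real text for better output.
SAMPLE = ("the weather was fine and there was just enough time to walk to the "
          "river and back before the others came home for their tea ")

def build_model(text, max_order=4):
    """Count which character follows every context of 1..max_order characters."""
    model = defaultdict(Counter)
    for order in range(1, max_order + 1):
        for i in range(len(text) - order):
            model[text[i:i + order]][text[i + order]] += 1
    return model

def generate(model, start, length=60, max_order=4, rng=random):
    """Extend `start` using the longest context seen in the sample passage."""
    out = start
    while len(out) < length:
        # Try the longest context first, then fall back to shorter ones.
        for order in range(min(max_order, len(out)), 0, -1):
            options = model.get(out[-order:])
            if options:
                chars, counts = zip(*options.items())
                out += rng.choices(chars, weights=counts, k=1)[0]
                break
        else:
            break  # no known context at all, so stop early
    return out

model = build_model(SAMPLE)
print(generate(model, "t"))
```

With a context of only one letter, this behaves like step 3. As max_order grows, the output copies longer and longer stretches of the sample passage, so a short sample mostly gets reproduced rather than reinvented.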