Testing the Redundancy of English
Author
Stephen Wolfram
Title
Testing the Redundancy of English
Description
What fraction of the letters in a piece of English text can be removed, while still allowing the text to be read?
Category
Educational Materials
URL
http://www.notebookarchive.org/2019-08-98zxghc/
DOI
https://notebookarchive.org/2019-08-98zxghc
Date Added
2019-08-20
Date Last Modified
2019-08-20
File Size
14.57 kilobytes
Rights
Redistribution rights reserved
data:image/s3,"s3://crabby-images/4079d/4079d57633b5f88bf9a49688684d35628eb2c6bf" alt=""
data:image/s3,"s3://crabby-images/56607/56607cca9c3f8f5e959237fb5ea16950a488c5ec" alt=""
data:image/s3,"s3://crabby-images/97e21/97e21d941045101921bcfd57c45c820c8eed2b93" alt=""
Testing the Redundancy of English
Testing the Redundancy of English
What fraction of the letters in a piece of English text can be removed, while still allowing the text to be read?
As a sample of ordinary English text, get the Wikipedia entry about “chicken” and split it into sentences:
In[]:=
chtext=TextSentences[WikipediaData["chicken"]];
When this notebook was written, the article contained 259 sentences (the live article changes over time):
In[]:=
Length[chtext]
Out[]=
259
Here is the 40th sentence:
In[]:=
chtext[[40]]
Out[]=
Hens cluck loudly after laying an egg, and also to call their chicks.
Pick a random sentence and blank out (replace with "-") about 10% of its characters, spaces included:
In[]:=
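(* RandomChoice picks a sentence uniformly; RandomInteger[259] could return 0, which is not a valid sentence index *)
StringJoin[If[RandomReal[] < 0.1, "-", #] & /@ Characters[RandomChoice[chtext]]]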
Out[]=
B-eeds artifici-lly-developed for-egg production rarely go b--ody, and-those t-at do often-stop part-way-throu-h th- incubation-
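Each step below repeats the same one-line expression with a different probability. The sketch below wraps that expression in a reusable helper. The name `blankOut` is hypothetical and is not part of the original notebook. The helper replaces each character by "-" with probability `p`. By default it leaves spaces alone, as the 20% and higher examples do. `SeedRandom` fixes the random draws so that a run can be reproduced.

```wolfram
(* Hypothetical helper: replace each character of s by "-" with probability p.
   If keepSpaces is True (the default), spaces are never replaced. *)
blankOut[s_String, p_?NumericQ, keepSpaces_ : True] :=
  StringJoin[
    If[RandomReal[] < p && ! (keepSpaces && # === " "), "-", #] & /@ Characters[s]]

SeedRandom[1234];  (* fix the seed so the same blanks come out each run *)
blankOut["Hens cluck loudly after laying an egg, and also to call their chicks.", 0.3]
```

Passing `False` as the third argument reproduces the 10% example above, where spaces can also be blanked.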
Remove about 20% of the characters, this time never removing a space:
In[]:=
StringJoin[If[RandomReal[] < 0.2 && # =!= " ", "-", #] & /@ Characters[RandomChoice[chtext]]]
Out[]=
-f -he eggs a--n't t-rned, the emb--o inside --y stick -o the s-ell and may hat-h -ith phy-ical -efect--
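Each non-space character is blanked independently with probability 0.2. "20%" is therefore only the expected fraction, and any given sentence may lose somewhat more or less. The sketch below checks the fraction that was actually blanked, using a hypothetical helper `fractionBlanked`. It assumes the original sentence contained no "-" of its own, since a genuine hyphen would be miscounted as a blank.

```wolfram
(* Hypothetical helper: fraction of non-space characters that are "-".
   Assumes spaces were never blanked and the original text had no hyphens. *)
fractionBlanked[blanked_String] :=
  With[{chars = DeleteCases[Characters[blanked], " "]},
    N[Count[chars, "-"]/Length[chars]]]

(* Applied to the 20% output above *)
fractionBlanked["-f -he eggs a--n't t-rned, the emb--o inside --y stick -o the s-ell and may hat-h -ith phy-ical -efect--"]
```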
Remove 30% of the characters:
In[]:=
StringJoin[If[RandomReal[] < 0.3 && # =!= " ", "-", #] & /@ Characters[RandomChoice[chtext]]]
Out[]=
" thr-wn ov-rboard wh-n -hey refused -o feed befor- -he bat-le o- D-e-an-, ----n- "-- they w-n'- eat, perh--- -hey wi-l d-ink."
Remove 40%:
In[]:=
StringJoin[If[RandomReal[] < 0.4 && # =!= " ", "-", #] & /@ Characters[RandomChoice[chtext]]]
Out[]=
Uni-e-si-- -f -eo--i- P-es-.
Try it again:
In[]:=
StringJoin[If[RandomReal[] < 0.4 && # =!= " ", "-", #] & /@ Characters[RandomChoice[chtext]]]
Out[]=
--l-s -eem- t- -ave be-- - c-nt-r -f --i--e- b-e---ng (Col--e-l-- -e R- R--ti-- -.-.4-.
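Repeated tries at the same rate differ both in the sentence picked and in how many characters are lost. The second effect can be worked out directly. Suppose a sentence has n non-space characters and each is removed independently with probability p. Then the number removed follows a binomial distribution with mean n p and standard deviation Sqrt[n p (1 - p)]. The 40th sentence shown earlier has 57 non-space characters. At p = 0.4 it would lose about 23 characters on average, give or take roughly 3.7. A sketch of the calculation:

```wolfram
(* Count the non-space characters in the 40th sentence *)
n = StringLength[StringDelete[
    "Hens cluck loudly after laying an egg, and also to call their chicks.", " "]];

(* Independent blanking with probability 0.4 gives a binomial count *)
dist = BinomialDistribution[n, 0.4];
{n, Mean[dist], StandardDeviation[dist]}
```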
Remove 50%:
In[]:=
StringJoin[If[RandomReal[] < 0.5 && # =!= " ", "-", #] & /@ Characters[RandomChoice[chtext]]]
Out[]=
-o-n F------ t-e ov---e-r -- --- -r--ect, st-ted -ha- -hi----- -ave ".--re--i-ed t-- -bi-ity -o m-k- t-e-h- -n--- -e-t-i- con-i--on-... -"----= -re-din- =----=-- -ri-i-- -==---- -omestic ch-ck-n i- -e------- p-i--r-l- fro- -he -ed -ung--f-w- (-a-lu- gallu-- -nd i- ------if--a--y c--ss----- -- the s--e sp---e--
Remove 70%, and the text becomes unreadable:
In[]:=
StringJoin[If[RandomReal[] < 0.7 && # =!= " ", "-", #] & /@ Characters[RandomChoice[chtext]]]
Out[]=
-he -ve--g- ---u-a--o- ---io- fo- ch----n- -s -- d-y- bu- --y ------ on t-- -e-------r- a-- humi-i-- -- --- -n--------
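Each example above uses a different randomly chosen sentence, so the removal levels are not directly comparable. One way to compare them is to hold a single sentence fixed and apply increasing removal rates to it. The sketch below does this for the 40th sentence. It is an assumption layered on the original notebook, which does not do this itself.

```wolfram
sentence = "Hens cluck loudly after laying an egg, and also to call their chicks.";
SeedRandom[42];  (* reproducible draws *)

(* For each rate p, blank non-space characters independently with probability p *)
sweep = Table[
   {p, StringJoin[If[RandomReal[] < p && # =!= " ", "-", #] & /@ Characters[sentence]]},
   {p, {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7}}];

Grid[sweep, Alignment -> Left]
```

Changing the seed or the sentence gives a different sample at each rate, so several runs are needed before drawing any conclusion about where readability breaks down.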
data:image/s3,"s3://crabby-images/4079d/4079d57633b5f88bf9a49688684d35628eb2c6bf" alt=""
data:image/s3,"s3://crabby-images/56607/56607cca9c3f8f5e959237fb5ea16950a488c5ec" alt=""
Cite this as: Stephen Wolfram, "Testing the Redundancy of English" from the Notebook Archive (2019), https://notebookarchive.org/2019-08-98zxghc
data:image/s3,"s3://crabby-images/afa7e/afa7e751d718eac7e65669706b85c714b1d1becc" alt=""
Download
data:image/s3,"s3://crabby-images/c9374/c9374a157002afb9ce03cd482ea9bc6b4ee16fc0" alt=""
data:image/s3,"s3://crabby-images/7630b/7630b01d225114cfa2bafc392f9b6df93ec5f7bb" alt=""