Testing the Redundancy of English
Author
Stephen Wolfram
Title
Testing the Redundancy of English
Description
What fraction of the letters in a piece of English text can be removed, while still allowing the text to be read?
Category
Educational Materials
Keywords
URL
http://www.notebookarchive.org/2019-08-98zxghc/
DOI
https://notebookarchive.org/2019-08-98zxghc
Date Added
2019-08-20
Date Last Modified
2019-08-20
File Size
14.57 kilobytes
Supplements
Rights
Redistribution rights reserved
What fraction of the letters in a piece of English text can be removed, while still allowing the text to be read?
As a random sample of text, get the Wikipedia entry about “chicken”:
In[]:=
chtext=TextSentences[WikipediaData["chicken"]];
At the time this notebook was written, it contained 259 sentences:
In[]:=
Length[chtext]
Out[]=
259
Here is the 40th sentence:
In[]:=
chtext[[40]]
Out[]=
Hens cluck loudly after laying an egg, and also to call their chicks.
Pick a random sentence and replace each character with "-" with probability 0.1, removing about 10% of the characters:
In[]:=
StringJoin[If[RandomReal[]<.1,"-",#]&/@Characters[RandomChoice[chtext]]]
Out[]=
B-eeds artifici-lly-developed for-egg production rarely go b--ody, and-those t-at do often-stop part-way-throu-h th- incubation-
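The code above replaces each character independently with probability 0.1, so the fraction actually removed varies from run to run. As a minimal sketch, assuming one wants to remove an exact fraction instead, a hypothetical helper removeExactly (not part of the original notebook) could pick the positions with RandomSample:
In[]:=
removeExactly[s_String, frac_] := Module[{chars = Characters[s], pos},
  (* choose Round[frac*n] distinct positions at random *)
  pos = RandomSample[Range[Length[chars]], Round[frac Length[chars]]];
  (* replace the characters at those positions with "-" *)
  StringJoin[ReplacePart[chars, Thread[pos -> "-"]]]]
In[]:=
removeExactly[chtext[[40]], 0.1]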
Remove 20% of the characters, never removing a space:
In[]:=
StringJoin[If[RandomReal[]<.2&&#=!=" ","-",#]&/@Characters[RandomChoice[chtext]]]
Out[]=
-f -he eggs a--n't t-rned, the emb--o inside --y stick -o the s-ell and may hat-h -ith phy-ical -efect--
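The same expression is reused below with different probabilities. As a sketch, it could be wrapped in a function; the name maskText and its keepSpaces argument are hypothetical, introduced here only for illustration:
In[]:=
maskText[s_String, p_, keepSpaces_ : True] := StringJoin[
  (* replace a character with "-" with probability p; spaces are spared when keepSpaces is True *)
  If[RandomReal[] < p && (Not[keepSpaces] || # =!= " "), "-", #] & /@ Characters[s]]
In[]:=
maskText[chtext[[40]], 0.2]
Passing False as the third argument, as in maskText[chtext[[40]], 0.1, False], allows spaces to be removed too, matching the first example.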
Remove 30% of the characters:
In[]:=
StringJoin[If[RandomReal[]<.3&&#=!=" ","-",#]&/@Characters[RandomChoice[chtext]]]
Out[]=
" thr-wn ov-rboard wh-n -hey refused -o feed befor- -he bat-le o- D-e-an-, ----n- "-- they w-n'- eat, perh--- -hey wi-l d-ink."
Remove 40%:
In[]:=
StringJoin[If[RandomReal[]<.4&&#=!=" ","-",#]&/@Characters[RandomChoice[chtext]]]
Out[]=
Uni-e-si-- -f -eo--i- P-es-.
Try it again:
In[]:=
StringJoin[If[RandomReal[]<.4&&#=!=" ","-",#]&/@Characters[RandomChoice[chtext]]]
Out[]=
--l-s -eem- t- -ave be-- - c-nt-r -f --i--e- b-e---ng (Col--e-l-- -e R- R--ti-- -.-.4-.
Remove 50%:
In[]:=
StringJoin[If[RandomReal[]<.5&&#=!=" ","-",#]&/@Characters[RandomChoice[chtext]]]
Out[]=
-o-n F------ t-e ov---e-r -- --- -r--ect, st-ted -ha- -hi----- -ave ".--re--i-ed t-- -bi-ity -o m-k- t-e-h- -n--- -e-t-i- con-i--on-... -"----= -re-din- =----=-- -ri-i-- -==---- -omestic ch-ck-n i- -e------- p-i--r-l- fro- -he -ed -ung--f-w- (-a-lu- gallu-- -nd i- ------if--a--y c--ss----- -- the s--e sp---e--
Remove 70%, and the text becomes unreadable:
In[]:=
StringJoin[If[RandomReal[]<.7&&#=!=" ","-",#]&/@Characters[RandomChoice[chtext]]]
Out[]=
-he -ve--g- ---u-a--o- ---io- fo- ch----n- -s -- d-y- bu- --y ------ on t-- -e-------r- a-- humi-i-- -- --- -n--------
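Because each example above uses a different random sentence, the levels are hard to compare directly. A sketch that applies several removal probabilities to one fixed sentence (the 40th sentence shown earlier, typed in as a literal string) follows; the seed value is arbitrary and only makes the run repeatable:
In[]:=
SeedRandom[1];
sentence = "Hens cluck loudly after laying an egg, and also to call their chicks.";
(* one masked copy of the sentence per probability, stacked for comparison *)
Column[Table[
  StringJoin[If[RandomReal[] < p && # =!= " ", "-", #] & /@ Characters[sentence]],
  {p, {0.1, 0.3, 0.5, 0.7}}]]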


Cite this as: Stephen Wolfram, "Testing the Redundancy of English" from the Notebook Archive (2019), https://notebookarchive.org/2019-08-98zxghc
		