Simplified Machine-Learning Workflow #7

Anton Antonov

Latent Semantic Analysis (Part 2)

A Wolfram livecoding session

Anton Antonov
December 2019

Session overview

Motivational example -- a full-blown LSA workflow.

Fundamentals, text transformation (the hard way):

bag of words model,

stop words,

stemming.

The easy way with LSAMon.

Eat your own dog food.

Data

Out[]=

Dataset name	Collection size	Description	Why use it?	Source
"Hamlet" (play parts)	222 documents	The different parts in Shakespeare's play.	Small enough to digest.	GitHub
US State of the Union speeches	252 documents	Speeches by US presidents.	Good English; documents of similar length.	GitHub
Airbnb reviews	~525,000 documents	Airbnb reviews by customers.	A lot of relatively tight descriptions.	http://insideairbnb.com/get-the-data.html
Raku documentation	399 documents	Technical documentation of Raku (Perl 6).	Very instructive and easy to explain.	https://github.com/Raku/doc
Movie reviews	~10,000 documents	Movie reviews (with labels "positive"/"negative").	Good text data available in WL.	ExampleData[{"MachineLearning", "MovieReview"}]
"The Idiot" (chapters)	50 documents	Chapters of the novel by Dostoevsky in both English and Russian.	Show translation possibilities.	GitHub
Data science class questions	99 documents	Student questions from a data science class.	Eat your own dog food.	GitHub

Dimensionality reduction functions at the Wolfram Function Repository (WFR)

Already approved: IndependentComponentAnalysis.

Submitted, and currently available from my WRI cloud: NonNegativeMatrixFactorization.

Full LSA workflow (over Raku documentation)

Raku?

Formerly known as “Perl 6”.

In[]:=

WebImage["https://www.raku.org"]

Out[]=

Where from?

In[]:=

WebImage["https://github.com/Raku/doc"]

Out[]=

Natural language commands

In[]:=

lsaNCRakuDoc=ToLSAMonWLCommand["create with aDocuments;make the document term matrix;show data summary;apply the LSI functions IDF, TermFrequency, Cosine;extract 36 topics using the method NNMF and 12 max steps;show the topics table with 9 table columns;show thesaurus table of regex, array, chars, role, grammar;"];

Explanations the hard way

In[]:=

Topics extraction

In[]:=

movieReviews=ExampleData[{"MachineLearning","MovieReview"},"Data"];Dimensions[movieReviews]

Out[]=

{10662}

In[]:=

movieReviews〚All,2〛="tag:"<>#&/@movieReviews〚All,2〛;

In[]:=

aMovieReviews=AssociationThread[Range[Length[movieReviews]]->Map[StringRiffle[List@@#," "]&,movieReviews]];RandomSample[aMovieReviews,2]

Out[]=

<|5468->if you value your time and money , find an escape clause and avoid seeing this trite , predictable rehash . tag:negative,
5384->imagine the cleanflicks version of 'love story , ' with ali macgraw's profanities replaced by romance-novel platitudes . tag:negative|>

In[]:=

aMovieReviews=Select[aMovieReviews,StringQ[#]&&StringLength[#]>10&];RandomSample[aMovieReviews,2]

Out[]=

<|1135->beautifully observed , miraculously unsentimental comedy-drama . tag:positive,
1692->flawed , but worth seeing for ambrose's performance . tag:positive|>

In[]:=

aMovieReviews2=DeleteStopwords[Select[StringSplit[#],StringLength[#]>0&]]&/@aMovieReviews;

In[]:=

aMovieReviews2〚1;;12〛

Out[]=

<|1->{rock,destined,21st,century's,new,",conan,",going,make,splash,greater,arnold,schwarzenegger,,,jean-claud,van,damme,steven,segal,.,tag:positive},
2->{gorgeously,elaborate,continuation,",lord,rings,",trilogy,huge,column,words,adequately,describe,co-writer/director,peter,jackson's,expanded,vision,j,.,r,.,r,.,tolkien's,middle-earth,.,tag:positive},
3->{effective,too-tepid,biopic,tag:positive},
4->{sometimes,like,movies,fun,,,wasabi,good,place,start,.,tag:positive},
5->{emerges,rare,,,issue,movie,honest,keenly,observed,feel,like,.,tag:positive},
6->{film,provides,great,insight,neurotic,mindset,comics,--,reached,absolute,top,game,.,tag:positive},
7->{offers,rare,combination,entertainment,education,.,tag:positive},
8->{picture,literally,showed,road,hell,paved,good,intentions,.,tag:positive},
9->{steers,turns,snappy,screenplay,curls,edges,;,clever,want,hate,.,somehow,pulls,.,tag:positive},
10->{care,cat,offers,refreshingly,different,slice,asian,cinema,.,tag:positive},
11->{film,worth,seeing,,,talking,singing,heads,.,tag:positive},
12->{really,surprises,wisegirls,low-key,quality,genuine,tenderness,.,tag:positive}|>

In[]:=

lsLongForm=Join@@MapThread[Thread[{##}]&,Transpose[List@@@Normal[aMovieReviews2]]];

In[]:=

Dataset[RandomSample[lsLongForm,40]]

Out[]=

2647	tag:positive
2891	disfrutable
419	recommend
2397	tradition
8477	tag:negative
5789	o
6047	ago
146	.
3554	diverting
3325	.
1391	film
9154	.
2963	tag:positive
4075	richly
3658	tag:positive
1691	represents
4001	heft
371	.
4384	highly
8574	gander
showing 1–20 of 40
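
The long-form conversion above can be checked on a toy input. The following is a minimal sketch, assuming a two-document association; the names `aToy` and `lsToy` are illustrative and not part of the session. Each (document ID, token list) pair is threaded into {ID, token} pairs, and the per-document lists are joined into one list.

```wolfram
(* Toy association: document ID -> list of tokens (hypothetical data) *)
aToy = <|1 -> {"good", "film"}, 2 -> {"bad", "plot", "bad"}|>;

(* Same expression as in the session, applied to the toy association *)
lsToy = Join @@ MapThread[Thread[{##}] &, Transpose[List @@@ Normal[aToy]]]
(* {{1, "good"}, {1, "film"}, {2, "bad"}, {2, "plot"}, {2, "bad"}} *)
```

Repeated tokens are kept as repeated pairs, which is what lets a later cross tabulation count them.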

In[]:=

aStemRules=Dispatch[Thread[Rule[#,WordData[#,"PorterStem"]&/@#]]&@Union[lsLongForm〚All,2〛]];lsLongForm〚All,2〛=lsLongForm〚All,2〛/.aStemRules;

In[]:=

aTallies=Association[Rule@@@Tally[lsLongForm〚All,2〛]];aTallies=Select[aTallies,#>20&];Length[aTallies]

Out[]=

981

In[]:=

TakeLargest[aTallies,12]

Out[]=

.14010,,10037,tag:neg5331,tag:posit5331,film1659,movi1476,like805,"655,--630,make611,stori519,time463

In[]:=

lsLongForm=Select[lsLongForm,KeyExistsQ[aTallies,#〚2〛]&&StringLength[#〚2〛]>2&];

In[]:=

ctObj=ResourceFunction["CrossTabulate"][lsLongForm,"Sparse"->True];

In[]:=

ResourceFunction["CrossTabulate"][RandomSample[lsLongForm,12]]

Out[]=

	bad	cinema	femal	half	happi	make	movi	rare	stori	tag:neg	tag:posit
692	0	0	0	0	0	0	0	0	0	0	1
1231	0	0	0	0	0	0	0	1	0	0	0
1779	0	1	0	0	0	0	0	0	0	0	0
1902	0	0	0	0	0	0	0	0	0	0	1
2636	0	0	0	0	0	0	0	0	1	0	0
3681	0	0	0	0	0	1	0	0	0	0	0
5214	0	0	0	0	1	0	0	0	0	0	0
5801	0	0	0	0	0	0	0	0	0	1	0
5822	1	0	0	0	0	0	0	0	0	0	0
7954	0	0	0	1	0	0	0	0	0	0	0
8277	0	0	0	0	0	0	1	0	0	0	0
9854	0	0	1	0	0	0	0	0	0	0	0
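
To make the cross tabulation step concrete, here is a minimal sketch that builds a small document-term count matrix from {ID, term} pairs with built-in functions only. It illustrates the idea and is not the implementation of the CrossTabulate resource function; all names (`lsPairs`, `aRowIndex`, `aColIndex`, `matToy`) are assumptions made for this example.

```wolfram
(* Toy long-form data: {document ID, term} pairs (hypothetical) *)
lsPairs = {{1, "good"}, {1, "film"}, {2, "bad"}, {2, "film"}, {2, "bad"}};

(* Sorted unique row and column names, and their integer positions *)
rowNames = Union[lsPairs[[All, 1]]];
colNames = Union[lsPairs[[All, 2]]];
aRowIndex = AssociationThread[rowNames -> Range[Length[rowNames]]];
aColIndex = AssociationThread[colNames -> Range[Length[colNames]]];

(* Count each distinct pair and place the count at {row, column} *)
matToy = SparseArray[
   Map[{aRowIndex[#[[1, 1]]], aColIndex[#[[1, 2]]]} -> #[[2]] &, Tally[lsPairs]],
   {Length[rowNames], Length[colNames]}];

colNames
(* {"bad", "film", "good"} *)

Normal[matToy]
(* {{0, 1, 1}, {2, 1, 0}} *)
```

Rows correspond to documents and columns to terms, so the entry 2 records that "bad" occurs twice in document 2.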

In[]:=

CTMatrixPlot[x_Association/;KeyExistsQ[x,"SparseMatrix"],opts___]:=MatrixPlot[x["SparseMatrix"],Append[{opts},FrameLabel->{{Keys[x][[2]],None},{Keys[x][[3]],None}}]];CTMatrixPlot[ctObj]

Out[]=

In[]:=

matCT=N[ctObj["SparseMatrix"]];

In[]:=

ResourceFunction["ParetoPrinciplePlot"][Total[matCT,{1}]]

Out[]=

In[]:=

matCT=matCT.SparseArray[DiagonalMatrix[Log[Dimensions[matCT]〚1〛/Total[matCT,{1}]]]];

In[]:=

matCT=matCT/Sqrt[Total[matCT*matCT,{2}]];
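
The two weighting steps above (a logarithmic global weight per term, then scaling each document row to unit length) can be tried on a tiny dense matrix. This is a sketch under stated assumptions: the matrix and the names `matToy`, `idfWeights`, and `matToyW` are made up for illustration, and a dense matrix is used instead of a sparse one to keep the example short.

```wolfram
(* Toy document-term counts: 4 documents (rows) by 3 terms (columns) *)
matToy = N[{{1, 1, 0}, {0, 1, 1}, {0, 1, 0}, {1, 0, 0}}];
nDocs = Length[matToy];

(* Global weight per term: log of (number of documents / column total) *)
idfWeights = Log[nDocs/Total[matToy, {1}]];

(* Scale the columns by the global weights *)
matToyW = matToy . DiagonalMatrix[idfWeights];

(* Scale each row to unit Euclidean length (cosine normalization) *)
matToyW = matToyW/Sqrt[Total[matToyW^2, {2}]];

(* Every row norm should be 1 up to rounding *)
Norm /@ matToyW
```

The third term has the smallest column total, so it receives the largest weight. Note that a term whose column total equals the number of documents gets weight zero, and a row consisting only of such terms cannot be normalized.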

In[]:=

SeedRandom[8966];matCT2=matCT〚RandomSample[Range[Dimensions[matCT]〚1〛],4000],All〛

Out[]=

SparseArray[<25269>,{4000,946}]


In[]:=

SeedRandom[23];AbsoluteTiming[{W,H}=ResourceFunction[…][matCT2,24,MaxSteps->12,"Normalization"->Right];]

Out[]=

{14.5987,Null}

In[]:=

Dimensions[W]

Out[]=

{4000,24}

In[]:=

Dimensions[H]

Out[]=

{24,946}

matCT2≈W.H
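
The approximation matCT2≈W.H with non-negative factors can be illustrated with a very small factorization routine. The sketch below uses the classic multiplicative update rules of Lee and Seung; this is a swapped-in textbook technique and not necessarily the algorithm behind the resource function used in the session. `ToyNNMF`, `matToy`, `wToy`, and `hToy` are hypothetical names.

```wolfram
(* Minimal non-negative matrix factorization via multiplicative updates. *)
(* mat: non-negative matrix; k: number of topics; nSteps: number of iterations. *)
ToyNNMF[mat_?MatrixQ, k_Integer, nSteps_Integer] :=
  Module[{w, h, eps = 10.^-9},
   (* Strictly positive random starting factors *)
   w = RandomReal[{0.1, 1}, {Length[mat], k}];
   h = RandomReal[{0.1, 1}, {k, Dimensions[mat][[2]]}];
   Do[
    (* Element-wise updates; eps guards against division by zero *)
    h = h*(Transpose[w] . mat)/(Transpose[w] . w . h + eps);
    w = w*(mat . Transpose[h])/(w . h . Transpose[h] + eps),
    {nSteps}];
   {w, h}];

SeedRandom[23];
matToy = N[{{1, 1, 0, 0}, {2, 1, 0, 0}, {0, 0, 1, 2}, {0, 0, 1, 1}}];
{wToy, hToy} = ToyNNMF[matToy, 2, 200];

Dimensions /@ {wToy, hToy}
(* {{4, 2}, {2, 4}} *)

(* Relative approximation error in the Frobenius norm *)
Norm[matToy - wToy . hToy, "Frobenius"]/Norm[matToy, "Frobenius"]
```

As in the session, the left factor has one row per document and the right factor has one row per topic, so the largest entries in a row of the right factor name the terms that characterize that topic.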

In[]:=

Multicolumn[Table[Column[{Style[ind,Blue,Bold],ColumnForm[Keys[TakeLargest[AssociationThread[ctObj["ColumnNames"]->Normal[H〚ind,All〛]],10]]]}],{ind,Dimensions[H]〚1〛}],8,Dividers->All]

Out[]=

film, tag:neg, funni, minut, humor, problem, know, que, end, death
wai, emot, feel, moment, pretenti, want, origin, american, quit, try
littl, work, move, piec, end, histori, melodrama, strong, amus, better
tag:posit, come, famili, charm, heart, deliv, hard, film, affect, man
laugh, feel, lot, offer, long, better, get, leav, end, els
make, direct, great, script, drama, turn, director, know, cast, act
charact, need, piec, studi, titl, fun, plot, dialogu, movi, complet
stori, humor, sens, love, tell, told, compel, old, make, set
bad, realli, thing, plai, better, come, think, joke, tag:neg, gui
entertain, moment, tag:posit, offer, mildli, classic, tale, worth, busi, set
time, run, wast, spend, reveal, minut, look, long, take, episod
watch, fun, girl, peopl, pleasur, experi, easi, actual, dull, realiz
tag:neg, feel, drama, work, come, plai, predict, episod, script, love
screen, big, life, bore, pictur, documentari, action, human, origin, thriller
real, world, tag:posit, fun, psycholog, set, ultim, predict, year
good, intent, girl, fun, subject, certainli, idea, matter, nearli, despit
perform, cast, tag:posit, fine, lead, power, except, lot, actor, intellig
comedi, romant, black, action, hilari, surpris, charm, sci-fi, american, sweet
movi, tag:neg, dull, gener, ultim, make, start, bore, mess, action
funni, tag:posit, move, touch, ultim, realli, sharp, dramat, documentari, flick
feel, movi, plai, look, tag:posit, dai, seen, sound, work
best, year, action, year', far, tag:posit, better, seen, worst, movi
engag, kind, look, tag:posit, filmmak, audienc, end, reveal, effect, final
just, minut, plain, past, wast, mediocr, move, disturb, bore, interest

Statistical thesaurus

In[]:=

SeedRandom[898];rinds=Flatten[Position[ctObj["ColumnNames"],#]&/@Map[WordData[#,"PorterStem"]&,{"tag:positive","tag:negative","book","amusing","actor","plot","culture","comedy","director","thoughtful","epic","film","bad","good"}]];rinds=Sort@Join[rinds,RandomSample[Range[Dimensions[H]〚2〛],16-Length[rinds]]];Multicolumn[Table[Column[{Style[ctObj["ColumnNames"]〚ind〛,Blue,Bold],ColumnForm[ctObj["ColumnNames"]〚Flatten@Nearest[Normal[Transpose[H]]->"Index",H〚All,ind〛,12]〛]}],{ind,rinds}],8,Dividers->All]

Out[]=

term	statistical thesaurus entries
actor	{actor,solid,central,except,remark,rivet,memor,combin,terrif,line,gift,chang}
bad	{bad,realli,thing,plai,better,think,joke,gui,boi,premis,sai,comfort}
comedi	{comedi,romant,black,hilari,surpris,action,sci-fi,american,teen,witti,sweet,romanc}
director	{director,act,write,turn,know,amount,standard,wit,view,style,throughout,point}
film	{film,minut,humor,problem,que,know,death,cliché,end,suffer,honest,ambiti}
plot	{plot,dialogu,clever,star,fresh,deepli,idea,chang,sentiment,stupid,creat,complet}
tag:posit	{tag:posit,come,famili,charm,heart,deliv,hard,man,believ,affect,power,live}
thought	{thought,tragedi,beyond,pack,creativ,pass,seri,soap,teen,sure,friendship,event}
amus	{amus,smart,strong,melodrama,bit,product,weak,high,star,skill,deepli,plenti}
book	{book,pure,welcom,man',femal,reflect,wed,celebr,outrag,creepi,novel,dream}
cultur	{cultur,rare,film',occasion,delight,warm,disnei,fight,live,respect,seat,coming-of-ag}
epic	{epic,keep,consider,detail,worthi,somewhat,triumph,marvel,tone,gorgeou,member,challeng}
good	{good,intent,girl,subject,certainli,fun,idea,nearli,matter,tediou,despit,folk}
tag:neg	{tag:neg,dull,lack,episod,exercis,predict,ultim,gener,plai,problem,start,clich}
technic	{technic,confus,slow,purpos,grief,histor,voic,water,atmospher,clear,chemistri,reli}
type	{type,steal,heavi,whatev,latest,doubt,adam,given,balanc,poetic,record,tragic}
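
The statistical thesaurus is a nearest-neighbor lookup: every term is represented by its column of topic weights, and its thesaurus entries are the terms whose columns are closest. A minimal sketch with a made-up 2-topic, 4-term matrix follows; `termsToy` and `hToy` are illustrative names and values, not data from the session.

```wolfram
(* Hypothetical topic-term weights: 2 topics (rows) by 4 terms (columns) *)
termsToy = {"film", "movi", "bad", "dull"};
hToy = {{0.9, 0.8, 0.0, 0.1}, {0.0, 0.1, 0.7, 0.9}};

(* The two terms nearest to "film" (column 1), the term itself included *)
termsToy[[Nearest[Transpose[hToy] -> "Index", hToy[[All, 1]], 2]]]
(* {"film", "movi"} *)

(* The two terms nearest to "bad" (column 3) *)
termsToy[[Nearest[Transpose[hToy] -> "Index", hToy[[All, 3]], 2]]]
(* {"bad", "dull"} *)
```

Because the query vector is itself one of the columns, each term appears first in its own list of entries, as in the table above.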

The easy way

In[]:=

SeedRandom[23];
lsaMovieReviews=
LSAMonUnit[RandomSample[aMovieReviews,4000]]⟹
LSAMonMakeDocumentTermMatrix[{},Automatic]⟹
LSAMonEchoDocumentTermMatrixStatistics⟹
LSAMonApplyTermWeightFunctions["IDF","TermFrequency","Cosine"]⟹
LSAMonExtractTopics["NumberOfTopics"->24,"MinNumberOfDocumentsPerTerm"->20,Method->"NNMF",MaxSteps->20]⟹
LSAMonEchoTopicsTable["NumberOfTableColumns"->8];

Context value "documentTermMatrix":

Dimensions: {4000,11117}

Density: 0.00104473

Number of documents per term summary:

1 # documents
1st Qu	1
Median	1
Min	1
3rd Qu	3
Mean	4.17892
Max	4000


topics table:

1.000	real
0.138	life
0.128	laughs
0.069	point
0.062	positive
0.057	smart
0.057	love
0.052	negative
0.048	live
0.047	subject
0.040	plot
0.039	clever

1.000	entertaining
0.643	new
0.116	positive
0.114	film
0.076	negative
0.062	love
0.059	feature
0.051	old
0.047	minute
0.045	little
0.044	watch
0.041	directed

1.000	like
0.190	plays
0.094	film
0.092	negative
0.059	series
0.054	old
0.048	watching
0.044	title
0.043	positive
0.040	quite
0.039	special
0.038	life

1.000	amusing
0.665	think
0.176	kids
0.174	minutes
0.115	moments
0.111	premise
0.105	cast
0.103	comedy
0.094	negative
0.094	american
0.081	positive
0.067	turns

1.000	isn
0.117	end
0.088	negative
0.085	nearly
0.078	worth
0.072	engaging
0.059	film
0.055	compelling
0.054	true
0.051	original
0.042	romance
0.039	despite

1.000	don
0.108	kids
0.088	know
0.065	care
0.064	didn
0.059	think
0.056	actors
0.055	women
0.050	people
0.049	need
0.048	far
0.048	negative

1.000	funny
0.274	positive
0.170	comedy
0.163	film
0.140	surprisingly
0.112	smart
0.085	moving
0.081	sweet
0.074	romantic
0.065	pretty
0.056	quite
0.056	life

1.000	long
0.159	takes
0.140	time
0.071	minutes
0.069	negative
0.066	way
0.064	good
0.059	slow
0.057	hour
0.057	ending
0.056	seen
0.054	stuff

1.000	work
0.146	quite
0.119	piece
0.069	positive
0.058	moving
0.058	art
0.053	make
0.052	negative
0.049	tale
0.043	actors
0.041	mind
0.038	narrative

1.000	bad
0.098	negative
0.059	idea
0.047	really
0.039	film
0.037	certainly
0.037	good
0.037	acting
0.034	sort
0.031	think
0.030	cinema
0.026	entertainment

1.000	drama
0.121	comedy
0.108	positive
0.104	romance
0.070	acted
0.062	family
0.050	solid
0.049	directed
0.048	compelling
0.045	negative
0.045	screen
0.045	little

1.000	just
0.090	negative
0.058	interesting
0.053	mess
0.042	sense
0.040	going
0.039	idea
0.039	viewers
0.038	tries
0.038	didn
0.035	level
0.035	better

1.000	doesn
0.084	know
0.080	comedy
0.072	comes
0.069	cast
0.069	good
0.061	negative
0.056	feel
0.052	laughs
0.050	picture
0.046	moments
0.045	things

1.000	makes
0.153	comic
0.133	serious
0.114	silly
0.107	look
0.106	screen
0.093	positive
0.084	film
0.079	book
0.073	lacks
0.053	comedy
0.048	american

1.000	characters
0.338	plot
0.118	script
0.110	negative
0.085	film
0.084	audience
0.073	takes
0.069	good
0.056	gives
0.055	predictable
0.054	engaging
0.052	dialogue

1.000	movie
0.122	negative
0.072	positive
0.064	performances
0.057	little
0.053	make
0.049	kind
0.044	going
0.043	slow
0.037	great
0.037	want
0.036	better

1.000	really
0.644	time
0.634	hard
0.177	film
0.151	negative
0.139	tries
0.115	thing
0.089	right
0.084	say
0.082	interesting
0.072	kind
0.070	good

1.000	fun
0.123	watch
0.091	positive
0.064	family
0.060	subject
0.046	watching
0.036	going
0.035	make
0.034	takes
0.034	despite
0.030	film
0.026	comedy

1.000	lot
0.205	love
0.125	things
0.083	better
0.073	negative
0.071	thriller
0.061	material
0.047	people
0.045	positive
0.040	film
0.036	short
0.033	wit

1.000	best
0.245	year
0.115	positive
0.100	film
0.089	films
0.082	seen
0.079	thing
0.077	documentary
0.060	years
0.056	little
0.044	director
0.042	worst

1.000	story
0.132	love
0.102	positive
0.080	original
0.075	little
0.072	great
0.068	film
0.054	negative
0.052	way
0.051	wit
0.048	manages
0.047	age

1.000	movies
0.380	films
0.292	year
0.066	mess
0.065	positive
0.059	negative
0.054	worst
0.052	making
0.051	make
0.050	children
0.046	ultimately
0.046	watching

1.000	feels
0.758	feel
0.157	short
0.149	good
0.119	minutes
0.101	film
0.097	negative
0.092	young
0.085	life
0.072	book
0.063	way
0.060	ending

1.000	heart
0.150	mind
0.130	lacks
0.125	pretentious
0.104	positive
0.062	right
0.058	film
0.042	mess
0.041	charm
0.037	little
0.034	negative
0.032	way

In[]:=

lsaMovieReviews⟹LSAMonEchoStatisticalThesaurus["Words"->{"film","movie","director","bad","good"}];

statistical thesaurus:

term	statistical thesaurus entries
bad	{bad,idea,negative,certainly,acting,sort,cinema,good,entertainment,dialogue,actors,direction}
director	{director,girl,say,tale,world,art,times,visual,john,bit,narrative,self}
film	{film,good,way,look,thing,positive,sweet,surprisingly,comedy,smart,documentary,original}
good	{good,way,short,ending,director,young,predictable,simply,rare,cast,ultimately,thing}
movie	{movie,negative,performances,little,make,kind,going,slow,want,great,pretty,better}

Eat your own dog food

That is of course a different version of "each one should eat his own turtles".

"To sermon or not to sermon"


Q & A with study group at Stony Brook University, New York

In[]:=

WebImage["https://github.com/antononcube/MathematicaForPrediction/blob/master/Data/Big-Data-in-Healthcare/HHA-551-Questions-for-Anton.txt"]

Out[]=

In[]:=

WebImage["https://healthtechnology.stonybrookmedicine.edu/programs/ahi/course-schedule"]

Out[]=

Calculations

In[]:=

lsQuestions=Import["https://raw.githubusercontent.com/antononcube/MathematicaForPrediction/master/Data/Big-Data-in-Healthcare/HHA-551-Questions-for-Anton.txt"];lsQuestions=StringSplit[lsQuestions,"\n"];Length[lsQuestions]

Out[]=

In[]:=

RandomSample[lsQuestions,5]

Out[]=

{ Q3: What is the very first step you take when cleaning an enormous data set., 4. What kind of impact do you see AI having on the role of a data scientist in the next few years?, 2. All the data management tools are prone to manipulation by certain individuals to achieve an end. How well can the tools be designed to minimize such manipulation?, 2. How do visualizations on R Studio compare to Tableau?, 2. You mentioned you had a background working with recommendation engines such as the one on Netflix before working in healthcare, Has this helped you in the healthcare side of your career? Have you used or created anything similar to a recommendation engine in healthcare?}

In[]:=

lsaBigDataQuestions=
LSAMonUnit[lsQuestions]⟹
LSAMonMakeDocumentTermMatrix[{},Automatic]⟹
LSAMonApplyTermWeightFunctions["IDF","TermFrequency","Cosine"]⟹
LSAMonExtractTopics["NumberOfTopics"->8,"MinNumberOfDocumentsPerTerm"->2,Method->"NNMF",MaxSteps->20]⟹
LSAMonEchoTopicsTable["NumberOfTableColumns"->8]⟹
LSAMonFindMostImportantDocuments[12]⟹
LSAMonEchoFunctionValue[GridTableForm[#,TableHeadings->{"Score","Index","ID","Document"}]&];

topics table:

1.000	vice
1.000	versa
1.000	sometimes
1.000	perform
1.000	duties
0.963	analysts
0.768	scientist
0.698	data
0.113	scientists
0.073	years
0.071	mentioned
0.062	think

1.000	healthcare
0.975	working
0.895	recommendation
0.488	netflix
0.488	helped
0.488	engines
0.488	engine
0.488	created
0.447	career
0.447	similar
0.429	used
0.416	mentioned

1.000	anton
0.854	questions
0.002	program
0.002	statistical
0.001	utilize
0.001	project
0.001	present
0.000	make
0.000	data
0.000	best
0.000	learn
0.000	explain

1.000	scientist
0.974	analyst
0.761	data
0.658	future
0.574	healthcare
0.225	look
0.185	like
0.162	background
0.139	day
0.131	studio
0.080	clinical
0.078	process

1.000	etl
0.973	learning
0.925	curve
0.867	process
0.500	you’ve
0.500	standards
0.500	past
0.500	omop
0.500	cleaned
0.500	classes
0.465	read
0.462	steep

1.000	programming
0.931	languages
0.437	python
0.290	recommend
0.271	years
0.267	springboard
0.267	rstudio
0.267	prefer
0.233	learning
0.222	compared
0.188	scientist
0.173	mentioned

1.000	machine
1.000	contribute
1.000	capabilities
0.909	future
0.862	think
0.702	learning
0.058	education
0.054	scientists
0.054	experienced
0.053	day
0.051	help
0.043	work

1.000	level
1.000	entry
0.643	positions
0.466	essential
0.449	position
0.262	experience
0.245	make
0.230	tools
0.228	explain
0.225	science
0.222	kind
0.214	analytics

#	Score	Index	ID	Document
1	1.	38	id.038	10. When you were developing digital media recommendation algorithms, what feedback did you get? Did it help meaningfully improve the code? Did you find having more information about an individual improved retention rates of recommendations?
2	0.156755	35	id.035	7. How much emphasis do you place on coding techniques like scripting, especially to help process huge amounts of data within a reasonable time frame? Are there other techniques you use to help expedite data processing?
3	0.0507546	42	id.042	3. What skills does an individual require in order to be a success at data analytics?
4	0.0506242	48	id.048	2. You mentioned you had a background working with recommendation engines such as the one on Netflix before working in healthcare, Has this helped you in the healthcare side of your career? Have you used or created anything similar to a recommendation engine in healthcare?
5	0.0506242	55	id.055	2. You mentioned you had a background working with recommendation engines such as the one on Netflix before working in healthcare, Has this helped you in the healthcare side of your career? Have you used or created anything similar to a recommendation engine in healthcare?
6	0.0457088	37	id.037	9. Do you think upcoming data analysts and scientists should be more well-rounded in their education? If so, do you think this can help address some of the shortcomings experienced by current AI implementations?
7	0.0146716	31	id.031	3. In the hiring process, most organizations look out for experienced people or at least someone from a healthcare background. How can a fresh graduate with no healthcare background or sufficient experience tap into the informatics field?
8	0.00854905	56	id.056	3. In one of our other classes we are learning about the ETL process and OMOP standards, when you’ve cleaned data in the past have you used the ETL process? If so, How was it done through R?
9	0.00854905	49	id.049	3. In one of our other classes we are learning about the ETL process and OMOP standards, when you’ve cleaned data in the past have you used the ETL process? If so, How was it done through R?
10	0.00776817	69	id.069	2. What would a day in the life of an R Studio –using analyst look like?
11	0.00772403	34	id.034	6. Are there any up-and-coming programming languages we should be aware of, especially with regards to Big Data?
12	0.00657614	2	id.002	2. Is there anything besides R that you suggest students take time to learn individually?

Visualization with a bipartite graph

In[]:=

gr=lsaBigDataQuestions⟹LSAMonSetValue[None]⟹LSAMonNormalizeMatrixProduct[Normalized->Left]⟹LSAMonMakeGraph["Type"->"Bipartite","Thresholds"->{0.2,1}]⟹LSAMonTakeValue;

The red nodes are the questions, the blue nodes are the words.

In[]:=

HighlightGraph[Graph[gr,VertexLabels->"Name",EdgeStyle->{Gray,Opacity[0.4]}],Flatten[StringCases[VertexList[gr],"id."~~__]],ImageSize->1000]

Out[]=

Cite this as: Anton Antonov, "Simplified Machine-Learning Workflow #7" from the Notebook Archive (2020), https://notebookarchive.org/2020-09-55srdob

