Simplified Machine-Learning Workflow #6
Author: Anton Antonov
Title: Simplified Machine-Learning Workflow #6
Description: Semantic Analysis (Part 1)
Category: Educational Materials
Keywords:
URL: http://www.notebookarchive.org/2020-09-55sqlz6/
DOI: https://notebookarchive.org/2020-09-55sqlz6
Date Added: 2020-09-11
Date Last Modified: 2020-09-11
File Size: 430.56 kilobytes
Supplements:
Rights: Redistribution rights reserved



Latent Semantic Analysis (Part 1)
A Wolfram livecoding session
Anton Antonov
December 2019
This LSA course proceeds through the following questions:
- What data is going to be used?
- What is a typical workflow example?
- What are the typical applications of LSA?
- Why use LSA?
- What is the fundamental philosophical or scientific assumption for LSA?
- What is the most important and/or fundamental step of LSA?
- What is the difference between LSA and Latent Semantic Indexing (LSI)?
- What are the alternatives?
- Using Neural Networks instead?
- How is LSA used to derive similarities between two given texts?
- How is LSA used to evaluate the proximity of phrases?
- How do the main dimension reduction methods compare?
SVD, NNMF, ICA
What data is going to be used?
Hamlet
USA State of the Union speeches
Airbnb reviews
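The Hamlet corpus used below, aTextHamlet, is an association of 223 text fragments keyed by identifiers such as "id.0038". Here is a minimal sketch (not the notebook's own preparation code) of how such a corpus could be assembled from the built-in example data; the chunking rule and the name aTextHamletSketch are assumptions, so the number of fragments may differ.
(* Sketch: build a Hamlet corpus as an association of text chunks keyed by document identifiers. *)
hamletText = ExampleData[{"Text", "Hamlet"}];
hamletChunks = Select[StringSplit[hamletText, "\n\n"], StringLength[#] > 40 &];
aTextHamletSketch = AssociationThread[
    Map["id." <> IntegerString[#, 10, 4] &, Range[Length[hamletChunks]]],
    hamletChunks];
Length[aTextHamletSketch]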
What is a typical workflow example?
In[]:=
lsaHamlet = ToLSAMonWLCommand["create from aTextHamlet; create document term matrix; echo document term summary; apply LSI functions IDF, None, Cosine; extract 12 topics using NNMF, max steps 20, and min number of documents per term 10; echo topics table with 6 columns; echo statistical thesaurus for ghost, king, castle", True];
»
Context value "documentTermMatrix":
Dimensions: {223, 4440} | Density: 0.0111603
»
topics table:
(table of the top terms of the 12 extracted topics, shown in 6 columns)
»
statistical thesaurus:
term | statistical thesaurus entries |
castle | {castle,elsinore,scene,room,act,state,near,welcome,wind,hands,gentlemen,goes} |
ghost | {ghost,father,poor,heaven,blood,revenge,thy,bear,spirit,bed,away,word} |
king | {king,polonius,lord,shall,know,hath,father,dead,tis,laer,think,make} |
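For reference, ToLSAMonWLCommand translates the DSL specification above into an explicit LSAMon pipeline roughly along the lines of the following sketch. The function and option names here are assumptions based on the LSAMon package [AA1] and may differ between package versions.
(* Sketch of the generated pipeline; names of functions and options are assumed, per [AA1]. *)
lsaHamletSketch =
  LSAMonUnit[aTextHamlet]⟹
   LSAMonMakeDocumentTermMatrix⟹
   LSAMonEchoDocumentTermMatrixStatistics⟹
   LSAMonApplyTermWeightFunctions["IDF", "None", "Cosine"]⟹
   LSAMonExtractTopics[12, Method -> "NNMF", "MaxSteps" -> 20, "MinNumberOfDocumentsPerTerm" -> 10]⟹
   LSAMonEchoTopicsTable["NumberOfTableColumns" -> 6]⟹
   LSAMonEchoStatisticalThesaurus["Words" -> {"ghost", "king", "castle"}];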
What are the typical applications of LSA?
Text feature engineering (to be used with machine learning).
Document clustering.
Document summarization.
In[]:=
aTextHamlet//Length
Out[]=
223
In[]:=
lsaHamlet⟹LSAMonTakeW
Out[]=
SparseArray (documents × topics factor W)
In[]:=
MatrixPlot[lsaHamlet⟹LSAMonTakeW]
Out[]=
(MatrixPlot of the topic factor W)
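As one direct application example, the rows of W give per-document topic weights, so documents can be clustered in the reduced topic space. This is a minimal sketch, not part of the original session; the choice of 4 clusters and the cosine distance are arbitrary.
(* Sketch: cluster the Hamlet fragments by their rows in the topic factor W. *)
matW = Normal[lsaHamlet⟹LSAMonTakeW];
docIDs = Keys[aTextHamlet];   (* assumes the row order of W matches the corpus keys *)
docClusters = FindClusters[matW -> docIDs, 4, DistanceFunction -> CosineDistance];
Map[Length, docClusters]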
In[]:=
?FeatureExtraction
Out[]=
(usage message for the FeatureExtraction function)
In[]:=
lsaHamlet⟹LSAMonTakeDocumentTermMatrix
Out[]=
SparseArray (the 223 × 4440 document-term matrix)
In[]:=
MatrixPlot[lsaHamlet⟹LSAMonTakeDocumentTermMatrix]
Out[]=
(MatrixPlot of the document-term matrix)
In[]:=
MatrixForm[(lsaHamlet⟹LSAMonTakeDocumentTermMatrix)〚30;;42,1000;;1042〛]
Out[]//MatrixForm=
departed | depends | depriv | deprive | desert | deserve | deserved | design | desire | desires | desirous | desk | desp | desperate | desperation | despis | despite | destroy | detecting | determination | determine | device | devices | devil | devis | devise | devotion | devoutly | dew | dews | dexterity | diadem | diameter | dicers | diction | didest | dido | didst | die | died | [dies | dies | diet | |
id.0030 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
id.0031 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
id.0032 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
id.0033 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
id.0034 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
id.0035 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
id.0036 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
id.0037 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
id.0038 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
id.0039 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
id.0040 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
id.0041 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
id.0042 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
In[]:=
aTextHamlet["id.0038"]
Out[]=
Mar. Lord Hamlet! Hor. Heaven secure him! Ham. So be it! Mar. Illo, ho, ho, my lord! Ham. Hillo, ho, ho, boy! Come, bird, come. Mar. How is't, my noble lord? Hor. What news, my lord? Mar. O, wonderful! Hor. Good my lord, tell it. Ham. No, you will reveal it. Hor. Not I, my lord, by heaven! Mar. Nor I, my lord. Ham. How say you then? Would heart of man once think it? But you'll be secret? Both. Ay, by heaven, my lord. Ham. There's neer a villain dwelling in all Denmark But he's an arrant knave. Hor. There needs no ghost, my lord, come from the grave To tell us this. Ham. Why, right! You are in the right! And so, without more circumstance at all, I hold it fit that we shake hands and part; You, as your business and desires shall point you, For every man hath business and desire, Such as it is; and for my own poor part, Look you, I'll go pray. Hor. These are but wild and whirling words, my lord. Ham. I am sorry they offend you, heartily; Yes, faith, heartily. Hor. There's no offence, my lord. Ham. Yes, by Saint Patrick, but there is, Horatio, And much offence too. Touching this vision here, It is an honest ghost, that let me tell you. For your desire to know what is between us, O'ermaster't as you may. And now, good friends, As you are friends, scholars, and soldiers, Give me one poor request. Hor. What is't, my lord? We will. Ham. Never make known what you have seen to-night. Both. My lord, we will not. Ham. Nay, but swear't. Hor. In faith, My lord, not I. Mar. Nor I, my lord- in faith. Ham. Upon my sword. Mar. We have sworn, my lord, already. Ham. Indeed, upon my sword, indeed.
Why use LSA?
Fast computations using sparse matrix algebra algorithms.
Easier interpretation of the representation (from a mathematical and statistical point of view).
A choice of different dimension reduction algorithms.
What is the fundamental philosophical or scientific assumption for LSA?
What is the most important and/or fundamental step of LSA?
What is the difference between LSA and Latent Semantic Indexing (LSI)?
What are the alternatives?
How is LSA used to derive similarities between two given texts?
In[]:=
text = StringJoin[txt1, txt2];
sentences = StringSplit[text, {".", "?", "!", ";"}];
In[]:=
fed=FeatureExtraction[DeleteStopwords@sentences,"TFIDF","FeatureDistance"]
Out[]=
FeatureDistance (a distance function based on the extracted TF-IDF features)
In[]:=
fed[txt1,txt2]
Out[]=
0.333781
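For comparison, here is a minimal self-contained sketch of the same idea without FeatureExtraction: represent each text as a bag-of-words vector over a shared vocabulary and compare the vectors with the cosine distance. The helper names are hypothetical and term weighting such as IDF is omitted for brevity.
(* Hypothetical helpers: bag-of-words vectors and their cosine distance. *)
toBag[t_String] := Counts[DeleteStopwords[TextWords[ToLowerCase[t]]]];
bagDistance[t1_String, t2_String] :=
  Module[{b1 = toBag[t1], b2 = toBag[t2], vocab},
   vocab = Union[Keys[b1], Keys[b2]];
   N@CosineDistance[Lookup[b1, vocab, 0], Lookup[b2, vocab, 0]]];
bagDistance[txt1, txt2]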
How is LSA used to evaluate the proximity of phrases?
How do the main dimension reduction methods compare?
SVD
In[]:=
pnts = Transpose[Table[RandomReal[NormalDistribution[RandomReal[{0.3, 14}], RandomReal[{0.5, 6}]], 600], {3}]];
pnts = pnts.RotationMatrix[π/3, {1, 1, 1}];
In[]:=
{U,S,V}=SingularValueDecomposition[pnts,3];
In[]:=
MatrixForm[Transpose[V]]
Out[]//MatrixForm=
-0.529181 | -0.736016 | -0.422194 |
0.825513 | -0.561638 | -0.0555941 |
-0.196202 | -0.377946 | 0.904799 |
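As a quick sanity check (not in the original session), the three SVD factors should reproduce the data matrix up to numerical round-off:
(* Frobenius norm of the SVD reconstruction error; should be close to zero. *)
Norm[pnts - U.S.Transpose[V], "Frobenius"]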
In[]:=
With[{c=15},Graphics3D[{Point[pnts],Red,Line[{{0,0,0},#/c}]&/@(S.Transpose[V])},PlotRange->{{-c,c},{-c,c},{-c,c}},Axes->True]]
Out[]=
(3D plot of pnts with red lines along the scaled singular directions S.Transpose[V])
In[]:=
qs = (#〚3〛 - #〚1〛 &) /@ Table[Quartiles[pnts〚All, i〛], {i, Dimensions[pnts]〚2〛}];
cpnts = Transpose[Table[(pnts〚All, i〛 - Median[pnts〚All, i〛])/qs〚i〛, {i, Dimensions[pnts]〚2〛}]];
medianPoint = Median /@ Transpose[pnts];
In[]:=
cpnts=Transpose[Standardize/@Transpose[pnts]];
In[]:=
{U1,S1,V1}=SingularValueDecomposition[cpnts,3];
In[]:=
MatrixForm[Transpose[V1]]
Out[]//MatrixForm=
-0.61962 | 0.734069 | 0.277874 |
-0.503743 | -0.100418 | -0.857997 |
-0.601926 | -0.671609 | 0.432003 |
In[]:=
With[{c=4},Graphics3D[{Point[cpnts],Red,Arrow[{{0,0,0},#/6}]&/@(S1.Transpose[V1])},PlotRange->{{-c,c},{-c,c},{-c,c}},Axes->True]]
Out[]=
(3D plot of the standardized points cpnts with red arrows along S1.Transpose[V1])
NNMF
In[]:=
{W,H}=GDCLS[pnts,2];
In[]:=
MatrixForm[H]
Out[]//MatrixForm=
44.731 | 2.64025 | 10.6537 |
0. | 16.0886 | 6.71657 |
In[]:=
With[{c=40},Graphics3D[{Point[pnts],Blue,Arrow[{{0,0,0},#}]&/@Normal[H]},PlotRange->{{-c,c},{-c,c},{-c,c}},Axes->True]]
Out[]=
(3D plot of pnts with blue arrows along the rows of H)
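Since GDCLS returns a rank-2 nonnegative factorization, a quick informal way to judge how well two components capture this 3D point cloud is the relative reconstruction error. This check is not in the original session; it only assumes that W.H has the same dimensions as pnts.
(* Relative Frobenius error of the rank-2 NNMF approximation W.H of pnts. *)
Norm[pnts - W.H, "Frobenius"]/Norm[pnts, "Frobenius"]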
ICA
In[]:=
{A,S}=ResourceFunction["IndependentComponentAnalysis"][pnts,2];
In[]:=
Dimensions[A]
Out[]=
{600,2}
In[]:=
MatrixForm[S]
Out[]//MatrixForm=
-2.19859 | 0.356863 | -3.4822 |
2.2057 | -3.34072 | -2.33556 |
In[]:=
With[{c=30},Graphics3D[{Point[pnts],Blue,Arrow[{{0,0,0},#}]&/@(5*S)},PlotRange->{{-c,c},{-c,c},{-c,c}},Axes->True]]
Out[]=
(3D plot of pnts with blue arrows along the scaled ICA components 5*S)
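The resource function returns a mixing matrix A (600 × 2) and a components matrix S (2 × 3), so A.S has the same shape as the data. Whether it approximates the raw or the mean-centered points depends on the function's convention, which is not fixed here; the following rough check (not in the original session) simply compares against both.
(* Compare the rank-2 ICA reconstruction A.S with pnts and with the mean-centered points. *)
centered = # - Mean[pnts] & /@ pnts;
{Norm[pnts - A.S, "Frobenius"], Norm[centered - A.S, "Frobenius"]}/Norm[pnts, "Frobenius"]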
References
[AA1] Anton Antonov, A monad for Latent Semantic Analysis workflows, (2019), MathematicaForPrediction at GitHub.


Cite this as: Anton Antonov, "Simplified Machine-Learning Workflow #6" from the Notebook Archive (2020), https://notebookarchive.org/2020-09-55sqlz6


