[WSC19] Classifying Stocks by Their Return "Fingerprints"
Author: Aaliyah Sayed
Description: Using FeatureSpacePlot to visualize similarities between stock returns
Category: Essays, Posts & Presentations
Keywords: finance, camp, student, wolfram summer camp, data science, machine learning, stocks, economics, computer science, visualization
URL: http://www.notebookarchive.org/2019-07-5kbmexv/
DOI: https://notebookarchive.org/2019-07-5kbmexv
Date Added: 2019-07-12
Date Last Modified: 2019-07-12
File Size: 4 megabytes
Rights: Redistribution rights reserved
WOLFRAM SUMMER SCHOOL 2019
Classifying Stocks by Their Return “Fingerprints”
By Aaliyah Sayed
In the stock market, the returns (the percentage change in a stock's price from one day to the next) of any particular stock create a fingerprint unique to that stock. For example, the returns of Tesla and Johnson & Johnson are very different, since they are shaped by variable factors such as industry, size, and volatility. In this project, I used machine learning to analyze the “return fingerprint” of stocks in the S&P 500. Would the computer tell me that Facebook and Twitter are similar if I gave it no context? To isolate the return fingerprint from the rest of the variable factors, I trained the computer on nothing but DateListPlots of the returns. With these, I could analyze the impact of the variable factors, how people can use what they know about particular stocks to get a better overview of the market, and the accuracy and precision of the computer's results. I aimed to correctly group stocks by fingerprint (within the time frame of a year) and to analyze the correlation within the subgroups.
Importing Data
The Wolfram databases store a wide range of financial data, so I was able to import all of my data directly and without parsing. I imported the returns of all stocks in the S&P 500 and created a list of DateListPlots to store the graphs. I removed the axes, but the range of all of the plots is {-10, 10}.
The first line uses AssociationMap to build an association from each stock to its imported returns, and the second line turns each entry into a DateListPlot.
In[]:=
(* «…» stands for an entity box as displayed in the original notebook *)
moreReturns=AssociationMap[FinancialData[#,"Return","Jan. 1 2019"]&,EntityList[«S&P 500 | FINANCIAL ENTITIES»]];
In[]:=
stockReturnsPlot=DateListPlot[#,FrameTicks->None]&/@DeleteMissing[moreReturns];Take[stockReturnsPlot,5]
Out[]=
(DateListPlots, without tick labels, for 3M, Abbott Laboratories, AbbVie, Abiomed, and Accenture)
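To make the notion of a return concrete, here is a minimal sketch that computes day-to-day percentage changes from a short price list. The prices are made up for illustration, and the "Return" property of FinancialData may use a different convention (for example, fractional rather than percentage values), so treat this as an assumption rather than a description of what FinancialData does internally.

```wolfram
(* Hypothetical closing prices; not real market data *)
prices = {100., 102., 99.96, 104.958};

(* Percentage return between consecutive days:
   100 * (p[t] - p[t-1]) / p[t-1] *)
returns = 100 * Differences[prices] / Most[prices]
(* approximately {2., -2., 5.} *)
```

A sequence of such values over a year is the "fingerprint" that each DateListPlot draws.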
Initial Feature Space Plot
With all of the DateListPlots formatted correctly, I passed them to FeatureSpacePlot. I discovered some patterns when hovering over data points (Adobe and Microsoft are near each other), but without labels or color coding the plot does not show enough to visualize or comprehend the results.
In[]:=
FeatureSpacePlot[stockReturnsPlot]
Out[]=
Grouping By Sector
The next step was to obtain a list of industries and bucket the stocks by their industry.
The first two lines look up the industry of each stock and create a list of the distinct industries in the S&P 500. The third line buckets the stocks by industry.
In[]:=
snpSectorsAll=AssociationMap[EntityValue[#,"Sector"]&,Keys[moreReturns]];
In[]:=
plotSectors=Values[snpSectorsAll]//Union
Out[]=
{Advertising And Marketing Services, Aerospace And Defense, Airlines, Application Software, Autos, Banks, Beverages Alcoholic, Brokers And Exchanges, Building Materials, Business Services, Chemicals, Communication Equipment, Communication Services, Computer Hardware, Consulting And Outsourcing, Consumer Packaged Goods, Credit Services, Drug Manufacturers, Engineering And Construction, Entertainment, Health Care Providers, Homebuilding And Construction, Industrial Distribution, Industrial Products, Insurance, Insurance Specialty, Manufacturing Apparel And Furniture, Medical Diagnostics And Research, Medical Distribution, Metals And Mining, Personal Services, Real Estate Services, REI Ts, Restaurants, Retail Apparel And Specialty, Retail Defensive, Semiconductors, Tobacco Products, Transportation And Logistics, Travel And Leisure, Truck Manufacturing, Utilities Regulated, Missing[NotAvailable]}
In[]:=
stockSectorsAll=GroupBy[Normal[snpSectorsAll],Last->First];Take[stockSectorsAll,1]
Out[]=
<|Industrial Products -> {3M, Ametek, A.O. Smith, Avery Dennison, Cummins, Dover, Eaton, Emerson, Flowserve, General Electric, Honeywell, Illinois Tool Works, Ingersoll-Rand, Parker Hannifin, Pentair, Rockwell Automation, Roper Industries, Snap-On, Stanley Black & Decker, Xylem}|>
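The bucketing step relies on GroupBy with a key -> value specification: Last picks the sector out of each stock -> sector rule to use as the grouping key, and First keeps only the stock in each group. A minimal sketch on made-up tickers and sectors (the names "AAA", "BBB", and "CCC" are hypothetical, not from the imported data):

```wolfram
(* Hypothetical stock -> sector association *)
toySectors = <|"AAA" -> "Banks", "BBB" -> "Airlines", "CCC" -> "Banks"|>;

(* Group the rules by their right-hand side, keeping the left-hand sides *)
GroupBy[Normal[toySectors], Last -> First]
(* <|"Banks" -> {"AAA", "CCC"}, "Airlines" -> {"BBB"}|> *)
```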
Mapping Sectors to Colors
I created a color table of 43 colors:
In[]:=
colorTable=Table[Hue[n/50],{n,1,43}]
Out[]=
(list of 43 color swatches)
I then used AssociationThread to map each industry to a different color:
In[]:=
sectorsMapped=AssociationThread[Keys[stockSectorsAll]->colorTable]
Out[]=
(association mapping each of the 43 keys of stockSectorsAll, including Missing[NotAvailable], to a color swatch. Sector keys: Industrial Products, Drug Manufacturers, Application Software, Retail Apparel And Specialty, Semiconductors, Utilities Regulated, Medical Diagnostics And Research, REI Ts, Chemicals, Airlines, Consulting And Outsourcing, Credit Services, Tobacco Products, Insurance, Communication Services, Medical Distribution, Computer Hardware, Brokers And Exchanges, Autos, Consumer Packaged Goods, Metals And Mining, Business Services, Banks, Aerospace And Defense, Travel And Leisure, Beverages Alcoholic, Manufacturing Apparel And Furniture, Real Estate Services, Entertainment, Restaurants, Transportation And Logistics, Communication Equipment, Retail Defensive, Health Care Providers, Homebuilding And Construction, Insurance Specialty, Industrial Distribution, Engineering And Construction, Personal Services, Advertising And Marketing Services, Building Materials, Truck Manufacturing)
Initial Prototype
This code creates a placeholder Manipulate to simulate the user interface of my project. The toggler bar is displayed, but it does not yet affect the plot.
In[]:=
sample=Manipulate[FeatureSpacePlot[stockReturnsPlot],{sectors,plotSectors,ControlType->TogglerBar,Appearance->"Row"}]
Out[]=
$Aborted
Mapping Colors
Once I had the stocks mapped to the sectors and the sectors mapped to the colors, I wrote a function to map the stocks to the colors.
This function takes a particular stock as a parameter and looks it up first in snpSectorsAll and then in sectorsMapped to return the corresponding color.
In[]:=
stockColorFunction[a_]:=sectorsMapped[snpSectorsAll[a]]
This line gives me a list of the color values for each stock, without the stock names.
In[]:=
stockColorFunction/@Keys[moreReturns]
Out[]=
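The two-step lookup (stock to sector, then sector to color) can be sketched on made-up data. All names prefixed with toy are hypothetical and only mirror the roles of snpSectorsAll, sectorsMapped, and stockColorFunction:

```wolfram
(* Hypothetical stock -> sector association *)
toySectorOf = <|"AAA" -> "Banks", "BBB" -> "Airlines", "CCC" -> "Banks"|>;

(* Distinct sectors in sorted order: {"Airlines", "Banks"} *)
toySectors = Union[Values[toySectorOf]];

(* One hue per sector, spaced the same way as colorTable *)
toyColors = Table[Hue[n/50], {n, 1, Length[toySectors]}];
toyColorOf = AssociationThread[toySectors -> toyColors];

(* Stock -> sector -> color *)
toyStockColor[s_] := toyColorOf[toySectorOf[s]]

toyStockColor /@ Keys[toySectorOf]
(* {Hue[1/25], Hue[1/50], Hue[1/25]} *)
```

Because neighboring entries differ in hue by only 1/50, adjacent sectors in the table receive very similar colors.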
Ordered, Static Confetti
I then passed the colored DateListPlots to FeatureSpacePlot and got the following results: a blast of not-so-random confetti. In this plot, data points of similar colors cluster together in groups ranging from two to twelve points. This suggests that the machine places stocks from the same sector close together, even though it was given no sector information. The user can hover over a data point to view the stock and industry.
I used Tooltip to label each point with its stock name and sector when the mouse hovers over it.
In[]:=
FeatureSpacePlot[(Tooltip[Style[#2,stockColorFunction[#1]],Column[{#1,snpSectorsAll[#1]/._Missing->"Uncategorized"},Center]])&@@@Normal[stockReturnsPlot],LabelingFunction->None]
Out[]=
This Manipulate, whose toggler buttons do not yet do anything, calls on stockColorFunction and the stock names to color code the names.
In[]:=
Manipulate[Style[#["Name"],stockColorFunction[#]]&/@Keys[moreReturns],{stylizer,Reverse/@Normal[sectorsMapped],ControlType->TogglerBar,Appearance->"Row"}];
Dynamic Stock Manipulator
The next step was to make my plot dynamic, because the toggler buttons did not yet work. To do this, I had to break down my code and put aside the FeatureSpacePlot.
This code uses conditionals and MemberQ to determine if a stock is a member of a particular sector. All of the stock names are white to begin with, and toggling one of the sector buttons reveals all of the corresponding stocks.
In[]:=
Prototype=Manipulate[Style[#["Name"],If[MemberQ[stylizer,stockColorFunction[#]],stockColorFunction[#],White]]&/@Keys[moreReturns],{stylizer,Reverse/@Normal[sectorsMapped],ControlType->TogglerBar,Appearance->"Row"}]
Out[]=
(Manipulate output: a toggler bar with one button per sector, above the list of stock names)
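The highlighting rule inside the Manipulate can be isolated as a small function: a stock keeps its sector color only if that color is among the currently toggled ones, and is drawn in White otherwise. The helper name highlight is hypothetical and does not appear in the original notebook; this is only a sketch of the conditional.

```wolfram
(* Hypothetical helper: selected is the list of toggled colors *)
highlight[selected_List, color_] := If[MemberQ[selected, color], color, White]

highlight[{Red, Blue}, Red]    (* RGBColor[1, 0, 0], i.e. Red *)
highlight[{Red, Blue}, Green]  (* GrayLevel[1], i.e. White *)
highlight[{}, Red]             (* GrayLevel[1]: nothing is toggled initially *)
```

On a white background, White effectively hides an item, which is why all stock names start out invisible.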
Results!
Once I had my dynamic stock color manipulator, I put the code back into the FeatureSpacePlot to create an interactive stock sector visualizer. The user can toggle buttons to highlight the corresponding stocks.
In[]:=
Manipulate[FeatureSpacePlot[(Tooltip[Style[#2,If[MemberQ[stylizer,stockColorFunction[#]],stockColorFunction[#],White]],If[MemberQ[stylizer,stockColorFunction[#]],Column[{#1,snpSectorsAll[#1]/._Missing->"Uncategorized"},Center]]])&@@@Normal[stockReturnsPlot],LabelingFunction->None],{stylizer,Reverse/@Normal[sectorsMapped],ControlType->TogglerBar,Appearance->"Row"}]
Out[]=
(Manipulate output: a toggler bar with one button per sector, above the feature space plot)
Rasterized
As a small extension, I rasterized the DateListPlots to see how the machine would react to pictures instead of plots. This result would ideally be more similar to how humans would compare two stocks, as we can’t analyze individual data points in a graph.
In[]:=
FeatureSpacePlot[(Tooltip[Style[#2,stockColorFunction[#1]],Column[{#1,snpSectorsAll[#1]/._Missing->"Uncategorized"},Center]])&@@@Normal[Rasterize/@stockReturnsPlot],LabelingFunction->None]
Out[]=
Condensing the Sectors
Although I achieved the results I wanted, the interactive plot was not very user friendly. Some industries, such as Banks and Insurance, had completely different colors despite belonging to the same sector. In addition, forty-three toggler buttons were too many. After remapping the industries to sectors, I created an association to assign each sector its own color. The final plot is a lot more user friendly, and it is easier to read the data.
The “Sector” property of FinancialData returns one of many fine-grained industries (42 appear in this data), so I manually condensed these down to commonly known market sectors (ten appear in the code below) plus one category for missing sector data.
In[]:=
(* «Name» stands for the sector entity displayed as "Name | FINANCIAL ENTITIES" in the original notebook *)
twelveStockSectors=<|
Flatten[stockSectorsAll/@{«Entertainment»,«Communication Equipment»,«Communication Services»}]->"Communication Services",
Flatten[stockSectorsAll/@{«Travel And Leisure»,«Advertising And Marketing Services»,«Personal Services»,«Manufacturing Apparel And Furniture»,«Homebuilding And Construction»,«Restaurants»,«Autos»,«Retail Apparel And Specialty»}]->"Consumer Discretionary",
Flatten[stockSectorsAll/@{«Retail Defensive»,«Beverages Alcoholic»,«Consumer Packaged Goods»,«Tobacco Products»}]->"Consumer Staples",
Flatten[stockSectorsAll/@{«Insurance Specialty»,«Banks»,«Brokers And Exchanges»,«Insurance»,«Credit Services»,«REI Ts»}]->"Financials",
Flatten[stockSectorsAll/@{«Health Care Providers»,«Medical Distribution»,«Medical Diagnostics And Research»,«Drug Manufacturers»}]->"Healthcare",
Flatten[stockSectorsAll/@{«Truck Manufacturing»,«Engineering And Construction»,«Industrial Distribution»,«Transportation And Logistics»,«Aerospace And Defense»,«Business Services»,«Airlines»,«Consulting And Outsourcing»,«Industrial Products»}]->"Industrials",
Flatten[stockSectorsAll/@{«Computer Hardware»,«Semiconductors»,«Application Software»}]->"Information Technology",
Flatten[stockSectorsAll/@{«Building Materials»,«Metals And Mining»,«Chemicals»}]->"Materials",
stockSectorsAll[«Real Estate Services»]->"Real Estate",
stockSectorsAll[«Utilities Regulated»]->"Utilities",
stockSectorsAll[Missing["NotAvailable"]]->"Unclassed"|>
This creates a table of 11 distinct, easy-to-see colors, one for each sector name defined above (including "Unclassed").
In[]:=
smallColorTable={RGBColor[0.11,0.84,0.85],RGBColor[0.7,0.34,0.8],RGBColor[0.86,0.1,0.15],RGBColor[0.08,0.7,0.15],RGBColor[0.54,0.5,0.95],RGBColor[0.38,0.34,0.35],RGBColor[0.93,0.44,0.07],RGBColor[0.56,0.32,0.],RGBColor[0.09,0.22,0.94],RGBColor[0.86,0.83,0.06],RGBColor[0.79,0.45,0.59]}
Out[]=
(list of 11 color swatches)
In[]:=
condensedSectorsMapped=AssociationThread[Values[twelveStockSectors]->smallColorTable]
Out[]=
(association mapping each sector name to a color swatch: Communication Services, Consumer Discretionary, Consumer Staples, Financials, Healthcare, Industrials, Information Technology, Materials, Real Estate, Utilities, Unclassed)
This maps all of the stocks to their new sector.
In[]:=
snpSectorsAll2=Association@Flatten[Function[{stockNames,sectorName},#->sectorName&/@stockNames]@@@Normal[twelveStockSectors]];
This is the updated stockColorFunction.
In[]:=
stockColorFunction2[a_]:=condensedSectorsMapped[snpSectorsAll2[a]]
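The construction of snpSectorsAll2 inverts an association whose keys are lists of stocks and whose values are sector names, producing one stock -> sector rule per stock. A minimal sketch of the same pattern on made-up data (toyGroups and toyLookup are hypothetical names):

```wolfram
(* Hypothetical grouped data: list of stocks -> sector name *)
toyGroups = <|{"AAA", "CCC"} -> "Financials", {"BBB"} -> "Industrials"|>;

(* @@@ feeds each rule's two sides to the function as
   stockNames and sectorName; Flatten merges the per-sector lists *)
toyLookup = Association @ Flatten[
   Function[{stockNames, sectorName},
     # -> sectorName & /@ stockNames] @@@ Normal[toyGroups]]
(* <|"AAA" -> "Financials", "CCC" -> "Financials", "BBB" -> "Industrials"|> *)

toyLookup["CCC"]
(* "Financials" *)
```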
New and Improved
I replotted the data points, but with the new color mapping. The results are a lot more user friendly, and it is easier to spot patterns.
In[]:=
alright=Manipulate[FeatureSpacePlot[(Tooltip[Style[#2,If[MemberQ[sectors,stockColorFunction2[#]],stockColorFunction2[#],White]],If[MemberQ[sectors,stockColorFunction2[#]],Column[{#1,snpSectorsAll2[#1]/._Missing->"Uncategorized"},Center]]])&@@@Normal[stockReturnsPlot],LabelingFunction->None,PlotStyle->PointSize[Medium]],{sectors,Reverse/@Normal[condensedSectorsMapped],ControlType->TogglerBar,Appearance->"Row"}]
Out[]=
(Manipulate output: a toggler bar with one button per condensed sector, above the feature space plot)
Here, the Financials, Information Technology, and Utilities sectors are selected. Stocks from the same sector sit visibly close together: all of the yellow Utilities stocks cluster on the left, while the Financials and Information Technology stocks cluster in smaller subgroups all around the plot.
Future Work
I focused on the visualization of stocks by sector for a fixed time frame, but there are many ways to expand the scope and give more comprehensive results. One such extension is adjusting the size of the dots to indicate market capitalization. This would show additional correlation, as one would expect larger stocks to cluster together. Another extension is to create an adjustable timeframe to see the market change over a period of time. This would be great to visualize the dynamic market, as well as to predict where the market is headed.
Cite this as: Aaliyah Sayed, "[WSC19] Classifying Stocks by Their Return "Fingerprints"" from the Notebook Archive (2019), https://notebookarchive.org/2019-07-5kbmexv