Data Science with Andreas Lauschke (#4)
Author
Andreas Lauschke
Title
Data Science with Andreas Lauschke (#4)
Description
Association and the Dataset functions
Category
Educational Materials
Keywords
URL
http://www.notebookarchive.org/2020-09-4lm7z60/
DOI
https://notebookarchive.org/2020-09-4lm7z60
Date Added
2020-09-10
Date Last Modified
2020-09-10
File Size
98.54 kilobytes
Supplements
Rights
Redistribution rights reserved
Associations and Dataset, Part 3
Andreas Lauschke, June 4 2019
We started with a gentle introduction to Associations and Dataset. Today we finish up the assos; next time we finish up Dataset, along with an intro to Query.
today’s session: finishing up on Associations
next session: finishing up on Dataset; introduction of Query; free data: web scraping (“traditional”, HTML, XML, and with the new WebExecute from M12)
Associations
Select, Cases, DeleteCases, Scan, Map
Select works on association *values*:
In[]:=
Select[<|a->1,b->2,c->3,d->4|>,#>2&]
application: Select only those rows for which the row-totals of the HilbertMatrix meet a certain criterion:
In[]:=
(* turn 5x5 HilbertMatrix into row-indexed asso *)
temp=With[{length=5},{Range@length,HilbertMatrix@length}];
asso=AssociationThread[temp[[1]]->temp[[2]]]
In[]:=
(* select rows based on row total *)
Select[asso,Total@#>1&]
let’s check:
In[]:=
N@%
In[]:=
asso//Total//N
Selecting on values means the keys are irrelevant (squares can never be prime):
In[]:=
Select[Association[Table[i^2->i,{i,20}]],PrimeQ]
Same for Cases: it works on *values*.
In[]:=
Cases[<|a->1,b->2,c->x,d->y,e->Pi,Red->Blue,Yellow->Green|>,_Integer|_RGBColor]
define a new asso:
In[]:=
asso=<|a->Red,b->{Green,Red,Blue},c->{Yellow,Red},d->{1,2,Pi},e->"Hello",f->HilbertMatrix@2,"Good"->"Morning"|>
taking all sublists from that asso gives
In[]:=
Cases[asso,_List]
and taking all 2-element sublists gives:
In[]:=
Cases[asso,{_,_}]
The pattern matcher only applies on the first level ...
In[]:=
Cases[<|1->c,2-><|3->1,a->b|>|>,_Symbol]
Cases[<|1->c,c-><|c->c,c->c|>|>,_Symbol]
Cases[<|1->c,c-><|c->c,c->c|>|>,_Association]
... unless you specify deeper:
In[]:=
Cases[<|1->c,2-><|3->1,a->b|>|>,_Symbol,Infinity]
Cases[<|1->c,c-><|c->c,c->c|>|>,_Symbol,Infinity]
Cases[<|1->c,c-><|c->c,c->c|>|>,_Association,Infinity]
Important difference between Select and Cases: Select returns the whole matching rules of the Association (selected by their *values*), whereas Cases returns only the matching *values*.
In[]:=
asso=<|1->1,3->2,5->3,7->4,9->5,11->6|>;
Select[asso,EvenQ]
Cases[asso,_?EvenQ]
mini-application: salary filtering
In[]:=
staff=AssociationThread[{"Stacy","Carol","Trish","Mike","Donald","Chris","Ralph","Oliver","Keenan","Peter","Camilla"}->Range[100,200,10]]
In[]:=
Select[staff,#>150&]
Cases[staff,_?(#>150&)]
However, subsequent functions operate on the values (even though part of the return of Select was the keys!!!):
In[]:=
Select[staff,#>150&]//Total
Cases[staff,_?(#>150&)]//Total
You can also use the Associations in the pattern specification:
In[]:=
listofassos={<|a->1|>,<|b->2|>,<|c->2|>};
(* key a goes to something *)
Cases[listofassos,<|a->_|>]
(* anything goes to 2 *)
Cases[listofassos,<|_->2|>]
note that localization is preserved (num is green and slanted, your global num will be unaffected):
In[]:=
num=55;
Cases[{<|a->1|>,<|b->2|>,<|c->2|>},<|num_->2|>]
num
the pattern matcher imposes no limitations on Associations. In fact, the key-value nature of the asso makes the pattern matcher more expressive. This finds all cases where “anything goes to 1” is in the second Association element:
In[]:=
Cases[{<|b->1,z->1,a->1|>,<|c->1,y->2,aaa->7|>,<|aa->2,bb->3,cc->4|>,<|aa->2,foo->1,bb->Pi/2|>},<|_,_->1,_|>]
Condition is a powerful "enhancer" of Cases: here we take all Associations for which the key meets a certain criterion (the key!!! that’s key here!!!).
In[]:=
listofassos=<|#->RandomInteger@#|>&/@Range@20;
Cases[listofassos,<|a_/;EvenQ[a]->_|>]
The above basically says: give me all Cases in which the key, which must be even, goes to anything.
But obviously we can also use Condition on a value:
In[]:=
Cases[listofassos,<|_->a_/;EvenQ[a]|>]
note that the above has a whole asso in the pattern matcher! Therefore you get a list of assos. This is handy when you need to match on the keys and values *across* different assos!
but often we want to pattern-match over the elements of an asso, and if the keys in the lists of assos are unique, we can simply throw Association around the list:
In[]:=
aa=Cases[listofassos,_?EvenQ]
bb=Cases[Association@listofassos,_?EvenQ]
cc=Select[listofassos,EvenQ]
dd=Select[Association@listofassos,EvenQ]
Values@dd===bb
observe:
In[]:=
Association@{<|1->10|>,<|2->11|>,<|3->12|>,<|4->13|>,<|5->14|>}
so these should result in the same:
In[]:=
(* asso *)
Cases[<|1->10,2->11,3->12,4->13,5->14|>,_?EvenQ]
(* asso of List of assos *)
Cases[Association@{<|1->10|>,<|2->11|>,<|3->12|>,<|4->13|>,<|5->14|>},_?EvenQ]
but this can’t work if the keys in the multiple assos are not unique (last one prevails):
In[]:=
Association@{<|1->10|>,<|1->11|>}
Association@{<|1->10|>,<|1->11|>,<|1->12|>,<|1->"hello"|>,<|1->27|>,<|1->93|>}
so this can be a problem or a solution. If you use it judiciously and *intend* to overwrite earlier keys, this is a very efficient way to do that. But if keys are not unique, this is *not* a proper way to merge assos into one.
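A minimal sketch of the difference, using made-up data (the name dupes and its contents are assumptions for illustration, not from the session): wrapping Association around the list keeps only the last value per key, whereas Merge with Identity (covered further below) keeps every value.

```wolfram
(* hypothetical sample data: the key 1 occurs twice *)
dupes = {<|1 -> 10|>, <|1 -> 11|>, <|2 -> 12|>};

(* Association: for a duplicate key the last value prevails *)
Association@dupes
(* <|1 -> 11, 2 -> 12|> *)

(* Merge with Identity: all values are kept, collected in one list per key *)
Merge[dupes, Identity]
(* <|1 -> {10, 11}, 2 -> {12}|> *)
```

So if the overwriting is not intended, Merge is probably the safer choice, at the price of getting lists as values.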
obviously things may sometimes be easier with Select (because it just checks on a true/false level). Note the diff:
In[]:=
(* list of assos *)
Select[{<|1->10|>,<|2->11|>,<|3->12|>,<|4->13|>,<|5->14|>},First@*EvenQ]
Select[{<|1->10|>,<|2->11|>,<|3->12|>,<|4->13|>,<|5->14|>},EvenQ/*First]
Select[{<|1->10|>,<|2->11|>,<|3->12|>,<|4->13|>,<|5->14|>},EvenQ@#[[1]]&]
(* asso *)
Select[Association@{<|1->10|>,<|2->11|>,<|3->12|>,<|4->13|>,<|5->14|>},EvenQ]
Select[<|1->10,2->11,3->12,4->13,5->14|>,EvenQ]
Listable Operations
Associations are Listable (in their values): Listable functions thread over the values. So for example you can add, multiply, take the log or sqrt of, or exponentiate Associations.
In[]:=
<|"a"->1,"c"->2|>+<|"a"->10,"c"->-7|>
<|"a"->1,"c"->2|>*<|"a"->10,"c"->-7|>
<|"a"->1,"c"->2|>*Log@<|"a"->10,"c"->-7|>
<|"a"->1,"c"->2|>*Sqrt@<|"a"->10,"c"->-7|>
<|"a"->1,"c"->2|>^<|"a"->10,"c"->-7|>
If the keys don’t match, you can’t perform the operation properly. You can’t add “b”->2 and “c”-> -7:
In[]:=
<|"a"->1,"b"->2|>+<|"a"->10,"c"->-7|>
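One way around mismatched keys, sketched under assumptions (the names left and right are made up for this illustration): Merge, covered further below, collects the values per key and then combines them, so a key that occurs in only one asso simply keeps its value.

```wolfram
(* hypothetical names for the two assos from the cell above *)
left = <|"a" -> 1, "b" -> 2|>;
right = <|"a" -> 10, "c" -> -7|>;

(* Merge gathers the values per key, Total adds up each collection *)
Merge[{left, right}, Total]
(* <|"a" -> 11, "b" -> 2, "c" -> -7|> *)
```

Whether a missing key should count as 0 (as Total effectively does here) depends on your data, so treat this as one possible convention.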
You can apply Listable functions directly to the Association. It operates (“auto-maps”) on the values (no changes to the keys). Compare:
In[]:=
asso=<|foo->1.0,bar->2.0,bass->0,halibut->2Pi,trout->Pi/2,carp->Pi/4,pike->Pi,grouper->3/2Pi|>;
Sin[asso]
#^7&/@asso
Sin and #^7 are very similar here: Sin has the attribute Listable set, #^7 doesn’t, but we apply it as a pure function in the mapping operation. So, bingo.
In[]:=
Attributes@Sin
in case I didn’t make it clear: Mapping and function application happen *only* on the values -- so we can also do simple function arithmetic:
In[]:=
rule=Range@10->Range[11,20];
AssociationThread[rule]
AssociationThread[rule] 12
Association-Specific Operations
Several functions return their results as Associations:
In[]:=
CountsBy[Range[100],PrimeQ]
In[]:=
PositionIndex[{a,b,c,a,c,a}] (* elements become keys, positions become values! *)
In[]:=
(* group by second elements, with the keys being the elements *)
GroupBy[{{a,b},{a,c},{b,c},HilbertMatrix@2,{Red,Green}},Part[#,2]&]
In[]:=
(* group by values of an asso *)
GroupBy[<|a->1,b->2,c->3|>,EvenQ]
In[]:=
GroupBy[Range@100,PrimeQ]
In[]:=
Merge[{<|a->1,b->2|>,<|a->5,b->10|>},g] (* keep g symbolic *)
In[]:=
(* pick a specific g *)
listofassos={<|a->1,b->2|>,<|a->5,b->10|>};
Merge[listofassos,Total]
Merge[listofassos,#[[1]]^#[[2]]&]
Merge[listofassos,N@*Sin]
Merge[listofassos,#^5&]
In[]:=
(* apply no function at all, I think of this as "reshuffle" *)
Merge[{<|a->1,b->2|>,<|a->5,b->10|>},Identity]
Merge[{<|a->1,b->2|>,<|b->4,c->5|>},Identity]
mini-application of the simplest version (Identity): who works where after the merger of ABC and XYZ companies?
In[]:=
abc=<|"IT"->"Chris","Accounting"->"Stacey","Legal"->"Mike","HR"->"Peggy","backups"->"Jerome"|>
In[]:=
xyz=<|"Accounting"->"Klaus","IT"->"Jason","HR"->"Myrtle","Legal"->"Jim","Security"->"Paul"|>
after the merger:
In[]:=
Merge[{abc,xyz},Identity]
KeySelect, KeyTake, KeyUnion, KeyDrop, KeyMap,...
self-explanatory. Very useful, because most functions operate on the values, not the keys. So if you want an operation to operate on the keys, you need to use the Key... functions.
Compare:
In[]:=
asso=<|a->b,c->d|>;
KeyMap[g,asso] (* means: map only on the keys! Don't touch values! *)
Map[g,asso]
g/@asso (* equiv, /@ is shortcut for Map *)
you can see, with KeyMap g wraps around the keys, with Map g wraps around the values.
Compare:
In[]:=
asso=Association[<|#[[1]]->#[[2]]|>&/@RandomInteger[20,{20,2}]]
(* select based on keys! all keys will be odd! *)
KeySelect[asso,OddQ]
(* select based on values! all values will be odd! *)
Select[asso,OddQ]
In[]:=
KeySelect[asso,PrimeQ] (* all keys will be prime *)
Select[asso,PrimeQ] (* all values will be prime *)
you can see, with KeySelect the criterion wraps around the keys, with Select it wraps around the values.
this Key<something> logic percolates through (KeyComplement, KeyDrop, KeyDropFrom, KeyExistsQ, KeyFreeQ, KeyIntersection, KeyMap, KeyMemberQ, KeySelect, KeySort, KeySortBy, KeyTake, KeyUnion, KeyValueMap, KeyValuePattern):
In[]:=
asso2=Association[<|#[[1]]->#[[2]]|>&/@RandomInteger[20,{20,2}]]
In[]:=
KeyComplement[{asso,asso2}]
KeyExistsQ[asso,5]
KeyFreeQ[asso,5]
KeyIntersection[{asso,asso2}]
KeyMemberQ[asso,5]
KeyUnion[{asso,asso2}]
KeySort@asso
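A few of the Key... functions listed above are not demonstrated elsewhere in this session, so here is a small sketch of three of them. The sample data and the names sample and records are made up for illustration.

```wolfram
(* hypothetical sample asso with integer keys *)
sample = <|3 -> "x", 1 -> "y", 2 -> "z"|>;

(* KeyValueMap hands key AND value to the function and returns a list *)
KeyValueMap[List, sample]
(* {{3, "x"}, {1, "y"}, {2, "z"}} *)

(* KeySortBy sorts by a function of the keys; Minus gives descending keys *)
KeySortBy[sample, Minus]
(* <|3 -> "x", 2 -> "z", 1 -> "y"|> *)

(* KeyValuePattern matches any asso that contains the given rule pattern *)
records = {<|"a" -> 1, "b" -> 2|>, <|"b" -> 3|>};
Cases[records, KeyValuePattern[{"a" -> _}]]
(* {<|"a" -> 1, "b" -> 2|>} *)
```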
Compare:
In[]:=
KeyUnion[{<|a->1,b->2|>,<|b->2|>}]
Union[{<|a->1,b->2|>,<|b->2|>}]
KeySort[<|b->3,a->2,c->1|>]
Sort[<|b->3,a->2,c->1|>]
In[]:=
KeyTake[{<|a->1,b->2|>,<|b->2|>},{a,b}]
KeyTake[{<|a->1,b->2,c->3|>},{a,b}] (* list of one asso! *)
KeyTake[<|a->1,b->2,c->3|>,{a,b}]
KeyTake[<|a->1,b->2,c->3|>,{b,c}]
<|a->1,b->2,c->3|>//KeyTake[{b,c}] (* KeyTake has an operator form! *)
KeyTake[{<|a->1,b->2,c->3|>,<|b->"hello",d->"world",c->11|>},{b,c}]
KeyTake[{<|a->1,b->2,c->3|>,<|b->"hello",d->"world"|>},{b,c}]
KeyDrop: no surprises, works as intuitively expected: drop elements with the specified key:
In[]:=
KeyDrop[<|a->1,b->2|>,a]
and multiple keys in one asso:
In[]:=
KeyDrop[<|a->1,b->2,c->3,d->4|>,{a,d}]
and across multiple assos:
In[]:=
KeyDrop[{<|a->1,b->2|>,<|c->3,d->4|>},{a,d}]
you can also specify the association as a List of rules; KeyDrop then treats it as an asso (and returns an asso):
In[]:=
KeyDrop[{a->1,b->2},a] (* note: no asso here! *)
delete from several lists:
In[]:=
KeyDrop[{<|a->1,b->2|>,<|b->3|>},b]
KeyDrop[{{a->1,b->2},{b->3}},{b}]
can also delete from a mix of assos and rules (interpreted as assos):
In[]:=
KeyDrop[{<|a->1,b->2|>,{b->3,c->3}},b] (* list of asso and list *)
has an operator form:
In[]:=
KeyDrop[<|a->1,b->2,c->3,d->4|>,{a,d}]
<|a->1,b->2,c->3,d->4|>//KeyDrop[{a,d}]
KeyDropFrom *changes* the asso:
In[]:=
asso=<|a->1,b->2|>;
In[]:=
KeyDropFrom[asso,a]
asso
In[]:=
asso=<|a->1,b->2,c->3,d->4|>;
In[]:=
KeyDropFrom[asso,{a,d}]
asso
Various Bits and Pieces / Loose Ends
nested lookups when values are assos themselves: look up “a”, which gives an asso, and then look up the second key in it.
In[]:=
<|"a"-><|"b"->z|>,"c"->y|>[["a","b"]] (* part-style *)
<|"a"-><|"b"->z|>,"c"->y|>["a","b"] (* function-style *)
A rule for the Association in a list is “flattened” or “merged”:
In[]:=
Association[{{a->1},b->2}]
An Association on the element level of an Association is merged as well (and reminder: duplicates are removed):
In[]:=
Association[a->1,b->2,Association[b->2],Association[c->3,d->4]]
This can encourage sloppy programming! But it can also be very pragmatic (sloppiness can be very practical if used consciously and not just due to laziness).
Obviously, if a key or a value is an Association, that must be kept:
In[]:=
Association[a->1,b->Association[c->2],Association[c->3]->5,d-><|e->8|>]
We can also use Extract on Associations. And we obviously can use Map inside Associations. And all other functions that the kernel is aware of.
In[]:=
expr=<|1->1+x+x^2+x^3,2->Total[(y^#&/@Range[0,4])],3->Sum[z^i,{i,0,9,3}],4->(Integrate[Sin@x,{x,0,Pi/2 #}]&/@Range[0,16]),5->(Plot[Tan[# x],{x,-Pi/2,Pi/2}]&/@Range@5)|>
equivalent:
In[]:=
Extract[expr,Key[1]]
Extract[expr,{Key[1]}]
but now we want to go deeper and get every sub-element based on its position (for key 1, a list of monomials). For that we have to specify the indices in the position list:
In[]:=
Extract[expr,{Key[1],#}]&/@Range@4
In[]:=
Extract[expr,{Key[2],#}]&/@Range@5
In[]:=
Extract[expr,{Key[3],#}]&/@Range@4
In[]:=
Extract[expr,{Key[4],#}]&/@Range[1,17,4]
Extract[expr,{Key[4],#}]&/@Range[2,17,4]
Extract[expr,{Key[4],#}]&/@Range[3,17,4]
In[]:=
Extract[expr,{Key[5],#}]&/@Range@5
expressions are evaluated first, before they become keys (or values):
In[]:=
AssociationThread[Integrate[Sin@x,{x,0,Pi/2 #}]&/@Range[0,2]->{a,b,c}]
More Info: Stephen’s page on assos in his Elementary Introduction:
“http://www.wolfram.com/language/elementary-introduction/2nd-ed/34-associations.html”
Cite this as: Andreas Lauschke, "Data Science with Andreas Lauschke (#4)" from the Notebook Archive (2020), https://notebookarchive.org/2020-09-4lm7z60