Meijer-G Learning

Author: Sotiris Michos
Title: Meijer-G Learning
Description: Using the Meijer-G function for symbolic regression
Category: Essays, Posts & Presentations
Keywords: Symbolic Regression, Machine Learning, Meijer-G, Symbolic Pursuit, XAI
URL: http://www.notebookarchive.org/2021-07-6fkrg5n/
DOI: https://notebookarchive.org/2021-07-6fkrg5n
Date Added: 2021-07-14
Date Last Modified: 2021-07-14
File Size: 0.93 megabytes
Supplements: (none)
Rights: Redistribution rights reserved



WOLFRAM SUMMER SCHOOL 2021
Meijer-G Learning
Sotirios Michos
This project explores the potential of the Meijer-G function to learn a black-box process and perform symbolic regression on a set of data generated by such a process. The black-box process can be a system of which we have limited understanding, such as a neural network, or any natural or artificial phenomenon that generates data and that we wish to comprehend better. The basis for this work is the referenced article, where the authors propose a novel method for modeling data that consists of repetitively fitting Meijer-G terms in order to capture the behaviour present in the data. The use of the Meijer-G function as a ridge function is motivated by its excellent analytic characteristics and its great expressibility which, especially for common functions, is usually achieved through only a small number of parameters. The same characteristics allow Meijer-G based models to learn and reveal hidden information about the process that generated the data, ranging from an explanation of how the different factors contribute to the result up to the actual law that generates the data. All these provide excellent interpolation and extrapolation capabilities that are highly sought after in modeling and regression analysis. In this project we implement the algorithm presented in the referenced article and, utilizing the advanced optimization capabilities of Mathematica, we explore some key examples that illustrate the power of this approach and highlight important issues and challenges that lie ahead and may lead to even greater future improvements.
Theory of the Meijer-G Function
The Meijer-G function is one of the most general special functions: it can express all the elementary functions as well as many advanced special functions, such as hypergeometric functions of many kinds, Bessel functions, etc. The main definition of the Meijer-G function employs a contour integral over the complex plane:
$$G^{m,n}_{p,q}\!\left(z \,\middle|\, \begin{matrix} a_1,\dots,a_n;\, a_{n+1},\dots,a_p \\ b_1,\dots,b_m;\, b_{m+1},\dots,b_q \end{matrix}\right) \;=\; \frac{1}{2\pi i}\int_{L} \frac{\prod_{j=1}^{m}\Gamma(b_j+s)\,\prod_{j=1}^{n}\Gamma(1-a_j-s)}{\prod_{j=n+1}^{p}\Gamma(a_j+s)\,\prod_{j=m+1}^{q}\Gamma(1-b_j-s)}\; z^{-s}\, ds$$
where $0\le m\le q$ and $0\le n\le p$, with $m$, $n$, $p$, $q$ integer numbers, all the differences $a_i-b_j$ for the various $i\in\{1,\dots,n\}$ and $j\in\{1,\dots,m\}$ being non-integer, and $z\neq 0$. There are many choices for the path of integration $L$, one of which is shown in the following picture. The main idea in this choice is that the path comes from $\gamma-i\infty$, for some real $\gamma$, and wiggles around the poles of the integrand in such a way that it leaves the (red) poles of the terms $\Gamma(b_i+s)$ for $i\in\{1,\dots,m\}$ to the left and all (green) poles of the terms $\Gamma(1-a_i-s)$ for $i\in\{1,\dots,n\}$ to the right.
In order to see the scope of this function, we present some of the better known expressions that cover the usual elementary functions.
$$e^{z} = G^{1,0}_{0,1}\!\left(-z \,\middle|\, {-\atop 0}\right)$$
$$\sin(z) = \sqrt{\pi}\; G^{1,0}_{0,2}\!\left(\tfrac{z^2}{4} \,\middle|\, {-\atop \tfrac{1}{2},\,0}\right)$$
$$\sinh(z) = -i\sqrt{\pi}\; G^{1,0}_{0,2}\!\left(-\tfrac{z^2}{4} \,\middle|\, {-\atop \tfrac{1}{2},\,0}\right)$$
$$\arctan(z) = \frac{1}{2}\, G^{1,2}_{2,2}\!\left(z^{2} \,\middle|\, {\tfrac{1}{2},\,1\atop \tfrac{1}{2},\,0}\right)$$
$$\log(1+z) = G^{1,2}_{2,2}\!\left(z \,\middle|\, {1,\,1\atop 1,\,0}\right)$$
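As a quick sanity check of some of these identities (a minimal sketch, not part of the original notebook; depending on the Mathematica version, FunctionExpand/FullSimplify and a positivity assumption may be needed), the built-in MeijerG function can be asked to reduce them back to elementary form:

(* G^{1,0}_{0,1}(-z | -; 0) should reduce to E^z *)
FunctionExpand[MeijerG[{{}, {}}, {{0}, {}}, -z]]

(* Sqrt[Pi] G^{1,0}_{0,2}(z^2/4 | -; 1/2, 0) should reduce to Sin[z] for z > 0 *)
FullSimplify[Sqrt[Pi] MeijerG[{{}, {}}, {{1/2}, {0}}, z^2/4], z > 0]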
Most of the advanced special functions are also supported, like the lower and upper incomplete gamma functions
$$\gamma(a,z) = G^{1,1}_{1,2}\!\left(z \,\middle|\, {1\atop a,\,0}\right), \qquad \Gamma(a,z) = G^{2,0}_{1,2}\!\left(z \,\middle|\, {1\atop a,\,0}\right)$$
the Bessel functions of the first and second kind along with their modified variants
$$J_n(z) = G^{1,0}_{0,2}\!\left(\tfrac{z^2}{4}\,\middle|\,{-\atop \tfrac{n}{2},\,-\tfrac{n}{2}}\right), \qquad Y_n(z) = G^{2,0}_{1,3}\!\left(\tfrac{z^2}{4}\,\middle|\,{\tfrac{-n-1}{2}\atop \tfrac{n}{2},\,-\tfrac{n}{2},\,\tfrac{-n-1}{2}}\right)$$
$$I_n(z) = i^{-n}\, G^{1,0}_{0,2}\!\left(-\tfrac{z^2}{4}\,\middle|\,{-\atop \tfrac{n}{2},\,-\tfrac{n}{2}}\right), \qquad K_n(z) = \frac{1}{2}\, G^{2,0}_{0,2}\!\left(\tfrac{z^2}{4}\,\middle|\,{-\atop \tfrac{n}{2},\,-\tfrac{n}{2}}\right)$$
as well as most hypergeometric functions, like Gauss' hypergeometric function or even the generalized hypergeometric function ${}_pF_q(\mathbf{a};\mathbf{b};z)$, even for $p>q+1$, of which it is a generalization:
$${}_2F_1(a_1,a_2;b_1;z) = \frac{\Gamma(b_1)}{\Gamma(a_1)\,\Gamma(a_2)}\, G^{1,2}_{2,2}\!\left(-z\,\middle|\,{1-a_1,\,1-a_2\atop 0,\,1-b_1}\right)$$
$${}_pF_q(a_1,\dots,a_p;\,b_1,\dots,b_q;\,z) = \frac{\prod_{k=1}^{q}\Gamma(b_k)}{\prod_{k=1}^{p}\Gamma(a_k)}\, G^{1,p}_{p,q+1}\!\left(-z\,\middle|\,{1-a_1,\dots,1-a_p\atop 0,\,1-b_1,\dots,1-b_q}\right)$$
In addition to this wide functional coverage, the Meijer-G function is characterized by excellent analytic properties, like the h-th order derivative, which is simply
$$z^{h}\,\frac{d^{h}}{dz^{h}}\, G^{m,n}_{p,q}\!\left(z\,\middle|\,{a_1,\dots,a_n;\,a_{n+1},\dots,a_p\atop b_1,\dots,b_m;\,b_{m+1},\dots,b_q}\right) \;=\; G^{m,n+1}_{p+1,q+1}\!\left(z\,\middle|\,{0,\,a_1,\dots,a_n;\,a_{n+1},\dots,a_p\atop b_1,\dots,b_m;\,b_{m+1},\dots,b_q,\,h}\right)$$
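As a small numerical check of this derivative relation (an illustrative sketch, not from the original notebook, using h = 1 together with the exponential representation above):

(* left side: z^h d^h/dz^h G^{1,0}_{0,1}(z | -; 0) = z d/dz E^(-z) = -z E^(-z) *)
lhs = x D[MeijerG[{{}, {}}, {{0}, {}}, x], x];
(* right side: G^{1,1}_{1,2}(z | 0; -; 0, h) with h = 1 *)
rhs = MeijerG[{{0}, {}}, {{0}, {1}}, x];
{lhs, rhs} /. x -> 2.  (* both should give approximately -0.270671 *)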
as well as the celebrated convolution theorem
$$\int_0^{\infty} G^{m,n}_{p,q}\!\left(\eta x\,\middle|\,{a_1,\dots,a_p\atop b_1,\dots,b_q}\right) G^{\mu,\nu}_{\sigma,\tau}\!\left(\omega x\,\middle|\,{c_1,\dots,c_\sigma\atop d_1,\dots,d_\tau}\right) dx
\;=\; \frac{1}{\eta}\, G^{n+\mu,\,m+\nu}_{q+\sigma,\,p+\tau}\!\left(\frac{\omega}{\eta}\,\middle|\,{-b_1,\dots,-b_m,\,c_1,\dots,c_\sigma,\,-b_{m+1},\dots,-b_q \atop -a_1,\dots,-a_n,\,d_1,\dots,d_\tau,\,-a_{n+1},\dots,-a_p}\right)$$
$$\;=\; \frac{1}{\omega}\, G^{m+\nu,\,n+\mu}_{p+\tau,\,q+\sigma}\!\left(\frac{\eta}{\omega}\,\middle|\,{a_1,\dots,a_n,\,-d_1,\dots,-d_\tau,\,a_{n+1},\dots,a_p \atop b_1,\dots,b_m,\,-c_1,\dots,-c_\sigma,\,b_{m+1},\dots,b_q}\right)$$
that allows for direct definite integration of a vast range of functions.
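For instance (a minimal illustration, not from the original notebook), writing two exponentials in their Meijer-G form lets Integrate produce the definite integral directly:

(* Integrate e^{-x} e^{-2x} over [0, Infinity), with both factors given as Meijer-G functions; the result is 1/3 *)
Integrate[MeijerG[{{}, {}}, {{0}, {}}, x] MeijerG[{{}, {}}, {{0}, {}}, 2 x], {x, 0, Infinity}]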
Meijer-G as a Symbolic Model
The properties presented in the previous section show that the Meijer-G function is an excellent candidate for modeling, learning and symbolic regression purposes. In particular, when we are presented with a set of data, we are interested in finding a symbolic expression that fits the data as well as possible. Linear regression is the most prominent representative of this class of problems, where one tries to pass a line (or, more generally, a hyperplane) through the data in an optimal way, so as to minimize the approximation error that is left over.
Symbolic regression is a more advanced kind of regression where, instead of trying to fit a simple line, one tries to discover the symbolic expression, among a particular class, that best approximates the data. If this class is large enough and contains the usual functions that commonly appear in practice, one can even hope to discover the actual law that generated the data. In this sense, the Meijer-G function has all the desirable characteristics for achieving this goal: it has a vast coverage of functions and can express all the common ones typically encountered in practical problems.
What is more, the Meijer-G function is analytic over the whole complex domain, with the possible exception of the origin z = 0 and the unit circle |z| = 1. That means that, in the case of multidimensional data aggregated in its input, we can take its Taylor series and find out which factors contribute the most to each output. This is of fundamental importance in machine learning, as it may provide insights into how a black-box model (like a neural network) operates and which inputs affect a particular output. The authors of the referenced paper utilize this observation to propose an algorithm that attempts to build a superposition of Meijer-G functions that best fits the data generated by a neural network, in order to explain its behavior. In the next section, we present their algorithm in more detail.
In this project, we implement the above algorithm in the Wolfram Language and study it from the perspective of symbolic regression. In particular, we are interested in studying its behavior and assessing its capability to model various kinds of data. Furthermore, we want to examine its capacity to discover the function that generated the data. For the purposes of this exposition, within the time confines of the summer school, we limit ourselves to studying only some specific, simple examples that serve as a proof of concept for the suitability of the Meijer-G function for symbolic regression, and that also allow us to recognize some key issues that need further investigation.
The Basic Theorem
The fundamental result, shown in the referenced paper, is that it takes only a small number of Meijer-G families to cover all the important functions. In particular, the theorem states that:
Consider the set of Meijer-G functions of the form
$$f(z) = G^{m,n}_{p,q}\!\left(r\,z^{s}\,\middle|\,{a_1,\dots,a_n;\,a_{n+1},\dots,a_p\atop b_1,\dots,b_m;\,b_{m+1},\dots,b_q}\right)$$
where $a_1,\dots,a_p,\,b_1,\dots,b_q\in\mathbb{R}$, $r,s\in\mathbb{R}$, and the hyperparameters $(m,n,p,q)$ belong to the configuration set $\mathcal{C}=\{(1,0,0,2),\,(0,1,3,1),\,(2,1,2,3),\,(2,2,3,3),\,(2,0,1,3)\}$. This set of functions includes all the functions of the form
$$f(z) = \Phi\!\left(w\,z^{l}\right) z^{t}$$
with $w,l,t\in\mathbb{R}$, where $\Phi\in\{\mathrm{id},\,\sin,\,\cos,\,\sinh,\,\cosh,\,\exp,\,\log(1+\cdot),\,\arcsin,\,\arctan,\,\tfrac{1}{1+\cdot},\,J_n,\,Y_n,\,I_n,\,\Gamma\}$; here $J_n$, $Y_n$, $I_n$ are the Bessel functions and $\Gamma$ is Euler's Gamma function.
In particular, the first family covers the functions sin, cos, sinh and cosh; the second covers the identity as well as the various monomials; the third the exponential and the gamma functions such as Γ; the fourth log(1+·), arcsin and arctan; and the fifth the various Bessel functions such as $J_n$, $Y_n$ and $I_n$.
The Fitting Algorithm: Symbolic Pursuit
The fitting algorithm is called symbolic pursuit and, conceptually, it is very close to the well-known projection pursuit. The main idea is that we have an error, or more properly a residual function, whose norm expresses how far we are from actually capturing the data. What we then do is attempt to fit a model function to the residual, in order to "explain" a part of it and make the residual as small as possible. The model function in our case is the Meijer-G function. Since our model function will typically be unable to fully explain the whole residual function, we repeatedly add more terms trying to further reduce (or "hunt down") the residual, one step at a time and hopefully in the most optimal way possible. The process stops when the improvement in the residual has become too small (here we select a relative improvement of less than 10%), effectively meaning that what is left is probably beyond the reach of even a linear combination of our model functions. The reason for saying "probably" is that the optimization problem in each step is non-convex, and the search surface may contain regions that have not been explored and that may lead to better approximations than what we discovered.

The way the method supports multidimensional data is by utilizing each Meijer-G function as a ridge function, passing the data first through an affine transformation before handing them over to the Meijer-G. The general form of the affine transformation is $v_1 x_1 + \dots + v_n x_n$, where the $v_i$ are the coefficients of the transformation and the $x_i$ are the individual features or factors of the data. However, since the authors of the paper try to avoid encountering or surpassing the unit circle in the complex plane, instead of this term they use the normalized form $\frac{\langle v, x\rangle^{+}}{\lVert v\rVert\sqrt{n}}$, which for data generated within the n-dimensional unit hypercube is guaranteed to be between 0 and 1. The $+$ sign indicates that if the quantity is below zero, it is taken to be zero. Of course, as we will see, this creates a series of issues for symbolic regression, such as not being able to capture the functional behaviour outside the hypercube (which can be partially remedied by normalizing the data) as well as not allowing particular forms of functions to be expressed. However, we opt to keep these limitations for now, since we are interested in giving a proof of concept of the method's capabilities for symbolic regression, and we leave the various enhancements for the future.

The detailed algorithm in pseudocode is presented in the supplemental material of the referenced paper, so we will not give a detailed presentation here. However, we will go over the actual code line by line, explaining the intermediate steps and clarifying the implementation details of the algorithm.
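In code, the normalized ridge argument described above is simply the following (a small helper mirroring the expression used inside lossF below; the implementation uses a 0.01 floor rather than an exact zero clip, to stay away from the singularity at the origin):

(* normalized affine projection of a data point x onto the direction v, clipped away from 0 *)
affineProjection[v_List, x_List] := Max[0.01, (v.x)/(Norm[v] Sqrt[Length[x]])]

(* example: for a point in the unit square the result stays within (0, 1] *)
affineProjection[{1., 2.}, {0.3, 0.7}]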
Implementing Symbolic Pursuit
Our code starts with initializing some helper structures and functions that allow us to easily build up and access the different Meijer-G families symbolically. In particular, H is a list of five associations giving the hyperparameters (m, n, p, q) for each family.
H=Association@@MapThread[Rule,{{"m","n","p","q"},#}]&/@{{1,0,0,2},{0,1,3,1},{2,1,2,3},{2,2,3,3},{2,0,1,3}};
Then we construct the function meijerG that accepts a single argument from 1 to 5, returning the corresponding Meijer-G function, with hyperparameters set to their corresponding family values and containing the parameters in symbolic form. The function returned is a pure function.
meijerG[f_] := (MeijerG[
    {Table[Subscript[a, i], {i, H[[f, "n"]]}], Table[Subscript[a, i], {i, H[[f, "n"]] + 1, H[[f, "p"]]}]},
    {Table[Subscript[b, i], {i, H[[f, "m"]]}], Table[Subscript[b, i], {i, H[[f, "m"]] + 1, H[[f, "q"]]}]},
    #] &);
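For example, calling the helper for the first family returns a pure function whose body is a Meijer-G with symbolic b-parameters (assuming the Subscript-based definition above; the expected result is shown as a comment):

meijerG[1][x]
(* MeijerG[{{}, {}}, {{Subscript[b, 1]}, {Subscript[b, 2]}}, x] *)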
The parameters in symbolic form for each Meijer-G function (including the coefficients of the affine transformation) are generated using the parameters function which accepts as arguments the Meijer-G family as well as the dimensions of the data.
parameters[f_, dim_] := Flatten@{w,
   Table[Subscript[v, i], {i, dim}],
   Table[Subscript[a, i], {i, H[[f, "n"]]}],
   Table[Subscript[a, i], {i, H[[f, "n"]] + 1, H[[f, "p"]]}],
   Table[Subscript[b, i], {i, H[[f, "m"]]}],
   Table[Subscript[b, i], {i, H[[f, "m"]] + 1, H[[f, "q"]]}]}
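As an illustration, for the first family and three-dimensional data the symbolic parameter vector contains the weight, the three affine coefficients and the two b-parameters (expected output shown as a comment):

parameters[1, 3]
(* {w, Subscript[v, 1], Subscript[v, 2], Subscript[v, 3], Subscript[b, 1], Subscript[b, 2]} *)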
The final helper function is lossF that returns the loss for a particular Meijer-G term in symbolic form. The symbolic quantities are the Meijer-G parameters, the affine transform coefficients and the weight of the Meijer-G. The lossF function takes as arguments the explanatory variables, the corresponding value of the response variables (which typically are the values of the residual function) and the family of the used Meijer-G. Note that the Max function in the affine transformation is used to keep the computation away from a potential singularity at zero. The loss function that is evaluated and returned is the mean squared difference between the response and the estimated Meijer-G model values.
lossF[xTrain_List, yTrain_List, f_Integer] := Module[{dim, n, vVec, zVec, affine},
  dim = Length[xTrain[[1]]]; n = Length[xTrain];
  (* Affine transform coefficients *)
  vVec = Table[Subscript[v, i], {i, dim}];
  (* Explanatory variables *)
  zVec = Table[Subscript[z, i], {i, dim}];
  (* Affine transform *)
  affine = Max[0.01, vVec.zVec/(Norm[vVec] Sqrt[dim])];
  (* Return the symbolic loss. The symbolic variables are the parameters to optimize *)
  Mean@Table[(yTrain[[i]] - w meijerG[f][affine] /. MapThread[Rule, {zVec, xTrain[[i]]}])^2, {i, 1, n}]]
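A tiny illustration of the symbolic loss, assuming the helper definitions above, for a hypothetical one-dimensional dataset with two samples:

lossF[{{0.2}, {0.8}}, {1.0, 2.0}, 1]
(* the mean of two squared differences, each of the form
   (y - w MeijerG[{{}, {}}, {{Subscript[b, 1]}, {Subscript[b, 2]}}, Max[0.01, ...]])^2,
   i.e. a symbolic expression in w, Subscript[v, 1], Subscript[b, 1] and Subscript[b, 2] *)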
The first important function of our implementation has the task of fitting a single Meijer-G function to a particular set of data. It is called FitMeijerG and expects as arguments the values of the explanatory variables in xTrain, the response (or residual function) values in yTrain, the families of the Meijer-G over which to search for an optimal fit in fCheck, and whether the search should be done "Locally", "Globally" or using "Both" approaches. If these last two arguments are not provided, the method will search among all five families using both local and global search and return the best result it finds. Local and global search are performed using the Wolfram Language's own FindMinimum and NMinimize functions, respectively. The authors of the referenced paper opt to use conjugate gradient descent, but we choose to let Mathematica decide the best optimization algorithm for each case. The user also has the option to define the accuracy and precision goals for the optimization methods using AG and PG, in order to tweak them towards better but slower, or quicker but less accurate, results. The default value we have selected for these arguments is 3 digits, which usually gives a pretty good tradeoff between accuracy/precision and time. We give some additional explanations as comments in the code that follows. The function returns the chosen fitted term as a list containing the family, the loss and a list of rules connecting the symbolic parameters to their optimal values.
FitMeijerG[xTrain_List, yTrain_List, fCheck_: {1, 2, 3, 4, 5}, opt_: "Both", AG_: 3, PG_: 3] :=
 Module[{dim, n, optimizeLocally, optimizeGlobally, optimizeList, weightedLoss, minPos},
  (* Data dimensions *)
  dim = Length[xTrain[[1]]];
  (* Number of samples *)
  n = Length[xTrain];
  (* Check whether the local search should take place. The symbolic loss function returned from lossF is given to Mathematica's FindMinimum function that performs a local search. *)
  If[opt == "Both" || opt == "Locally",
   optimizeLocally = Table[Echo[{f}~Join~FindMinimum[lossF[xTrain, yTrain, Echo@f], parameters[f, dim], AccuracyGoal -> AG, PrecisionGoal -> PG]], {f, fCheck}],
   optimizeLocally = {}];
  (* Check whether the global search should take place. The symbolic loss function returned from lossF is given to Mathematica's NMinimize function that performs a global search. *)
  If[opt == "Both" || opt == "Globally",
   optimizeGlobally = Table[Echo[{f}~Join~NMinimize[lossF[xTrain, yTrain, Echo@f], parameters[f, dim], AccuracyGoal -> 10, PrecisionGoal -> 10]], {f, fCheck}],
   optimizeGlobally = {}];
  (* Combine the previous results in a single list. *)
  optimizeList = Join[optimizeLocally, optimizeGlobally];
  (* If some w is too big, weight the losses appropriately. *)
  If[Norm[optimizeList[[All, 3, 1]][[All, 2]]] > 100,
   weightedLoss = optimizeList[[All, 2]]*optimizeList[[All, 3, 1]][[All, 2]];
   (* If some weights w are big, weight the loss values by the corresponding weights and select the best result. *)
   minPos = Flatten@Position[weightedLoss, Min[weightedLoss]];,
   (* If the weights w are not big, return the result with the minimum loss value. *)
   minPos = Flatten@Position[optimizeList[[All, 2]], Min[optimizeList[[All, 2]]]];];
  Return[Flatten[optimizeList[[minPos]], 1]];]
We would like to comment on the last few lines of code here. We noticed that in some cases the optimization algorithm may return a fitted Meijer-G with an exceedingly high value for the weight w, while an equivalent choice with a similar loss value and a much smaller w is present (usually this choice would be closer to the way the data were generated). For this reason, we want the solution with the big w value to be selected only when the corresponding gain in loss is significant. One simple way to do this is to multiply the losses with the w values and select the best option. However, there are times when the optimization goes the other way, providing a fitted Meijer-G function with an exceedingly small weight w and a very bad loss. In this case we should select the optimum result without multiplying by the weights, as doing so would lead us to selecting something extremely sub-par. That is why we choose to perform the weighted selection only when the norm of the weights vector is high (heuristically we have set this threshold to 100).
Finally, the second important function is LearnSymbolicModel. This is a rather long function whose purpose is to build upon FitMeijerG in order to successively add new Meijer-G terms and create the final model. Due to the term-by-term operation of the function, and in order to pay closer attention to the inner workings of each step, we chose to use mainly procedural programming to construct it.
The function accepts as arguments the data for the explanatory and the response variables, as xTrain and yTrain respectively. The optional arguments are the same as the ones for FitMeijerG, with the only difference being that local search is selected as the default method, since it tends to be faster and to yield better results. The basic operation LearnSymbolicModel performs is to iteratively apply the FitMeijerG function, organize the returned results and keep track of the progress made. Every new term is placed in the list curModelTerms and the family it belongs to is cataloged in fSet. preResidual and curResidual keep track of the progress, and when the relative improvement becomes less than 10% the algorithm terminates. This is taken care of by the While loop in the function.
From the second model term onwards, backfitting is performed for each new term that enters the model. The purpose of backfitting is to adjust the previous terms in the presence of the new term, in order to improve the overall model. Backfitting is performed by the For loop that resides inside the While loop of the function. In particular, the function visits each of the previous terms, takes away its contribution to the residual, and fits it again to the resulting residual. After a term has been backfitted, its new contribution to the residual is incorporated again, in theory leading to a smaller residual. The reason we say "in theory" is that in practice, as we also comment on in an example further down, we have observed cases where the backfitting leads to worse results. We attribute this to the fact that, as implemented, backfitting does not start from the current set of parameters but, as is customary, selects a random initial point and optimizes from there. This may lead the final result to a worse minimum compared to where it was. There are many ways the implementation can be improved, starting from making sure the current state of the model parameters is taken into account, but for now we choose to discard the last iteration along with the new term should this happen, and recover its immediate previous step, stored in the curModelTermsBackup list.
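To make the backfitting bookkeeping concrete, here is a tiny self-contained toy of one backfitting pass, using ordinary polynomial terms in place of Meijer-G terms (an illustrative sketch only; the symbol x, the terms and the data are hypothetical and not part of the notebook's code):

(* toy data generated by 2 x + x^2, and two imperfect model terms *)
xTrain = Range[0., 1., 0.25];
yTrain = 2 xTrain + xTrain^2;
terms = {1.8 x, 1.2 x^2};
evalTerm[t_] := (t /. x -> #) & /@ xTrain;           (* a term's values on the data *)
residual = yTrain - Total[evalTerm /@ terms];

j = 1;                                               (* backfit the first term *)
residualWithoutJ = residual + evalTerm[terms[[j]]];  (* take its contribution out *)
terms[[j]] = Fit[Transpose[{xTrain, residualWithoutJ}], {x}, x];  (* re-fit it alone *)
residual = residualWithoutJ - evalTerm[terms[[j]]];  (* put the refreshed contribution back *)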
LearnSymbolicModel[xTrain_List, yTrain_List, fCheck_: {1, 2, 3, 4, 5}, opt_: "Locally", AG_: 3, PG_: 3] :=
 Module[{fSet, curResidual, preResidual, curModelTerms, fitResult, dim, n, vVec, zVec, affine, impPer, curModelTermsBackup, curResidualBack, j},
  (* Keep a record of the families of the terms present in the model. *)
  fSet = {};
  (* Two residual terms to keep track of the improvement. *)
  curResidual = yTrain; preResidual = {};
  (* A list of the terms comprising our model. *)
  curModelTerms = {};
  Print["Fitting first term"];
  (* Invoke FitMeijerG to fit the first term of the model. *)
  fitResult = FitMeijerG[xTrain, yTrain, fCheck, opt];
  Print["Selected: " <> ToString[First@fitResult]];
  (* Dimensions of the explanatory data *)
  dim = Length[xTrain[[1]]];
  (* Number of samples *)
  n = Length[xTrain];
  (* Affine transformation coefficients *)
  vVec = Table[Subscript[v, i], {i, dim}];
  (* Explanatory variables *)
  zVec = Table[Subscript[z, i], {i, dim}];
  (* Affine transformation *)
  affine = Max[0.01, vVec.zVec/(Norm[vVec] Sqrt[dim])];
  (* Add the new term to the model list *)
  curModelTerms = curModelTerms~Join~{w meijerG[First@fitResult][affine] /. Last[fitResult]};
  (* Avoid arithmetic inconsistencies *)
  curModelTerms = Rationalize[curModelTerms, 0.0001] // N;
  (* Catalog the hyperparameters used *)
  fSet = fSet~Join~{First@fitResult};
  (* Update the residuals *)
  preResidual = curResidual;
  curResidual = curResidual - Table[Total[curModelTerms] /. MapThread[Rule, {zVec, xTrain[[i]]}], {i, n}];
  (* Calculate the relative improvement. *)
  impPer = (Norm[preResidual] - Norm[curResidual])/Norm[preResidual];
  Print["Improvement: " <> ToString@(impPer*100) <> "%"];
  (* Procedurally repeat until the relative error stops improving *)
  While[impPer > 0.01,
   (* Back up the current model terms *)
   curModelTermsBackup = curModelTerms;
   (* Fit a new term *)
   Print["Fitting new term."];
   fitResult = FitMeijerG[xTrain, curResidual, fCheck, opt];
   Print["Selected: " <> ToString@First@fitResult];
   (* Add the new term to the model list *)
   curModelTerms = curModelTerms~Join~{w meijerG[First@fitResult][affine] /. Last[fitResult]};
   (* Avoid arithmetic inconsistencies. *)
   curModelTerms = Rationalize[curModelTerms, 0.0001] // N;
   (* Catalog the hyperparameters. *)
   fSet = fSet~Join~{First@fitResult};
   (* Update the residuals *)
   preResidual = curResidual;
   curResidual = curResidual - Table[Last[curModelTerms] /. MapThread[Rule, {zVec, xTrain[[i]]}], {i, n}];
   (* Backfit the previous terms *)
   For[j = 1, j < Length[curModelTerms], j++,
    (* Exclude the j-th term from the residual *)
    curResidualBack = curResidual + Table[curModelTerms[[j]] /. MapThread[Rule, {zVec, xTrain[[i]]}], {i, n}];
    (* Backfit the j-th term, keeping its family *)
    Print["Backfitting term: " <> ToString@j];
    fitResult = FitMeijerG[xTrain, curResidualBack, {fSet[[j]]}, opt];
    (* Update the term list *)
    curModelTerms[[j]] = w meijerG[First@fitResult][affine] /. Last[fitResult];
    (* Avoid arithmetic inconsistencies *)
    curModelTerms = Rationalize[curModelTerms, 0.0001] // N;
    (* Reconstruct the updated residual *)
    curResidual = curResidualBack - Table[curModelTerms[[j]] /. MapThread[Rule, {zVec, xTrain[[i]]}], {i, n}];];
   (* Check the relative improvement from adding a new term and backfitting the previous ones. *)
   impPer = (Norm[preResidual] - Norm[curResidual])/Norm[preResidual];
   Print["Improvement: " <> ToString@(impPer*100) <> "%"];
   (* If the relative improvement is negative, completely discard the last step. *)
   If[impPer < 0, Print["Dropping last iteration."]; curModelTerms = curModelTermsBackup];];
  Return[curModelTerms]]
One final issue we would like to comment on is the command:
curModelTerms=Rationalize[curModelTerms,0.0001]//N;
As we state in the relevant comment, this has been included to avoid some arithmetic inconsistencies we observed when working with the MeijerG function. In particular, during our experiments we encountered the following MeijerG function.
In[]:= MeijerG[{{1.3744564427009764`}, {1.3934841550391641`}}, {{0.6992373899390802`, 0.6992373899390809`}, {0.7612861841404257`}}, Max[0.01`, 1.` Subscript[z, 1]]] // TraditionalForm

Out[]//TraditionalForm= $G^{2,1}_{2,3}\!\left(\max(0.01,\,1.\,z_1)\,\middle|\,{1.37446,\,1.39348\atop 0.699237,\,0.699237,\,0.761286}\right)$
Plotting this function returned the following erratic results:
In[]:= Plot[0.5263925535902085` MeijerG[{{1.3744564427009764`}, {1.3934841550391641`}}, {{0.6992373899390802`, 0.6992373899390809`}, {0.7612861841404257`}}, Max[0.01`, 1.` Subscript[z, 1]]], {Subscript[z, 1], 0, 1}]

Out[]= (plot with erratic, noisy output)
However, rationalizing the parameters with high accuracy and re-evaluating leads to the correct output.
In[]:= meijerGR = Rationalize[0.5263925535902085` MeijerG[{{1.3744564427009764`}, {1.3934841550391641`}}, {{0.6992373899390802`, 0.6992373899390809`}, {0.7612861841404257`}}, Max[0.01`, 1.` Subscript[z, 1]]], 0.0001] // N
Plot[meijerGR, {Subscript[z, 1], 0, 1}]

Out[]= 0.526316 MeijerG[{{1.37436}, {1.39344}}, {{0.699187, 0.699187}, {0.761194}}, Max[0.01, Subscript[z, 1]]]

Out[]= (smooth plot of the rationalized Meijer-G)
Key Examples
In this section, we provide some simple but key examples that illustrate how the method behaves for single and multi-dimensional symbolic regression and serve as a proof of concept for its potential.
A Linear Function
Here, we generate the synthetic data that will be used to fit the Meijer-G model. We use the function examined in the referenced paper, which is $f(x_1,x_2,x_3) = x_1 + 2x_2 + 3x_3$, verifying their results. We draw 30 test points uniformly, limit the search to the first two families (to examine the reconstruction of the original function) and perform local optimization at the lowest accuracy setting for efficiency.
f[x1_, x2_, x3_] := x1 + 2 x2 + 3 x3
dim1 = 3; n1 = 30;
xTrain1 = RandomReal[{0, 1}, {n1, dim1}];
yTrain1 = f @@@ xTrain1;
FitMeijerG[xTrain1,yTrain1,{1,2}]
» 1 »
{0.020412, {w → 12.4068, v1 → 0.404515, v2 → 0.875877, v3 → 1.61407, b1 → 1.07297, b2 → -0.501746}}
» 2 »
{0.0000343944, {w → 7.55924, v1 → 0.764899, v2 → 1.50037, v3 → 2.25175, a1 → 2.00547, a2 → 1.89173, a3 → 1.89173, b1 → 1.00591}}
dim1 = Length[xTrain1[[1]]]; n1 = Length[xTrain1];
(* Affine transform coefficients *)
vVec1 = Table[Subscript[v, i], {i, dim1}];
(* Independent variables *)
zVec1 = Table[Subscript[z, i], {i, dim1}];
(* Affine transformation *)
affine1 = Max[0.01, vVec1.zVec1/(Norm[vVec1] Sqrt[dim1])];
meijerG[2][affine1] /. {w -> 7.5592426571296665`, Subscript[v, 1] -> 0.7648989617722781`, Subscript[v, 2] -> 1.5003693665863185`, Subscript[v, 3] -> 2.251754413632342`, Subscript[a, 1] -> 2.0054710884060425`, Subscript[a, 2] -> 1.891733848858679`, Subscript[a, 3] -> 1.8917338328753186`, Subscript[b, 1] -> 1.0059131857219576`} // TraditionalForm

Out[]//TraditionalForm= $0.999745\;\max(0.01,\,\ldots)^{1.00547}\;{}_1F_2\!\left(0.000442097;\;0.886263,\,0.886263;\;\tfrac{1}{\max(0.01,\,\ldots)}\right)$
(* De-noise through Rationalize *)
w meijerG[2][affine1] /. Rationalize[{w -> 7.5592426571296665`, Subscript[v, 1] -> 0.7648989617722781`, Subscript[v, 2] -> 1.5003693665863185`, Subscript[v, 3] -> 2.251754413632342`, Subscript[a, 1] -> 2.0054710884060425`, Subscript[a, 2] -> 1.891733848858679`, Subscript[a, 3] -> 1.8917338328753186`, Subscript[b, 1] -> 1.0059131857219576`}, 0.01] // TraditionalForm

Out[]//TraditionalForm= $\dfrac{68\,\max\!\left(0.01,\;0.205357\left(\tfrac{10 z_1}{13}+\tfrac{3 z_2}{2}+\tfrac{9 z_3}{4}\right)\right)}{9\,\Gamma\!\left(\tfrac{8}{9}\right)^{2}}$

In[]:= Expand[68*0.205357 (10 Subscript[z, 1]/13 + 3 Subscript[z, 2]/2 + 9 Subscript[z, 3]/4)/(9 Gamma[8/9]^2)] // TraditionalForm

Out[]//TraditionalForm= 1.02752 z1 + 2.00366 z2 + 3.00549 z3
Discussion
We observe here that the method faithfully captures the symbolic expression, despite the low accuracy and precision settings. What is surprising is that local optimization works better than global optimization, which many times fails to yield the correct result. Currently the method is rather slow, but given the fact that local optimization works so well, we can expect that there is huge potential for further optimization (the authors of the referenced paper themselves use conjugate gradient descent to perform the optimization). An intriguing point is the use of Rationalize[] to perform de-noising (here the noise is due to the low arithmetic accuracy/precision we selected). It is based on the reasoning that most forms of Meijer-G that actually reduce to simpler forms have parameters that are either integers, half-integers or, generally, fractions involving small numerators or denominators. Thus, we expect that trying to approximate these parameters with a simple fraction has a good chance of revealing the underlying function that produced the data.
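The idea in miniature (an illustrative snippet, not from the original notebook; the specific numbers are the fitted parameters from the run above): low-accuracy fitted parameters snap to simple rationals when Rationalize is given a matching tolerance.

Rationalize[{2.00547, 1.89173, 1.00591}, 0.01]
(* {2, 17/9, 1} *)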
Sine of Sqrt
In this example, we generate data from the composition of a sine and a square root function. The reason for doing this is twofold. First, we would like to test the method against a more complex function. Secondly, a limitation of the algorithm as implemented in the referenced paper is that it does not search over powers of the argument or coefficients with a non-unity norm. This was done to avoid encountering singularities of the Meijer-G function. The result is that functions like Sin or ArcTan are left outside of the search space. Although we deem this acceptable for this proof-of-concept presentation of the algorithm, it is one of the key issues that need to be addressed in order to have an acceptably working version of the method and, given that it can be supplemented quite easily, it is one of the first improvements we are planning to include in the implementation.
We start again by generating the data. Note that, in order for the method to be exposed to the whole behavior of the function, we generated data beyond the usual interval, extending it over to [0,10]. This, as we said before, carries the risk of encountering singularities, which has to be taken into account by a full implementation; for this example, however, we did not encounter such issues. Also, AccuracyGoal and PrecisionGoal were left at their default values.
Of course, as we will see in a following example, due to the high plasticity of the Meijer-G function the algorithm still manages to capture the behaviour of these functions almost perfectly on the support of the data, but it extrapolates poorly and, of course, precludes the possibility of actually discovering the generating function itself.
f[x_] := Sin[2 Sqrt[x]]
dim1 = 1; n1 = 30;
xTrain1 = RandomReal[{0, 10}, {n1, dim1}]
yTrain1 = f @@@ xTrain1
Out[]=
{{6.92272},{0.588294},{7.37604},{4.57938},{7.09196},{6.94798},{1.52044},{7.76752},{4.61766},{8.12614},{5.01827},{0.135216},{1.06523},{1.58039},{4.42753},{0.428353},{1.85137},{7.75495},{0.501541},{8.10403},{4.54416},{3.3693},{5.91545},{3.28924},{6.71158},{8.27365},{4.91209},{9.31893},{0.978866},{0.698642}}
Out[]=
{-0.852616,0.999323,-0.752212,-0.907925,-0.81749,-0.847565,0.625265,-0.651173,-0.915261,-0.549618,-0.973188,0.67091,0.880726,0.586981,-0.875634,0.96592,0.408027,-0.654591,0.988103,-0.556085,-0.900889,-0.505135,-0.988478,-0.466794,-0.892041,-0.505875,-0.961127,-0.176866,0.917933,0.994914}
We proceed with fitting the model, using again a subset of the whole family set, to see how the algorithm fares when the correct family is included.
FitMeijerG[xTrain1,yTrain1,{1,2}]
» 1 »
{1.59851×10^-7, {w → 1.77319, v1 → 1., b1 → 0.499797, b2 → -0.00054121}}
» 2 »
{0.456793, {w → 1.11145, v1 → 1., a1 → -0.820628, a2 → 0.951051, a3 → 0.951051, b1 → -2.81424}}
Derive Meijer-G from fitted coefficients
dim1 = Length[xTrain1[[1]]]; n1 = Length[xTrain1];
(* Affine transform coefficients *)
vVec1 = Table[Subscript[v, i], {i, dim1}];
(* Independent variables *)
zVec1 = Table[Subscript[z, i], {i, dim1}];
affine1 = Max[0.01, vVec1.zVec1/(Norm[vVec1] Sqrt[dim1])];
The raw Meijer-G that was returned by the algorithm is:
w meijerG[1][affine1] /. {w -> 1.7731859624851862`, Subscript[v, 1] -> 1.`, Subscript[b, 1] -> 0.4997967885006436`, Subscript[b, 2] -> -0.0005412102542344671`} // TraditionalForm

Out[]//TraditionalForm= $1.77319\;\max(0.01,\,1.\,z_1)^{0.249628}\; J_{0.500338}\!\left(2\sqrt{\max(0.01,\,1.\,z_1)}\right)$
We notice that it is a special-function form with coefficients very close to 1/2 and 0. Using Rationalize[] to perform the de-noising, we get:
w meijerG[1][affine1] /. {w -> 1.7731859624851862`, Subscript[v, 1] -> 1.`, Subscript[b, 1] -> Rationalize[0.4997967885006436`, 0.1], Subscript[b, 2] -> Rationalize[-0.0005412102542344671`, 0.1]} // TraditionalForm

Out[]//TraditionalForm= $1.00041\,\cos\!\left(1.5708 - 2.\sqrt{\max(0.01,\,1.\,z_1)}\right)$
Removing the Max operator and querying Wolfram|Alpha for the irrational-seeming number in the cosine, we find out that this is in fact π/2.
In[]:= Cos[1.5707963267948966 - 2. Sqrt[Subscript[z, 1]]]

In[]:= closed form 1.5707963267948966
Out[]= π/2
Substituting yields the correct result.
In[]:= Cos[π/2 - 2 Sqrt[Subscript[z, 1]]] // TraditionalForm
Out[]//TraditionalForm= $\sin\!\left(2\sqrt{z_1}\right)$
It is worth noting that FindFormula[], which is the system's native function for performing symbolic regression, fails to recognize the correct form (due to the low number of available data points). This means that the particular method may have improvements to offer to the way this system function works internally.
In[]:=
FindFormula[Transpose@{Flatten@xTrain1,yTrain1},x]
Out[]=
-0.0723712+Cos[0.50x]
In[]:=
plt1=ListPlot[Transpose@{Flatten@xTrain1,yTrain1}];plt2=Plot[-0.07237118383366715`+Cos[0.5`2.x],{x,0,10}];Show[plt1,plt2]
Out[]=
Discussion
In this example, we see the great potential of the method to capture more complex functional forms, but we also encountered the first obstacles that need to be addressed. In order for the sinusoidal form to be expressed, we had to go to a larger interval [0,10] and increase the AccuracyGoal/PrecisionGoal to their default values. The method improves on FindFormula[]'s capabilities, but it has to be extended in order to account for pure trigonometric functions and to be able to capture the whole expressive range of the basic theorem.
Logarithmic Function
Now we examine an interesting case which, although it will not be recognized symbolically with local optimization, shows the great possibilities of this method for extrapolation. Again, we create 30 random sample points from our test function that we will feed to the method.
f[x_] := Log[1 + x]
dim1 = 1; n1 = 30;
xTrain1 = RandomReal[{0, 1}, {n1, dim1}]
yTrain1 = f @@@ xTrain1
Out[]=
{{0.309572},{0.563462},{0.0734089},{0.953388},{0.347701},{0.896949},{0.31881},{0.00620702},{0.495482},{0.60002},{0.283853},{0.0263019},{0.499749},{0.976985},{0.121882},{0.405182},{0.114685},{0.675611},{0.144866},{0.448218},{0.685102},{0.0656391},{0.335974},{0.850288},{0.388904},{0.199831},{0.915295},{0.527962},{0.290065},{0.492428}}
Out[]=
{0.2697,0.446902,0.0708394,0.669565,0.2984,0.640247,0.27673,0.00618784,0.402449,0.470016,0.249866,0.025962,0.405298,0.681573,0.115008,0.340167,0.108572,0.516178,0.135288,0.370334,0.521826,0.0635747,0.289661,0.615342,0.328515,0.182181,0.649872,0.423935,0.254692,0.400405}
In[]:=
FindFormula[Transpose@{Flatten@xTrain1,yTrain1},x]
Out[]=
1.x-0.500003+0.333334-0.249726+0.197365-0.154348+0.107842-0.0585491+0.0206478-0.00341631
2
x
3
x
4
x
5
x
6
x
7
x
8
x
9
x
10
x
Fitting a Meijer-G yields the following result
FitMeijerG[xTrain1,yTrain1,{1,2,3,4,5},"Locally"]
» 1 »
{0.00019281, {w → 1.24826, v1 → 1., b1 → 1.25559, b2 → 0.690974}}
» 2 »
{0.154867, {w → 1., v1 → 1., a1 → 1., a2 → 1., a3 → 1., b1 → 1.}}
» 3 »
{0.0000636741, {w → 1.11436, v1 → 1., a1 → 1.21785, a2 → 0.9575, b1 → 1.05304, b2 → 1.05304, b3 → 0.790169}}
» 4 »
{6.99886×10^-6, {w → 1.05228, v1 → 1., a1 → 1.11981, a2 → 1.11981, a3 → 0.981349, b1 → 1.03148, b2 → 1.03148, b3 → 0.891361}}
Derive Meijer-G from fitted coefficients
Derive Meijer-G from fitted coefficients
dim1=Length[xTrain1[[1]]];n1=Length[xTrain1];(*AffineTransformCoefficients*)vVec1=Table[,{i,dim1}];(*IndependentVariables*)zVec1=Table[,{i,dim1}];affine1=Max0.01,;
v
i
z
i
vVec1.zVec1
Norm[vVec1]Sqrt[dim1]
The Meijer-G returned by the fitting algorithm is:
w meijerG[4][affine1] /. {w -> 1.0522767182195811`, Subscript[v, 1] -> 1.`, Subscript[a, 1] -> 1.1198052415108566`, Subscript[a, 2] -> 1.1198052415108566`, Subscript[a, 3] -> 0.981349437811322`, Subscript[b, 1] -> 1.0314821147175017`, Subscript[b, 2] -> 1.0314821147175017`, Subscript[b, 3] -> 0.8913610693471419`} // TraditionalForm

Out[]//TraditionalForm= $1.05228\; G^{2,2}_{3,3}\!\left(\max(0.01,\,1.\,z_1)\,\middle|\,{1.11981,\,1.11981,\,0.981349\atop 1.03148,\,1.03148,\,0.891361}\right)$
Here we observe something strange. The actual function is covered by the fourth family, and it has the following expression
w meijerG[4][affine1] /. {w -> 1, Subscript[v, 1] -> 1.`, Subscript[a, 1] -> 1, Subscript[a, 2] -> 1, Subscript[a, 3] -> 1, Subscript[b, 1] -> 1, Subscript[b, 2] -> 1, Subscript[b, 3] -> 0} // TraditionalForm

In[]:= Subscript[z, 1] Hypergeometric2F1Regularized[1, 1, 2, -Subscript[z, 1]] // TraditionalForm

Out[]//TraditionalForm= $\log(z_1 + 1)$
However, we see that the optimization algorithm chooses to keep the b3 term away from 0. This could be attributed to the optimization being local (however, trying the global optimization approach still does not yield the above form). In any case, it is interesting to visualize the end result.
Plot[{Log[1 + Subscript[z, 1]], w meijerG[4][affine1] /. {w -> 1.0522767182195811`, Subscript[v, 1] -> 1.`, Subscript[a, 1] -> 1.1198052415108566`, Subscript[a, 2] -> 1.1198052415108566`, Subscript[a, 3] -> 0.981349437811322`, Subscript[b, 1] -> 1.0314821147175017`, Subscript[b, 2] -> 1.0314821147175017`, Subscript[b, 3] -> 0.8913610693471419`}}, {Subscript[z, 1], 0, 3}]

Out[]= (plot of Log[1 + z1] against the fitted Meijer-G model on [0, 3])
In contrast, we can compare the above result with the fitted Maclaurin expansion given by the FindFormula[] method.

In[]:= Plot[{Log[1 + Subscript[z, 1]], 1.0000001486555359` x - 0.5000025249619084` x^2 + 0.33333404744665834` x^3 - 0.24972552408590057` x^4 + 0.19736470560234445` x^5 - 0.1543483340347282` x^6 + 0.1078423343232992` x^7 - 0.0585491303492433` x^8 + 0.020647753101444084` x^9 - 0.0034163128028319475` x^10 /. {x -> Subscript[z, 1]}}, {Subscript[z, 1], 0, 3}, PlotRange -> {-10, 2}]

Out[]= (plot of Log[1 + z1] against the degree-10 polynomial fit on [0, 3])
Discussion
What we see in this example is that the Meijer-G fitting method not only interpolates the correct function almost perfectly in the range [0, 1], but also extrapolates it exceedingly well! Although the end result does not discover the actual form of the function, the algorithm manages to find a different set of parameters that gives an excellent fit to the given data, as well as beyond it, up to an extent. In other words, we observe the great plasticity that characterises the Meijer-G function, providing multiple ways of approximating a given dataset and producing insights about it.
Approximating a two-term, two-variable function
In this example, we examine how the full method of successively fitting a new Meijer-G function and backfitting the previous terms works.
fun[z1_, z2_] := z1 + Exp[z2]
dim1 = 2; n1 = 30; nt1 = 10;
xTrain1 = RandomReal[{0, 1}, {n1, dim1}];
yTrain1 = fun @@@ xTrain1;
In[]:=
LearnSymbolicModel[xTrain1,yTrain1,{1,2,3}]
Fitting first term
» 1 »
{1, 0.014829, {w → 66.518, v1 → 1.91227, v2 → 3.6619, b1 → 0.595597, b2 → -3.1776}}
» 2 »
{2, 0.00467592, {w → 2.83967, v1 → 0.723879, v2 → 1.30212, a1 → 1.60347, a2 → 1.827, a3 → 1.827, b1 → 0.617931}}
» 3 »
{3, 0.0324931, {w → 1.27307, v1 → 0.980997, v2 → 1.01897, a1 → 1.32458, a2 → 1.19375, b1 → 0.800335, b2 → 0.800335, b3 → 0.675894}}
Selected: 2
Improvement: 96.739%
Fitting new term.
» 1 »
{1, 0.0023664, {w → -7.33799, v1 → 0.335749, v2 → 1.5784, b1 → 3.51127, b2 → 4.02163}}
» 2 »
{2, 0.0046448, {w → 1., v1 → 1.00021, v2 → 0.99979, a1 → 0.999684, a2 → 0.999912, a3 → 0.999912, b1 → 0.999774}}
» 3 »
{3, 0.00226747, {w → -0.00175184, v1 → 0.650312, v2 → 3.03507, a1 → -3.11235, a2 → 2.00145, b1 → 3.01612, b2 → 3.03471, b3 → 3.20228}}
Selected: 3
Backfitting term: 1
» 2 »
{2, 0.00159015, {w → 2.77337, v1 → 0.76055, v2 → 1.27633, a1 → 1.58143, a2 → 1.82303, a3 → 1.82303, b1 → 0.595206}}
Improvement: 41.6687%
Fitting new term.
» 1 »
{1, 0.00137629, {w → -0.966484, v1 → 0.568943, v2 → 1.72488, b1 → 2.38991, b2 → 2.92066}}
» 2 »
{2, 0.00155794, {w → 1., v1 → 1., v2 → 0.999998, a1 → 0.999949, a2 → 0.999999, a3 → 0.999999, b1 → 1.00004}}
» 3 »
{3, 0.00146391, {w → -0.147152, v1 → -0.549823, v2 → 2.51801, a1 → 0.429241, a2 → 0.577789, b1 → 1.42118, b2 → 1.42118, b3 → 2.72123}}
Selected: 1
Backfitting term: 1
» 2 »
{2, 0.00123736, {w → 2.70709, v1 → 0.775967, v2 → 1.269, a1 → 1.56988, a2 → 1.84826, a3 → 1.84826, b1 → 0.584008}}
Backfitting term: 2
» 3 »
{3, 0.0017666, {w → -1.16709, v1 → -0.857967, v2 → 2.71457, a1 → 0.477301, a2 → 0.468992, b1 → 1.0593, b2 → 1.0593, b3 → 2.28324}}
Improvement: -8.6115%
Dropping last iteration.
Out[]= {3.33563 HypergeometricPFQ[{0.0136986}, {1.24167, 1.24167}, 1/Max[0.01, 0.475862 (0.760563 z1 + 1.27632 z2)]] Max[0.01, 0.475862 (0.760563 z1 + 1.27632 z2)]^(25/43), -0.00175747 MeijerG[{{-3.11236}, {2.00146}}, {{3.01613, 3.03478}, {3.20238}}, Max[0.01, 0.227848 (0.650407 z1 + 3.03509 z2)]]}
In[]:= Total[{3.3356260365964623` HypergeometricPFQ[{0.0136986301369863`}, {1.2416666666666667`, 1.2416666666666667`}, 1/Max[0.01`, 0.47586206896551725` (0.7605633802816901` Subscript[z, 1] + 1.2763157894736843` Subscript[z, 2])]] Max[0.01`, 0.47586206896551725` (0.7605633802816901` Subscript[z, 1] + 1.2763157894736843` Subscript[z, 2])]^(25/43), -0.0017574692442882249` MeijerG[{{-3.1123595505617976`}, {2.0014556040756912`}}, {{3.0161290322580645`, 3.034782608695652`}, {3.2023809523809526`}}, Max[0.01`, 0.22784810126582278` (0.6504065040650406` Subscript[z, 1] + 3.0350877192982457` Subscript[z, 2])]]}] // TraditionalForm

Out[]//TraditionalForm= $3.33563\;\max(0.01,\,0.475862(0.760563 z_1 + 1.27632 z_2))^{25/43}\;{}_1F_2\!\left(0.0136986;\;1.24167,\,1.24167;\;\tfrac{1}{\max(0.01,\,0.475862(0.760563 z_1 + 1.27632 z_2))}\right) \;-\; 0.00175747\; G^{2,1}_{2,3}\!\left(\max(0.01,\,0.227848(0.650407 z_1 + 3.03509 z_2))\,\middle|\,{-3.11236,\,2.00146\atop 3.01613,\,3.03478,\,3.20238}\right)$
Fitted Meijer-G Model
The algorithm returns a sum of two fitted functions, one hypergeometric and one Meijer-G.
First, we see that the expression indeed corresponds to two terms, as before, but it is rather hard to acquire the exact, simpler closed form of the functions, due to the two arguments being entangled within both terms. However, checking the accuracy of the fitted model we obtain the following result.
Plot3D[{Subscript[z, 1] + Exp[Subscript[z, 2]], Total[curModelTerms1]}, {Subscript[z, 1], 0, 1}, {Subscript[z, 2], 0, 1}]

Out[]= (surface plot of z1 + Exp[z2] against the fitted model over the unit square)
Examining the model’s capacity to extrapolate the data, if we plot the initial function against the model over the extended square [0,2]^2 we get:

In[]:= Plot3D[{Subscript[z, 1] + Exp[Subscript[z, 2]], Total[curModelTerms1]}, {Subscript[z, 1], 0, 2}, {Subscript[z, 2], 0, 2}]

Out[]= (surface plot over [0,2]^2)
Discussion
The first thing we notice in the first (interpolation) image is the excellent fit, heralded by the excellent loss reported by the algorithm during the fitting process. We also see the appearance of a singularity at the origin, which however does not affect the general quality of the model. In the second picture, we see an excellent extrapolation of the data in the direction of z1, as well as pretty good extrapolation results towards z2, losing track of the actual function only after a range equal to about 60%-70% of the data's original support has been traversed.
One may wonder why we have a negative relative improvement in the final step. This happens due to the way we implement the backfit portion of the algorithm. In particular, when we re-fit a previous term, we let the search start the optimization from any point in the search space it wishes. This creates an opening for finding a better approximation, but it also makes the algorithm forget the previous parameter values. So it can happen that the new search gets trapped in a worse minimum than before, degrading the end result and giving a negative improvement coefficient. The solution we have opted for, for now, is to completely discard the final step of the optimization should something like this occur. However, this is a rather blunt solution and there are many things that can be done to improve the backfit algorithm.
What about Sin[z]?
The last experiment we did was to ask whether the plasticity of a linear combination of Meijer-Gs is such that, although the trigonometric functions are not properly covered by this version of the algorithm, it would still allow for an acceptable approximation. The first encouraging results come from trying to fit Sin(z) in the interval [0, 1].
Regression in the interval [0, 1]
fun[z1_]:=Sin[z1];dim1=1;n1=30;nt1=10;xTrain1=RandomReal[{0,1},{n1,dim1}];yTrain1=fun@@@xTrain1;
LearnSymbolicModel[xTrain1,yTrain1,{1,2,3}]
Fitting first term
» 1 »
{1, 0.000102044, {w → 1.46034, v1 → 1., b1 → 1.27114, b2 → 0.559577}}
» 2 »
{2, 0.306447, {w → 1., v1 → 1., a1 → 1., a2 → 1., a3 → 1., b1 → 0.999998}}
» 3 »
{3, 0.0000138077, {w → 1.11489, v1 → 1., a1 → 1.36281, a2 → 0.962791, b1 → 1.05786, b2 → 1.05786, b3 → 0.800286}}
Selected: 3
Improvement: 99.3283%
Fitting new term.
» 1 »
{1, 0.0000136632, {w → -0.00476392, v1 → 1., b1 → 1.18769, b2 → 1.49816}}
» 2 »
{2, 0.0000117181, {w → 1., v1 → 1., a1 → 1., a2 → 1., a3 → 1., b1 → 1.}}
» 3 »
{3, 0.0000135874, {w → -0.00979396, v1 → 1., a1 → 0.75987, a2 → 0.914937, b1 → 1.12335, b2 → 1.12335, b3 → 1.62448}}
Selected: 2
Backfitting term: 1
» 3 »
{3, 0.000013824, {w → 1.11946, v1 → 1., a1 → 1.36164, a2 → 0.972018, b1 → 1.06369, b2 → 1.06369, b3 → 0.803548}}
Improvement: -0.00291277%
Dropping last iteration.
Out[]= {1.11494 MeijerG[{{1.36275}, {0.962733}}, {{1.05785, 1.05785}, {0.800384}}, Max[0.01, z1]]}
Fitted Meijer-G Model
The algorithm returns a fitted Meijer-G of the first family, staying indeed close to the family where the trigonometric functions reside.
In[]:= (model = 1.1149425287356323` MeijerG[{{1.3627450980392157`}, {0.9627329192546584`}}, {{1.0578512396694215`, 1.0578512396694215`}, {0.800383877159309`}}, Max[0.01`, Subscript[z, 1]]]) // TraditionalForm

Out[]//TraditionalForm= $1.11494\; G^{2,1}_{2,3}\!\left(\max(0.01,\,z_1)\,\middle|\,{1.36275,\,0.962733\atop 1.05785,\,1.05785,\,0.800384}\right)$
This form can be simplified to a hypergeometric function, if we Rationalize[] the terms themselves. However, what is interesting is to see the interpolation and extrapolation capabilities of the returned model.
In[]:= GridBox[{{Plot[{Sin[Subscript[z, 1]], Total[model]}, {Subscript[z, 1], 0, 1}], Plot[{Sin[Subscript[z, 1]], Total[model]}, {Subscript[z, 1], 0, 3}]}}] // DisplayForm

Out[]//DisplayForm= (side-by-side plots of Sin[z1] against the fitted model on [0, 1] and on [0, 3])
The picture on the left illustrates the excellent interpolation capability of the method, while the picture on the right shows that the model stops tracking the original function as soon as we step outside of the initial support of the dataset, as soon as the “trend” of the sine function changes.
Regression in the interval [0, 10]
In order to give the Meijer-G fitting algorithm a chance to be exposed to the whole periodic nature of the sine, we repeat the process for data generated over a larger interval.
fun[z1_]:=Sin[z1];dim1=1;n1=30;nt1=10;xTrain1=RandomReal[{0,10},{n1,dim1}];yTrain1=fun@@@xTrain1;
LearnSymbolicModel[xTrain1,yTrain1]
Fitting first term
» 1 »
{1, 0.351128, {w → 1.07434, v1 → 1., b1 → 0.178834, b2 → -0.37939}}
» 2 »
{2, 0.357327, {w → 1.2156, v1 → 1., a1 → -1.15961, a2 → 0.0505375, a3 → 0.0505375, b1 → -2.68056}}
» 3 »
{3, 0.0605652, {w → 0.000174541, v1 → 1., a1 → -5.32891, a2 → 4.92828, b1 → 4.36817, b2 → 5.25521, b3 → -0.928072}}
» 4 »
{4, 0.460483, {w → 0.795002, v1 → 1., a1 → 0.489244, a2 → 0.489244, a3 → 0.84975, b1 → 1.15025, b2 → 1.15025, b3 → 1.51076}}
» 5 »
{5, 0.44083, {w → -0.164703, v1 → 1., a1 → 1.54853, b1 → 0.155043, b2 → 0.155043, b3 → 0.185529}}
Selected: 3
Improvement: 63.4901%
Fitting new term.
» 1 »
{1, 0.0604736, {w → -0.0219147, v1 → 1., b1 → 0.31282, b2 → 0.365407}}
» 2 »
{2, 0.059124, {w → 0.999999, v1 → 1., a1 → 1.00235, a2 → 1., a3 → 1., b1 → 0.997655}}
» 3 »
{3, 0.0462061, {w → 0.971952, v1 → 1., a1 → 0.981344, a2 → 1.31688, b1 → 0.797501, b2 → 0.797501, b3 → 0.981312}}
» 4 »
{4, 0.0767087, {w → 0.689922, v1 → 1., a1 → 0.239181, a2 → 0.239181, a3 → 0.763839, b1 → 1.23616, b2 → 1.23616, b3 → 1.76082}}
» 5 »
{5, 0.0605627, {w → -27.0869, v1 → 1., a1 → 8.57902, b1 → 1.59185, b2 → 1.59185, b3 → -10.7918}}
Selected: 3
Backfitting term: 1
» 3 »
{3, 0.187965, {w → 1.43205, v1 → 1., a1 → -0.630958, a2 → -0.397494, b1 → 0.89601, b2 → 0.911194, b3 → 0.787159}}
Improvement: -76.1699%
Dropping last iteration.
Out[]= {0.000174551 MeijerG[{{-5.32895}, {4.92821}}, {{4.3681, 5.25517}, {-0.928}}, Max[0.01, z1]]}
Fitted Meijer-G Model
Now the algorithm takes us to the third family, which typically covers the exponential and the gamma functions. The model itself comprises a single Meijer-G function.
In[]:= (model = 0.00017455053237912376` MeijerG[{{-5.328947368421052`}, {4.9282051282051285`}}, {{4.368098159509202`, 5.255172413793104`}, {-0.928`}}, Max[0.01`, Subscript[z, 1]]]) // TraditionalForm

Out[]//TraditionalForm= $0.000174551\; G^{2,1}_{2,3}\!\left(\max(0.01,\,z_1)\,\middle|\,{-5.32895,\,4.92821\atop 4.3681,\,5.25517,\,-0.928}\right)$
The coefficients themselves show that it took a lot of “squeezing” of this function to make it come close to the sine. Plotting the function as before, however, illustrates the quality of the result.
In[]:= GridBox[{{Plot[{Sin[Subscript[z, 1]], Total[model]}, {Subscript[z, 1], 0, 10}], Plot[{Sin[Subscript[z, 1]], Total[model]}, {Subscript[z, 1], 0, 20}]}}] // DisplayForm

Out[]//DisplayForm= (side-by-side plots of Sin[z1] against the fitted model on [0, 10] and on [0, 20])
What we see here is that, within the support of the dataset, the model manages to somewhat follow the behavior of the sine function. However, as soon as it leaves the support it diminishes and goes to zero. In other words, both the interpolation and the extrapolation performed by this model are rather poor. These results illustrate the importance of extending the implementation so as to account for the trigonometric and other related functions.
Concluding remarks
In this project we studied the utilization of the Meijer-G function as a symbolic model, and its capability to learn from and perform symbolic regression on a given dataset. We implemented two main functions, along with three helper functions. The first function fits a single Meijer-G to the given data and the second one iterates the first in a controlled manner, in order to build a symbolic model comprised of a linear combination of Meijer-Gs. The method shows great potential for further development, both towards modeling and understanding the generating process behind the data and towards discovering simpler closed-form expressions that may guide this process.
The immediate future direction for this method is to extend the basic model in order to cover the whole functional range presented in the basic theorem. Wolfram Language provides the following generalized form of the Meijer-G function, which can be readily applied to implement this extension:
$$G^{m,n}_{p,q}\!\left(z,\,r \,\middle|\, \begin{matrix} a_1,\dots,a_n;\, a_{n+1},\dots,a_p \\ b_1,\dots,b_m;\, b_{m+1},\dots,b_q \end{matrix}\right) \;=\; \frac{r}{2\pi i}\int_{L} \frac{\prod_{j=1}^{m}\Gamma(b_j+r s)\,\prod_{j=1}^{n}\Gamma(1-a_j-r s)}{\prod_{j=n+1}^{p}\Gamma(a_j+r s)\,\prod_{j=m+1}^{q}\Gamma(1-b_j-r s)}\; z^{-s}\, ds$$
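For instance (a sketch under the assumption of a recent Wolfram Language version in which MeijerGReduce is available), the system can itself produce representations in this generalized form, including for the trigonometric functions that the basic five families miss:

MeijerGReduce[Sin[z], z]
(* expected to return a representation of the form Sqrt[Pi] MeijerG[{{}, {}}, {{1/2}, {0}}, z/2, 1/2] *)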
Furthermore, a parallel step that can be taken is the discovery of various heuristics that could make the optimization process run faster. The fact that local optimization and gradient descent methods yield such good results hold the promise of great potential gains with respect to optimization speed. Studying and further understanding the search space is expected to give important insights towards this direction.
Extending this method so that it robustly works outside the [0, 1] interval and handling the potential singularities is also of great importance, as it will allow it to capture long-range behaviours that are not expressed within the small confines of this integral and without having to perform normalization on the original data.
We saw how Rationalize was able to remove noise and let a Meijer-G function fall to a simpler for, revealing in essence the function that created the data. Having the ability to discover the law behind the data is one of the most appealing characteristics of using the Meijer-G function and an further exploration on discovering simpler closed forms that are “close” to the fitted parameters beyond Rationalize is a key future target.
Another important issue, relevant to the previous one, is the robustness of the method and its ability to yield useful results in the presence of noise. How noise affect its performance, what levels of noise can the method handle and it potential to perform denoising in a symbolic manner are all imperative questions.
During our experiments, we encountered numerous cases where different families yielded comparable results. This shows that there is an important overlap in the expressibility of the families. Studying this overlap and deciding which family is the most appropriate to choose in each case, so that the method as a whole may yield better results, is also an important research avenue.
The authors of the referenced paper proposed this method as a way to provide explainability for a black-box process, such as a neural network. Although we have not studied this aspect here, it is one of the most powerful characteristics of this approach to modeling. Being able to capture a wide range of phenomena, in a continuous manner and with a model containing only a few parameters, holds the promise of potentially very important insights. In the referenced paper, Taylor expansion is proposed as a means to understand the non-linear contributions of factors to the end result, generalizing methods such as LIME that can only describe linear dependencies. However, the Meijer-G function admits a wide range of transformations, such as the Laplace and Fourier transforms. This fact, along with its great ability to extrapolate from the given data, may lead to further methods and even more advanced ways of explaining the behaviour of a black-box process.
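A minimal sketch of that Taylor-expansion idea, assuming a fitted one-dimensional model list as produced by LearnSymbolicModel and a hypothetical point of interest x0, is to expand the model to second order and read the linear and quadratic contributions directly off the series:

(* Hedged sketch: second-order Taylor expansion of the fitted model around x0.
   model and x0 are assumed; Subscript[z, 1] is the first explanatory variable. *)
x0 = 0.5;
Normal@Series[Total[model], {Subscript[z, 1], x0, 2}]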
Keywords
◼ Symbolic Regression
◼ Machine Learning
◼ Meijer-G
◼ Symbolic Pursuit
◼ Explainability
Acknowledgment
The author would like to thank Stephen Wolfram for his insightful input to this project and the illuminating discussions that the author had the joy of participating in. Furthermore, the author would like to express his gratitude towards his mentor Tuseeta Banerjee, who supported and encouraged him throughout the Summer School, always providing thoughtful guidance and making sure the project’s goals would be realistic and compatible with the given timeframe. Also, Tigran Ishkhanyan was instrumental in bringing this work to fruition, lending his experience and discussing various important aspects with the author, helping him to further realize the potential of this work and assisting him, along with Stephen and Tuseeta, in selecting and prioritizing the various tasks among a wide range of options.
Finally, the author would like to express his deep appreciation and respect towards all the people behind the organization of Wolfram Summer School 2021: Mads Bahrami, Erin Cherry, Danielle Roman Mayer, the TAs Jesse Friedman and Daniel Sanchez, the multiple contributors to the online discussions and his dear classmates. Your support, enthusiasm and approachability created a warm and inviting atmosphere that led to maximum engagement with the school and provided the additional support needed to finish this work on time.
References
Appendix: The full code
Hyperparameter Sets, Custom Meijer-G, Parameter Generation, Loss Function
In[]:=
(* Hyperparameter sets *)
H = Association @@ MapThread[Rule, {{"m", "n", "p", "q"}, #}] & /@
   {{1, 0, 0, 2}, {0, 1, 3, 1}, {2, 1, 2, 3}, {2, 2, 3, 3}, {2, 0, 1, 3}};

(* Meijer-G function constrained to the 5 hyperparameter sets. *)
meijerG[f_] := (MeijerG[
     {Table[Subscript[a, i], {i, H[[f, "n"]]}],
      Table[Subscript[a, i], {i, H[[f, "n"]] + 1, H[[f, "p"]]}]},
     {Table[Subscript[b, i], {i, H[[f, "m"]]}],
      Table[Subscript[b, i], {i, H[[f, "m"]] + 1, H[[f, "q"]]}]} // N, #] &);

(* Parameters *)
parameters[f_, dim_] := Flatten@{w,
    Table[Subscript[v, i], {i, dim}],
    Table[Subscript[a, i], {i, H[[f, "n"]]}],
    Table[Subscript[a, i], {i, H[[f, "n"]] + 1, H[[f, "p"]]}],
    Table[Subscript[b, i], {i, H[[f, "m"]]}],
    Table[Subscript[b, i], {i, H[[f, "m"]] + 1, H[[f, "q"]]}]}

(* Loss function for a particular hyperparameter set. *)
lossF[xTrain_List, yTrain_List, f_Integer] := Module[{dim, n, vVec, zVec, affine},
  dim = Length[xTrain[[1]]];
  n = Length[xTrain];
  (* Affine transform coefficients *)
  vVec = Table[Subscript[v, i], {i, dim}];
  (* Explanatory variables *)
  zVec = Table[Subscript[z, i], {i, dim}];
  (* Affine transform *)
  affine = Max[0.01, vVec.zVec/(Norm[vVec] Sqrt[dim])];
  (* Return the symbolic loss. The symbolic variables are the parameters to optimize. *)
  Mean@Table[(yTrain[[i]] - w meijerG[f][affine] /. MapThread[Rule, {zVec, xTrain[[i]]}])^2, {i, 1, n}]]
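A quick sanity check of these helpers, given here as a hedged sketch with made-up one-dimensional data (not part of the original notebook), might look like this:

(* Hedged sketch: a tiny one-dimensional dataset drawn from Exp, then the symbolic
   loss and the parameter list for hyperparameter set 1. *)
xToy = List /@ Range[0.1, 1, 0.1];
yToy = Exp[First /@ xToy];
parameters[1, 1]     (* {w, Subscript[v, 1], Subscript[b, 1], Subscript[b, 2]} *)
lossF[xToy, yToy, 1] (* a symbolic expression in these parameters *)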
The single term fit function
This function returns the kind, the error and the calculated coefficients of the fitted Meijer-G function.
Input:
◼ xTrain: the input to the black-box function
◼ yTrain: the output of the black-box function
◼ fCheck: list of hyperparameter sets to check (default: check all)
◼ opt: kind of optimization. “Locally”: perform a local optimization; “Globally”: perform a global optimization; “Both”: perform both optimizations
◼ AG: AccuracyGoal, default setting 3
◼ PG: PrecisionGoal, default setting 3
Output: {f, error, {coeff}}
◼ f: the hyperparameter set selected
◼ error: the error of the approximation
◼ coeff: the coefficients of the fitted Meijer-G function
In[]:=
FitMeijerG[xTrain_List, yTrain_List, fCheck_: {1, 2, 3, 4, 5}, opt_: "Both", AG_: 3, PG_: 3] :=
 Module[{dim, n, optimizeLocally, optimizeGlobally, optimizeList, weightedLoss, minPos},
  (* Data dimensions *)
  dim = Length[xTrain[[1]]];
  (* Number of samples *)
  n = Length[xTrain];
  (* Check whether the local search should take place. The symbolic loss function returned
     from lossF is given to Mathematica's FindMinimum function, which performs a local search. *)
  If[opt == "Both" || opt == "Locally",
   optimizeLocally = Table[Echo[{f}~Join~FindMinimum[lossF[xTrain, yTrain, Echo@f], parameters[f, dim], AccuracyGoal -> AG, PrecisionGoal -> PG]], {f, fCheck}],
   optimizeLocally = {}];
  (* Check whether the global search should take place. The symbolic loss function returned
     from lossF is given to Mathematica's NMinimize function, which performs a global search. *)
  If[opt == "Both" || opt == "Globally",
   optimizeGlobally = Table[Echo[{f}~Join~NMinimize[lossF[xTrain, yTrain, Echo@f], parameters[f, dim], AccuracyGoal -> 10, PrecisionGoal -> 10]], {f, fCheck}],
   optimizeGlobally = {}];
  (* Combine the previous results in a single list. *)
  optimizeList = Join[optimizeLocally, optimizeGlobally];
  (* If some weight w is too big, weight the losses appropriately. *)
  If[Norm[optimizeList[[All, 3, 1]][[All, 2]]] > 100,
   (* Weight the loss values by the corresponding weights and select the best result. *)
   weightedLoss = optimizeList[[All, 2]]*optimizeList[[All, 3, 1]][[All, 2]];
   minPos = Flatten@Position[weightedLoss, Min[weightedLoss]];,
   (* Otherwise, return the result with the minimum loss value. *)
   minPos = Flatten@Position[optimizeList[[All, 2]], Min[optimizeList[[All, 2]]]];];
  Return[Flatten[optimizeList[[minPos]], 1]];]
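A hedged usage sketch, with hypothetical one-dimensional data generated from Sqrt rather than an output of the original notebook:

(* Hedged sketch: fit a single Meijer-G term to data from Sqrt on [0, 1],
   using only the local search for speed. *)
xToy = List /@ Range[0.05, 1, 0.05];
yToy = Sqrt[First /@ xToy];
singleFit = FitMeijerG[xToy, yToy, {1, 2, 3, 4, 5}, "Locally"]
(* Expected shape of the result: {f, loss, {w -> ..., Subscript[v, 1] -> ..., ...}} *)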
Create the multiple term fit function (with backfit)
This function fits a superposition of Meijer-G functions to the given data and returns a list with the fitted terms.
Input:
◼ xTrain: the input to the black-box function
◼ yTrain: the output of the black-box function
◼ fCheck: list of hyperparameter sets to check (default: check all)
◼ opt: kind of optimization. “Locally”: perform a local optimization; “Globally”: perform a global optimization; “Both”: perform both optimizations
◼ AG: AccuracyGoal, default setting 3
◼ PG: PrecisionGoal, default setting 3
Output: a list of the fitted Meijer-G terms. Totaling the list gives the fitted model.
In[]:=
LearnSymbolicModel[xTrain_List, yTrain_List, fCheck_: {1, 2, 3, 4, 5}, opt_: "Locally", AG_: 3, PG_: 3] :=
 Module[{fSet, curResidual, preResidual, curModelTerms, fitResult, dim, n, vVec, zVec, affine, impPer, curModelTermsBackup, curResidualBack, j},
  (* Keep a record of the families of terms present in the model. *)
  fSet = {};
  (* Two residual terms to keep track of the improvement. *)
  curResidual = yTrain;
  preResidual = {};
  (* A list of the terms comprising our model. *)
  curModelTerms = {};
  Print["Fitting first term"];
  (* Invoke FitMeijerG to fit the first term of the model. *)
  fitResult = FitMeijerG[xTrain, yTrain, fCheck, opt];
  Print["Selected: " <> ToString[First@fitResult]];
  (* Dimensions of the explanatory data *)
  dim = Length[xTrain[[1]]];
  (* Number of samples *)
  n = Length[xTrain];
  (* Affine transformation coefficients *)
  vVec = Table[Subscript[v, i], {i, dim}];
  (* Explanatory variables *)
  zVec = Table[Subscript[z, i], {i, dim}];
  (* Affine transformation *)
  affine = Max[0.01, vVec.zVec/(Norm[vVec] Sqrt[dim])];
  (* Add the new term to the list *)
  curModelTerms = curModelTerms~Join~{w meijerG[First@fitResult][affine] /. Last[fitResult]};
  (* Avoid arithmetic inconsistencies *)
  curModelTerms = Rationalize[curModelTerms, 0.0001] // N;
  (* Catalog the hyperparameters used *)
  fSet = fSet~Join~{First@fitResult};
  (* Update residuals *)
  preResidual = curResidual;
  curResidual = curResidual - Table[Total[curModelTerms] /. MapThread[Rule, {zVec, xTrain[[i]]}], {i, n}];
  (* Calculate the relative improvement. *)
  impPer = (Norm[preResidual] - Norm[curResidual])/Norm[preResidual];
  Print["Improvement: " <> ToString@(impPer*100) <> "%"];
  (* Repeat while the relative improvement exceeds 1%. *)
  While[impPer > 0.01,
   (* Back up the current model terms *)
   curModelTermsBackup = curModelTerms;
   (* Fit a new term *)
   Print["Fitting new term."];
   fitResult = FitMeijerG[xTrain, curResidual, fCheck, opt];
   Print["Selected: " <> ToString@First@fitResult];
   (* Add the new term to the model list *)
   curModelTerms = curModelTerms~Join~{w meijerG[First@fitResult][affine] /. Last[fitResult]};
   (* Avoid arithmetic inconsistencies. *)
   curModelTerms = Rationalize[curModelTerms, 0.0001] // N;
   (* Catalog the hyperparameters. *)
   fSet = fSet~Join~{First@fitResult};
   (* Update the residuals *)
   preResidual = curResidual;
   curResidual = curResidual - Table[Last[curModelTerms] /. MapThread[Rule, {zVec, xTrain[[i]]}], {i, n}];
   (* Backfit the previous terms *)
   For[j = 1, j < Length[curModelTerms], j++,
    (* Exclude the j-th term from the residual *)
    curResidualBack = curResidual + Table[curModelTerms[[j]] /. MapThread[Rule, {zVec, xTrain[[i]]}], {i, n}];
    (* Backfit the j-th term, keeping its family *)
    Print["Backfitting term: " <> ToString@j];
    fitResult = FitMeijerG[xTrain, curResidualBack, {fSet[[j]]}, opt];
    (* Update the term list *)
    curModelTerms[[j]] = w meijerG[First@fitResult][affine] /. Last[fitResult];
    (* Avoid arithmetic inconsistencies *)
    curModelTerms = Rationalize[curModelTerms, 0.0001] // N;
    (* Reconstruct the updated residual *)
    curResidual = curResidualBack - Table[curModelTerms[[j]] /. MapThread[Rule, {zVec, xTrain[[i]]}], {i, n}];];
   (* Check the relative improvement from adding a new term and backfitting the previous ones. *)
   impPer = (Norm[preResidual] - Norm[curResidual])/Norm[preResidual];
   Print["Improvement: " <> ToString@(impPer*100) <> "%"];
   (* If the relative improvement is negative, completely discard the last step. *)
   If[impPer < 0, Print["Dropping last iteration."]; curModelTerms = curModelTermsBackup];];
  Return[curModelTerms]]
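Finally, a hedged end-to-end sketch, again with hypothetical data (the original notebook’s experiments are in the main text), of building and inspecting a full model:

(* Hedged sketch: learn a multi-term symbolic model for data from Log[1 + x] on [0, 1]
   and compare the fitted model with the generating function. *)
xToy = List /@ Range[0.05, 1, 0.05];
yToy = Log[1 + First /@ xToy];
model = LearnSymbolicModel[xToy, yToy];
Plot[{Log[1 + Subscript[z, 1]], Total[model]}, {Subscript[z, 1], 0, 1}]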
Cite this as: Sotiris Michos, "Meijer-G Learning" from the Notebook Archive (2021), https://notebookarchive.org/2021-07-6fkrg5n