Asymptotics of the Volume of Manifolds and Graphs
Author
Samuele Mongodi
Title
Asymptotics of the Volume of Manifolds and Graphs
Description
In order to understand what can be inferred from a discrete sample of points about the geometry of the manifold, we will simulate a random process that generates points according to the volume density of the manifold in a given parameter space.
Category
Essays, Posts & Presentations
Keywords
graphs, manifolds, stochastic point processes
URL
http://www.notebookarchive.org/2021-07-62urncv/
DOI
https://notebookarchive.org/2021-07-62urncv
Date Added
2021-07-13
Date Last Modified
2021-07-13
File Size
34. megabytes
Supplements
Rights
Redistribution rights reserved
data:image/s3,"s3://crabby-images/4079d/4079d57633b5f88bf9a49688684d35628eb2c6bf" alt=""
data:image/s3,"s3://crabby-images/56607/56607cca9c3f8f5e959237fb5ea16950a488c5ec" alt=""
data:image/s3,"s3://crabby-images/97e21/97e21d941045101921bcfd57c45c820c8eed2b93" alt=""
WOLFRAM SUMMER SCHOOL 2021
Asymptotics of the Volume of Manifolds and Graphs
Asymptotics of the Volume of Manifolds and Graphs
SamueleMongodi
Politecnico di Milano
In a computational setting, we are rarely so lucky to have an explicit equation or a nice parametrization as in the usual examples of manifold . At times, of a given manifold, we only know a bunch of points, maybe as many as we want, but finitely many.
In order to understand what can be inferred from a discrete sample of points about the geometry of the manifold, we will simulate a random process that generates points according to the volume density of the manifold in a given parameter space. We will then try to give to these points a graph structure, based either on the local distances between them or on the local expected connectivity of a graph lying on a k-dimensional manifold (in terms of the average degree of a vertex); in this way, we have a notion of distance between such points which is not the Euclidean one and hopefully is linked to the geometry of the manifold.
However, our attempts at computing the limiting behavior of the volume, in the small and in the large, will show that the estimator for curvature is too sensitive to changes in the parameters that we use to construct the graph and, even if it can be calculated, it has close to no relationship with the geometry of the underlying manifold.
In order to understand what can be inferred from a discrete sample of points about the geometry of the manifold, we will simulate a random process that generates points according to the volume density of the manifold in a given parameter space. We will then try to give to these points a graph structure, based either on the local distances between them or on the local expected connectivity of a graph lying on a k-dimensional manifold (in terms of the average degree of a vertex); in this way, we have a notion of distance between such points which is not the Euclidean one and hopefully is linked to the geometry of the manifold.
However, our attempts at computing the limiting behavior of the volume, in the small and in the large, will show that the estimator for curvature is too sensitive to changes in the parameters that we use to construct the graph and, even if it can be calculated, it has close to no relationship with the geometry of the underlying manifold.
Poisson Point Processes
The main tool we will be using to generate points that are uniformly distributed on our manifold is the inhomogeneous Poisson point process, which produces independent draws from a given distribution.
We set up the parameter space dimension (n), the embedding space dimension (k), the parametrization function (u) and the parameter region (ℛ) given by our second example from before (we just exchange z and y for convenience) and we proceed to compute a bunch of auxiliary functions and quantities, like the area or ℛ or the area of the piece of manifold parametrized by u over ℛ, which we will call M:
We set up the parameter space dimension (n), the embedding space dimension (k), the parametrization function (u) and the parameter region (ℛ) given by our second example from before (we just exchange z and y for convenience) and we proceed to compute a bunch of auxiliary functions and quantities, like the area or ℛ or the area of the piece of manifold parametrized by u over ℛ, which we will call M:
In[]:=
k=3;n=2;ℛ=Rectangle[{-1,-1},{1,1}];uservars={x,y};(*variablesusedtodescribeuandanysubregionofℛasanImplicitRegion*)u[x_,y_]:={x,y,Sqrt[1-x^2+y^2]};
We create a compiled version of u, compute the volume density of u on ℛ, create a "callable" version of such volume density to be used in our inhomogeneous Poisson point process later on:
In[]:=
variables=Table[Unique["x"],{n}];cu=Compile@@{{#,_Real}&/@#,u@@#,RuntimeOptions"Speed",RuntimeAttributes{Listable},ParallelizationTrue,CompilationOptions{"InlineExternalDefinitions"True}}&[variables];descrℛ=(#[[1]]/.Thread[#[[2]]variables])&[RegionConvert[ℛ,"Implicit"]];canonicalbasis=Table[Table[Boole[i==j],{i,1,n}],{j,1,n}];volmeasureM[vars_List]:=FullSimplify[Norm[Minors[Function[p,(Derivative@@p)[u]@@vars]/@canonicalbasis,n]],vars∈ℛ]callvolmeasureM=Function[#,Evaluate@volmeasureM[#]]&[variables];cutvolmeasureM=Piecewise[{{volmeasureM[{#1,#2}],descrℛ/.{variables[[1]]->#1,variables[[2]]->#2}}}]&;
Here, we use the volume density function in the most basic way: its integral over ℛ gives the area of the piece of manifold M (hyperboloid, in our case) which is obtained by computing u on the points of ℛ. By comparison, the area of ℛ is (obviously) just 4:
In[]:=
volM=NIntegrate[callvolmeasureM[Sequence@@variables],variables∈ℛ]
Out[]=
5.55794
In[]:=
volR=RegionMeasure[ℛ,n]
Out[]=
4
So, the function volmeasureM (or its cognate callvolmeasureM) tells us which parts of ℛ contribute more to the area of M and which parts contribute less: plotting M over ℛ makes it clear that equal parts of ℛ may correspond to pieces of M with different extension:
Show[ParametricPlot3D[u[x,y],{x,y}∈ℛ,AxesFalse,BoxedFalse,BoxRatios{1,1,1},ViewPoint{1,-1,.5},Mesh5],ParametricPlot3D[{x,y,0},{x,-1,1},{y,-1,1},PlotTheme"Marketing",PlotStyleOpacity[0.2]]]
Out[]=
We can now build our Poisson process; however, the number of points we will be getting out of a RandomPointConfiguration call with such a PointProcess will depend by the MeanPointDensity of our point process; in order to be able to draw as many points as we want, we will just have to multiply our function by a suitable correction constant:
In[]:=
myPoissonPointProcess=InhomogeneousPoissonPointProcess[callvolmeasureM,n];correction=1./(volRMeanPointDensity[{myPoissonPointProcess,ℛ}])
Out[]=
0.1844
Random Points Generator
Random Points Generator
We are now ready to define our function to generate random points on ℛ according to the volume density of M. This functions accepts in input the desired number of points and, optionally, the number of configurations of points with such cardinality. Given the random nature of such a process and the mean involved in the definition of the correction from before, the function will return a number of points which is slightly less than requested:
In[]:=
genRand[numpts_Integer/;numpts>0,nconf_:1]:=Module[{nvM,tPPP},nvM=Function[#,Evaluate[numpts*correction*volmeasureM[#]]]&[variables];tPPP=InhomogeneousPoissonPointProcess[nvM,n];If[nconf==1,Return[RandomPointConfiguration[tPPP,ℛ]["Points"]],Return[RandomPointConfiguration[tPPP,ℛ,nconf]["PointsList"]]]]
As an example, we produce from a thousand to a million points and use Histogram3D to see if their distribution actually approximates the volume distribution function, which we will plot in the end:
In[]:=
GraphicsColumn[Histogram3D[genRand[10^#]]&/@Range[3,6],ImageSize->400]
Out[]=
Plot3D[callvolmeasureM[x,y],{x,y}∈ℛ]
Out[]=
As we can see, when the number of points (1K on the first line of the table, 1M on the last) increases, the approximation of the density is more accurate; this is also reflected in the uniform distribution of the images of such points via u on M. To see this, we compute again random points as before and then we apply to them the function u (or, better, it's compiled version). We go up only to 100K as ListPointPlot3D does not handle well bigger lists; as a comparison, we plot below the manifold M (image of the function u on ℛ).
In[]:=
GraphicsColumn[ListPointPlot3D[cu@@@genRand[10^#],AspectRatio1.2]&/@Range[3,5],ImageSize->400]
Out[]=
ParametricPlot3D[u[x,y],{x,y}∈ℛ,PlotTheme"Business"]
Out[]=
One can easily see how the distribution is uniform on M, but it is not when projected on ℛ, where, as we know, the point distribute following the law given by the volume density function, which is bigger at some points and smaller at others. This is particularly clear from the first line of plots, with fewer points
Volume Approximation
Volume Approximation
Caution: we will be indistinctly taking about volume and area, because our case-of-study manifold is a surface, but everything we say can be applied to higher dimensional manifolds (with all the obvious drawbacks in terms of execution times and memory usage).
A first test to understand in which sense these points approximate the volume density function is to compute the volume of a subregion in ℛ by counting how many points end up in it; the ratio with the total number of points should converge (as we increase the total number of points) to the ratio between the area of the piece of manifold over the subregion and the area of M:
We define three test subregions in ℛ and group them in the family . Next, we plot them in white over orange:
={ImplicitRegion[x^2+y^2<1/10,{x,y}],ImplicitRegion[Abs[x+y]<1/16&&Abs[x-y]<1/16,{x,y}],ImplicitRegion[Abs[y]<Abs[x]^(2/3)&&0<x&&x<1/7,{x,y}]};
In[]:=
GraphicsRow[Region[RegionDifference[ℛ,#],PlotTheme"Scientific"]&/@]
Out[]=
By integrating the volume density function over these three regions, we obtain the ratio between the volumes of the pieces of manifold parametrized by there three regions and the volume of M:
In[]:=
vol=Association[#->NIntegrate[callvolmeasureM[Sequence@@variables],variables∈#]/volM&/@];Values[vol]
Out[]=
{0.0579181,0.00140656,0.00852415}
By using the function genRand and a compiled ad hoc version of Select, we construct a function that generates several configuration of a given number of points and counts how many end up in a given region:
In[]:=
checkVol[_,npts_Integer/;npts>0,iter_Integer/;iter>0]:=Module[{descr,select,batches},descr=Function[a,Evaluate[[[1]]/.Thread[uservars->Table[Part[a,j],{j,1,n}]]]];select=Compile[{{lst,_Real,2}},Length[Select[lst,descr]]/Length[lst],RuntimeOptions->"Speed",Parallelization->True,CompilationOptions->{"InlineExternalDefinitions"->True},CompilationTarget"C",RuntimeAttributes{Listable}];batches=genRand[npts,iter];{Mean[#],#}&[select[batches]]]
We use the function checkVol to test the goodness of the approximations given by averaging on several configurations of points, for m=3,4,5,6,7:
m
10
In[]:=
Dataset@Association[Region[#,ImageSize->Tiny,BaselinePositionTop]-><|Function[j,j->Association[Function[k,StringJoin[{ToString[k]," iter"}]->Quiet@checkVol[#,10^j,k][[1]]]/@Range[20,60,20]]]/@Range[3,5]|>&/@]
Out[]=
We see that the approximations are converging slowly (it is important here to look not only at the “correct” digits but also at the error: if the 3rd digit is correct but then we have an error of , then we can actually improve the error by having the 3rd digit wrong, but staying closer to the right result.
9
-4
x10
Metropolis algorithm
Metropolis algorithm
We define a function Metropolis as an input of Random`DistributionVector, to implement the Metropolis algorithm inside RandomVariate, in order to produce samples from a target probability density function (pdf):
1
.starting from a point u0,
2
.we “propose” a jump du, which is a random vector drawn from a normal distribution of mean 0 and variance s (=1 by default),
3
.we compute our target probability density function (pdf) at u0 and at u0+du,
4
. we generate a random real number r between 0 and 1, with uniform probability, and we compute pdf(u0+du)/pdf(u0)
5
.if this last number is bigger than r, we accept the jump and our next point is u0+du, otherwise our next point is still u0.
In this way, we obtain a Markov chain whose probability law asymptotically tends to the target pdf; therefore, we run the Markov chain for n+n0 steps and we ignore the first n0, obtaining n points sampled from the target pdf. We run a number (num=200 by default) of chains in parallel.
The problem of this algorithm is that we often obtain repeated points:
The problem of this algorithm is that we often obtain repeated points:
In[]:=
Metropolis/:Random`DistributionVector[Metropolis[pdf_,u0_,s_:1,n0_:100,chains_:200],n_Integer,prec_?Positive]:=Module[{u,du,p,p1,accept,cpdf},cpdf=Compile@@{{#,_Real}&/@#,pdf@@#,RuntimeAttributes->{Listable},RuntimeOptions->"Speed",Parallelization->True}&[Unique["x",Temporary]&/@u0];u=ConstantArray[u0,chains];p=cpdf@@Transpose[u];(Join@@Table[du=RandomVariate[NormalDistribution[0,s],{chains,Length[u0]}];p1=cpdf@@Transpose[u+du];accept=UnitStep[p1/p-RandomReal[{0,1},chains]];p+=(p1-p)accept;u+=duaccept,{Ceiling[(n0+n)/chains]}])[[n0+1;;n0+n]]];
In[]:=
genRandMet[scale_,numrand_]:=ParallelTable[RandomVariate[Metropolis[cutvolmeasureM,{0.,0.}],scale],{numrand}]
The problem with this algorithm is that, even if the running time is comparable or even better than the RandomPointConfiguration with a inhomogeneous Poisson Point Process, there are repeated points, which will prevent us to use it to construct graphs. Removing the duplicates will noticeably lower the precision of the approximation:
We use the same test as before, on the regions in , to see how good is the approximation given; in order to do so, we define another version of our checkVol function:
In[]:=
={ImplicitRegion[x^2+y^2<1/10,{x,y}],ImplicitRegion[Abs[x+y]<1/16&&Abs[x-y]<1/16,{x,y}],ImplicitRegion[Abs[y]<Abs[x]^(2/3)&&0<x&&x<1/7,{x,y}]};
In[]:=
vol=Association[#->NIntegrate[callvolmeasureM[Sequence@@variables],variables∈#]/volM&/@];Values[vol]
Out[]=
{0.0579181,0.00140656,0.00852415}
In[]:=
checkVolMet[_,npts_Integer/;npts>0,iter_Integer/;iter>0]:=Module[{descr,select,batches},descr=Function[a,Evaluate[[[1]]/.Thread[uservars->Table[Part[a,j],{j,1,n}]]]];select=Compile[{{lst,_Real,2}},Length[Select[lst,descr]]/Length[lst],RuntimeOptions->"Speed",Parallelization->True,CompilationOptions->{"InlineExternalDefinitions"->True},CompilationTarget"C",RuntimeAttributes{Listable}];batches=genRandMet[npts,iter];{Mean[#],#}&[select[batches]]]
In[]:=
Dataset@Association[Region[#,ImageSize->Tiny,BaselinePositionTop]-><|Function[j,j->Association[Function[k,StringJoin[{ToString[k]," iter"}]->Quiet@checkVolMet[#,10^j,k][[1]]]/@Range[20,60,20]]]/@Range[3,5]|>&/@]
Out[]=
The Metropolis algorithm needs more points to converge, but its noticeably faster than the RandomPointConfiguration; the only problem in generating point is the memory necessary to store them.
8
10
Connecting the Dots
In order to extract some more information from our random points, we try to give them a meaningful graph structure, by connecting with a(n unoriented) edge two points when they are, in some sense, near to each other; we can do that in two ways.
◼
ϵ-Neighbors Graph : (ϵ-graph) we connect two points if their distance is less than a given threshold ϵ.
◼
k-Nearest-Neighbors Graph : (knn-graph) we connect each point with the k nearest points among the others.
Both the previous graphs can be constructed using the function NearestNeighborGraph.
These two choices are, in a way, independent and opposite: the knn-graph will almost always produce something connected and with each vertex of degree exactly k, but it may happen that two fairly distant points are connected because one of them lies in a scarcely populated region.
On the other hand, in a ϵ-graph, all the connected points will always be close, under the ϵ threshold; however, this could result in a high average degree or, if this is kept reasonably low, in a disconnected graph. Moreover, the ϵ-graph is very sensitive with respect to small perturbations of the points, as the following examples show.
On the other hand, in a ϵ-graph, all the connected points will always be close, under the ϵ threshold; however, this could result in a high average degree or, if this is kept reasonably low, in a disconnected graph. Moreover, the ϵ-graph is very sensitive with respect to small perturbations of the points, as the following examples show.
The Perturbed Hexagonal and Square Lattices
The Perturbed Hexagonal and Square Lattices
The following example shows how "fragile" is the behavior of an ϵ-graph when the points are allowed to have "errors" or perturbations:
In[]:=
hexalattice=Flatten[Join[Table[{3n+1,mSqrt[3]},{n,-3,3},{m,-3,3}],Table[{3n-1,mSqrt[3]},{n,-3,3},{m,-3,3}],Table[{3n+1/2,(m+1/2)Sqrt[3]},{n,-3,3},{m,-3,3}],Table[{3n-1/2,(m-1/2)Sqrt[3]},{n,-3,3},{m,-3,3}],Table[{3n,mSqrt[3]},{n,-3,3},{m,-3,3}],Table[{3n+3/2,(m+1/2)Sqrt[3]},{n,-3,3},{m,-3,3}]],1];
In[]:=
hexagraph=NearestNeighborGraph[hexalattice,{All,1}]
Out[]=
Already if we use machine precision coordinates, we will fall into rounding errors:
In[]:=
mphexagraph=NearestNeighborGraph[hexalattice//N,{All,1}]
Out[]=
And if we allow for some “noise”, the result is even worse:
In[]:=
perturbedhexagraph=NearestNeighborGraph[hexalattice+RandomReal[{-10^(-9),10^(-9)},{hexalattice//Length,2}]//N,{All,1}]
Out[]=
With a knn graph we would not have such a problem:
In[]:=
perturbedhexagraph2=NearestNeighborGraph[VertexList[perturbedhexagraph],6]
Out[]=
However, who told us that 6 is the right number? If we had come from a square grid, 6 would have produced some awkward results:
In[]:=
squaregrid=Flatten[Outer[N@List,Range[10],Range[10]],1];latticegraph=Labeled[NearestNeighborGraph[squaregrid,{All,1}],"perfect ϵ-graph"]perturbedlattice=Labeled[NearestNeighborGraph[squaregrid+RandomReal[{-10^(-9),10^(-9)},{Length[squaregrid],2}],{All,1}],"noisy ϵ-graph"]perturbedlattice2=Labeled[NearestNeighborGraph[VertexList[perturbedlattice[[1]]],6],"6nn-graph"]
Out[]=
Out[]=
Out[]=
From Parametrized Manifolds to Graphs
From Parametrized Manifolds to Graphs
In the Function Repository, one can find two functions to produce graphs from manifolds (ResourceFunction["ExtrinsicCurvedManifoldToGraph"] and ResourceFunction["IntrinsicCurvedManifoldToGraph"] ), which are however limited in use, either by some hypotheses on the nature of the density function we can use or for their prohibitive execution times on large collections of points.
From the documentation, we take one of the examples and increase the number of points:
In[]:=
timesICMTG=((ResourceFunction["IntrinsicCurvedManifoldToGraph"][x^2+2y^2+3x^3+4y^4+5x^5+6y^6,{x,-1,1},{y,-1,1},0.3,#];//AbsoluteTiming)&/@{200,600,1800,5400})
Out[]=
{{8.50489,Null},{24.6342,Null},{75.2945,Null},{230.097,Null}}
We propose another function, speeding up the computation in case one is interested only in the graph object per se and getting rid of the restriction on the density function. :
This function goes through the steps that we executed in the previous chapter of this notebook, to produce a set of random points which is then fed to the NearestNeighborGraph function:
In[]:=
parametricManifoldToGraph[uexpr_,reg_,uservars_,npts_,par_,OptionsPattern[{"Neighbors"->"Epsilon","Space"->"Parameters"}]]:=Module[{n,vars,u,canbasis,vmM,cvmM,myPPP,corr,nvM,tPPP,pts},n=Length[uservars];vars=Table[Unique["x"],{n}];u=Function[#,uexpr/.Thread[uservars->#]]&[vars];canbasis=Table[Table[Boole[i==j],{i,1,n}],{j,1,n}];vmM[varz_List]:=FullSimplify[Norm[Minors[Function[p,(Derivative@@p)[u]@@varz]/@canbasis,n]],varz∈ℛ];cvmM=Function[#,Evaluate@vmM[#]]&[vars];myPPP=InhomogeneousPoissonPointProcess[cvmM,n];corr=1./(RegionMeasure[reg,n]MeanPointDensity[{myPPP,reg}]);nvM=Function[#,Evaluate[npts*corr*cvmM@@#]]&[vars];tPPP=InhomogeneousPoissonPointProcess[nvM,n];If[OptionValue["Space"]=="Parameter",pts=RandomPointConfiguration[tPPP,reg]["Points"],pts=u@@@RandomPointConfiguration[tPPP,reg]["Points"]];If[OptionValue["Neighbors"]=="Epsilon",Return[NearestNeighborGraph[pts,{All,par}]],Return[NearestNeighborGraph[pts,par]]]]
It is important to notice that the function given as an argument to IntrinsicCurvedManifoldToGraph is already the probability density function from which we want to draw samples; here we pass as an argument the parametrization of the manifold, i.e. the function u, the region ℛ and the list of user variables (used to define u and ℛ). We will test our function on the same number of points, with the same epsilon parameter, on a different function:
In[]:=
timespMTG=((parametricManifoldToGraph[{x^2+y,xy,x^4+y^6},ℛ,{x,y},#,0.3]//AbsoluteTiming)&/@{200,600,1800,5400,16200});
In[]:=
AssociationThread[ToString/@{200,600,1800,5400,16200}->First/@timespMTG]//Dataset
Out[]=
To have a visual idea of what is happening, we plot the first three graphs:
In[]:=
GraphicsRow[GraphPlot@*Last/@timespMTG[[1;;3]],ImageSize->1000]
Out[]=
As a reference, the manifold we are passing to our new function is the following; we can see a resemblance with the first two graphs, which have a sufficiently low number of vertices to be visualized properly. As a side note, we remark that the image of u is not technically an embedded manifold, because it has a curve of singular points where all the “petals” meet:
In[]:=
ParametricPlot3D[{x^2+y,xy,x^4+y^6},{x,y}∈ℛ,Boxed->False,Axes->False,PlotTheme->"Business"]
Parameters vs Embedding - Epsilon vs KNN
Parameters vs Embedding - Epsilon vs KNN
In the parametricManifoldToGraph function, we included an option called “Space”; the default value is “Parameters”, as opposed to “Embedding”: our function u goes from ℛ to (in our examples, k=3), so we can decide if we want to measure distanced between points in ℛ or in . Both make sense and are locally equivalent. We present the results of the previous benchmark with the “Embedding” option:
k
k
In[]:=
timespMTGemb=((parametricManifoldToGraph[{x^2+y,xy,x^4+y^6},ℛ,{x,y},#,0.3,"Space"->"Embedding"]//AbsoluteTiming)&/@{200,600,1800,5400,16200});
In[]:=
AssociationThread[ToString/@{200,600,1800,5400,16200}->First/@timespMTGemb]//Dataset
Out[]=
|
We plot the first three graphs, which still have a manageable number of vertices and edges - sort of:
In[]:=
GraphicsRow[GraphPlot@*Last/@timespMTGemb[[1;;3]],ImageSize->1000]
Out[]=
The times are comparable, however the same value of ϵ gives slightly different structures:
In[]:=
AssociationThread[{"parameters space","embedding space"}->(AssociationThread[ToString/@{200,600,1800,5400,16200}->#]&/@{Mean@*VertexDegree/@timespMTG[[All,2]]//N,Mean@*VertexDegree/@timespMTGemb[[All,2]]//N})]//Dataset
Out[]=
|
Another small comparison we have to make is the use of the KNN method to build the graph; we follow the same procedure as before, tabulating the times and plotting the first three graphs.
In[]:=
timespMTGknn=((parametricManifoldToGraph[{x^2+y,xy,x^4+y^6},ℛ,{x,y},#,6,"Neighbors"->"KNN"]//AbsoluteTiming)&/@{200,600,1800,5400,16200});
In[]:=
AssociationThread[ToString/@{200,600,1800,5400,16200}->First/@timespMTGknn]//Dataset
Out[]=
|
In[]:=
GraphicsRow[GraphPlot@*Last/@timespMTGknn[[1;;3]],ImageSize->1000]
Out[]=
We can see that there are some peculiar differences with the previous ones, typical of the knn method; moreover, producing less edges, this method is faster.
The remaining comparison, the knn method in the embedding space, would not bring any surprise: applying u will vary small distances in a very similar way, unless we are in points with very different singular values of the Jacobian matrix, which are usually localized phenomena and will not alter the global structure of the graph.
The remaining comparison, the knn method in the embedding space, would not bring any surprise: applying u will vary small distances in a very similar way, unless we are in points with very different singular values of the Jacobian matrix, which are usually localized phenomena and will not alter the global structure of the graph.
The Choice of The ϵ
The Choice of The ϵ
It can be easily guessed that choosing some value of ϵ instead of some other will drastically change the appearance of our graph; one important parameter is the connectedness of the graph, another is the average vertex degree. Both are impossible to determine precisely without actually computing the graph. However, we can approximate the latter with a Monte Carlo method.
This function draws num points from the set of points pts and checks how many points have distance ≤epsilon from each of them, to have an estimate of the average vertex degree of the resulting graph for such a value of epsilon:
In[]:=
estimatedeg[pts_,epsilon_,num_]:=Mean[(-1+Length[Nearest[pts,#,{Infinity,epsilon}]])&/@RandomChoice[pts,num]]//N
We check that the estimate is precise enough:
In[]:=
With[{pts=genRand[10^5]},{estimatedeg[pts,#,300],Mean[VertexDegree[NearestNeighborGraph[pts,{All,#}]]]//N}&/@Range[0.005,0.05,0.005]]//Dataset
Out[]=
|
Using this function, we can determine, given the set of points, which values of epsilon will give reasonable results (i.e. losely in the interval [5,20], where the lower end gives a geometrically significant behavior, whereas the upper end will usually guarantee connectedness, even tho this last fact is not obvious, nor certain).
An empirical analysis shows that this function gives, for a given scale of magnitude, an interesting range of values of epsilon:
An empirical analysis shows that this function gives, for a given scale of magnitude, an interesting range of values of epsilon:
In[]:=
epsilons[a_]:=Range[2.5/Sqrt[10^a],5.5/Sqrt[10^a],(1./3)/Sqrt[10^a]]
We run a couple of tests; the results are always shown so that different rows refer to different scales of magnitude and different columns to different values of ϵ.
We estimate the average vertex degree of the resulting graph, to see that we generally end up in the interval [5,20]:
In[]:=
With[{pts=genRand[10^#]},Table[estimatedeg[pts,j,300],{j,epsilons[#]}]]&/@Range[3,6]//Grid
Out[]=
4.44 | 5.64 | 6.74 | 8.08 | 10.2433 | 11.72 | 14.1067 | 16.4367 | 17.55 | 20.5067 |
5.46667 | 6.48 | 8.40667 | 10.2233 | 11.6967 | 13.3167 | 16.22 | 19.8633 | 22.1833 | 24.1967 |
5.05667 | 6.69667 | 8.53333 | 9.95333 | 11.7833 | 14.25 | 16.31 | 19.9333 | 22.0233 | 24.9667 |
5.03 | 6.43667 | 8.25 | 10.2167 | 12.4633 | 14.1967 | 17.3033 | 19.4067 | 21.7233 | 24.36 |
We construct some ϵ-graphs for the 4 largest values of epsilons[a] and check for their connectedness:
In[]:=
With[{pts=genRand[10^#]},Table[ConnectedGraphQ[NearestNeighborGraph[pts,{All,j}]],{j,Take[epsilons[#],-4]}]]&/@Range[3,6]//Grid
Out[]=
True | True | True | True |
True | True | True | True |
True | True | True | True |
False | True | True | True |
Given that in this intervals for ϵ we find graphs either respecting the combinatorial geometry of planar configurations of points or having the fundamental property of being connected, we will use such ranges of values in all the following simulations.
Increasing Graph-Geodesic Balls
Increasing Graph-Geodesic Balls
Once we have a graph, we have a notion of distance on it, given by the length of the shortest path between two vertices; therefore, we can define balls with respect to this geodesic distance and we can measure their volume growth in terms of their radius:
The following function uses VertexComponent to incrementally compute the volume of the geodesic balls centered around a point of the manifold, until the volume stabilizes or a certain maximum radius is met:
In[]:=
ballsvolumes[graph_,center_,radius_]:=With[{start=Nearest[VertexList[graph],center,1]},{ConnectedGraphQ[graph],Mean@VertexDegree[graph]//N,VertexCount[graph],NestWhileList[{#[[1]]+1,Length@VertexComponent[graph,start,#[[1]]+1]}&,{0,1},#1[[2]]<#2[[2]]&[[1]]<radius&,2]}]
We start by generating a set of random points and we try to estimate the volume growth of the ϵ-graphs obtained with the values in epsilon[5]:
5
10
In[]:=
volumegrowth=With[{pts=genRand[10^5]},{#,ballsvolumes[NearestNeighborGraph[pts,{All,#}],{0.,0.},100]}&/@epsilons[5]];
By extracting data from volumegrowth, we can have a fairly complete picture of how the choice of epsilon affects the geometry of the graph in terms of distances, neighborhoods and connectivity. For example, we can spot a good portion of disconnected graphs:
We find the values of epsilon for which the sequence of volumes of increasing balls stabilizes before 100 steps with a final value which is not the volume of the whole graph:
In[]:=
First/@Select[volumegrowth,Length[#[[2,4]]]<100&&Last@Last[#[[2,4]]]<#[[2,3]]&]
Out[]=
{0.00790569}
In[]:=
volumegrowth[[1,2,4]]
Out[]=
{{0,1},{1,3},{2,4},{3,4}}
However, we already asked for a connectedness check in the function ballsvolumes:
In[]:=
First@*Last/@volumegrowth
Out[]=
{False,False,False,False,False,False,True,True,True,True}
so the first 6 graphs are disconnected, however, the sequence of increasing balls manages to reach radius 100 from the second value of ϵ on. We plot the volumes for all the graphs but the first.
In[]:=
GraphicsGrid[Partition[ListPlot[Last[Last[#]],PlotLabel->StringJoin[{"ϵ = ",ToString@First[#]}]]&/@Rest[volumegrowth],3],ImageSize1000]
Out[]=
The first (i.e. the second of the original series of 10) is somewhat peculiar, as it presents a more or less linear growth, whereas all other graphs have a quadratic growth of the volumes, until the geodesic balls hit the boundaries of the graph.
We try to determine more quantitatively the growths of these volumes, by transforming via log-log the data and fitting the result in a linear model, both on the whole range of radii and on some initial segment Range[10,50], with weight 1/x.
We try to determine more quantitatively the growths of these volumes, by transforming via log-log the data and fitting the result in a linear model, both on the whole range of radii and on some initial segment Range[10,50], with weight 1/x.
In[]:=
lmf1=LinearModelFit[Log@Rest[Last[Last[#]]],{1,x},x]&/@Rest[volumegrowth]
Out[]=
FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel
0.0253236+1.83878x
-0.129732+2.17068x
0.21072+2.19681x
1.14979+2.10786x
1.43572+2.11246x
1.58556+2.13254x
1.75238+2.14075x
2.06763+2.10289x
2.3365+2.07268x
In[]:=
lmf2=LinearModelFit[Log@Rest[Last[Last[#]]],{1,x},x,Weights->1./Range[Length[Last[Last[#]]]-1]]&/@Rest[volumegrowth]
Out[]=
FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel
0.776012+1.57647x
0.919894+1.81561x
1.2665+1.8424x
1.73995+1.90578x
1.88741+1.95765x
1.9434+2.00988x
2.0382+2.04112x
2.22597+2.0469x
2.48171+2.02062x
In[]:=
lmf3=LinearModelFit[Log@Rest[Last[Last[#]]],{1,x},x,Weights->(Boole[#>0&&#<51](1/#)&/@Range[Length[Last[Last[#]]]-1])]&/@Rest[volumegrowth]
Out[]=
FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel
0.878861+1.48026x
1.06792+1.67875x
1.41541+1.70468x
1.82174+1.83037x
1.95073+1.89951x
1.99442+1.96306x
2.08117+2.00137x
2.25373+2.02071x
2.50721+1.99611x
We observe that the coefficient of the linear part is slowly approaching 2, which is an indication of the 2-dimensional nature of the graph: usually, the volume of balls will grow as , so that after a log-log transformation, we obtain something like log(C) + n log(ρ). On the other hand, there is no reasonable hint that suggests that the constant C should be independent of ϵ.
n
Cρ
Row[(<|Sequence@@MapIndexed[ToString[epsilons[5][[First[#2]+1]]]-><|"logC"->#1["ParameterTableEntries"][[1,1]],"n"->#1["ParameterTableEntries"][[2,1]]|>&,#]|>//Dataset)&/@{lmf1,lmf2,lmf3}]
Out[]=
|
|
|
To check our intuition, we repeat the computations with more points.
In[]:=
volumegrowth2=With[{pts=genRand[10^6]},{#,ballsvolumes[NearestNeighborGraph[pts,{All,#}],{0.,0.},100]}&/@epsilons[6]];
In[]:=
Position[epsilons[6],First[First/@Select[volumegrowth2,Length[#[[2,4]]]<100&&Last@Last[#[[2,4]]]<#[[2,3]]&]]]
Out[]=
{{1}}
In[]:=
GraphicsGrid[Partition[ListPlot[Last[Last[#]],PlotLabel->StringJoin[{"ϵ = ",ToString@First[#]}]]&/@Rest[volumegrowth2],3],ImageSize1000]
Out[]=
In[]:=
lmf4=LinearModelFit[Log@Rest[Last[Last[#]]],{1,x},x]&/@Rest[volumegrowth2]
Out[]=
FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel
0.728105+1.75795x
1.28868+1.87645x
1.67211+1.89333x
1.99409+1.88895x
2.02429+1.93711x
2.2983+1.92735x
2.41825+1.94095x
2.5581+1.95153x
2.67661+1.96051x
In[]:=
lmf5=LinearModelFit[Log@Rest[Last[Last[#]]],{1,x},x,Weights->1./Range[Length[Last[Last[#]]]-1]]&/@Rest[volumegrowth2]
Out[]=
FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel
1.43187+1.52056x
1.69738+1.73741x
1.90873+1.81304x
2.24695+1.80356x
2.26159+1.85708x
2.58388+1.831x
2.72684+1.83641x
2.82212+1.8624x
2.933+1.87403x
In[]:=
lmf6=LinearModelFit[Log@Rest[Last[Last[#]]],{1,x},x,Weights->(Boole[#>0&&#<51](1/#)&/@Range[Length[Last[Last[#]]]-1])]&/@Rest[volumegrowth2]
Out[]=
FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel
1.53783+1.42247x
1.75669+1.68251x
1.94225+1.78226x
2.28371+1.76975x
2.29612+1.82527x
2.6249+1.79317x
2.77136+1.79534x
2.85994+1.82753x
2.96963+1.84027x
In[]:=
Row[(<|Sequence@@MapIndexed[ToString[epsilons[6][[First[#2]+1]]]-><|"logC"->#1["ParameterTableEntries"][[1,1]],"n"->#1["ParameterTableEntries"][[2,1]]|>&,#]|>//Dataset)&/@{lmf4,lmf5,lmf6}]
Out[]=
|
|
|
Obviously, we should repeat the computations with other configurations with and points and then take the average of the results. Once we are persuaded that the growth is quadratic, we can try to fit it on the theoretical model which should describe the growth particularly for small radii: +. In particular, b/a should be related to curvature; however, we notice that this quantity is highly dependent on the choice of ϵ.
5
10
6
10
2
ax
4
bx
In[]:=
curvfit=LinearModelFit[Take[Rest[Last[Last[#]]],40],{x^2,x^4},x]&/@Rest[volumegrowth2]
Out[]=
FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel,FittedModel
12.39+0.789078-2.79096×
2
x
-6
10
4
x
10.1647+2.32408-0.000115453
2
x
4
x
35.9655+3.54349-0.0000313368
2
x
4
x
52.8298+4.83033-0.000104109
2
x
4
x
28.42+5.86713-0.0000790119
2
x
4
x
32.012+7.46574-0.000118308
2
x
4
x
52.2535+8.56901+0.0000573393
2
x
4
x
39.8055+10.364+0.0000471875
2
x
4
x
41.407+11.9841+0.0000919007
2
x
4
x
In[]:=
(<|Sequence@@MapIndexed[ToString[epsilons[6][[First[#2]+1]]]-><|"a"->#1["ParameterTableEntries"][[2,1]],"b"->#1["ParameterTableEntries"][[3,1]],"est.curv"->-12*#1["ParameterTableEntries"][[3,1]]/#1["ParameterTableEntries"][[2,1]]|>&,curvfit]|>//Dataset)
Out[]=
|
We see that the b terms are not statistically relevant and their sign changes as well.
Other Variations
Other Variations
The previous computations need to be extended to the graphs in the Embedded space and to KNN-Graphs.
Shrinking Neighborhoods of a Point
Shrinking Neighborhoods of a Point
The other way of studying the asymptotic of the volume of geodesic balls around a point is to fix the radius of the ball and increase the total number of points.
This function computes, using VertexComponent, the volume of a ball of fixed radius in a ϵ-graph with points.
s
c10
In[]:=
ball[scale_Integer/;scale>0,coeff_Integer/;coeff>0,eps_Real/;eps>0,radius_Integer/;radius>0]:=With[{graph=NearestNeighborGraph[genRand[coeff10^scale],{All,eps}]},{VertexCount[graph],eps,ConnectedGraphQ[graph],Mean@VertexDegree[graph]//N,radius,Length@VertexComponent[graph,Nearest[VertexList[graph],{0.,0.},1],radius]}];
In[]:=
ρ=10;
shrinking=Outer[ball[#1,#2,2.7/Sqrt[5#210^#1],ρ]&,Range[3,5],Range[10]];
shrinking2=Outer[ball[#1,#2,6.2/Sqrt[5#210^#1],ρ]&,Range[3,5],Range[10]];
shrinking3=Outer[ball[#1,#2,14./Sqrt[5#210^#1],ρ]&,Range[3,5],Range[10]];
shrinking4=Outer[ball[#1,#2,8./Sqrt[5#210^#1],ρ]&,Range[3,5],Range[10]];
shrinking5=Outer[ball[#1,#2,10./Sqrt[5#210^#1],ρ]&,Range[3,5],Range[10]];
shrinking6=Outer[ball[#1,#2,12./Sqrt[5#210^#1],ρ]&,Range[3,5],Range[10]];
Again, we varied ϵ, in order to observe various geometric behaviors; the output of the function ball gives us a number of informations, from connectedness to average degree. For the time being, we turn our attention to the volumes of the balls.
In[]:=
GraphicsColumn[MapIndexed[ListPlot[#1,PlotLabel->StringJoin[{"Scala ",ToString[First[#2]+2]}]]&,Map[{1/Sqrt[First[#]],Last[#]/First[#]}&,shrinking,{2}]]//N,ImageSize->700]
Out[]=
As can easily be inferred by these graphs, we need more datapoints and more precision, in order to apply meaningfully any fitting technique. We can however try and change the ϵ, using another dataset. The next three plots are from a list of graphs with average vertex degree near 6.
In[]:=
GraphicsColumn[MapIndexed[ListPlot[#1,PlotLabel->StringJoin[{"Scala ",ToString[First[#2]+2]}]]&,Map[{1/Sqrt[First[#]],Last[#]/First[#]}&,shrinking2,{2}]]//N,ImageSize->700]
Out[]=
The following are from a list of graph with very high ϵ, so connected and with high average vertex degree. The fact that the points do not go toward the origin is due to the high number of neighbors of each point, which is non-negligible with respect to the total number of points.
In[]:=
GraphicsColumn[MapIndexed[ListPlot[#1,PlotLabel->StringJoin[{"Scala ",ToString[First[#2]+2]}]]&,Map[{1/Sqrt[First[#]],Last[#]/First[#]}&,shrinking3,{2}]]//N,ImageSize->700]
Out[]=
Varying the Manifold
An unavoidable experiment is to reproduce part of our results on another manifold.
In[]:=
k=3;n=2;ℛ=Rectangle[{-1,-1},{1,1}];uservars={x,y};(*variablesusedtodescribeuandanysubregionofℛasanImplicitRegion*)u[x_,y_]:={x,y,x^4+x^2y^2+y^2(1/2-y^2)}
In[]:=
variables=Table[Unique["x"],{n}];cu=Compile@@{{#,_Real}&/@#,u@@#,RuntimeOptions"Speed",RuntimeAttributes{Listable},ParallelizationTrue,CompilationOptions{"InlineExternalDefinitions"True}}&[variables];descrℛ=(#[[1]]/.Thread[#[[2]]variables])&[RegionConvert[ℛ,"Implicit"]];canonicalbasis=Table[Table[Boole[i==j],{i,1,n}],{j,1,n}];volmeasureM[vars_List]:=FullSimplify[Norm[Minors[Function[p,(Derivative@@p)[u]@@vars]/@canonicalbasis,n]],vars∈ℛ]callvolmeasureM=Function[#,Evaluate@volmeasureM[#]]&[variables];cutvolmeasureM=Piecewise[{{volmeasureM[{#1,#2}],descrℛ/.{variables[[1]]->#1,variables[[2]]->#2}}}]&;
In[]:=
volM=NIntegrate[callvolmeasureM[Sequence@@variables],variables∈ℛ]volR=RegionMeasure[ℛ,n]
Out[]=
8.08323
Out[]=
4
In[]:=
myPoissonPointProcess=InhomogeneousPoissonPointProcess[callvolmeasureM,n];correction=1./(volRMeanPointDensity[{myPoissonPointProcess,ℛ}])
Out[]=
0.125031
Map[ListPointPlot3D[cu@@@#,AspectRatio1.2]&,genRand[10^#,3]&/@Range[3,5],{2}];
In[]:=
GraphicsGrid[%197,ImageSize->1000]
Out[]=
In[]:=
ParametricPlot3D[u[x,y],{x,y}∈ℛ,PlotTheme"Business"]
Out[]=
Here we see that the resemblance is less striking, due to some more curvature. We can run again the same tests on the volumes.
In[]:=
Dataset@Association[Region[#,ImageSize->Tiny,BaselinePositionTop]-><|Function[j,j->Association[Function[k,StringJoin[{ToString[k]," iter"}]->Quiet@checkVol[#,10^j,k][[1]]]/@Range[20,60,20]]]/@Range[3,5]|>&/@]
Out[]=
|
In[]:=
NIntegrate[callvolmeasureM[Sequence@@variables],variables∈#]/volM&/@
Out[]=
{0.0392378,0.000966818,0.00582836}
Simulations for increasing balls and shrinking neighborhoods can be performed as well.
Final Remarks
We have investigated the process of creating a graph from random points on a manifold, highlighting the good properties of approximation of the volume in relation with the number of points in our sample, but also in relation with the parameters employed to create the graph. We want to mention three outcomes of our investigation:
1
.The volume approximation is consistent and increases with the number of points, only slightly influenced by singularities, pinching of curvature and other phenomena.
2
.The estimation of dimension from the growth of the volumes of geodesic balls has a very slow convergence (if it converges), but it seems robust against variations of the parameters of the construction (ϵ, k).
3
.The increasing balls volumes and the shrinking neighborhoods volumes are too sensitive to variations in the parameters to give reliable informations on second order corrections, like scalar curvature: actually, it makes sense that varying ϵ may result in a variation of the scalar curvature of the graph, as it gets more and more different from the geometric structure of the manifold.
Features that would be nice to have
Features that would be nice to have
◼
More tests should be run on different manifolds, varying also the dimensions of the parameter and the embedding spaces.
◼
What is missing here is a way to compute geodesic distances on the manifold, in order to compare the results about graph-geodesic balls to the behavior of the Riemannian geodesic balls on the underlying manifold.
◼
Estimates on the number of datapoints that could give reliable estimates on the second order term of the volume as the neighborhoods approach the point.
◼
An implementation of the Ricci-Ollivier curvature, to be compared with the sectorial growth of volumes (as in the Riemannian case).
data:image/s3,"s3://crabby-images/4079d/4079d57633b5f88bf9a49688684d35628eb2c6bf" alt=""
data:image/s3,"s3://crabby-images/56607/56607cca9c3f8f5e959237fb5ea16950a488c5ec" alt=""
Cite this as: Samuele Mongodi, "Asymptotics of the Volume of Manifolds and Graphs" from the Notebook Archive (2021), https://notebookarchive.org/2021-07-62urncv
data:image/s3,"s3://crabby-images/afa7e/afa7e751d718eac7e65669706b85c714b1d1becc" alt=""
Download
data:image/s3,"s3://crabby-images/c9374/c9374a157002afb9ce03cd482ea9bc6b4ee16fc0" alt=""
data:image/s3,"s3://crabby-images/7630b/7630b01d225114cfa2bafc392f9b6df93ec5f7bb" alt=""