Essays, Posts & Presentations

Asymptotics of the Volume of Manifolds and Graphs

Samuele Mongodi

Author

Samuele Mongodi

Title

Asymptotics of the Volume of Manifolds and Graphs

Description

In order to understand what can be inferred from a discrete sample of points about the geometry of the manifold, we will simulate a random process that generates points according to the volume density of the manifold in a given parameter space.

Asymptotics of the Volume of Manifolds and Graphs

SamueleMongodi

Politecnico di Milano

In a computational setting, we are rarely so lucky to have an explicit equation or a nice parametrization as in the usual examples of manifold . At times, of a given manifold, we only know a bunch of points, maybe as many as we want, but finitely many.
In order to understand what can be inferred from a discrete sample of points about the geometry of the manifold, we will simulate a random process that generates points according to the volume density of the manifold in a given parameter space. We will then try to give to these points a graph structure, based either on the local distances between them or on the local expected connectivity of a graph lying on a k-dimensional manifold (in terms of the average degree of a vertex); in this way, we have a notion of distance between such points which is not the Euclidean one and hopefully is linked to the geometry of the manifold.
However, our attempts at computing the limiting behavior of the volume, in the small and in the large, will show that the estimator for curvature is too sensitive to changes in the parameters that we use to construct the graph and, even if it can be calculated, it has close to no relationship with the geometry of the underlying manifold.

Poisson Point Processes

The main tool we will be using to generate points that are uniformly distributed on our manifold is the inhomogeneous Poisson point process, which produces independent draws from a given distribution.
We set up the parameter space dimension (n), the embedding space dimension (k), the parametrization function (u) and the parameter region (ℛ) given by our second example from before (we just exchange z and y for convenience) and we proceed to compute a bunch of auxiliary functions and quantities, like the area or ℛ or the area of the piece of manifold parametrized by u over ℛ, which we will call M:

In[]:=

k=3;n=2;ℛ=Rectangle[{-1,-1},{1,1}];uservars={x,y};(*variablesusedtodescribeuandanysubregionofℛasanImplicitRegion*)u[x_,y_]:={x,y,Sqrt[1-x^2+y^2]};

We create a compiled version of u, compute the volume density of u on ℛ, create a "callable" version of such volume density to be used in our inhomogeneous Poisson point process later on:

In[]:=

variables=Table[Unique["x"],{n}];cu=Compile@@{{#,_Real}&/@#,u@@#,RuntimeOptions"Speed",RuntimeAttributes{Listable},ParallelizationTrue,CompilationOptions{"InlineExternalDefinitions"True}}&[variables];descrℛ=(#[[1]]/.Thread[#[[2]]variables])&[RegionConvert[ℛ,"Implicit"]];canonicalbasis=Table[Table[Boole[i==j],{i,1,n}],{j,1,n}];volmeasureM[vars_List]:=FullSimplify[Norm[Minors[Function[p,(Derivative@@p)[u]@@vars]/@canonicalbasis,n]],vars∈ℛ]callvolmeasureM=Function[#,Evaluate@volmeasureM[#]]&[variables];cutvolmeasureM=Piecewise[{{volmeasureM[{#1,#2}],descrℛ/.{variables[[1]]->#1,variables[[2]]->#2}}}]&;

Here, we use the volume density function in the most basic way: its integral over ℛ gives the area of the piece of manifold M (hyperboloid, in our case) which is obtained by computing u on the points of ℛ. By comparison, the area of ℛ is (obviously) just 4:

In[]:=

volM=NIntegrate[callvolmeasureM[Sequence@@variables],variables∈ℛ]

Out[]=

5.55794

In[]:=

volR=RegionMeasure[ℛ,n]

Out[]=

So, the function volmeasureM (or its cognate callvolmeasureM) tells us which parts of ℛ contribute more to the area of M and which parts contribute less: plotting M over ℛ makes it clear that equal parts of ℛ may correspond to pieces of M with different extension:

Show[ParametricPlot3D[u[x,y],{x,y}∈ℛ,AxesFalse,BoxedFalse,BoxRatios{1,1,1},ViewPoint{1,-1,.5},Mesh5],ParametricPlot3D[{x,y,0},{x,-1,1},{y,-1,1},PlotTheme"Marketing",PlotStyleOpacity[0.2]]]

Out[]=

We can now build our Poisson process; however, the number of points we will be getting out of a RandomPointConfiguration call with such a PointProcess will depend by the MeanPointDensity of our point process; in order to be able to draw as many points as we want, we will just have to multiply our function by a suitable correction constant:

In[]:=

myPoissonPointProcess=InhomogeneousPoissonPointProcess[callvolmeasureM,n];correction=1./(volRMeanPointDensity[{myPoissonPointProcess,ℛ}])

Out[]=

0.1844

Random Points Generator

We are now ready to define our function to generate random points on ℛ according to the volume density of M. This functions accepts in input the desired number of points and, optionally, the number of configurations of points with such cardinality. Given the random nature of such a process and the mean involved in the definition of the correction from before, the function will return a number of points which is slightly less than requested:

In[]:=

genRand[numpts_Integer/;numpts>0,nconf_:1]:=Module[{nvM,tPPP},nvM=Function[#,Evaluate[numpts*correction*volmeasureM[#]]]&[variables];tPPP=InhomogeneousPoissonPointProcess[nvM,n];If[nconf==1,Return[RandomPointConfiguration[tPPP,ℛ]["Points"]],Return[RandomPointConfiguration[tPPP,ℛ,nconf]["PointsList"]]]]

As an example, we produce from a thousand to a million points and use Histogram3D to see if their distribution actually approximates the volume distribution function, which we will plot in the end:

In[]:=

GraphicsColumn[Histogram3D[genRand[10^#]]&/@Range[3,6],ImageSize->400]

Out[]=

Plot3D[callvolmeasureM[x,y],{x,y}∈ℛ]

Out[]=

As we can see, when the number of points (1K on the first line of the table, 1M on the last) increases, the approximation of the density is more accurate; this is also reflected in the uniform distribution of the images of such points via u on M. To see this, we compute again random points as before and then we apply to them the function u (or, better, it's compiled version). We go up only to 100K as ListPointPlot3D does not handle well bigger lists; as a comparison, we plot below the manifold M (image of the function u on ℛ).

In[]:=

GraphicsColumn[ListPointPlot3D[cu@@@genRand[10^#],AspectRatio1.2]&/@Range[3,5],ImageSize->400]

Out[]=

ParametricPlot3D[u[x,y],{x,y}∈ℛ,PlotTheme"Business"]

Out[]=

One can easily see how the distribution is uniform on M, but it is not when projected on ℛ, where, as we know, the point distribute following the law given by the volume density function, which is bigger at some points and smaller at others. This is particularly clear from the first line of plots, with fewer points

Volume Approximation

Caution: we will be indistinctly taking about volume and area, because our case-of-study manifold is a surface, but everything we say can be applied to higher dimensional manifolds (with all the obvious drawbacks in terms of execution times and memory usage).

A first test to understand in which sense these points approximate the volume density function is to compute the volume of a subregion in ℛ by counting how many points end up in it; the ratio with the total number of points should converge (as we increase the total number of points) to the ratio between the area of the piece of manifold over the subregion and the area of M:

We define three test subregions in ℛ and group them in the family . Next, we plot them in white over orange:

={ImplicitRegion[x^2+y^2<1/10,{x,y}],ImplicitRegion[Abs[x+y]<1/16&&Abs[x-y]<1/16,{x,y}],ImplicitRegion[Abs[y]<Abs[x]^(2/3)&&0<x&&x<1/7,{x,y}]};

In[]:=

GraphicsRow[Region[RegionDifference[ℛ,#],PlotTheme"Scientific"]&/@]

Out[]=

By integrating the volume density function over these three regions, we obtain the ratio between the volumes of the pieces of manifold parametrized by there three regions and the volume of M:

In[]:=

vol=Association[#->NIntegrate[callvolmeasureM[Sequence@@variables],variables∈#]/volM&/@];Values[vol]

Out[]=

{0.0579181,0.00140656,0.00852415}

By using the function genRand and a compiled ad hoc version of Select, we construct a function that generates several configuration of a given number of points and counts how many end up in a given region:

In[]:=

checkVol[_,npts_Integer/;npts>0,iter_Integer/;iter>0]:=Module[{descr,select,batches},descr=Function[a,Evaluate[[[1]]/.Thread[uservars->Table[Part[a,j],{j,1,n}]]]];select=Compile[{{lst,_Real,2}},Length[Select[lst,descr]]/Length[lst],RuntimeOptions->"Speed",Parallelization->True,CompilationOptions->{"InlineExternalDefinitions"->True},CompilationTarget"C",RuntimeAttributes{Listable}];batches=genRand[npts,iter];{Mean[#],#}&[select[batches]]]

We use the function checkVol to test the goodness of the approximations given by averaging on several configurations of

points, for m=3,4,5,6,7:

In[]:=

Dataset@Association[Region[#,ImageSize->Tiny,BaselinePositionTop]-><|Function[j,j->Association[Function[k,StringJoin[{ToString[k]," iter"}]->Quiet@checkVol[#,10^j,k][[1]]]/@Range[20,60,20]]]/@Range[3,5]|>&/@]

Out[]=

We see that the approximations are converging slowly (it is important here to look not only at the “correct” digits but also at the error: if the 3rd digit is correct but then we have an error of

-4

x10

, then we can actually improve the error by having the 3rd digit wrong, but staying closer to the right result.

Metropolis algorithm

We define a function Metropolis as an input of Random`DistributionVector, to implement the Metropolis algorithm inside RandomVariate, in order to produce samples from a target probability density function (pdf):

starting from a point u0,

we “propose” a jump du, which is a random vector drawn from a normal distribution of mean 0 and variance s (=1 by default),

we compute our target probability density function (pdf) at u0 and at u0+du,

we generate a random real number r between 0 and 1, with uniform probability, and we compute pdf(u0+du)/pdf(u0)

if this last number is bigger than r, we accept the jump and our next point is u0+du, otherwise our next point is still u0.

In this way, we obtain a Markov chain whose probability law asymptotically tends to the target pdf; therefore, we run the Markov chain for n+n0 steps and we ignore the first n0, obtaining n points sampled from the target pdf. We run a number (num=200 by default) of chains in parallel.
The problem of this algorithm is that we often obtain repeated points:

In[]:=

Metropolis/:Random`DistributionVector[Metropolis[pdf_,u0_,s_:1,n0_:100,chains_:200],n_Integer,prec_?Positive]:=Module[{u,du,p,p1,accept,cpdf},cpdf=Compile@@{{#,_Real}&/@#,pdf@@#,RuntimeAttributes->{Listable},RuntimeOptions->"Speed",Parallelization->True}&[Unique["x",Temporary]&/@u0];u=ConstantArray[u0,chains];p=cpdf@@Transpose[u];(Join@@Table[du=RandomVariate[NormalDistribution[0,s],{chains,Length[u0]}];p1=cpdf@@Transpose[u+du];accept=UnitStep[p1/p-RandomReal[{0,1},chains]];p+=(p1-p)accept;u+=duaccept,{Ceiling[(n0+n)/chains]}])[[n0+1;;n0+n]]];

In[]:=

genRandMet[scale_,numrand_]:=ParallelTable[RandomVariate[Metropolis[cutvolmeasureM,{0.,0.}],scale],{numrand}]

The problem with this algorithm is that, even if the running time is comparable or even better than the RandomPointConfiguration with a inhomogeneous Poisson Point Process, there are repeated points, which will prevent us to use it to construct graphs. Removing the duplicates will noticeably lower the precision of the approximation:

We use the same test as before, on the regions in , to see how good is the approximation given; in order to do so, we define another version of our checkVol function:

In[]:=

={ImplicitRegion[x^2+y^2<1/10,{x,y}],ImplicitRegion[Abs[x+y]<1/16&&Abs[x-y]<1/16,{x,y}],ImplicitRegion[Abs[y]<Abs[x]^(2/3)&&0<x&&x<1/7,{x,y}]};

In[]:=

vol=Association[#->NIntegrate[callvolmeasureM[Sequence@@variables],variables∈#]/volM&/@];Values[vol]

Out[]=

{0.0579181,0.00140656,0.00852415}

In[]:=

checkVolMet[_,npts_Integer/;npts>0,iter_Integer/;iter>0]:=Module[{descr,select,batches},descr=Function[a,Evaluate[[[1]]/.Thread[uservars->Table[Part[a,j],{j,1,n}]]]];select=Compile[{{lst,_Real,2}},Length[Select[lst,descr]]/Length[lst],RuntimeOptions->"Speed",Parallelization->True,CompilationOptions->{"InlineExternalDefinitions"->True},CompilationTarget"C",RuntimeAttributes{Listable}];batches=genRandMet[npts,iter];{Mean[#],#}&[select[batches]]]

In[]:=

Dataset@Association[Region[#,ImageSize->Tiny,BaselinePositionTop]-><|Function[j,j->Association[Function[k,StringJoin[{ToString[k]," iter"}]->Quiet@checkVolMet[#,10^j,k][[1]]]/@Range[20,60,20]]]/@Range[3,5]|>&/@]

Out[]=

The Metropolis algorithm needs more points to converge, but its noticeably faster than the RandomPointConfiguration; the only problem in generating

point is the memory necessary to store them.

Connecting the Dots

In order to extract some more information from our random points, we try to give them a meaningful graph structure, by connecting with a(n unoriented) edge two points when they are, in some sense, near to each other; we can do that in two ways.

◼

ϵ-Neighbors Graph : (ϵ-graph) we connect two points if their distance is less than a given threshold ϵ.

◼

k-Nearest-Neighbors Graph : (knn-graph) we connect each point with the k nearest points among the others.

Both the previous graphs can be constructed using the function NearestNeighborGraph.

These two choices are, in a way, independent and opposite: the knn-graph will almost always produce something connected and with each vertex of degree exactly k, but it may happen that two fairly distant points are connected because one of them lies in a scarcely populated region.
On the other hand, in a ϵ-graph, all the connected points will always be close, under the ϵ threshold; however, this could result in a high average degree or, if this is kept reasonably low, in a disconnected graph. Moreover, the ϵ-graph is very sensitive with respect to small perturbations of the points, as the following examples show.

The Perturbed Hexagonal and Square Lattices

The following example shows how "fragile" is the behavior of an ϵ-graph when the points are allowed to have "errors" or perturbations:

In[]:=

hexalattice=Flatten[Join[Table[{3n+1,mSqrt[3]},{n,-3,3},{m,-3,3}],Table[{3n-1,mSqrt[3]},{n,-3,3},{m,-3,3}],Table[{3n+1/2,(m+1/2)Sqrt[3]},{n,-3,3},{m,-3,3}],Table[{3n-1/2,(m-1/2)Sqrt[3]},{n,-3,3},{m,-3,3}],Table[{3n,mSqrt[3]},{n,-3,3},{m,-3,3}],Table[{3n+3/2,(m+1/2)Sqrt[3]},{n,-3,3},{m,-3,3}]],1];

In[]:=

hexagraph=NearestNeighborGraph[hexalattice,{All,1}]

Out[]=

Already if we use machine precision coordinates, we will fall into rounding errors:

In[]:=

mphexagraph=NearestNeighborGraph[hexalattice//N,{All,1}]

Out[]=

And if we allow for some “noise”, the result is even worse:

In[]:=

perturbedhexagraph=NearestNeighborGraph[hexalattice+RandomReal[{-10^(-9),10^(-9)},{hexalattice//Length,2}]//N,{All,1}]

Out[]=

With a knn graph we would not have such a problem:

In[]:=

perturbedhexagraph2=NearestNeighborGraph[VertexList[perturbedhexagraph],6]

Out[]=

However, who told us that 6 is the right number? If we had come from a square grid, 6 would have produced some awkward results:

In[]:=

squaregrid=Flatten[Outer[N@List,Range[10],Range[10]],1];latticegraph=Labeled[NearestNeighborGraph[squaregrid,{All,1}],"perfect ϵ-graph"]perturbedlattice=Labeled[NearestNeighborGraph[squaregrid+RandomReal[{-10^(-9),10^(-9)},{Length[squaregrid],2}],{All,1}],"noisy ϵ-graph"]perturbedlattice2=Labeled[NearestNeighborGraph[VertexList[perturbedlattice[[1]]],6],"6nn-graph"]

Out[]=

From Parametrized Manifolds to Graphs

In the Function Repository, one can find two functions to produce graphs from manifolds (ResourceFunction["ExtrinsicCurvedManifoldToGraph"] and ResourceFunction["IntrinsicCurvedManifoldToGraph"] ), which are however limited in use, either by some hypotheses on the nature of the density function we can use or for their prohibitive execution times on large collections of points.

From the documentation, we take one of the examples and increase the number of points:

In[]:=

timesICMTG=((ResourceFunction["IntrinsicCurvedManifoldToGraph"][x^2+2y^2+3x^3+4y^4+5x^5+6y^6,{x,-1,1},{y,-1,1},0.3,#];//AbsoluteTiming)&/@{200,600,1800,5400})

Out[]=

{{8.50489,Null},{24.6342,Null},{75.2945,Null},{230.097,Null}}

We propose another function, speeding up the computation in case one is interested only in the graph object per se and getting rid of the restriction on the density function. :

This function goes through the steps that we executed in the previous chapter of this notebook, to produce a set of random points which is then fed to the NearestNeighborGraph function:

In[]:=

parametricManifoldToGraph[uexpr_,reg_,uservars_,npts_,par_,OptionsPattern[{"Neighbors"->"Epsilon","Space"->"Parameters"}]]:=Module[{n,vars,u,canbasis,vmM,cvmM,myPPP,corr,nvM,tPPP,pts},n=Length[uservars];vars=Table[Unique["x"],{n}];u=Function[#,uexpr/.Thread[uservars->#]]&[vars];canbasis=Table[Table[Boole[i==j],{i,1,n}],{j,1,n}];vmM[varz_List]:=FullSimplify[Norm[Minors[Function[p,(Derivative@@p)[u]@@varz]/@canbasis,n]],varz∈ℛ];cvmM=Function[#,Evaluate@vmM[#]]&[vars];myPPP=InhomogeneousPoissonPointProcess[cvmM,n];corr=1./(RegionMeasure[reg,n]MeanPointDensity[{myPPP,reg}]);nvM=Function[#,Evaluate[npts*corr*cvmM@@#]]&[vars];tPPP=InhomogeneousPoissonPointProcess[nvM,n];If[OptionValue["Space"]=="Parameter",pts=RandomPointConfiguration[tPPP,reg]["Points"],pts=u@@@RandomPointConfiguration[tPPP,reg]["Points"]];If[OptionValue["Neighbors"]=="Epsilon",Return[NearestNeighborGraph[pts,{All,par}]],Return[NearestNeighborGraph[pts,par]]]]

It is important to notice that the function given as an argument to IntrinsicCurvedManifoldToGraph is already the probability density function from which we want to draw samples; here we pass as an argument the parametrization of the manifold, i.e. the function u, the region ℛ and the list of user variables (used to define u and ℛ). We will test our function on the same number of points, with the same epsilon parameter, on a different function:

In[]:=

timespMTG=((parametricManifoldToGraph[{x^2+y,xy,x^4+y^6},ℛ,{x,y},#,0.3]//AbsoluteTiming)&/@{200,600,1800,5400,16200});

In[]:=

AssociationThread[ToString/@{200,600,1800,5400,16200}->First/@timespMTG]//Dataset

Out[]=

To have a visual idea of what is happening, we plot the first three graphs:

In[]:=

GraphicsRow[GraphPlot@*Last/@timespMTG[[1;;3]],ImageSize->1000]

Out[]=

As a reference, the manifold we are passing to our new function is the following; we can see a resemblance with the first two graphs, which have a sufficiently low number of vertices to be visualized properly. As a side note, we remark that the image of u is not technically an embedded manifold, because it has a curve of singular points where all the “petals” meet:

In[]:=

ParametricPlot3D[{x^2+y,xy,x^4+y^6},{x,y}∈ℛ,Boxed->False,Axes->False,PlotTheme->"Business"]

Parameters vs Embedding - Epsilon vs KNN

In the parametricManifoldToGraph function, we included an option called “Space”; the default value is “Parameters”, as opposed to “Embedding”: our function u goes from ℛ to



(in our examples, k=3), so we can decide if we want to measure distanced between points in ℛ or in



. Both make sense and are locally equivalent. We present the results of the previous benchmark with the “Embedding” option:

In[]:=

timespMTGemb=((parametricManifoldToGraph[{x^2+y,xy,x^4+y^6},ℛ,{x,y},#,0.3,"Space"->"Embedding"]//AbsoluteTiming)&/@{200,600,1800,5400,16200});

In[]:=

AssociationThread[ToString/@{200,600,1800,5400,16200}->First/@timespMTGemb]//Dataset

Out[]=

200	0.684794
600	0.654335
1800	0.774844
5400	1.35455
16200	6.3844

We plot the first three graphs, which still have a manageable number of vertices and edges - sort of:

In[]:=

GraphicsRow[GraphPlot@*Last/@timespMTGemb[[1;;3]],ImageSize->1000]

Out[]=

The times are comparable, however the same value of ϵ gives slightly different structures:

In[]:=

AssociationThread[{"parameters space","embedding space"}->(AssociationThread[ToString/@{200,600,1800,5400,16200}->#]&/@{Mean@*VertexDegree/@timespMTG[[All,2]]//N,Mean@*VertexDegree/@timespMTGemb[[All,2]]//N})]//Dataset

Out[]=

	200	600	1800	5400	16200
parameters space	7.99015	24.1703	65.0156	209.944	605.266
embedding space	6.93069	22.3543	73.4049	201.346	599.901

Another small comparison we have to make is the use of the KNN method to build the graph; we follow the same procedure as before, tabulating the times and plotting the first three graphs.

In[]:=

timespMTGknn=((parametricManifoldToGraph[{x^2+y,xy,x^4+y^6},ℛ,{x,y},#,6,"Neighbors"->"KNN"]//AbsoluteTiming)&/@{200,600,1800,5400,16200});

In[]:=

AssociationThread[ToString/@{200,600,1800,5400,16200}->First/@timespMTGknn]//Dataset

Out[]=

200	0.639343
600	0.623833
1800	0.693967
5400	0.789998
16200	0.997139

In[]:=

GraphicsRow[GraphPlot@*Last/@timespMTGknn[[1;;3]],ImageSize->1000]

Out[]=

We can see that there are some peculiar differences with the previous ones, typical of the knn method; moreover, producing less edges, this method is faster.
The remaining comparison, the knn method in the embedding space, would not bring any surprise: applying u will vary small distances in a very similar way, unless we are in points with very different singular values of the Jacobian matrix, which are usually localized phenomena and will not alter the global structure of the graph.

The Choice of The ϵ

It can be easily guessed that choosing some value of ϵ instead of some other will drastically change the appearance of our graph; one important parameter is the connectedness of the graph, another is the average vertex degree. Both are impossible to determine precisely without actually computing the graph. However, we can approximate the latter with a Monte Carlo method.

This function draws num points from the set of points pts and checks how many points have distance ≤epsilon from each of them, to have an estimate of the average vertex degree of the resulting graph for such a value of epsilon:

In[]:=

estimatedeg[pts_,epsilon_,num_]:=Mean[(-1+Length[Nearest[pts,#,{Infinity,epsilon}]])&/@RandomChoice[pts,num]]//N

We check that the estimate is precise enough:

In[]:=

With[{pts=genRand[10^5]},{estimatedeg[pts,#,300],Mean[VertexDegree[NearestNeighborGraph[pts,{All,#}]]]//N}&/@Range[0.005,0.05,0.005]]//Dataset

Out[]=

1.87333	2.09026
8.02	8.30507
19.07	18.6356
32.77	32.9844
50.1767	51.2695
74.5667	73.4263
102.167	99.4434
128.223	129.208
160.097	162.667
196.773	199.832

Using this function, we can determine, given the set of points, which values of epsilon will give reasonable results (i.e. losely in the interval [5,20], where the lower end gives a geometrically significant behavior, whereas the upper end will usually guarantee connectedness, even tho this last fact is not obvious, nor certain).
An empirical analysis shows that this function gives, for a given scale of magnitude, an interesting range of values of epsilon:

In[]:=

epsilons[a_]:=Range[2.5/Sqrt[10^a],5.5/Sqrt[10^a],(1./3)/Sqrt[10^a]]

We run a couple of tests; the results are always shown so that different rows refer to different scales of magnitude and different columns to different values of ϵ.

We estimate the average vertex degree of the resulting graph, to see that we generally end up in the interval [5,20]:

In[]:=

With[{pts=genRand[10^#]},Table[estimatedeg[pts,j,300],{j,epsilons[#]}]]&/@Range[3,6]//Grid

Out[]=

4.44	5.64	6.74	8.08	10.2433	11.72	14.1067	16.4367	17.55	20.5067
5.46667	6.48	8.40667	10.2233	11.6967	13.3167	16.22	19.8633	22.1833	24.1967
5.05667	6.69667	8.53333	9.95333	11.7833	14.25	16.31	19.9333	22.0233	24.9667
5.03	6.43667	8.25	10.2167	12.4633	14.1967	17.3033	19.4067	21.7233	24.36

We construct some ϵ-graphs for the 4 largest values of epsilons[a] and check for their connectedness:

In[]:=

With[{pts=genRand[10^#]},Table[ConnectedGraphQ[NearestNeighborGraph[pts,{All,j}]],{j,Take[epsilons[#],-4]}]]&/@Range[3,6]//Grid

Out[]=

True	True	True	True
True	True	True	True
True	True	True	True
False	True	True	True

Given that in this intervals for ϵ we find graphs either respecting the combinatorial geometry of planar configurations of points or having the fundamental property of being connected, we will use such ranges of values in all the following simulations.

Increasing Graph-Geodesic Balls

Once we have a graph, we have a notion of distance on it, given by the length of the shortest path between two vertices; therefore, we can define balls with respect to this geodesic distance and we can measure their volume growth in terms of their radius:

The following function uses VertexComponent to incrementally compute the volume of the geodesic balls centered around a point of the manifold, until the volume stabilizes or a certain maximum radius is met:

In[]:=

ballsvolumes[graph_,center_,radius_]:=With[{start=Nearest[VertexList[graph],center,1]},{ConnectedGraphQ[graph],Mean@VertexDegree[graph]//N,VertexCount[graph],NestWhileList[{#[[1]]+1,Length@VertexComponent[graph,start,#[[1]]+1]}&,{0,1},#1[[2]]<#2[[2]]&&#2[[1]]<radius&,2]}]

We start by generating a set of

random points and we try to estimate the volume growth of the ϵ-graphs obtained with the values in epsilon[5]:

In[]:=

volumegrowth=With[{pts=genRand[10^5]},{#,ballsvolumes[NearestNeighborGraph[pts,{All,#}],{0.,0.},100]}&/@epsilons[5]];

By extracting data from volumegrowth, we can have a fairly complete picture of how the choice of epsilon affects the geometry of the graph in terms of distances, neighborhoods and connectivity. For example, we can spot a good portion of disconnected graphs:

We find the values of epsilon for which the sequence of volumes of increasing balls stabilizes before 100 steps with a final value which is not the volume of the whole graph:

In[]:=

First/@Select[volumegrowth,Length[#[[2,4]]]<100&&Last@Last[#[[2,4]]]<#[[2,3]]&]

Out[]=

{0.00790569}

In[]:=

volumegrowth[[1,2,4]]

Out[]=

{{0,1},{1,3},{2,4},{3,4}}

However, we already asked for a connectedness check in the function ballsvolumes:

In[]:=

First@*Last/@volumegrowth

Out[]=

{False,False,False,False,False,False,True,True,True,True}

so the first 6 graphs are disconnected, however, the sequence of increasing balls manages to reach radius 100 from the second value of ϵ on. We plot the volumes for all the graphs but the first.

In[]:=

GraphicsGrid[Partition[ListPlot[Last[Last[#]],PlotLabel->StringJoin[{"ϵ = ",ToString@First[#]}]]&/@Rest[volumegrowth],3],ImageSize1000]

Out[]=

The first (i.e. the second of the original series of 10) is somewhat peculiar, as it presents a more or less linear growth, whereas all other graphs have a quadratic growth of the volumes, until the geodesic balls hit the boundaries of the graph.
We try to determine more quantitatively the growths of these volumes, by transforming via log-log the data and fitting the result in a linear model, both on the whole range of radii and on some initial segment Range[10,50], with weight 1/x.

In[]:=

lmf1=LinearModelFit[Log@Rest[Last[Last[#]]],{1,x},x]&/@Rest[volumegrowth]

Out[]=

FittedModel

0.0253236+1.83878x

,FittedModel

-0.129732+2.17068x

,FittedModel

0.21072+2.19681x

,FittedModel

1.14979+2.10786x

,FittedModel

1.43572+2.11246x

,FittedModel

1.58556+2.13254x

,FittedModel

1.75238+2.14075x

,FittedModel

2.06763+2.10289x

,FittedModel

2.3365+2.07268x



In[]:=

lmf2=LinearModelFit[Log@Rest[Last[Last[#]]],{1,x},x,Weights->1./Range[Length[Last[Last[#]]]-1]]&/@Rest[volumegrowth]

Out[]=

FittedModel

0.776012+1.57647x

,FittedModel

0.919894+1.81561x

,FittedModel

1.2665+1.8424x

,FittedModel

1.73995+1.90578x

,FittedModel

1.88741+1.95765x

,FittedModel

1.9434+2.00988x

,FittedModel

2.0382+2.04112x

,FittedModel

2.22597+2.0469x

,FittedModel

2.48171+2.02062x



In[]:=

lmf3=LinearModelFit[Log@Rest[Last[Last[#]]],{1,x},x,Weights->(Boole[#>0&&#<51](1/#)&/@Range[Length[Last[Last[#]]]-1])]&/@Rest[volumegrowth]

Out[]=

FittedModel

0.878861+1.48026x

,FittedModel

1.06792+1.67875x

,FittedModel

1.41541+1.70468x

,FittedModel

1.82174+1.83037x

,FittedModel

1.95073+1.89951x

,FittedModel

1.99442+1.96306x

,FittedModel

2.08117+2.00137x

,FittedModel

2.25373+2.02071x

,FittedModel

2.50721+1.99611x



We observe that the coefficient of the linear part is slowly approaching 2, which is an indication of the 2-dimensional nature of the graph: usually, the volume of balls will grow as

Cρ

, so that after a log-log transformation, we obtain something like log(C) + n log(ρ). On the other hand, there is no reasonable hint that suggests that the constant C should be independent of ϵ.

Row[(<|Sequence@@MapIndexed[ToString[epsilons[5][[First[#2]+1]]]-><|"logC"->#1["ParameterTableEntries"][[1,1]],"n"->#1["ParameterTableEntries"][[2,1]]|>&,#]|>//Dataset)&/@{lmf1,lmf2,lmf3}]

Out[]=

	logC	n
0.00895979	0.0253236	1.83878
0.0100139	-0.129732	2.17068
0.011068	0.21072	2.19681
0.0121221	1.14979	2.10786
0.0131762	1.43572	2.11246
0.0142302	1.58556	2.13254
0.0152843	1.75238	2.14075
0.0163384	2.06763	2.10289
0.0173925	2.3365	2.07268

	logC	n
0.00895979	0.776012	1.57647
0.0100139	0.919894	1.81561
0.011068	1.2665	1.8424
0.0121221	1.73995	1.90578
0.0131762	1.88741	1.95765
0.0142302	1.9434	2.00988
0.0152843	2.0382	2.04112
0.0163384	2.22597	2.0469
0.0173925	2.48171	2.02062

	logC	n
0.00895979	0.878861	1.48026
0.0100139	1.06792	1.67875
0.011068	1.41541	1.70468
0.0121221	1.82174	1.83037
0.0131762	1.95073	1.89951
0.0142302	1.99442	1.96306
0.0152843	2.08117	2.00137
0.0163384	2.25373	2.02071
0.0173925	2.50721	1.99611

To check our intuition, we repeat the computations with more points.

In[]:=

volumegrowth2=With[{pts=genRand[10^6]},{#,ballsvolumes[NearestNeighborGraph[pts,{All,#}],{0.,0.},100]}&/@epsilons[6]];

In[]:=

Position[epsilons[6],First[First/@Select[volumegrowth2,Length[#[[2,4]]]<100&&Last@Last[#[[2,4]]]<#[[2,3]]&]]]

Out[]=

In[]:=

GraphicsGrid[Partition[ListPlot[Last[Last[#]],PlotLabel->StringJoin[{"ϵ = ",ToString@First[#]}]]&/@Rest[volumegrowth2],3],ImageSize1000]

Out[]=

In[]:=

lmf4=LinearModelFit[Log@Rest[Last[Last[#]]],{1,x},x]&/@Rest[volumegrowth2]

Out[]=

FittedModel

0.728105+1.75795x

,FittedModel

1.28868+1.87645x

,FittedModel

1.67211+1.89333x

,FittedModel

1.99409+1.88895x

,FittedModel

2.02429+1.93711x

,FittedModel

2.2983+1.92735x

,FittedModel

2.41825+1.94095x

,FittedModel

2.5581+1.95153x

,FittedModel

2.67661+1.96051x



In[]:=

lmf5=LinearModelFit[Log@Rest[Last[Last[#]]],{1,x},x,Weights->1./Range[Length[Last[Last[#]]]-1]]&/@Rest[volumegrowth2]

Out[]=

FittedModel

1.43187+1.52056x

,FittedModel

1.69738+1.73741x

,FittedModel

1.90873+1.81304x

,FittedModel

2.24695+1.80356x

,FittedModel

2.26159+1.85708x

,FittedModel

2.58388+1.831x

,FittedModel

2.72684+1.83641x

,FittedModel

2.82212+1.8624x

,FittedModel

2.933+1.87403x



In[]:=

lmf6=LinearModelFit[Log@Rest[Last[Last[#]]],{1,x},x,Weights->(Boole[#>0&&#<51](1/#)&/@Range[Length[Last[Last[#]]]-1])]&/@Rest[volumegrowth2]

Out[]=

FittedModel

1.53783+1.42247x

,FittedModel

1.75669+1.68251x

,FittedModel

1.94225+1.78226x

,FittedModel

2.28371+1.76975x

,FittedModel

2.29612+1.82527x

,FittedModel

2.6249+1.79317x

,FittedModel

2.77136+1.79534x

,FittedModel

2.85994+1.82753x

,FittedModel

2.96963+1.84027x



In[]:=

Row[(<|Sequence@@MapIndexed[ToString[epsilons[6][[First[#2]+1]]]-><|"logC"->#1["ParameterTableEntries"][[1,1]],"n"->#1["ParameterTableEntries"][[2,1]]|>&,#]|>//Dataset)&/@{lmf4,lmf5,lmf6}]

Out[]=

	logC	n
0.00283333	0.728105	1.75795
0.00316667	1.28868	1.87645
0.0035	1.67211	1.89333
0.00383333	1.99409	1.88895
0.00416667	2.02429	1.93711
0.0045	2.2983	1.92735
0.00483333	2.41825	1.94095
0.00516667	2.5581	1.95153
0.0055	2.67661	1.96051

	logC	n
0.00283333	1.43187	1.52056
0.00316667	1.69738	1.73741
0.0035	1.90873	1.81304
0.00383333	2.24695	1.80356
0.00416667	2.26159	1.85708
0.0045	2.58388	1.831
0.00483333	2.72684	1.83641
0.00516667	2.82212	1.8624
0.0055	2.933	1.87403

	logC	n
0.00283333	1.53783	1.42247
0.00316667	1.75669	1.68251
0.0035	1.94225	1.78226
0.00383333	2.28371	1.76975
0.00416667	2.29612	1.82527
0.0045	2.6249	1.79317
0.00483333	2.77136	1.79534
0.00516667	2.85994	1.82753
0.0055	2.96963	1.84027

Obviously, we should repeat the computations with other configurations with

and

points and then take the average of the results. Once we are persuaded that the growth is quadratic, we can try to fit it on the theoretical model which should describe the growth particularly for small radii:

. In particular, b/a should be related to curvature; however, we notice that this quantity is highly dependent on the choice of ϵ.

In[]:=

curvfit=LinearModelFit[Take[Rest[Last[Last[#]]],40],{x^2,x^4},x]&/@Rest[volumegrowth2]

Out[]=

FittedModel

12.39+0.789078

-2.79096×

-6

,FittedModel

10.1647+2.32408

-0.000115453

,FittedModel

35.9655+3.54349

-0.0000313368

,FittedModel

52.8298+4.83033

-0.000104109

,FittedModel

28.42+5.86713

-0.0000790119

,FittedModel

32.012+7.46574

-0.000118308

,FittedModel

52.2535+8.56901

+0.0000573393

,FittedModel

39.8055+10.364

+0.0000471875

,FittedModel

41.407+11.9841

+0.0000919007



In[]:=

(<|Sequence@@MapIndexed[ToString[epsilons[6][[First[#2]+1]]]-><|"a"->#1["ParameterTableEntries"][[2,1]],"b"->#1["ParameterTableEntries"][[3,1]],"est.curv"->-12*#1["ParameterTableEntries"][[3,1]]/#1["ParameterTableEntries"][[2,1]]|>&,curvfit]|>//Dataset)

Out[]=

	a	b	est.curv
0.00283333	0.789078	-0.00000279096	0.0000424439
0.00316667	2.32408	-0.000115453	0.000596124
0.0035	3.54349	-0.0000313368	0.000106122
0.00383333	4.83033	-0.000104109	0.000258638
0.00416667	5.86713	-0.0000790119	0.000161603
0.0045	7.46574	-0.000118308	0.000190161
0.00483333	8.56901	0.0000573393	-0.0000802976
0.00516667	10.364	0.0000471875	-0.000054636
0.0055	11.9841	0.0000919007	-0.000092023

We see that the b terms are not statistically relevant and their sign changes as well.

Other Variations

The previous computations need to be extended to the graphs in the Embedded space and to KNN-Graphs.

Shrinking Neighborhoods of a Point

The other way of studying the asymptotic of the volume of geodesic balls around a point is to fix the radius of the ball and increase the total number of points.

This function computes, using VertexComponent, the volume of a ball of fixed radius in a ϵ-graph with

c10

points.

In[]:=

ball[scale_Integer/;scale>0,coeff_Integer/;coeff>0,eps_Real/;eps>0,radius_Integer/;radius>0]:=With[{graph=NearestNeighborGraph[genRand[coeff10^scale],{All,eps}]},{VertexCount[graph],eps,ConnectedGraphQ[graph],Mean@VertexDegree[graph]//N,radius,Length@VertexComponent[graph,Nearest[VertexList[graph],{0.,0.},1],radius]}];

In[]:=

ρ=10;

shrinking=Outer[ball[#1,#2,2.7/Sqrt[5#210^#1],ρ]&,Range[3,5],Range[10]];

shrinking2=Outer[ball[#1,#2,6.2/Sqrt[5#210^#1],ρ]&,Range[3,5],Range[10]];

shrinking3=Outer[ball[#1,#2,14./Sqrt[5#210^#1],ρ]&,Range[3,5],Range[10]];

shrinking4=Outer[ball[#1,#2,8./Sqrt[5#210^#1],ρ]&,Range[3,5],Range[10]];

shrinking5=Outer[ball[#1,#2,10./Sqrt[5#210^#1],ρ]&,Range[3,5],Range[10]];

shrinking6=Outer[ball[#1,#2,12./Sqrt[5#210^#1],ρ]&,Range[3,5],Range[10]];

Again, we varied ϵ, in order to observe various geometric behaviors; the output of the function ball gives us a number of informations, from connectedness to average degree. For the time being, we turn our attention to the volumes of the balls.

In[]:=

GraphicsColumn[MapIndexed[ListPlot[#1,PlotLabel->StringJoin[{"Scala ",ToString[First[#2]+2]}]]&,Map[{1/Sqrt[First[#]],Last[#]/First[#]}&,shrinking,{2}]]//N,ImageSize->700]

Out[]=

As can easily be inferred by these graphs, we need more datapoints and more precision, in order to apply meaningfully any fitting technique. We can however try and change the ϵ, using another dataset. The next three plots are from a list of graphs with average vertex degree near 6.

In[]:=

GraphicsColumn[MapIndexed[ListPlot[#1,PlotLabel->StringJoin[{"Scala ",ToString[First[#2]+2]}]]&,Map[{1/Sqrt[First[#]],Last[#]/First[#]}&,shrinking2,{2}]]//N,ImageSize->700]

Out[]=

The following are from a list of graph with very high ϵ, so connected and with high average vertex degree. The fact that the points do not go toward the origin is due to the high number of neighbors of each point, which is non-negligible with respect to the total number of points.

In[]:=

GraphicsColumn[MapIndexed[ListPlot[#1,PlotLabel->StringJoin[{"Scala ",ToString[First[#2]+2]}]]&,Map[{1/Sqrt[First[#]],Last[#]/First[#]}&,shrinking3,{2}]]//N,ImageSize->700]

Out[]=

Varying the Manifold

An unavoidable experiment is to reproduce part of our results on another manifold.

In[]:=

k=3;n=2;ℛ=Rectangle[{-1,-1},{1,1}];uservars={x,y};(*variablesusedtodescribeuandanysubregionofℛasanImplicitRegion*)u[x_,y_]:={x,y,x^4+x^2y^2+y^2(1/2-y^2)}

In[]:=

volM=NIntegrate[callvolmeasureM[Sequence@@variables],variables∈ℛ]volR=RegionMeasure[ℛ,n]

Out[]=

8.08323

Out[]=

In[]:=

myPoissonPointProcess=InhomogeneousPoissonPointProcess[callvolmeasureM,n];correction=1./(volRMeanPointDensity[{myPoissonPointProcess,ℛ}])

Out[]=

0.125031

Map[ListPointPlot3D[cu@@@#,AspectRatio1.2]&,genRand[10^#,3]&/@Range[3,5],{2}];

In[]:=

GraphicsGrid[%197,ImageSize->1000]

Out[]=

In[]:=

ParametricPlot3D[u[x,y],{x,y}∈ℛ,PlotTheme"Business"]

Out[]=

Here we see that the resemblance is less striking, due to some more curvature. We can run again the same tests on the volumes.

In[]:=

Out[]=

	20 iter	40 iter	60 iter
3	0.0406887	0.0385971	0.0389424
4	0.039039	0.0389974	0.0390825
5	0.0392972	0.0392992	0.0393523
3	0.000442105	0.00071688	0.000917459
4	0.00096664	0.000983925	0.000954634
5	0.00098721	0.00097052	0.000969554
3	0.00560194	0.0059553	0.00511469
4	0.0056521	0.00582957	0.00567149
5	0.00582025	0.00587436	0.00582831

In[]:=

NIntegrate[callvolmeasureM[Sequence@@variables],variables∈#]/volM&/@

Out[]=

{0.0392378,0.000966818,0.00582836}

Simulations for increasing balls and shrinking neighborhoods can be performed as well.

Final Remarks

We have investigated the process of creating a graph from random points on a manifold, highlighting the good properties of approximation of the volume in relation with the number of points in our sample, but also in relation with the parameters employed to create the graph. We want to mention three outcomes of our investigation:

The volume approximation is consistent and increases with the number of points, only slightly influenced by singularities, pinching of curvature and other phenomena.

The estimation of dimension from the growth of the volumes of geodesic balls has a very slow convergence (if it converges), but it seems robust against variations of the parameters of the construction (ϵ, k).

The increasing balls volumes and the shrinking neighborhoods volumes are too sensitive to variations in the parameters to give reliable informations on second order corrections, like scalar curvature: actually, it makes sense that varying ϵ may result in a variation of the scalar curvature of the graph, as it gets more and more different from the geometric structure of the manifold.

Features that would be nice to have

◼

More tests should be run on different manifolds, varying also the dimensions of the parameter and the embedding spaces.

◼

What is missing here is a way to compute geodesic distances on the manifold, in order to compare the results about graph-geodesic balls to the behavior of the Riemannian geodesic balls on the underlying manifold.

◼

Estimates on the number of datapoints that could give reliable estimates on the second order term of the volume as the neighborhoods approach the point.

◼

An implementation of the Ricci-Ollivier curvature, to be compared with the sectorial growth of volumes (as in the Riemannian case).

Cite this as: Samuele Mongodi, "Asymptotics of the Volume of Manifolds and Graphs" from the Notebook Archive (2021), https://notebookarchive.org/2021-07-62urncv