networkinference.core

class networkinference.core.core

Bases: object

Methods

arand_ci(Z, grid_start, grid_stop[, ...])

Returns confidence interval (CI) obtained by inverting an approximate randomization test [1].

arand_test(Z, mu[, seed])

Returns p-value and test statistic of approximate randomization test [1].

cluster_var(Z, clusters)

Returns conventional cluster-robust variance estimator.

conductance(clusters, A[, weight])

Returns maximal conductance of a set of clusters.

drobust_ci(Z, grid_start, grid_stop[, ...])

Returns confidence interval (CI) derived from the dependence-robust test due to [1].

drobust_test(Z, mu[, alpha, beta, R, L, seed])

Returns conclusion of dependence-robust test due to [1].

network_hac(Z, A[, b, disp])

Returns network HAC variance estimator due to [1] (also see [2]).

plot_spectrum(A[, giant, weight, ...])

Plots spectrum of the normalized Laplacian in a scatterplot and histogram.

spectral_clustering(num_clusters, A[, seed])

Returns network clusters obtained from normalized spectral clustering algorithm due to [1] (also see [2]).

spectrum(A[, giant, weight])

Returns spectrum of the normalized Laplacian.

sumstats(A[, decimals])

Prints table of network summary statistics.

trobust_ci(Z[, coverage, verbose])

Returns CI from the t-statistic based cluster-robust procedure due to [1].

static arand_ci(Z, grid_start, grid_stop, grid_size=151, coverage=0.95, seed=None)

Returns the confidence interval (CI) obtained by inverting an approximate randomization test [1] over a grid of candidate values. If the result is a trivial interval, try increasing grid_size. The larger the dimension of Z (i.e. the more clusters), the narrower the CI. However, because the estimates are computed cluster by cluster, the output can become less stable as the number of clusters grows, since each cluster may then contain few observations.

Parameters
Z : numpy array

q-dimensional array of estimates, one for each of the q clusters.

grid_start : float

Leftmost point of the grid of values over which the test is inverted.

grid_stop : float

Rightmost point of the grid.

grid_size : int

Number of points in the grid. Default value: 151.

coverage : float

Desired coverage. Default value: 0.95.

seed : int

Seed for drawing permutations, which is only relevant when the dimension of Z exceeds 12. Set to None to leave the seed unset. Default value: None.

Returns
list

Confidence interval.

References

[1] Canay, I., J. Romano, and A. Shaikh, “Randomization Tests Under an Approximate Symmetry Assumption,” Econometrica, 2017, 85 (3), 1013-1030.

Examples

>>> import networkinference as ni
>>> import numpy as np
>>> Z = np.random.normal(size=10)
>>> ni.core.arand_ci(Z, -2, 2)
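
The grid inversion described above can be sketched as follows. This is a minimal illustration, not the package's implementation: `invert_test` and `z_test_pvalue` are hypothetical names, and a simple normal-approximation p-value stands in for the approximate randomization test.

```python
import math
import numpy as np

def z_test_pvalue(Z, mu):
    """Two-sided normal-approximation p-value for H0: E[Z] = mu (stand-in test)."""
    Z = np.asarray(Z, dtype=float)
    se = Z.std(ddof=1) / math.sqrt(Z.size)
    t = abs(Z.mean() - mu) / se
    return math.erfc(t / math.sqrt(2))  # equals 2 * (1 - Phi(t))

def invert_test(Z, pvalue_fn, grid_start, grid_stop, grid_size=151, coverage=0.95):
    """Return the range of grid points that the test does not reject."""
    grid = np.linspace(grid_start, grid_stop, grid_size)
    accepted = [mu for mu in grid if pvalue_fn(Z, mu) > 1 - coverage]
    if not accepted:
        # No grid point survives: the grid is too coarse or badly placed.
        return [float("nan"), float("nan")]
    return [float(min(accepted)), float(max(accepted))]

rng = np.random.default_rng(0)
Z = rng.normal(size=10)
ci = invert_test(Z, z_test_pvalue, -2, 2)
print(ci)
```

A finer grid (larger `grid_size`) makes the endpoints more precise, and a grid that does not cover the accepted region yields an empty or truncated interval.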
static arand_test(Z, mu, seed=None)

Returns the p-value and test statistic of the approximate randomization test [1]. The larger the dimension of Z (i.e. the more clusters), the more powerful the test. However, because the estimates are computed cluster by cluster, the output can become less stable as the number of clusters grows, since each cluster may then contain few observations.

Parameters
Z : numpy array

q-dimensional array of estimates, one for each of the q clusters.

mu : float

Scalar null-hypothesized value of the mean of Z.

seed : int

Seed for drawing permutations, which is only relevant when the dimension of Z exceeds 12. Set to None to leave the seed unset. Default value: None.

Returns
p_value : float

P-value of the test.

T_wald : float

Test statistic.

References

[1] Canay, I., J. Romano, and A. Shaikh, “Randomization Tests Under an Approximate Symmetry Assumption,” Econometrica, 2017, 85 (3), 1013-1030.

Examples

>>> import networkinference as ni
>>> import numpy as np
>>> Z = np.random.normal(size=10)
>>> ni.core.arand_test(Z, 0)
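
A randomization test of this kind can be sketched as below. This is an illustration under stated assumptions, not the package's code: it assumes sign changes as the transformation group and the absolute t-statistic as the test statistic. `sign_change_test`, `max_exact`, and `draws` are hypothetical names, and the cutoff of 12 is taken from the description of `seed` above.

```python
import itertools
import numpy as np

def sign_change_test(Z, mu, max_exact=12, draws=5000, seed=None):
    """Sign-change randomization test of H0: E[Z] = mu (illustrative sketch)."""
    Z = np.asarray(Z, dtype=float) - mu
    q = Z.size

    def t_stat(x):
        return abs(x.mean()) / (x.std(ddof=1) / np.sqrt(q))

    if q <= max_exact:
        # Enumerate all 2^q sign vectors (includes the identity).
        signs = np.array(list(itertools.product([-1.0, 1.0], repeat=q)))
    else:
        # Too many sign vectors: draw a random subset and add the identity.
        rng = np.random.default_rng(seed)
        random_signs = rng.choice([-1.0, 1.0], size=(draws, q))
        signs = np.vstack([np.ones((1, q)), random_signs])

    observed = t_stat(Z)
    transformed = np.array([t_stat(s * Z) for s in signs])
    # Share of transformed statistics at least as large as the observed one.
    p_value = float(np.mean(transformed >= observed))
    return p_value, float(observed)

rng = np.random.default_rng(0)
Z = rng.normal(size=10)
p_value, T = sign_change_test(Z, 0)
print(p_value, T)
```

With 10 clusters all 1024 sign vectors are enumerated, so the result is deterministic. The seed only matters in the random-draw branch.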
static cluster_var(Z, clusters)

Returns conventional cluster-robust variance estimator.

Parameters
Z : numpy array

n-dimensional array of scalar observations or n x k matrix of n k-dimensional observations.

clusters : numpy array

n-dimensional array of cluster labels for all n nodes, assumed to be 0, …, L-1 where L is the number of clusters.

Returns
numpy array

Variance-covariance matrix.

Examples

>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> import numpy as np
>>> Z = np.random.normal(size=500)
>>> A = FakeData.erdos_renyi(500)
>>> clusters = ni.core.spectral_clustering(10, A)
>>> var = ni.core.cluster_var(Z, clusters)
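
For intuition, one common form of a cluster-robust variance estimator for the sample mean of scalar data is sketched below. The function name `cluster_var_mean` is hypothetical, and the normalization by n (variance of the square root of n times the mean, with no finite-sample correction) is an assumption that may differ from what the package returns.

```python
import numpy as np

def cluster_var_mean(Z, clusters):
    """Cluster-robust variance estimate for sqrt(n) * (sample mean), scalar case."""
    Z = np.asarray(Z, dtype=float)
    clusters = np.asarray(clusters)
    n = Z.size
    resid = Z - Z.mean()
    # Sum residuals within each cluster (labels assumed to be 0, ..., L-1).
    cluster_sums = np.bincount(clusters, weights=resid)
    # Squared cluster sums keep all within-cluster cross products.
    return float(np.sum(cluster_sums ** 2) / n)

Z = np.array([1.0, 2.0, 4.0, 7.0])
v_singletons = cluster_var_mean(Z, np.arange(4))            # each node is its own cluster
v_pairs = cluster_var_mean(Z, np.array([0, 0, 1, 1]))       # two clusters of two nodes
v_one = cluster_var_mean(Z, np.zeros(4, dtype=int))         # a single cluster
print(v_singletons, v_pairs, v_one)
```

With singleton clusters this form reduces to the usual sample variance, and with a single cluster it collapses to zero because the residuals sum to zero, which is one reason such estimators need several clusters.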
static conductance(clusters, A, weight=None)

Returns maximal conductance of a set of clusters. For cluster-robust methods to work, conductance should be at most 0.1, as recommended by [1].

Parameters
clusters : numpy array

n-dimensional array of cluster labels for all n nodes, assumed to be 0, …, L-1 where L is the number of clusters.

A : NetworkX graph

Graph on n nodes. Can be weighted or directed.

weight : string

Specifies how edge weights are labeled in A, if A is a weighted graph. Default value: None.

Returns
float

Maximal conductance of the clusters.

References

[1] Leung, M., “Network Cluster-Robust Inference,” arXiv preprint arXiv:2103.01470, 2021.

Examples

>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> A = FakeData.erdos_renyi(500)
>>> clusters = ni.core.spectral_clustering(10, A)
>>> ni.core.conductance(clusters, A)
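
To make the quantity concrete, here is a minimal sketch of maximal conductance for an undirected graph given as a dense adjacency matrix. It uses the standard graph-theoretic definition (cut weight divided by the smaller of the two volumes), which is assumed rather than confirmed to match the package, and `max_conductance` is a hypothetical name.

```python
import numpy as np

def max_conductance(clusters, adj):
    """Largest conductance over clusters, for a symmetric adjacency matrix."""
    clusters = np.asarray(clusters)
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1)
    total_vol = degrees.sum()
    values = []
    for label in np.unique(clusters):
        inside = clusters == label
        cut = adj[inside][:, ~inside].sum()      # weight of links leaving the cluster
        vol_in = degrees[inside].sum()
        vol = min(vol_in, total_vol - vol_in)    # smaller side of the partition
        values.append(cut / vol if vol > 0 else 0.0)
    return float(max(values))

# Two triangles (nodes 0-2 and 3-5) joined by the single link 2-3.
adj = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1
phi = max_conductance(np.array([0, 0, 0, 1, 1, 1]), adj)
print(phi)
```

Each triangle has volume 7 and one outgoing link, so the conductance is 1/7, which is above the 0.1 threshold mentioned in the description.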
static drobust_ci(Z, grid_start, grid_stop, grid_size=151, coverage=0.95, beta=0.01, R=None, L=1000, seed=None)

Returns the confidence interval (CI) derived from the dependence-robust test due to [1]. The output of the test is random by nature. L is the number of simulation draws, and larger values reduce this random variation. If the result is a trivial interval, try increasing grid_size.

The test is implemented using the U-type statistic and the randomized confidence function approach due to [2], as discussed in Remark 2 of [1].

Parameters
Z : numpy array

n-dimensional array of scalar observations.

grid_start : float

Leftmost point of the grid of values tested for inclusion in the CI.

grid_stop : float

Rightmost point of the grid.

grid_size : int

Number of points in the grid. Default value: 151.

coverage : float

Desired coverage. Default value: 0.95.

beta : float

The parameter beta in Remark 2 of [1]. The closer it is to 1-coverage, the more conservative the CI. Default value: 0.01.

R : int

Number of resampling draws for the test statistic. Uses the default if R=None. Default value: None.

L : int

Number of resampling draws for the randomized confidence function. The larger the value, the less random the output. Default value: 1000.

seed : int

Seed for resampling draws. Set to None to leave the seed unset. Default value: None.

Returns
list

Confidence interval.

References

[1] Leung, M., “Dependence-Robust Inference Using Resampled Statistics,” Journal of Applied Econometrics (forthcoming), 2021.

[2] Song, K., “Ordering-Free Inference from Locally Dependent Data,” UBC working paper, 2016.

Examples

>>> import networkinference as ni
>>> import numpy as np
>>> Z = np.random.normal(size=500)
>>> ni.core.drobust_ci(Z, -2, 2)
static drobust_test(Z, mu, alpha=0.05, beta=0.01, R=None, L=1000, seed=None)

Returns the conclusion of the dependence-robust test due to [1]. The output of the test is random by nature. L is the number of simulation draws, and larger values reduce this random variation.

The test is implemented using the U-type statistic and the randomized confidence function approach due to [2], as discussed in Remark 2 of [1].

Parameters
Z : numpy array

n-dimensional array of scalar observations.

mu : float

Null-hypothesized value, e.g. the hypothesized mean of Z.

alpha : float

Significance level. Default value: 0.05.

beta : float

The parameter beta in Remark 2 of [1]. The closer it is to alpha, the more conservative the test. Default value: 0.01.

R : int

Number of resampling draws for the test statistic. Uses the default if R=None. Default value: None.

L : int

Number of resampling draws for the randomized confidence function. The larger the value, the less random the output. Default value: 1000.

seed : int

Seed for resampling draws. Set to None to leave the seed unset. Default value: None.

Returns
string

Conclusion of the test: reject or do not reject.

References

[1] Leung, M., “Dependence-Robust Inference Using Resampled Statistics,” Journal of Applied Econometrics (forthcoming), 2021.

[2] Song, K., “Ordering-Free Inference from Locally Dependent Data,” UBC working paper, 2016.

Examples

>>> import networkinference as ni
>>> import numpy as np
>>> Z = np.random.normal(size=500)
>>> ni.core.drobust_test(Z, 0)
static network_hac(Z, A, b=None, disp=False)

Returns the network HAC variance estimator due to [1] (also see [2]). Setting b=0, with A set to any value (e.g. None), returns the conventional heteroskedasticity-robust variance estimator for i.i.d. data. The network is converted to an unweighted, undirected version by dropping edge weights and the directionality of links.

Parameters
Z : numpy array

n-dimensional array of scalar observations or n x k matrix of n k-dimensional observations.

A : NetworkX graph

Graph on n nodes. NOTE: Assumes nodes are labeled as integers 0, …, n-1 in A, so that the data for node i is given by the ith component of Z.

b : float

HAC bandwidth. Keeping b=None is recommended; this uses the bandwidth choice recommended by [2]. Default value: None.

disp : boolean

If False, the function only returns HAC. If True, the function returns (HAC, APL, b, PD_failure). Default value: False.

Returns
HAC : numpy array

Estimate of the variance-covariance matrix.

APL : float

Average path length of A.

b : int

Bandwidth.

PD_failure : boolean

True if a substitute positive definite variance estimator had to be used.

References

[1] Kojevnikov, D., V. Marmer, and K. Song, “Limit Theorems for Network Dependent Random Variables,” Journal of Econometrics, 2021, 222 (2), 882-908.

[2] Leung, M., “Causal Inference Under Approximate Neighborhood Interference,” Econometrica (forthcoming), 2021.

Examples

>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> import numpy as np
>>> Z = np.random.normal(size=500)
>>> A = FakeData.erdos_renyi(500)
>>> HAC = ni.core.network_hac(Z, A)
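
The idea behind a network HAC estimator can be sketched for scalar data as follows. This is a simplified illustration, not the package's implementation: it assumes a uniform kernel (weight 1 for pairs within path distance b, 0 otherwise) and normalization by n, and `path_distances` and `hac_uniform` are hypothetical names.

```python
from collections import deque
import numpy as np

def path_distances(adj_list):
    """All-pairs shortest path lengths via BFS; unreachable pairs get inf."""
    n = len(adj_list)
    dist = np.full((n, n), np.inf)
    for source in range(n):
        dist[source, source] = 0
        queue = deque([source])
        while queue:
            i = queue.popleft()
            for j in adj_list[i]:
                if np.isinf(dist[source, j]):
                    dist[source, j] = dist[source, i] + 1
                    queue.append(j)
    return dist

def hac_uniform(Z, adj_list, b):
    """Sum residual cross products over pairs within path distance b, divided by n."""
    Z = np.asarray(Z, dtype=float)
    resid = Z - Z.mean()
    weights = (path_distances(adj_list) <= b).astype(float)
    return float(resid @ weights @ resid / Z.size)

# Path graph 0 - 1 - 2 - 3, with node i's data in Z[i].
adj_list = [[1], [0, 2], [1, 3], [2]]
Z = np.array([1.0, 2.0, 4.0, 7.0])
print(hac_uniform(Z, adj_list, 0), hac_uniform(Z, adj_list, 1))
```

With b=0 only the diagonal terms remain, which matches the statement above that b=0 gives the conventional heteroskedasticity-robust estimator. An estimator with a uniform kernel need not be positive semidefinite in general.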
static plot_spectrum(A, giant=True, weight=None, xlim_scat_buffer=0.03, ylim_scat_buffer=0.03, xticks_scat=3, yticks_scat=3, xticks_hist=3, binwidth=None, binrange=None, figsize=(10, 4), title_hist='Histogram', title_scat='Scatterplot', title_y='Eigenvalues', sns_style='dark')

Plots spectrum of the normalized Laplacian in a scatterplot and histogram.

Parameters
A : NetworkX graph

Can be weighted or directed.

giant : boolean

Set to True to plot the spectrum of the giant component. Set to False to plot the spectrum of the full graph. Default value: True.

weight : string

Specifies how edge weights are labeled in A, if A is a weighted graph. Default value: None.

xlim_scat_buffer : float in [0,1]

Larger value adds more whitespace before and after the leftmost and rightmost points of the scatterplot.

ylim_scat_buffer : float in [0,1]

Larger value adds more whitespace below and above the bottommost and topmost points of the scatterplot.

xticks_scat : int

Number of tick marks on the x-axis of the scatterplot.

yticks_scat : int

Number of tick marks on the y-axis of the scatterplot.

xticks_hist : int

Number of tick marks on the x-axis of the histogram.

binwidth : int

Width of histogram bins.

binrange : int

Range of histogram bins.

figsize : tuple of ints

Size of figure.

title_hist : string

Title of histogram.

title_scat : string

Title of scatterplot.

title_y : string

Title of scatterplot y-axis.

sns_style : string

Seaborn style of figures.

Examples

>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> A = FakeData.erdos_renyi(500)
>>> ni.core.plot_spectrum(A)
static spectral_clustering(num_clusters, A, seed=None)

Returns network clusters obtained from normalized spectral clustering algorithm due to [1] (also see [2]). All nodes not in the giant component are grouped into a single cluster. NOTE: Assumes nodes are labeled as integers 0, …, n-1 in A.

Parameters
num_clusters : int

Number of desired clusters in the giant component.

A : NetworkX graph

Graph on n nodes. Can be weighted or directed.

seed : int

Seed for k-means clustering initialization. Set to None to leave the seed unset. Default value: None.

Returns
numpy array

n-dimensional array of cluster labels from 0 to num_clusters-1.

References

[1] Ng, A., M. Jordan, and Y. Weiss, “On Spectral Clustering: Analysis and an Algorithm,” Advances in Neural Information Processing Systems, 2002, 849-856.

[2] von Luxburg, U., “A Tutorial on Spectral Clustering,” Statistics and Computing, 2007, 17 (4), 395-416.

Examples

>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> A = FakeData.erdos_renyi(500)
>>> clusters = ni.core.spectral_clustering(10, A)
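
The normalized spectral clustering algorithm of [1] can be sketched in a few steps: embed nodes using the top eigenvectors of the normalized adjacency matrix, rescale each row to unit length, and run k-means on the rows. The version below is a minimal illustration for a connected, undirected graph given as a dense adjacency matrix. `spectral_clusters` is a hypothetical name, and the deterministic farthest-point initialization replaces the seeded k-means initialization that the package uses.

```python
import numpy as np

def spectral_clusters(adj, k, n_iter=50):
    """Normalized spectral clustering sketch (assumes no isolated nodes)."""
    adj = np.asarray(adj, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    M = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]   # D^{-1/2} A D^{-1/2}
    _, vecs = np.linalg.eigh(M)                            # ascending eigenvalues
    X = vecs[:, -k:]                                       # top-k eigenvectors
    X = X / np.linalg.norm(X, axis=1, keepdims=True)       # unit-length rows

    # Farthest-point initialization: start at row 0, then add the most distant rows.
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)

    # Lloyd iterations for k-means.
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

# Two 4-node cliques joined by the single link 3-4.
adj = np.zeros((8, 8))
adj[:4, :4] = 1
adj[4:, 4:] = 1
np.fill_diagonal(adj, 0)
adj[3, 4] = adj[4, 3] = 1
labels = spectral_clusters(adj, 2)
print(labels)
```

In this example each clique should end up in its own cluster, since the only link between them is the bridge 3-4.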
static spectrum(A, giant=True, weight=None)

Returns spectrum of the normalized Laplacian.

Parameters
A : NetworkX graph

Can be weighted or directed.

giant : boolean

Set to True to only return the spectrum of the giant component. Set to False to return the spectrum of the full graph. Default value: True.

weight : string

Specifies how edge weights are labeled in A, if A is a weighted graph. Default value: None.

Returns
numpy array

Eigenvalues.

Examples

>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> A = FakeData.erdos_renyi(500)
>>> ivals = ni.core.spectrum(A)
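
For reference, the spectrum of the normalized Laplacian I - D^{-1/2} A D^{-1/2} can be computed directly with NumPy for a small undirected graph. This is a minimal sketch on a dense adjacency matrix, with the hypothetical name `normalized_laplacian_spectrum`. It does not handle isolated nodes or extract the giant component.

```python
import numpy as np

def normalized_laplacian_spectrum(adj):
    """Eigenvalues of I - D^{-1/2} A D^{-1/2}, in ascending order."""
    adj = np.asarray(adj, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))  # assumes every node has a link
    lap = np.eye(adj.shape[0]) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(lap)

# Complete graph on 4 nodes: eigenvalues are 0 and 4/3 (three times).
adj = np.ones((4, 4)) - np.eye(4)
eigenvalues = normalized_laplacian_spectrum(adj)
print(eigenvalues)
```

The eigenvalues always lie in [0, 2], and the number of zero eigenvalues equals the number of connected components.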
static sumstats(A, decimals=3)

Prints table of network summary statistics.

Parameters
A : NetworkX undirected, unweighted graph

decimals : int

Number of decimals to which to round the output. Default value: 3.

Examples

>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> A = FakeData.erdos_renyi(500)
>>> ni.core.sumstats(A)
static trobust_ci(Z, coverage=0.95, verbose=True)

Returns the CI from the t-statistic based cluster-robust procedure due to [1]. The larger the dimension of Z (i.e. the more clusters), the more powerful the underlying test. However, because the estimates are computed cluster by cluster, the output can become less stable as the number of clusters grows, since each cluster may then contain few observations.

Parameters
Z : numpy array

q-dimensional array of estimates, one for each of the q clusters.

coverage : float

Desired coverage. Default value: 0.95.

verbose : boolean

If True, calling this function prints out the results. Default value: True.

Returns
CI : list

Confidence interval.

References

[1] Ibragimov, R. and U. Mueller, “t-Statistic Based Correlation and Heterogeneity Robust Inference,” Journal of Business and Economic Statistics, 2010, 28 (4), 453-468.

Examples

>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> import numpy as np
>>> Z = np.random.normal(size=10)
>>> ni.core.trobust_ci(Z)
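
The procedure of [1] amounts to a standard t-interval computed from the q cluster-level estimates, with q-1 degrees of freedom. The sketch below illustrates this under that reading. `t_ci` is a hypothetical name, and the critical value is passed in by hand to avoid extra dependencies (it could be obtained from `scipy.stats.t.ppf(1 - (1 - coverage) / 2, q - 1)`).

```python
import math
import numpy as np

def t_ci(Z, t_crit):
    """t-interval from q cluster-level estimates; t_crit is the t(q-1) critical value."""
    Z = np.asarray(Z, dtype=float)
    q = Z.size
    half_width = t_crit * Z.std(ddof=1) / math.sqrt(q)
    return [float(Z.mean() - half_width), float(Z.mean() + half_width)]

# Ten cluster-level estimates; 2.262 is the 0.975 quantile of t with 9 degrees of freedom.
Z = np.array([0.2, -0.1, 0.4, 0.0, 0.3, -0.2, 0.1, 0.5, -0.3, 0.1])
ci = t_ci(Z, t_crit=2.262)
print(ci)
```

The interval is centered at the average of the cluster estimates, and its width depends on how much the estimates vary across clusters.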