networkinference.core

class networkinference.core.core

Bases: object

Methods

`arand_ci`(Z, grid_start, grid_stop[, ...])	Returns confidence interval (CI) obtained by inverting an approximate randomization test [1].
`arand_test`(Z, mu[, seed])	Returns p-value and test statistic of approximate randomization test [1].
`cluster_var`(Z, clusters)	Returns conventional cluster-robust variance estimator.
`conductance`(clusters, A[, weight])	Returns maximal conductance of a set of clusters.
`drobust_ci`(Z, grid_start, grid_stop[, ...])	Returns confidence interval (CI) derived from the dependence-robust test due to [1].
`drobust_test`(Z, mu[, alpha, beta, R, L, seed])	Returns conclusion of dependence-robust test due to [1].
`network_hac`(Z, A[, b, disp])	Returns network HAC variance estimator due to [1] (also see [2]).
`plot_spectrum`(A[, giant, weight, ...])	Plots spectrum of the normalized Laplacian in a scatterplot and histogram.
`spectral_clustering`(num_clusters, A[, seed])	Returns network clusters obtained from normalized spectral clustering algorithm due to [1] (also see [2]).
`spectrum`(A[, giant, weight])	Returns spectrum of the normalized Laplacian.
`sumstats`(A[, decimals])	Prints table of network summary statistics.
`trobust_ci`(Z[, coverage, verbose])	Returns CI from the t-statistic based cluster-robust procedure due to [1].

static arand_ci(Z, grid_start, grid_stop, grid_size=151, coverage=0.95, seed=None)

Returns confidence interval (CI) obtained by inverting an approximate randomization test [1]. If the result is a trivial interval, try increasing grid_size. The larger the dimension of Z (i.e. more clusters), the narrower the CI. However, since the test computes estimates cluster by cluster, the output can be more unstable with a larger number of clusters since the sample size within each cluster can be small.

Parameters

Z (numpy array): q-dimensional array containing the estimate for each of the q clusters.
grid_start (float): Leftmost point of the grid of values over which the test is inverted.
grid_stop (float): Rightmost point of the grid.
grid_size (int): Number of points in the grid. Default value: 151.
coverage (float): Desired coverage. Default value: 0.95.
seed (int): Seed for drawing permutations, which is only relevant when the dimension of Z exceeds 12. Set to None to not set a seed. Default value: None.

Returns

list: Confidence interval.

References

1: Canay, I., J. Romano, and A. Shaikh, “Randomization Tests Under an Approximate Symmetry Assumption,” Econometrica, 2017, 85 (3), 1013-1030.

Examples

>>> import networkinference as ni
>>> import numpy as np
>>> Z = np.random.normal(size=10)
>>> ni.core.arand_ci(Z, -2, 2)
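
The grid-inversion idea described above can be sketched without the package. This is a minimal illustration, not the library's implementation: `invert_test_on_grid` and `z_test_p_value` are hypothetical names, a plain two-sided z-test stands in for the approximate randomization test, and returning None for an empty acceptance set is an assumption about how a "trivial interval" might be signalled.

```python
import math

def invert_test_on_grid(p_value, grid_start, grid_stop, grid_size=151, coverage=0.95):
    """Return [lo, hi] spanning the grid points that the test does not reject.

    p_value is any callable mapping a hypothesized mean mu to a p-value.
    Returns None when no grid point is accepted (grid too coarse or misplaced).
    """
    step = (grid_stop - grid_start) / (grid_size - 1)
    grid = [grid_start + i * step for i in range(grid_size)]
    accepted = [mu for mu in grid if p_value(mu) > 1 - coverage]
    if not accepted:
        return None
    return [min(accepted), max(accepted)]

def z_test_p_value(z, mu):
    """Two-sided z-test p-value; a stand-in for the randomization test."""
    n = len(z)
    mean = sum(z) / n
    var = sum((x - mean) ** 2 for x in z) / (n - 1)
    stat = abs(mean - mu) / math.sqrt(var / n)
    return math.erfc(stat / math.sqrt(2))

z = [0.3, -0.1, 0.4, 0.2, 0.0, 0.5, 0.1, 0.25]
ci = invert_test_on_grid(lambda mu: z_test_p_value(z, mu), -2, 2)
print(ci)
```

The endpoints always lie on the grid, so a finer grid (larger grid_size) gives a more precise interval at the cost of more test evaluations.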

static arand_test(Z, mu, seed=None)

Returns p-value and test statistic of approximate randomization test [1]. The larger the dimension of Z (i.e. more clusters), the more powerful the test. However, since the test computes estimates cluster by cluster, the output can be more unstable with a larger number of clusters since the sample size within each cluster can be small.

Parameters

Z (numpy array): q-dimensional array of estimates, one for each of the q clusters.
mu (float): Scalar null-hypothesized value of the mean of Z.
seed (int): Seed for drawing permutations, which is only relevant when the dimension of Z exceeds 12. Set to None to not set a seed. Default value: None.

Returns

p_value (float): P-value of test.
T_wald (float): Test statistic.

References

1: Canay, I., J. Romano, and A. Shaikh, “Randomization Tests Under an Approximate Symmetry Assumption,” Econometrica, 2017, 85 (3), 1013-1030.

Examples

>>> import networkinference as ni
>>> import numpy as np
>>> Z = np.random.normal(size=10)
>>> ni.core.arand_test(Z, 0)
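
For intuition, a randomization test based on sign changes can be written in a few lines of standard-library Python. This is a sketch under stated assumptions, not the package's code: `sign_change_test` is a hypothetical name, the statistic is taken to be the absolute t-statistic of Z - mu, and all 2^q sign vectors are enumerated, which is only practical for small q.

```python
import itertools
import math
import statistics

def sign_change_test(z, mu):
    """Sketch of a sign-change randomization test of H0: mean(z) = mu.

    Returns (p_value, t_obs). Enumerates all 2**q sign vectors, so q should be small.
    """
    s = [zi - mu for zi in z]

    def t_stat(x):
        sd = statistics.stdev(x)
        if sd == 0:
            return math.inf  # degenerate case: all transformed values identical
        return abs(math.sqrt(len(x)) * statistics.mean(x)) / sd

    t_obs = t_stat(s)
    count = 0
    total = 0
    for signs in itertools.product((-1, 1), repeat=len(s)):
        total += 1
        flipped = [g * x for g, x in zip(signs, s)]
        # Small tolerance so the identity and all-negative flips count as ties.
        if t_stat(flipped) >= t_obs - 1e-12:
            count += 1
    return count / total, t_obs

p_value, t_obs = sign_change_test([0.8, 1.1, 0.9, 1.3, 1.0, 1.2], 0)
print(p_value, t_obs)
```

With q clusters the smallest attainable p-value here is 2 / 2^q, which is one way to see why more clusters yield a more powerful test.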

static cluster_var(Z, clusters)

Returns conventional cluster-robust variance estimator.

Parameters

Z (numpy array): n-dimensional array of scalar observations or n x k matrix of n k-dimensional observations.
clusters (numpy array): n-dimensional array of cluster labels for all n nodes, assumed to be 0, …, L-1 where L is the number of clusters.

Returns

numpy array: Variance-covariance matrix.

Examples

>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> import numpy as np
>>> Z = np.random.normal(size=500)
>>> A = FakeData.erdos_renyi(500)
>>> clusters = ni.core.spectral_clustering(10, A)
>>> var = ni.core.cluster_var(Z, clusters)
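
As a rough illustration of what a cluster-robust variance estimator computes, the sketch below sums demeaned observations within each cluster and averages the outer products of those sums. The name `cluster_robust_variance` and the normalization by n (variance of sqrt(n) times the sample mean, with no degrees-of-freedom correction) are assumptions; the package may scale its output differently.

```python
import numpy as np

def cluster_robust_variance(Z, clusters):
    """Sketch: (1/n) * sum over clusters of S_c S_c', with S_c the within-cluster
    sum of demeaned observations. Z is length n or n x k; clusters has length n."""
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]  # treat scalar observations as an n x 1 matrix
    clusters = np.asarray(clusters)
    n = Z.shape[0]
    resid = Z - Z.mean(axis=0)
    # One row per cluster: the sum of residuals over that cluster's members.
    sums = np.vstack([resid[clusters == c].sum(axis=0) for c in np.unique(clusters)])
    return sums.T @ sums / n

# With every observation in its own cluster, this reduces to the (biased) sample variance.
V = cluster_robust_variance([1.0, 2.0, 3.0, 4.0], [0, 1, 2, 3])
print(V)
```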

static conductance(clusters, A, weight=None)

Returns maximal conductance of a set of clusters. For cluster-robust methods to work, conductance should be at most 0.1, as recommended by [1].

Parameters

clusters (numpy array): n-dimensional array of cluster labels for all n nodes, assumed to be 0, …, L-1 where L is the number of clusters.
A (NetworkX graph): Graph on n nodes. Can be weighted or directed.
weight (string): Specifies how edge weights are labeled in A, if A is a weighted graph. Default value: None.

Returns

float: Maximal conductance of the clusters.

References

1: Leung, M., “Network Cluster-Robust Inference,” arXiv preprint arXiv:2103.01470, 2021.

Examples

>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> A = FakeData.erdos_renyi(500)
>>> clusters = ni.core.spectral_clustering(10, A)
>>> ni.core.conductance(clusters, A)
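
To make the quantity concrete, the following sketch computes maximal conductance for an unweighted, undirected graph given as an edge list. It assumes the standard definition (edges leaving the cluster divided by the smaller of the cluster's volume and its complement's volume); `max_conductance` is a hypothetical helper, and the package's handling of weights and directed edges is not reproduced.

```python
def max_conductance(edges, clusters):
    """Sketch: maximal conductance over clusters.

    edges: list of (u, v) pairs for an undirected, unweighted graph.
    clusters: sequence where clusters[i] is the label of node i.
    """
    labels = set(clusters)
    cut = {c: 0 for c in labels}   # edges with exactly one endpoint in the cluster
    vol = {c: 0 for c in labels}   # sum of degrees of nodes in the cluster
    for u, v in edges:
        cu, cv = clusters[u], clusters[v]
        vol[cu] += 1
        vol[cv] += 1
        if cu != cv:
            cut[cu] += 1
            cut[cv] += 1
    total_vol = 2 * len(edges)
    values = []
    for c in labels:
        denom = min(vol[c], total_vol - vol[c])
        if denom > 0:  # skip clusters with no incident edges
            values.append(cut[c] / denom)
    return max(values)

# Two triangles joined by a single edge: each cluster has volume 7 and cut size 1.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
phi = max_conductance(edges, [0, 0, 0, 1, 1, 1])
print(phi)
```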

static drobust_ci(Z, grid_start, grid_stop, grid_size=151, coverage=0.95, beta=0.01, R=None, L=1000, seed=None)

Returns confidence interval (CI) derived from the dependence-robust test due to [1]. Note that the output of the test is random by nature. L is the number of simulation draws, and larger values reduce the random variation of the test. If the result is a trivial interval, try increasing grid_size.

Test is implemented using the U-type statistic and randomized confidence function approach due to [2] discussed in Remark 2 of [1].

Parameters

Z (numpy array): n-dimensional array of scalar observations.
grid_start (float): Leftmost point of the grid of values to test for inclusion in the CI.
grid_stop (float): Rightmost point of the grid.
grid_size (int): Number of points in the grid. Default value: 151.
coverage (float): Desired coverage. Default value: 0.95.
beta (float): The parameter beta in Remark 2 of [1]. The closer this is to 1-coverage, the more conservative the CI. Default value: 0.01.
R (int): Number of resampling draws for the test statistic. Uses default if R=None. Default value: None.
L (int): Number of resampling draws for the randomized confidence function. The larger the value, the less random the output. Default value: 1000.
seed (int): Seed for resampling draws. Set to None to not set a seed. Default value: None.

Returns

list: Confidence interval.

References

1: Leung, M., “Dependence-Robust Inference Using Resampled Statistics,” Journal of Applied Econometrics (forthcoming), 2021.
2: Song, K., “Ordering-Free Inference from Locally Dependent Data,” UBC working paper, 2016.

Examples

>>> import networkinference as ni
>>> import numpy as np
>>> Z = np.random.normal(size=500)
>>> ni.core.drobust_ci(Z, -2, 2)

static drobust_test(Z, mu, alpha=0.05, beta=0.01, R=None, L=1000, seed=None)

Returns conclusion of dependence-robust test due to [1]. Note that the output of the test is random by nature. L is the number of simulation draws, and larger values reduce the random variation of the test.

Test is implemented using the U-type statistic and randomized confidence function approach due to [2] discussed in Remark 2 of [1].

Parameters

Z (numpy array): n-dimensional array of scalar observations.
mu (float): Null hypothesis, e.g. the hypothesized mean of Z.
alpha (float): Significance level. Default value: 0.05.
beta (float): The parameter beta in Remark 2 of [1]. The closer this is to alpha, the more conservative the test. Default value: 0.01.
R (int): Number of resampling draws for the test statistic. Uses default if R=None. Default value: None.
L (int): Number of resampling draws for the randomized confidence function. The larger the value, the less random the output. Default value: 1000.
seed (int): Seed for resampling draws. Set to None to not set a seed. Default value: None.

Returns

string: Reject or not reject.

References

1: Leung, M., “Dependence-Robust Inference Using Resampled Statistics,” Journal of Applied Econometrics (forthcoming), 2021.
2: Song, K., “Ordering-Free Inference from Locally Dependent Data,” UBC working paper, 2016.

Examples

>>> import networkinference as ni
>>> import numpy as np
>>> Z = np.random.normal(size=500)
>>> ni.core.drobust_test(Z, 0)

static network_hac(Z, A, b=None, disp=False)

Returns network HAC variance estimator due to [1] (also see [2]). Setting b=0, with A set to any value (e.g. None), outputs the conventional heteroskedasticity-robust variance estimator for i.i.d. data. The network is converted to an unweighted, undirected version by dropping edge weights and the directionality of links.

Parameters

Z (numpy array): n-dimensional array of scalar observations or n x k matrix of n k-dimensional observations.
A (NetworkX graph): Graph on n nodes. NOTE: Assumes nodes are labeled as integers 0, …, n-1 in A, so that the data for node i is given by the ith component of Z.
b (float): HAC bandwidth. Keeping b=None is recommended; it uses the bandwidth choice recommended by [2]. Default value: None.
disp (boolean): If False, the function only returns HAC. If True, the function returns (HAC, APL, b, PD_failure). Default value: False.

Returns

HAC (numpy array): Estimate of variance-covariance matrix.
APL (float): Average path length of A.
b (int): Bandwidth.
PD_failure (boolean): True if substitute positive definite variance estimator needed to be used.

References

1: Kojevnikov, D., V. Marmer, and K. Song, “Limit Theorems for Network Dependent Random Variables,” Journal of Econometrics, 2021, 222 (2), 882-908.
2: Leung, M., “Causal Inference Under Approximate Neighborhood Interference,” Econometrica (forthcoming), 2021.

Examples

>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> import numpy as np
>>> Z = np.random.normal(size=500)
>>> A = FakeData.erdos_renyi(500)
>>> HAC = ni.core.network_hac(Z, A)
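
The structure of a network HAC estimator can be sketched as a kernel-weighted sum of cross-products of demeaned observations, with weights depending on path distance. The version below is illustrative only: it assumes a uniform kernel (weight 1 when path distance is at most b, else 0), takes an adjacency-list dict instead of a NetworkX graph, and omits the default bandwidth rule and the positive-definite substitute. `path_distances` and `network_hac_uniform` are hypothetical names.

```python
from collections import deque
import numpy as np

def path_distances(adj, source):
    """Breadth-first search: shortest path length from source to each reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def network_hac_uniform(Z, adj, b):
    """Sketch: (1/n) * sum_{i,j} 1{dist(i,j) <= b} * resid_i resid_j'."""
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]
    n = Z.shape[0]
    resid = Z - Z.mean(axis=0)
    K = np.zeros((n, n))
    for i in range(n):
        for j, d in path_distances(adj, i).items():
            if d <= b:
                K[i, j] = 1.0  # unreachable pairs keep weight 0
    return resid.T @ K @ resid / n

# Path graph 0-1-2-3 with nodes labeled 0, ..., n-1 to match the rows of Z.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
Z = [1.0, 2.0, 3.0, 4.0]
hac_b0 = network_hac_uniform(Z, adj, 0)
hac_b1 = network_hac_uniform(Z, adj, 1)
print(hac_b0, hac_b1)
```

With b=0 only the diagonal terms survive, which is the heteroskedasticity-robust special case mentioned above.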

static plot_spectrum(A, giant=True, weight=None, xlim_scat_buffer=0.03, ylim_scat_buffer=0.03, xticks_scat=3, yticks_scat=3, xticks_hist=3, binwidth=None, binrange=None, figsize=(10, 4), title_hist='Histogram', title_scat='Scatterplot', title_y='Eigenvalues', sns_style='dark')

Plots spectrum of the normalized Laplacian in a scatterplot and histogram.

Parameters

A (NetworkX graph): Can be weighted or directed.
giant (boolean): Set to True to plot spectrum of the giant component. Set to False to plot spectrum of the full graph. Default value: True.
weight (string): Specifies how edge weights are labeled in A, if A is a weighted graph. Default value: None.
xlim_scat_buffer (float in [0,1]): Larger value adds more whitespace before and after the leftmost and rightmost points of the scatterplot.
ylim_scat_buffer (float in [0,1]): Larger value adds more whitespace below and above the bottommost and topmost points of the scatterplot.
xticks_scat (int): Number of tick marks on x-axis of scatterplot.
yticks_scat (int): Number of tick marks on y-axis of scatterplot.
xticks_hist (int): Number of tick marks on x-axis of histogram.
binwidth (int): Width of histogram bins.
binrange (int): Range of histogram bins.
figsize (tuple of ints): Size of figure.
title_hist (string): Title of histogram.
title_scat (string): Title of scatterplot.
title_y (string): Title of scatterplot y-axis.
sns_style (string): Seaborn style of figures.

Examples

>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> A = FakeData.erdos_renyi(500)
>>> ni.core.plot_spectrum(A)

static spectral_clustering(num_clusters, A, seed=None)

Returns network clusters obtained from normalized spectral clustering algorithm due to [1] (also see [2]). All nodes not in the giant component are grouped into a single cluster. NOTE: Assumes nodes are labeled as integers 0, …, n-1 in A.

Parameters

num_clusters (int): Number of desired clusters in the giant component.
A (NetworkX graph): Graph on n nodes. Can be weighted or directed.
seed (int): Seed for k-means clustering initialization. Set to None to not set a seed. Default value: None.

Returns

numpy array: n-dimensional array of cluster labels from 0 to num_clusters-1.

References

1: Ng, A., M. Jordan, and Y. Weiss, “On Spectral Clustering: Analysis and an Algorithm,” Advances in Neural Information Processing Systems, 2002, 849-856.
2: von Luxburg, U., “A Tutorial on Spectral Clustering,” Statistics and Computing, 2007, 17 (4), 395-416.

Examples

>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> A = FakeData.erdos_renyi(500)
>>> clusters = ni.core.spectral_clustering(10, A)

static spectrum(A, giant=True, weight=None)

Returns spectrum of the normalized Laplacian.

Parameters

A (NetworkX graph): Can be weighted or directed.
giant (boolean): Set to True to only return spectrum of the giant component. Set to False to return spectrum of the full graph. Default value: True.
weight (string): Specifies how edge weights are labeled in A, if A is a weighted graph. Default value: None.

Returns

numpy array: Eigenvalues.

Examples

>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> A = FakeData.erdos_renyi(500)
>>> ivals = ni.core.spectrum(A)
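
For reference, the spectrum of the symmetric normalized Laplacian, I - D^{-1/2} A D^{-1/2}, can be computed directly from a dense adjacency matrix. This is a minimal sketch with a hypothetical helper name; it assumes a symmetric adjacency matrix with no isolated nodes and does not extract the giant component.

```python
import numpy as np

def normalized_laplacian_spectrum(adjacency):
    """Sketch: eigenvalues of I - D^{-1/2} A D^{-1/2} in ascending order.

    Assumes a symmetric adjacency matrix in which every node has positive degree.
    """
    A = np.asarray(adjacency, dtype=float)
    degrees = A.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(degrees)
    laplacian = np.eye(len(A)) - inv_sqrt[:, None] * A * inv_sqrt[None, :]
    return np.linalg.eigvalsh(laplacian)  # symmetric input, so eigenvalues are real

# Complete graph on 3 nodes: eigenvalues 0, 1.5, 1.5.
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
eigenvalues = normalized_laplacian_spectrum(triangle)
print(eigenvalues)
```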

static sumstats(A, decimals=3)

Prints table of network summary statistics.

Parameters

A (NetworkX graph): Undirected, unweighted graph.
decimals (int): Number of decimal places to which the output is rounded. Default value: 3.

Examples

>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> A = FakeData.erdos_renyi(500)
>>> ni.core.sumstats(A)

static trobust_ci(Z, coverage=0.95, verbose=True)

Returns CI from the t-statistic based cluster-robust procedure due to [1]. The larger the dimension of Z (i.e. more clusters), the more powerful the test. However, since the test computes estimates cluster by cluster, the output can be more unstable with a larger number of clusters since the sample size within each cluster can be small.

Parameters

Z (numpy array): q-dimensional array of estimates, one for each of the q clusters.
coverage (float): Desired coverage. Default value: 0.95.
verbose (boolean): If True, calling this function prints out the results. Default value: True.

Returns

CI (list): Confidence interval.

References

1: Ibragimov, R. and U. Mueller, “t-Statistic Based Correlation and Heterogeneity Robust Inference,” Journal of Business and Economic Statistics, 2010, 28 (4), 453-468.

Examples

>>> import networkinference as ni
>>> import numpy as np
>>> Z = np.random.normal(size=10)
>>> ni.core.trobust_ci(Z)
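
The interval behind a t-statistic based procedure is the usual one-sample t interval applied to the q cluster-level estimates, with q-1 degrees of freedom. The sketch below is an illustration under that assumption, not the package's code: `t_interval` is a hypothetical helper, and the critical value is passed in explicitly (2.262 is the 0.975 quantile of the t distribution with 9 degrees of freedom; in practice it could come from `scipy.stats.t.ppf(0.975, df=q - 1)`).

```python
import math
import statistics

def t_interval(z, t_crit):
    """Sketch: mean(z) +/- t_crit * sd(z) / sqrt(q) for q cluster-level estimates."""
    q = len(z)
    center = statistics.mean(z)
    half_width = t_crit * statistics.stdev(z) / math.sqrt(q)
    return [center - half_width, center + half_width]

# Ten cluster-level estimates, so 9 degrees of freedom and t_crit = 2.262 for 95% coverage.
estimates = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ci_t = t_interval(estimates, 2.262)
print(ci_t)
```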