networkinference.core
- class networkinference.core.core
Bases: object
Methods
- arand_ci(Z, grid_start, grid_stop[, ...]): Returns confidence interval (CI) obtained by inverting an approximate randomization test [1].
- arand_test(Z, mu[, seed]): Returns p-value and test statistic of approximate randomization test [1].
- cluster_var(Z, clusters): Returns conventional cluster-robust variance estimator.
- conductance(clusters, A[, weight]): Returns maximal conductance of a set of clusters.
- drobust_ci(Z, grid_start, grid_stop[, ...]): Returns confidence interval (CI) derived from the dependence-robust test due to [1].
- drobust_test(Z, mu[, alpha, beta, R, L, seed]): Returns conclusion of dependence-robust test due to [1].
- network_hac(Z, A[, b, disp]): Returns network HAC variance estimator due to [1] (also see [2]).
- plot_spectrum(A[, giant, weight, ...]): Plots spectrum of the normalized Laplacian in a scatterplot and histogram.
- spectral_clustering(num_clusters, A[, seed]): Returns network clusters obtained from the normalized spectral clustering algorithm due to [1] (also see [2]).
- spectrum(A[, giant, weight]): Returns spectrum of the normalized Laplacian.
- sumstats(A[, decimals]): Prints table of network summary statistics.
- trobust_ci(Z[, coverage, verbose]): Returns CI from the t-statistic based cluster-robust procedure due to [1].
- static arand_ci(Z, grid_start, grid_stop, grid_size=151, coverage=0.95, seed=None)
Returns the confidence interval (CI) obtained by inverting the approximate randomization test of [1]. If the result is a trivial interval, try increasing grid_size. The larger the dimension of Z (i.e., the more clusters), the narrower the CI. However, because the estimates are computed cluster by cluster, the output can be less stable with many clusters, since the sample size within each cluster can then be small.
- Parameters
- Z : numpy array
q-dimensional array containing the estimator for each of the q clusters.
- grid_start : float
Leftmost point of the grid of values over which the test is inverted.
- grid_stop : float
Rightmost point of the grid.
- grid_size : int
Number of points in the grid. Default value: 151.
- coverage : float
Desired coverage. Default value: 0.95.
- seed : int
Seed for drawing permutations, which is only relevant when the dimension of Z exceeds 12. Set to None to leave the seed unset. Default value: None.
- Returns
- list
Confidence interval.
References
- [1]
Canay, I., J. Romano, and A. Shaikh, “Randomization Tests Under an Approximate Symmetry Assumption,” Econometrica, 2017, 85 (3), 1013-1030.
Examples
>>> import networkinference as ni
>>> import numpy as np
>>> Z = np.random.normal(size=10)
>>> ni.core.arand_ci(Z, -2, 2)
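The grid inversion described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the library's implementation: it assumes the sign-change version of the test in [1] with the studentized statistic |sqrt(q) * mean(Z - mu)| / s, always enumerates all 2^q sign changes (so it is only practical for a small number of clusters), and keeps every grid point whose p-value exceeds 1 - coverage. The names sign_change_pvalue and arand_ci_sketch are hypothetical.

```python
import itertools

import numpy as np


def sign_change_pvalue(Z, mu):
    """p-value of a sign-change randomization test of H0: mean of Z equals mu."""
    S = np.asarray(Z, dtype=float) - mu
    q = S.size
    # Every sign change g in {1, -1}^q; the first row (all +1) is the identity
    signs = np.array(list(itertools.product([1.0, -1.0], repeat=q)))
    X = signs * S
    # Studentized statistic |sqrt(q) * mean| / s for each transformed sample
    T = np.abs(np.sqrt(q) * X.mean(axis=1)) / X.std(axis=1, ddof=1)
    # Share of sign changes at least as extreme as the observed statistic T[0]
    return np.mean(T >= T[0] - 1e-12)


def arand_ci_sketch(Z, grid_start, grid_stop, grid_size=151, coverage=0.95):
    """Keep grid points the test does not reject and report their range."""
    grid = np.linspace(grid_start, grid_stop, grid_size)
    kept = [mu for mu in grid if sign_change_pvalue(Z, mu) > 1 - coverage]
    return [min(kept), max(kept)] if kept else []
```

Under this sketch the smallest attainable p-value is 2/2^q, because the identity and the sign change that flips every entry always tie. With q = 4 or 5 clusters the test therefore never rejects at coverage 0.95, and the interval spans the whole grid.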
- static arand_test(Z, mu, seed=None)
Returns the p-value and test statistic of the approximate randomization test of [1]. The larger the dimension of Z (i.e., the more clusters), the more powerful the test. However, because the estimates are computed cluster by cluster, the output can be less stable with many clusters, since the sample size within each cluster can then be small.
- Parameters
- Z : numpy array
q-dimensional array of estimates, one for each of the q clusters.
- mu : float
Scalar null-hypothesized value of the mean of Z.
- seed : int
Seed for drawing permutations, which is only relevant when the dimension of Z exceeds 12. Set to None to leave the seed unset. Default value: None.
- Returns
- p_value : float
P-value of the test.
- T_wald : float
Test statistic.
References
- [1]
Canay, I., J. Romano, and A. Shaikh, “Randomization Tests Under an Approximate Symmetry Assumption,” Econometrica, 2017, 85 (3), 1013-1030.
Examples
>>> import networkinference as ni
>>> import numpy as np
>>> Z = np.random.normal(size=10)
>>> ni.core.arand_test(Z, 0)
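A minimal sketch of the test for a small number of clusters, assuming the sign-change version of the approximate randomization test in [1]: under the null, the cluster estimates are approximately symmetric about mu, so flipping the signs of Z_j - mu should leave the distribution of the studentized statistic |sqrt(q) * mean(Z - mu)| / s roughly unchanged. The sketch enumerates all 2^q sign changes instead of drawing a random subset, and the name arand_test_sketch is hypothetical.

```python
import itertools

import numpy as np


def arand_test_sketch(Z, mu):
    """Sign-change randomization test of H0: E[Z_j] = mu (hypothetical sketch)."""
    S = np.asarray(Z, dtype=float) - mu
    q = S.size
    # All 2^q sign changes; the first row (all +1) leaves S unchanged
    signs = np.array(list(itertools.product([1.0, -1.0], repeat=q)))
    X = signs * S
    # Studentized statistic |sqrt(q) * mean| / s for every transformed sample
    T = np.abs(np.sqrt(q) * X.mean(axis=1)) / X.std(axis=1, ddof=1)
    T_obs = T[0]
    # p-value: share of sign changes at least as extreme as the observed data
    p_value = np.mean(T >= T_obs - 1e-12)
    return p_value, T_obs
```

For Z = [1, 2, 3, 4] and mu = 0, only the identity and the all-negative sign change attain the observed statistic, so the sketch returns a p-value of 2/16 = 0.125.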
- static cluster_var(Z, clusters)
Returns conventional cluster-robust variance estimator.
- Parameters
- Z : numpy array
n-dimensional array of scalar observations or n x k matrix of n k-dimensional observations.
- clusters : numpy array
n-dimensional array of cluster labels for all n nodes, assumed to be 0, …, L-1 where L is the number of clusters.
- Returns
- numpy array
Variance-covariance matrix.
Examples
>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> import numpy as np
>>> Z = np.random.normal(size=500)
>>> A = FakeData.erdos_renyi(500)
>>> clusters = ni.core.spectral_clustering(10, A)
>>> var = ni.core.cluster_var(Z, clusters)
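One common form of the conventional cluster-robust variance estimator for a sample mean sums, over clusters, the outer products of within-cluster sums of demeaned observations. The sketch below assumes that form and a 1/n^2 scaling (the variance of the sample mean); the library's normalization or any small-sample correction may differ, and the name cluster_var_sketch is hypothetical.

```python
import numpy as np


def cluster_var_sketch(Z, clusters):
    """Cluster-robust variance of the sample mean (hypothetical sketch)."""
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]  # treat n scalar observations as an n x 1 matrix
    clusters = np.asarray(clusters)
    n, k = Z.shape
    resid = Z - Z.mean(axis=0)
    V = np.zeros((k, k))
    for c in np.unique(clusters):
        s = resid[clusters == c].sum(axis=0)  # within-cluster sum of residuals
        V += np.outer(s, s)
    return V / n**2
```

With one observation per cluster, the cross-products drop out and the sketch reduces to the heteroskedasticity-robust variance of the mean.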
- static conductance(clusters, A, weight=None)
Returns the maximal conductance of a set of clusters. As recommended by [1], conductance should be at most 0.1 for cluster-robust methods to work well.
- Parameters
- clusters : numpy array
n-dimensional array of cluster labels for all n nodes, assumed to be 0, …, L-1 where L is the number of clusters.
- A : NetworkX graph
Graph on n nodes. Can be weighted or directed.
- weight : string
Specifies how edge weights are labeled in A, if A is a weighted graph. Default value: None.
- Returns
- float
Maximal conductance of the clusters.
References
- [1]
Leung, M., “Network Cluster-Robust Inference,” arXiv preprint arXiv:2103.01470, 2021.
Examples
>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> A = FakeData.erdos_renyi(500)
>>> clusters = ni.core.spectral_clustering(10, A)
>>> ni.core.conductance(clusters, A)
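A sketch of the quantity being computed, assuming the conductance of a cluster is the number of edges leaving it divided by its volume (the sum of its nodes' degrees); check [1] for the exact definition the library uses. The sketch takes an unweighted, undirected graph as a dense numpy adjacency matrix rather than a NetworkX graph, and the name max_conductance_sketch is hypothetical.

```python
import numpy as np


def max_conductance_sketch(clusters, adj):
    """Maximal conductance over clusters (hypothetical sketch)."""
    adj = np.asarray(adj, dtype=float)
    clusters = np.asarray(clusters)
    deg = adj.sum(axis=1)
    phis = []
    for c in np.unique(clusters):
        inside = clusters == c
        # Edges with one endpoint in cluster c and the other outside it
        boundary = adj[np.ix_(inside, ~inside)].sum()
        volume = deg[inside].sum()  # assumes the cluster has positive volume
        phis.append(boundary / volume)
    return max(phis)
```

For two triangles joined by a single edge, each triangle has one boundary edge and volume 2 + 2 + 3 = 7, so the sketch returns 1/7 (about 0.14), which exceeds the 0.1 threshold mentioned above.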
- static drobust_ci(Z, grid_start, grid_stop, grid_size=151, coverage=0.95, beta=0.01, R=None, L=1000, seed=None)
Returns a confidence interval (CI) derived from the dependence-robust test of [1]. The output of the test is random by nature: L is the number of simulation draws, and larger values reduce this random variation. If the result is a trivial interval, try increasing grid_size.
The test is implemented using the U-type statistic and randomized confidence function approach of [2], as discussed in Remark 2 of [1].
- Parameters
- Z : numpy array
n-dimensional array of scalar observations.
- grid_start : float
Leftmost point of the grid of values tested for inclusion in the CI.
- grid_stop : float
Rightmost point of the grid.
- grid_size : int
Number of points in the grid. Default value: 151.
- coverage : float
Desired coverage. Default value: 0.95.
- beta : float
beta in Remark 2 of [1]. The closer this is to 1-coverage, the more conservative the CI. Default value: 0.01.
- R : int
Number of resampling draws for the test statistic. If R=None, a default number of draws is used. Default value: None.
- L : int
Number of resampling draws for the randomized confidence function. The larger the value, the less random the output. Default value: 1000.
- seed : int
Seed for resampling draws. Set to None to leave the seed unset. Default value: None.
- Returns
- list
Confidence interval.
References
- [1]
Leung, M., “Dependence-Robust Inference Using Resampled Statistics,” Journal of Applied Econometrics (forthcoming), 2021.
- [2]
Song, K., “Ordering-Free Inference from Locally Dependent Data,” UBC working paper, 2016.
Examples
>>> import networkinference as ni
>>> import numpy as np
>>> Z = np.random.normal(size=500)
>>> ni.core.drobust_ci(Z, -2, 2)
- static drobust_test(Z, mu, alpha=0.05, beta=0.01, R=None, L=1000, seed=None)
Returns the conclusion of the dependence-robust test of [1]. The output of the test is random by nature: L is the number of simulation draws, and larger values reduce this random variation.
The test is implemented using the U-type statistic and randomized confidence function approach of [2], as discussed in Remark 2 of [1].
- Parameters
- Z : numpy array
n-dimensional array of scalar observations.
- mu : float
Null-hypothesized value, e.g. the hypothesized mean of Z.
- alpha : float
Significance level. Default value: 0.05.
- beta : float
beta in Remark 2 of [1]. The closer this is to alpha, the more conservative the test. Default value: 0.01.
- R : int
Number of resampling draws for the test statistic. If R=None, a default number of draws is used. Default value: None.
- L : int
Number of resampling draws for the randomized confidence function. The larger the value, the less random the output. Default value: 1000.
- seed : int
Seed for resampling draws. Set to None to leave the seed unset. Default value: None.
- Returns
- string
Reject or not reject.
References
- [1]
Leung, M., “Dependence-Robust Inference Using Resampled Statistics,” Journal of Applied Econometrics (forthcoming), 2021.
- [2]
Song, K., “Ordering-Free Inference from Locally Dependent Data,” UBC working paper, 2016.
Examples
>>> import networkinference as ni
>>> import numpy as np
>>> Z = np.random.normal(size=500)
>>> ni.core.drobust_test(Z, 0)
- static network_hac(Z, A, b=None, disp=False)
Returns the network HAC variance estimator of [1] (also see [2]). Setting b=0 (with A set to any value, e.g. None) returns the conventional heteroskedasticity-robust variance estimator for i.i.d. data. The network is converted to an unweighted, undirected graph by dropping edge weights and link directionality.
- Parameters
- Z : numpy array
n-dimensional array of scalar observations or n x k matrix of n k-dimensional observations.
- A : NetworkX graph
Graph on n nodes. NOTE: Assumes nodes are labeled as integers 0, …, n-1 in A, so that the data for node i is given by the ith component of Z.
- b : float
HAC bandwidth. Recommend keeping b=None, which uses the bandwidth choice recommended by [2]. Default value: None.
- disp : boolean
If False, the function only returns HAC. If True, the function returns (HAC, APL, b, PD_failure). Default value: False.
- Returns
- HAC : numpy array
Estimate of the variance-covariance matrix.
- APL : float
Average path length of A.
- b : int
Bandwidth.
- PD_failure : boolean
True if a substitute positive definite variance estimator had to be used.
References
- [1]
Kojevnikov, D., V. Marmer, and K. Song, “Limit Theorems for Network Dependent Random Variables,” Journal of Econometrics, 2021, 222 (2), 882-908.
- [2]
Leung, M., “Causal Inference Under Approximate Neighborhood Interference,” Econometrica (forthcoming), 2021.
Examples
>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> import numpy as np
>>> Z = np.random.normal(size=500)
>>> A = FakeData.erdos_renyi(500)
>>> HAC = ni.core.network_hac(Z, A)
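A network HAC estimator in the spirit of [1] weights the product of demeaned observations for every pair of nodes by a kernel of their shortest-path distance. The sketch below assumes a uniform kernel (weight 1 when the distance is at most b, 0 otherwise), a user-supplied bandwidth b, an unweighted undirected network given as an adjacency list, and scaling as the variance of the sample mean. It implements neither the bandwidth rule of [2] nor the positive definite substitute reported by PD_failure, and the names shortest_path_lengths and network_hac_sketch are hypothetical.

```python
from collections import deque

import numpy as np


def shortest_path_lengths(adj_list, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj_list[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist  # unreachable nodes are absent, i.e. infinitely far away


def network_hac_sketch(Z, adj_list, b):
    """Uniform-kernel network HAC variance of the sample mean (sketch)."""
    Z = np.asarray(Z, dtype=float)
    if Z.ndim == 1:
        Z = Z[:, None]
    n, k = Z.shape
    resid = Z - Z.mean(axis=0)
    V = np.zeros((k, k))
    for i in range(n):
        for j, d in shortest_path_lengths(adj_list, i).items():
            if d <= b:  # uniform kernel: pairs within distance b get weight 1
                V += np.outer(resid[i], resid[j])
    return V / n**2
```

With b = 0 only the pairs i = j survive, so the sketch reduces to the heteroskedasticity-robust estimator, matching the b=0 behavior described above.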
- static plot_spectrum(A, giant=True, weight=None, xlim_scat_buffer=0.03, ylim_scat_buffer=0.03, xticks_scat=3, yticks_scat=3, xticks_hist=3, binwidth=None, binrange=None, figsize=(10, 4), title_hist='Histogram', title_scat='Scatterplot', title_y='Eigenvalues', sns_style='dark')
Plots spectrum of the normalized Laplacian in a scatterplot and histogram.
- Parameters
- A : NetworkX graph
Can be weighted or directed.
- giant : boolean
Set to True to plot the spectrum of the giant component. Set to False to plot the spectrum of the full graph. Default value: True.
- weight : string
Specifies how edge weights are labeled in A, if A is a weighted graph. Default value: None.
- xlim_scat_buffer : float in [0, 1]
Larger values add more whitespace before and after the leftmost and rightmost points of the scatterplot. Default value: 0.03.
- ylim_scat_buffer : float in [0, 1]
Larger values add more whitespace below and above the bottommost and topmost points of the scatterplot. Default value: 0.03.
- xticks_scat : int
Number of tick marks on the x-axis of the scatterplot. Default value: 3.
- yticks_scat : int
Number of tick marks on the y-axis of the scatterplot. Default value: 3.
- xticks_hist : int
Number of tick marks on the x-axis of the histogram. Default value: 3.
- binwidth : int
Width of histogram bins. Default value: None.
- binrange : int
Range of histogram bins. Default value: None.
- figsize : tuple of ints
Size of figure. Default value: (10, 4).
- title_hist : string
Title of histogram. Default value: 'Histogram'.
- title_scat : string
Title of scatterplot. Default value: 'Scatterplot'.
- title_y : string
Title of scatterplot y-axis. Default value: 'Eigenvalues'.
- sns_style : string
Seaborn style of figures. Default value: 'dark'.
Seaborn style of figures.
Examples
>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> A = FakeData.erdos_renyi(500)
>>> ni.core.plot_spectrum(A)
- static spectral_clustering(num_clusters, A, seed=None)
Returns network clusters obtained from normalized spectral clustering algorithm due to [1] (also see [2]). All nodes not in the giant component are grouped into a single cluster. NOTE: Assumes nodes are labeled as integers 0, …, n-1 in A.
- Parameters
- num_clusters : int
Number of desired clusters in the giant component.
- A : NetworkX graph
Graph on n nodes. Can be weighted or directed.
- seed : int
Seed for k-means clustering initialization. Set to None to leave the seed unset. Default value: None.
- Returns
- numpy array
n-dimensional array of cluster labels from 0 to num_clusters-1.
References
- [1]
Ng, A., M. Jordan, and Y. Weiss, “On Spectral Clustering: Analysis and an Algorithm,” Advances in Neural Information Processing Systems, 2002, 849-856.
- [2]
von Luxburg, U., “A Tutorial on Spectral Clustering,” Statistics and Computing, 2007, 17 (4), 395-416.
Examples
>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> A = FakeData.erdos_renyi(500)
>>> clusters = ni.core.spectral_clustering(10, A)
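The algorithm of [1] can be sketched in a few steps: form the normalized affinity D^{-1/2} A D^{-1/2}, take the eigenvectors of its num_clusters largest eigenvalues, rescale each row to unit length, and run k-means on the rows. This sketch assumes a connected graph given as a dense symmetric numpy adjacency matrix, so the giant-component handling described above is omitted. It also replaces seeded k-means with a short Lloyd iteration started from deterministic farthest-point centers. The name spectral_clustering_sketch is hypothetical.

```python
import numpy as np


def spectral_clustering_sketch(num_clusters, adj, n_iter=100):
    """Ng-Jordan-Weiss spectral clustering on a connected graph (sketch)."""
    adj = np.asarray(adj, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    # Normalized affinity D^{-1/2} A D^{-1/2}
    m = d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(m)  # eigenvalues in ascending order
    u = eigvecs[:, -num_clusters:]  # eigenvectors of the largest eigenvalues
    u = u / np.linalg.norm(u, axis=1, keepdims=True)  # rows to unit length

    # Lloyd's k-means with deterministic farthest-point initialization
    centers = [u[0]]
    for _ in range(1, num_clusters):
        d2 = np.min([((u - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(u[np.argmax(d2)])
    centers = np.array(centers)
    labels = None
    for _ in range(n_iter):
        d2 = ((u[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d2.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        for k in range(num_clusters):
            if np.any(labels == k):  # keep the old center if a cluster empties
                centers[k] = u[labels == k].mean(axis=0)
    return labels
```

On two triangles joined by a single edge, the sketch places each triangle in its own cluster.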
- static spectrum(A, giant=True, weight=None)
Returns spectrum of the normalized Laplacian.
- Parameters
- A : NetworkX graph
Can be weighted or directed.
- giant : boolean
Set to True to only return the spectrum of the giant component. Set to False to return the spectrum of the full graph. Default value: True.
- weight : string
Specifies how edge weights are labeled in A, if A is a weighted graph. Default value: None.
- Returns
- numpy array
Eigenvalues.
Examples
>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> A = FakeData.erdos_renyi(500)
>>> ivals = ni.core.spectrum(A)
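For an undirected graph without isolated nodes, the normalized Laplacian is I - D^{-1/2} A D^{-1/2}, where D is the diagonal matrix of degrees; its eigenvalues lie in [0, 2]. The sketch below computes them with numpy from a dense symmetric adjacency matrix. It assumes an unweighted, undirected graph and skips the giant-component extraction, and the name normalized_laplacian_spectrum_sketch is hypothetical.

```python
import numpy as np


def normalized_laplacian_spectrum_sketch(adj):
    """Eigenvalues of I - D^{-1/2} A D^{-1/2} in ascending order (sketch)."""
    adj = np.asarray(adj, dtype=float)
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))  # assumes every degree > 0
    lap = np.eye(len(adj)) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]
    return np.linalg.eigvalsh(lap)  # symmetric matrix, so eigvalsh applies
```

For the complete graph on three nodes the spectrum is 0, 1.5, 1.5. In general, the multiplicity of the eigenvalue 0 equals the number of connected components.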
- static sumstats(A, decimals=3)
Prints table of network summary statistics.
- Parameters
- A : NetworkX graph
Undirected, unweighted graph.
- decimals : int
Number of decimal places to which the output is rounded. Default value: 3.
Examples
>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> A = FakeData.erdos_renyi(500)
>>> ni.core.sumstats(A)
- static trobust_ci(Z, coverage=0.95, verbose=True)
Returns a CI from the t-statistic based cluster-robust procedure of [1]. The larger the dimension of Z (i.e., the more clusters), the more powerful the underlying test. However, because the estimates are computed cluster by cluster, the output can be less stable with many clusters, since the sample size within each cluster can then be small.
- Parameters
- Z : numpy array
q-dimensional array of estimates, one for each of the q clusters.
- coverage : float
Desired coverage. Default value: 0.95.
- verbose : boolean
If True, calling this function prints out the results. Default value: True.
- Returns
- CI : list
Confidence interval.
References
- [1]
Ibragimov, R. and U. Mueller, “t-Statistic Based Correlation and Heterogeneity Robust Inference,” Journal of Business and Economic Statistics, 2010, 28 (4), 453-468.
Examples
>>> import networkinference as ni
>>> import numpy as np
>>> Z = np.random.normal(size=10)
>>> ni.core.trobust_ci(Z)
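The procedure of [1] treats the q cluster-level estimates as approximately independent and applies the usual t-interval with q - 1 degrees of freedom: mean(Z) ± t_{q-1, (1+coverage)/2} · s / sqrt(q), where s is the sample standard deviation of the estimates. The sketch below is standard-library only. To avoid a SciPy dependency, it computes the Student-t quantile numerically with Simpson's rule and bisection, where scipy.stats.t.ppf would normally be used. All function names are hypothetical.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    c = math.exp(log_c) / math.sqrt(df * math.pi)
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x, df, steps=2000):
    """CDF at x >= 0: 0.5 plus Simpson's rule for the density on [0, x]."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile for 0.5 < p < 1, found by bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # widen the bracket until it contains the quantile
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def trobust_ci_sketch(Z, coverage=0.95):
    """t-interval on the q cluster estimates with q - 1 degrees of freedom."""
    q = len(Z)
    mean = sum(Z) / q
    s = math.sqrt(sum((z - mean) ** 2 for z in Z) / (q - 1))
    half_width = t_quantile((1 + coverage) / 2, q - 1) * s / math.sqrt(q)
    return [mean - half_width, mean + half_width]
```

For the ten estimates 1, 2, …, 10, the sketch uses t_{9, 0.975} ≈ 2.262 and returns approximately [3.334, 7.666].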