networkinference.OLS
- class networkinference.OLS(Y, X, A=None)
Bases: object
OLS estimator.
- Parameters
- Y : numpy float array
n-dimensional array of outcomes.
- X : numpy float array
n x k array of regressors (not including intercept) or n-dimensional array.
- A : NetworkX graph
Graph on n nodes. NOTE: Assumes nodes are labeled as integers 0, …, n-1 in A, so that the outcome of node i is given by the ith component of Y. Network can be weighted or directed, although weights and directions are ignored when computing network SEs. Argument not used for the dependence-robust test or CI. Default value: None.
Examples
>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> Y, X, A = FakeData.ols()
>>> ols_model = ni.OLS(Y, X, A)
>>> print(ols_model.estimate)
- Attributes
- data : dictionary
Stores all input data, adding a column of ones to X.
- summands : numpy array
n-dimensional array of intermediate products used to compute OLS estimator.
- estimate : float
OLS estimator.
- resid : numpy array
Regression residuals.
- invhessian : numpy array
Inverse Hessian matrix.
- scores : numpy array
Regression scores.
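As a rough illustration of how these attributes relate to one another, the NumPy sketch below computes analogous quantities on simulated data. It is not the library's code: the variable names mirror the attribute names only by analogy, and the scaling (for example, whether the Hessian is divided by n) is an assumption.

```python
import numpy as np

# Simulated data (hypothetical; stands in for the user's Y and X).
rng = np.random.default_rng(0)
n = 200
X = rng.normal(size=(n, 2))
Y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=n)

# Add a column of ones for the intercept, as the `data` attribute is described as doing.
Xc = np.column_stack([np.ones(n), X])

# Inverse of X'X (assumed unscaled; the library may normalize differently).
invhessian = np.linalg.inv(Xc.T @ Xc)

# OLS estimator: (X'X)^{-1} X'Y.
estimate = invhessian @ (Xc.T @ Y)

# Residuals and per-observation scores x_i * e_i.
resid = Y - Xc @ estimate
scores = Xc * resid[:, None]
```

By the OLS normal equations, the columns of `scores` sum to zero up to floating-point error, which is a quick sanity check on any such computation.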
Methods
- arand_ci(grid_start, grid_stop[, dimension, ...]): Returns confidence interval (CI) obtained by inverting an approximate randomization test [1].
- arand_test(mu[, dimension, num_clusters, ...]): Returns p-value of approximate randomization test [1].
- cluster_se([num_clusters, decimals, verbose]): Returns clustered standard errors.
- drobust_ci(grid_start, grid_stop[, ...]): Returns confidence interval (CI) derived from the dependence-robust test due to [1].
- drobust_test(mu[, dimension, alpha, beta, ...]): Returns conclusion of dependence-robust test due to [1].
- est_by_cluster(dimension): Returns array of OLS estimates, one for each cluster.
- get_clusters(num_clusters[, clusters, seed, ...]): Returns network clusters obtained from normalized spectral clustering algorithm due to [2] (also see [3]).
- network_se([b, decimals, verbose, PD_alert]): Returns standard errors derived from network HAC variance estimator due to [1] using bandwidth suggested by [2].
- trobust_ci([dimension, num_clusters, ...]): Returns confidence interval (CI) from the t-statistic based cluster-robust procedure due to [1].
- arand_ci(grid_start, grid_stop, dimension=None, grid_size=151, coverage=0.95, num_clusters=5, decimals=3, seed=None, verbose=True)
Returns confidence interval (CI) obtained by inverting an approximate randomization test [1]. If the result is a trivial interval, try increasing grid_size. The CI is narrower with more clusters. However, because the test computes estimates cluster by cluster, a larger number of clusters leaves fewer observations in each cluster, which can make the output less stable.
- Parameters
- grid_start : float
Leftmost point of the grid of values over which the test is inverted.
- grid_stop : float
Rightmost point of the grid.
- dimension : int
Dimension of the estimand for which you want the CI. To generate a table of CIs for all dimensions, set dimension=None. Ignored if estimand is scalar. Default value: None.
- grid_size : int
Number of points in the grid. Default value: 151.
- coverage : float
Desired coverage. Default value: 0.95.
- num_clusters : int
Ignored if get_clusters() was already run on this object. If it was not, this calls the get_clusters() method, asking for this many clusters. Default value: 5.
- decimals : int
Number of decimals to which to round the output table. Default value: 3.
- seed : int
Seed for drawing permutations, which is only relevant when there are more than 12 clusters. Set to None to not set a seed. Default value: None.
- verbose : boolean
If True, calling this method prints out the results. Default value: True.
References
- [1] Canay, I., J. Romano, and A. Shaikh, “Randomization Tests Under an Approximate Symmetry Assumption,” Econometrica, 2017, 85 (3), 1013-1030.
Examples
>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> Y, X, A = FakeData.ols(network='RGG')
>>> ols_model = ni.OLS(Y, X, A)
>>> ols_model.get_clusters(10)
>>> ols_model.arand_ci(-5, 5)
- Attributes
- arand_ci_result : list
Confidence interval.
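The idea of inverting a test over a grid can be sketched as follows. This is a minimal illustration, not the library's implementation: `invert_test` and `toy_pvalue` are hypothetical names, and reporting the smallest and largest non-rejected grid points is an assumption about how the interval is formed.

```python
def invert_test(pvalue_fn, grid_start, grid_stop, grid_size=151, coverage=0.95):
    """Return [lower, upper] spanning the grid points that are not rejected.

    `pvalue_fn(mu)` is a hypothetical callable giving the p-value for the
    null value mu. Returns None if every grid point is rejected.
    """
    step = (grid_stop - grid_start) / (grid_size - 1)
    grid = [grid_start + i * step for i in range(grid_size)]
    # Keep the null values that the test does not reject at level 1 - coverage.
    accepted = [mu for mu in grid if pvalue_fn(mu) > 1 - coverage]
    if not accepted:
        return None
    return [min(accepted), max(accepted)]


def toy_pvalue(mu):
    """Toy test (assumption): never rejects on [0.9, 1.65], always rejects elsewhere."""
    return 1.0 if 0.9 <= mu <= 1.65 else 0.0


fine_ci = invert_test(toy_pvalue, -5, 5, grid_size=151)
coarse_ci = invert_test(toy_pvalue, -5, 5, grid_size=11)
```

With the coarse grid only one grid point falls inside the acceptance region, so the interval collapses to a single point. This is the kind of trivial interval that a larger grid_size is meant to avoid.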
- arand_test(mu, dimension=0, num_clusters=5, seed=None, verbose=True)
Returns p-value of approximate randomization test [1]. The test is more powerful with more clusters. However, because the test computes estimates cluster by cluster, a larger number of clusters leaves fewer observations in each cluster, which can make the output less stable.
- Parameters
- mu : float
Null value of the estimand in the specified dimension.
- dimension : int
Dimension of the estimand being tested. Ignored if estimand is scalar. Default value: 0.
- num_clusters : int
Ignored if get_clusters() was already run on this object. If it was not, this calls the get_clusters() method, asking for this many clusters. Default value: 5.
- seed : int
Seed for drawing permutations, which is only relevant when there are more than 12 clusters. Set to None to not set a seed. Default value: None.
- verbose : boolean
If True, calling this method prints out the results. Default value: True.
References
- [1] Canay, I., J. Romano, and A. Shaikh, “Randomization Tests Under an Approximate Symmetry Assumption,” Econometrica, 2017, 85 (3), 1013-1030.
Examples
>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> Y, X, A = FakeData.ols(network='RGG')
>>> ols_model = ni.OLS(Y, X, A)
>>> ols_model.get_clusters(10)
>>> ols_model.arand_test(1, dimension=1)
- Attributes
- arand_test_result : float
P-value.
- arand_test_stat : float
Test statistic.
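To make the cluster-by-cluster logic concrete, here is a minimal sketch of a sign-change randomization test applied to per-cluster estimates. It is an illustration under assumptions rather than the library's code: the function names are hypothetical, the statistic is taken to be an absolute t-statistic, and all 2^q sign changes are enumerated, which is only practical for a small number of clusters q.

```python
import itertools
import math
import statistics


def abs_t_stat(values):
    """Absolute t-statistic: sqrt(q) * |mean| / sample standard deviation."""
    mean_abs = abs(statistics.fmean(values))
    sd = statistics.stdev(values)
    if sd == 0:
        # Degenerate sample: treat as infinitely large unless the mean is also zero.
        return math.inf if mean_abs > 0 else 0.0
    return math.sqrt(len(values)) * mean_abs / sd


def sign_change_pvalue(cluster_estimates, mu):
    """Share of sign-changed samples whose statistic is at least the observed one."""
    centered = [est - mu for est in cluster_estimates]
    observed = abs_t_stat(centered)
    count = 0
    total = 0
    for signs in itertools.product((1.0, -1.0), repeat=len(centered)):
        flipped = [s * c for s, c in zip(signs, centered)]
        # Small tolerance so that exact ties are counted despite rounding.
        count += abs_t_stat(flipped) >= observed - 1e-12
        total += 1
    return count / total


# Hypothetical per-cluster estimates for five clusters.
estimates = [1.0, 1.2, 0.9, 1.1, 1.3]
p_far = sign_change_pvalue(estimates, mu=0.0)   # null far from the estimates
p_near = sign_change_pvalue(estimates, mu=1.1)  # null at their average
```

In this sketch the smallest attainable p-value is 2 / 2^q, because flipping every sign leaves the statistic unchanged. With q = 5 that floor is 0.0625, which illustrates why more clusters give the test more power.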
- cluster_se(num_clusters=30, decimals=3, verbose=True)
Returns clustered standard errors.
- Parameters
- num_clusters : int
Ignored if get_clusters() was already run on this object. If it was not, this calls the get_clusters() method, asking for this many clusters. Default value: 30.
- decimals : int
Number of decimals to which to round the output table. Default value: 3.
- verbose : boolean
If True, calling this method prints out the results. Default value: True.
Examples
>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> Y, X, A = FakeData.ols(network='RGG')
>>> ols_model = ni.OLS(Y, X, A)
>>> ols_model.get_clusters(30)
>>> ols_model.cluster_se()
- Attributes
- cluster_se_vcov : float
Cluster-robust variance estimate.
- cluster_se_result : float
Clustered standard errors.
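For orientation, a textbook cluster-robust sandwich estimator can be sketched in NumPy as below. This is an assumption about the general form only: `cluster_robust_vcov` is a hypothetical helper, no finite-sample correction is applied, and the library's own estimator may differ in scaling or adjustments.

```python
import numpy as np


def cluster_robust_vcov(X, resid, clusters):
    """Sandwich estimator (X'X)^{-1} [sum_c s_c s_c'] (X'X)^{-1}.

    s_c is the sum of the scores x_i * e_i over observations in cluster c.
    X must already contain the intercept column.
    """
    invhessian = np.linalg.inv(X.T @ X)
    scores = X * resid[:, None]
    k = X.shape[1]
    meat = np.zeros((k, k))
    for c in np.unique(clusters):
        s_c = scores[clusters == c].sum(axis=0)
        meat += np.outer(s_c, s_c)
    return invhessian @ meat @ invhessian


# Simulated data with 30 clusters of 4 observations each (hypothetical).
rng = np.random.default_rng(0)
n = 120
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
beta = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta
clusters = np.repeat(np.arange(30), 4)

vcov = cluster_robust_vcov(X, resid, clusters)
se = np.sqrt(np.diag(vcov))
```

When every observation is its own cluster, this expression reduces to the usual heteroskedasticity-robust (HC0) estimator.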
- drobust_ci(grid_start, grid_stop, dimension=None, grid_size=151, coverage=0.95, beta=0.01, R=None, L=1000, seed=None, decimals=3, verbose=True)
Returns confidence interval (CI) derived from the dependence-robust test due to [1]. Note that the output of the test is random by nature. L is the number of simulation draws, and larger values reduce the random variation of the test. If the result is a trivial interval, try increasing grid_size.
The test is implemented using the U-type statistic and the randomized confidence function approach due to [2], as discussed in Remark 2 of [1].
- Parameters
- grid_start : float
Leftmost point of the grid of values to test for inclusion in the CI.
- grid_stop : float
Rightmost point of the grid.
- dimension : int
Dimension of the estimand for which you want the CI. Ignored if estimand is scalar. To generate a table of CIs for all dimensions, set dimension=None. Default value: None.
- grid_size : int
Number of points in the grid. Default value: 151.
- coverage : float
Desired coverage. Default value: 0.95.
- beta : float
beta in Remark 2 of Leung (2021). The closer this is to 1-coverage, the more conservative the CI. Default value: 0.01.
- R : int
Number of resampling draws for test statistic. Uses default if R=None. Default value: None.
- L : int
Number of resampling draws for randomized confidence function. The larger the value, the less random the output. Default value: 1000.
- seed : int
Seed for resampling draws. Set to None to not set a seed. Default value: None.
- decimals : int
Number of decimals to which to round the output table. Default value: 3.
- verbose : boolean
If True, calling this method prints out the results. Default value: True.
References
- [1] Leung, M. “Dependence-Robust Inference Using Resampled Statistics,” Journal of Applied Econometrics (forthcoming), 2021.
- [2] Song, K. “Ordering-Free Inference from Locally Dependent Data,” UBC working paper, 2016.
Examples
>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> Y, X, A = FakeData.ols()
>>> ols_model = ni.OLS(Y, X, A)
>>> ols_model.drobust_ci(-5, 5)
- Attributes
- drobust_ci_result : list
Confidence interval.
- drobust_test(mu, dimension=0, alpha=0.05, beta=0.01, R=None, L=1000, seed=None, verbose=True)
Returns conclusion of dependence-robust test due to [1]. Note that the output of the test is random by nature. L is the number of simulation draws, and larger values reduce the random variation of the test.
The test is implemented using the U-type statistic and the randomized confidence function approach due to [2], as discussed in Remark 2 of [1].
- Parameters
- mu : float
Null value of the estimand in the specified dimension.
- dimension : int
Dimension of the estimand being tested. Ignored if estimand is scalar. Default value: 0.
- alpha : float
Significance level. Default value: 0.05.
- beta : float
beta in Remark 2 of Leung (2021). The closer this is to alpha, the more conservative the test. Default value: 0.01.
- R : int
Number of resampling draws for test statistic. Uses default if R=None. Default value: None.
- L : int
Number of resampling draws for randomized confidence function. The larger the value, the less random the output. Default value: 1000.
- seed : int
Seed for resampling draws. Set to None to not set a seed. Default value: None.
- verbose : boolean
If True, calling this method prints out the results. Default value: True.
References
- [1] Leung, M. “Dependence-Robust Inference Using Resampled Statistics,” Journal of Applied Econometrics (forthcoming), 2021.
- [2] Song, K. “Ordering-Free Inference from Locally Dependent Data,” UBC working paper, 2016.
Examples
>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> Y, X, A = FakeData.ols()
>>> ols_model = ni.OLS(Y, X, A)
>>> ols_model.drobust_test(1, dimension=1)
- Attributes
- drobust_test_result : string
Reject or not reject.
- est_by_cluster(dimension)
Returns array of OLS estimates, one for each cluster. This is a helper method used by arand_test() and arand_ci().
- Parameters
- dimension : int
Dimension of the estimand being tested. Ignored if estimand is scalar.
- Returns
- thetahat : numpy array
L-dimensional array of OLS estimates, one for each of the L clusters.
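A per-cluster estimation step of this kind might look like the sketch below. The helper name `ols_by_cluster` is hypothetical, and the indexing convention (dimension 0 being the intercept because a column of ones is prepended) is an assumption based on the description of the data attribute.

```python
import numpy as np


def ols_by_cluster(Y, X, clusters, dimension):
    """Run OLS separately within each cluster and return one coefficient per cluster.

    X is an n x k array without an intercept; a column of ones is prepended,
    so dimension 0 is assumed to refer to the intercept.
    """
    thetahat = []
    for c in np.unique(clusters):
        mask = clusters == c
        X_c = np.column_stack([np.ones(mask.sum()), X[mask]])
        coef, *_ = np.linalg.lstsq(X_c, Y[mask], rcond=None)
        thetahat.append(coef[dimension])
    return np.array(thetahat)


# Noise-free simulated data (hypothetical), so every cluster recovers the same slope.
rng = np.random.default_rng(1)
n = 90
X = rng.normal(size=(n, 1))
Y = 1.0 + 2.0 * X[:, 0]
clusters = np.repeat(np.arange(3), 30)

thetahat = ols_by_cluster(Y, X, clusters, dimension=1)
```

With noisy data the per-cluster estimates vary, and that variation across clusters is what the cluster-based tests exploit.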
- get_clusters(num_clusters, clusters=None, seed=None, weight=None, verbose=True)
Returns network clusters obtained from the normalized spectral clustering algorithm due to [2] (also see [3]). Also returns the maximal conductance of the clusters, a measure of cluster quality taking values in [0,1] that should be at most 0.1 for cluster-robust methods to perform well (see [1]). All nodes not in the giant component are grouped into a single cluster.
- Parameters
- num_clusters : int
Number of desired clusters in the giant component.
- clusters : numpy array
Optional array of cluster memberships obtained from the output of this method or spectral_clustering() in the core class. The only purpose of this argument is to load clusters obtained elsewhere into the current object. Default value: None.
- seed : int
Seed for k-means clustering initialization. Set to None to not set a seed. Default value: None.
- weight : string
Specifies how edge weights are labeled in A, if A is a weighted graph. Default value: None.
- verbose : boolean
When set to True, the method prints the maximal conductance of the clusters. Default value: True.
References
- [1] Leung, M., “Network Cluster-Robust Inference,” arXiv preprint arXiv:2103.01470, 2021.
- [2] Ng, A., M. Jordan, Y. Weiss, “On Spectral Clustering: Analysis and an Algorithm.” Advances in Neural Information Processing Systems, 2002, 849-856.
- [3] von Luxburg, U., “A Tutorial on Spectral Clustering,” Statistics and Computing, 2007, 17 (4), 395-416.
Examples
>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> Y, X, A = FakeData.ols(network='RGG')
>>> ols_model = ni.OLS(Y, X, A)
>>> ols_model.get_clusters(10)
- Attributes
- clusters : numpy array
n-dimensional array of cluster labels from 0 to num_clusters-1, where n is the number of nodes.
- conductance : float
Maximal conductance of the clusters.
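The conductance measure can be illustrated with a small pure-Python sketch. It assumes the standard definition for an unweighted, undirected graph, namely the number of edges leaving a cluster divided by the smaller of the two volumes (sums of degrees); the function names are hypothetical, and the library's exact computation may differ.

```python
def conductance(edges, cluster):
    """Conductance of a node set S: cut(S, S^c) / min(vol(S), vol(S^c)).

    `edges` is a list of undirected (u, v) pairs; both sides of the
    partition are assumed to contain at least one edge endpoint.
    """
    cluster = set(cluster)
    cut = vol_in = vol_out = 0
    for u, v in edges:
        u_in, v_in = u in cluster, v in cluster
        if u_in != v_in:
            cut += 1                      # edge crosses the cluster boundary
        vol_in += u_in + v_in             # degree mass inside the cluster
        vol_out += (not u_in) + (not v_in)
    return cut / min(vol_in, vol_out)


def max_conductance(edges, labels):
    """Largest conductance across clusters, where labels[i] is node i's cluster."""
    members = {}
    for node, label in enumerate(labels):
        members.setdefault(label, set()).add(node)
    return max(conductance(edges, nodes) for nodes in members.values())


# Two triangles joined by a single edge, one triangle per cluster.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = [0, 0, 0, 1, 1, 1]
worst = max_conductance(edges, labels)
```

Here each cluster has volume 7 and one boundary edge, so the maximal conductance is 1/7, which is above the 0.1 guideline. On a graph this small even a clean split looks poorly separated by this measure.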
- network_se(b=None, decimals=3, verbose=True, PD_alert=False)
Returns standard errors derived from the network HAC variance estimator due to [1], using the bandwidth suggested by [2]. Setting b=0 outputs the conventional heteroskedasticity-robust variance estimator for i.i.d. data. The network is converted to an unweighted, undirected version by dropping edge weights and directionality of links.
The default output uses a uniform kernel. If the result is not positive definite, the output is instead an estimator that is guaranteed to be positive definite, taken from the first working paper version of [2].
- Parameters
- b : float
HAC bandwidth. Keeping b=None is recommended, which uses the bandwidth choice recommended by [2]. Default value: None.
- decimals : int
Number of decimals to which to round the output table. Default value: 3.
- verbose : boolean
If True, calling this method prints out the results. Default value: True.
- PD_alert : boolean
If True, the method prints an alert whenever the default estimator is not positive definite. Default value: False.
References
- [1] Kojevnikov, D., V. Marmer, and K. Song, “Limit Theorems for Network Dependent Random Variables,” Journal of Econometrics, 2021, 222 (2), 882-908.
- [2] Leung, M. “Causal Inference Under Approximate Neighborhood Interference,” Econometrica (forthcoming), 2021.
Examples
>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> Y, X, A = FakeData.ols()
>>> ols_model = ni.OLS(Y, X, A)
>>> ols_model.network_se()
- Attributes
- network_se_vcov : float
Estimate of variance-covariance matrix.
- network_se_result : float
Standard errors.
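The role of the bandwidth b and the uniform kernel can be sketched as follows. This is a simplified illustration under assumptions, not the library's estimator: the helper names are hypothetical, the kernel weight is taken to be 1 when the path distance between two nodes is at most b and 0 otherwise, and no positive-definiteness correction or bandwidth rule is included.

```python
from collections import deque

import numpy as np


def path_distances(adj):
    """All-pairs shortest path lengths by breadth-first search.

    `adj[i]` lists the neighbors of node i; unreachable pairs get inf.
    """
    n = len(adj)
    dist = np.full((n, n), np.inf)
    for source in range(n):
        dist[source, source] = 0
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[source, v] == np.inf:
                    dist[source, v] = dist[source, u] + 1
                    queue.append(v)
    return dist


def network_hac_vcov(X, resid, adj, b):
    """(X'X)^{-1} [sum_{i,j} 1{dist(i,j) <= b} s_i s_j'] (X'X)^{-1}."""
    invhessian = np.linalg.inv(X.T @ X)
    scores = X * resid[:, None]
    kernel = (path_distances(adj) <= b).astype(float)  # uniform kernel weights
    meat = scores.T @ kernel @ scores
    return invhessian @ meat @ invhessian


# Simulated data on a ring network of 60 nodes (hypothetical).
rng = np.random.default_rng(2)
n = 60
adj = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
X = np.column_stack([np.ones(n), rng.normal(size=n)])
Y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)
beta = np.linalg.solve(X.T @ X, X.T @ Y)
resid = Y - X @ beta

vcov_b0 = network_hac_vcov(X, resid, adj, b=0)
vcov_b2 = network_hac_vcov(X, resid, adj, b=2)
```

With b=0 only each node's own score enters, so the expression coincides with the heteroskedasticity-robust (HC0) estimator, consistent with the description of b=0 above. Larger bandwidths add cross-products between nodes that are close in the network.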
- trobust_ci(dimension=None, num_clusters=5, coverage=0.95, decimals=3, verbose=True)
Returns confidence interval (CI) from the t-statistic based cluster-robust procedure due to [1]. The more clusters, the more powerful the test. However, because the test computes estimates cluster by cluster, a larger number of clusters leaves fewer observations in each cluster, which can make the output less stable.
- Parameters
- dimension : int
Dimension of the estimand for which you want the CI. Ignored if estimand is scalar. To generate a table of CIs for all dimensions, set dimension=None. Default value: None.
- num_clusters : int
Ignored if get_clusters() was already run on this object. If it was not, this calls the get_clusters() method, asking for this many clusters. Default value: 5.
- coverage : float
Desired coverage. Default value: 0.95.
- decimals : int
Number of decimals to which to round the output table. Default value: 3.
- verbose : boolean
If True, calling this method prints out the results. Default value: True.
References
- [1] Ibragimov, R. and U. Mueller, “t-Statistic Based Correlation and Heterogeneity Robust Inference,” Journal of Business and Economic Statistics, 2010, 28 (4), 453-468.
Examples
>>> import networkinference as ni
>>> from networkinference.utils import FakeData
>>> Y, X, A = FakeData.ols(network='RGG')
>>> ols_model = ni.OLS(Y, X, A)
>>> ols_model.get_clusters(10)
>>> ols_model.trobust_ci()
- Attributes
- trobust_ci_result : list
Confidence interval.
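A t-statistic based interval built from per-cluster estimates can be sketched as below. This is a minimal illustration assuming the interval takes the familiar form of the mean of the cluster estimates plus or minus a Student-t critical value times their standard error; `t_stat_ci` is a hypothetical helper, and the critical value is passed in by the caller rather than computed.

```python
import math
import statistics


def t_stat_ci(cluster_estimates, t_crit):
    """Mean of the cluster estimates +/- t_crit * (sample std / sqrt(q))."""
    q = len(cluster_estimates)
    center = statistics.fmean(cluster_estimates)
    half_width = t_crit * statistics.stdev(cluster_estimates) / math.sqrt(q)
    return [center - half_width, center + half_width]


# Five hypothetical cluster estimates. For 95% coverage with q = 5, the critical
# value is the 0.975 quantile of Student's t with 4 degrees of freedom, about 2.776.
ci = t_stat_ci([1.0, 2.0, 3.0, 4.0, 5.0], t_crit=2.776)
```

Because the critical value comes from a t distribution with q - 1 degrees of freedom, it shrinks as the number of clusters grows, which is one way to see why more clusters tend to give tighter intervals.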