networkinference.utils.tools.FakeData
- class networkinference.utils.tools.FakeData[source]
Bases: object
Methods for simulating data.
Methods
erdos_renyi([n, avg_deg, seed]): Returns an Erdos-Renyi graph on n nodes with linking probability avg_deg / n.
ipw([n, network, avg_deg, p, seed]): Returns data (Y, ind1, ind2, pscores1, pscores2, A) to input into IPW class.
linear_in_means(A[, theta]): Returns a scalar outcome generated from a linear-in-means model with scalar covariate.
ols([n, network, avg_deg, seed]): Returns data (Y, X, A) to input into OLS class.
random_geometric([n, avg_deg, seed]): Returns a random geometric graph on n nodes.
tsls([n, network, avg_deg, seed]): Returns data (Y, X, W, A) to input into TSLS class.
- static erdos_renyi(n=500, avg_deg=5, seed=None)[source]
Returns an Erdos-Renyi graph on n nodes with linking probability avg_deg / n. This is a thin wrapper around a NetworkX function.
- Parameters
- n : int
Number of nodes. Default value: 500.
- avg_deg : int
Desired average degree of output graph. Default value: 5.
- seed : int
Seed for random links. Set to None to not set a seed. Default value: None.
- Returns
- NetworkX graph
Undirected and unweighted graph on n nodes.
Examples
>>> from networkinference.utils import FakeData
>>> A = FakeData.erdos_renyi()
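The following is a minimal pure-Python sketch of the sampling rule described above, in which each pair of nodes is linked independently with probability avg_deg / n. It does not call the library, which wraps a NetworkX function instead. The helper name `erdos_renyi_edges` and the edge-list return format are assumptions made for illustration.

```python
import random

def erdos_renyi_edges(n=500, avg_deg=5, seed=None):
    """Sample an undirected edge list with linking probability avg_deg / n.

    Hypothetical helper for illustration; not part of networkinference.
    """
    rng = random.Random(seed)
    p = avg_deg / n
    # Each unordered pair {i, j} is linked independently with probability p.
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

edges = erdos_renyi_edges(n=500, avg_deg=5, seed=0)
# Each edge adds one to the degree of both endpoints.
mean_degree = 2 * len(edges) / 500
print(round(mean_degree, 2))
```

The expected degree under this rule is (n - 1) * avg_deg / n, so the realized average degree should be close to avg_deg when n is large.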
- static ipw(n=500, network='RGG', avg_deg=5, p=0.15, seed=None)[source]
Returns data (Y, ind1, ind2, pscores1, pscores2, A) to input into IPW class. Outcome model is
\[Y_i = \left( \beta_i + \frac{\sum_{j=1}^n A_{ij} \beta_j}{\sum_{j=1}^n A_{ij}} \right) + \mathbf{1}\left\{\sum_{j=1}^n A_{ij} D_j > 0\right\} + \left( \varepsilon_i + \frac{\sum_{j=1}^n A_{ij} \varepsilon_j}{\sum_{j=1}^n A_{ij}} \right)\]
where \(A_{ij}\) is an indicator for whether nodes i and j are linked, \(\beta_i \stackrel{iid}\sim \mathcal{N}(1,1)\), \(\varepsilon_i\) is i.i.d. standard normal, and \(D_i\) is i.i.d. Bernoulli with success probability p.
- Parameters
- n : int
Number of observations. Default value: 500.
- network : string
Type of network to generate. Options: ‘ER’ (Erdos-Renyi) and ‘RGG’ (random geometric graph). Default value: ‘RGG’.
- avg_deg : int
Desired average degree of output graph. Default value: 5.
- p : float in [0,1]
Treatment probability. Default value: 0.15.
- seed : int
Seed for randomness. Set to None to not set a seed. Default value: None.
- Returns
- Y : numpy array
n-dimensional array of outcomes generated from a linear-in-means model.
- ind1 : numpy int array
n-dimensional array of indicators for having at least one treated friend.
- ind2 : numpy int array
n-dimensional array of indicators for having no treated friends.
- pscores1 : numpy float array
n-dimensional array of probabilities of having at least one treated friend.
- pscores2 : numpy float array
n-dimensional array of probabilities of having no treated friends.
- A : NetworkX graph
Undirected, unweighted graph on n nodes.
Examples
>>> from networkinference.utils import FakeData
>>> Y, ind1, ind2, pscores1, pscores2, A = FakeData.ipw()
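To make the exposure indicators and propensity scores concrete, here is a small pure-Python sketch. Because treatments are i.i.d. Bernoulli with success probability p, a node with d friends has no treated friends with probability (1 - p)^d and at least one treated friend with probability 1 - (1 - p)^d. Whether the library computes the scores exactly this way is an assumption; the helper `exposure_data` and the adjacency-dict input are hypothetical.

```python
import random

def exposure_data(neighbors, p=0.15, seed=None):
    """Return (ind1, ind2, pscores1, pscores2) for a graph given as {node: set of friends}.

    Hypothetical helper for illustration; not part of networkinference.
    """
    rng = random.Random(seed)
    nodes = sorted(neighbors)
    # i.i.d. Bernoulli(p) treatment assignments.
    D = {i: int(rng.random() < p) for i in nodes}
    ind1, ind2, pscores1, pscores2 = [], [], [], []
    for i in nodes:
        treated_friends = sum(D[j] for j in neighbors[i])
        ind1.append(int(treated_friends > 0))    # at least one treated friend
        ind2.append(int(treated_friends == 0))   # no treated friends
        none_treated = (1 - p) ** len(neighbors[i])
        pscores1.append(1 - none_treated)
        pscores2.append(none_treated)
    return ind1, ind2, pscores1, pscores2

# Hypothetical path graph 0 - 1 - 2 - 3.
neighbors = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
ind1, ind2, pscores1, pscores2 = exposure_data(neighbors, p=0.15, seed=1)
```

The two indicators are complementary, and so are the two propensity scores, so each pair sums to one for every node.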
- static linear_in_means(A, theta=array([1., 0.5, 3., 1.]))[source]
Returns a scalar outcome generated from a linear-in-means model with scalar covariate. Covariates and errors are i.i.d. standard normal. Outcomes are generated from the following linear-in-means model:
\[Y_i = \alpha + \beta \frac{\sum_{j=1}^n A_{ij} Y_j}{\sum_{j=1}^n A_{ij}} + \delta \frac{\sum_{j=1}^n A_{ij} X_j}{\sum_{j=1}^n A_{ij}} + \gamma X_i + \varepsilon_i,\]
where \(A_{ij}\) is an indicator for whether nodes i and j are linked, \(\alpha\) is the intercept, \(\beta\) the endogenous peer effect, \(\delta\) the exogenous peer effect, and \(\gamma\) the effect of the unit's own covariate. The default values of \(\alpha\), \(\beta\), \(\delta\), and \(\gamma\) are 1, 0.5, 3, and 1, respectively.
- Parameters
- A : NetworkX graph
Network on n units. Can be weighted or directed.
- theta : numpy array
4-dimensional array of model parameters. Default value: np.array([1,0.5,3,1]).
- Returns
- Y : numpy array
n-dimensional array of outcomes.
- X : numpy array
n-dimensional array of covariates.
Examples
>>> from networkinference.utils import FakeData
>>> A = FakeData.erdos_renyi()
>>> Y, X = FakeData.linear_in_means(A)
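Since Y appears on both sides of the linear-in-means model, outcomes have to be solved for rather than computed directly. The sketch below does this by fixed-point iteration, which converges when the endogenous peer effect is smaller than 1 in absolute value because neighbor averaging is a row-normalized operation. This is one possible approach and not necessarily the library's; the helper names, the parameter ordering (intercept, endogenous effect, exogenous effect, own-covariate effect), and the convention that isolated nodes have a neighbor average of 0 are all assumptions.

```python
import random

def neighbor_mean(neighbors, v):
    """Average of v over each node's neighbors; 0.0 for isolated nodes (assumption)."""
    return [sum(v[j] for j in nbrs) / len(nbrs) if nbrs else 0.0 for nbrs in neighbors]

def linear_in_means_sketch(neighbors, theta=(1.0, 0.5, 3.0, 1.0), seed=None, tol=1e-12):
    """Hypothetical helper: returns (Y, X, eps); eps is returned only to allow checking."""
    alpha, beta, delta, gamma = theta  # assumed ordering of theta
    rng = random.Random(seed)
    n = len(neighbors)
    X = [rng.gauss(0, 1) for _ in range(n)]
    eps = [rng.gauss(0, 1) for _ in range(n)]
    GX = neighbor_mean(neighbors, X)
    # Everything on the right-hand side that does not depend on Y.
    base = [alpha + delta * GX[i] + gamma * X[i] + eps[i] for i in range(n)]
    Y = base[:]
    for _ in range(10_000):
        GY = neighbor_mean(neighbors, Y)
        Y_new = [base[i] + beta * GY[i] for i in range(n)]
        if max(abs(a - b) for a, b in zip(Y, Y_new)) < tol:
            return Y_new, X, eps
        Y = Y_new
    raise RuntimeError("fixed-point iteration did not converge")

# Hypothetical graph: triangle 0-1-2 plus node 3 attached to node 2.
neighbors = [[1, 2], [0, 2], [0, 1, 3], [2]]
Y, X, eps = linear_in_means_sketch(neighbors, seed=0)
```

With larger networks, solving the equivalent linear system directly with a sparse solver would usually be preferable to iterating.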
- static ols(n=500, network='RGG', avg_deg=5, seed=None)[source]
Returns data (Y, X, A) to input into OLS class. Outcome model is
\[Y_i = 1 + \left( X_i + \frac{\sum_{j=1}^n A_{ij} X_j}{\sum_{j=1}^n A_{ij}} \right) + \left( \varepsilon_i + \frac{\sum_{j=1}^n A_{ij} \varepsilon_j}{\sum_{j=1}^n A_{ij}} \right)\]
where \(A_{ij}\) is an indicator for whether nodes i and j are linked, and \(X_i\) and \(\varepsilon_i\) are i.i.d. standard normal.
- Parameters
- n : int
Number of observations. Default value: 500.
- network : string
Type of network to generate. Options: ‘ER’ (Erdos-Renyi) and ‘RGG’ (random geometric graph). Default value: ‘RGG’.
- avg_deg : int
Desired average degree of output graph. Default value: 5.
- seed : int
Seed for randomness. Set to None to not set a seed. Default value: None.
- Returns
- Y : numpy array
n-dimensional array of outcomes.
- X : numpy array
n-dimensional array of covariates.
- A : NetworkX undirected graph
Undirected, unweighted network on n nodes.
Examples
>>> from networkinference.utils import FakeData
>>> Y, X, A = FakeData.ols()
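A pure-Python sketch of the outcome model above may help show where the network dependence comes from: both the regressor and the error are a unit's own draw plus the average draw of its neighbors, so linked units share terms. The helper `ols_outcomes`, the adjacency-list input, and the treatment of isolated nodes (neighbor average set to 0, since the formula is undefined there) are assumptions made for illustration.

```python
import random

def ols_outcomes(neighbors, seed=None):
    """Generate (Y, X) from the outcome model on a graph given as adjacency lists.

    Hypothetical helper for illustration; not part of networkinference.
    """
    rng = random.Random(seed)
    n = len(neighbors)
    X = [rng.gauss(0, 1) for _ in range(n)]
    eps = [rng.gauss(0, 1) for _ in range(n)]

    def nbr_mean(v, i):
        # Isolated nodes get a neighbor average of 0.0 (assumption).
        return sum(v[j] for j in neighbors[i]) / len(neighbors[i]) if neighbors[i] else 0.0

    # Y_i = 1 + (X_i + mean neighbor X) + (eps_i + mean neighbor eps)
    Y = [1 + (X[i] + nbr_mean(X, i)) + (eps[i] + nbr_mean(eps, i)) for i in range(n)]
    return Y, X

# Hypothetical graph: path 0 - 1 - 2 plus isolated node 3.
neighbors = [[1], [0, 2], [1], []]
Y, X = ols_outcomes(neighbors, seed=0)
```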
- static random_geometric(n=500, avg_deg=5, seed=None)[source]
Returns a random geometric graph on n nodes. Nodes are randomly positioned in \([0,1]^2\) and form links with all alters within a certain radius.
- Parameters
- n : int
Number of nodes. Default value: 500.
- avg_deg : int
Desired average degree of output graph. Default value: 5.
- seed : int
Seed for random positions. Set to None to not set a seed. Default value: None.
- Returns
- RGG : NetworkX graph
Unweighted and undirected graph on n nodes.
Examples
>>> from networkinference.utils import FakeData
>>> A = FakeData.random_geometric()
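The documentation does not state how the linking radius is chosen. As a sketch under a stated assumption, one natural choice is r = sqrt(avg_deg / (pi * n)): ignoring boundary effects, a node's expected degree is roughly n * pi * r^2, so this radius targets an average degree of avg_deg. The helper `random_geometric_edges` and this radius formula are illustrative and may differ from the library's implementation.

```python
import math
import random

def random_geometric_edges(n=500, avg_deg=5, seed=None):
    """Place n nodes uniformly in the unit square and link pairs within radius r.

    Hypothetical helper for illustration; not part of networkinference.
    """
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]
    # Assumed radius: chosen so that n * pi * r**2 equals avg_deg.
    r = math.sqrt(avg_deg / (math.pi * n))
    edges = [(i, j) for i in range(n) for j in range(i + 1, n)
             if math.dist(pos[i], pos[j]) <= r]
    return pos, edges

pos, edges = random_geometric_edges(n=500, avg_deg=5, seed=0)
mean_degree = 2 * len(edges) / 500
print(round(mean_degree, 2))
```

Nodes near the boundary of the square have part of their linking disk outside it, so the realized average degree tends to fall slightly below avg_deg under this choice of radius.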
- static tsls(n=500, network='RGG', avg_deg=5, seed=None)[source]
Returns data (Y, X, W, A) to input into TSLS class. Outcomes are generated using the linear_in_means() method of this class. X = (average outcomes of neighbors, average covariate of neighbors, own covariate). W = (average covariate of friends of friends, average covariate of neighbors, own covariate).
- Parameters
- n : int
Number of observations. Default value: 500.
- network : string
Type of network to generate. Options: ‘ER’ (Erdos-Renyi) and ‘RGG’ (random geometric graph). Default value: ‘RGG’.
- avg_deg : int
Desired average degree of output graph. Default value: 5.
- seed : int
Seed for randomness. Set to None to not set a seed. Default value: None.
- Returns
- Y : numpy array
n-dimensional array of outcomes.
- X : numpy array
n x 3 array of regressors. First column is average outcomes of network neighbors, second is average covariate of neighbors, third is own covariate, where covariates are binary.
- W : numpy array
n x 3 array of instruments. First column is average covariate of friends of friends, second is average covariate of neighbors, third is own covariate, where covariates are binary.
- A : NetworkX undirected graph
Undirected, unweighted network on n nodes.
Examples
>>> from networkinference.utils import FakeData
>>> Y, X, W, A = FakeData.tsls()
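The layout of X and W can be illustrated with a small pure-Python sketch. The second and third columns of the two arrays coincide, so only the first regressor (average outcomes of neighbors) is instrumented. Here "average covariate of friends of friends" is read as the neighbor average applied twice, which is an assumption about the library's definition; the helper names and the zero average for isolated nodes are likewise hypothetical.

```python
def neighbor_mean(neighbors, v):
    """Average of v over each node's neighbors; 0.0 for isolated nodes (assumption)."""
    return [sum(v[j] for j in nbrs) / len(nbrs) if nbrs else 0.0 for nbrs in neighbors]

def tsls_design(neighbors, Y, x):
    """Build regressor rows X and instrument rows W from outcomes Y and covariate x.

    Hypothetical helper for illustration; not part of networkinference.
    """
    Gx = neighbor_mean(neighbors, x)    # average covariate of neighbors
    GY = neighbor_mean(neighbors, Y)    # average outcome of neighbors
    GGx = neighbor_mean(neighbors, Gx)  # assumed meaning of "friends of friends"
    X = [[GY[i], Gx[i], x[i]] for i in range(len(x))]
    W = [[GGx[i], Gx[i], x[i]] for i in range(len(x))]
    return X, W

# Hypothetical path graph 0 - 1 - 2 with made-up outcomes and covariates.
neighbors = [[1], [0, 2], [1]]
x = [1.0, 0.0, 3.0]
Y = [1.0, 2.0, 4.0]
X, W = tsls_design(neighbors, Y, x)
```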