networkinference.utils.tools.FakeData

class networkinference.utils.tools.FakeData

Bases: object

Methods for simulating data.

Methods

erdos_renyi([n, avg_deg, seed])

Returns an Erdos-Renyi graph on n nodes with linking probability avg_deg / n.

ipw([n, network, avg_deg, p, seed])

Returns data (Y, ind1, ind2, pscores1, pscores2, A) for input into the IPW class.

linear_in_means(A[, theta])

Returns outcomes and a scalar covariate (Y, X) generated from a linear-in-means model.

ols([n, network, avg_deg, seed])

Returns data (Y, X, A) for input into the OLS class.

random_geometric([n, avg_deg, seed])

Returns a random geometric graph on n nodes.

tsls([n, network, avg_deg, seed])

Returns data (Y, X, W, A) for input into the TSLS class.

static erdos_renyi(n=500, avg_deg=5, seed=None)

Returns an Erdos-Renyi graph on n nodes with linking probability avg_deg / n. This is a thin wrapper around a NetworkX graph generator.

Parameters
n : int

Number of nodes. Default value: 500.

avg_deg : int

Desired average degree of output graph. Default value: 5.

seed : int

Seed for random links. Set to None to not set a seed. Default value: None.

Returns
NetworkX graph

Undirected and unweighted graph on n nodes.

Examples

>>> from networkinference.utils import FakeData
>>> A = FakeData.erdos_renyi()
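The source does not say which NetworkX function the wrapper calls. As a minimal sketch, assuming a call to `nx.erdos_renyi_graph` with linking probability avg_deg / n, the hypothetical `erdos_renyi_sketch` below shows what such a wrapper could look like; it is not the package's code.

```python
import networkx as nx

def erdos_renyi_sketch(n=500, avg_deg=5, seed=None):
    """Hypothetical stand-in for FakeData.erdos_renyi (not the package's code).

    Each of the n*(n-1)/2 possible links forms independently with
    probability avg_deg / n, the linking probability stated above.
    """
    return nx.erdos_renyi_graph(n, avg_deg / n, seed=seed)

A = erdos_renyi_sketch(seed=0)
mean_degree = 2 * A.number_of_edges() / A.number_of_nodes()
print(A.number_of_nodes(), round(mean_degree, 2))
```

With linking probability avg_deg / n, the expected degree is (n - 1) avg_deg / n, slightly below avg_deg (4.99 for the defaults), so the realized average degree fluctuates around that value.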

static ipw(n=500, network='RGG', avg_deg=5, p=0.15, seed=None)

Returns data (Y, ind1, ind2, pscores1, pscores2, A) for input into the IPW class. The outcome model is

\[Y_i = \left( \beta_i + \frac{\sum_{j=1}^n A_{ij} \beta_j}{\sum_{j=1}^n A_{ij}} \right) + \mathbf{1}\left\{\sum_{j=1}^n A_{ij} D_j > 0\right\} + \left( \varepsilon_i + \frac{\sum_{j=1}^n A_{ij} \varepsilon_j}{\sum_{j=1}^n A_{ij}} \right)\]

where \(A_{ij}\) is an indicator for whether nodes i and j are linked, \(\beta_i \stackrel{iid}\sim \mathcal{N}(1,1)\), \(\varepsilon_i\) is i.i.d. standard normal, and \(D_i\) is i.i.d. Bernoulli with success probability p.

Parameters
n : int

Number of observations. Default value: 500.

network : string

Type of network to generate. Options: ‘ER’ (Erdos-Renyi) and ‘RGG’ (random geometric graph). Default value: ‘RGG’.

avg_deg : int

Desired average degree of output graph. Default value: 5.

p : float in [0, 1]

Treatment probability. Default value: 0.15.

seed : int

Seed for randomness. Set to None to not set a seed. Default value: None.

Returns
Y : numpy array

n-dimensional array of outcomes generated from a linear-in-means model.

ind1 : numpy int array

n-dimensional array of indicators for having at least one treated friend.

ind2 : numpy int array

n-dimensional array of indicators for having no treated friends.

pscores1 : numpy float array

n-dimensional array of probabilities of having at least one treated friend.

pscores2 : numpy float array

n-dimensional array of probabilities of having no treated friends.

A : NetworkX graph

Undirected, unweighted graph on n nodes.
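Because the treatments \(D_j\) are i.i.d. Bernoulli with success probability p and drawn independently of the network, both propensity scores depend on node i only through its degree \(d_i = \sum_{j=1}^n A_{ij}\) (assuming no self-links). The short derivation below is ours, not taken from the package source:

\[P\left(\sum_{j=1}^n A_{ij} D_j = 0\right) = \prod_{j:\,A_{ij}=1} P(D_j = 0) = (1-p)^{d_i}, \qquad P\left(\sum_{j=1}^n A_{ij} D_j > 0\right) = 1 - (1-p)^{d_i}.\]

The first expression corresponds to pscores2 and the second to pscores1. For example, with the default p = 0.15, a node with five friends has no treated friends with probability \(0.85^5 \approx 0.444\) and at least one with probability about 0.556.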

Examples

>>> from networkinference.utils import FakeData
>>> Y, ind1, ind2, pscores1, pscores2, A = FakeData.ipw()
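To make the outcome model and the returned arrays concrete, here is a minimal sketch that draws them for a given graph. `ipw_sketch` is a hypothetical helper, not the package's implementation; a NetworkX random geometric graph stands in for the package's network, and isolated nodes (where the neighbor averages are 0/0) are assigned neighbor averages of zero, which is an assumption.

```python
import numpy as np
import networkx as nx

def ipw_sketch(A, p=0.15, seed=None):
    """Hypothetical sketch of the IPW data-generating process (not the package's code)."""
    rng = np.random.default_rng(seed)
    adj = nx.to_numpy_array(A)                  # dense 0/1 adjacency in A.nodes() order
    deg = adj.sum(axis=1)
    safe_deg = np.where(deg > 0, deg, 1.0)      # assumption: isolates get zero neighbor means
    n = len(deg)

    D = rng.binomial(1, p, size=n)              # i.i.d. Bernoulli(p) treatments
    ind1 = (adj @ D > 0).astype(int)            # at least one treated friend
    ind2 = 1 - ind1                             # no treated friends

    pscores2 = (1 - p) ** deg                   # P(no treated friends) given degree
    pscores1 = 1 - pscores2                     # P(at least one treated friend)

    beta = rng.normal(1.0, 1.0, size=n)         # beta_i ~ N(1, 1)
    eps = rng.normal(size=n)                    # eps_i ~ N(0, 1)
    Y = (beta + adj @ beta / safe_deg) + ind1 + (eps + adj @ eps / safe_deg)
    return Y, ind1, ind2, pscores1, pscores2

A = nx.random_geometric_graph(500, 0.06, seed=1)
Y, ind1, ind2, pscores1, pscores2 = ipw_sketch(A, seed=1)
```

Note that the propensity scores are computed from degrees alone, while the indicators depend on the realized treatments; inverse probability weighting combines the two.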

static linear_in_means(A, theta=array([1., 0.5, 3., 1.]))

Returns outcomes and a scalar covariate generated from a linear-in-means model. Covariates and errors are i.i.d. standard normal. Outcomes are generated from the following linear-in-means model:

\[Y_i = \alpha + \beta \frac{\sum_{j=1}^n A_{ij} Y_j}{\sum_{j=1}^n A_{ij}} + \delta \frac{\sum_{j=1}^n A_{ij} X_j}{\sum_{j=1}^n A_{ij}} + \gamma X_i + \varepsilon_i,\]

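where \(A_{ij}\) is the link from node i to node j (an indicator for whether they are linked if the network is unweighted), \(\alpha\) is the intercept, \(\beta\) the endogenous peer effect, \(\delta\) the exogenous peer effect, and \(\gamma\) the effect of the unit's own covariate. The default values of \(\alpha\), \(\beta\), \(\delta\), and \(\gamma\) are 1, 0.5, 3, and 1, respectively.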
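Because \(Y\) appears on both sides of the model, the outcomes must be solved for. One standard way to do this (a sketch, not necessarily how linear_in_means is implemented) is to stack the n equations using the row-normalized matrix \(G_{ij} = A_{ij} / \sum_{k=1}^n A_{ik}\):

\[Y = \alpha \mathbf{1} + \beta G Y + \delta G X + \gamma X + \varepsilon \quad\Longrightarrow\quad Y = (I - \beta G)^{-1}\left(\alpha \mathbf{1} + \delta G X + \gamma X + \varepsilon\right).\]

For nonnegative link weights, each row of \(G\) sums to one (or is zero for a node without links), so \(I - \beta G\) is invertible whenever \(|\beta| < 1\), which covers the default \(\beta = 0.5\).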

Parameters
A : NetworkX graph

Network on n units. Can be weighted or directed.

theta : numpy array

4-dimensional array of model parameters \((\alpha, \beta, \delta, \gamma)\). Default value: np.array([1,0.5,3,1]).

Returns
Y : numpy array

n-dimensional array of outcomes.

X : numpy array

n-dimensional array of covariates.

Examples

>>> from networkinference.utils import FakeData
>>> A = FakeData.erdos_renyi()
>>> Y, X = FakeData.linear_in_means(A)
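As a minimal sketch of how such outcomes can be drawn, the hypothetical `linear_in_means_sketch` below solves the linear system \((I - \beta G) Y = \alpha + \delta G X + \gamma X + \varepsilon\) rather than iterating. It is not the package's code: the ordering theta = (alpha, beta, delta, gamma) is inferred from the description above, and rows of isolated nodes in G are left at zero, which is an assumption.

```python
import numpy as np
import networkx as nx

def row_normalize(A):
    """Row-normalized adjacency matrix G; rows of isolated nodes stay zero."""
    adj = nx.to_numpy_array(A)                  # uses the edge 'weight' attribute if present
    row_sums = adj.sum(axis=1, keepdims=True)
    return np.divide(adj, row_sums, out=np.zeros_like(adj), where=row_sums > 0)

def linear_in_means_sketch(A, theta=(1.0, 0.5, 3.0, 1.0), seed=None):
    """Hypothetical sketch drawing (Y, X) from the linear-in-means model (not the package's code).

    theta is read as (alpha, beta, delta, gamma); that ordering is an assumption.
    """
    alpha, beta, delta, gamma = theta
    rng = np.random.default_rng(seed)
    G = row_normalize(A)
    n = G.shape[0]
    X = rng.normal(size=n)                      # i.i.d. N(0, 1) covariate
    eps = rng.normal(size=n)                    # i.i.d. N(0, 1) errors
    rhs = alpha + delta * (G @ X) + gamma * X + eps
    # Y appears on both sides of the model, so solve (I - beta*G) Y = rhs directly.
    Y = np.linalg.solve(np.eye(n) - beta * G, rhs)
    return Y, X

A = nx.erdos_renyi_graph(500, 5 / 500, seed=2)
Y, X = linear_in_means_sketch(A, seed=2)
```

A dense solve is simple and exact for a few hundred nodes; for much larger networks a sparse solver would be the natural replacement.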

static ols(n=500, network='RGG', avg_deg=5, seed=None)

Returns data (Y, X, A) for input into the OLS class. The outcome model is

\[Y_i = 1 + \left( X_i + \frac{\sum_{j=1}^n A_{ij} X_j}{\sum_{j=1}^n A_{ij}} \right) + \left( \varepsilon_i + \frac{\sum_{j=1}^n A_{ij} \varepsilon_j}{\sum_{j=1}^n A_{ij}} \right)\]

where \(A_{ij}\) is an indicator for whether nodes i and j are linked, and \(X_i\) and \(\varepsilon_i\) are i.i.d. standard normal.

Parameters
n : int

Number of observations. Default value: 500.

network : string

Type of network to generate. Options: ‘ER’ (Erdos-Renyi) and ‘RGG’ (random geometric graph). Default value: ‘RGG’.

avg_deg : int

Desired average degree of output graph. Default value: 5.

seed : int

Seed for randomness. Set to None to not set a seed. Default value: None.

Returns
Y : numpy array

n-dimensional array of outcomes.

X : numpy array

n-dimensional array of covariates.

A : NetworkX undirected graph

Undirected, unweighted network on n nodes.

Examples

>>> from networkinference.utils import FakeData
>>> Y, X, A = FakeData.ols()
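The OLS outcome model can be sketched directly from its formula. The hypothetical `ols_sketch` below is not the package's code; it takes a NetworkX random geometric graph as input and assigns zero neighbor averages to isolated nodes, an assumption since the formula is 0/0 for them.

```python
import numpy as np
import networkx as nx

def ols_sketch(A, seed=None):
    """Hypothetical sketch of the OLS outcome model (not the package's code)."""
    rng = np.random.default_rng(seed)
    adj = nx.to_numpy_array(A)
    deg = adj.sum(axis=1)
    safe_deg = np.where(deg > 0, deg, 1.0)      # assumption: isolates get zero neighbor means
    n = len(deg)
    X = rng.normal(size=n)                      # i.i.d. N(0, 1) covariate
    eps = rng.normal(size=n)                    # i.i.d. N(0, 1) errors
    Y = 1 + (X + adj @ X / safe_deg) + (eps + adj @ eps / safe_deg)
    return Y, X

A = nx.random_geometric_graph(500, 0.06, seed=3)
Y, X = ols_sketch(A, seed=3)
slope = np.polyfit(X, Y, 1)[0]                  # OLS slope of Y on own covariate X
```

Because the neighbor averages of \(X\) are independent of \(X_i\), the slope of \(Y\) on \(X\) should be close to the model coefficient of 1. The composite error \(\varepsilon_i + \text{average of } \varepsilon_j\) is correlated across linked nodes, which is why network-robust standard errors are needed.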

static random_geometric(n=500, avg_deg=5, seed=None)

Returns a random geometric graph on n nodes. Nodes are randomly positioned in \([0,1]^2\), and each node links to all alters within a fixed radius, which is chosen so that the average degree is approximately avg_deg.

Parameters
n : int

Number of nodes. Default value: 500.

avg_deg : int

Desired average degree of output graph. Default value: 5.

seed : int

Seed for random positions. Set to None to not set a seed. Default value: None.

Returns
RGG : NetworkX graph

Unweighted and undirected graph on n nodes.

Examples

>>> from networkinference.utils import FakeData
>>> A = FakeData.random_geometric()
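The source does not state how the radius is derived from avg_deg. One natural rule, assumed here: ignoring boundary effects, a node has about \(n \pi r^2\) alters within distance r, so setting \(r = \sqrt{\text{avg\_deg} / (\pi n)}\) targets the desired average degree. The hypothetical `random_geometric_sketch` below applies that rule with `nx.random_geometric_graph`; it is not the package's code.

```python
import math
import networkx as nx

def random_geometric_sketch(n=500, avg_deg=5, seed=None):
    """Hypothetical stand-in for FakeData.random_geometric (not the package's code).

    Assumed radius rule: a disk of radius r covers area pi * r**2 of the unit
    square, so r = sqrt(avg_deg / (pi * n)) gives about avg_deg alters per node.
    """
    radius = math.sqrt(avg_deg / (math.pi * n))
    return nx.random_geometric_graph(n, radius, seed=seed)   # positions stored as node attribute 'pos'

A = random_geometric_sketch(seed=4)
mean_degree = 2 * A.number_of_edges() / A.number_of_nodes()
```

Nodes near the edges of the square have part of their disk cut off, so under this rule the realized average degree tends to fall a little below avg_deg.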

static tsls(n=500, network='RGG', avg_deg=5, seed=None)

Returns data (Y, X, W, A) for input into the TSLS class. Outcomes are generated using the linear_in_means() method of this class. X = (average outcome of neighbors, average covariate of neighbors, own covariate). W = (average covariate of friends of friends, average covariate of neighbors, own covariate).

Parameters
n : int

Number of observations. Default value: 500.

network : string

Type of network to generate. Options: ‘ER’ (Erdos-Renyi) and ‘RGG’ (random geometric graph). Default value: ‘RGG’.

avg_deg : int

Desired average degree of output graph. Default value: 5.

seed : int

Seed for randomness. Set to None to not set a seed. Default value: None.

Returns
Y : numpy array

n-dimensional array of outcomes.

X : numpy array

n x 3 array of regressors. First column is average outcomes of network neighbors, second is average covariate of neighbors, third is own covariate, where covariates are binary.

W : numpy array

n x 3 array of instruments. First column is average covariate of friends of friends, second is average covariate of neighbors, third is own covariate, where covariates are binary.

A : NetworkX undirected graph

Undirected, unweighted network on n nodes.

Examples

>>> from networkinference.utils import FakeData
>>> Y, X, W, A = FakeData.tsls()
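The regressor and instrument matrices can be assembled from a graph, outcomes, and a covariate. In the hypothetical `tsls_design_sketch` below (not the package's code), "average covariate of friends of friends" is read as \(G G x\), with \(G\) the row-normalized adjacency matrix; that interpretation is an assumption. The outcomes are placeholder noise used only to check shapes.

```python
import numpy as np
import networkx as nx

def tsls_design_sketch(A, Y, x):
    """Hypothetical sketch assembling X (regressors) and W (instruments).

    Assumes "average covariate of friends of friends" means G @ G @ x,
    where G is the row-normalized adjacency matrix (isolates' rows stay zero).
    """
    adj = nx.to_numpy_array(A)
    deg = adj.sum(axis=1, keepdims=True)
    G = np.divide(adj, deg, out=np.zeros_like(adj), where=deg > 0)
    Gx = G @ x                                  # average covariate of neighbors
    X = np.column_stack([G @ Y, Gx, x])         # regressors
    W = np.column_stack([G @ Gx, Gx, x])        # instruments
    return X, W

rng = np.random.default_rng(5)
A = nx.erdos_renyi_graph(200, 5 / 200, seed=5)
x = rng.binomial(1, 0.5, size=200).astype(float)   # binary covariate, as in the Returns text
Y = rng.normal(size=200)                            # placeholder outcomes for shape checks only
X, W = tsls_design_sketch(A, Y, x)
```

The last two columns of X and W coincide, so they act as their own instruments, while the first column of W serves as the excluded instrument for the endogenous average neighbor outcome in the first column of X.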