Statistics

netin.stats.distributions.fit_power_law(data: array | Set | List, discrete: bool = True, xmin: None | int | float = None, xmax: None | int | float = None, **kwargs) -> Fit

Fits a power law to the given data.

Parameters

data: Union[np.array, Set, List]

The data to fit.

discrete: bool

Whether the data is discrete or not.

xmin: Union[None, int, float]

The minimum value of the data.

xmax: Union[None, int, float]

The maximum value of the data.

kwargs: dict

Additional arguments to pass to the powerlaw.Fit constructor.

Returns

powerlaw.Fit

The fitted power-law.
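For intuition about what a power-law fit estimates, the sketch below computes the continuous maximum-likelihood exponent, alpha = 1 + n / sum(ln(x_i / xmin)), in plain Python. It is not the code of fit_power_law or of the powerlaw package, and the helper name mle_alpha is hypothetical.

```python
import math

def mle_alpha(data, xmin):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Hypothetical helper for illustration only; values below xmin are
    excluded from the fit.
    """
    tail = [x for x in data if x >= xmin]
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("need at least one value strictly above xmin")
    return 1.0 + len(tail) / log_sum

sample = [1.0, math.e, math.e ** 2, 0.5]  # 0.5 lies below xmin and is ignored
alpha = mle_alpha(sample, xmin=1.0)
```

The discrete case and the automatic choice of xmin are more involved, which is presumably why the function delegates to powerlaw.Fit.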

netin.stats.distributions.get_ccdf(df: DataFrame, x: str, total: float | None = None) -> Tuple[ndarray, ndarray]

Computes the complementary cumulative distribution (CCDF) of the input data.

Parameters

df: pd.DataFrame

DataFrame that contains the data.

x: str

The column name of the data.

total: float

The total amount by which to normalize the data.

Returns

Tuple[np.ndarray, np.ndarray]

Two arrays holding the x values and the y values (CCDF)
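A minimal sketch of an empirical CCDF, assuming the convention CCDF(x) = P(X > x) = 1 - CDF(x). The helper empirical_ccdf is hypothetical and works on a plain array (for a DataFrame, something like df[x].to_numpy() would supply it); get_ccdf itself may use a different convention or estimator.

```python
import numpy as np

def empirical_ccdf(values):
    """Return the sorted distinct values and P(X > x) for each of them."""
    values = np.sort(np.asarray(values, dtype=float))
    xs = np.unique(values)
    # Number of observations <= x, divided by the sample size, is the CDF.
    cdf = np.searchsorted(values, xs, side="right") / values.size
    return xs, 1.0 - cdf

xs, ys = empirical_ccdf([1, 1, 2, 3, 3, 3, 5])
```

Dropping the final subtraction yields the corresponding empirical CDF.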

netin.stats.distributions.get_cdf(df: DataFrame, x: str, total: float | None = None) -> Tuple[ndarray, ndarray]

Computes the cumulative distribution (CDF) of the input data.

Parameters

df: pd.DataFrame

DataFrame that contains the data.

x: str

The column name of the data.

total: float

The total amount by which to normalize the data. (not used here)

Returns

Tuple[np.ndarray, np.ndarray]

Two arrays holding the x values and the y values (CDF)

netin.stats.distributions.get_disparity(df: DataFrame, x: str, total: float | None = None) -> Tuple[ndarray, ndarray]

Computes the disparity of the input data given by the column x.

Parameters

df: pd.DataFrame

DataFrame that contains the data.

x: str

The column name of the data.

total: float

The total amount by which to normalize the data. (not used here)

Returns

Tuple[np.ndarray, np.ndarray]

Two arrays holding the x values (ranking) and the y values (disparity)

netin.stats.distributions.get_fraction_of_minority(df: DataFrame, x: str, total: float | None = None) -> Tuple[ndarray, ndarray]

Computes the fraction of minority in each top-k rank.

Parameters

df: pd.DataFrame

DataFrame that contains the data.

x: str

The column name of the data.

total: float

The total amount by which to normalize the data. (not used here)

Returns

Tuple[np.ndarray, np.ndarray]

Two arrays holding the x values (ranking) and the y values (fraction of minority)

netin.stats.distributions.get_gini_coefficient(df: DataFrame, x: str, total: float | None = None) -> Tuple[ndarray, ndarray]

Computes the Gini coefficient of the distribution in each top-k rank.

Parameters

df: pd.DataFrame

DataFrame that contains the data.

x: str

The column name of the data.

total: float

The total amount by which to normalize the data. (not used here)

Returns

Tuple[np.ndarray, np.ndarray]

Two arrays holding the x values (ranking) and the y values (Gini coefficient)

netin.stats.distributions.get_pdf(df: DataFrame, x: str, total: float) -> Tuple[ndarray, ndarray]

Computes the probability density of the input data.

Parameters

df: pd.DataFrame

DataFrame that contains the data.

x: str

The column name of the data.

total: float

The total amount by which to normalize the data.

Returns

Tuple[np.ndarray, np.ndarray]

Two arrays holding the x values and y values (their probability).
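As an illustration, the sketch below assumes the probability of each distinct value is its count divided by total. The helper empirical_pdf is hypothetical, and defaulting total to the sample size is an assumption of this sketch, not documented behavior of get_pdf.

```python
import numpy as np

def empirical_pdf(values, total=None):
    """Return distinct values and their counts normalized by `total`."""
    xs, counts = np.unique(np.asarray(values), return_counts=True)
    # Assumption: fall back to the number of observations when no total is given.
    total = counts.sum() if total is None else total
    return xs, counts / total

xs, ys = empirical_pdf([1, 1, 2, 3])
```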

netin.stats.networks.get_average_degree(g: Graph | DiGraph) -> float

Returns the average node degree of the graph.

Parameters

g: Union[nx.Graph, nx.DiGraph]

Graph to compute the average degree for

Returns

float

Average degree of the graph

netin.stats.networks.get_average_degrees(g: Graph | DiGraph, class_attribute: str | None = None) -> Tuple[float, float, float]

Computes and returns the average degree of the whole graph, of the majority class, and of the minority class.

Parameters

g: Union[nx.Graph, nx.DiGraph]

Graph to compute the average degree for

class_attribute: str

Name of the class attribute in the graph

Returns

Tuple[float, float, float]

Average degree of the whole graph, of the majority class, and of the minority class
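The three averages can be sketched without any graph library. Here a plain edge list and a dict of class labels stand in for the graph and its class attribute; the helper average_degrees is hypothetical, and it assumes an undirected graph with the minority labeled 1.

```python
def average_degrees(edges, labels, minority=1):
    """Return (overall, majority, minority) average degree.

    `edges` is a list of (u, v) pairs of an undirected graph and
    `labels` maps every node to its class label.
    """
    degree = {node: 0 for node in labels}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    def mean(nodes):
        nodes = list(nodes)
        return sum(degree[n] for n in nodes) / len(nodes)

    majority_nodes = [n for n, c in labels.items() if c != minority]
    minority_nodes = [n for n, c in labels.items() if c == minority]
    return mean(labels), mean(majority_nodes), mean(minority_nodes)

labels = {0: 0, 1: 0, 2: 0, 3: 1}
edges = [(0, 1), (0, 2), (0, 3)]
k_all, k_maj, k_min = average_degrees(edges, labels)
```

With networkx, the same inputs would typically come from g.edges() and nx.get_node_attributes(g, class_attribute).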

netin.stats.networks.get_edge_type_counts(g: Graph | DiGraph, fractions: bool = False, class_attribute: str | None = None) -> Counter

Computes the edge type counts of the graph using the class_attribute of each node.

Parameters

g: Union[nx.Graph, nx.DiGraph]

Graph to compute the edge type counts

fractions: bool

If True, the counts are returned as fractions of the total number of edges.

class_attribute: str

The name of the attribute that holds the class label of each node.

Returns

Counter

Counter holding the edge type counts

Notes

Class labels are assumed to be binary. The minority class is assumed to be labeled as 1.
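A minimal sketch of counting edge types with collections.Counter. The helper edge_type_counts below is hypothetical, takes an edge list and a label dict instead of a graph, and keys each edge by the pair (source label, target label); the key format used by get_edge_type_counts may differ.

```python
from collections import Counter

def edge_type_counts(edges, labels, fractions=False):
    """Count edges by the class labels of their two endpoints."""
    # For undirected data the key depends on how each edge is oriented.
    counts = Counter((labels[u], labels[v]) for u, v in edges)
    if fractions:
        total = sum(counts.values())
        return Counter({key: c / total for key, c in counts.items()})
    return counts

labels = {0: 0, 1: 0, 2: 1, 3: 1}  # 1 marks the minority class
edges = [(0, 1), (0, 2), (2, 3), (1, 3)]
counts = edge_type_counts(edges, labels)
shares = edge_type_counts(edges, labels, fractions=True)
```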

netin.stats.networks.get_min_degree(g: Graph | DiGraph) -> int

Returns the minimum degree of nodes in the graph.

Parameters

g: Union[nx.Graph, nx.DiGraph]

Graph to compute the minimum degree

Returns

int

Minimum degree of nodes in the graph

netin.stats.networks.get_minority_fraction(g: Graph | DiGraph, class_attribute: str | None = None) -> float

Computes the fraction of the minority class in the graph.

Parameters

g: Union[nx.Graph, nx.DiGraph]

Graph to compute the fraction of the minority class

class_attribute: str

Name of the class attribute in the graph

Returns

float

Fraction of the minority class in the graph

netin.stats.networks.get_node_attributes(g: Graph | DiGraph) -> list

Returns the values of the class attribute for all nodes in the graph.

Parameters

g: Union[nx.Graph, nx.DiGraph]

Graph to get the node attributes from

Returns

list

List of node attributes

netin.stats.networks.get_similitude(g: Graph | DiGraph, class_attribute: str | None = None) -> float

Computes and returns the fraction of same-class edges in the graph.

Parameters

g: Union[nx.Graph, nx.DiGraph]

Graph to compute the similitude for

class_attribute: str

Name of the class attribute in the graph

Returns

float

Fraction of same-class edges in the graph
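The fraction of same-class edges can be sketched as a single ratio. The helper similitude is hypothetical and uses an edge list plus a label dict in place of the graph and its class attribute.

```python
def similitude(edges, labels):
    """Fraction of edges whose two endpoints share a class label."""
    if not edges:
        raise ValueError("the edge list is empty")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

labels = {0: 0, 1: 0, 2: 0, 3: 1}
edges = [(0, 1), (0, 2), (0, 3)]
s = similitude(edges, labels)
```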

netin.stats.ranking.get_fraction_of_minority_in_ranking(df: DataFrame, x: str) -> Tuple[ndarray, ndarray] | Tuple[list, list]

Computes the fraction of minority in each top-k rank.

Parameters

df: pd.DataFrame

DataFrame that contains the data.

x: str

The column name of the data.

Returns

xs: np.ndarray

The x values (ranking).

ys: np.ndarray

The y values (fraction of minority).
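One way to picture the top-k computation is sketched below. It assumes that "top-k" means the k% highest-valued nodes, that ranks are evaluated in equal steps up to 100%, and that ties keep their input order; none of this is confirmed by the text above, and the helper name and its steps parameter are hypothetical.

```python
def fraction_of_minority_in_ranking(values, is_minority, steps=10):
    """Return the top-k cut-offs and the minority share within each cut."""
    n = len(values)
    # Indices sorted from highest to lowest value.
    order = sorted(range(n), key=lambda i: values[i], reverse=True)
    xs, ys = [], []
    for step in range(1, steps + 1):
        size = max(1, -(-step * n // steps))  # ceil(step * n / steps)
        top = order[:size]
        xs.append(step / steps)
        ys.append(sum(is_minority[i] for i in top) / size)
    return xs, ys

scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
minority = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
xs, ys = fraction_of_minority_in_ranking(scores, minority, steps=2)
```

In this toy ranking the minority holds the five lowest scores, so it is absent from the top 50% and reaches its overall share of 0.5 only at the top 100%.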

netin.stats.ranking.get_gini_in_ranking(df: DataFrame, x: str) -> Tuple[ndarray, ndarray] | Tuple[list, list]

Computes the Gini coefficient of a distribution df[x] in each top-k rank.

Parameters

df: pd.DataFrame

Dataframe that contains the data.

x: str

The column name of the data.

Returns

xs: np.ndarray

The x values (ranking).

ys: np.ndarray

The y values (Gini coefficients).

netin.stats.ranking.get_ranking_inequality(ys: array) -> float

Returns the Gini coefficient of the entire distribution (at top-100%).

Parameters

ys: np.array

The y values (Gini coefficients in each top-k rank).

Returns

float

The Gini coefficient of the entire distribution (at top-100%).

netin.stats.ranking.get_ranking_inequality_class(gini_global: float, cuts: Set[float] = [0.3, 0.6]) -> str

Infers the inequality class label given the Gini coefficient of the entire distribution.

Parameters

gini_global: float

The Gini coefficient of the entire distribution.

cuts: Set[float]

The cuts to determine the inequality class.

Returns

label: str

The inequality class label (one of: equality, moderate, skewed).

Notes

By default, cuts={0.3, 0.6}, see [Espin-Noboa2022].
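A minimal sketch of how two cuts could map a Gini coefficient to the three labels. Whether the boundaries are inclusive is an assumption here, and the helper inequality_class is hypothetical.

```python
def inequality_class(gini_global, cuts=(0.3, 0.6)):
    """Map a Gini coefficient to 'equality', 'moderate' or 'skewed'."""
    low, high = sorted(cuts)
    # Assumption: a value equal to a cut falls into the lower class.
    if gini_global <= low:
        return "equality"
    if gini_global <= high:
        return "moderate"
    return "skewed"

examples = [inequality_class(g) for g in (0.1, 0.45, 0.9)]
```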

netin.stats.ranking.get_ranking_inequity(f_m: float, ys: array) -> float

Computes the mean error (ME): the mean difference between the fraction of minority in each top-k rank, f_m^k, and the fraction of minority in the entire graph, f_m. ME measures the inequity of the ranking.

Parameters

f_m: float

The fraction of minority in the entire graph.

ys: np.array

The fraction of minority in each top-k rank.

Returns

me: float

The ranking inequity of the rank.
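The mean error can be sketched as below, assuming the differences are signed (f_m^k minus f_m), so that a positive ME indicates more minority nodes in the top ranks than in the graph overall. The helper name ranking_inequity is hypothetical.

```python
def ranking_inequity(f_m, ys):
    """Mean signed difference between per-rank minority shares and f_m."""
    if len(ys) == 0:
        raise ValueError("ys must contain at least one top-k value")
    return sum(y - f_m for y in ys) / len(ys)

# Minority share in the top 50% and top 100% of a toy ranking.
me = ranking_inequity(0.5, [0.0, 0.5])
```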

netin.stats.ranking.get_ranking_inequity_class(me: float, beta: float | None = None) -> str

Infers the inequity class (label) given the inequity measure (ME).

Parameters

me: float

The inequity measure (ME).

beta: float

The threshold to determine the inequity class.

Returns

label: str

The inequity class label (one of: fair, over-represented, under-represented).

Notes

See get_ranking_inequity() for more details on me.

By default, beta=0.05, see [Espin-Noboa2022].
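A sketch of the thresholding, assuming a ranking is fair when ME lies within [-beta, beta], over-represented above that band, and under-represented below it. The boundary handling and the helper name inequity_class are assumptions.

```python
def inequity_class(me, beta=0.05):
    """Map the mean error ME to an inequity label using threshold beta."""
    if me > beta:
        return "over-represented"
    if me < -beta:
        return "under-represented"
    return "fair"

examples = [inequity_class(m) for m in (0.2, -0.25, 0.01)]
```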

netin.stats.ranking.gini(data: array) -> float

Calculates the Gini coefficient of a distribution.

Parameters

data: np.array

The data.

Returns

float

The Gini coefficient of the distribution.
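For reference, one common closed form for the Gini coefficient of sorted, non-negative data is G = sum_i (2i - n - 1) x_i / (n sum_i x_i), with i running from 1 to n. The sketch below implements that formula; it is not necessarily the implementation used by gini, and the helper name gini_coefficient is hypothetical.

```python
import numpy as np

def gini_coefficient(data):
    """Gini coefficient of non-negative data via the sorted-rank formula."""
    x = np.sort(np.asarray(data, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0:
        raise ValueError("data must be non-empty with a positive sum")
    ranks = np.arange(1, n + 1)
    return float(np.sum((2 * ranks - n - 1) * x) / (n * x.sum()))

equal = gini_coefficient([1, 1, 1, 1])   # everyone holds the same amount
unequal = gini_coefficient([0, 0, 0, 1])  # one holder owns everything
```

For n values with a single non-zero entry this formula gives (n - 1) / n, which approaches 1 as n grows.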

References

Gini coefficient Implementation