Statistics¶

netin.stats.distributions.fit_power_law(data: array | Set | List, discrete: bool = True, xmin: None | int | float = None, xmax: None | int | float = None, **kwargs) → Fit¶

Fits a power law to the given data.

Parameters¶

data: Union[np.array, Set, List]: The data to fit.
discrete: bool: Whether the data is discrete or not.
xmin: Union[None, int, float]: The minimum value of the data.
xmax: Union[None, int, float]: The maximum value of the data.
kwargs: dict: Additional arguments to pass to the powerlaw.Fit constructor.

Returns¶

powerlaw.Fit: The fitted power-law.
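To illustrate what fitting a power law involves, the following minimal sketch estimates the exponent with the standard continuous maximum-likelihood formula, alpha = 1 + n / sum(ln(x_i / xmin)). It is not the code of netin or of the powerlaw package, which do considerably more (for example, handling discrete data and the xmax bound). The function name `estimate_alpha` is hypothetical, and xmin is assumed to be known in advance.

```python
import math
from typing import Iterable


def estimate_alpha(data: Iterable[float], xmin: float) -> float:
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Hypothetical helper for illustration; only values >= xmin are used.
    """
    tail = [x for x in data if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("all observations equal xmin; exponent is undefined")
    return 1.0 + len(tail) / log_sum


# The value 0.5 lies below xmin and is ignored by the estimate.
sample = [1.0, 2.0, 4.0, 8.0, 0.5]
alpha = estimate_alpha(sample, xmin=1.0)
```

In this sketch a larger xmin discards more of the data, so the estimate depends on where the tail is assumed to start.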

netin.stats.distributions.get_ccdf(df: DataFrame, x: str, total: float | None = None) → Tuple[ndarray, ndarray]¶

Computes the complementary cumulative distribution function (CCDF) of the input data.

Parameters¶

df: pd.DataFrame: DataFrame that contains the data.
x: str: The column name of the data.
total: float: The total amount by which to normalize the data.

Returns¶

Tuple[np.ndarray, np.ndarray]: Two arrays holding the x values and the y values (CCDF)
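As a rough illustration of the quantity being returned, the sketch below computes an empirical CCDF from a plain list, using the definition P(X >= x). This is an assumption: netin may use P(X > x) instead, and how it applies `total` is not shown here. The name `empirical_ccdf` is hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def empirical_ccdf(values: Sequence[float]) -> Tuple[List[float], List[float]]:
    """Return the sorted unique values and P(X >= x) for each of them."""
    counts = Counter(values)
    xs = sorted(counts)
    n = len(values)
    ys = []
    remaining = n  # number of observations that are >= the current x
    for x in xs:
        ys.append(remaining / n)
        remaining -= counts[x]
    return xs, ys


xs, ys = empirical_ccdf([1, 1, 2, 3, 3, 3])
```

Under this definition the first y value is always 1, because every observation is at least as large as the minimum.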

netin.stats.distributions.get_cdf(df: DataFrame, x: str, total: float | None = None) → Tuple[ndarray, ndarray]¶

Computes the cumulative distribution function (CDF) of the input data.

Parameters¶

df: pd.DataFrame: DataFrame that contains the data.
x: str: The column name of the data.
total: float: The total amount by which to normalize the data. (not used here)

Returns¶

Tuple[np.ndarray, np.ndarray]: Two arrays holding the x values and the y values (CDF)

netin.stats.distributions.get_disparity(df: DataFrame, x: str, total: float | None = None) → Tuple[ndarray, ndarray]¶

Computes the disparity of the input data given by the column x.

Parameters¶

df: pd.DataFrame: DataFrame that contains the data.
x: str: The column name of the data.
total: float: The total amount by which to normalize the data. (not used here)

Returns¶

Tuple[np.ndarray, np.ndarray]: Two arrays holding the x values (ranking) and the y values (disparity)

netin.stats.distributions.get_fraction_of_minority(df: DataFrame, x: str, total: float | None = None) → Tuple[ndarray, ndarray]¶

Computes the fraction of minority in each top-k rank.

Parameters¶

df: pd.DataFrame: DataFrame that contains the data.
x: str: The column name of the data.
total: float: The total amount by which to normalize the data. (not used here)

Returns¶

Tuple[np.ndarray, np.ndarray]: Two arrays holding the x values (ranking) and the y values (fraction of minority)

netin.stats.distributions.get_gini_coefficient(df: DataFrame, x: str, total: float | None = None) → Tuple[ndarray, ndarray]¶

Computes the Gini coefficient of the distribution in each top-k rank.

Parameters¶

df: pd.DataFrame: DataFrame that contains the data.
x: str: The column name of the data.
total: float: The total amount by which to normalize the data. (not used here)

Returns¶

Tuple[np.ndarray, np.ndarray]: Two arrays holding the x values (ranking) and the y values (Gini coefficient)

netin.stats.distributions.get_pdf(df: DataFrame, x: str, total: float) → Tuple[ndarray, ndarray]¶

Computes the probability density of the input data.

Parameters¶

df: pd.DataFrame: DataFrame that contains the data.
x: str: The column name of the data.
total: float: The total amount by which to normalize the data.

Returns¶

Tuple[np.ndarray, np.ndarray]: Two arrays holding the x values and y values (their probability).

netin.stats.networks.get_average_degree(g: Graph | DiGraph) → float¶

Returns the average node degree of the graph.

Parameters¶

g: Union[nx.Graph, nx.DiGraph]: Graph to compute the average degree for

Returns¶

float: Average degree of the graph

netin.stats.networks.get_average_degrees(g: Graph | DiGraph, class_attribute: str | None = None) → Tuple[float, float, float]¶

Computes the average degree of the whole graph, of the majority class, and of the minority class.

Parameters¶

g: Union[nx.Graph, nx.DiGraph]: Graph to compute the average degree for
class_attribute: str: Name of the class attribute in the graph

Returns¶

Tuple[float, float, float]: Average degree of the whole graph, of the majority class, and of the minority class
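The three averages can be sketched without a graph library, assuming an undirected edge list and a node-to-label mapping in which the minority class is labeled 1. The function name `average_degrees` and its `minority_label` parameter are hypothetical; how netin treats directed graphs is not covered here.

```python
from typing import Dict, Hashable, Iterable, Tuple


def average_degrees(
    edges: Iterable[Tuple[Hashable, Hashable]],
    labels: Dict[Hashable, int],
    minority_label: int = 1,  # assumption: the minority class is labeled 1
) -> Tuple[float, float, float]:
    """Return (overall, majority, minority) average degree, undirected."""
    degree = {node: 0 for node in labels}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    def mean_degree(nodes) -> float:
        nodes = list(nodes)
        if not nodes:
            return float("nan")  # an empty group has no average
        return sum(degree[n] for n in nodes) / len(nodes)

    minority = [n for n, lab in labels.items() if lab == minority_label]
    majority = [n for n, lab in labels.items() if lab != minority_label]
    return mean_degree(labels), mean_degree(majority), mean_degree(minority)


labels = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1}
edges = [(0, 1), (0, 2), (0, 3), (3, 4)]
k_all, k_majority, k_minority = average_degrees(edges, labels)
```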

netin.stats.networks.get_edge_type_counts(g: Graph | DiGraph, fractions: bool = False, class_attribute: str | None = None) → Counter¶

Computes the edge type counts of the graph using the class_attribute of each node.

Parameters¶

g: Union[nx.Graph, nx.DiGraph]: Graph to compute the edge type counts
fractions: bool: If True, the counts are returned as fractions of the total number of edges.
class_attribute: str: The name of the attribute that holds the class label of each node.

Returns¶

Counter: Counter holding the edge type counts

Notes¶

Class labels are assumed to be binary. The minority class is assumed to be labeled as 1.
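A minimal sketch of counting edge types, assuming an edge list and a node-to-label mapping with binary labels. The keys used here, (source label, target label) tuples, are an illustrative choice and may differ from the keys netin stores in its Counter; the name `edge_type_counts` is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple


def edge_type_counts(
    edges: Iterable[Tuple[Hashable, Hashable]],
    labels: Dict[Hashable, int],
    fractions: bool = False,
) -> Counter:
    """Count edges by the class labels of their two endpoints."""
    counts = Counter((labels[u], labels[v]) for u, v in edges)
    if fractions:
        total = sum(counts.values())
        # Divide each count by the total number of edges.
        return Counter({kind: c / total for kind, c in counts.items()})
    return counts


labels = {0: 0, 1: 0, 2: 1, 3: 1}
edges = [(0, 1), (0, 2), (2, 3), (1, 3)]
counts = edge_type_counts(edges, labels)
shares = edge_type_counts(edges, labels, fractions=True)
```

In this example two of the four edges connect a majority node (0) to a minority node (1), so the mixed type accounts for half of all edges.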

netin.stats.networks.get_min_degree(g: Graph | DiGraph) → int¶

Returns the minimum degree of nodes in the graph.

Parameters¶

g: Union[nx.Graph, nx.DiGraph]: Graph to compute the minimum degree

Returns¶

int: Minimum degree of nodes in the graph

netin.stats.networks.get_minority_fraction(g: Graph | DiGraph, class_attribute: str | None = None) → float¶

Computes the fraction of the minority class in the graph.

Parameters¶

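g: Union[nx.Graph, nx.DiGraph]: Graph to compute the fraction of the minority class
class_attribute: str: Name of the class attribute in the graph

Returns¶

float: Fraction of the minority class in the graph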

netin.stats.networks.get_node_attributes(g: Graph | DiGraph) → list¶

Returns the values of the class attribute for all nodes in the graph.

Parameters¶

g: Union[nx.Graph, nx.DiGraph]: Graph to get the node attributes from

Returns¶

list: List of node attributes

netin.stats.networks.get_similitude(g: Graph | DiGraph, class_attribute: str | None = None) → float¶

Computes and returns the fraction of same-class edges in the graph.

Parameters¶

g: Union[nx.Graph, nx.DiGraph]: Graph to compute the similitude for
class_attribute: str: Name of the class attribute in the graph

Returns¶

float: Fraction of same-class edges in the graph
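The fraction of same-class edges can be sketched as follows, assuming an edge list and a node-to-label mapping. The name `same_class_fraction` is hypothetical, and returning NaN for a graph without edges is a choice made for this sketch only.

```python
from typing import Dict, Hashable, Sequence, Tuple


def same_class_fraction(
    edges: Sequence[Tuple[Hashable, Hashable]],
    labels: Dict[Hashable, int],
) -> float:
    """Share of edges whose two endpoints carry the same class label."""
    if not edges:
        return float("nan")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


labels = {0: 0, 1: 0, 2: 1, 3: 1}
edges = [(0, 1), (0, 2), (2, 3), (1, 3)]
similitude = same_class_fraction(edges, labels)
```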

netin.stats.ranking.get_fraction_of_minority_in_ranking(df: DataFrame, x: str) → Tuple[ndarray, ndarray] | Tuple[list, list]¶

Computes the fraction of minority in each top-k rank.

Parameters¶

df: pd.DataFrame: DataFrame that contains the data.
x: str: The column name of the data.

Returns¶

xs: np.ndarray: The x values (ranking).
ys: np.ndarray: The y values (fraction of minority).
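The idea of a top-k curve can be sketched in plain Python. This sketch assumes that k runs over the percentages 10, 20, ..., 100, that items are ranked from highest to lowest score, that ties keep their input order, and that the minority class is labeled 1. None of these details is confirmed by the documentation above, and the function name is hypothetical.

```python
from typing import List, Sequence, Tuple


def minority_fraction_per_top_k(
    scores: Sequence[float],
    labels: Sequence[int],
    percents: Sequence[int] = tuple(range(10, 101, 10)),
) -> Tuple[List[int], List[float]]:
    """Fraction of minority items (label 1) among the top-k% by score."""
    if not scores:
        raise ValueError("scores must not be empty")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n = len(ranked)
    xs, ys = [], []
    for k in percents:
        size = max(1, (n * k + 99) // 100)  # integer ceiling of n * k / 100
        top = ranked[:size]
        xs.append(k)
        ys.append(sum(1 for _, label in top if label == 1) / len(top))
    return xs, ys


scores = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
labels = [0, 0, 1, 0, 0, 1, 0, 0, 1, 1]
xs, ys = minority_fraction_per_top_k(scores, labels)
```

The last y value equals the overall minority fraction, since the top-100% contains every item.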

netin.stats.ranking.get_gini_in_ranking(df: DataFrame, x: str) → Tuple[ndarray, ndarray] | Tuple[list, list]¶

Computes the Gini coefficient of a distribution df[x] in each top-k rank.

Parameters¶

df: pd.DataFrame: Dataframe that contains the data.
x: str: The column name of the data.

Returns¶

xs: np.ndarray: The x values (ranking).
ys: np.ndarray: The y values (Gini coefficients).

netin.stats.ranking.get_ranking_inequality(ys: array) → float¶

Returns the Gini coefficient of the entire distribution (at top-100%).

Parameters¶

ys: np.array: The y values (Gini coefficients in each top-k rank).

Returns¶

float: The Gini coefficient of the entire distribution (at top-100%).

netin.stats.ranking.get_ranking_inequality_class(gini_global: float, cuts: Set[float] = [0.3, 0.6]) → str¶

Infers the inequality class label given the Gini coefficient of the entire distribution.

Parameters¶

gini_global: float: The Gini coefficient of the entire distribution.
cuts: Set[float]: The cuts to determine the inequality class.

Returns¶

label: str: The inequality class label (i.e., equality, moderate, skewed)

Notes¶

By default, cuts={0.3, 0.6}, see [Espin-Noboa2022].
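The mapping from a Gini coefficient to a label can be sketched as a two-threshold rule. Whether a value exactly on a cut belongs to the lower or the upper class is an assumption here (the lower class is used), and the function name `inequality_class` is hypothetical.

```python
from typing import Iterable


def inequality_class(gini_global: float, cuts: Iterable[float] = (0.3, 0.6)) -> str:
    """Label a Gini coefficient using two cut points (boundaries assumed inclusive)."""
    low, high = sorted(cuts)
    if gini_global <= low:
        return "equality"
    if gini_global <= high:
        return "moderate"
    return "skewed"


label_low = inequality_class(0.1)    # "equality"
label_mid = inequality_class(0.45)   # "moderate"
label_high = inequality_class(0.8)   # "skewed"
```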

netin.stats.ranking.get_ranking_inequity(f_m: float, ys: array) → float¶

Computes the mean error (ME): the mean difference between the fraction of minority in each top-k rank, f_m^k, and the fraction of minority in the entire graph, f_m. ME measures the inequity of the ranking.

Parameters¶

f_m: float: The fraction of minority in the entire graph.
ys: np.array: The fraction of minority in each top-k rank.

Returns¶

me: float: The ranking inequity of the rank.
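A minimal sketch of the mean error described above, assuming the signed form, the average of (f_m^k - f_m) over all top-k ranks, so that a positive value means the minority is over-represented in the ranking. The sign convention and the name `mean_error` are assumptions made for illustration.

```python
from typing import Sequence


def mean_error(f_m: float, ys: Sequence[float]) -> float:
    """Average signed gap between each top-k minority fraction and f_m."""
    if not ys:
        raise ValueError("ys must not be empty")
    return sum(y - f_m for y in ys) / len(ys)


# The minority is below its overall share of 0.4 in two of the three ranks.
me = mean_error(0.4, [0.2, 0.3, 0.4])
```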

netin.stats.ranking.get_ranking_inequity_class(me: float, beta: float | None = None) → str¶

Infers the inequity class (label) given the inequity measure (ME).

Parameters¶

me: float: The inequity measure (ME).
beta: float: The threshold to determine the inequity class.

Returns¶

label: str: The inequity class label (i.e., fair, over-represented, under-represented).

Notes¶

See get_ranking_inequity() for more details on me.

By default, beta=0.05, see [Espin-Noboa2022].
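The classification can be sketched as a symmetric band around zero. This sketch assumes that "fair" means |ME| <= beta, that a value above beta means over-represented, and that a value below -beta means under-represented. The exact boundary handling in netin is not documented above, and the function name is hypothetical.

```python
def inequity_class(me: float, beta: float = 0.05) -> str:
    """Label a mean error as fair, over-represented, or under-represented."""
    if me > beta:
        return "over-represented"
    if me < -beta:
        return "under-represented"
    return "fair"


label_fair = inequity_class(0.01)    # "fair"
label_over = inequity_class(0.2)     # "over-represented"
label_under = inequity_class(-0.1)   # "under-represented"
```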

netin.stats.ranking.gini(data: array) → float¶

Calculates the Gini coefficient of a distribution.

Parameters¶

data: np.array: The data.

Returns¶

float: The Gini coefficient of the distribution.

References¶

Gini coefficient Implementation
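For orientation, one common closed form of the Gini coefficient for sorted, non-negative values x_1 <= ... <= x_n is G = sum_i (2i - n - 1) x_i / (n * sum_i x_i). The sketch below implements that formula in plain Python. It is not necessarily the referenced implementation, which may preprocess the data differently, and the name `gini_sketch` is hypothetical.

```python
from typing import Sequence


def gini_sketch(data: Sequence[float]) -> float:
    """Gini coefficient of non-negative values via the sorted-rank formula."""
    values = sorted(data)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        raise ValueError("data must be non-empty with a positive sum")
    # Ranks i run from 1 to n over the ascending values.
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(values, start=1))
    return weighted / (n * total)


g_equal = gini_sketch([1, 1, 1, 1])    # everyone holds the same amount
g_unequal = gini_sketch([0, 0, 0, 1])  # one item holds everything
```

With n items the formula reaches at most (n - 1) / n, so the fully concentrated example with four items yields 0.75 rather than 1.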