Statistics¶
- netin.stats.distributions.fit_power_law(data: array | Set | List, discrete: bool = True, xmin: None | int | float = None, xmax: None | int | float = None, **kwargs) Fit ¶
Fits a power law to the given data.
Parameters¶
- data: Union[np.array, Set, List]
The data to fit.
- discrete: bool
Whether the data is discrete or not.
- xmin: Union[None, int, float]
The minimum value of the data.
- xmax: Union[None, int, float]
The maximum value of the data.
- kwargs: dict
Additional arguments to pass to the powerlaw.Fit constructor.
Returns¶
- powerlaw.Fit
The fitted power-law.
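The powerlaw package estimates the exponent by maximum likelihood. As a rough illustration of what such a fit involves, the sketch below uses the closed-form estimator for continuous data with a known xmin. It does not show how fit_power_law or powerlaw.Fit is implemented, and the discrete case (the default here) uses a different estimator. The helper name estimate_alpha_continuous and the synthetic sample are assumptions made for this example.

```python
import math


def estimate_alpha_continuous(data, xmin):
    """Closed-form maximum-likelihood exponent for a continuous power law.

    Hypothetical helper for illustration; only values >= xmin are used.
    """
    tail = [x for x in data if x >= xmin]
    if not tail:
        raise ValueError("no observations at or above xmin")
    log_sum = sum(math.log(x / xmin) for x in tail)
    if log_sum == 0:
        raise ValueError("all observations equal xmin; exponent is undefined")
    return 1.0 + len(tail) / log_sum


# Deterministic synthetic sample: inverse-transform on an evenly spaced grid.
true_alpha, xmin, n = 2.5, 1.0, 1000
sample = [
    xmin * (1 - (i + 0.5) / n) ** (-1 / (true_alpha - 1))
    for i in range(n)
]
alpha_hat = estimate_alpha_continuous(sample, xmin)
```

With this sample, alpha_hat should land close to the exponent used to generate it.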
- netin.stats.distributions.get_ccdf(df: DataFrame, x: str, total: float | None = None) Tuple[ndarray, ndarray] ¶
Computes the complementary cumulative distribution function (CCDF) of the input data.
Parameters¶
- df: pd.DataFrame
DataFrame that contains the data.
- x: str
The column name of the data.
- total: float
The total amount by which to normalize the data.
Returns¶
- Tuple[np.ndarray, np.ndarray]
Two arrays holding the x values and the y values (CCDF)
- netin.stats.distributions.get_cdf(df: DataFrame, x: str, total: float | None = None) Tuple[ndarray, ndarray] ¶
Computes the cumulative distribution function (CDF) of the input data.
Parameters¶
- df: pd.DataFrame
DataFrame that contains the data.
- x: str
The column name of the data.
- total: float
The total amount by which to normalize the data. (not used here)
Returns¶
- Tuple[np.ndarray, np.ndarray]
Two arrays holding the x values and the y values (CDF)
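To make the relation between the CDF and the CCDF concrete, here is a minimal sketch in plain Python (a list stands in for the DataFrame column). It assumes the convention CCDF(x) = 1 - CDF(x) = P(X > x); the library may use a different convention, for example P(X >= x). The function names empirical_cdf and empirical_ccdf are hypothetical.

```python
from collections import Counter


def empirical_cdf(values):
    """Return sorted unique values and the fraction of data <= each value."""
    if not values:
        raise ValueError("values must not be empty")
    counts = Counter(values)
    n = len(values)
    xs = sorted(counts)
    ys = []
    running = 0
    for x in xs:
        running += counts[x]
        ys.append(running / n)
    return xs, ys


def empirical_ccdf(values):
    """Return sorted unique values and the fraction of data > each value."""
    xs, cdf = empirical_cdf(values)
    return xs, [1.0 - y for y in cdf]


degrees = [1, 1, 2, 3, 3, 3]
xs, cdf = empirical_cdf(degrees)
_, ccdf = empirical_ccdf(degrees)
```

Under this convention the CDF ends at 1 and the CCDF ends at 0 for the largest observed value.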
- netin.stats.distributions.get_disparity(df: DataFrame, x: str, total: float | None = None) Tuple[ndarray, ndarray] ¶
Computes the disparity of the input data given by the column x.
Parameters¶
- df: pd.DataFrame
DataFrame that contains the data.
- x: str
The column name of the data.
- total: float
The total amount by which to normalize the data. (not used here)
Returns¶
- Tuple[np.ndarray, np.ndarray]
Two arrays holding the x values (ranking) and the y values (disparity)
- netin.stats.distributions.get_fraction_of_minority(df: DataFrame, x: str, total: float | None = None) Tuple[ndarray, ndarray] ¶
Computes the fraction of minority in each top-k rank.
Parameters¶
- df: pd.DataFrame
DataFrame that contains the data.
- x: str
The column name of the data.
- total: float
The total amount by which to normalize the data. (not used here)
Returns¶
- Tuple[np.ndarray, np.ndarray]
Two arrays holding the x values (ranking) and the y values (fraction of minority)
- netin.stats.distributions.get_gini_coefficient(df: DataFrame, x: str, total: float | None = None) Tuple[ndarray, ndarray] ¶
Computes the Gini coefficient of the distribution in each top-k rank.
Parameters¶
- df: pd.DataFrame
DataFrame that contains the data.
- x: str
The column name of the data.
- total: float
The total amount by which to normalize the data. (not used here)
Returns¶
- Tuple[np.ndarray, np.ndarray]
Two arrays holding the x values (ranking) and the y values (Gini coefficient)
- netin.stats.distributions.get_pdf(df: DataFrame, x: str, total: float) Tuple[ndarray, ndarray] ¶
Computes the probability density of the input data.
Parameters¶
- df: pd.DataFrame
DataFrame that contains the data.
- x: str
The column name of the data.
- total: float
The total amount by which to normalize the data.
Returns¶
- Tuple[np.ndarray, np.ndarray]
Two arrays holding the x values and y values (their probability).
- netin.stats.networks.get_average_degree(g: Graph | DiGraph) float ¶
Returns the average node degree of the graph.
Parameters¶
- g: Union[nx.Graph, nx.DiGraph]
Graph to compute the average degree for
Returns¶
- float
Average degree of the graph
- netin.stats.networks.get_average_degrees(g: Graph | DiGraph, class_attribute: str | None = None) Tuple[float, float, float] ¶
Computes and returns the average degree of the graph, of the majority class, and of the minority class.
Parameters¶
- g: Union[nx.Graph, nx.DiGraph]
Graph to compute the average degree for
- class_attribute: str
Name of the class attribute in the graph
Returns¶
- Tuple[float, float, float]
Average degree of the graph, the average degree of the majority and the minority class
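The three averages can be sketched without networkx as follows. This is an illustration under stated assumptions, not the library's code: the graph is an undirected edge list, node classes live in a plain dict, and the minority class is labeled 1. The name average_degrees and its parameters are hypothetical.

```python
def average_degrees(edges, labels, minority_label=1):
    """Return (overall, majority, minority) average degree.

    edges: iterable of (u, v) pairs of an undirected graph.
    labels: dict mapping every node to its class label.
    """
    degree = {node: 0 for node in labels}
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1

    def mean_degree(nodes):
        nodes = list(nodes)
        if not nodes:
            return float("nan")  # class has no nodes
        return sum(degree[n] for n in nodes) / len(nodes)

    overall = mean_degree(labels)
    majority = mean_degree(n for n, c in labels.items() if c != minority_label)
    minority = mean_degree(n for n, c in labels.items() if c == minority_label)
    return overall, majority, minority


labels = {0: 0, 1: 0, 2: 0, 3: 1}
edges = [(0, 1), (0, 2), (0, 3)]
overall, majority, minority = average_degrees(edges, labels)
```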
- netin.stats.networks.get_edge_type_counts(g: Graph | DiGraph, fractions: bool = False, class_attribute: str | None = None) Counter ¶
Computes the edge type counts of the graph using the class_attribute of each node.
Parameters¶
- g: Union[nx.Graph, nx.DiGraph]
Graph to compute the edge type counts
- fractions: bool
If True, the counts are returned as fractions of the total number of edges.
- class_attribute: str
The name of the attribute that holds the class label of each node.
Returns¶
- Counter
Counter holding the edge type counts
Notes¶
Class labels are assumed to be binary. The minority class is assumed to be labeled as 1.
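As an illustration of the idea, the sketch below counts edge types from an edge list and a dict of binary class labels (1 for the minority, as in the note above). The key format used here, a (source label, target label) tuple, is an assumption; the keys of the Counter returned by the library may differ. The name edge_type_counts is hypothetical.

```python
from collections import Counter


def edge_type_counts(edges, labels, fractions=False):
    """Count edges by the class labels of their two endpoints."""
    counts = Counter((labels[u], labels[v]) for u, v in edges)
    if fractions:
        total = sum(counts.values())
        # Normalize each count by the total number of edges.
        return Counter({k: c / total for k, c in counts.items()})
    return counts


labels = {0: 0, 1: 0, 2: 1, 3: 1}
edges = [(0, 1), (0, 2), (2, 3), (1, 3)]
counts = edge_type_counts(edges, labels)
shares = edge_type_counts(edges, labels, fractions=True)
```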
- netin.stats.networks.get_min_degree(g: Graph | DiGraph) int ¶
Returns the minimum degree of nodes in the graph.
Parameters¶
- g: Union[nx.Graph, nx.DiGraph]
Graph to compute the minimum degree
Returns¶
- int
Minimum degree of nodes in the graph
- netin.stats.networks.get_minority_fraction(g: Graph | DiGraph, class_attribute: str | None = None) float ¶
Computes the fraction of the minority class in the graph.
Parameters¶
- g: Union[nx.Graph, nx.DiGraph]
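Graph to compute the fraction of the minority class
- class_attribute: str
Name of the class attribute in the graph
Returns¶
- float
Fraction of the minority class in the graph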
- netin.stats.networks.get_node_attributes(g: Graph | DiGraph) list ¶
Returns the values of the class attribute for all nodes in the graph.
Parameters¶
- g: Union[nx.Graph, nx.DiGraph]
Graph to get the node attributes from
Returns¶
- list
List of node attributes
- netin.stats.networks.get_similitude(g: Graph | DiGraph, class_attribute: str | None = None) float ¶
Computes and returns the fraction of same-class edges in the graph.
Parameters¶
- g: Union[nx.Graph, nx.DiGraph]
Graph to compute the similitude for
- class_attribute: str
Name of the class attribute in the graph
Returns¶
- float
Fraction of same-class edges in the graph
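A minimal sketch of this quantity, assuming an edge list and a dict of class labels in place of a networkx graph. The name same_class_edge_fraction is hypothetical, and returning NaN for an empty edge list is a choice made for this example.

```python
def same_class_edge_fraction(edges, labels):
    """Fraction of edges whose two endpoints share a class label."""
    edges = list(edges)
    if not edges:
        return float("nan")  # undefined for a graph without edges
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


labels = {0: 0, 1: 0, 2: 1, 3: 1}
edges = [(0, 1), (0, 2), (2, 3), (1, 3)]
similitude = same_class_edge_fraction(edges, labels)
```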
- netin.stats.ranking.get_fraction_of_minority_in_ranking(df: DataFrame, x: str) Tuple[ndarray, ndarray] | Tuple[list, list] ¶
Computes the fraction of minority in each top-k rank.
Parameters¶
- df: pd.DataFrame
DataFrame that contains the data.
- x: str
The column name of the data.
Returns¶
- xs: np.ndarray
The x values (ranking).
- ys: np.ndarray
The y values (fraction of minority).
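The computation can be sketched in plain Python as below. Several details are assumptions made for illustration: rows are (score, is_minority) pairs rather than a DataFrame, ranks are top-k% cutoffs from 10% to 100%, and ties are broken by input order. The library may rank and bin differently. The name fraction_of_minority_in_ranking is hypothetical.

```python
def fraction_of_minority_in_ranking(rows, percents=range(10, 101, 10)):
    """Fraction of minority nodes within each top-k% of the ranking.

    rows: list of (score, is_minority) pairs; higher score ranks first.
    """
    ranked = sorted(rows, key=lambda r: r[0], reverse=True)
    n = len(ranked)
    if n == 0:
        raise ValueError("rows must not be empty")
    xs, ys = [], []
    for p in percents:
        k = max(1, (n * p + 99) // 100)  # integer ceiling of n * p / 100
        top = ranked[:k]
        xs.append(p)
        ys.append(sum(1 for _, is_min in top if is_min) / k)
    return xs, ys


rows = [
    (10, False), (9, False), (8, True), (7, False), (6, False),
    (5, True), (4, False), (3, True), (2, True), (1, False),
]
xs, ys = fraction_of_minority_in_ranking(rows)
```

At the top-100% cutoff the value equals the overall minority fraction of the data.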
- netin.stats.ranking.get_gini_in_ranking(df: DataFrame, x: str) Tuple[ndarray, ndarray] | Tuple[list, list] ¶
Computes the Gini coefficient of a distribution df[x] in each top-k rank.
Parameters¶
- df: pd.DataFrame
DataFrame that contains the data.
- x: str
The column name of the data.
Returns¶
- xs: np.ndarray
The x values (ranking).
- ys: np.ndarray
The y values (Gini coefficients).
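For intuition, the sketch below computes a Gini coefficient for each top-k% slice of a list of scores. It is an illustration under assumptions, not the library's implementation: it uses the standard sorted-values formula for the Gini coefficient, top-k% cutoffs from 10% to 100%, and plain lists instead of a DataFrame. The names gini and gini_in_ranking are hypothetical.

```python
def gini(values):
    """Gini coefficient of non-negative values (0 means perfect equality)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def gini_in_ranking(scores, percents=range(10, 101, 10)):
    """Gini coefficient of the scores within each top-k% of the ranking."""
    ranked = sorted(scores, reverse=True)
    n = len(ranked)
    if n == 0:
        raise ValueError("scores must not be empty")
    xs, ys = [], []
    for p in percents:
        k = max(1, (n * p + 99) // 100)  # integer ceiling of n * p / 100
        xs.append(p)
        ys.append(gini(ranked[:k]))
    return xs, ys


scores = [50, 20, 10, 5, 5, 3, 3, 2, 1, 1]
xs, ys = gini_in_ranking(scores)
gini_global = ys[-1]  # value at top-100%, i.e. the whole distribution
```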
- netin.stats.ranking.get_ranking_inequality(ys: array) float ¶
Returns the Gini coefficient of the entire distribution (at top-100%).
Parameters¶
- ys: np.array
The y values (Gini coefficients in each top-k rank).
Returns¶
- float
The Gini coefficient of the entire distribution (at top-100%).
- netin.stats.ranking.get_ranking_inequality_class(gini_global: float, cuts: Set[float] = [0.3, 0.6]) str ¶
Infers the inequality class label given the Gini coefficient of the entire distribution.
Parameters¶
- gini_global: float
The Gini coefficient of the entire distribution.
- cuts: Set[float]
The cuts to determine the inequality class.
Returns¶
- label: str
The inequality class label (i.e., equality, moderate, skewed)
Notes¶
By default, cuts={0.3, 0.6}, see [Espin-Noboa2022].
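One way to read the two cuts is as thresholds that split the Gini range into three bands. The sketch below follows that reading; how the library treats values exactly equal to a cut is not stated here, so the inclusive comparisons are an assumption, as is the name inequality_class.

```python
def inequality_class(gini_global, cuts=(0.3, 0.6)):
    """Map a global Gini coefficient to an inequality label."""
    low, high = sorted(cuts)
    if gini_global <= low:  # boundary handling is an assumption
        return "equality"
    if gini_global <= high:
        return "moderate"
    return "skewed"


labels = [inequality_class(g) for g in (0.1, 0.45, 0.8)]
```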
- netin.stats.ranking.get_ranking_inequity(f_m: float, ys: array) float ¶
Computes ME, the mean error distance between the fraction of minority in each top-k rank (f_m^k) and the fraction of minority in the entire graph (f_m). ME measures the inequity of the ranking.
Parameters¶
- f_m: float
The fraction of minority in the entire graph.
- ys: np.array
The fraction of minority in each top-k rank.
Returns¶
- me: float
The ranking inequity of the rank.
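Read literally, ME averages the differences f_m^k - f_m over all top-k ranks. The sketch below assumes the difference is signed (so that positive values indicate over-representation of the minority and negative values under-representation) and that NaN entries are skipped; both points are assumptions, and the name ranking_inequity is hypothetical.

```python
import math


def ranking_inequity(f_m, ys):
    """Mean signed difference between each top-k fraction and f_m."""
    valid = [y for y in ys if not math.isnan(y)]
    if not valid:
        return float("nan")
    return sum(y - f_m for y in valid) / len(valid)


# Minority is 40% of the graph but less present in the top ranks.
me = ranking_inequity(0.4, [0.2, 0.3, 0.4])
```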
- netin.stats.ranking.get_ranking_inequity_class(me: float, beta: float | None = None) str ¶
Infers the inequity class (label) given the inequity measure (ME).
Parameters¶
- me: float
The inequity measure (ME).
- beta: float
The threshold to determine the inequity class.
Returns¶
- label: str
The inequity class label (i.e., fair, over-represented, under-represented).
Notes¶
See get_ranking_inequity() for more details on me.
By default, beta=0.05, see [Espin-Noboa2022].
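A plausible reading of the threshold is that ME values within beta of zero count as fair, while larger positive or negative values indicate over- or under-representation of the minority. The sketch below follows that reading; the sign convention and the inclusive boundary are assumptions, and the name inequity_class is hypothetical.

```python
def inequity_class(me, beta=0.05):
    """Map an inequity measure (ME) to a label using threshold beta."""
    if abs(me) <= beta:  # boundary handling is an assumption
        return "fair"
    return "over-represented" if me > 0 else "under-represented"


labels = [inequity_class(v) for v in (0.0, 0.2, -0.1)]
```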