Add application file

- README.md +121 -18
- app.py +68 -3
- ham.py +170 -0
- isco_hierachical_accuracy_v2.py +91 -61
- tests.py +128 -11
README.md
CHANGED
---
title: ISCO-08 Hierarchical Accuracy Measure v2
datasets:
- ICILS/multilingual_parental_occupations
tags:
- evaluate
- metric
description: "The ISCO-08 Hierarchical Accuracy Measure is an implementation of the measure described in [Functional Annotation of Genes Using Hierarchical Text Categorization](https://www.researchgate.net/publication/44046343_Functional_Annotation_of_Genes_Using_Hierarchical_Text_Categorization) (Kiritchenko, Svetlana and Famili, Fazel, 2005) applied to the ISCO-08 classification scheme by the International Labour Organization."
sdk: gradio
sdk_version: 3.19.1
app_file: app.py
pinned: false
---

# Metric Card for ISCO-08 Hierarchical Accuracy Measure

## Metric Description

ISCO-08 is a four-level taxonomy (Table 1). Correctly locating an occupation at the appropriate level (Major, Sub-major, Minor, or Unit) is essential, yet many practical systems make small but still useful errors (e.g. confusing two Unit groups within the same Minor group). The present metric extends classical precision, recall, and *F*-measure to this hierarchical setting, providing a more nuanced assessment than flat accuracy.

| Digits | Group level |
|--|--|
| 1 digit | Major groups |
| 2 digits | Sub-major groups |
| 3 digits | Minor groups |
| 4 digits | Unit groups |


In this context, the hierarchical accuracy measure is specifically designed to evaluate classifications within this structured framework. It emphasizes the importance of precision in classifying occupations at the correct level of specificity:

1. **Major groups (1-digit codes):** The broadest category, grouping occupations by their fundamental characteristics and role in the job market. Misclassification across Major groups implies a fundamental misunderstanding of the occupational role.

2. **Sub-major groups (2-digit codes):** A more detailed classification within each Major group, delineating categories that share specific occupational characteristics. Errors within a Major group but across Sub-major groups are less severe than those across Major groups, but are still significant.

3. **Minor groups (3-digit codes):** Further granularity within Sub-major groups, classifying occupations that require similar skill sets and qualifications. Misclassifications within a Sub-major group but across Minor groups are penalized, though less than errors across Sub-major or Major groups, since they still fall within a broader correct category.

4. **Unit groups (4-digit codes):** The most specific level, identifying precise occupations. Misclassification between Unit groups within the same Minor group is the least severe error, as it occurs among closely related occupations.

The hierarchical accuracy measure rewards classifications that correctly locate an occupation down to the specific Unit group level and penalizes misclassifications according to the hierarchical distance between the correct and assigned categories. This allows a refined evaluation that credits partially correct classifications while penalizing inaccuracies more severely the further they deviate from the correct branch of the hierarchy.

For example, misclassifying into Unit group 2211 (Generalist Medical Practitioners) when the correct category is Unit group 2212 (Specialist Medical Practitioners) should incur a lesser penalty than misclassifying into Unit group 2352 (Special Needs Teachers): Unit groups 2211 and 2212 share Minor group 221 (Medical Doctors), whereas Minor group 235 (Other Teaching Professionals) lies in a different Sub-major group (23, Teaching Professionals).

The measure applies a higher penalty for errors that occur between more distant categories within the hierarchy:

1. Correct classifications at a more specific level (e.g., Minor group 221) are evaluated more favorably than classifications at a more general level within the same branch (e.g., Sub-major group 22), since the former indicates a closer match to the correct category.
2. Conversely, incorrect classifications at a more specific level are penalized more heavily than those at a more general level. For instance, if the correct classification is Minor group 221, classifying into Minor group 222 (Nursing and Midwifery Professionals) is worse than classifying into its parent category, Sub-major group 22, because the incorrect specific classification deviates further from the correct category.

Misclassification among sibling categories (e.g., between Minor groups within the same Sub-major group) is less severe than misclassification at a higher hierarchical level (e.g., between Major groups).

The measure extends the concepts of precision and recall into a hierarchical context, introducing hierarchical precision ($hP$) and hierarchical recall ($hR$). In this framework, each sample belongs not only to its designated class but also to all ancestor categories in the hierarchy, excluding the root (all samples belong to the root by default). This adjustment allows the measure to account for the hierarchical structure of the classification scheme, rewarding more accurate placement of a sample within the hierarchy and penalizing errors according to their hierarchical significance.

Consider the example hierarchy from Kiritchenko and Famili (2005), where the true class is $G$ and the predicted class is $F$. To calculate the hierarchical measure, extend the set of real classes

$$C_i = \{G\}$$

with all ancestors of $G$:

$$\vec{C}_i = \{B, C, E, G\}$$

Similarly, extend the set of predicted classes

$$C'_i = \{F\}$$

with all ancestors of $F$:

$$\vec{C}'_i = \{C, F\}$$

Class $C$ is the only correctly assigned label in the extended sets:

$$|\vec{C}_i \cap \vec{C}'_i| = 1$$

There are $|\vec{C}'_i| = 2$ predicted labels and $|\vec{C}_i| = 4$ real classes. Therefore,

$$hP = \frac{|\vec{C}_i \cap \vec{C}'_i|}{|\vec{C}'_i|} = \frac{1}{2}$$

$$hR = \frac{|\vec{C}_i \cap \vec{C}'_i|}{|\vec{C}_i|} = \frac{1}{4}$$

Finally, combine $hP$ and $hR$ into the hierarchical $F$-measure:

$$hF_\beta = \frac{(\beta^2 + 1) \cdot hP \cdot hR}{\beta^2 \cdot hP + hR}, \quad \beta \in [0, +\infty)$$

The metric **rewards depth**: predicting 221 instead of 22 yields a higher $hF$ because more ancestors overlap. It **penalises distance**: predicting a Unit group in a different Sub-major group incurs a larger loss than confusing two Unit groups under the same Minor group.

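The computation above can be sketched in a few lines for ISCO-08 codes, where the ancestor closure is simply the set of digit prefixes. This is a minimal standalone illustration, not the module's own implementation (which lives in `ham.py` and also validates and normalizes inputs):

```python
def ancestors(code):
    # Ancestor closure of an ISCO-08 code: every non-empty digit prefix.
    return {code[:k] for k in range(1, len(code) + 1)}


def h_scores(true_code, pred_code, beta=1.0):
    # Hierarchical precision/recall/F_beta from the overlap of ancestor closures.
    C, Cp = ancestors(true_code), ancestors(pred_code)
    overlap = len(C & Cp)
    hp = overlap / len(Cp)
    hr = overlap / len(C)
    if hp + hr == 0:
        return hp, hr, 0.0
    b2 = beta ** 2
    return hp, hr, (1 + b2) * hp * hr / (b2 * hp + hr)


# Sibling Unit groups share the 3-digit prefix: mild penalty.
print(h_scores("2211", "2212"))  # (0.75, 0.75, 0.75)
# A Unit group in a different Sub-major group: only the Major digit matches.
print(h_scores("2211", "2352"))  # (0.25, 0.25, 0.25)
```

Note how the scores reproduce the severity ordering described above: the sibling confusion scores 0.75, while the cross-Sub-major confusion drops to 0.25.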

## How to Use

- **Model evaluation**: assess neural or rule-based classifiers that map free-text occupation descriptions to ISCO-08 codes.
- **Inter-rater agreement**: quantify consistency between human coders when full agreement at the Unit level is not always expected.

### Inputs

- **references** *(List[str])*: reference ISCO-08 codes (true labels, or coder 1 codes).
- **predictions** *(List[str])*: predicted ISCO-08 codes (model predictions, or coder 2 codes) to compare against the references.

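String inputs matter because integer codes silently drop leading zeros, and ISCO-08 Major group 0 (Armed Forces occupations) contains codes such as 0110. A condensed sketch of the kind of coercion the `_normalize` helper in `ham.py` performs (illustrative, not the helper itself):

```python
from typing import Optional


def normalize_code(code) -> Optional[str]:
    """Coerce an ISCO-08 code to a 1-4 digit string, or None if invalid."""
    if code is None:
        return None
    digits = "".join(ch for ch in str(code).strip() if ch.isdigit())
    return digits if 1 <= len(digits) <= 4 else None


print(normalize_code(" 2211 "))  # returns "2211"
print(normalize_code(110))       # returns "110": an Armed Forces code meant as "0110" is corrupted
print(normalize_code(""))        # returns None
```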
### Output Values

Values are decimal numbers between 0 and 1. Higher scores are better.

**Example output**:

```python
{
    "accuracy": 0.25,
    "hierarchical_precision": 1.0,
    "hierarchical_recall": 0.4,
    "hierarchical_fmeasure": 0.5714285714285715,
}
```

#### Values from Popular Papers

The following table is from the paper [Improving Parental Occupation Coding Procedures with AI](https://www.iea.nl/sites/default/files/2024-09/Improving-Parental-Occupation-Coding-Procedures-AI.pdf).

**Summary of multilingual model overall accuracies and hF-measure scores**

| Model name | Training dataset (training & validation splits) | Evaluation dataset (test split) | Accuracy | hFβ |
|------------|--------------------------------------------------|---------------------------------|----------|-----|
| Model 1 | ICILS | ICILS | 63% | 0.89 |
| Model 2 | ILO | ILO | 92% | 0.99 |
| Model 2 | ILO | ICILS | 36% | 0.94 |
| Model 3 | ICILS+ILO | ICILS+ILO | 80% | 0.93 |
| Model 3 | ICILS+ILO | ICILS | 62% | 0.95 |

### Examples

Used as the `compute_metrics` callback of a Hugging Face `Trainer` (assumes `model` and the loaded `metric` are in scope):

```python
import numpy as np
from transformers import EvalPrediction


def compute_metrics(p: EvalPrediction):
    preds = p.predictions[0] if isinstance(p.predictions, tuple) else p.predictions
    preds = np.argmax(preds, axis=1)
    # Use string labels instead of ids for hierarchical ISCO-08 classification
    preds = [model.config.id2label[p] for p in preds]
    refs = [model.config.id2label[r] for r in p.label_ids]
    result = metric.compute(predictions=preds, references=refs)
    return result
```

## Limitations and Bias

No known limitations or bias.

## Citation

This metric was developed as part of an [IEA R&D project](https://www.iea.nl/publications/other/rd-outcomes), [Improving Parental Occupation Coding Procedures with AI](https://www.iea.nl/sites/default/files/2024-09/Improving-Parental-Occupation-Coding-Procedures-AI.pdf) (Duckworth et al., 2024). The metric was used to evaluate the multilingual [ICILS XLM-R ISCO](https://huggingface.co/ICILS/xlm-r-icils-ilo) classification model.

Note: This is version 2 of the ISCO Hierarchical Accuracy Measure.

## Further References

- [The International Standard Classification of Occupations (ISCO-08)](https://isco.ilo.org/en/isco-08/)
- [Improving Parental Occupation Coding Procedures with AI](https://www.iea.nl/sites/default/files/2024-09/Improving-Parental-Occupation-Coding-Procedures-AI.pdf) (Duckworth et al., 2024)
- [ICILS XLM-R ISCO model](https://huggingface.co/ICILS/xlm-r-icils-ilo)

app.py
CHANGED

```python
from tests import test_cases
from evaluate.utils.logging import get_logger
from evaluate.utils import (
    infer_gradio_input_types,
    parse_gradio_data,
    json_to_string_type,
    parse_readme,
    parse_test_cases,
)
from pathlib import Path

import evaluate
import sys

logger = get_logger(__name__)


def launch_gradio_widget(metric, test_cases):
    """Build the Gradio widget for `metric`, seeded with `test_cases`."""
    try:
        import gradio as gr
    except ImportError as error:
        logger.error(
            "To create a metric widget with Gradio make sure gradio is installed."
        )
        raise error

    local_path = Path(sys.path[0])
    # If there are several input types, use the first as default.
    if isinstance(metric.features, list):
        (feature_names, feature_types) = zip(*metric.features[0].items())
    else:
        (feature_names, feature_types) = zip(*metric.features.items())
    gradio_input_types = infer_gradio_input_types(feature_types)

    parsed_test_cases = parse_test_cases(test_cases, feature_names, gradio_input_types)

    def compute(data):
        return metric.compute(**parse_gradio_data(data, gradio_input_types))

    return gr.Interface(
        fn=compute,
        inputs=gr.Dataframe(
            value=parsed_test_cases[0],
            headers=feature_names,
            col_count=len(feature_names),
            datatype=json_to_string_type(gradio_input_types),
        ),
        outputs=gr.Textbox(label=metric.name),
        description=(
            metric.info.description
            + "\nISCO codes must be wrapped in double quotes."
        ),
        title=f"Metric: {metric.name}",
        article=parse_readme(local_path / "README.md"),
        examples=[parsed_test_cases],
    )


module = evaluate.load("danieldux/isco_hierarchical_accuracy_v2")

if __name__ == "__main__":
    launch_gradio_widget(module, test_cases).launch()
```
ham.py
ADDED

```python
"""
Hierarchical accuracy measure for ISCO-08 codes: per-instance hierarchical
precision, recall, and F_beta, plus micro/macro aggregation over a corpus.
"""

# Copyright 2025 Daniel Duckworth
# Licensed under the Apache License, Version 2.0

from typing import Iterable, List, Tuple, Optional, Dict, Any


def _normalize(code: Optional[str]) -> Optional[str]:
    """
    Normalize an ISCO-08 code to a digit string of length 1..4.
    Returns None if the input is empty/invalid.
    Preserves leading zeros if they were present in the original string.
    """
    if code is None:
        return None
    s = str(code).strip()

    # If it's purely digits already, keep as-is to preserve leading zeros
    if s.isdigit():
        if 1 <= len(s) <= 4:
            return s
        return None

    # Otherwise strip non-digits while preserving any leading 0s present
    digits = "".join(ch for ch in s if ch.isdigit())
    if 1 <= len(digits) <= 4:
        return digits
    return None


def ancestors(code: Optional[str]) -> List[str]:
    """
    Ancestor closure (excluding the artificial root): all non-empty prefixes.
    For '2211' -> ['2', '22', '221', '2211'].
    """
    norm = _normalize(code)
    if norm is None:
        return []
    return [norm[:k] for k in range(1, len(norm) + 1)]


def hp_hr_hfbeta(
    true_code: Optional[str], pred_code: Optional[str], beta: float = 1.0
) -> Tuple[float, float, float]:
    """
    Per-instance hierarchical precision, recall, and F_beta.

    Returns:
        Tuple[float, float, float]: (hP, hR, hF_beta)
    """
    C = set(ancestors(true_code))
    Cp = set(ancestors(pred_code))

    if not C or not Cp:
        return 0.0, 0.0, 0.0

    m = len(C & Cp)
    hp = m / len(Cp)
    hr = m / len(C)

    if hp == 0.0 and hr == 0.0:
        return 0.0, 0.0, 0.0

    b2 = beta * beta
    hf = (1.0 + b2) * hp * hr / (b2 * hp + hr)
    return hp, hr, hf


def hierarchical_scores(
    y_true: Iterable[Optional[str]],
    y_pred: Iterable[Optional[str]],
    beta: float = 1.0,
    average: str = "both",  # "micro", "macro", or "both"
    return_per_instance: bool = False,
) -> Dict[str, Any]:
    """
    Compute micro/macro aggregated hierarchical P/R/F_beta.
    """
    y_true = list(y_true)
    y_pred = list(y_pred)
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    inst_hp, inst_hr, inst_hf = [], [], []
    per_instance = []

    M = 0  # total intersection
    P = 0  # total predicted path length
    T = 0  # total true path length

    for g, p in zip(y_true, y_pred):
        C = set(ancestors(g))
        Cp = set(ancestors(p))

        if C and Cp:
            m = len(C & Cp)
            hp = m / len(Cp)
            hr = m / len(C)
            if hp == 0.0 and hr == 0.0:
                hf = 0.0
            else:
                b2 = beta * beta
                hf = (1.0 + b2) * hp * hr / (b2 * hp + hr)

            M += m
            P += len(Cp)
            T += len(C)
        else:
            hp = hr = hf = 0.0

        inst_hp.append(hp)
        inst_hr.append(hr)
        inst_hf.append(hf)

        if return_per_instance:
            per_instance.append({"hP": hp, "hR": hr, "hF_beta": hf})

    out: Dict[str, Any] = {}

    if average in ("macro", "both"):
        macro_hp = sum(inst_hp) / len(inst_hp) if inst_hp else 0.0
        macro_hr = sum(inst_hr) / len(inst_hr) if inst_hr else 0.0
        macro_hf_mean = sum(inst_hf) / len(inst_hf) if inst_hf else 0.0
        b2 = beta * beta
        macro_hf_from_pr = (
            (1.0 + b2) * macro_hp * macro_hr / (b2 * macro_hp + macro_hr)
            if (macro_hp + macro_hr) > 0
            else 0.0
        )
        out.update(
            {
                "macro_hP": macro_hp,
                "macro_hR": macro_hr,
                "macro_hF_beta_mean": macro_hf_mean,
                "macro_hF_beta_from_macroPR": macro_hf_from_pr,
            }
        )

    if average in ("micro", "both"):
        micro_hp = (M / P) if P > 0 else 0.0
        micro_hr = (M / T) if T > 0 else 0.0
        b2 = beta * beta
        micro_hf = (
            (1.0 + b2) * micro_hp * micro_hr / (b2 * micro_hp + micro_hr)
            if (micro_hp + micro_hr) > 0
            else 0.0
        )
        out.update(
            {
                "micro_hP": micro_hp,
                "micro_hR": micro_hr,
                "micro_hF_beta": micro_hf,
            }
        )

    if return_per_instance:
        out["per_instance"] = per_instance

    return out
```
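Micro averaging pools ancestor overlaps over the whole corpus before dividing, while macro averaging scores each instance first and then averages; the two can disagree whenever path lengths differ. A condensed standalone illustration of the distinction (not the module itself):

```python
def prefixes(code):
    # Ancestor closure of a digit code: all non-empty prefixes.
    return {code[:k] for k in range(1, len(code) + 1)}


def micro_macro(refs, preds):
    """Return ((micro_hP, micro_hR), (macro_hP, macro_hR)) for paired code lists."""
    M = P = T = 0
    per_hp, per_hr = [], []
    for t, p in zip(refs, preds):
        C, Cp = prefixes(t), prefixes(p)
        m = len(C & Cp)
        M, P, T = M + m, P + len(Cp), T + len(C)
        per_hp.append(m / len(Cp))
        per_hr.append(m / len(C))
    micro = (M / P, M / T)
    macro = (sum(per_hp) / len(per_hp), sum(per_hr) / len(per_hr))
    return micro, macro


micro, macro = micro_macro(["2211", "22"], ["221", "2213"])
print(macro)  # (0.75, 0.875)
print(micro)  # roughly (0.714, 0.833): micro weights longer paths more heavily
```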
isco_hierachical_accuracy_v2.py
CHANGED

```python
# Copyright 2020 The HuggingFace Datasets Authors
# and 2025 Daniel Duckworth (metric integration)
#
# Licensed under the Apache License, Version 2.0
#
# Metric: ISCO-08 hierarchical precision/recall/F_beta with micro/macro aggregation.
# The metric treats each code as belonging to all ancestor prefixes (excluding the root).

import evaluate
import datasets

# Import the implementation from a sibling module file.
# When packaging as an evaluate module, place both files in the same module directory.
from ham import hierarchical_scores


_CITATION = """\
@article{kosmopoulos2015evaluation,
  title={Evaluation Measures for Hierarchical Classification: A Unified View and Novel Approaches},
  author={Kosmopoulos, Aris and Partalas, Ioannis and Gaussier, Eric and Paliouras, George and Androutsopoulos, Ion},
  journal={Data Mining and Knowledge Discovery},
  year={2015}
}
@misc{isco08,
  title={International Standard Classification of Occupations (ISCO-08)},
  howpublished={International Labour Organization},
  year={2008}
}
"""

_DESCRIPTION = """\
Hierarchical precision (hP), recall (hR), and F_beta (hF_beta) for ISCO-08 codes.
Each code is expanded to its ancestor closure (all non-empty prefixes), and the
overlap between predicted and reference closures determines hP/hR. This rewards
correct depth and penalizes distance in the hierarchy.
"""

_KWARGS_DESCRIPTION = """
Args:
    predictions (List[str] | List[int]): Predicted ISCO-08 codes (length 1..4).
        Strings are recommended to preserve leading zeros.
    references (List[str] | List[int]): Reference ISCO-08 codes (length 1..4).
    beta (float, optional): F-measure beta parameter. Default 1.0.
    average (str, optional): "micro", "macro", or "both". Default "both".
    return_per_instance (bool, optional): If True, also returns a list of
        per-instance dicts with hP/hR/hF_beta. Default False.

Returns (dict):
    If average includes "macro":
        - macro_hP
        - macro_hR
        - macro_hF_beta_mean          # mean of per-instance hF_beta
        - macro_hF_beta_from_macroPR  # F_beta computed from macro hP/hR
    If average includes "micro":
        - micro_hP
        - micro_hR
        - micro_hF_beta
    If return_per_instance:
        - per_instance: List[{"hP": float, "hR": float, "hF_beta": float}, ...]

Examples:
    >>> import evaluate
    >>> metric = evaluate.load("path/to/isco_hierachical_accuracy_v2.py")
    >>> refs = ["2211", "22", "3112"]
    >>> preds = ["22", "2213", "2211"]
    >>> metric.compute(references=refs, predictions=preds, beta=1.0, average="both")
    {'macro_hP': 0.5, 'macro_hR': 0.5, 'macro_hF_beta_mean': 0.4444444444444444, 'macro_hF_beta_from_macroPR': 0.5, 'micro_hP': 0.4, 'micro_hR': 0.4, 'micro_hF_beta': 0.4}
"""

# Optional external resources (not downloaded; kept for reference)
ISCO_CODES_URL = (
    "https://www.ilo.org/ilostat-files/ISCO/newdocs-08-2021/ISCO-08/ISCO-08%20EN.csv"
)


@evaluate.utils.file_utils.add_start_docstrings(_DESCRIPTION, _KWARGS_DESCRIPTION)
class isco_hierachical_accuracy_v2(evaluate.Metric):
    """Hierarchical ISCO-08 evaluation metric for hP/hR/hF_beta."""

    # The class name deliberately matches the module file name (including the
    # "hierachical" spelling) so that `evaluate.load` can resolve it.

    def _info(self):
        # The features describe how inputs are structured when used with a Dataset;
        # compute(...) can still accept raw Python lists.
        return evaluate.MetricInfo(
            module_type="metric",
            description=_DESCRIPTION,
            citation=_CITATION,
            inputs_description=_KWARGS_DESCRIPTION,
            features=datasets.Features(
                {
                    "predictions": datasets.Sequence(datasets.Value("string")),
                    "references": datasets.Sequence(datasets.Value("string")),
                }
            ),
            homepage="https://isco.ilo.org/en/isco-08/",
            codebase_urls=["https://github.com/huggingface/evaluate"],
            reference_urls=[
                "https://www.ilo.org/ilostat-files/ISCO/newdocs-08-2021/ISCO-08/ISCO-08%20EN%20Vol%201.pdf",
                "https://www.ilo.org/ilostat-files/ISCO/newdocs-08-2021/ISCO-08/ISCO-08%20EN%20Structure%20and%20definitions.xlsx",
                "https://www.ilo.org/ilostat-files/ISCO/newdocs-08-2021/ISCO-08/ISCO-08%20-88%20EN%20Index.xlsx",
            ],
        )

    def _download_and_prepare(self, dl_manager):
        # No external assets are required.
        pass

    def _compute(
        self,
        predictions,
        references,
        beta: float = 1.0,
        average: str = "both",
        return_per_instance: bool = False,
    ):
        """Return hierarchical precision/recall/F_beta (micro/macro)."""
        return hierarchical_scores(
            y_true=references,
            y_pred=predictions,
            beta=beta,
            average=average,
            return_per_instance=return_per_instance,
        )
```
tests.py
CHANGED

```python
test_cases = [
    {
        "references": [
            "1111",
            "1111",
            "1111",
            "1111",
            "1111",
            "1111",
            "1111",
            "1111",
            "1111",
            "1111",
        ],
        "predictions": [
            "1111",
            "1112",
            "1120",
            "1211",
            "1311",
            "2111",
            "111",
            "11",
            "1",
            "9999",
        ],
        "result": {
            "accuracy": 0.1111111111111111,
            "hierarchical_precision": 0.26666666666666666,
            "hierarchical_recall": 1.0,
            "hierarchical_fmeasure": 0.4210526315789474,
        },
    },
    {
        "references": ["1111"],
        "predictions": ["1111"],
        "result": {
            "accuracy": 1.0,
            "hierarchical_precision": 1.0,
            "hierarchical_recall": 1.0,
            "hierarchical_fmeasure": 1.0,
        },
    },
    {
        "references": ["1111"],
        "predictions": ["1112"],
        "result": {
            "accuracy": 0.0,
            "hierarchical_precision": 0.75,
            "hierarchical_recall": 0.75,
            "hierarchical_fmeasure": 0.75,
        },
    },
    {
        "references": ["1111"],
        "predictions": ["1120"],
        "result": {
            "accuracy": 0.0,
            "hierarchical_precision": 0.5,
            "hierarchical_recall": 0.5,
            "hierarchical_fmeasure": 0.5,
        },
    },
    {
        "references": ["1111"],
        "predictions": ["1211"],
        "result": {
            "accuracy": 0.0,
            "hierarchical_precision": 0.25,
            "hierarchical_recall": 0.25,
            "hierarchical_fmeasure": 0.25,
        },
    },
    {
        "references": ["1111"],
        "predictions": ["1311"],
        "result": {
            "accuracy": 0.0,
            "hierarchical_precision": 0.25,
            "hierarchical_recall": 0.25,
            "hierarchical_fmeasure": 0.25,
        },
    },
    {
        "references": ["1111"],
        "predictions": ["2111"],
        "result": {
            "accuracy": 0.0,
            "hierarchical_precision": 0.0,
            "hierarchical_recall": 0.0,
            "hierarchical_fmeasure": 0.0,
        },
    },
    {
        "references": ["1111"],
        "predictions": ["111"],
        "result": {
            "accuracy": 0.0,
            "hierarchical_precision": 1.0,
            "hierarchical_recall": 0.25,
            "hierarchical_fmeasure": 0.4,
        },
    },
    {
        "references": ["1111"],
        "predictions": ["11"],
        "result": {
            "accuracy": 0.0,
            "hierarchical_precision": 1.0,
            "hierarchical_recall": 0.25,
            "hierarchical_fmeasure": 0.4,
        },
    },
    {
        "references": ["1111"],
        "predictions": ["1"],
        "result": {
            "accuracy": 0.0,
            "hierarchical_precision": 1.0,
            "hierarchical_recall": 0.25,
            "hierarchical_fmeasure": 0.4,
        },
    },
    {
        "references": ["1111"],
        "predictions": ["9999"],
        "result": {
            "accuracy": 0.0,
            "hierarchical_precision": 0.0,
            "hierarchical_recall": 0.0,
            "hierarchical_fmeasure": 0.0,
        },
    },
]
```
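The diff does not include a runner for these cases, but a generic loop like the following could exercise them against any scoring function (`compute_fn` is a hypothetical stand-in for the metric's `compute`; keys absent from its output are simply skipped):

```python
def run_test_cases(compute_fn, cases, tol=1e-9):
    """Compare `compute_fn` output against each case's expected `result` dict."""
    failures = []
    for i, case in enumerate(cases):
        got = compute_fn(predictions=case["predictions"], references=case["references"])
        for key, want in case["result"].items():
            if key in got and abs(got[key] - want) > tol:
                failures.append((i, key, want, got[key]))
    return failures
```

An empty return value means every expected key that the function produced matched within tolerance.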