====== Screening Model ======
The Screening Model uses AI to learn from screening decisions within a specific nest, predicting inclusion (standard screening) or abstract advancement (two pass screening) probabilities based on configuration. It then automatically re-orders studies in Screening so that the records most likely to be included/advanced are presented first.

===== Robot Screener =====

The Screening Model can be used to power AI-assisted screening, replacing one expert in Dual Screening processes:

{{youtube>

When selecting a mode, note that in most cases, when employing Dual Two Pass Mode, **the Robot Screener should replace an expert reviewer only for the Abstract stage of screening**, as the model itself is trained on and screens based on Abstract content. Using the model in this way provides Advancement probabilities (in effect, relevancy scores) for each record.

See here for full [[:

===== User Guide =====
==== Running the Screening Model ====

In its default setting, the Screening Model must be run manually. To do so, click "Train Screening Model" on the Screening panel:

{{ :

Once the modal opens, click "Train New Model."

It may take a minute to train, after which it will populate:

{{ :
==== Interpreting the Model ====

Once the Model is trained, you should see a graph where Included, Excluded, and Unscreened records are represented by green, red, and purple curves, respectively:

{{ :

Odds of inclusion/advancement are presented on the x-axis (ranging from 0 to 1). Since the Model is trained on a nest-by-nest basis, its accuracy varies based on how many records it can train on and how many patterns it can find in inclusion activities.

You can see the accuracy in the modal after the model is trained. In the Cross Validation tab, several statistics are shown. Scores of Recall and Accuracy range from 0 to 1, with values closer to 1 indicating better performance.

{{ :
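As a rough illustration of what cross-validation statistics like these measure, here is a sketch using scikit-learn on fabricated data. The actual model, features, and validation setup Nested Knowledge uses are internal; everything below is an assumption for demonstration only.

```python
# Hypothetical sketch of nest-level cross-validation producing Recall and
# Accuracy scores; the features and labels here are random stand-ins.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_validate

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                   # stand-in record features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)   # stand-in include/exclude labels

# 5-fold cross-validation, scoring both metrics shown in the tab.
scores = cross_validate(
    GradientBoostingClassifier(random_state=0),
    X, y, cv=5, scoring=("accuracy", "recall"),
)
print(round(scores["test_accuracy"].mean(), 2))
print(round(scores["test_recall"].mean(), 2))
```

Cross-validation holds out a slice of already-screened records, predicts on them, and compares against the human decisions, which is why both scores improve as more of the nest is screened.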
==== Implications for Screening ====

Inclusion Probability generated from the Screening model is also available as a filter in [[:
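Conceptually, filtering and re-ordering by Inclusion Probability works like the sketch below. The record structure and the 0.5 cutoff are made-up stand-ins, not Nested Knowledge's API.

```python
# Illustrative only: re-ordering and filtering records by a model-assigned
# inclusion probability, as the Screening queue and filters do.
records = [
    {"title": "Trial of device A", "inclusion_probability": 0.91},
    {"title": "Unrelated case report", "inclusion_probability": 0.07},
    {"title": "Cohort study of device A", "inclusion_probability": 0.64},
]

# Re-order so the most likely inclusions are presented to reviewers first.
queue = sorted(records, key=lambda r: r["inclusion_probability"], reverse=True)

# Filter down to records above a chosen probability cutoff.
likely = [r for r in queue if r["inclusion_probability"] >= 0.5]
print([r["title"] for r in likely])
# → ['Trial of device A', 'Cohort study of device A']
```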
===== Model Performance =====

  * Weak signal amongst available predictors against protocol

**For these reasons, we recommend using the model to augment your screening workflow, not fully automate it.**

How can it augment your screening?

  * Raising high relevancy records to reviewers

**Our model errs towards including/advancing records rather than excluding them.**
==== Testing out the model ====

In an internal study, Nested Knowledge ran the model across several hundred SLR projects, finding the following cumulative accuracy statistics:

=== Standard Screening ===

  * Area Under the Receiver Operating Characteristic Curve (AUC): 0.88
  * Classification Accuracy: 0.92
  * Recall: 0.76
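For readers unfamiliar with these statistics, the sketch below shows how each is defined, using scikit-learn on a tiny made-up example. The 0.88/0.92/0.76 figures above come from Nested Knowledge's internal study, not from this code.

```python
# Definitions of the three reported statistics on fabricated decisions.
from sklearn.metrics import accuracy_score, recall_score, roc_auc_score

y_true = [1, 1, 1, 0, 0, 0, 0, 1]                   # adjudicated include (1) / exclude (0)
y_prob = [0.9, 0.8, 0.4, 0.2, 0.1, 0.6, 0.3, 0.7]   # model probabilities
y_pred = [int(p >= 0.5) for p in y_prob]            # thresholded predictions

print(roc_auc_score(y_true, y_prob))   # ranking quality across all thresholds
print(accuracy_score(y_true, y_pred))  # share of all predictions that are correct
print(recall_score(y_true, y_pred))    # share of true inclusions the model recovered
```

Note that AUC is computed from the raw probabilities (how well the model ranks inclusions above exclusions), while accuracy and recall depend on a chosen probability threshold.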
Following our philosophy, recall is relatively higher than precision: the model suggests inclusion/advancement for some records that should be excluded, rather than missing records that should be included.

For comparison purposes, our study found human reviewer recall (relative to the adjudicated decision) was 85% in the average nest. Our models are within 4 & 9 points of human performance on this most critical measure.
==== Analyzing Your Nest ====

In general, the more records you screen, the better the model will perform. Of course, you want to use the model before you’ve screened every record!

To provide the model with sufficient information to begin understanding your review, we require 50 total screens and 10 inclusions/advancements before it can be trained.

As the graph below shows, AUC and recall can grow on a relatively sharp curve early in your review. The curve begins to flatten around 20-30% of records screened, which is where we typically begin to recommend the use of Robot Screener in Dual screening modes.

{{ :
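The stated minimums (50 total screens, 10 inclusions/advancements) amount to a simple eligibility check, sketched below. The function name and signature are illustrative, not part of any Nested Knowledge API.

```python
# Hypothetical eligibility check mirroring the documented training minimums.
def can_train_model(total_screened: int, total_included: int) -> bool:
    """Return True once a nest has enough screening decisions to train on."""
    return total_screened >= 50 and total_included >= 10

print(can_train_model(40, 12))   # → False (too few total screens)
print(can_train_model(120, 8))   # → False (too few inclusions/advancements)
print(can_train_model(120, 15))  # → True
```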
===== How the Screening Model Works =====

At a high level, the model is a Decision Tree: a series of Yes/No questions about characteristics of records that lead to different probabilities of inclusion/advancement.

In more detail, the model is a gradient-boosted decision tree ensemble. Its hyperparameters,
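A minimal stand-in for a gradient-boosted decision tree ensemble is scikit-learn's GradientBoostingClassifier, shown below on fabricated record features. Nested Knowledge's actual implementation, features, and hyperparameters are not public, so this is a sketch of the technique, not the production model.

```python
# Gradient-boosted decision trees: many shallow trees fit in sequence, each
# correcting the residual errors of the ensemble so far.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(300, 4))             # stand-in numeric record features
y = (X[:, 0] - X[:, 2] > 0).astype(int)   # stand-in include/exclude labels

model = GradientBoostingClassifier(n_estimators=100, random_state=42)
model.fit(X, y)

# predict_proba yields the inclusion-probability-style scores discussed above.
probs = model.predict_proba(X[:5])[:, 1]
print(np.round(probs, 2))
```

Each tree in the ensemble is exactly the "series of Yes/No questions" described above; boosting combines hundreds of them into a calibrated probability.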
==== What data does the model use? ====

The model uses the following data from your records as inputs:

  * Bibliographic data
    * Time since publication of the record
    * Page count
    * Keywords/
  * Abstract Content
    * N-grams
    * OpenAI text embedding (ada-002)
  * Citation Counts from Scite, accessed using the DOI
    * Number of citing publications
    * Number of supporting citation statements
    * Number of contrasting citation statements

Often some of this data will be missing for records; it is imputed as if the record were approximately typical of other records in the nest.
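One common way to impute a missing value "as if the record is approximately typical" is to substitute the median of the other records, sketched here with scikit-learn. The platform's actual imputation strategy is not specified beyond the sentence above, so treat this as an assumed illustration.

```python
# Median imputation: missing entries are replaced column-by-column with the
# median of the records that do have the value.
import numpy as np
from sklearn.impute import SimpleImputer

# Rows = records; columns = e.g. years since publication, page count, citations.
features = np.array([
    [2.0, 12.0, 34.0],
    [5.0, np.nan, 10.0],   # missing page count
    [1.0, 8.0, np.nan],    # missing citation count
    [4.0, 15.0, 22.0],
])

imputer = SimpleImputer(strategy="median")
filled = imputer.fit_transform(features)
print(filled)
```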