Nested Knowledge

Bringing Systematic Review to Life

====== Screening Model ======

The Screening Model uses AI to learn from screening decisions within a specific nest, predicting inclusion (standard screening) or abstract advancement (two pass screening) probabilities based on configuration. Then it automatically re-orders studies in Screening so that the most likely to be included/advanced are presented first. This assists in identifying relevant studies early.
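The re-ordering amounts to sorting the screening queue by predicted probability, highest first. A minimal sketch in Python (the record titles and probabilities are invented for illustration):

```python
# Invented screening queue: (record title, predicted inclusion probability).
queue = [("Record A", 0.12), ("Record B", 0.81), ("Record C", 0.55)]

# Present the most likely includes/advances first.
queue.sort(key=lambda record: record[1], reverse=True)
titles = [title for title, _ in queue]
```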

===== Robot Screener =====

The Screening Model can be used to power AI-assisted screening, replacing one expert in Dual Screening processes:

{{youtube>9bsA4DMF4aE}}

When selecting a mode, note that in most cases, when employing Dual Two Pass Mode, **the Robot Screener should replace an expert reviewer only for the Abstract stage of screening**, as the model itself is trained on and screens based on Abstract content. Using the model in this way provides Advancement probabilities (in effect, relevancy scores) to each record.

See here for the full [[:wiki:autolit:screening:robot|]], an AI alternative to a second reviewer in Dual Screening modes.

===== User Guide =====
  
==== Running the Screening Model ====

In its default setting, the Screening Model must be run manually. To do so, click "Train Screening Model" on the Screening panel:
  
{{  :undefined:4screen.png?nolink&  }}
  
Once the modal opens, click "Train New Model." Note: To provide the model with sufficient information to begin understanding your review, we require **50 total screens and 10 inclusions/advancements** before the model can be trained. If there is insufficient evidence to train the model, complete more screening until the "Train New Model" button becomes available.
  
It may take a minute to train, after which it will populate the histogram on the left. From then on, each record will show a probability of inclusion or advancement:
  
{{  :undefined:2screen.png?nolink&  }}
  
==== Interpreting the Model ====
  
Once the Model is trained, you should see a graph where Included or Advanced, Excluded, and Unscreened records are represented by green, red, and purple curves, respectively:
  
{{  :undefined:model.png?nolink&  }}
  
Odds of inclusion/advancement are presented on the x-axis (ranging from 0 to 1). Since the Model is trained on a nest-by-nest basis, its accuracy varies based on how many records it can train on and how many patterns it can find in inclusion activities.
  
You can see the accuracy in the modal after the model is trained. In the Cross Validation tab, several statistics are shown. Scores of Recall and Accuracy can be used to interpret how the model will perform on the remaining records. High recall (0.7/70%+) indicates that the model will less frequently exclude relevant records, meaning higher performance. Similarly, accuracy indicates how correct the model's decisions are compared to already screened records, and thus how it is likely to fare on upcoming records. See below for an example of a relatively well trained model:
  
{{  :undefined:mod.png?nolink&  }}
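To make the Recall and Accuracy figures concrete, here is a small worked example in Python; the confusion counts are invented for illustration, not taken from a real nest:

```python
# Hypothetical validation counts on already-screened records (invented).
tp, fn = 38, 9    # relevant records the model keeps vs. wrongly excludes
tn, fp = 412, 41  # irrelevant records correctly vs. wrongly flagged

recall = tp / (tp + fn)                     # share of relevant records kept
accuracy = (tp + tn) / (tp + fn + tn + fp)  # share of all decisions correct
```

With these counts, recall is 38/47 (about 0.81) and accuracy is 450/500 (0.90), which would read as a reasonably well trained model.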
  
==== Implications for Screening ====
  
Inclusion Probability generated from the Screening Model is also available as a filter in [[:wiki:autolit:utilities:inspector|Inspector]], which can assist with finding records based on their chance of inclusion/advancement. [[:wiki:autolit:utilities:inspector:bulk_actions#bulk_screening_status|Bulk Actions]] can also be taken at your discretion, but be careful about excluding studies if you have not at least reviewed their Abstracts!
  
===== Model Performance =====

  * Weak signal amongst available predictors against protocol
  
**For these reasons, we recommend using the model to augment your screening workflow, not fully automate it.**
  
How can it augment your screening?

  * Raising high relevancy records to reviewers
  
**Our model errs towards including/advancing irrelevant records over excluding relevant records.** In statistical terminology, the model aims to achieve high recall. In a review, it is far more costly to exclude a relevant study. Once excluded, reviewers are unlikely to reconsider a record. In contrast, an included/advanced study will be revisited multiple times later in the review, more readily allowing an incorrect include/advance decision to be corrected.
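That preference for recall can be illustrated with a probability cutoff. In this toy sketch (probabilities and relevance labels are made up), lowering the cutoff keeps a relevant borderline record that a 0.5 cutoff would wrongly exclude:

```python
# Made-up model probabilities and ground-truth relevance for five
# already-screened records (1 = relevant, 0 = irrelevant).
probs = [0.92, 0.61, 0.48, 0.22, 0.07]
truth = [1, 1, 1, 0, 0]

def recall_at(threshold):
    """Share of relevant records kept when including/advancing at `threshold`."""
    tp = sum(1 for p, t in zip(probs, truth) if p >= threshold and t == 1)
    fn = sum(1 for p, t in zip(probs, truth) if p < threshold and t == 1)
    return tp / (tp + fn)

# A 0.5 cutoff wrongly excludes the relevant 0.48 record; a lower cutoff
# trades some extra irrelevant includes for higher recall.
high, low = recall_at(0.5), recall_at(0.3)
```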
  
==== Testing out the model ====

In an internal study, Nested Knowledge ran the model across several hundred SLR projects, finding the following cumulative accuracy statistics:
  
=== Standard Screening ===
  
  * Area Under the Receiver Operating Characteristic Curve (AUC): 0.88
  * Classification Accuracy: 0.92
  * Recall: 0.76

Following our philosophy, recall is relatively higher than precision: the model suggests inclusion/advancement of a larger number of relevant records, at the cost of suggesting inclusion of some irrelevant records. Due to class imbalance, the model scores a 90%+ classification accuracy, predominantly consisting of correct exclusion suggestions.
  
For comparison purposes, our study found human reviewer recall (relative to the adjudicated decision) was 85% in the average nest. Our models are within 4 & 9 points of human performance on this most critical measure.
  
==== Analyzing Your Nest ====

In general, the more records you screen, the better the model will perform. Of course, you want to use the model before you’ve screened every record!
  
To provide the model with sufficient information to begin understanding your review, we require 50 total screens and 10 inclusions/advancements. At that point, we recommend checking the model's statistics (see above) to evaluate performance.
  
As the graph below shows, AUC and recall can grow on a relatively sharp curve early in your review. The curve begins to flatten around 20-30% of records screened, which is where we typically begin to recommend the use of Robot Screener in Dual screening modes.
  
{{  :undefined:auc.png?nolink&  }}

===== How the Screening Model Works =====

At a high level, the model is a Decision Tree: a series of Yes/No questions about characteristics of records that lead to different probabilities of inclusion/advancement.

In more detail, the model is a gradient-boosted decision tree ensemble. Its hyperparameters, particularly around model complexity (number of trees, tree depth), are optimized using a cross-validation grid search. The model produces posterior probabilities and is optimized on logistic loss. SMOTE oversampling is employed as a correction to the highly imbalanced classes frequently seen in screening.
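A pipeline of this shape can be sketched with scikit-learn. The synthetic data, hyperparameter grid, and the plain random duplication (a dependency-free stand-in for SMOTE) are illustrative assumptions, not Nested Knowledge's actual configuration:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for record features; "includes" are the rare class.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + 0.5 * X[:, 1] > 1.2).astype(int)

# Correct class imbalance by oversampling the minority class
# (random duplication here; SMOTE synthesizes new minority points instead).
minority = np.flatnonzero(y == 1)
n_extra = max((y == 0).sum() - minority.size, 0)
extra = rng.choice(minority, size=n_extra, replace=True)
X_bal = np.vstack([X, X[extra]])
y_bal = np.concatenate([y, y[extra]])

# Grid-search model complexity, optimizing logistic loss via cross-validation.
grid = GridSearchCV(
    GradientBoostingClassifier(random_state=0),
    {"n_estimators": [50, 100], "max_depth": [2, 3]},
    scoring="neg_log_loss",
    cv=3,
)
grid.fit(X_bal, y_bal)

# Posterior probability of inclusion/advancement for every record.
probs = grid.predict_proba(X)[:, 1]
```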

==== What data does the model use? ====

The model uses the following data from your records as inputs:

  * Bibliographic data
      * Time since publication of the record
      * Page count
      * Keywords/Descriptors
  * Abstract Content
      * N-grams
      * OpenAI text embedding (ada-002)
  * Citation Counts from Scite, accessed using the DOI
      * Number of citing publications
      * Number of supporting citation statements
      * Number of contrasting citation statements

Often some of this data will be missing for records; it is imputed as if the record is approximately typical of other records in the nest.
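As a concrete illustration of that imputation, a missing numeric feature can be filled with the nest-wide mean, treating the record as typical; the feature name and values below are hypothetical:

```python
# Hypothetical page counts across a nest; None marks a record missing the field.
page_counts = [12, 8, None, 15, None, 9]

known = [p for p in page_counts if p is not None]
nest_mean = sum(known) / len(known)  # the "approximately typical" record
imputed = [p if p is not None else nest_mean for p in page_counts]
```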
  
  
wiki/autolit/screening/inclusionpredictionmodel.1684353244.txt.gz · Last modified: 2023/05/17 19:54 by jthurnham