====== Screening Model ======
The Screening Model uses AI to learn from screening decisions within a specific nest, generating inclusion (or advancement) probabilities for records that have not yet been screened.
You may use the screening model in **two ways:**
  * Train the model to display inclusion probabilities on records during manual screening ([[wiki:]]).
  * //Dual modes only:// Turn on **Robot Screener** to replace an expert reviewer, which makes decisions based on these probabilities.

Both methods require the model to be trained, but the first only displays probabilities; final screening decisions remain with your human reviewers.

The guidance below is specifically for using Robot Screener, including training the model that powers it. For information on training the model for probability generation only, and general information on how the model works, see [[wiki:]] and the sections at the end of this page.

----
===== Robot Screener =====

The Screening Model can be used to power AI-assisted screening, replacing one expert reviewer in either Dual Screening mode.

**Robot Screener** makes its decisions differently depending on the mode:
  * Dual Standard Mode: Robot Screener replaces a reviewer in the singular round of Screening, based on //Inclusion Probabilities.//
  * Dual Two Pass Mode: Robot Screener replaces a reviewer **only in the Abstract stage** of screening, based on //Advancement Probabilities// (in effect, relevancy scores), as the model itself is trained on and screens based on Abstract content.

{{youtube>}}
----

===== User Guide =====

==== Running the Screening Model ====

To learn about configuration settings, which enable you to toggle Manual vs. Automatic updating and Displayed vs. Hidden probabilities, see [[:]].

In its default setting, the Screening Model must be run manually. To do so, click "Train Screening Model" on the Screening panel:

{{ :}}

Once the modal opens, click "Train New Model."

<WRAP center round important 60%>
To provide the model with sufficient information to begin understanding your review, we require **50 total adjudicated screening decisions with 10 advancements or inclusions** before the model can be trained.
</WRAP>

It may take a minute to train, after which a histogram will populate on the left. From then on, each record will show a probability of inclusion or advancement:

{{ :}}

----

==== Settings ====

To turn on Robot Screener, head to Nest Settings --> Screening Model and toggle on Robot Screener.
{{ :}}

Not displayed? You must be in a Dual Screening mode to use Robot Screener.

Want to use Automatic Training for use in manual screening instead? See [[wiki:autolit:]].

----

==== Meeting the Requirements ====

When toggling on Robot Screener, you'll be presented with an instructional modal:

{{ :}}

Highlighted in red are the requirements for training the model:

<WRAP center round important 60%>
Before Robot Screener can be turned on, 50 adjudicated screening decisions with 10 advancements/inclusions must be made in the nest.
</WRAP>

Once trained and turned on, the Robot assigns both inclusion probabilities and actual screening decisions to the remainder of records in the queue. Currently, Robot Screener does not assign exclusion reasons, so its exclusion decisions are displayed without a specific reason.
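For concreteness, the gating rule above amounts to a simple count check. Below is a minimal sketch; the function name and decision labels are hypothetical, not Nested Knowledge's API:

<code python>
# Toy check of the Robot Screener training requirement described above:
# at least 50 adjudicated decisions, at least 10 of them advancements
# or inclusions. Names here are illustrative only.

def robot_screener_ready(adjudicated: list[str]) -> bool:
    """adjudicated: final decisions, e.g. 'include', 'advance', 'exclude'."""
    positives = sum(1 for d in adjudicated if d in ("include", "advance"))
    return len(adjudicated) >= 50 and positives >= 10

print(robot_screener_ready(["include"] * 10 + ["exclude"] * 40))  # True
print(robot_screener_ready(["include"] * 9 + ["exclude"] * 45))   # False
</code>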
----

==== Interpreting Robot Screener ====

At any time, you may wish to view how the screening model is performing and how it arrived at its decisions:

{{ :}}

This will display a histogram under the "Screening" panel, where Included or Advanced, Excluded, and Unscreened records are represented by green, red, and purple curves, respectively.

You can also view the Robot Screener recommendations in the Screening Model modal. Select "Predictions" or "Cross Validation."

With Predictions toggled:

{{ :}}

With Cross Validation toggled:

{{ :}}

In the Cross Validation tab, several statistics are shown. Recall and Accuracy can be used to interpret how the model will perform on the remaining records: high recall (0.7/70%+) indicates that the model will less frequently exclude relevant records, meaning higher performance, while accuracy indicates how often the model's decisions match the adjudicated decisions.

[[wiki:]]
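To make the Cross Validation statistics concrete, here is a minimal sketch of how recall and accuracy are derived from a confusion matrix of model decisions versus adjudicated decisions. The counts are hypothetical; the platform performs this computation internally:

<code python>
# Recall and accuracy from a confusion matrix of model decisions
# against adjudicated decisions. Counts below are made up.

def recall(tp: int, fn: int) -> float:
    """Share of truly relevant records the model advanced/included."""
    return tp / (tp + fn)

def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all records where the model matched the adjudicator."""
    return (tp + tn) / (tp + tn + fp + fn)

# 100 adjudicated records, 20 of them truly relevant:
tp, fn = 16, 4    # relevant records advanced vs. wrongly excluded
tn, fp = 70, 10   # irrelevant records excluded vs. wrongly advanced

print(f"recall   = {recall(tp, fn):.2f}")            # 0.80
print(f"accuracy = {accuracy(tp, tn, fp, fn):.2f}")  # 0.86
</code>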
----

==== Improving Robot Screener ====

The best way to improve Robot Screener is to adjudicate records, since these are the decisions it trains on. If you can, we recommend having your adjudicator make their final decisions on the Adjudicate Screening page after every 50 studies screened, for best model performance. For reference, the following is what adjudicators will see for records that have one human and one Robot Screener decision applied:

{{ :}}

----
===== Robot Screener Validation Studies =====

Robot Screener has been validated in several published studies assessing its decisions in comparison to human decisions across multiple reviews and review types:
  * [[https://]]
  * [[https://www.ispor.org/heor-resources/presentations-database/]]
  * Estimates of [[https://about.nested-knowledge.com/2023/11/]]

You can see a deeper summary of the Validation Studies and their implications [[https://]].

----

===== Model Performance =====

==== Our Philosophy ====

Screening is a complex task that relies on human expertise. Our model may stumble due to:
  * Insufficient training examples (usually too few included/advanced records)
  * Data not available to the model (e.g. screening against a full text article, or a missing abstract)
  * Weak signal amongst the available predictors against your protocol

**For these reasons, we recommend using the model to augment your screening workflow, not fully automate it.**

How can it augment your screening?
  * Excluding clearly low-relevancy records
  * Raising high-relevancy records to reviewers

**Our model errs towards including/advancing records rather than risking the exclusion of relevant ones.**

==== Testing out the model ====

In an internal study, Nested Knowledge ran the model across several hundred SLR projects, finding the following cumulative accuracy statistics:

=== Standard Screening ===

  * Area Under the [Receiver Operating Characteristic] Curve (AUC): 0.88
  * Classification Accuracy: 0.92
  * Recall: 0.76
  * Precision: 0.40
  * F1: 0.51

=== Two Pass Screening ===

In two pass screening, the model predicts advancement of a record from abstract screening to full text screening. Given that advancement rates are typically higher than inclusion rates, the statistics differ slightly:

  * AUC: 0.88
  * Classification Accuracy: 0.93
  * Recall: 0.81
  * Precision: 0.44
  * F1: 0.56

Following our philosophy, recall is relatively higher than precision: the model suggests inclusion/advancement liberally rather than risk excluding relevant records.

For comparison purposes, our study found human reviewer recall (relative to the adjudicated decision) was 85% in the average nest. Our models are within 4 and 9 points of human performance on this most critical measure.
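As a quick consistency check, F1 is the harmonic mean of precision and recall, so the reported F1 scores can be approximately reproduced from the rounded precision and recall values above (small differences come from rounding):

<code python>
# F1 as the harmonic mean of precision and recall.
def f1(precision: float, recall: float) -> float:
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.40, 0.76), 2))  # 0.52, vs. 0.51 reported (rounding)
print(round(f1(0.44, 0.81), 2))  # 0.57, vs. 0.56 reported (rounding)
</code>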
==== Analyzing Your Nest ====

When you train a new model, we generate k-fold cross validation performance measures using the same model hyperparameters the final model is trained with. These performance measures typically provide a lower bound on the performance you can expect from the model on records not yet screened in your nest. High recall (70%+) suggests that your review is less likely to be missing relevant records at the end of screening. High AUC (.8+) suggests that your model is effectively discerning between included and excluded records.

While we cannot guarantee performance improvement, screening and adjudicating more records before retraining generally strengthens these measures.

=== Timing of Model Training ===

In general, the more records you screen, the better the model will perform. Of course, you want to use the model before you've screened every record!

To provide the model with sufficient information to begin understanding your review, we require 50 total screens and 10 inclusions/advancements before training.

As the graph below shows, AUC and recall can grow on a relatively sharp curve early in your review; the curve begins to flatten around 20-30% of records screened.

{{ :}}
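To illustrate what the k-fold procedure measures, here is a hedged scikit-learn sketch on synthetic, imbalanced data. This is not Nested Knowledge's pipeline; the model, features, and hyperparameters are stand-ins:

<code python>
# Sketch of k-fold cross validation reporting recall and AUC,
# as shown in the training modal. Synthetic data; illustrative only.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_validate

# ~20% positive class, echoing typical inclusion/advancement rates.
X, y = make_classification(n_samples=500, n_features=20,
                           weights=[0.8], random_state=0)

scores = cross_validate(GradientBoostingClassifier(), X, y,
                        cv=5, scoring=("recall", "roc_auc"))
print("mean recall:", round(scores["test_recall"].mean(), 2))
print("mean AUC:   ", round(scores["test_roc_auc"].mean(), 2))
</code>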
===== How the Screening Model Works =====

At a high level, the model is a Decision Tree: a series of Yes/No questions about each record's data whose answers lead to a predicted probability of inclusion or advancement.

In more detail, the model is a gradient-boosted decision tree ensemble. Its hyperparameters are selected for your nest, and the same hyperparameters are used to generate the k-fold cross validation measures described above.
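For intuition, the sketch below hand-writes one such tree over hypothetical record features; the actual ensemble learns many trees automatically and combines their outputs into a single probability:

<code python>
# One toy decision tree's Yes/No questions over hypothetical record
# features. The real model is an ensemble of learned trees.

def toy_tree(record: dict) -> float:
    """Return a rough inclusion probability for a single record."""
    if record["abstract_matches_protocol_terms"]:      # question 1
        if record["years_since_publication"] <= 10:    # question 2
            return 0.8
        return 0.5
    if record["citing_publications"] >= 25:            # question 3
        return 0.3
    return 0.05

print(toy_tree({"abstract_matches_protocol_terms": True,
                "years_since_publication": 4,
                "citing_publications": 12}))  # 0.8
</code>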
==== What data does the model use? ====

The model uses the following data from your records, where available:

  * Bibliographic data
    * Time since publication of the record
    * Page count
    * Keywords
  * Abstract Content
    * N-grams
    * OpenAI text embedding (ada-002)
  * Citation Counts from Scite, accessed using the DOI
    * Number of citing publications
    * Number of supporting citation statements
    * Number of contrasting citation statements

Often some of this data will be missing for records; it is imputed as if the record is approximately typical of other records in the nest.
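The exact imputation method isn't specified here; as a sketch, treating a record as "approximately typical" can be as simple as filling each missing numeric feature with the nest-wide column mean:

<code python>
# Sketch: mean-imputing a missing feature so the record is treated
# as typical of the nest. Values are made up; illustrative only.
import numpy as np

#                [years since publication, citing publications]
features = np.array([[3.0, 12.0],
                     [1.0, np.nan],   # citation data missing
                     [8.0, 40.0]])

col_means = np.nanmean(features, axis=0)               # [4.0, 26.0]
filled = np.where(np.isnan(features), col_means, features)
print(filled[1])  # [ 1. 26.]  missing count replaced by the mean
</code>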