Nested Knowledge

Bringing Systematic Review to Life


Disclosure of AI Systems in Nested Knowledge

Several tools in Nested Knowledge use artificial intelligence and machine learning to make systematic reviews easier and more effective to conduct. This page provides technical details on what these features are and how your data is used.

1. RoboPICO

  • Used to highlight Populations, Interventions/Comparators, and Outcomes in abstracts during Screening
    • Optional: this feature is toggled on by default. It can be toggled off, and then remains off for all abstracts in the queue/ongoing modules and Study Inspector until toggled back on.
  • Used to generate the most commonly reported terms in the literature, to inform the building of search queries in Search Exploration
    • Optional: Search Exploration is not a required step in AutoLit and simply offers assistance in building a search. However, when it is used, RoboPICO auto-generates terms when concepts are entered and “Refresh Exploration” is selected. This cannot be switched off when Search Exploration is used.

a. Which data does RoboPICO use?

The abstracts of records are made available to the model.

b. How does the model work?

RoboPICO uses a fork of the machine learning system offered by RobotReviewer. Specifically, named-entity recognition models extract Patient/Problem, Intervention, and Outcome entities from the text of article abstracts. NK's modifications to RobotReviewer are open source and released under the General Public License.

This model is not trained or updated from your data.

2. Bibliomine

Bibliomine extracts references from full text PDFs. Typically, previous systematic reviews or landmark studies are bibliomined, importing all cited references as records directly into your nest. This feature does not access your data unless you use it.

Optional: This feature is helpful if, for example, you are performing an update on an existing review, but is not required to successfully upload records to a nest.

a. Which data does Bibliomine use?

Bibliomine consumes any PDFs uploaded for its purposes. It writes records with full bibliographic data to your nest (pending user addition).

b. How does the model work?

Bibliomine uses Cermine, an open source machine learning library for mining documents. Using the extracted DOI and title, full bibliographic data is retrieved from PubMed or CrossRef (in that order of preference).
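The order of preference described above can be sketched as a simple fallback chain. This is an illustrative sketch only, not Bibliomine's code: `resolve_reference`, the two stub lookups, and the placeholder DOIs are all hypothetical, and no real PubMed or CrossRef API is called.

```python
from typing import Callable, Optional

Record = dict
Lookup = Callable[[Optional[str], Optional[str]], Optional[Record]]


def resolve_reference(doi, title, sources):
    """Try each (name, lookup) pair in order and return the first hit."""
    for name, lookup in sources:
        record = lookup(doi, title)
        if record is not None:
            # Remember which source supplied the bibliographic data.
            return {**record, "source": name}
    return None


# Hypothetical stand-ins for the real services (assumption: each lookup
# returns a dict of bibliographic fields, or None when nothing matches).
def pubmed_stub(doi, title):
    known = {"10.1000/example.1": {"title": "Example trial", "year": 2020}}
    return known.get(doi)


def crossref_stub(doi, title):
    known = {"10.1000/example.2": {"title": "Example cohort", "year": 2018}}
    return known.get(doi)


# PubMed is listed first, so it is preferred whenever it has a match.
SOURCES = [("PubMed", pubmed_stub), ("CrossRef", crossref_stub)]

first = resolve_reference("10.1000/example.1", None, SOURCES)
second = resolve_reference("10.1000/example.2", None, SOURCES)
missing = resolve_reference("10.1000/unknown", None, SOURCES)
```

Here `first` is resolved by the PubMed stub, `second` falls through to the CrossRef stub, and `missing` is `None` because neither stub recognizes the DOI.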

This model is not trained or updated from your data.

3. Robot Screener

The Robot Screener uses a model trained on human screening decisions to make reviewer-level screening decisions based on inclusion probabilities. In effect, it replaces one human reviewer when turned on in nests with a Dual Screening mode. This feature does not have access to your data unless you turn it on in Settings.

Optional: This feature is helpful to speed up the Screening process, but is not required to successfully Dual Screen all records in a nest.

a. Which data does Robot Screener use?

The following data from your records are model inputs:

  • Bibliographic data
    • Time since publication of the record
    • Page count
    • Keywords/Descriptors
  • Abstract Content
    • N-grams
    • OpenAI text embedding (ada-002)
  • Citation Counts from Scite, accessed using the DOI
    • Number of citing publications
    • Number of supporting citation statements
    • Number of contrasting citation statements
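As an illustration of one input type listed above, word n-grams can be counted from abstract text roughly as follows. This is a minimal sketch: the tokenization rule, the choice of unigrams plus bigrams, and the function names are assumptions, not Robot Screener's actual featurization.

```python
import re
from collections import Counter


def word_ngrams(text, n):
    """Return all runs of n consecutive lowercase word tokens."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, max_n=2):
    """Count every n-gram for n = 1 .. max_n."""
    counts = Counter()
    for n in range(1, max_n + 1):
        counts.update(word_ngrams(text, n))
    return counts


# Made-up abstract text for illustration.
abstract = "Stroke patients received thrombectomy. Stroke patients improved."
counts = ngram_counts(abstract)
# This simple tokenizer ignores sentence boundaries, so
# "thrombectomy stroke" is also counted as a bigram.
```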

The model is trained on adjudicated abstract (AB) or full-text (FT) screening decisions, depending on your screening mode. This includes the exclusion reason. Similarly, the model outputs a screening decision for each record requiring a reviewer-level decision.

If enabled, Robot Screener will continuously screen new records as they are imported into your nest.

b. How does this model work?

Robot Screener relies on the screening model that generates inclusion/advancement probabilities. A probability threshold, optimized on the geometric mean of precision and recall in cross-validation, determines whether records are included or excluded.
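The threshold selection described above can be sketched as follows. This is a simplified illustration under stated assumptions: the labels and probabilities are made up, the cross-validation splitting is omitted (the probabilities are assumed to be out-of-fold predictions), and the function names are hypothetical rather than Nested Knowledge's.

```python
from math import sqrt


def precision_recall(labels, probs, threshold):
    """Precision and recall when records with prob >= threshold are included."""
    pairs = list(zip(labels, probs))
    tp = sum(1 for y, p in pairs if p >= threshold and y == 1)
    fp = sum(1 for y, p in pairs if p >= threshold and y == 0)
    fn = sum(1 for y, p in pairs if p < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def best_threshold(labels, probs):
    """Pick the candidate threshold maximizing sqrt(precision * recall)."""
    best_t, best_score = 0.5, -1.0
    for t in sorted(set(probs)):
        precision, recall = precision_recall(labels, probs, t)
        score = sqrt(precision * recall)  # geometric mean
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


# Made-up example: 1 = included by human adjudication, 0 = excluded.
labels = [1, 0, 1, 0, 0, 1]
probs = [0.9, 0.2, 0.6, 0.4, 0.7, 0.8]

threshold, score = best_threshold(labels, probs)
decisions = ["include" if p >= threshold else "exclude" for p in probs]
```

The geometric mean rewards thresholds that keep both precision and recall high; a threshold that drives either one toward zero scores near zero regardless of the other.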

This model is only available within your nest, and its parameters are not shared with other nests or users.

4. Smart Tag Recommendations

Smart Tag Recommendations are context-aware recommendations of relevant concepts within full text PDFs. These recommendations include provenance via an annotation within the PDF, making review of the model's work possible.

Optional: This feature is helpful to speed up the data extraction process, but is not required to perform data extraction of all records in the nest. Standard Tag Recommendations should remain selected to keep this feature off.

a. Which data does Smart Tagging use?

Recommendations are generated for the full-text PDFs of all included records. You instruct the model through:

  • Standard Tagging:
    • Tag Names
    • Hierarchical structure of tags
  • Form Based Tagging:
    • Tag Names
    • Question Types
    • Question

b. How does the model work?

Smart Tagging uses OpenAI's GPT-4, a large language model, to identify relevant concepts or answer questions about plain-text reductions of full-text PDFs. The model's recommendations are mapped back to the PDF as annotations using fuzzy (edit distance) search.
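To illustrate what a fuzzy (edit distance) search does, the sketch below finds the substring of a text that is closest to a quoted phrase, tolerating small differences such as a dropped hyphen. It uses a standard approximate string matching dynamic program; the function name and example text are hypothetical, and this is not necessarily how Smart Tagging implements its search.

```python
def fuzzy_find(pattern, text):
    """Return (start, end, distance) for the substring text[start:end]
    with the smallest edit distance to pattern."""
    m = len(pattern)
    # prev[i] = (cost, start) for aligning pattern[:i] with a substring
    # that ends at the previous text position.
    prev = [(i, 0) for i in range(m + 1)]
    best = (prev[m][0], 0, 0)  # (distance, start, end)
    for j, ch in enumerate(text, start=1):
        # An empty pattern prefix matches for free, starting at position j.
        cur = [(0, j)]
        for i in range(1, m + 1):
            sub_cost = 0 if pattern[i - 1] == ch else 1
            cur.append(min(
                (prev[i - 1][0] + sub_cost, prev[i - 1][1]),  # match or substitute
                (prev[i][0] + 1, prev[i][1]),                 # extra character in text
                (cur[i - 1][0] + 1, cur[i - 1][1]),           # character missing from text
            ))
        if cur[m][0] < best[0]:
            best = (cur[m][0], cur[m][1], j)
        prev = cur
    distance, start, end = best
    return start, end, distance


# Made-up example: the quoted phrase lacks the hyphen found in the PDF text.
text = "Patients received intra-arterial thrombolysis within 6 hours."
quote = "intraarterial thrombolysis"

start, end, distance = fuzzy_find(quote, text)
matched = text[start:end]
```

The returned span can then be used as the location of the annotation, and the distance gives a rough measure of how closely the recommendation matches the source text.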

Neither this model nor OpenAI's models in general are trained or updated using your data.

5. Synthesis AI Disclosure

The records included in Synthesis, as well as the Tags applied to underlying reports, may have been collected by users with the assistance of artificial intelligence. These tools are generally integrated with direct researcher oversight, and the nest owners have final responsibility for the accuracy of all Screening and Tagging.

wiki/support/ai_disclosure.txt · Last modified: 2024/10/24 17:14 by jthurnham