Corpus manager (indra_world.service.corpus_manager)
This module runs a one-off assembly over a set of DART records (i.e., reader outputs) to produce a ‘seed corpus’ that can be dumped to S3 for loading into CauseMos.
- class indra_world.service.corpus_manager.CorpusManager(db_url, dart_records, corpus_id, metadata, dart_client=None, tenant=None, ontology=None)
Corpus manager class for running assembly on a set of DART records.
- assemble()
Run assembly on the prepared statements.
This method loads all the prepared statements associated with the corpus and then runs assembly on them.
- prepare(records_exist=False)
Run the preprocessing pipeline on statements.
This method adds the new corpus to the DB and attaches the records to it, then processes the reader outputs for those records into statements, preprocesses the statements, and stores the prepared statements in the DB.
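
The descriptions above suggest a two-step workflow: prepare() stores prepared statements in the DB, and assemble() then loads them and runs assembly. The sketch below illustrates that call order under stated assumptions. The helper run_seed_corpus, the protocol SupportsCorpusWorkflow, and the stand-in RecordingManager are hypothetical names introduced for illustration only; they are not part of indra_world. The only real names relied on are the two documented methods, prepare(records_exist=False) and assemble().

```python
from typing import Any, Protocol


class SupportsCorpusWorkflow(Protocol):
    """Hypothetical protocol covering the two documented CorpusManager methods."""

    def prepare(self, records_exist: bool = False) -> Any: ...

    def assemble(self) -> Any: ...


def run_seed_corpus(manager: SupportsCorpusWorkflow,
                    records_exist: bool = False) -> Any:
    """Hypothetical helper: prepare the statements, then assemble them."""
    # Step 1: preprocessing; per the docs this stores prepared statements in the DB.
    manager.prepare(records_exist=records_exist)
    # Step 2: assembly over the prepared statements. The return value of
    # assemble() is not documented in this section, so it is passed through as-is.
    return manager.assemble()


class RecordingManager:
    """Stand-in that only records calls; NOT the indra_world implementation."""

    def __init__(self) -> None:
        self.calls: list = []

    def prepare(self, records_exist: bool = False) -> None:
        self.calls.append(("prepare", records_exist))

    def assemble(self) -> str:
        self.calls.append(("assemble",))
        return "assembled"


if __name__ == "__main__":
    demo = RecordingManager()
    run_seed_corpus(demo)
    print(demo.calls)
```

With the real class, one would presumably construct a CorpusManager(db_url, dart_records, corpus_id, metadata) following the signature above and pass it to the helper in place of the stand-in. The expected format of each argument is not specified in this section, so no concrete values are shown here.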