INDRA World
INDRA World is a generalization of INDRA (originally developed for biology) for
automatically collecting, assembling, and modeling the web of causal relations
that drive interconnected events in regional and global systems.
INDRA World interfaces with four machine reading systems that extract concepts, events, and causal relations from text (typically reports from governmental and non-governmental organizations, news stories, and scientific publications). The extractions are converted into a standardized Statement representation, which is then processed further (filtered, normalized, etc.).
INDRA World uses the general INDRA assembly logic to find relationships between statements, including matching, contradiction, and refinement (i.e., one statement being a more general or more specific version of another). It then calculates a belief score based on all available evidence directly or indirectly supporting a given statement.
This repository also implements a database and service architecture to run INDRA World as a service that integrates with other systems and supports managing project-specific statement sets and incremental assembly with new reader outputs.
Installation
INDRA World can be installed directly from GitHub as follows:
$ pip install git+https://github.com/indralab/indra_world.git
Additionally, INDRA World can be run via Docker with public images available through Dockerhub. For more information, see https://github.com/indralab/indra_world/tree/master/docker.
Documentation
Detailed documentation is available at: https://indra-world.readthedocs.io/en/latest/.
Command line interface
The INDRA World command line interface allows running assembly using externally supplied arguments and configuration files. This serves as an alternative to using the Python API.
usage: indra_world [-h]
(--reader-output-files READER_OUTPUT_FILES |
--reader-output-dart-query READER_OUTPUT_DART_QUERY |
--reader-output-dart-keys READER_OUTPUT_DART_KEYS)
[--assembly-config ASSEMBLY_CONFIG]
(--ontology-path ONTOLOGY_PATH |
--ontology-id ONTOLOGY_ID)
--output-folder OUTPUT_FOLDER
[--causemos-metadata CAUSEMOS_METADATA]
INDRA World assembly CLI
optional arguments:
-h, --help show this help message and exit
Input options:
--reader-output-files READER_OUTPUT_FILES
Path to a JSON file whose keys are reading system
identifiers and whose values are lists of file paths to
outputs from the given system to be used in assembly.
--reader-output-dart-query READER_OUTPUT_DART_QUERY
Path to a JSON file that specifies query parameters for
reader output records in DART. Only applicable if DART
is being used.
--reader-output-dart-keys READER_OUTPUT_DART_KEYS
Path to a text file where each line is a DART storage
key corresponding to a reader output record. Only
applicable if DART is being used.
Assembly options:
--assembly-config ASSEMBLY_CONFIG
Path to a JSON file that specifies the INDRA assembly
pipeline. If not provided, the default assembly
pipeline will be used.
--ontology-path ONTOLOGY_PATH
Path to an ontology YAML file.
--ontology-id ONTOLOGY_ID
The identifier of an ontology registered in DART. Only
applicable if DART is being used.
Output options:
--output-folder OUTPUT_FOLDER
The path to a folder to which the INDRA output will be
written.
--causemos-metadata CAUSEMOS_METADATA
Path to a JSON file that provides metadata to be used
for a Causemos-compatible dump of INDRA output (which
consists of multiple files). The --output-folder
option must also be used along with this option.
The CLI can also be invoked through Docker. In this case, all CLI arguments that are paths need to be made visible to Docker. To do this, the -v flag can be used to mount a host folder (in the command below, [local-path-to-mount]) into the container at a given path. All CLI path arguments then need to be given with respect to the path as seen in the container. Furthermore, if any of the files referred to in CLI arguments themselves list file paths (e.g., the value of --reader-output-files), those paths need to be relative to the Docker container's mounted volume as well.
docker run -v [local-path-to-mount]:/data --entrypoint indra_world indralab/indra_world:latest [cli-arguments]
Dockerized INDRA World service
This folder contains files to run the INDRA World service through Docker containers. It also provides files to build the images locally in case customizations are needed.
Running the integrated service
A docker-compose file defines how the service image and DB image need to be run. It refers to two images (indralab/indra_world and indralab/indra_world_db), both publicly available on Dockerhub. This means that they are automatically pulled when running

docker-compose up

unless they are already available locally.

To launch the service, run

docker-compose up -d

where the optional -d flag runs the containers in the background.
Two files containing environment variables, one for each container, need to be created with the following names and content:
indra_world.env
INDRA_WM_SERVICE_DB=postgresql://postgres:mysecretpassword@db:5432
DART_WM_URL=<DART URL>
DART_WM_USERNAME=<DART username>
DART_WM_PASSWORD=<DART password>
AWS_ACCESS_KEY_ID=<AWS account key ID, necessary if assembled outputs need to be dumped to S3 for CauseMos>
AWS_SECRET_ACCESS_KEY=<AWS account secret key, necessary if assembled outputs need to be dumped to S3 for CauseMos>
AWS_REGION=us-east-1
INDRA_WORLD_ONTOLOGY_URL=<GitHub URL to ontology being used, only necessary if DART is not used.>
LOCAL_DEPLOYMENT=1
Above, LOCAL_DEPLOYMENT should only be set if the service is intended to be run on and accessed from localhost. This enables the assembly dashboard app at http://localhost:8001/dashboard, which can write assembled corpus output to the container's disk (this can either be mounted to correspond to a host folder, or files can be copied to the host using docker cp).
indra_world_db.env
POSTGRES_PASSWORD=mysecretpassword
PGDATA=/var/lib/postgresql/pgdata
Note that if necessary, the default POSTGRES_PASSWORD=mysecretpassword setting can be changed using standard psql commands in the indra_world_db container and then committed to an image.
Building the Docker images locally
As described above, the two necessary Docker images are available on Dockerhub; therefore, the following steps are only necessary if local changes to the images (beyond what can be controlled through environment variables) are needed.
Building the INDRA World service image
To build the indra_world Docker image, run
docker build --tag indra_world:latest .
Initializing the INDRA World DB image
To create the indra_world_db Docker image from scratch, run
./initialize_db_image.sh
Note that this requires the Python dependencies needed to run INDRA World to be available in the local environment.
Using the public INDRA World API
The API is deployed and documented at wm.indra.bio.
Cloud-based CauseMos integration via S3
Access to the INDRA-assembled corpora requires credentials to the shared World Modelers S3 bucket “world-modelers”. Each INDRA-assembled corpus is available within this bucket, under the “indra_models” key base. Each corpus is identified by a string identifier.
The corpus index
The list of corpora can be obtained either using S3's list objects function or by reading the index.csv file which is maintained by INDRA. This index is a comma-separated values (CSV) text file which contains one row for each corpus. Each row's first element is a corpus identifier, and the second element is the UTC date-time at which the corpus was uploaded to S3. An example row in this file looks as follows:
test1_newlines,2020-05-08-22-34-29
where test1_newlines is the corpus identifier and 2020-05-08-22-34-29 is the upload date-time.
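As a minimal sketch, the index can be parsed with the standard library, assuming the two-column layout described above (the rows below are made up for illustration):

```python
import csv
from datetime import datetime
from io import StringIO

# Hypothetical contents of index.csv with two example rows.
index_csv = """test1_newlines,2020-05-08-22-34-29
test2_newlines,2020-06-01-10-00-00
"""

corpora = []
for corpus_id, upload_str in csv.reader(StringIO(index_csv)):
    # The upload time uses a dash-separated UTC date-time format.
    upload_time = datetime.strptime(upload_str, "%Y-%m-%d-%H-%M-%S")
    corpora.append((corpus_id, upload_time))

# Find the most recently uploaded corpus.
latest_id, latest_time = max(corpora, key=lambda c: c[1])
print(latest_id)  # test2_newlines
```

In practice, the file would be downloaded from the world-modelers bucket first (e.g., with an S3 client) rather than constructed in memory.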
Structure of each corpus
Within the world-modelers bucket, under the indra_models key base, files for each corpus are organized under a subkey equivalent to the corpus identifier; for instance, all the files for the test1_newlines corpus are under the indra_models/test1_newlines/ key base. The list of files for each corpus is as follows:
statements.json: a JSON dump of assembled INDRA Statements. Each statement’s JSON representation is on a separate line in this file. This is the main file that CauseMos needs to ingest for UI interaction.
metadata.json: a JSON file containing key-value pairs that describe the corpus. The standard keys in this file are as follows:
corpus_id: the ID of the corpus (redundant with the corresponding entry in the index).
description: a human-readable description of how the corpus was obtained.
display_name: a human-readable display name for the corpus.
readers: a list of the names of the reading systems from which statements were obtained in the corpus.
assembly: a dictionary identifying attributes of the assembly process with the following keys:
level: the level of resolution used to assemble the corpus (e.g., “location_and_time”).
grounding_threshold: the threshold (if any) which was used to filter statements by grounding score (e.g., 0.7).
num_statements: the number of assembled INDRA Statements in the corpus (i.e., statements.json).
num_documents: the number of documents that were read by readers to produce the statements that were assembled.
tenant: if DART is used, a corpus is typically associated with a tenant (i.e., a user or an institution); this field provides the tenant ID.
Note that any of these keys may be missing if unavailable, for instance, in the case of old uploads.
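As a sketch of how a client might consume these files, the snippet below parses a line-per-statement statements.json and a metadata.json with the standard keys. The statement and metadata contents shown are simplified, hypothetical examples, not actual INDRA Statement JSON:

```python
import json
from io import StringIO

# Hypothetical statements.json content: one statement JSON object per line.
statements_jsonl = StringIO(
    '{"type": "Influence", "subj": {"name": "rainfall"}, "obj": {"name": "flooding"}}\n'
    '{"type": "Influence", "subj": {"name": "flooding"}, "obj": {"name": "displacement"}}\n'
)
stmts = [json.loads(line) for line in statements_jsonl if line.strip()]

# Hypothetical metadata.json content using the standard keys described above.
metadata = json.loads('''{
    "corpus_id": "test1_newlines",
    "description": "Example corpus",
    "display_name": "Test 1",
    "readers": ["eidos", "hume"],
    "assembly": {"level": "location_and_time", "grounding_threshold": 0.7},
    "num_statements": 2,
    "num_documents": 5
}''')

# Any standard key may be missing (e.g., in old uploads), so use .get.
print(metadata.get("tenant", "no tenant"))  # no tenant
```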
INDRA World Modules Reference
Knowledge Sources (indra_world.sources)
Eidos (indra_world.sources.eidos)
API (indra_world.sources.eidos.api)
- indra_world.sources.eidos.api.process_json(json_dict, grounding_ns=None, extract_filter=None, grounding_mode=None)[source]
Return an EidosProcessor by processing an Eidos JSON-LD dict.
- Parameters:
json_dict (dict) – The JSON-LD dict to be processed.
grounding_ns (Optional[list]) – A list of name spaces for which INDRA should represent groundings, when given. If not specified or None, all grounding name spaces are propagated. If an empty list, no groundings are propagated. Example: [‘UN’, ‘WM’], Default: None
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’, ‘association’, ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns:
ep – An EidosProcessor containing the extracted INDRA Statements in its statements attribute.
- Return type:
EidosProcessor
- indra_world.sources.eidos.api.process_json_file(file_name, grounding_ns=None, extract_filter=None, grounding_mode='compositional')[source]
Return an EidosProcessor by processing the given Eidos JSON-LD file.
This function is useful if the output from Eidos is saved as a file and needs to be processed.
- Parameters:
file_name (str) – The name of the JSON-LD file to be processed.
grounding_ns (Optional[list]) – A list of name spaces for which INDRA should represent groundings, when given. If not specified or None, all grounding name spaces are propagated. If an empty list, no groundings are propagated. Example: [‘UN’, ‘WM’], Default: None
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’, ‘association’, ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘compositional’.
- Returns:
ep – An EidosProcessor containing the extracted INDRA Statements in its statements attribute.
- Return type:
EidosProcessor
- indra_world.sources.eidos.api.process_json_str(json_str, grounding_ns=None, extract_filter=None, grounding_mode='compositional')[source]
Return an EidosProcessor by processing the Eidos JSON-LD string.
- Parameters:
json_str (str) – The JSON-LD string to be processed.
grounding_ns (Optional[list]) – A list of name spaces for which INDRA should represent groundings, when given. If not specified or None, all grounding name spaces are propagated. If an empty list, no groundings are propagated. Example: [‘UN’, ‘WM’], Default: None
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’, ‘association’, ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘compositional’.
- Returns:
ep – An EidosProcessor containing the extracted INDRA Statements in its statements attribute.
- Return type:
EidosProcessor
- indra_world.sources.eidos.api.process_text(text, save_json='eidos_output.json', webservice=None, grounding_ns=None, extract_filter=None, grounding_mode='compositional')[source]
Return an EidosProcessor by processing the given text.
This constructs a reader object via Java and extracts mentions from the text. It then serializes the mentions into JSON and processes the result with process_json.
- Parameters:
text (str) – The text to be processed.
save_json (Optional[str]) – The name of a file in which to dump the JSON output of Eidos.
webservice (Optional[str]) – An Eidos reader web service URL to send the request to. If None, the reading is assumed to be done with the Eidos JAR rather than via a web service. Default: None
grounding_ns (Optional[list]) – A list of name spaces for which INDRA should represent groundings, when given. If not specified or None, all grounding name spaces are propagated. If an empty list, no groundings are propagated. Example: [‘UN’, ‘WM’], Default: None
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’, ‘association’, ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘compositional’.
- Returns:
ep – An EidosProcessor containing the extracted INDRA Statements in its statements attribute.
- Return type:
EidosProcessor
- indra_world.sources.eidos.api.reground_texts(texts, ont_yml, webservice=None, topk=10, filter=True, is_canonicalized=True)[source]
Return grounding for concept texts given an ontology.
- Parameters:
ont_yml (str) – A serialized YAML string representing the ontology.
webservice (Optional[str]) – The address where the Eidos web service is running, e.g., http://localhost:9000. If None, a local Eidos JAR is invoked via pyjnius. Default: None
topk (Optional[int]) – The number of top scoring groundings to return. Default: 10
is_canonicalized (Optional[bool]) – If True, the texts are assumed to be canonicalized. If False, Eidos will canonicalize the texts which yields much better groundings but is slower. Default: True
filter (Optional[bool]) – If True, Eidos filters the ontology to remove determiners from examples and other similar operations. Should typically be set to True. Default: True
- Returns:
A list of the top k scored groundings for each text in the list.
- Return type:
Client (indra_world.sources.eidos.client)
- indra_world.sources.eidos.client.grounding_dict_to_list(groundings)[source]
Transform the webservice response into a flat list.
- indra_world.sources.eidos.client.reground_texts(texts, ont_yml, webservice, topk=10, is_canonicalized=False, filter=True, cache_path=None)[source]
Ground concept texts given an ontology with an Eidos web service.
- Parameters:
ont_yml (str) – A serialized YAML string representing the ontology.
webservice (str) – The address where the Eidos web service is running, e.g., http://localhost:9000.
topk (Optional[int]) – The number of top scoring groundings to return. Default: 10
is_canonicalized (Optional[bool]) – If True, the texts are assumed to be canonicalized. If False, Eidos will canonicalize the texts which yields much better groundings but is slower. Default: False
filter (Optional[bool]) – If True, Eidos filters the ontology to remove determiners from examples and other similar operations. Should typically be set to True. Default: True
- Returns:
A JSON dict of the results from the Eidos webservice.
- Return type:
Migration Table Processor (indra_world.sources.eidos.migration_table_processor)
Processor (indra_world.sources.eidos.processor)
- class indra_world.sources.eidos.processor.EidosProcessorCompositional(json_dict, grounding_ns)[source]
Bases: EidosWorldProcessor
- class indra_world.sources.eidos.processor.EidosWorldProcessor(json_dict, grounding_ns)[source]
Bases: EidosProcessor
Hume (indra_world.sources.hume)
Hume is a general-purpose reading system developed by BBN.
Currently, INDRA can process JSON-LD files produced by Hume. When available, the API will be extended with access to the reader as a service.
API (indra_world.sources.hume.api)
- indra_world.sources.hume.api.process_jsonld(jsonld, extract_filter=None, grounding_mode=None)[source]
Process a JSON-LD string in the new format to extract Statements.
- Parameters:
jsonld (dict) – The JSON-LD object to be processed.
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’ and ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns:
A HumeProcessor instance, which contains a list of INDRA Statements as its statements attribute.
- Return type:
indra_world.sources.hume.HumeProcessor
- indra_world.sources.hume.api.process_jsonld_file(fname, extract_filter=None, grounding_mode='compositional')[source]
Process a JSON-LD file in the new format to extract Statements.
- Parameters:
fname (str) – The path to the JSON-LD file to be processed.
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’ and ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘compositional’.
- Returns:
A HumeProcessor instance, which contains a list of INDRA Statements as its statements attribute.
- Return type:
indra_world.sources.hume.HumeProcessor
Processor (indra_world.sources.hume.processor)
- class indra_world.sources.hume.processor.HumeJsonLdProcessor(json_dict)[source]
This processor extracts INDRA Statements from Hume JSON-LD output.
- Parameters:
json_dict (dict) – A JSON dictionary containing the Hume extractions in JSON-LD format.
- tree
The objectpath Tree object representing the extractions.
- Type:
objectpath.Tree
Sofia (indra_world.sources.sofia)
Sofia is a general-purpose natural language processing system developed at UPitt and CMU by N. Miskov et al.
API (indra_world.sources.sofia.api)
- indra_world.sources.sofia.api.process_json(json_obj, extract_filter=None, grounding_mode=None)[source]
Return processor by processing a JSON object returned by Sofia.
- Parameters:
json_obj (json) – A JSON object containing extractions from Sofia.
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’ and ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns:
sp – A SofiaProcessor object which has a list of extracted INDRA Statements as its statements attribute.
- Return type:
indra.sources.sofia.processor.SofiaProcessor
- indra_world.sources.sofia.api.process_json_file(fname, extract_filter=None, grounding_mode='compositional')[source]
Return processor by processing a JSON file produced by Sofia.
- Parameters:
fname (str) – The name of the JSON file to process
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’ and ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘compositional’.
- Returns:
A SofiaProcessor object which has a list of extracted INDRA Statements as its statements attribute.
- Return type:
indra.sources.sofia.processor.SofiaProcessor
- indra_world.sources.sofia.api.process_table(fname, extract_filter=None, grounding_mode='compositional')[source]
Return processor by processing a given sheet of a spreadsheet file.
- Parameters:
fname (str) – The name of the Excel file (typically .xlsx extension) to process
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’ and ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘compositional’.
- Returns:
sp – A SofiaProcessor object which has a list of extracted INDRA Statements as its statements attribute.
- Return type:
indra.sources.sofia.processor.SofiaProcessor
- indra_world.sources.sofia.api.process_text(text, out_file='sofia_output.json', auth=None, extract_filter=None, grounding_mode='compositional')[source]
Return processor by processing text given as a string.
- Parameters:
text (str) – A string containing the text to be processed with Sofia.
out_file (Optional[str]) – The path to a file to save the reader’s output into. Default: sofia_output.json
auth (Optional[list]) – A username/password pair for the Sofia web service. If not given, the SOFIA_USERNAME and SOFIA_PASSWORD values are loaded from either the INDRA config or the environment.
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’ and ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘compositional’.
- Returns:
sp – A SofiaProcessor object which has a list of extracted INDRA Statements as its statements attribute. If the API did not process the text, None is returned.
- Return type:
indra.sources.sofia.processor.SofiaProcessor
Processor (indra_world.sources.sofia.processor)
- class indra_world.sources.sofia.processor.SofiaExcelProcessor(relation_rows, event_rows, entity_rows, **kwargs)[source]
Bases: SofiaProcessor
An Excel processor extracting statements from reading done by Sofia
- extract_events(event_rows, relation_rows)[source]
Extract Event statements of a Sofia document in Excel format
- class indra_world.sources.sofia.processor.SofiaJsonProcessor(jd, **kwargs)[source]
Bases: SofiaProcessor
A JSON processor extracting statements from reading done by Sofia
- class indra_world.sources.sofia.processor.SofiaProcessor(score_cutoff=None, grounding_mode='compositional')[source]
Bases: object
A processor extracting statements from reading done by Sofia
- get_event(event_entry)[source]
Get an Event with the pre-set grounding mode
The grounding mode is set at initialization of the class and is stored in the attribute grounding_mode.
CWMS (indra_world.sources.cwms)
CWMS is a variant of the TRIPS system. It is a general-purpose natural language understanding system with applications in world modeling. For more information, see: http://trips.ihmc.us/parser/cgi/cwmsreader
API (indra_world.sources.cwms.api)
- indra_world.sources.cwms.api.process_ekb(ekb_str, extract_filter=None, grounding_mode='flat')[source]
Processes an EKB string produced by CWMS.
- Parameters:
ekb_str (str) – EKB string to process
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’, ‘association’, ‘event’ and ‘migration’. If not given, only Influences are extracted since processing other relation types can be time consuming. This argument can be used if the extraction of other relation types such as Events are also of interest.
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns:
cp – A CWMSProcessor, which contains a list of INDRA statements in its statements attribute.
- Return type:
indra.sources.cwms.CWMSProcessor
- indra_world.sources.cwms.api.process_ekb_file(fname, extract_filter=None, grounding_mode='flat')[source]
Processes an EKB file produced by CWMS.
- Parameters:
fname (str) – Path to the EKB file to process.
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’, ‘association’, ‘event’ and ‘migration’. If not given, only Influences are extracted since processing other relation types can be time consuming. This argument can be used if the extraction of other relation types such as Events are also of interest.
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns:
cp – A CWMSProcessor, which contains a list of INDRA statements in its statements attribute.
- Return type:
indra.sources.cwms.CWMSProcessor
- indra_world.sources.cwms.api.process_text(text, save_xml='cwms_output.xml', extract_filter=None, grounding_mode='flat')[source]
Processes text using the CWMS web service.
- Parameters:
text (str) – Text to process
save_xml (Optional[str]) – A file name in which to dump the output from CWMS. Default: cwms_output.xml
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’, ‘association’, ‘event’ and ‘migration’. If not given, only Influences are extracted since processing other relation types can be time consuming. This argument can be used if the extraction of other relation types such as Events are also of interest.
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns:
cp – A CWMSProcessor, which contains a list of INDRA statements in its statements attribute.
- Return type:
indra.sources.cwms.CWMSProcessor
Processor (indra_world.sources.cwms.processor)
- class indra_world.sources.cwms.processor.CWMSProcessor(xml_string)[source]
Bases: object
The CWMSProcessor currently extracts causal relationships between terms (nouns) in EKB. In the future, this processor can be extended to extract other types of relations, or to extract relations involving events.
For more details on the TRIPS EKB XML format, see http://trips.ihmc.us/parser/cgi/drum
- Parameters:
xml_string (str) – A TRIPS extraction knowledge base (EKB) in XML format as a string.
- tree
An ElementTree object representation of the TRIPS EKB XML.
- class indra_world.sources.cwms.processor.CWMSProcessorCompositional(xml_string)[source]
Bases: CWMSProcessor
DART (indra_world.sources.dart)
API (indra_world.sources.dart.api)
- indra_world.sources.dart.api.get_record_key(rec)[source]
Return a key for a DART record for purposes of deduplication.
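The idea of a deduplication key can be sketched as follows; the record field names used here (identity, document_id) are assumptions for illustration, not necessarily the actual DART record schema:

```python
# A minimal sketch of a deduplication key for DART reader output records:
# two records from the same reader for the same document count as duplicates.
def record_key(rec):
    return (rec["identity"], rec["document_id"])

records = [
    {"identity": "eidos", "document_id": "doc1", "version": "1.0"},
    {"identity": "eidos", "document_id": "doc1", "version": "1.1"},
    {"identity": "hume", "document_id": "doc1", "version": "2.0"},
]
unique_keys = {record_key(r) for r in records}
print(len(unique_keys))  # 2
```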
Client (indra_world.sources.dart.client)
A client for accessing reader output from the DART system.
- class indra_world.sources.dart.client.DartClient(storage_mode='web', dart_url=None, dart_uname=None, dart_pwd=None, local_storage=None)[source]
A client for the DART web service with optional local storage.
- Parameters:
storage_mode (Optional[str]) – If web, the configured DART URL and credentials are used to communicate with the DART web service. If local, a local storage is used to access and store reader outputs.
dart_url (Optional[str]) – The DART service URL. If given, it overrides the DART_WM_URL configuration value.
dart_uname (Optional[str]) – The DART service user name. If given, it overrides the DART_WM_USERNAME configuration value.
dart_pwd (Optional[str]) – The DART service password. If given, it overrides the DART_WM_PASSWORD configuration value.
local_storage (Optional[str]) – A path that points to a folder for local storage. If the storage_mode is web, this local_storage is used as a local cache. If the storage_mode is local, it is used as the primary location to access reader outputs. If given, it overrides the INDRA_WM_CACHE configuration value.
- cache_record(record, overwrite=False)[source]
Download and cache a given record in local storage.
- Parameters:
record (dict) – A DART record.
- cache_records(records, overwrite=False)[source]
Download and cache a list of records in local storage.
- download_output(storage_key)[source]
Return content from the DART web service based on its storage key.
- get_outputs_from_records(records)[source]
Return reader outputs corresponding to a list of records.
- get_reader_output_records(readers=None, versions=None, document_ids=None, timestamp=None, tenant=None, ontology_id=None, unique=False)[source]
Return reader output metadata records by querying the DART API
- Query json structure:
{"readers": ["MyAwesomeTool", "SomeOtherAwesomeTool"], "versions": ["3.1.4", "1.3.3.7"], "document_ids": ["qwerty1234", "poiuyt0987"], "timestamp": {"before": "yyyy-mm-ddThh:mm:ss", "after": "yyyy-mm-ddThh:mm:ss"}}
- Parameters:
readers (list) – A list of reader names
versions (list) – A list of versions to match with the reader name(s)
document_ids (list) – A list of document identifiers
timestamp (dict("before"|"after",str)) – The timestamp string must be formatted “yyyy-mm-ddThh:mm:ss”.
tenant (Optional[str]) – Return only records for the given tenant.
ontology_id (Optional[str]) – Return only records for the given ontology ID.
unique (Optional[bool]) – If true, records that are duplicates are collapsed. Default: False.
- Returns:
The JSON payload of the response from the DART API
- Return type:
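The documented query structure can be built as a plain dict and serialized to JSON; the reader names, versions, and document IDs below are the illustrative values from the structure above, not real records:

```python
import json

# Build a query payload following the documented DART query structure.
query = {
    "readers": ["MyAwesomeTool", "SomeOtherAwesomeTool"],
    "versions": ["3.1.4", "1.3.3.7"],
    "document_ids": ["qwerty1234", "poiuyt0987"],
    # Timestamps use the "yyyy-mm-ddThh:mm:ss" format.
    "timestamp": {"after": "2020-01-01T00:00:00",
                  "before": "2021-01-01T00:00:00"},
}
payload = json.dumps(query)
print(json.loads(payload)["readers"][0])  # MyAwesomeTool
```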
- indra_world.sources.dart.client.prioritize_records(records, priorities=None)[source]
Return unique records per reader and document prioritizing by version.
- Parameters:
- Returns:
records – A list of records that are unique per reader and document, picked by version priority when multiple records exist for the same reader and document.
- Return type:
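The described behavior can be sketched as a standalone function over plain dicts; the record field names (identity, document_id, version) are assumptions for illustration:

```python
# A sketch of version-priority deduplication: keep one record per
# (reader, document) pair, preferring versions that appear earlier in
# the priorities list; unknown versions rank last.
def prioritize(records, priorities=None):
    priorities = priorities or []

    def rank(rec):
        v = rec["version"]
        return priorities.index(v) if v in priorities else len(priorities)

    best = {}
    for rec in records:
        key = (rec["identity"], rec["document_id"])
        if key not in best or rank(rec) < rank(best[key]):
            best[key] = rec
    return list(best.values())

records = [
    {"identity": "eidos", "document_id": "doc1", "version": "1.0"},
    {"identity": "eidos", "document_id": "doc1", "version": "1.1"},
]
out = prioritize(records, priorities=["1.1", "1.0"])
print(out[0]["version"])  # 1.1
```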
Knowledge assembly modules (indra_world.assembly)
Statement preprocessing (indra_world.assembly.preprocess)
Assembly operations (indra_world.assembly.operations)
- class indra_world.assembly.operations.CompositionalRefinementFilter(ontology, nproc=None)[source]
-
Return a set of statement hashes that a given statement is potentially related to.
- Parameters:
stmt (indra.statements.Statement) – The INDRA statement whose potential relations we want to filter.
possibly_related (set or None) – A set of statement hashes that this statement is potentially related to, as determined by some other filter. If this parameter is a set (including an empty set), this function should return a subset of it (intuitively, this filter can only further eliminate some of the potentially related hashes that were previously determined to be potential relations). If this argument is None, the function must assume that no previous filter was run before, and should therefore return all the possible relations that it determines.
direction (str) – One of ‘less_specific’ or ‘more_specific’. Since refinements are directed relations, this function can operate in two different directions: it can either find less specific potentially related statements, or it can find more specific potentially related statements, as determined by this argument.
- Returns:
A set of INDRA Statement hashes that are potentially related to the given statement.
- Return type:
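The `possibly_related` contract described above can be sketched as a chain of filters: when the incoming set is `None` the filter seeds the candidate set, and when it is a set (possibly empty) the filter may only narrow it. The toy filters and hard-coded hash sets below are illustrative, not the library's filters.

```python
# Sketch of chaining refinement filters: each filter either seeds the
# candidate set (prior is None) or intersects with it (narrowing only).
def apply_filters(stmt_hash, filters):
    possibly_related = None
    for filt in filters:
        possibly_related = filt(stmt_hash, possibly_related)
    return possibly_related

# Two toy filters over hard-coded candidate tables for illustration.
def filter_a(stmt_hash, prior):
    found = {101, 102, 103}
    return found if prior is None else prior & found

def filter_b(stmt_hash, prior):
    found = {102, 103, 104}
    return found if prior is None else prior & found

candidates = apply_filters(1, [filter_a, filter_b])
```

Since each subsequent filter can only intersect with the prior set, the chain's result is the candidates both filters agree on.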
- indra_world.assembly.operations.get_expanded_events_influences(stmts)[source]
Return a list of all standalone events from a list of statements.
- indra_world.assembly.operations.location_matches_compositional(stmt)[source]
Return a matches_key which takes geo-location into account.
- indra_world.assembly.operations.location_refinement_compositional(st1, st2, ontology, entities_refined=True)[source]
Return True if there is a location-aware refinement between stmts.
- indra_world.assembly.operations.make_display_name(comp_grounding)[source]
Return display name from a compositional grounding with ‘of’ linkers.
- indra_world.assembly.operations.make_display_name_linear(comp_grounding)[source]
Return display name from compositional grounding with linear joining.
- indra_world.assembly.operations.merge_deltas(stmts_in)[source]
Gather and merge original Influence delta information from evidence.
This function is only applicable to Influence Statements that have subj and obj deltas. All other statement types are passed through unchanged. Polarities and adjectives for subjects and objects, respectively, are collected and merged by traversing all evidences of a Statement.
- Parameters:
stmts_in (list[indra.statements.Statement]) – A list of INDRA Statements whose influence deltas should be merged. These Statements are meant to have been preassembled and potentially have multiple pieces of evidence.
- Returns:
stmts_out – The list of Statements now with deltas merged at the Statement level.
- Return type:
list[indra.statements.Statement]
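The merging logic can be sketched on plain dicts. This is an illustrative simplification, not the library implementation: it collects adjectives across all evidences and keeps the first non-`None` polarity encountered.

```python
# Sketch of merging one delta (e.g., the subj delta) across a
# statement's evidences: gather adjectives, keep first known polarity.
def merge_delta(evidence_deltas):
    merged = {"polarity": None, "adjectives": []}
    for delta in evidence_deltas:
        if merged["polarity"] is None:
            merged["polarity"] = delta.get("polarity")
        merged["adjectives"].extend(delta.get("adjectives", []))
    return merged

merged = merge_delta([
    {"polarity": None, "adjectives": ["significant"]},
    {"polarity": 1, "adjectives": ["severe"]},
])
```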
Matches functions (indra_world.assembly.matches)
- indra_world.assembly.matches.event_location_time_matches(event)[source]
Return Event matches key which takes location and time into account.
- indra_world.assembly.matches.get_location(stmt)[source]
Return the grounded geo-location context associated with a Statement.
- indra_world.assembly.matches.get_location_from_object(loc_obj)[source]
Return geo-location from a RefContext location object.
- indra_world.assembly.matches.get_time(stmt)[source]
Return the time context associated with a Statement.
- indra_world.assembly.matches.has_location(stmt)[source]
Return True if a Statement has grounded geo-location context.
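The effect of a location- and time-aware matches key can be sketched simply: the key is extended with geo-location and time context so that otherwise-identical events in different contexts do not collapse into a single statement. The tuple layout below is an assumption for illustration.

```python
# Sketch: a matches key extended with location and time context.
def event_matches_key(concept, location=None, time=None):
    return (concept, str(location), str(time))

k1 = event_matches_key("flooding", location="Ethiopia", time="2021-05")
k2 = event_matches_key("flooding", location="South Sudan", time="2021-05")
```

Two "flooding" events in different locations yield different keys, while events with identical context share a key and are merged during assembly.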
Refinement functions (indra_world.assembly.refinement)
- class indra_world.assembly.refinement.CompositionalRefinementFilter(ontology, nproc=None)[source]
Return a set of statement hashes that a given statement is potentially related to.
- Parameters:
stmt (indra.statements.Statement) – The INDRA statement whose potential relations we want to filter.
possibly_related (set or None) – A set of statement hashes that this statement is potentially related to, as determined by some other filter. If this parameter is a set (including an empty set), this function should return a subset of it (intuitively, this filter can only further eliminate some of the potentially related hashes that were previously determined to be potential relations). If this argument is None, the function must assume that no previous filter was run before, and should therefore return all the possible relations that it determines.
direction (str) – One of ‘less_specific’ or ‘more_specific’. Since refinements are directed relations, this function can operate in two different directions: it can either find less specific potentially related statements, or it can find more specific potentially related statements, as determined by this argument.
- Returns:
A set of INDRA Statement hashes that are potentially related to the given statement.
- Return type:
- indra_world.assembly.refinement.event_location_refinement(st1, st2, ontology, entities_refined, ignore_polarity=False)[source]
Return True if there is a location-aware refinement between Events.
- indra_world.assembly.refinement.event_location_time_refinement(st1, st2, ontology, entities_refined)[source]
Return True if there is a location/time refinement between Events.
- indra_world.assembly.refinement.get_agent_key(agent, comp_idx)[source]
Return a key for an Agent for use in refinement finding.
- Parameters:
agent (indra.statements.Agent or None) – An INDRA Agent whose key should be returned.
- Returns:
The key that maps the given agent to the ontology, with special handling for ungrounded and None Agents.
- Return type:
tuple or None
- indra_world.assembly.refinement.location_refinement(st1, st2, ontology, entities_refined)[source]
Return True if there is a location-aware refinement between stmts.
Incremental Assembler (indra_world.assembly.incremental_assembler)
- class indra_world.assembly.incremental_assembler.AssemblyDelta(new_stmts, new_evidences, new_refinements, beliefs, matches_fun=None)[source]
Represents changes to the assembly structure as a result of new statements added to a set of existing statements.
- new_evidences
A dict of new evidences for existing or new statements keyed by statement hash.
- new_refinements
A list of statement hash pairs representing new refinement links.
- beliefs
A dict of belief scores keyed by all statement hashes (both old and new).
- class indra_world.assembly.incremental_assembler.IncrementalAssembler(prepared_stmts, refinement_filters=None, matches_fun=<function location_matches_compositional>, curations=None, post_processing_steps=None, ontology=None)[source]
Assemble a set of prepared statements and allow incremental extensions.
- Parameters:
prepared_stmts (list[indra.statements.Statement]) – A list of prepared INDRA Statements.
refinement_filters (Optional[list[indra.preassembler.refinement.RefinementFilter]]) – A list of refinement filter classes to be used for refinement finding. Default: the standard set of compositional refinement filters.
matches_fun (Optional[function]) – A custom matches function for determining matching statements and calculating hashes. Default: matches function that takes compositional grounding and location into account.
curations (dict[dict]) – A dict of user curations to be integrated into the assembly results, keyed by statement hash.
post_processing_steps (list[dict]) – Steps that can be used in an INDRA AssemblyPipeline to do post-processing on statements.
- refinement_edges
A set of tuples of statement hashes representing refinement links between statements.
- Type:
- add_statements(stmts)[source]
Add new statements for incremental assembly.
- Parameters:
stmts (list[indra.statements.Statement]) – A list of new prepared statements to be incrementally assembled into the set of existing statements.
- Returns:
An AssemblyDelta object representing the changes to the assembly as a result of the new added statements.
- Return type:
- static build_refinements_graph(stmts_by_hash, refinement_edges)[source]
Return a refinements graph based on statements and refinement edges.
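A refinements graph can be sketched as a plain adjacency mapping over statement hashes. The edge direction (more specific → more general) is an assumption made for illustration; the library builds an analogous graph structure from its refinement edges.

```python
# Sketch: build an adjacency mapping from refinement edges, where each
# (specific, general) pair records that `specific` refines `general`.
def build_graph(refinement_edges):
    graph = {}
    for specific, general in refinement_edges:
        graph.setdefault(specific, set()).add(general)
    return graph

graph = build_graph([(1, 2), (1, 3), (2, 3)])
```

Such a structure makes it cheap to look up everything a given statement refines when new statements arrive incrementally.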
Statistics (indra_world.assembly.stats)
Ontology Module (indra_world.ontology)
Module containing the implementation of an IndraOntology for the World Modelers use case.
World Ontology (indra_world.ontology.ontology)
- class indra_world.ontology.ontology.WorldOntology(url, yml=None)[source]
Represents the ontology used for World Modelers applications.
- Parameters:
url (str) – The URL or file path pointing to a World Modelers ontology YAML.
- add_entry(entry, examples=None, neg_examples=None)[source]
Add a new ontology entry with examples.
This works by adding the entry to the yml attribute first and then reloading the entire yaml to build a new graph.
- build_relations(node, tree, prefix)[source]
Build relations for the classic ontology format <= v3.0
- build_relations_new_format(node, prefix)[source]
Build relations for the new ontology format > v3.0
Belief Engine (indra_world.belief)
- indra_world.belief.get_eidos_bayesian_scorer(prior_counts=None)[source]
Return a BayesianScorer based on Eidos curation counts.
- Returns:
A BayesianScorer belief scorer instance.
- Return type:
scorer
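The intuition behind evidence-based belief can be sketched in a toy form: assuming an independent random-error probability per reader, belief is one minus the probability that every supporting evidence is wrong. The per-reader error rates below are made up for illustration and are not the scorer's actual priors.

```python
# Toy sketch of evidence-based belief (not the BayesianScorer itself).
def belief(evidence_sources, error_prob):
    p_all_wrong = 1.0
    for source in evidence_sources:
        p_all_wrong *= error_prob.get(source, 0.5)
    return 1.0 - p_all_wrong

# Two independent Eidos evidences with an assumed 0.3 error rate each.
score = belief(["eidos", "eidos"], {"eidos": 0.3})
```

More independent evidence drives the belief score toward 1, which matches the assembly-level behavior of accumulating support for a statement.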
Output assemblers (indra_world.assemblers)
In contrast to INDRA, output/model assemblers play only a minor role in INDRA World, since other systems such as Delphi (https://github.com/ml4ai/delphi) and DySE (https://dl.acm.org/doi/10.1145/3359115.3359123) take on the role of converting assembled INDRA Statements into probabilistic and logical dynamical models, respectively.
CAG Assembler (indra_world.assemblers.cag)
Assemble simple graphs of assembled INDRA Statements that can be embedded into websites or notebooks.
Assembler (indra_world.assemblers.cag.assembler)
- class indra_world.assemblers.cag.assembler.CAGAssembler(stmts=None)[source]
Assembles a causal analysis graph from INDRA Statements.
- Parameters:
stmts (Optional[list[indra.statements.Statement]]) – A list of INDRA Statements to be assembled. Currently only Influence Statements are supported.
- CAG
A networkx MultiDiGraph object representing the causal analysis graph.
- Type:
nx.MultiDiGraph
- export_to_cytoscapejs()[source]
Return CAG in format readable by CytoscapeJS.
- Returns:
A JSON-like dict representing the graph for use with CytoscapeJS.
- Return type:
- generate_jupyter_js(cyjs_style=None, cyjs_layout=None)[source]
Generate Javascript from a template to run in Jupyter notebooks.
- Parameters:
cyjs_style (Optional[dict]) – A dict that sets CytoscapeJS style as specified in https://github.com/cytoscape/cytoscape.js/blob/master/documentation/md/style.md.
cyjs_layout (Optional[dict]) – A dict that sets CytoscapeJS layout parameters.
- Returns:
A Javascript string to be rendered in a Jupyter notebook cell.
- Return type:
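The general shape of a CytoscapeJS-style export can be sketched as follows. The `data`/`id`/`source`/`target` fields follow CytoscapeJS element conventions; the assembler's exact node and edge attributes are not shown here.

```python
# Sketch: convert (source, target) edges into CytoscapeJS-style
# elements with "nodes" and "edges" lists.
def to_cyjs(edges):
    nodes = sorted({n for edge in edges for n in edge})
    return {
        "nodes": [{"data": {"id": n}} for n in nodes],
        "edges": [{"data": {"source": s, "target": t}} for s, t in edges],
    }

elements = to_cyjs([("rainfall", "crop_yield")])
```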
Figaro Assembler (indra_world.assemblers.figaro)
A proof-of-concept assembler for INDRA Statements into probabilistic programs in the Figaro (https://github.com/p2t2/figaro) framework.
Assembler (indra_world.assemblers.figaro.assembler)
TSV Assembler (indra_world.assemblers.tsv)
Assemble tab-separated spreadsheets of assembled INDRA Statements for curation purposes.
INDRA World Service (indra_world.service)
Note: This is the documentation of the codebase used in the INDRA World service. Documentation of the service API can be found here.
INDRA World Database (indra_world.service.db)
Database Manager (indra_world.service.db.manager)
- class indra_world.service.db.manager.DbManager(url)[source]
Manages transactions with the assembly database and exposes an API for various operations.
- add_curation_for_project(project_id, stmt_hash, curation)[source]
Add curations for a given project.
- add_dart_record(reader, reader_version, document_id, storage_key, date, output_version=None, labels=None, tenants=None)[source]
Insert a DART record into the database.
- add_records_for_project(project_id, record_keys)[source]
Add document IDs for a project with the given ID.
- add_statements_for_record(record_key, stmts, indra_version)[source]
Add a set of prepared statements for a given document.
- get_corpus_for_project(project_id)[source]
Return the corpus ID that a project was derived from, if available.
- get_dart_records(reader=None, document_id=None, reader_version=None, output_version=None, labels=None, tenants=None)[source]
Return storage keys for DART records given constraints.
- get_full_dart_records(reader=None, document_id=None, reader_version=None, output_version=None, labels=None, tenants=None)[source]
Return full DART records given constraints.
- get_statements_for_document(document_id, reader=None, reader_version=None, indra_version=None)[source]
Return prepared statements for a given document.
Database Schema (indra_world.service.db.schema)
Service controller (indra_world.service.controller)
REST API (indra_world.service.app)
- class indra_world.service.app.EidosProcessJsonld(api=None, *args, **kwargs)[source]
- post()[source]
Process an EIDOS JSON-LD and return INDRA Statements.
- Parameters:
jsonld (str) – The JSON-LD string to be processed.
grounding_ns (Optional[list]) – A list of name spaces for which INDRA should represent groundings, when given. If not specified or None, all grounding name spaces are propagated. If an empty list, no groundings are propagated. Example: [‘UN’, ‘WM’], Default: None
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’, ‘association’, ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns:
statements – A list of extracted INDRA Statements.
- Return type:
list[indra.statements.Statement.to_json()]
- class indra_world.service.app.EidosProcessText(api=None, *args, **kwargs)[source]
- post()[source]
Process text with EIDOS and return INDRA Statements.
- Parameters:
text (str) – The text to be processed.
webservice (Optional[str]) – An Eidos reader web service URL to send the request to. If None, the reading is assumed to be done with the Eidos JAR rather than via a web service. Default: None
grounding_ns (Optional[list]) – A list of name spaces for which INDRA should represent groundings, when given. If not specified or None, all grounding name spaces are propagated. If an empty list, no groundings are propagated. Example: [‘UN’, ‘WM’], Default: None
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’, ‘association’, ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns:
statements – A list of extracted INDRA Statements.
- Return type:
list[indra.statements.Statement.to_json()]
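A request body for this endpoint, mirroring the parameters documented above, might look like the following. The endpoint URL itself is deployment-specific and not shown; the text and parameter values are illustrative.

```python
import json

# Hypothetical JSON body for the Eidos text-processing endpoint.
payload = {
    "text": "Heavy rainfall caused flooding in the region.",
    "webservice": None,               # None: read with the local Eidos JAR
    "extract_filter": ["influence"],  # only extract Influence statements
    "grounding_mode": "compositional",
}
body = json.dumps(payload)
```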
- class indra_world.service.app.GetAllRecords(api=None, *args, **kwargs)[source]
- class indra_world.service.app.GetProjectCurations(api=None, *args, **kwargs)[source]
- class indra_world.service.app.GetProjectRecords(api=None, *args, **kwargs)[source]
- class indra_world.service.app.GetProjects(api=None, *args, **kwargs)[source]
- class indra_world.service.app.Notify(api=None, *args, **kwargs)[source]
- class indra_world.service.app.SofiaProcessJson(api=None, *args, **kwargs)[source]
- post()[source]
Process a Sofia JSON and return INDRA Statements.
- Parameters:
json (str) – The JSON string to be processed.
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’, ‘association’, ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns:
statements – A list of extracted INDRA Statements.
- Return type:
list[indra.statements.Statement.to_json()]
- class indra_world.service.app.SubmitCurations(api=None, *args, **kwargs)[source]
- post()[source]
Submit curations.
- Parameters:
- Returns:
mappings – For any statement whose matches hash changed due to the curations submitted here, a mapping from the old hash to the new hash (after applying the curation) is given. Statements whose hash didn’t change, or whose curation could not be applied for some reason, are not included in the return value.
- Return type:
Corpus manager (indra_world.service.corpus_manager)
This module allows running one-off assembly on a set of DART records (i.e., reader outputs) into a ‘seed corpus’ that can be dumped on S3 for loading into CauseMos.
- class indra_world.service.corpus_manager.CorpusManager(db_url, dart_records, corpus_id, metadata, dart_client=None, tenant=None, ontology=None)[source]
Corpus manager class allowing running assembly on a set of DART records.
- assemble()[source]
Run assembly on the prepared statements.
This function loads all the prepared statements associated with the corpus and then runs assembly on them.
- prepare(records_exist=False)[source]
Run the preprocessing pipeline on statements.
This function adds the new corpus to the DB, adds records to the new corpus, then processes the reader outputs for those records into statements, preprocesses the statements, and then stores these prepared statements in the DB.