INDRA World
World Modelers INDRA service stack
Using the INDRA World API
The API is deployed and documented at wm.indra.bio.
Setting up the INDRA World API locally
These instructions describe setting up and using the INDRA service stack for World Modelers applications.
First, you need to build the INDRA World Docker image as follows:
git clone https://github.com/indralab/indra_world.git
cd indra_world/docker
docker build --tag indra_world:latest .
Then, in the same folder, do:
docker-compose up -d
to run the INDRA World service as well as an associated Postgres container with the relational database used by the service. The docker-compose file reads secret configuration values for accessing various resources from two files: indra_world.env and indra_world_db.env. These files are not part of the public code and need to be added manually.
INDRA assemblies on S3
Access to the INDRA-assembled corpora requires credentials to the shared World Modelers S3 bucket “world-modelers”. Each INDRA-assembled corpus is available within this bucket, under the “indra_models” key base. Each corpus is identified by a string identifier.
The corpus index
The list of corpora can be obtained either using S3’s list objects function or by reading the index.csv file, which is maintained by INDRA. This index is a comma-separated values text file which contains one row for each corpus. Each row’s first element is a corpus identifier, and the second element is the UTC date-time at which the corpus was uploaded to S3. An example row in this file looks as follows:
test1_newlines,2020-05-08-22-34-29
where test1_newlines is the corpus identifier and 2020-05-08-22-34-29 is the upload date-time.
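As a minimal sketch of reading such rows, the snippet below parses index content with Python's standard csv and datetime modules. The function name parse_index is an illustrative assumption, and the timestamp layout is inferred from the example row above rather than taken from a specification.

```python
import csv
import io
from datetime import datetime, timezone

# Layout inferred from the example row: year-month-day-hour-minute-second.
TIMESTAMP_FORMAT = "%Y-%m-%d-%H-%M-%S"


def parse_index(index_text):
    """Return a dict mapping corpus identifiers to UTC upload datetimes."""
    corpora = {}
    for row in csv.reader(io.StringIO(index_text)):
        if not row:
            continue  # skip blank lines
        corpus_id, uploaded = row[0], row[1]
        timestamp = datetime.strptime(uploaded, TIMESTAMP_FORMAT)
        # The index stores UTC times, so attach the UTC time zone explicitly.
        corpora[corpus_id] = timestamp.replace(tzinfo=timezone.utc)
    return corpora


index = parse_index("test1_newlines,2020-05-08-22-34-29\n")
print(index["test1_newlines"].isoformat())  # 2020-05-08T22:34:29+00:00
```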
Structure of each corpus
Within the world-modelers bucket, under the indra_models key base, files for each corpus are organized under a subkey equivalent to the corpus identifier. For instance, all the files for the test1_newlines corpus are under the indra_models/test1_newlines/ key base. The files for each corpus are as follows:
statements.json: a JSON dump of assembled INDRA Statements. As of May 2020, each statement’s JSON representation is on a separate line in this file. Any corpus uploaded before that has a standard JSON structure. This is the main file that CauseMos needs to ingest for UI interaction.
raw_statements.json: a JSON dump of raw INDRA Statements. This file is typically not needed in downstream usage. However, the INDRA curation service needs to have access to it for internal assembly tasks.
metadata.json: a JSON file containing key-value pairs that describe the corpus. The standard keys in this file are as follows:
corpus_id: the ID of the corpus (redundant with the corresponding entry in the index).
description: a human-readable description of how the corpus was obtained.
display_name: a human-readable display name for the corpus.
readers: a list of the names of the reading systems from which statements were obtained in the corpus.
assembly: a dictionary identifying attributes of the assembly process with the following keys:
level: the level of resolution used to assemble the corpus (e.g., “location_and_time”).
grounding_threshold: the threshold (if any) which was used to filter statements by grounding score (e.g., 0.7)
num_statements: the number of assembled INDRA Statements in the corpus (i.e., statements.json).
num_documents: the number of documents that were read by readers to produce the statements that were assembled.
Note that any of these keys may be missing if unavailable, for instance, in the case of old uploads.
curations.json: a JSON file which persists curations as collected by INDRA. This is the basis of surfacing reader-specific curations in the download_curation endpoint (see above).
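The corpus layout described above can be handled with a few small helpers. The following is a sketch only: the helper names are hypothetical, the older "standard JSON structure" is assumed to be a single JSON list, and fetching the file content from S3 is left out.

```python
import json

KEY_BASE = "indra_models"  # key base named in the text above


def corpus_key(corpus_id, file_name):
    """Return the S3 key of a file belonging to a corpus."""
    return f"{KEY_BASE}/{corpus_id}/{file_name}"


def load_statement_jsons(text):
    """Parse statements.json content in either layout.

    Older corpora are assumed to hold one JSON list; newer ones hold one
    JSON object per line.
    """
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        # Not a single JSON document: treat it as one statement per line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    return data if isinstance(data, list) else [data]


def describe_corpus(metadata):
    """Read a few metadata fields, tolerating keys that are missing."""
    assembly = metadata.get("assembly") or {}
    return {
        "corpus_id": metadata.get("corpus_id"),
        "display_name": metadata.get("display_name"),
        "level": assembly.get("level"),
    }


print(corpus_key("test1_newlines", "statements.json"))
print(len(load_statement_jsons('{"type": "Influence"}\n{"type": "Event"}\n')))
```

Using dict.get throughout reflects the note that any metadata key may be absent in old uploads.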
INDRA World Modules Reference
Knowledge Sources (indra_world.sources)
Eidos (indra_world.sources.eidos)
API (indra_world.sources.eidos.api)
- indra_world.sources.eidos.api.process_json(json_dict, grounding_ns=None, extract_filter=None, grounding_mode=None)[source]
Return an EidosProcessor by processing an Eidos JSON-LD dict.
- Parameters
json_dict (dict) – The JSON-LD dict to be processed.
grounding_ns (Optional[list]) – A list of name spaces for which INDRA should represent groundings, when given. If not specified or None, all grounding name spaces are propagated. If an empty list, no groundings are propagated. Example: [‘UN’, ‘WM’], Default: None
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’, ‘association’, ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns
ep – An EidosProcessor containing the extracted INDRA Statements in its statements attribute.
- Return type
EidosProcessor
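The three cases of the grounding_ns argument (None, an empty list, and a list of name spaces) can be illustrated with a small standalone sketch. This is not the library's implementation: the function filter_groundings, the "OTHER" name space, and the grounding values are made up for illustration.

```python
def filter_groundings(db_refs, grounding_ns=None):
    """Keep only the grounding entries whose name space is requested.

    None keeps everything, an empty list keeps nothing, and any other list
    keeps just the listed name spaces.
    """
    if grounding_ns is None:
        return dict(db_refs)
    return {ns: value for ns, value in db_refs.items() if ns in grounding_ns}


# Hypothetical groundings keyed by name space; the values are placeholders.
refs = {"UN": "example/un/entry", "WM": "example/wm/entry", "OTHER": "example"}
print(filter_groundings(refs))                # all three entries
print(filter_groundings(refs, ["UN", "WM"]))  # only the UN and WM entries
print(filter_groundings(refs, []))            # {}
```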
- indra_world.sources.eidos.api.process_json_file(file_name, grounding_ns=None, extract_filter=None, grounding_mode='flat')[source]
Return an EidosProcessor by processing the given Eidos JSON-LD file.
This function is useful if the output from Eidos is saved as a file and needs to be processed.
- Parameters
file_name (str) – The name of the JSON-LD file to be processed.
grounding_ns (Optional[list]) – A list of name spaces for which INDRA should represent groundings, when given. If not specified or None, all grounding name spaces are propagated. If an empty list, no groundings are propagated. Example: [‘UN’, ‘WM’], Default: None
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’, ‘association’, ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns
ep – An EidosProcessor containing the extracted INDRA Statements in its statements attribute.
- Return type
EidosProcessor
- indra_world.sources.eidos.api.process_json_str(json_str, grounding_ns=None, extract_filter=None, grounding_mode='flat')[source]
Return an EidosProcessor by processing the Eidos JSON-LD string.
- Parameters
json_str (str) – The JSON-LD string to be processed.
grounding_ns (Optional[list]) – A list of name spaces for which INDRA should represent groundings, when given. If not specified or None, all grounding name spaces are propagated. If an empty list, no groundings are propagated. Example: [‘UN’, ‘WM’], Default: None
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’, ‘association’, ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns
ep – An EidosProcessor containing the extracted INDRA Statements in its statements attribute.
- Return type
EidosProcessor
- indra_world.sources.eidos.api.process_text(text, save_json='eidos_output.json', webservice=None, grounding_ns=None, extract_filter=None, grounding_mode='flat')[source]
Return an EidosProcessor by processing the given text.
This constructs a reader object via Java and extracts mentions from the text. It then serializes the mentions into JSON and processes the result with process_json.
- Parameters
text (str) – The text to be processed.
save_json (Optional[str]) – The name of a file in which to dump the JSON output of Eidos.
webservice (Optional[str]) – An Eidos reader web service URL to send the request to. If None, the reading is assumed to be done with the Eidos JAR rather than via a web service. Default: None
grounding_ns (Optional[list]) – A list of name spaces for which INDRA should represent groundings, when given. If not specified or None, all grounding name spaces are propagated. If an empty list, no groundings are propagated. Example: [‘UN’, ‘WM’], Default: None
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’, ‘association’, ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns
ep – An EidosProcessor containing the extracted INDRA Statements in its statements attribute.
- Return type
EidosProcessor
- indra_world.sources.eidos.api.reground_texts(texts, ont_yml, webservice=None, topk=10, filter=True, is_canonicalized=True)[source]
Return grounding for concept texts given an ontology.
- Parameters
ont_yml (str) – A serialized YAML string representing the ontology.
webservice (Optional[str]) – The address where the Eidos web service is running, e.g., http://localhost:9000. If None, a local Eidos JAR is invoked via pyjnius. Default: None
topk (Optional[int]) – The number of top scoring groundings to return. Default: 10
is_canonicalized (Optional[bool]) – If True, the texts are assumed to be canonicalized. If False, Eidos will canonicalize the texts which yields much better groundings but is slower. Default: False
filter (Optional[bool]) – If True, Eidos filters the ontology to remove determiners from examples and other similar operations. Should typically be set to True. Default: True
- Returns
A list of the top k scored groundings for each text in the list.
- Return type
Client (indra_world.sources.eidos.client)
- indra_world.sources.eidos.client.grounding_dict_to_list(groundings)[source]
Transform the webservice response into a flat list.
- indra_world.sources.eidos.client.reground_texts(texts, ont_yml, webservice, topk=10, is_canonicalized=False, filter=True, cache_path=None)[source]
Ground concept texts given an ontology with an Eidos web service.
- Parameters
ont_yml (str) – A serialized YAML string representing the ontology.
webservice (str) – The address where the Eidos web service is running, e.g., http://localhost:9000.
topk (Optional[int]) – The number of top scoring groundings to return. Default: 10
is_canonicalized (Optional[bool]) – If True, the texts are assumed to be canonicalized. If False, Eidos will canonicalize the texts which yields much better groundings but is slower. Default: False
filter (Optional[bool]) – If True, Eidos filters the ontology to remove determiners from examples and other similar operations. Should typically be set to True. Default: True
- Returns
A JSON dict of the results from the Eidos webservice.
- Return type
Migration Table Processor (indra_world.sources.eidos.migration_table_processor)
Processor (indra_world.sources.eidos.processor)
- class indra_world.sources.eidos.processor.EidosProcessorCompositional(json_dict, grounding_ns)[source]
Bases:
indra_world.sources.eidos.processor.EidosWorldProcessor
- class indra_world.sources.eidos.processor.EidosWorldProcessor(json_dict, grounding_ns)[source]
Bases:
indra.sources.eidos.processor.EidosProcessor
Hume (indra_world.sources.hume)
Hume is a general purpose reading system developed by BBN.
Currently, INDRA can process JSON-LD files produced by Hume. When available, the API will be extended with access to the reader as a service.
API (indra_world.sources.hume.api)
- indra_world.sources.hume.api.process_jsonld(jsonld, extract_filter=None, grounding_mode=None)[source]
Process a JSON-LD string in the new format to extract Statements.
- Parameters
jsonld (dict) – The JSON-LD object to be processed.
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’ and ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns
A HumeProcessor instance, which contains a list of INDRA Statements as its statements attribute.
- Return type
indra_world.sources.hume.HumeProcessor
- indra_world.sources.hume.api.process_jsonld_file(fname, extract_filter=None, grounding_mode='flat')[source]
Process a JSON-LD file in the new format to extract Statements.
- Parameters
fname (str) – The path to the JSON-LD file to be processed.
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’ and ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns
A HumeProcessor instance, which contains a list of INDRA Statements as its statements attribute.
- Return type
indra_world.sources.hume.HumeProcessor
Processor (indra_world.sources.hume.processor)
- class indra_world.sources.hume.processor.HumeJsonLdProcessor(json_dict)[source]
This processor extracts INDRA Statements from Hume JSON-LD output.
- Parameters
json_dict (dict) – A JSON dictionary containing the Hume extractions in JSON-LD format.
- tree
The objectpath Tree object representing the extractions.
- Type
objectpath.Tree
Sofia (indra_world.sources.sofia)
Sofia is a general purpose natural language processing system developed at UPitt and CMU by N. Miskov et al.
API (indra_world.sources.sofia.api)
- indra_world.sources.sofia.api.process_json(json_obj, extract_filter=None, grounding_mode=None)[source]
Return processor by processing a JSON object returned by Sofia.
- Parameters
json_obj (json) – A JSON object containing extractions from Sofia.
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’ and ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns
sp – A SofiaProcessor object which has a list of extracted INDRA Statements as its statements attribute.
- Return type
indra.sources.sofia.processor.SofiaProcessor
- indra_world.sources.sofia.api.process_json_file(fname, extract_filter=None, grounding_mode='flat')[source]
Return processor by processing a JSON file produced by Sofia.
- Parameters
fname (str) – The name of the JSON file to process
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’ and ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns
A SofiaProcessor object which has a list of extracted INDRA Statements as its statements attribute.
- Return type
indra.sources.sofia.processor.SofiaProcessor
- indra_world.sources.sofia.api.process_table(fname, extract_filter=None, grounding_mode='flat')[source]
Return processor by processing a given sheet of a spreadsheet file.
- Parameters
fname (str) – The name of the Excel file (typically .xlsx extension) to process
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’ and ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns
sp – A SofiaProcessor object which has a list of extracted INDRA Statements as its statements attribute.
- Return type
indra.sources.sofia.processor.SofiaProcessor
- indra_world.sources.sofia.api.process_text(text, out_file='sofia_output.json', auth=None, extract_filter=None, grounding_mode='flat')[source]
Return processor by processing text given as a string.
- Parameters
text (str) – A string containing the text to be processed with Sofia.
out_file (Optional[str]) – The path to a file to save the reader’s output into. Default: sofia_output.json
auth (Optional[list]) – A username/password pair for the Sofia web service. If not given, the SOFIA_USERNAME and SOFIA_PASSWORD values are loaded from either the INDRA config or the environment.
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’ and ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns
sp – A SofiaProcessor object which has a list of extracted INDRA Statements as its statements attribute. If the API did not process the text, None is returned.
- Return type
indra.sources.sofia.processor.SofiaProcessor
Processor (indra_world.sources.sofia.processor)
- class indra_world.sources.sofia.processor.SofiaExcelProcessor(relation_rows, event_rows, entity_rows, **kwargs)[source]
Bases:
indra_world.sources.sofia.processor.SofiaProcessor
An Excel processor extracting statements from reading done by Sofia
- extract_events(event_rows, relation_rows)[source]
Extract Event statements of a Sofia document in Excel format
- class indra_world.sources.sofia.processor.SofiaJsonProcessor(jd, **kwargs)[source]
Bases:
indra_world.sources.sofia.processor.SofiaProcessor
A JSON processor extracting statements from reading done by Sofia
- class indra_world.sources.sofia.processor.SofiaProcessor(score_cutoff=None, grounding_mode='flat')[source]
Bases:
object
A processor extracting statements from reading done by Sofia
- get_event(event_entry)[source]
Get an Event with the pre-set grounding mode
The grounding mode is set at initialization of the class and is stored in the attribute grounding_mode.
CWMS (indra_world.sources.cwms)
CWMS is a variant of the TRIPS system. It is a general purpose natural language understanding system with applications in world modeling. For more information, see: http://trips.ihmc.us/parser/cgi/cwmsreader
API (indra_world.sources.cwms.api)
- indra_world.sources.cwms.api.process_ekb(ekb_str, extract_filter=None, grounding_mode='flat')[source]
Processes an EKB string produced by CWMS.
- Parameters
ekb_str (str) – EKB string to process
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’, ‘association’, ‘event’ and ‘migration’. If not given, only Influences are extracted since processing other relation types can be time consuming. This argument can be used if the extraction of other relation types such as Events are also of interest.
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns
cp – A CWMSProcessor, which contains a list of INDRA statements in its statements attribute.
- Return type
indra.sources.cwms.CWMSProcessor
- indra_world.sources.cwms.api.process_ekb_file(fname, extract_filter=None, grounding_mode='flat')[source]
Processes an EKB file produced by CWMS.
- Parameters
fname (str) – Path to the EKB file to process.
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’, ‘association’, ‘event’ and ‘migration’. If not given, only Influences are extracted since processing other relation types can be time consuming. This argument can be used if the extraction of other relation types such as Events are also of interest.
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns
cp – A CWMSProcessor, which contains a list of INDRA statements in its statements attribute.
- Return type
indra.sources.cwms.CWMSProcessor
- indra_world.sources.cwms.api.process_text(text, save_xml='cwms_output.xml', extract_filter=None, grounding_mode='flat')[source]
Processes text using the CWMS web service.
- Parameters
text (str) – Text to process
save_xml (Optional[str]) – A file name in which to dump the output from CWMS. Default: cwms_output.xml
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’, ‘association’, ‘event’ and ‘migration’. If not given, only Influences are extracted since processing other relation types can be time consuming. This argument can be used if the extraction of other relation types such as Events are also of interest.
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns
cp – A CWMSProcessor, which contains a list of INDRA statements in its statements attribute.
- Return type
indra.sources.cwms.CWMSProcessor
Processor (indra_world.sources.cwms.processor)
- class indra_world.sources.cwms.processor.CWMSProcessor(xml_string)[source]
Bases:
object
The CWMSProcessor currently extracts causal relationships between terms (nouns) in EKB. In the future, this processor can be extended to extract other types of relations, or to extract relations involving events.
For more details on the TRIPS EKB XML format, see http://trips.ihmc.us/parser/cgi/drum
- Parameters
xml_string (str) – A TRIPS extraction knowledge base (EKB) in XML format as a string.
- tree
An ElementTree object representation of the TRIPS EKB XML.
DART (indra_world.sources.dart)
API (indra_world.sources.dart.api)
Client (indra_world.sources.dart.client)
A client for accessing reader output from the DART system.
- class indra_world.sources.dart.client.DartClient(storage_mode='web', dart_url=None, dart_uname=None, dart_pwd=None, local_storage=None)[source]
A client for the DART web service with optional local storage.
- Parameters
storage_mode (Optional[str]) – If web, the configured DART URL and credentials are used to communicate with the DART web service. If local, a local storage is used to access and store reader outputs.
dart_url (Optional[str]) – The DART service URL. If given, it overrides the DART_WM_URL configuration value.
dart_uname (Optional[str]) – The DART service user name. If given, it overrides the DART_WM_USERNAME configuration value.
dart_pwd (Optional[str]) – The DART service password. If given, it overrides the DART_WM_PASSWORD configuration value.
local_storage (Optional[str]) – A path that points to a folder for local storage. If the storage_mode is web, this local_storage is used as a local cache. If the storage_mode is local, it is used as the primary location to access reader outputs. If given, it overrides the INDRA_WM_CACHE configuration value.
- cache_record(record, overwrite=False)[source]
Download and cache a given record in local storage.
- Parameters
record (dict) – A DART record.
- cache_records(records, overwrite=False)[source]
Download and cache a list of records in local storage.
- download_output(storage_key)[source]
Return content from the DART web service based on its storage key.
- get_outputs_from_records(records)[source]
Return reader outputs corresponding to a list of records.
- get_reader_output_records(readers=None, versions=None, document_ids=None, timestamp=None)[source]
Return reader output metadata records by querying the DART API
- Query json structure:
{"readers": ["MyAwesomeTool", "SomeOtherAwesomeTool"], "versions": ["3.1.4", "1.3.3.7"], "document_ids": ["qwerty1234", "poiuyt0987"], "timestamp": {"before": "yyyy-mm-dd"|"yyyy-mm-dd hh:mm:ss", "after": "yyyy-mm-dd"|"yyyy-mm-dd hh:mm:ss", "on": "yyyy-mm-dd"}}
- Parameters
readers (list) – A list of reader names
versions (list) – A list of versions to match with the reader name(s)
document_ids (list) – A list of document identifiers
timestamp (dict("on"|"before"|"after",str)) – The timestamp string must be of the format “yyyy-mm-dd” or “yyyy-mm-dd hh:mm:ss” (the latter only for “before” and “after”).
- Returns
The JSON payload of the response from the DART API
- Return type
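The query structure above can be assembled and checked before it is sent. The sketch below is a standalone illustration, not the DartClient code: the function name build_reader_output_query is hypothetical, and dropping arguments that were not given is an assumption about how the query should be shaped.

```python
import json
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def _parses(value, fmt):
    """Return True if value can be parsed with the given strptime format."""
    try:
        datetime.strptime(value, fmt)
    except ValueError:
        return False
    return True


def build_reader_output_query(readers=None, versions=None,
                              document_ids=None, timestamp=None):
    """Assemble the query dict, leaving out arguments that were not given."""
    query = {}
    for key, values in (("readers", readers), ("versions", versions),
                        ("document_ids", document_ids)):
        if values:
            query[key] = list(values)
    if timestamp:
        for key, value in timestamp.items():
            if key not in ("on", "before", "after"):
                raise ValueError(f"Unknown timestamp key: {key}")
            # Only "before" and "after" may carry a time of day.
            formats = [DATE_FORMAT] if key == "on" else [DATE_FORMAT, DATETIME_FORMAT]
            if not any(_parses(value, fmt) for fmt in formats):
                raise ValueError(f"Badly formatted timestamp for {key!r}: {value}")
        query["timestamp"] = dict(timestamp)
    return query


query = build_reader_output_query(readers=["eidos"],
                                  timestamp={"after": "2020-05-08 22:34:29"})
print(json.dumps(query))
```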
- indra_world.sources.dart.client.prioritize_records(records, priorities=None)[source]
Return unique records per reader and document prioritizing by version.
- Parameters
records (list of dict) – A list of records returned from the reader output query.
priorities (dict of list) – A dict keyed by reader names (e.g., cwms, eidos) with values representing reader versions in decreasing order of priority.
- Returns
records – A list of records that are unique per reader and document, picked by version priority when multiple records exist for the same reader and document.
- Return type
list of dict
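The prioritization described above can be sketched in a few lines. This is an illustration of the idea rather than the library's code: the record keys "identity", "version" and "document_id" are assumptions about the record layout, and treating unranked versions as lowest priority is a design choice made for this sketch.

```python
def prioritize_records_sketch(records, priorities=None):
    """Return one record per (reader, document), preferring ranked versions."""
    priorities = priorities or {}
    best = {}
    for record in records:
        reader = record["identity"]  # assumed key holding the reader name
        key = (reader, record["document_id"])
        ranking = priorities.get(reader, [])
        version = record["version"]
        # Versions missing from the ranking sort after all ranked versions.
        rank = ranking.index(version) if version in ranking else len(ranking)
        if key not in best or rank < best[key][0]:
            best[key] = (rank, record)
    return [record for _, record in best.values()]


records = [
    {"identity": "eidos", "version": "1.0", "document_id": "d1"},
    {"identity": "eidos", "version": "1.1", "document_id": "d1"},
    {"identity": "cwms", "version": "2.0", "document_id": "d1"},
]
unique = prioritize_records_sketch(records, {"eidos": ["1.1", "1.0"]})
print([(r["identity"], r["version"]) for r in unique])
```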
Knowledge assembly modules (indra_world.assembly)
Statement preprocessing (indra_world.assembly.preprocess)
Assembly operations (indra_world.assembly.operations)
- class indra_world.assembly.operations.CompositionalRefinementFilter(ontology, nproc=None)[source]
-
Return a set of statement hashes that a given statement is potentially related to.
- Parameters
stmt (indra.statements.Statement) – The INDRA statement whose potential relations we want to filter.
possibly_related (set or None) – A set of statement hashes that this statement is potentially related to, as determined by some other filter. If this parameter is a set (including an empty set), this function should return a subset of it (intuitively, this filter can only further eliminate some of the potentially related hashes that were previously determined to be potential relations). If this argument is None, the function must assume that no previous filter was run before, and should therefore return all the possible relations that it determines.
direction (str) – One of ‘less_specific’ or ‘more_specific’. Since refinements are directed relations, this function can operate in two different directions: it can either find less specific potentially related statements, or it can find more specific potentially related statements, as determined by this argument.
- Returns
A set of INDRA Statement hashes that are potentially related to the given statement.
- Return type
set of int
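The contract around possibly_related (None means no earlier filter has run; a set may only be narrowed) can be shown with toy filters. In this sketch statements are represented directly by integer hashes, and make_toy_filter and apply_filters are hypothetical helpers, not part of indra_world.

```python
def make_toy_filter(related_by_hash):
    """Build a filter from a precomputed {(hash, direction): hashes} mapping."""
    def toy_filter(stmt_hash, possibly_related=None, direction="less_specific"):
        candidates = set(related_by_hash.get((stmt_hash, direction), ()))
        if possibly_related is None:
            # No previous filter ran: return everything this filter finds.
            return candidates
        # A previous filter ran: only narrow down what it returned.
        return candidates & possibly_related
    return toy_filter


def apply_filters(filters, stmt_hash, direction="less_specific"):
    """Chain filters so that each one can only shrink the candidate set."""
    possibly_related = None
    for flt in filters:
        possibly_related = flt(stmt_hash, possibly_related, direction)
    return possibly_related


first = make_toy_filter({(1, "less_specific"): {2, 3, 4}})
second = make_toy_filter({(1, "less_specific"): {3, 4, 5}})
print(sorted(apply_filters([first, second], 1)))  # [3, 4]
```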
- indra_world.assembly.operations.get_expanded_events_influences(stmts)[source]
Return a list of all standalone events from a list of statements.
- indra_world.assembly.operations.location_matches_compositional(stmt)[source]
Return a matches_key which takes geo-location into account.
- indra_world.assembly.operations.location_refinement_compositional(st1, st2, ontology, entities_refined=True)[source]
Return True if there is a location-aware refinement between stmts.
- indra_world.assembly.operations.make_display_name(comp_grounding)[source]
Return display name from a compositional grounding with ‘of’ linkers.
- indra_world.assembly.operations.make_display_name_linear(comp_grounding)[source]
Return display name from compositional grounding with linear joining.
- indra_world.assembly.operations.merge_deltas(stmts_in)[source]
Gather and merge original Influence delta information from evidence.
This function is only applicable to Influence Statements that have subj and obj deltas. All other statement types are passed through unchanged. Polarities and adjectives for subjects and objects, respectively, are collected and merged by traversing all evidences of a Statement.
- Parameters
stmts_in (list[indra.statements.Statement]) – A list of INDRA Statements whose influence deltas should be merged. These Statements are meant to have been preassembled and potentially have multiple pieces of evidence.
- Returns
stmts_out – The list of Statements now with deltas merged at the Statement level.
- Return type
list[indra.statements.Statement]
Matches functions (indra_world.assembly.matches)
- indra_world.assembly.matches.event_location_time_matches(event)[source]
Return Event matches key which takes location and time into account.
- indra_world.assembly.matches.get_location(stmt)[source]
Return the grounded geo-location context associated with a Statement.
- indra_world.assembly.matches.get_location_from_object(loc_obj)[source]
Return geo-location from a RefContext location object.
- indra_world.assembly.matches.get_time(stmt)[source]
Return the time context associated with a Statement.
- indra_world.assembly.matches.has_location(stmt)[source]
Return True if a Statement has grounded geo-location context.
Refinement functions (indra_world.assembly.refinement)
- class indra_world.assembly.refinement.CompositionalRefinementFilter(ontology, nproc=None)[source]
-
Return a set of statement hashes that a given statement is potentially related to.
- Parameters
stmt (indra.statements.Statement) – The INDRA statement whose potential relations we want to filter.
possibly_related (set or None) – A set of statement hashes that this statement is potentially related to, as determined by some other filter. If this parameter is a set (including an empty set), this function should return a subset of it (intuitively, this filter can only further eliminate some of the potentially related hashes that were previously determined to be potential relations). If this argument is None, the function must assume that no previous filter was run before, and should therefore return all the possible relations that it determines.
direction (str) – One of ‘less_specific’ or ‘more_specific’. Since refinements are directed relations, this function can operate in two different directions: it can either find less specific potentially related statements, or it can find more specific potentially related statements, as determined by this argument.
- Returns
A set of INDRA Statement hashes that are potentially related to the given statement.
- Return type
set of int
- indra_world.assembly.refinement.event_location_refinement(st1, st2, ontology, entities_refined, ignore_polarity=False)[source]
Return True if there is a location-aware refinement between Events.
- indra_world.assembly.refinement.event_location_time_refinement(st1, st2, ontology, entities_refined)[source]
Return True if there is a location/time refinement between Events.
- indra_world.assembly.refinement.get_agent_key(agent, comp_idx)[source]
Return a key for an Agent for use in refinement finding.
- Parameters
agent (indra.statements.Agent or None) – An INDRA Agent whose key should be returned.
- Returns
The key that maps the given agent to the ontology, with special handling for ungrounded and None Agents.
- Return type
tuple or None
- indra_world.assembly.refinement.location_refinement(st1, st2, ontology, entities_refined)[source]
Return True if there is a location-aware refinement between stmts.
Incremental Assembler (indra_world.assembly.incremental_assembler)
- class indra_world.assembly.incremental_assembler.AssemblyDelta(new_stmts, new_evidences, new_refinements, beliefs, matches_fun=None)[source]
Represents changes to the assembly structure as a result of new statements added to a set of existing statements.
- new_evidences
A dict of new evidences for existing or new statements keyed by statement hash.
- beliefs
A dict of belief scores keyed by all statement hashes (both old and new).
- class indra_world.assembly.incremental_assembler.IncrementalAssembler(prepared_stmts, refinement_filters=None, matches_fun=<function location_matches_compositional>, curations=None, post_processing_steps=None, ontology=<indra_world.ontology.ontology.WorldOntology object>)[source]
Assemble a set of prepared statements and allow incremental extensions.
- Parameters
prepared_stmts (list[indra.statements.Statement]) – A list of prepared INDRA Statements.
refinement_filters (Optional[list[indra.preassembler.refinement.RefinementFilter]]) – A list of refinement filter classes to be used for refinement finding. Default: the standard set of compositional refinement filters.
matches_fun (Optional[function]) – A custom matches function for determining matching statements and calculating hashes. Default: matches function that takes compositional grounding and location into account.
curations (list[dict]) – A list of user curations to be integrated into the assembly results.
post_processing_steps (list[dict]) – Steps that can be used in an INDRA AssemblyPipeline to do post-processing on statements.
- refinement_edges
A set of tuples of statement hashes representing refinement links between statements.
- Type
set
- add_statements(stmts)[source]
Add new statements for incremental assembly.
- Parameters
stmts (list[indra.statements.Statement]) – A list of new prepared statements to be incrementally assembled into the set of existing statements.
- Returns
An AssemblyDelta object representing the changes to the assembly as a result of the new added statements.
- Return type
AssemblyDelta
- static build_refinements_graph(stmts_by_hash, refinement_edges)[source]
Return a refinements graph based on statements and refinement edges.
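As a rough illustration of the inputs to build_refinements_graph, the sketch below builds a plain adjacency mapping from a dict of statements keyed by hash and a set of hash pairs. It is a standard-library stand-in: the helper name build_refinement_adjacency is hypothetical, the statements are placeholder strings, and the edge direction is simply taken as given in each tuple rather than assumed to match the library's convention.

```python
def build_refinement_adjacency(stmts_by_hash, refinement_edges):
    """Return a dict mapping each statement hash to the hashes it links to."""
    # Every known statement becomes a node, even if it has no refinement links.
    graph = {stmt_hash: set() for stmt_hash in stmts_by_hash}
    for source_hash, target_hash in refinement_edges:
        # Ignore edges that refer to statements missing from stmts_by_hash.
        if source_hash in graph and target_hash in graph:
            graph[source_hash].add(target_hash)
    return graph


stmts_by_hash = {
    101: "rainfall -> crop yield",
    102: "rainfall in Ethiopia -> crop yield",
    103: "conflict -> migration",
}
refinement_edges = {(102, 101), (999, 101)}
graph = build_refinement_adjacency(stmts_by_hash, refinement_edges)
```

The edge involving the unknown hash 999 is skipped, so the result contains exactly the three known statements as nodes.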
Statistics (indra_world.assembly.stats)
Ontology Module (indra_world.ontology)
Module containing the implementation of an IndraOntology for the World Modelers use case.
World Ontology (indra_world.ontology.ontology)
- class indra_world.ontology.ontology.WorldOntology(url)[source]
Represents the ontology used for World Modelers applications.
- Parameters
url (str) – The URL or file path pointing to a World Modelers ontology YAML.
- add_entry(entry, examples=None, neg_examples=None)[source]
Add a new ontology entry with examples.
This works by adding the entry to the yml attribute first and then reloading the entire yaml to build a new graph.
- Parameters
entry (str) – The new entry.
examples (Optional[list of str]) – Examples for the new entry.
neg_examples (Optional[list of str]) – Negative examples for the new entry.
- build_relations(node, tree, prefix)[source]
Build relations for the classic ontology format <= v3.0
- build_relations_new_format(node, prefix)[source]
Build relations for the new ontology format > v3.0
Belief Engine (indra_world.belief)
- indra_world.belief.get_eidos_bayesian_scorer(prior_counts=None)[source]
Return a BayesianScorer based on Eidos curation counts.
- Returns
A BayesianScorer belief scorer instance.
- Return type
indra.belief.BayesianScorer
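The general idea of turning curation counts into a reliability estimate can be sketched with a generic Beta-Binomial update. This is not the BayesianScorer implementation: the function name posterior_reliability and the (correct, incorrect) layout of the count pairs are assumptions chosen only to illustrate how prior pseudo-counts and observed curations might combine.

```python
def posterior_reliability(prior_counts, curation_counts):
    """Return the posterior mean probability that an extraction is correct.

    prior_counts: (correct, incorrect) pseudo-counts encoding the prior
    curation_counts: (correct, incorrect) counts observed from curations
    """
    correct = prior_counts[0] + curation_counts[0]
    incorrect = prior_counts[1] + curation_counts[1]
    # Mean of a Beta(correct, incorrect) distribution.
    return correct / (correct + incorrect)


# A uniform prior (1, 1) combined with 8 correct and 2 incorrect curations.
reliability = posterior_reliability((1, 1), (8, 2))
```

With larger prior pseudo-counts, the same curations shift the estimate less, which is the usual reason for exposing the prior as a parameter.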
Output assemblers (indra_world.assemblers)
Unlike in INDRA, output/model assemblers play only a minor role in INDRA World, since other systems such as Delphi (https://github.com/ml4ai/delphi) and DySE (https://dl.acm.org/doi/10.1145/3359115.3359123) take on the role of converting assembled INDRA Statements into probabilistic and logical dynamical models, respectively.
CAG Assembler (indra_world.assemblers.cag)
Assemble simple graphs of assembled INDRA Statements that can be embedded into websites or notebooks.
Assembler (indra_world.assemblers.cag.assembler)
- class indra_world.assemblers.cag.assembler.CAGAssembler(stmts=None)[source]
Assembles a causal analysis graph from INDRA Statements.
- Parameters
stmts (Optional[list[indra.statements.Statement]]) – A list of INDRA Statements to be assembled. Currently supports Influence Statements.
- CAG
A networkx MultiDiGraph object representing the causal analysis graph.
- Type
nx.MultiDiGraph
- export_to_cytoscapejs()[source]
Return CAG in format readable by CytoscapeJS.
- Returns
A JSON-like dict representing the graph for use with CytoscapeJS.
- Return type
dict
- generate_jupyter_js(cyjs_style=None, cyjs_layout=None)[source]
Generate Javascript from a template to run in Jupyter notebooks.
- Parameters
cyjs_style (Optional[dict]) – A dict that sets CytoscapeJS style as specified in https://github.com/cytoscape/cytoscape.js/blob/master/documentation/md/style.md.
cyjs_layout (Optional[dict]) – A dict that sets CytoscapeJS layout parameters.
- Returns
A Javascript string to be rendered in a Jupyter notebook cell.
- Return type
str
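For readers unfamiliar with the CytoscapeJS element format targeted by export_to_cytoscapejs, the sketch below builds a small dict of that general shape from a list of edges. The helper to_cytoscape_elements, the polarity field, and the concept names are hypothetical; the exact keys produced by CAGAssembler may differ.

```python
import json


def to_cytoscape_elements(edges):
    """Build a CytoscapeJS-style elements dict from (source, target, polarity) tuples."""
    # Collect each concept once; sorting keeps the output deterministic.
    nodes = sorted({name for source, target, _ in edges for name in (source, target)})
    return {
        "nodes": [{"data": {"id": name, "name": name}} for name in nodes],
        "edges": [
            {"data": {"id": f"e{i}", "source": source, "target": target,
                      "polarity": polarity}}
            for i, (source, target, polarity) in enumerate(edges)
        ],
    }


edges = [("rainfall", "crop yield", 1), ("conflict", "crop yield", -1)]
elements = to_cytoscape_elements(edges)
# The dict is JSON-serializable, so it can be embedded in a web page or notebook.
elements_json = json.dumps(elements)
```

CytoscapeJS requires each element to carry a data object with an id, and each edge to name its source and target node ids, which is what this sketch reproduces.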
Figaro Assembler (indra_world.assemblers.figaro)
A proof-of-concept assembler for INDRA Statements into probabilistic programs in the Figaro (https://github.com/p2t2/figaro) framework.
Assembler (indra_world.assemblers.figaro.assembler)
TSV Assembler (indra_world.assemblers.tsv)
Assemble tab separated spreadsheets of assembled INDRA Statements for curation purposes.
INDRA World Service (indra_world.service)
Note: This is the documentation of the codebase used in the INDRA World service. Documentation of the service API can be found at wm.indra.bio.
INDRA World Database (indra_world.service.db)
Database Manager (indra_world.service.db.manager)
- class indra_world.service.db.manager.DbManager(url)[source]
Manages transactions with the assembly database and exposes an API for various operations.
- add_curation_for_project(project_id, stmt_hash, curation)[source]
Add curations for a given project.
- add_dart_record(reader, reader_version, document_id, storage_key, date, output_version=None, labels=None, tenants=None)[source]
Insert a DART record into the database.
- add_records_for_project(project_id, record_keys)[source]
Add document IDs for a project with the given ID.
- add_statements_for_record(record_key, stmts, indra_version)[source]
Add a set of prepared statements for a given document.
- get_dart_records(reader=None, document_id=None, reader_version=None, output_version=None, labels=None, tenants=None)[source]
Return storage keys for DART records given constraints.
- get_full_dart_records(reader=None, document_id=None, reader_version=None, output_version=None, labels=None, tenants=None)[source]
Return full DART records given constraints.
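The optional keyword arguments of get_dart_records and get_full_dart_records suggest a "None means no constraint" filtering pattern. The sketch below illustrates that pattern on in-memory dicts. It is not how DbManager queries the database: the helper filter_storage_keys and the sample records are hypothetical, and list-valued constraints such as labels and tenants are left out because their matching semantics are not described here.

```python
def filter_storage_keys(records, reader=None, document_id=None,
                        reader_version=None, output_version=None):
    """Return storage keys of records matching every constraint that is not None."""
    constraints = {
        "reader": reader,
        "document_id": document_id,
        "reader_version": reader_version,
        "output_version": output_version,
    }
    # Constraints left at None are not applied at all.
    active = {key: value for key, value in constraints.items() if value is not None}
    return [
        record["storage_key"]
        for record in records
        if all(record.get(key) == value for key, value in active.items())
    ]


records = [
    {"reader": "eidos", "reader_version": "1.0", "document_id": "d1",
     "output_version": "3.0", "storage_key": "k1"},
    {"reader": "sofia", "reader_version": "1.0", "document_id": "d1",
     "output_version": "3.0", "storage_key": "k2"},
    {"reader": "eidos", "reader_version": "1.1", "document_id": "d2",
     "output_version": "3.0", "storage_key": "k3"},
]
eidos_keys = filter_storage_keys(records, reader="eidos")
```

Calling the helper without any constraint returns the storage keys of all records, mirroring the all-optional signature.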
Database Schema (indra_world.service.db.schema)
Service controller (indra_world.service.controller)
REST API (indra_world.service.app)
- class indra_world.service.app.EidosProcessJsonld(api=None, *args, **kwargs)[source]
- post()[source]
Process an EIDOS JSON-LD and return INDRA Statements.
- Parameters
jsonld (str) – The JSON-LD string to be processed.
grounding_ns (Optional[list]) – A list of name spaces for which INDRA should represent groundings, when given. If not specified or None, all grounding name spaces are propagated. If an empty list, no groundings are propagated. Example: [‘UN’, ‘WM’], Default: None
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’, ‘association’, ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns
statements – A list of extracted INDRA Statements.
- Return type
list[indra.statements.Statement.to_json()]
- class indra_world.service.app.EidosProcessText(api=None, *args, **kwargs)[source]
- post()[source]
Process text with EIDOS and return INDRA Statements.
- Parameters
text (str) – The text to be processed.
webservice (Optional[str]) – An Eidos reader web service URL to send the request to. If None, the reading is assumed to be done with the Eidos JAR rather than via a web service. Default: None
grounding_ns (Optional[list]) – A list of name spaces for which INDRA should represent groundings, when given. If not specified or None, all grounding name spaces are propagated. If an empty list, no groundings are propagated. Example: [‘UN’, ‘WM’], Default: None
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’, ‘association’, ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns
statements – A list of extracted INDRA Statements.
- Return type
list[indra.statements.Statement.to_json()]
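The parameters listed for EidosProcessText can be assembled into a request body as sketched below. The helper build_eidos_text_request is hypothetical, and sending the parameters as a single JSON object is an assumption; the endpoint path is not given in this section, so the sketch stops at building and validating the body.

```python
import json

VALID_FILTERS = {"influence", "association", "event"}
VALID_GROUNDING_MODES = {"flat", "compositional"}


def build_eidos_text_request(text, webservice=None, grounding_ns=None,
                             extract_filter=None, grounding_mode="flat"):
    """Return a JSON string holding the documented EidosProcessText parameters."""
    if extract_filter is not None:
        unknown = set(extract_filter) - VALID_FILTERS
        if unknown:
            raise ValueError(f"Unknown relation types: {sorted(unknown)}")
    if grounding_mode not in VALID_GROUNDING_MODES:
        raise ValueError(f"Unknown grounding mode: {grounding_mode}")
    body = {"text": text, "grounding_mode": grounding_mode}
    optional = {
        "webservice": webservice,
        "grounding_ns": grounding_ns,
        "extract_filter": extract_filter,
    }
    # Leave out parameters that are None so the documented defaults apply.
    # An empty grounding_ns list is kept, since it means "no groundings".
    body.update({key: value for key, value in optional.items() if value is not None})
    return json.dumps(body)


request_body = build_eidos_text_request(
    "Rainfall increases crop yields.", extract_filter=["influence"]
)
```

Validating extract_filter and grounding_mode on the client side catches typos before a request is sent. The same pattern would apply to the jsonld and json parameters of the other processing endpoints.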
- class indra_world.service.app.Notify(api=None, *args, **kwargs)[source]
- post()[source]
Add and process DART record.
- Parameters
identity (str) – Name of the reader.
version (str) – Reader version.
document_id (str) – ID of a document to process.
storage_key (str) – Key to store the record with.
output_version (str) – The output version (typically ontology version).
labels (list of str) – A list of labels for the output.
tenants (list of str) – A list of tenants for the output.
- class indra_world.service.app.SofiaProcessJson(api=None, *args, **kwargs)[source]
- post()[source]
Process a Sofia JSON and return INDRA Statements.
- Parameters
json (str) – The JSON string to be processed.
extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’, ‘association’, ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None
grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘flat’.
- Returns
statements – A list of extracted INDRA Statements.
- Return type
list[indra.statements.Statement.to_json()]
- class indra_world.service.app.SubmitCurations(api=None, *args, **kwargs)[source]
- post()[source]
Submit curations.
- Parameters
- Returns
mappings – For any statement whose matches hash has changed due to the curations submitted here, the new hash (after applying the curation) is given. A statement is omitted from the return value if its hash did not change or if its curation could not be applied for some reason.
- Return type
Corpus manager (indra_world.service.corpus_manager)
This module allows running one-off assembly on a set of DART records (i.e., reader outputs) into a ‘seed corpus’ that can be dumped on S3 for loading into CauseMos.
- class indra_world.service.corpus_manager.CorpusManager(db_url, dart_records, corpus_id, metadata, dart_client=None)[source]
Corpus manager class allowing running assembly on a set of DART records.
- assemble()[source]
Run assembly on the prepared statements.
This function loads all the prepared statements associated with the corpus and then runs assembly on them.
- prepare(records_exist=False)[source]
Run the preprocessing pipeline on statements.
This function adds the new corpus to the DB, adds records to the new corpus, then processes the reader outputs for those records into statements, preprocesses the statements, and then stores these prepared statements in the DB.
INDRA World Dashboard (indra_world.dashboard)
The dashboard provides a simple web-based interface for running INDRA World assembly: searching for reader outputs under given constraints, specifying assembly configurations, and setting parameters for the output (name, description, etc.).