Sofia (indra_world.sources.sofia)

Sofia is a general-purpose natural language processing system developed at UPitt and CMU by N. Miskov et al.

API (indra_world.sources.sofia.api)

indra_world.sources.sofia.api.process_json(json_obj, extract_filter=None, grounding_mode=None)

Return processor by processing a JSON object returned by Sofia.

Parameters:
  • json_obj (json) – A JSON object containing extractions from Sofia.

  • extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’ and ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None

  • grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: None

Returns:

sp – A SofiaProcessor object which has a list of extracted INDRA Statements as its statements attribute.

Return type:

indra.sources.sofia.processor.SofiaProcessor
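As a minimal sketch of how extract_filter might be prepared before calling process_json: the helper check_extract_filter below is hypothetical (it is not part of indra_world) and simply rejects anything other than the two relation types listed above. The commented call at the end assumes that indra_world is installed and that json_obj already holds Sofia output.

```python
# Relation types documented as valid for extract_filter
VALID_RELATION_TYPES = ("influence", "event")


def check_extract_filter(extract_filter=None):
    """Validate an extract_filter list (hypothetical helper, not library code).

    None is returned unchanged, so the caller keeps the documented default
    of extracting all relation types.
    """
    if extract_filter is None:
        return None
    unknown = [rel for rel in extract_filter if rel not in VALID_RELATION_TYPES]
    if unknown:
        raise ValueError(f"Unsupported relation types: {unknown}")
    return list(extract_filter)


# Assumed usage, requires indra_world and a Sofia JSON object:
# from indra_world.sources.sofia.api import process_json
# sp = process_json(json_obj, extract_filter=check_extract_filter(["influence"]))
# for stmt in sp.statements:
#     print(stmt)
```

Validating the list up front is a design choice of this sketch; this section does not say how the library itself treats unrecognized values.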

indra_world.sources.sofia.api.process_json_file(fname, extract_filter=None, grounding_mode='compositional')

Return processor by processing a JSON file produced by Sofia.

Parameters:
  • fname (str) – The name of the JSON file to process

  • extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’ and ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None

  • grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘compositional’.

Returns:

A SofiaProcessor object which has a list of extracted INDRA Statements as its statements attribute.

Return type:

indra.sources.sofia.processor.SofiaProcessor

indra_world.sources.sofia.api.process_table(fname, extract_filter=None, grounding_mode='compositional')

Return processor by processing a given sheet of a spreadsheet file.

Parameters:
  • fname (str) – The name of the Excel file (typically .xlsx extension) to process

  • extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’ and ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None

  • grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘compositional’.

Returns:

sp – A SofiaProcessor object which has a list of extracted INDRA Statements as its statements attribute.

Return type:

indra.sources.sofia.processor.SofiaProcessor
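Since process_json_file and process_table take the same arguments and differ only in the input format, one possible convenience is to choose between them by file extension. The sketch below is an assumption-laden illustration: pick_processor_name and process_sofia_file are hypothetical names, and only the .json and .xlsx extensions are recognized.

```python
import importlib
from pathlib import Path


def pick_processor_name(fname):
    """Map a file name to the api function documented for that format."""
    suffix = Path(fname).suffix.lower()
    if suffix == ".json":
        return "process_json_file"
    if suffix == ".xlsx":
        return "process_table"
    raise ValueError(f"Unrecognized Sofia output extension: {suffix!r}")


def process_sofia_file(fname, **kwargs):
    """Call the matching api function (hypothetical wrapper).

    The import happens lazily so that pick_processor_name can be used
    without indra_world installed.
    """
    api = importlib.import_module("indra_world.sources.sofia.api")
    processor_func = getattr(api, pick_processor_name(fname))
    return processor_func(fname, **kwargs)
```

With indra_world installed, a call such as process_sofia_file("sofia_output.json", extract_filter=["influence"]) would forward the keyword arguments unchanged to process_json_file.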

indra_world.sources.sofia.api.process_text(text, out_file='sofia_output.json', auth=None, extract_filter=None, grounding_mode='compositional')

Return processor by processing text given as a string.

Parameters:
  • text (str) – A string containing the text to be processed with Sofia.

  • out_file (Optional[str]) – The path to a file to save the reader’s output into. Default: sofia_output.json

  • auth (Optional[list]) – A username/password pair for the Sofia web service. If not given, the SOFIA_USERNAME and SOFIA_PASSWORD values are loaded from either the INDRA config or the environment.

  • extract_filter (Optional[list]) – A list of relation types to extract. Valid values in the list are ‘influence’ and ‘event’. If not given, all relation types are extracted. This argument can be used if, for instance, only Influence statements are of interest. Default: None

  • grounding_mode (Optional[str]) – Selects whether ‘flat’ or ‘compositional’ groundings should be extracted. Default: ‘compositional’.

Returns:

sp – A SofiaProcessor object which has a list of extracted INDRA Statements as its statements attribute. If the API did not process the text, None is returned.

Return type:

indra.sources.sofia.processor.SofiaProcessor
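Because process_text can return None, callers may want to guard the result before touching its statements attribute. The following is a minimal sketch under stated assumptions: auth_from_env and read_statements are hypothetical helpers, and reader stands for process_text or any callable with the same behavior. Passing auth=None leaves credential lookup to the library, as described above.

```python
import os


def auth_from_env(environ=None):
    """Return [username, password] from the environment, or None if incomplete."""
    environ = os.environ if environ is None else environ
    user = environ.get("SOFIA_USERNAME")
    password = environ.get("SOFIA_PASSWORD")
    if user and password:
        return [user, password]
    return None


def read_statements(text, reader):
    """Run a process_text-like callable and always return a list.

    `reader` is expected to behave like process_text: it returns a processor
    with a `statements` attribute, or None if the text was not processed.
    """
    sp = reader(text, auth=auth_from_env())
    if sp is None:
        return []
    return list(sp.statements)


# Assumed usage, requires indra_world and access to the Sofia web service:
# from indra_world.sources.sofia.api import process_text
# stmts = read_statements("Rainfall causes floods.", process_text)
```

Returning an empty list instead of None is a choice made for this sketch so that downstream loops need no special case.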

Processor (indra_world.sources.sofia.processor)

class indra_world.sources.sofia.processor.SofiaExcelProcessor(relation_rows, event_rows, entity_rows, **kwargs)

Bases: SofiaProcessor

An Excel processor extracting statements from reading done by Sofia

extract_events(event_rows, relation_rows)

Extract Event statements of a Sofia document in Excel format

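Parameters:
  • event_rows (Iterator[Tuple[Cell, ...]]) – The extracted event data from an Excel document

  • relation_rows (Iterator[Tuple[Cell, ...]]) – The extracted relation data from an Excel document

Return type: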

None

extract_relations(relation_rows)

Extract Influence statements from relation events

Parameters:

relation_rows (Iterator[Tuple[Cell, ...]]) – The extracted relation data from an Excel document

Return type:

None

process_events(event_rows)

Process the events of Sofia document extractions in Excel format

Parameters:

event_rows (Iterator[Tuple[Cell, ...]]) – The extracted event data from an Excel document

Returns:

A dict of events keyed by their event index

Return type:

processed_event_dict

class indra_world.sources.sofia.processor.SofiaJsonProcessor(jd, **kwargs)

Bases: SofiaProcessor

A JSON processor extracting statements from reading done by Sofia

extract_events(jd)

Extract Event statements from a Sofia document extraction

Parameters:

jd (Dict[str, str]) – A dictionary with document extractions

Return type:

None

extract_relations(jd)

Extract Influence statements from a Sofia document extraction

Parameters:

jd (Dict[str, Any]) – A dictionary with document extractions

Return type:

None

process_entities(jd)

Process the entities of a Sofia document extraction

Parameters:

jd (Dict[str, Any]) – The extracted data from a document

Returns:

A dictionary of processed entities keyed by their entity index

Return type:

ent_dict

process_events(jd)

Process the events of a Sofia document extraction

Parameters:

jd (Dict[str, Any]) – The extracted data from a document

Returns:

A dictionary of processed events keyed by their event index

Return type:

processed_event_dict

class indra_world.sources.sofia.processor.SofiaProcessor(score_cutoff=None, grounding_mode='compositional')

Bases: object

A processor extracting statements from reading done by Sofia

get_compositional_grounding(event_entry)

Get the compositional grounding for an event

Parameters:

event_entry (Dict[str, str]) – The event to get the compositional grounding for

Returns:

The name of the grounding and a tuple representing the compositional grounding

Return type:

grounding

get_event(event_entry)

Get an Event with the pre-set grounding mode

The grounding mode is set at initialization of the class and is stored in the attribute grounding_mode.

Parameters:

event_entry (Dict[str, str]) – The event to process

Returns:

An Event statement

Return type:

event

get_event_compositional(event_entry)

Get an Event with compositional grounding

Parameters:

event_entry (Dict[str, str]) – The event to process

Returns:

An Event statement

Return type:

event

get_event_flat(event_entry)

Get an Event with flattened grounding

Parameters:

event_entry (Dict[str, str]) – The event to process

Returns:

An Event statement

Return type:

event
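The relationship between get_event, get_event_compositional and get_event_flat can be pictured as a dispatch on the grounding_mode attribute. The class below is a toy stand-in written for illustration only, not the SofiaProcessor implementation: its methods return placeholder tuples instead of Event statements, the event entry uses a made-up key, and rejecting unknown modes is an assumption of the sketch.

```python
class GroundingDispatchSketch:
    """Toy stand-in illustrating mode dispatch; not the SofiaProcessor code."""

    def __init__(self, grounding_mode="compositional"):
        # Stored once at initialization, as described for get_event
        self.grounding_mode = grounding_mode

    def get_event(self, event_entry):
        # Route to the method matching the pre-set grounding mode
        if self.grounding_mode == "flat":
            return self.get_event_flat(event_entry)
        if self.grounding_mode == "compositional":
            return self.get_event_compositional(event_entry)
        raise ValueError(f"Unknown grounding mode: {self.grounding_mode!r}")

    def get_event_flat(self, event_entry):
        # Placeholder result; the real method returns an Event statement
        return ("flat", event_entry)

    def get_event_compositional(self, event_entry):
        # Placeholder result; the real method returns an Event statement
        return ("compositional", event_entry)


entry = {"name": "flooding"}  # made-up key and value for illustration
print(GroundingDispatchSketch("flat").get_event(entry)[0])
print(GroundingDispatchSketch().get_event(entry)[0])
```

Callers that always go through get_event do not need to know which grounding mode was chosen when the processor was created.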

get_meaningful_events(raw_event_dict)

Process events by extracting polarity

Parameters:

raw_event_dict (Dict[str, Any]) – A dict of events to process

Returns:

A dict of event data

Return type:

processed_event_dict

get_relation_events(rel_dict)

Get a list of the event indices associated with a causal entry

Parameters:

rel_dict (Dict[str, str]) – A causal entry to extract event indices from

Returns:

A list of event indices

Return type:

relation_events