curiosity-ai/catalyst

Since build 1.0.38482, splitting a text into Spans is no longer deterministic.


Describe the bug
The Spans property of an IDocument returns the list of spans within a document. This should be deterministic: splitting the same IDocument any number of times should produce the same result, and creating an IDocument from identical text should always yield the same Spans collection. This works correctly up to and including NuGet package version 1.0.38431. From v1.0.38482 up to the current version, identical inputs produce different results from run to run.

To Reproduce

  • Using a build >= 1.0.38482, create an IDocument from any text with multiple sentences. (We used a 915-word, 14-sentence block.)
  • Access the Spans property and trace the spans created.
  • Create an IDocument from the same text again, then access and trace the spans once more.
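
The comparison step above can be sketched as a small harness. This is a minimal illustration in Python, not Catalyst code: `compare_runs` and `naive_split` are hypothetical names, and `naive_split` is only a stand-in for whatever function returns the span texts of a freshly created document. The harness splits the same text several times and classifies each later run against the first one as identical, reordered (same spans, different order), or different (different span boundaries or missing spans).

```python
from collections import Counter
from typing import Callable, Dict, List


def compare_runs(split: Callable[[str], List[str]], text: str, runs: int = 5) -> Dict[str, int]:
    """Split `text` `runs` times and classify each later run against the first."""
    baseline = split(text)
    report = {"identical": 0, "reordered": 0, "different": 0}
    for _ in range(runs - 1):
        current = split(text)
        if current == baseline:
            # Same spans in the same order.
            report["identical"] += 1
        elif Counter(current) == Counter(baseline):
            # Same spans, but emitted in a different order.
            report["reordered"] += 1
        else:
            # Span boundaries changed, or spans are missing or extra.
            report["different"] += 1
    return report


def naive_split(text: str) -> List[str]:
    """Hypothetical stand-in splitter: cuts on '.' and is fully deterministic."""
    return [part.strip() + "." for part in text.split(".") if part.strip()]


if __name__ == "__main__":
    sample = "Create more personal computing. Reinvent productivity and business processes."
    print(compare_runs(naive_split, sample, runs=3))
```

Separating "reordered" from "different" is useful here because the sample outputs below show both effects: the two faulty runs list the same sentences in a different order, and one of them also cuts a sentence at a different position.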

Expected behavior

  • For identical inputs, the output should be identical. Observed behaviour: the spans vary considerably between runs, both in their order and in where the text is split.

Sample Outputs
(First few lines of identical text input - traced to Visual Studio Debug window. IDocument is created, then Spans property is accessed.)

**FAULTY (Build: 1.0.38482)**

RUN A:
09:05:41:328 What We Offer
09:05:41:328 Create more personal computing.
09:05:41:578 Reinvent productivity and business processes.
09:05:41:578 Build the intelligent cloud and intelligent edge platform.
09:05:41:578 To achieve our vision, our research and development efforts focus on three interconnected ambitions:
09:05:41:578 Founded in 1975, we develop and support software, services, devices, and
09:05:41:578 solutions that deliver new value for customers and help people and businesses realize their full potential.
09:05:41:578 We're committed to making the promise of AI real and doing it responsibly.
09:05:41:578 At Microsoft, we provide technology and resources to help our customers create a secure
09:05:41:578 Our work is guided by a core set of principles: fairness, reliability and safety, privacy and security, inclusiveness, transparency, and accountability.
09:05:41:578 , productive work environment.

RUN B:
09:05:42:082 What We Offer
09:05:42:082 Create more personal computing.
09:05:42:082 Build the intelligent cloud and intelligent edge platform.
09:05:42:082 Reinvent productivity and business processes.
09:05:42:082 To achieve our vision, our research and development efforts focus on three interconnected ambitions:
09:05:42:082 Founded in 1975, we develop and support software, services, devices, and solutions that deliver new value for customers and help people and businesses realize their full potential.
09:05:42:082 We offer an array of services, including cloud-based solutions that provide customers with software, services, platforms, and content, and we provide solution support and consulting services.
09:05:42:082 At Microsoft, we provide technology and resources to help our customers create a secure, productive work environment.

**CORRECT (Build: 1.0.34831)**
Text is identical with each run:
09:14:11:865 What We Offer
09:14:11:865 Create more personal computing.
09:14:12:109 Build the intelligent cloud and intelligent edge platform.
09:14:12:109 Reinvent productivity and business processes.
09:14:12:109 To achieve our vision, our research and development efforts focus on three interconnected ambitions:
09:14:12:109 Founded in 1975, we develop and support software, services, devices, and solutions that deliver new value for customers and help people and businesses realize their full potential.
09:14:12:109 At Microsoft, we provide technology and resources to help our customers create a secure, productive work environment.
09:14:12:109 Our family of products plays a key role in the ways the world works, learns, and connects.
09:14:12:109 We're committed to making the promise of AI real and doing it responsibly.
09:14:12:109 We offer an array of services, including cloud-based solutions that provide customers with software, services, platforms, and content, and we provide solution support and consulting services.