mirror of
https://kevinblog.sytes.net/Code/Jibo-Revival-Group/JiboExperiments.git
synced 2026-06-16 14:16:17 +00:00
more fixes for testing
@@ -19,6 +19,7 @@ It is intentionally broader than the current Node server. The Node server is a p

- expand HTTP `X-Amz-Target` coverage from observed traffic and fixtures
- grow WebSocket compatibility from stub acceptance into realistic turn orchestration
- keep websocket parity fixture-driven, starting with exact sequencing and payload-shape fidelity for the successful joke vertical slice before claiming broader skill coverage
- replace in-memory state with Azure SQL-backed persistence
- add structured fixture replay tests
- harden region/bootstrap docs by software version
@@ -34,6 +35,26 @@ We still need to map more than the current Node server expresses. Priority disco
- upload, logging, backup, and key-sharing flows
- per-version configuration differences and region handling

## Current WebSocket Discovery Focus

The next fixture-driven websocket work should continue to separate three buckets:

- discovered behavior
  Grounded by the Node oracle, sanitized fixtures, and live captures
- implemented parity
  Only the narrow slices currently replayed and tested in `.NET`
- future hypotheses
  Ideas to investigate later, but not behaviors to silently bake into the hosted cloud

Right now the strongest implemented vertical slice beyond basic listen completion is the successful joke turn:

- `CLIENT_ASR` transcript-carrying turn completion
- synthetic `LISTEN` result shaping
- `EOS`
- delayed joke `SKILL_ACTION`

That should remain the model for future websocket work: capture first, fixture second, parity third.
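
As a rough sketch of the "fixture second, parity third" step, the joke-turn ordering listed above could be compared against a replayed message log as follows. The `{ direction, type }` record shape and the function name `firstSequenceMismatch` are assumptions made for illustration; the repository's sanitized fixtures may be structured differently.

```javascript
// Hypothetical fixture shape: an ordered list of { direction, type } records.
// Only the message order comes from the notes above; the record layout is assumed.
const EXPECTED_JOKE_TURN = [
  { direction: "in", type: "CLIENT_ASR" },
  { direction: "out", type: "LISTEN" },
  { direction: "out", type: "EOS" },
  { direction: "out", type: "SKILL_ACTION" },
];

// Returns the index of the first record that differs between the two
// sequences (including a missing or extra record), or -1 when they match.
function firstSequenceMismatch(expected, actual) {
  const length = Math.max(expected.length, actual.length);
  for (let i = 0; i < length; i += 1) {
    const e = expected[i];
    const a = actual[i];
    if (!e || !a || e.direction !== a.direction || e.type !== a.type) {
      return i;
    }
  }
  return -1;
}
```

Returning the mismatch index rather than a boolean keeps a failing replay easy to diagnose, since the test output can point at the exact message that broke sequencing.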

## Speech, Animation, And ESML

The current joke flow is only a small foothold into Jibo expressiveness.

@@ -74,6 +74,7 @@ The current .NET pass covers only a narrow, explicitly synthetic subset of obser
- `EOS` emission after completed turns
- delayed `SKILL_ACTION` emission after `EOS` on completed turn flows to better match the Node oracle timing
- first richer vertical slice for joke/chat `SKILL_ACTION` playback
- fixture-backed joke-turn payload fidelity for `CLIENT_ASR -> LISTEN -> EOS -> delayed SKILL_ACTION`, including Node-like `EOS` envelope fields and the currently observed joke `SKILL_ACTION` metadata shape

This does not yet mean parity for:

@@ -81,8 +82,32 @@ This does not yet mean parity for:
- real STT provider integration and external ASR lifecycle timing
- early-EOS behavior
- multi-step skill lifecycles beyond the current synthetic playback response
- broad `SKILL_ACTION` payload coverage outside the currently observed joke/chat playback slice
- broader interaction, animation, or ESML command families

### Successful Joke Turn: What Is Grounded Now

The highest-confidence websocket vertical slice after the starter parity pass is now:

- inbound `CLIENT_ASR` carrying `"tell me a joke"`
- outbound synthetic `LISTEN` result with joke intent and remembered rules
- outbound `EOS` carrying `ts`, `msgID`, `transID`, and an empty `data` object
- outbound `SKILL_ACTION` about 75 ms later
- joke `SKILL_ACTION` payload shape aligned with the Node oracle for:
  - `data.skill.id = "@be/joke"`
  - `data.action.config.jcp.type = "SLIM"`
  - `data.action.config.jcp.config.play.meta.prompt_id = "RUNTIME_PROMPT"`
  - `data.action.config.jcp.config.play.meta.prompt_sub_category = "AN"`
  - `data.action.config.jcp.config.play.meta.mim_id = "runtime-joke"`
  - `data.action.config.jcp.config.play.meta.mim_type = "announcement"`
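
A minimal sketch of how the grounded joke `SKILL_ACTION` fields listed above could be asserted in a fixture test. The dotted paths and expected values are taken from that list; the helper names `getPath` and `mismatchedJokeFields` are hypothetical, and the check deliberately ignores any fields outside the list, since the payload is not claimed to be complete.

```javascript
// Expected values for the currently grounded joke SKILL_ACTION fields.
const EXPECTED_JOKE_FIELDS = {
  "data.skill.id": "@be/joke",
  "data.action.config.jcp.type": "SLIM",
  "data.action.config.jcp.config.play.meta.prompt_id": "RUNTIME_PROMPT",
  "data.action.config.jcp.config.play.meta.prompt_sub_category": "AN",
  "data.action.config.jcp.config.play.meta.mim_id": "runtime-joke",
  "data.action.config.jcp.config.play.meta.mim_type": "announcement",
};

// Walks a dotted path through nested objects; yields undefined as soon as
// any segment is missing or not an object.
function getPath(value, path) {
  return path.split(".").reduce(function (node, key) {
    return node !== null && typeof node === "object" ? node[key] : undefined;
  }, value);
}

// Returns the dotted paths whose values differ from the grounded expectations.
// An empty array means every grounded field matched.
function mismatchedJokeFields(message) {
  return Object.entries(EXPECTED_JOKE_FIELDS)
    .filter(([path, expected]) => getPath(message, path) !== expected)
    .map(([path]) => path);
}
```

Checking only named paths, rather than deep-equality against a whole captured payload, keeps the test aligned with what is actually grounded and avoids silently asserting unobserved structure.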

What remains intentionally unclaimed for that slice:

- whether the joke payload is complete beyond those fields
- whether other successful skills use the same payload shape
- whether additional websocket messages appear in other successful skill paths
- whether any timing gaps besides the observed 75 ms `EOS -> SKILL_ACTION` delay matter
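
One way to keep the open timing question observable rather than assumed is to report every inter-message gap from a capture instead of hard-coding the 75 ms figure. The sketch below assumes a hypothetical capture format in which each record carries a `type` and a `receivedAtMs` timestamp added by the capture harness; neither field name comes from the source.

```javascript
// Computes the gap in milliseconds between each pair of consecutive messages.
// `capture` is an ordered array of { type, receivedAtMs } records (assumed shape).
function interMessageGaps(capture) {
  const gaps = [];
  for (let i = 1; i < capture.length; i += 1) {
    gaps.push({
      from: capture[i - 1].type,
      to: capture[i].type,
      gapMs: capture[i].receivedAtMs - capture[i - 1].receivedAtMs,
    });
  }
  return gaps;
}
```

Comparing these gap lists across several live captures would show whether delays other than `EOS -> SKILL_ACTION` are stable enough to be worth reproducing.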

Current raw-audio fallback behavior remains explicitly synthetic:

- when a buffered-audio turn can be resolved through the synthetic transcript-hint seam, `.NET` now auto-finalizes and emits `LISTEN` + `EOS` + `SKILL_ACTION`
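
The emission order described above, with `SKILL_ACTION` deferred after `EOS`, can be sketched in JavaScript as follows. This is an illustration only, not the `.NET` implementation: the function name `emitCompletedTurn`, the `turn` object shape, and the injectable `schedule` option are all assumptions, and the 75 ms default simply mirrors the delay observed for the joke turn.

```javascript
// Sends LISTEN and EOS immediately, then defers SKILL_ACTION.
// `send` is any function that writes one message to the socket.
// `options.schedule` defaults to setTimeout and can be replaced in tests
// so the delayed step runs deterministically.
function emitCompletedTurn(send, turn, options = {}) {
  const delayMs = options.skillActionDelayMs ?? 75;
  const schedule = options.schedule ?? setTimeout;

  send(turn.listen);
  send(turn.eos);
  schedule(() => send(turn.skillAction), delayMs);
}
```

Injecting the scheduler keeps fixture replay tests free of real timers while still letting them assert the requested delay.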