Global Congress Data Collection Goal

Goal MG-CONGRESS-01

Objective: build and maintain a global, evidence-backed corpus of multiday congress-family events for Mi Gente.

Scope includes:

Congress
Weekender
Festival
Marathon
Retreat
Cruise
Dance Vacation

Scope and exclusions

Include: confirmed or highly probable multiday events with verifiable sources and date ranges.
Exclude by default: Bootcamp records and all single-day records, unless product policy changes.
Geography: global.
Source quality: official websites, ticketing pages, organizer pages, and official social channels.
Confidence levels: 95–100 (Level 1), 75–94 (Level 2), 40–74 (Level 3), 0–39 (Level 4).
Core-corpus candidate logic: enforced by SQL in 20260530000000_migente_event_data_dictionary_v1.sql.
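
The confidence bands and the default scope rule above can be sketched as follows. This is only an illustration under stated assumptions: the authoritative candidate logic lives in the SQL migration named above, the function names are hypothetical, and scores are assumed to be integers from 0 to 100.

```python
from datetime import date

# Confidence bands as listed in this section: (low, high, level).
CONFIDENCE_BANDS = [(95, 100, 1), (75, 94, 2), (40, 74, 3), (0, 39, 4)]

# Event types named in the scope list. Bootcamp is excluded by default.
IN_SCOPE_TYPES = {
    "Congress", "Weekender", "Festival", "Marathon",
    "Retreat", "Cruise", "Dance Vacation",
}


def confidence_level(score: int) -> int:
    """Map an integer confidence score (0-100) to Level 1-4."""
    for low, high, level in CONFIDENCE_BANDS:
        if low <= score <= high:
            return level
    raise ValueError(f"confidence score out of range: {score}")


def is_in_scope(event_type: str, start: date, end: date) -> bool:
    """Hypothetical helper: in-scope type and a range longer than one day."""
    return event_type in IN_SCOPE_TYPES and end > start
```

For example, `confidence_level(94)` falls in the 75–94 band and returns 2, while a Bootcamp row is out of scope regardless of its dates.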

Multi-agent setup

This program is split into three lanes:

Architecture Agents

Own schema validity, migration health, triggers, generated views, docs, and release evidence.
Maintain corpus rules (Bootcamp and single-day exclusions), constraints, and FK integrity.
Own PR review expectations for architecture changes and maintain release safety notes.

Data-collection Agents

Own research, source map development, and batch ingestion.
Produce mobilis-research-upload/v2 YAML with canonical fields, raw/rich facts, field-level evidence, source links, media, and exclusion reasons for non-core records.
Preserve legacy candidate rows only as reconciliation inputs, not as the ongoing research contract.

Quality Agent

Run audit checks against the service-role-only public.migente_worldwide_congress_candidate_view.
Detect date, scope, and confidence exceptions.
Convert quality gaps into issue-level follow-up tasks.

Public launch pages must use the unified public.migente_public_routes_view, public.migente_public_listings_view, and public.migente_public_listing_details_view projections.

Use the dedicated issue templates:

Use the execution playbook:

Mi Gente Worldwide Congress Multi-Agent Program
Agent prompts: docs/migente-worldwide-congress-agent-prompts.md

First-pass overnight posture

Run only high-confidence, multiday entries with official or high-quality evidence in the first pass.
Skip and backlog rows that are expensive to validate (login walls, inconsistent source formats, weak provenance, repeated failures).
Tag backlog_only and needs_review rows explicitly, each with a cause and a next action.
Continue the shard until the row-level hard-stop signals trigger.
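
One way to picture the first-pass posture is a small triage function. This is a minimal sketch, not the scanner's actual logic: the `Row` fields and blocker identifiers are hypothetical, and treating "high-confidence" as a score of 75 or more (Level 1 or 2) is an assumption that this document does not state.

```python
from dataclasses import dataclass, field

# Blockers named in the posture above; the identifiers are hypothetical.
EXPENSIVE_BLOCKERS = {
    "login_wall", "inconsistent_source_format",
    "weak_provenance", "repeated_failures",
}

# Assumption: "high-confidence" means Level 1 or 2 (score >= 75).
HIGH_CONFIDENCE_MIN = 75


@dataclass
class Row:
    confidence: int
    is_multiday: bool
    has_official_evidence: bool
    blockers: set = field(default_factory=set)


def triage(row: Row) -> dict:
    """Return a status plus an explicit cause and next action."""
    expensive = sorted(row.blockers & EXPENSIVE_BLOCKERS)
    if expensive:
        # Expensive-to-validate rows are skipped and backlogged.
        return {"validation_status": "backlog_only",
                "cause": ", ".join(expensive),
                "next_action": "retry in a later pass"}
    if (row.confidence >= HIGH_CONFIDENCE_MIN and row.is_multiday
            and row.has_official_evidence):
        return {"validation_status": "pass", "cause": "", "next_action": ""}
    return {"validation_status": "needs_review",
            "cause": "below first-pass bar",
            "next_action": "manual review"}
```

The point of returning a dictionary rather than a bare status is that every backlog_only or needs_review row carries its cause and next action with it.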

v2 YAML workflow (recommended)

Overnight collection writes v2 YAML and review outputs only to:

tmp/migente-congress-runs/<shard>/<run-id>/

Required files per shard run:

mobilis-research-upload-v2.yaml
exceptions_or_backlog.csv
validation_summary.md
mobilis-research-upload-v2-preview.json
mobilis-research-upload-v2-promote-preview.json

Then run the local preview and the promotion dry run:

cd workers/playwright-source-scanner
export SUPABASE_URL=https://local-test-placeholder.supabase.co
export SUPABASE_SERVICE_ROLE_KEY=local_locked_preview_placeholder

npm run dev -- mobilis-research-upload-preview \
  --file ../../tmp/migente-congress-runs/americas/2026-05-30/mobilis-research-upload-v2.yaml \
  --out-file ../../tmp/migente-congress-runs/americas/2026-05-30/mobilis-research-upload-v2-preview.json

npm run dev -- mobilis-research-upload-promote \
  --file ../../tmp/migente-congress-runs/americas/2026-05-30/mobilis-research-upload-v2.yaml \
  --out-file ../../tmp/migente-congress-runs/americas/2026-05-30/mobilis-research-upload-v2-promote-preview.json
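
Once both commands have run, a shard run directory should contain all five required files. The following sketch checks for them; the helper names are hypothetical, and the default root assumes the script runs from the repository root.

```python
from pathlib import Path

# Required files per shard run, as listed in this section.
REQUIRED_FILES = [
    "mobilis-research-upload-v2.yaml",
    "exceptions_or_backlog.csv",
    "validation_summary.md",
    "mobilis-research-upload-v2-preview.json",
    "mobilis-research-upload-v2-promote-preview.json",
]


def run_dir(shard: str, run_id: str,
            root: Path = Path("tmp/migente-congress-runs")) -> Path:
    """Build tmp/migente-congress-runs/<shard>/<run-id>/ as a Path."""
    return root / shard / run_id


def missing_files(directory: Path) -> list[str]:
    """Return the required file names that are absent from the directory."""
    return [name for name in REQUIRED_FILES
            if not (directory / name).is_file()]
```

Calling `missing_files(run_dir("americas", "2026-05-30"))` returns an empty list only when the run is complete. The two preview JSON files will be reported as missing until the preview and promotion commands have written them.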

Model and budget posture

Default per-batch model: OPENAI_TEXT_MODEL=gpt-4.1-mini.
Keep extraction in low-cost mode by default for overnight bulk collection.
Use stronger models only when explicitly approved for a blocked batch or a hard-to-parse source class.
Record model and quality thresholds in each batch issue header.

Public launch readiness

Supabase remains canonical for public congress data.
First-pass v2 YAML feeds review, import preview, media discovery, and public promotion preview before any write.
Public publication state is separate from validation state: draft, discovery, published, needs_review, hidden, archived.
discovery rows may appear publicly when they have source evidence, multiday dates, location, style, and confidence >= 75.
published rows require stricter readiness, no duplicate risk, confidence >= 85, and full public fields.
Existing migentedance.com and migentedmv.com URLs are captured as legacy URL mappings for dedupe and future redirects.
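
The discovery and published thresholds above can be read as two nested gates. The sketch below is an interpretation, not the promotion code: the `Listing` fields are hypothetical, "stricter readiness" is modelled simply as the discovery gate plus the extra conditions listed, and falling back to draft when neither gate passes is an assumption.

```python
from dataclasses import dataclass

DISCOVERY_MIN_CONFIDENCE = 75
PUBLISHED_MIN_CONFIDENCE = 85


@dataclass
class Listing:
    confidence: int
    has_source_evidence: bool
    has_multiday_dates: bool
    has_location: bool
    has_style: bool
    has_full_public_fields: bool = False
    has_duplicate_risk: bool = True  # conservative default


def meets_discovery(row: Listing) -> bool:
    """Evidence, multiday dates, location, style, and confidence >= 75."""
    return (row.has_source_evidence and row.has_multiday_dates
            and row.has_location and row.has_style
            and row.confidence >= DISCOVERY_MIN_CONFIDENCE)


def meets_published(row: Listing) -> bool:
    """Discovery gate plus no duplicate risk, full fields, confidence >= 85."""
    return (meets_discovery(row) and not row.has_duplicate_risk
            and row.has_full_public_fields
            and row.confidence >= PUBLISHED_MIN_CONFIDENCE)


def suggested_state(row: Listing) -> str:
    """Suggest a publication state; the draft fallback is an assumption."""
    if meets_published(row):
        return "published"
    if meets_discovery(row):
        return "discovery"
    return "draft"
```

A row with confidence 90 but unresolved duplicate risk is suggested as discovery rather than published, which matches the intent that publication is stricter than public visibility.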

Phase 1 (Architecture)

Finalize corpus eligibility rules in SQL triggers and documented assumptions.
Publish migration + seed runbook in workers/playwright-source-scanner/supabase/.
Add ERD and issue-level validation docs to portal references.
Add a milestone check that all architecture tasks are tracked in GitHub issues labeled ready-for-codex when Codex execution starts.

Phase 1 (Data collection)

Build a country-by-country source list for congress-family events.
Prioritize all active 2026–2028 known annual congresses.
For each event: ingest event brand, edition, dates, primary links, and venue.
Add initial ticket and hotel references only for confident matches.
Exclude bootcamps and single-day events from corpus eligibility, but preserve them in research history where useful.
Add validation_status tagging for first-pass rows (pass, needs_review, backlog_only).
Generate v2 YAML review outputs before Supabase import (mobilis-research-upload-v2-preview.json, mobilis-research-upload-v2-promote-preview.json, backlog CSV, validation summary).
Review public readiness through the v2 preview/promotion reports before publishing anything on alpha.migentedance.com.

Phase 2 (Quality and backlog)

Add periodic consistency checks (core candidates, multiday checks, duplicates).
Create issues for missing sources and unresolved event type classification.
Set acceptance criteria for “worldwide complete enough” by continent and year.
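
The multiday and duplicate checks could look roughly like the sketch below. It works on in-memory rows for illustration only; the production checks would more plausibly run in SQL against the candidate view. The row keys and the (brand, start date) dedupe key are hypothetical.

```python
from collections import defaultdict
from datetime import date


def not_multiday(rows: list[dict]) -> list[dict]:
    """Rows whose date range does not span more than one calendar day."""
    return [r for r in rows if r["end_date"] <= r["start_date"]]


def duplicate_groups(rows: list[dict]) -> list[list[dict]]:
    """Group rows that share a normalized (brand, start_date) key."""
    groups = defaultdict(list)
    for r in rows:
        # Lowercase and collapse whitespace so near-identical names match.
        key = (" ".join(r["brand"].lower().split()), r["start_date"])
        groups[key].append(r)
    return [g for g in groups.values() if len(g) > 1]


# Made-up rows for illustration only.
rows = [
    {"brand": "Example Salsa Congress",
     "start_date": date(2026, 7, 3), "end_date": date(2026, 7, 5)},
    {"brand": "example  salsa congress",
     "start_date": date(2026, 7, 3), "end_date": date(2026, 7, 6)},
    {"brand": "One Day Social",
     "start_date": date(2026, 8, 1), "end_date": date(2026, 8, 1)},
]
```

With these rows, `not_multiday(rows)` flags only the third entry, and `duplicate_groups(rows)` returns one group containing the first two. Each flagged row or group would then become an issue-level follow-up task.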

Recommended labels and handoff

mg-congress-01
migente
world-congress
phase-1, phase-2, phase-3
architecture or data-collection
data-quality
ready-for-codex

For each batch:

include one data-collection issue for a country cluster,
include one architecture issue if rule exceptions are required,
add a weekly data-quality issue linked to audit snapshots.

GitHub tracking guidance

Create one issue per major city region when onboarding 25+ events.
Use labels such as migente-world-congress, phase-1, and data-collection to keep data-collection issues separate from architecture issues.
Attach SQL result snapshots to each issue (at minimum: event, edition, and social link rows).
