Global Congress Data Collection Goal
Goal MG-CONGRESS-01
Objective: build and maintain a global, evidence-backed corpus of multiday congress-family events for Mi Gente.
Scope includes:
- Congress
- Weekender
- Festival
- Marathon
- Retreat
- Cruise
- Dance Vacation
Scope and exclusions
- Include: confirmed or highly probable multiday events with verifiable sources and date ranges.
- Exclude by default: Bootcamp and any single-day records unless product policy is changed.
- Geography: global.
- Source quality: official websites, ticketing pages, organizer pages, and official social channels.
- Confidence levels: 95–100 (Level 1), 75–94 (Level 2), 40–74 (Level 3), 0–39 (Level 4).
- Core-corpus candidate logic: enforced by SQL in 20260530000000_migente_event_data_dictionary_v1.sql.
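The confidence bands and default exclusions above can be illustrated with a minimal sketch. The authoritative logic lives in the SQL migration; the function names here are hypothetical, and the sketch assumes confidence is an integer from 0 to 100 and that "multiday" means the end date falls after the start date.

```python
from datetime import date

# Event types excluded from the core corpus by default (per the scope rules above).
EXCLUDED_EVENT_TYPES = {"Bootcamp"}


def confidence_level(score: int) -> int:
    """Map a 0-100 confidence score to Level 1-4 using the documented bands."""
    if not 0 <= score <= 100:
        raise ValueError(f"confidence score out of range: {score}")
    if score >= 95:
        return 1
    if score >= 75:
        return 2
    if score >= 40:
        return 3
    return 4


def is_multiday(start: date, end: date) -> bool:
    """Assumption: an event is multiday when it ends on a later date than it starts."""
    return end > start


def in_default_scope(event_type: str, start: date, end: date) -> bool:
    """Hypothetical helper: apply the default Bootcamp and single-day exclusions."""
    return event_type not in EXCLUDED_EVENT_TYPES and is_multiday(start, end)
```

For example, `confidence_level(80)` falls in Level 2, and a Bootcamp row is out of scope regardless of its dates.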
Multi-agent setup
This program is split into three lanes:
Architecture Agents
- Own schema validity, migration health, triggers, generated views, docs, and release evidence.
- Maintain corpus rules (Bootcamp and single-day exclusions), constraints, and FK integrity.
- Own PR review expectations for architecture changes and maintain release safety notes.
Data-collection Agents
- Own research, source map development, and batch ingestion.
- Produce mobilis-research-upload/v2 YAML with canonical fields, raw/rich facts, field-level evidence, source links, media, and exclusion reasons for non-core records.
- Preserve legacy candidate rows only as reconciliation inputs, not as the ongoing research contract.
Quality Agent
- Run audit checks against the service-role-only public.migente_worldwide_congress_candidate_view.
- Detect date, scope, and confidence exceptions.
- Convert quality gaps into issue-level follow-up tasks.
Public launch pages must use the unified public.migente_public_routes_view, public.migente_public_listings_view, and public.migente_public_listing_details_view projections.
Use the dedicated issue templates:
Use the execution playbook:
- Mi Gente Worldwide Congress Multi-Agent Program
- Agent prompts: docs/migente-worldwide-congress-agent-prompts.md
First-pass overnight posture
- Run only high-confidence, multiday entries with official or high-quality evidence in the first pass.
- Skip and backlog rows that are expensive to validate (login walls, inconsistent source formats, weak provenance, repeated failures).
- Keep backlog_only and needs_review explicit with causes and next action.
- Continue the shard until the row-level hard-stop signals trigger.
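One way to keep deferred rows explicit is to refuse any backlog_only or needs_review record that lacks a cause or a next action. The sketch below is illustrative only: the `DeferredRow` class and its field names are hypothetical and are not taken from the actual exceptions_or_backlog.csv layout.

```python
from dataclasses import dataclass

# Statuses that must always carry a cause and a next action.
EXPLICIT_STATUSES = {"backlog_only", "needs_review"}


@dataclass(frozen=True)
class DeferredRow:
    """Hypothetical record for a row skipped or flagged during the first pass."""

    event_name: str
    validation_status: str
    cause: str
    next_action: str

    def __post_init__(self) -> None:
        if self.validation_status not in EXPLICIT_STATUSES:
            raise ValueError(f"unexpected status: {self.validation_status!r}")
        if not self.cause.strip():
            raise ValueError("a deferred row needs a cause")
        if not self.next_action.strip():
            raise ValueError("a deferred row needs a next action")
```

A row such as `DeferredRow("Example Congress", "backlog_only", "login wall on ticketing page", "retry with organizer site")` is accepted, while the same row with an empty cause raises `ValueError`.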
v2 YAML workflow (recommended)
Overnight collection writes v2 YAML and review outputs only to:
tmp/migente-congress-runs/<shard>/<run-id>/
Required files per shard run:
- mobilis-research-upload-v2.yaml
- exceptions_or_backlog.csv
- validation_summary.md
- mobilis-research-upload-v2-preview.json
- mobilis-research-upload-v2-promote-preview.json
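A small completeness check for a shard run directory could look like the sketch below. The file names come from the list above; the `missing_run_files` helper is hypothetical and is not part of the scanner CLI.

```python
from pathlib import Path

# Required outputs for each shard run, as listed above.
REQUIRED_FILES = (
    "mobilis-research-upload-v2.yaml",
    "exceptions_or_backlog.csv",
    "validation_summary.md",
    "mobilis-research-upload-v2-preview.json",
    "mobilis-research-upload-v2-promote-preview.json",
)


def missing_run_files(run_dir: Path) -> list[str]:
    """Return the required file names that are not present in run_dir."""
    return [name for name in REQUIRED_FILES if not (run_dir / name).is_file()]
```

The two JSON files are written through the --out-file option of the preview and promotion dry-run commands, so a check like this is most useful after those commands have run, for example `missing_run_files(Path("tmp/migente-congress-runs/americas/2026-05-30"))`.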
Then run local preview and promotion dry-run:
cd workers/playwright-source-scanner
export SUPABASE_URL=https://local-test-placeholder.supabase.co
export SUPABASE_SERVICE_ROLE_KEY=local_locked_preview_placeholder
npm run dev -- mobilis-research-upload-preview \
--file ../../tmp/migente-congress-runs/americas/2026-05-30/mobilis-research-upload-v2.yaml \
--out-file ../../tmp/migente-congress-runs/americas/2026-05-30/mobilis-research-upload-v2-preview.json
npm run dev -- mobilis-research-upload-promote \
--file ../../tmp/migente-congress-runs/americas/2026-05-30/mobilis-research-upload-v2.yaml \
--out-file ../../tmp/migente-congress-runs/americas/2026-05-30/mobilis-research-upload-v2-promote-preview.json
Model and budget posture
- Default per-batch model: OPENAI_TEXT_MODEL=gpt-4.1-mini.
- Keep extraction in low-cost mode by default for overnight bulk collection.
- Use stronger models only when explicitly approved for a blocked batch or a hard-to-parse source class.
- Record model and quality thresholds in each batch issue header.
Public launch readiness
- Supabase remains canonical for public congress data.
- First-pass v2 YAML feeds review, import preview, media discovery, and public promotion preview before any write.
- Public publication state is separate from validation state: draft, discovery, published, needs_review, hidden, archived.
- discovery rows may appear publicly when they have source evidence, multiday dates, location, style, and confidence >= 75.
- published rows require stricter readiness, no duplicate risk, confidence >= 85, and full public fields.
- Existing migentedance.com and migentedmv.com URLs are captured as legacy URL mappings for dedupe and future redirects.
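The discovery and published thresholds above can be read as two gates. The following is a minimal sketch under stated assumptions: the `ListingReadiness` fields are hypothetical names, and it assumes that "stricter readiness" for published rows means the discovery criteria must also hold. The real checks are whatever the promotion preview and the Supabase views enforce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ListingReadiness:
    """Hypothetical summary of the facts the publication gates depend on."""

    has_source_evidence: bool
    is_multiday: bool
    has_location: bool
    has_style: bool
    confidence: int
    duplicate_risk: bool
    has_full_public_fields: bool


def can_be_discovery(r: ListingReadiness) -> bool:
    """discovery: source evidence, multiday dates, location, style, confidence >= 75."""
    return (
        r.has_source_evidence
        and r.is_multiday
        and r.has_location
        and r.has_style
        and r.confidence >= 75
    )


def can_be_published(r: ListingReadiness) -> bool:
    """published: discovery criteria (assumed), no duplicate risk, confidence >= 85, full fields."""
    return (
        can_be_discovery(r)
        and not r.duplicate_risk
        and r.confidence >= 85
        and r.has_full_public_fields
    )
```

Under these assumptions, a row with confidence 80 and complete evidence qualifies for discovery but not for published.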
Phase 1 (Architecture)
- Finalize corpus eligibility rules in SQL triggers and documented assumptions.
- Publish migration + seed runbook in workers/playwright-source-scanner/supabase/.
- Add ERD and issue-level validation docs to portal references.
- Add milestone check that all architecture tasks are in GitHub issues labeled with ready-for-codex when Codex execution starts.
Phase 1 (Data collection)
- Build a country-by-country source list for congress-family events.
- Prioritize all active 2026–2028 known annual congresses.
- For each event: ingest event brand, edition, dates, primary links, and venue.
- Add initial ticket and hotel references only for confident matches.
- Exclude bootcamps and single-day events from corpus eligibility, but preserve in research history where useful.
- Add validation_status tagging for first-pass rows (pass, needs_review, backlog_only).
- Generate v2 YAML review outputs before Supabase import (mobilis-research-upload-v2-preview.json, mobilis-research-upload-v2-promote-preview.json, backlog CSV, validation summary).
- Review public readiness through the v2 preview/promotion reports before publishing anything on alpha.migentedance.com.
Phase 2 (Quality and backlog)
- Add periodic consistency checks (core candidates, multiday checks, duplicates).
- Create issues for missing sources and unresolved event type classification.
- Set acceptance criteria for “worldwide complete enough” by continent and year.
Recommended labels and handoff
- mg-congress-01
- migente
- world-congress
- phase-1, phase-2, phase-3
- architecture or data-collection
- data-quality
- ready-for-codex
For each batch:
- include one data-collection issue for a country cluster,
- include one architecture issue if rule exceptions are required,
- add a weekly data-quality issue linked to audit snapshots.
GitHub tracking guidance
- Create one issue per major city region when onboarding 25+ events.
- Use labels like migente-world-congress, phase-1, data-collection for architecture split.
- Attach SQL results snapshots to each issue (at minimum: event, edition, and social link rows).