Folksonomia y Taxonomía en Inteligencia e investigación

Arquitectura SQL/MySQL para información e inteligencia investigativa

    Executive Insight: la verdadera ventaja analítica no está en almacenar datos, sino en poder interrogarlos con precisión, reconstruir eventos, vincular entidades, sostener hipótesis y detectar patrones invisibles en el tiempo.
  

Introducción

Diseñar una base de datos para investigación no consiste únicamente en almacenar incidentes o documentos. El objetivo real es transformar datos dispersos en una estructura capaz de reconstruir cronologías, detectar patrones, relacionar entidades, evaluar la fiabilidad de fuentes y sostener hipótesis analíticas con trazabilidad completa. En ese contexto, una combinación de taxonomía y folksonomía resulta especialmente eficaz: la primera aporta orden, consistencia y gobernanza; la segunda aporta flexibilidad, velocidad analítica y capacidad para capturar señales débiles que todavía no han sido formalizadas.

El modelo propuesto a continuación está pensado para entornos de investigación histórica, análisis de fraude, compliance, auditoría, seguridad corporativa, análisis relacional y reconstrucción de eventos pasados. La lógica se apoya en MySQL 8 como motor central y en una estructura relacional que separa claramente casos, eventos, entidades, relaciones, evidencias, fuentes, taxonomías, etiquetas libres e hipótesis. El resultado es un sistema que no solo almacena información, sino que la convierte en una base operativa para formular preguntas inteligentes.

Resumen ejecutivo

Esta propuesta plantea una arquitectura completa en MySQL 8 para gestionar información e inteligencia investigativa. El sistema está diseñado para organizar hechos pasados, vincular personas, organizaciones, dispositivos, cuentas, localizaciones y documentos, y permitir que analistas o investigadores extraigan conocimiento mediante consultas reproducibles.

La arquitectura se apoya en seis pilares: gestión de casos, eventos cronológicos, entidades y relaciones, evidencias y fuentes, clasificación formal mediante taxonomía y etiquetado flexible mediante folksonomía. Sobre esa base se añade una capa de hipótesis y un modelo simple de scoring para estimar el nivel de sospecha por entidad a partir de recurrencia, proximidad a eventos críticos, relaciones de riesgo y evidencias asociadas.

Además de la explicación conceptual, el artículo incorpora una versión operativa lista para MySQL 8 con DDL completo, inserts de ejemplo, 25 consultas de investigación y un modelo básico de puntuación. La finalidad no es solo guardar datos, sino ayudar a responder preguntas como qué ocurrió, cuándo, quién estuvo presente, qué conexiones se repiten, qué evidencia sostiene cada afirmación y qué entidades aparecen con mayor centralidad o riesgo en una investigación.

Índice

1. Arquitectura funcional
2. Matriz de arquitectura
3. Diagrama de flujo lógico
4. DDL completo para MySQL 8
5. Inserts de ejemplo
6. 25 queries de investigación
7. Modelo de scoring de sospecha por entidad
8. Roadmap técnico de evolución
9. Conclusión

Mensaje clave para dirección, compliance, auditoría o seguridad corporativa: una base investigativa bien diseñada reduce tiempo de análisis, mejora la trazabilidad, facilita la priorización de riesgos y convierte información dispersa en una estructura interrogable, defendible y reutilizable.

1. Arquitectura funcional

El modelo divide la información investigativa en capas. La primera capa es el caso, que actúa como contenedor lógico. La segunda es el evento, que representa cada hecho observado, reportado o inferido. La tercera es la de entidades, donde se modelan personas, teléfonos, vehículos, cuentas, empresas, direcciones o dispositivos. La cuarta capa corresponde a las relaciones entre entidades y entre entidades y eventos. La quinta capa se dedica a fuentes y evidencias, que garantizan trazabilidad. La sexta combina taxonomía, útil para clasificar formalmente, y folksonomía, útil para capturar etiquetas emergentes creadas por analistas.

Sobre estas capas se añade una dimensión de hipótesis. En investigación, no todo es hecho probado. Un modelo maduro distingue entre observación, indicio, inferencia e hipótesis. Esta separación permite que la base de datos funcione no solo como archivo histórico, sino como motor de análisis.

2. Matriz de arquitectura

Antes del código conviene resumir la lógica funcional del sistema en una matriz sencilla. Esta tabla ayuda a explicar a dirección, tecnología, compliance, auditoría interna o seguridad por qué cada bloque existe y qué valor aporta al proceso de investigación.

Capa	Función	Valor analítico	Ejemplo
Casos	Agrupan la investigación y sus objetos relacionados	Orden, segmentación y gobierno del dato	Fraude, incidente, auditoría interna, revisión histórica
Eventos	Registran hechos observados, reportados o inferidos	Reconstrucción cronológica y secuencial	Llamada, acceso, transferencia, encuentro, movimiento
Entidades	Modelan personas, organizaciones, cuentas, vehículos, dispositivos	Centralidad, vínculo, identidad y recurrencia	Persona, teléfono, empresa, cuenta bancaria
Relaciones	Conectan entidades entre sí	Descubrimiento de red y jerarquía relacional	Usó teléfono, transfirió a, se reunió con
Fuentes y evidencias	Documentan el sustento de cada afirmación	Trazabilidad, auditabilidad y cadena lógica	CCTV, extracto bancario, testimonio, CDR
Taxonomía	Clasificación formal y jerárquica	Consistencia, reporting y búsqueda estructurada	Preparation, Transaction, Communication
Folksonomía	Etiquetado libre por analistas	Captura de señales débiles y patrones emergentes	numero_desechable, cuenta_puente, madrugada
Hipótesis	Relacionan hechos y entidades con una línea analítica	Priorización y validación iterativa	Intermediación, pago fragmentado, operativa repetida
Scoring	Resume señales de riesgo en una puntuación interpretable	Priorización operativa	Entidad con alta recurrencia, tags de riesgo y evidencias

Lectura ejecutiva de la matriz. El modelo no se limita a guardar datos. Está diseñado para responder preguntas críticas: qué ocurrió, con qué secuencia, qué entidades se repiten, qué vínculos aparecen, qué evidencia soporta cada afirmación, qué patrones emergen por clasificación formal o por etiquetado libre y qué nodos merecen prioridad analítica.

3. Diagrama de flujo lógico

DATA INGESTION
   ↓
EVENTS  →  ENTITIES  →  RELATIONSHIPS
   ↓            ↓
EVIDENCE    TAXONOMY
   ↓            ↓
FOLKSONOMY → HYPOTHESES
   ↓
SCORING / INTELLIGENCE OUTPUT

Ventaja estratégica

Reducción del tiempo de análisis
Priorización automática de riesgos
Trazabilidad completa para auditoría, forensics o compliance
Detección de patrones repetidos
Escalabilidad hacia IA, NLP y modelos predictivos

4. DDL completo para MySQL 8

El siguiente script crea una estructura mínima pero robusta, preparada para crecer. Incluye claves primarias, foráneas, índices y soporte para búsqueda FULLTEXT.

DROP DATABASE IF EXISTS intel_lab;
CREATE DATABASE intel_lab CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;
USE intel_lab;

CREATE TABLE cases (
    case_id BIGINT PRIMARY KEY AUTO_INCREMENT,
    case_code VARCHAR(100) NOT NULL UNIQUE,
    title VARCHAR(255) NOT NULL,
    description TEXT,
    status VARCHAR(50) NOT NULL DEFAULT 'open',
    priority VARCHAR(50) DEFAULT 'medium',
    opened_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    closed_at DATETIME NULL
) ENGINE=InnoDB;

CREATE TABLE entity_types (
    entity_type_id BIGINT PRIMARY KEY AUTO_INCREMENT,
    type_name VARCHAR(100) NOT NULL UNIQUE
) ENGINE=InnoDB;

CREATE TABLE locations (
    location_id BIGINT PRIMARY KEY AUTO_INCREMENT,
    location_name VARCHAR(255),
    address_text VARCHAR(500),
    city VARCHAR(100),
    region VARCHAR(100),
    country VARCHAR(100),
    latitude DECIMAL(10,7),
    longitude DECIMAL(10,7)
) ENGINE=InnoDB;

CREATE TABLE sources (
    source_id BIGINT PRIMARY KEY AUTO_INCREMENT,
    source_type VARCHAR(100) NOT NULL,
    source_name VARCHAR(255),
    source_reference VARCHAR(255),
    acquisition_date DATETIME NULL,
    reliability_score DECIMAL(5,2) DEFAULT 0.00,
    notes TEXT
) ENGINE=InnoDB;

CREATE TABLE entities (
    entity_id BIGINT PRIMARY KEY AUTO_INCREMENT,
    entity_type_id BIGINT NOT NULL,
    display_name VARCHAR(255) NOT NULL,
    canonical_name VARCHAR(255),
    description TEXT,
    risk_score DECIMAL(5,2) DEFAULT 0.00,
    extra_metadata JSON NULL,
    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (entity_type_id) REFERENCES entity_types(entity_type_id)
) ENGINE=InnoDB;

CREATE TABLE events (
    event_id BIGINT PRIMARY KEY AUTO_INCREMENT,
    case_id BIGINT NOT NULL,
    event_type_id BIGINT NULL,
    title VARCHAR(255) NOT NULL,
    description TEXT,
    event_start DATETIME NULL,
    event_end DATETIME NULL,
    location_id BIGINT NULL,
    confidence_score DECIMAL(5,2) DEFAULT 0.00,
    source_reliability_score DECIMAL(5,2) DEFAULT 0.00,
    extra_metadata JSON NULL,
    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    FOREIGN KEY (case_id) REFERENCES cases(case_id),
    FOREIGN KEY (location_id) REFERENCES locations(location_id)
) ENGINE=InnoDB;

CREATE TABLE event_entity_link (
    event_entity_link_id BIGINT PRIMARY KEY AUTO_INCREMENT,
    event_id BIGINT NOT NULL,
    entity_id BIGINT NOT NULL,
    role_in_event VARCHAR(100),
    confidence_score DECIMAL(5,2) DEFAULT 0.00,
    FOREIGN KEY (event_id) REFERENCES events(event_id),
    FOREIGN KEY (entity_id) REFERENCES entities(entity_id),
    UNIQUE KEY uq_event_entity_role (event_id, entity_id, role_in_event)
) ENGINE=InnoDB;

CREATE TABLE entity_relationships (
    relationship_id BIGINT PRIMARY KEY AUTO_INCREMENT,
    from_entity_id BIGINT NOT NULL,
    to_entity_id BIGINT NOT NULL,
    relationship_type VARCHAR(100) NOT NULL,
    valid_from DATETIME NULL,
    valid_to DATETIME NULL,
    confidence_score DECIMAL(5,2) DEFAULT 0.00,
    source_id BIGINT NULL,
    notes TEXT,
    FOREIGN KEY (from_entity_id) REFERENCES entities(entity_id),
    FOREIGN KEY (to_entity_id) REFERENCES entities(entity_id),
    FOREIGN KEY (source_id) REFERENCES sources(source_id)
) ENGINE=InnoDB;

CREATE TABLE evidences (
    evidence_id BIGINT PRIMARY KEY AUTO_INCREMENT,
    source_id BIGINT NOT NULL,
    case_id BIGINT NOT NULL,
    evidence_type VARCHAR(100) NOT NULL,
    title VARCHAR(255),
    file_path VARCHAR(500),
    content_text LONGTEXT,
    hash_sha256 VARCHAR(128),
    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (source_id) REFERENCES sources(source_id),
    FOREIGN KEY (case_id) REFERENCES cases(case_id)
) ENGINE=InnoDB;

CREATE TABLE event_evidence_link (
    event_id BIGINT NOT NULL,
    evidence_id BIGINT NOT NULL,
    PRIMARY KEY (event_id, evidence_id),
    FOREIGN KEY (event_id) REFERENCES events(event_id),
    FOREIGN KEY (evidence_id) REFERENCES evidences(evidence_id)
) ENGINE=InnoDB;

CREATE TABLE entity_evidence_link (
    entity_id BIGINT NOT NULL,
    evidence_id BIGINT NOT NULL,
    PRIMARY KEY (entity_id, evidence_id),
    FOREIGN KEY (entity_id) REFERENCES entities(entity_id),
    FOREIGN KEY (evidence_id) REFERENCES evidences(evidence_id)
) ENGINE=InnoDB;

CREATE TABLE taxonomy_terms (
    taxonomy_term_id BIGINT PRIMARY KEY AUTO_INCREMENT,
    taxonomy_name VARCHAR(100) NOT NULL,
    parent_term_id BIGINT NULL,
    term_code VARCHAR(100),
    term_label VARCHAR(255) NOT NULL,
    term_description TEXT,
    FOREIGN KEY (parent_term_id) REFERENCES taxonomy_terms(taxonomy_term_id)
) ENGINE=InnoDB;

CREATE TABLE event_taxonomy_link (
    event_id BIGINT NOT NULL,
    taxonomy_term_id BIGINT NOT NULL,
    PRIMARY KEY (event_id, taxonomy_term_id),
    FOREIGN KEY (event_id) REFERENCES events(event_id),
    FOREIGN KEY (taxonomy_term_id) REFERENCES taxonomy_terms(taxonomy_term_id)
) ENGINE=InnoDB;

CREATE TABLE entity_taxonomy_link (
    entity_id BIGINT NOT NULL,
    taxonomy_term_id BIGINT NOT NULL,
    PRIMARY KEY (entity_id, taxonomy_term_id),
    FOREIGN KEY (entity_id) REFERENCES entities(entity_id),
    FOREIGN KEY (taxonomy_term_id) REFERENCES taxonomy_terms(taxonomy_term_id)
) ENGINE=InnoDB;

CREATE TABLE tags (
    tag_id BIGINT PRIMARY KEY AUTO_INCREMENT,
    tag_label VARCHAR(100) NOT NULL UNIQUE,
    created_by VARCHAR(100),
    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB;

CREATE TABLE event_tags (
    event_id BIGINT NOT NULL,
    tag_id BIGINT NOT NULL,
    PRIMARY KEY (event_id, tag_id),
    FOREIGN KEY (event_id) REFERENCES events(event_id),
    FOREIGN KEY (tag_id) REFERENCES tags(tag_id)
) ENGINE=InnoDB;

CREATE TABLE entity_tags (
    entity_id BIGINT NOT NULL,
    tag_id BIGINT NOT NULL,
    PRIMARY KEY (entity_id, tag_id),
    FOREIGN KEY (entity_id) REFERENCES entities(entity_id),
    FOREIGN KEY (tag_id) REFERENCES tags(tag_id)
) ENGINE=InnoDB;

CREATE TABLE time_annotations (
    time_annotation_id BIGINT PRIMARY KEY AUTO_INCREMENT,
    event_id BIGINT NOT NULL,
    time_precision VARCHAR(50) NOT NULL,
    inferred_from TEXT,
    FOREIGN KEY (event_id) REFERENCES events(event_id)
) ENGINE=InnoDB;

CREATE TABLE hypotheses (
    hypothesis_id BIGINT PRIMARY KEY AUTO_INCREMENT,
    case_id BIGINT NOT NULL,
    title VARCHAR(255) NOT NULL,
    description TEXT NOT NULL,
    status VARCHAR(50) DEFAULT 'active',
    confidence_score DECIMAL(5,2) DEFAULT 0.00,
    created_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (case_id) REFERENCES cases(case_id)
) ENGINE=InnoDB;

CREATE TABLE hypothesis_event_link (
    hypothesis_id BIGINT NOT NULL,
    event_id BIGINT NOT NULL,
    support_type VARCHAR(50) DEFAULT 'supports',
    PRIMARY KEY (hypothesis_id, event_id),
    FOREIGN KEY (hypothesis_id) REFERENCES hypotheses(hypothesis_id),
    FOREIGN KEY (event_id) REFERENCES events(event_id)
) ENGINE=InnoDB;

CREATE TABLE hypothesis_entity_link (
    hypothesis_id BIGINT NOT NULL,
    entity_id BIGINT NOT NULL,
    support_type VARCHAR(50) DEFAULT 'supports',
    PRIMARY KEY (hypothesis_id, entity_id),
    FOREIGN KEY (hypothesis_id) REFERENCES hypotheses(hypothesis_id),
    FOREIGN KEY (entity_id) REFERENCES entities(entity_id)
) ENGINE=InnoDB;

CREATE INDEX idx_events_case_time ON events(case_id, event_start);
CREATE INDEX idx_events_location ON events(location_id);
CREATE INDEX idx_entities_name ON entities(display_name);
CREATE INDEX idx_entities_canonical_name ON entities(canonical_name);
CREATE INDEX idx_event_entity_event ON event_entity_link(event_id);
CREATE INDEX idx_event_entity_entity ON event_entity_link(entity_id);
CREATE INDEX idx_entity_rel_from_to ON entity_relationships(from_entity_id, to_entity_id);
CREATE INDEX idx_entity_rel_type ON entity_relationships(relationship_type);
CREATE INDEX idx_evidences_case_type ON evidences(case_id, evidence_type);
CREATE INDEX idx_tags_label ON tags(tag_label);

ALTER TABLE evidences ADD FULLTEXT ft_evidences_content (content_text);
ALTER TABLE events ADD FULLTEXT ft_events_text (title, description);
ALTER TABLE entities ADD FULLTEXT ft_entities_text (display_name, canonical_name, description);

CREATE OR REPLACE VIEW vw_case_timeline AS
SELECT 
    e.event_id,
    e.case_id,
    e.title,
    e.description,
    e.event_start,
    e.event_end,
    l.location_name,
    e.confidence_score
FROM events e
LEFT JOIN locations l ON e.location_id = l.location_id;

CREATE OR REPLACE VIEW vw_event_enriched AS
SELECT 
    e.event_id,
    e.case_id,
    e.title,
    e.event_start,
    GROUP_CONCAT(DISTINCT tt.term_label ORDER BY tt.term_label SEPARATOR ', ') AS taxonomy_terms,
    GROUP_CONCAT(DISTINCT t.tag_label ORDER BY t.tag_label SEPARATOR ', ') AS folk_tags
FROM events e
LEFT JOIN event_taxonomy_link etl ON e.event_id = etl.event_id
LEFT JOIN taxonomy_terms tt ON etl.taxonomy_term_id = tt.taxonomy_term_id
LEFT JOIN event_tags et ON e.event_id = et.event_id
LEFT JOIN tags t ON et.tag_id = t.tag_id
GROUP BY e.event_id, e.case_id, e.title, e.event_start;

5. Inserts de ejemplo

Los siguientes datos de ejemplo ilustran una investigación ficticia sobre una operativa de coordinación, movimientos y transacciones vinculadas a un caso.

INSERT INTO cases (case_code, title, description, priority)
VALUES
('CASE-2026-001', 'Operativa coordinada en puerto seco', 'Caso ficticio para demostrar estructura investigativa', 'high');

INSERT INTO entity_types (type_name)
VALUES
('PERSON'), ('ORG'), ('PHONE'), ('VEHICLE'), ('BANK_ACCOUNT'), ('DEVICE'), ('ADDRESS'), ('EMAIL');

INSERT INTO locations (location_name, address_text, city, region, country, latitude, longitude)
VALUES
('Nave logística Norte', 'Polígono Norte, parcela 4', 'Barcelona', 'Cataluña', 'España', 41.4123000, 2.1845000),
('Gasolinera Delta', 'Salida B-20 km 7', 'Barcelona', 'Cataluña', 'España', 41.4362000, 2.2099000),
('Parking industrial Sur', 'Calle Metal 18', 'L''Hospitalet', 'Cataluña', 'España', 41.3477000, 2.1059000),
('Sucursal bancaria Centro', 'Avenida Central 110', 'Barcelona', 'Cataluña', 'España', 41.3891000, 2.1702000);

INSERT INTO sources (source_type, source_name, source_reference, acquisition_date, reliability_score, notes)
VALUES
('CCTV', 'Camara Nave Norte', 'CCTV-NN-001', '2026-03-15 10:00:00', 85.00, 'Cobertura parcial de acceso principal'),
('Witness', 'Declaración vigilante', 'WIT-2026-14', '2026-03-15 12:00:00', 70.00, 'Testimonio coherente con dudas sobre horas exactas'),
('Bank', 'Extracto bancario', 'BNK-99812', '2026-03-16 09:15:00', 92.00, 'Transferencias de bajo importe repetidas'),
('Telecom', 'CDR operador', 'TEL-CDR-221', '2026-03-16 11:30:00', 90.00, 'Metadatos de llamadas y celdas');

INSERT INTO entities (entity_type_id, display_name, canonical_name, description, risk_score, extra_metadata)
VALUES
(1, 'Karim El M.', 'KARIM EL M', 'Persona observada en tres eventos', 0.00, JSON_OBJECT('nationality','ES-MA')),
(1, 'Nadia R.', 'NADIA R', 'Persona vinculada a movimientos y contactos', 0.00, JSON_OBJECT('role','intermediary')),
(2, 'LogisTrade BCN SL', 'LOGISTRADE BCN SL', 'Empresa con actividad logística', 0.00, JSON_OBJECT('sector','logistics')),
(3, '+34600111222', '+34600111222', 'Número móvil prepago', 0.00, JSON_OBJECT('operator','unknown')),
(4, 'Furgoneta 4812-KLM', '4812-KLM', 'Vehículo blanco industrial', 0.00, JSON_OBJECT('brand','Ford')),
(5, 'ES21-1234-5678-0001', 'ES21-1234-5678-0001', 'Cuenta usada para microtransferencias', 0.00, JSON_OBJECT('bank','Banco Demo')),
(6, 'IMEI-900100200300400', 'IMEI-900100200300400', 'Terminal detectado en CDR', 0.00, JSON_OBJECT('device_type','smartphone')),
(8, 'nadiar@example.net', 'nadiar@example.net', 'Correo electrónico asociado', 0.00, JSON_OBJECT('provider','mail'));

INSERT INTO events (case_id, title, description, event_start, event_end, location_id, confidence_score, source_reliability_score, extra_metadata)
VALUES
(1, 'Acceso a nave', 'Ingreso de persona y vehículo en nave logística', '2026-03-15 06:40:00', '2026-03-15 07:10:00', 1, 82.00, 85.00, JSON_OBJECT('shift','early_morning')),
(1, 'Encuentro en gasolinera', 'Reunión breve con intercambio de objeto', '2026-03-15 08:15:00', '2026-03-15 08:28:00', 2, 76.00, 70.00, JSON_OBJECT('object_exchange', true)),
(1, 'Estacionamiento de espera', 'Vehículo estacionado en zona industrial', '2026-03-15 09:05:00', '2026-03-15 09:50:00', 3, 68.00, 85.00, JSON_OBJECT('engine_off', true)),
(1, 'Microtransferencia bancaria', 'Transferencia de 480 EUR a cuenta puente', '2026-03-15 10:20:00', '2026-03-15 10:21:00', 4, 91.00, 92.00, JSON_OBJECT('amount', 480)),
(1, 'Llamada de coordinación', 'Llamada corta entre número prepago y contacto principal', '2026-03-15 10:35:00', '2026-03-15 10:37:00', NULL, 88.00, 90.00, JSON_OBJECT('duration_sec', 121)),
(1, 'Segundo acceso a nave', 'Regreso a nave con menor permanencia', '2026-03-15 11:10:00', '2026-03-15 11:22:00', 1, 80.00, 85.00, JSON_OBJECT('repeat_visit', true));

INSERT INTO event_entity_link (event_id, entity_id, role_in_event, confidence_score)
VALUES
(1, 1, 'subject', 88.00),
(1, 5, 'vehicle', 92.00),
(1, 3, 'location_owner_related', 55.00),
(2, 1, 'participant', 74.00),
(2, 2, 'participant', 78.00),
(2, 4, 'caller_device_holder', 60.00),
(3, 5, 'vehicle', 87.00),
(3, 1, 'driver', 80.00),
(4, 6, 'receiver_account', 95.00),
(4, 2, 'beneficiary_related', 72.00),
(5, 4, 'caller', 93.00),
(5, 1, 'contact_person', 65.00),
(6, 1, 'subject', 84.00),
(6, 5, 'vehicle', 89.00);

INSERT INTO entity_relationships (from_entity_id, to_entity_id, relationship_type, valid_from, confidence_score, source_id, notes)
VALUES
(1, 4, 'used_phone', '2026-03-01 00:00:00', 85.00, 4, 'CDR vincula patrón de uso'),
(1, 5, 'used_vehicle', '2026-03-10 00:00:00', 82.00, 1, 'Captado en CCTV'),
(2, 6, 'linked_to_account', '2026-03-12 00:00:00', 79.00, 3, 'Beneficiaria potencial'),
(2, 8, 'used_email', '2026-03-12 00:00:00', 88.00, 3, 'Correo usado en operaciones previas'),
(3, 5, 'fleet_or_logistics_connection', '2026-01-01 00:00:00', 58.00, 2, 'Conexión débil pendiente de confirmar'),
(1, 2, 'met_repeatedly', '2026-03-15 00:00:00', 76.00, 2, 'Observados juntos en encuentro breve');

INSERT INTO evidences (source_id, case_id, evidence_type, title, file_path, content_text, hash_sha256)
VALUES
(1, 1, 'video', 'Clip acceso nave', '/evidence/cctv/acceso_nave.mp4', 'Se observa una furgoneta blanca entrando a las 06:40 con un ocupante.', 'hash001'),
(2, 1, 'statement', 'Declaración del vigilante', '/evidence/witness/vigilante.txt', 'El vigilante afirma haber visto a dos personas intercambiar un sobre pequeño en la gasolinera.', 'hash002'),
(3, 1, 'bank_record', 'Extracto cuenta puente', '/evidence/bank/extracto_480.pdf', 'Transferencia de 480 EUR recibida en cuenta ES21-1234-5678-0001.', 'hash003'),
(4, 1, 'cdr', 'CDR número prepago', '/evidence/telecom/cdr_600111222.csv', 'Llamada de 121 segundos y coincidencia de celdas entre 10:35 y 10:37.', 'hash004');

INSERT INTO event_evidence_link (event_id, evidence_id)
VALUES (1,1), (2,2), (4,3), (5,4), (6,1);

INSERT INTO entity_evidence_link (entity_id, evidence_id)
VALUES (1,1), (1,2), (2,2), (6,3), (4,4);

INSERT INTO taxonomy_terms (taxonomy_name, parent_term_id, term_code, term_label, term_description)
VALUES
('INVESTIGATION', NULL, 'INCIDENT', 'INCIDENT', 'Hecho investigado'),
('INVESTIGATION', 1, 'PREPARATION', 'PREPARATION', 'Acciones previas o de preparación'),
('INVESTIGATION', 1, 'COORDINATION', 'COORDINATION', 'Acciones de coordinación'),
('INVESTIGATION', 1, 'MOVEMENT', 'MOVEMENT', 'Movimientos físicos'),
('INVESTIGATION', 1, 'TRANSACTION', 'TRANSACTION', 'Movimientos económicos'),
('INVESTIGATION', 1, 'COMMUNICATION', 'COMMUNICATION', 'Interacciones comunicativas');

INSERT INTO event_taxonomy_link (event_id, taxonomy_term_id)
VALUES
(1, 2),
(1, 4),
(2, 3),
(3, 4),
(4, 5),
(5, 6),
(6, 2);

INSERT INTO tags (tag_label, created_by)
VALUES
('madrugada', 'analyst_1'),
('intercambio_objeto', 'analyst_1'),
('cuenta_puente', 'analyst_2'),
('numero_desechable', 'analyst_2'),
('repeticion_patron', 'analyst_1'),
('ruta_secundaria', 'analyst_3');

INSERT INTO event_tags (event_id, tag_id)
VALUES
(1,1),
(2,2),
(3,6),
(4,3),
(5,4),
(6,5);

INSERT INTO entity_tags (entity_id, tag_id)
VALUES
(1,5),
(2,3),
(4,4),
(5,6),
(6,3);

INSERT INTO time_annotations (event_id, time_precision, inferred_from)
VALUES
(1, 'exact', 'Timestamp CCTV'),
(2, 'approximate', 'Declaración testigo y cámara parcial'),
(5, 'exact', 'CDR operador');

INSERT INTO hypotheses (case_id, title, description, status, confidence_score)
VALUES
(1, 'Uso de cuenta puente para pagos de coordinación', 'La operativa sugiere pagos fragmentados vinculados a coordinación táctica.', 'active', 72.00),
(1, 'Nadia R. como intermediaria', 'La entidad Nadia R. aparece vinculada a encuentro, cuenta y contacto.', 'active', 68.00);

INSERT INTO hypothesis_event_link (hypothesis_id, event_id, support_type)
VALUES
(1, 4, 'supports'),
(1, 5, 'context'),
(2, 2, 'supports'),
(2, 4, 'supports');

INSERT INTO hypothesis_entity_link (hypothesis_id, entity_id, support_type)
VALUES
(1, 6, 'supports'),
(2, 2, 'supports'),
(2, 1, 'context');

6. 25 queries de investigación

Las consultas siguientes están redactadas para MySQL 8 y cubren cronología, centralidad, relación, correlación, trazabilidad, taxonomía, folksonomía, hipótesis y búsqueda textual.

-- 1. Línea temporal del caso
SELECT * FROM vw_case_timeline WHERE case_id = 1 ORDER BY event_start;

-- 2. Eventos con taxonomía y tags agregados
SELECT * FROM vw_event_enriched WHERE case_id = 1 ORDER BY event_start;

-- 3. Eventos por rango temporal
SELECT event_id, title, event_start
FROM events
WHERE case_id = 1
  AND event_start BETWEEN '2026-03-15 06:00:00' AND '2026-03-15 12:00:00'
ORDER BY event_start;

-- 4. Entidades con más presencia en el caso
SELECT en.entity_id, en.display_name, et.type_name, COUNT(*) AS total_eventos
FROM event_entity_link eel
JOIN entities en ON eel.entity_id = en.entity_id
JOIN entity_types et ON en.entity_type_id = et.entity_type_id
JOIN events ev ON eel.event_id = ev.event_id
WHERE ev.case_id = 1
GROUP BY en.entity_id, en.display_name, et.type_name
ORDER BY total_eventos DESC;

-- 5. Eventos asociados a una entidad concreta
SELECT e.event_id, e.title, e.event_start, eel.role_in_event
FROM event_entity_link eel
JOIN events e ON eel.event_id = e.event_id
WHERE eel.entity_id = 1
ORDER BY e.event_start;

-- 6. Relación entre dos entidades por eventos comunes
SELECT e.event_id, e.title, e.event_start
FROM event_entity_link a
JOIN event_entity_link b ON a.event_id = b.event_id
JOIN events e ON e.event_id = a.event_id
WHERE a.entity_id = 1 AND b.entity_id = 2
ORDER BY e.event_start;

-- 7. Red de relaciones directas de una entidad
SELECT e2.entity_id, e2.display_name, er.relationship_type, er.confidence_score
FROM entity_relationships er
JOIN entities e2 ON er.to_entity_id = e2.entity_id
WHERE er.from_entity_id = 1
ORDER BY er.confidence_score DESC;

-- 8. Eventos clasificados como PREPARATION
SELECT e.event_id, e.title, e.event_start
FROM events e
JOIN event_taxonomy_link etl ON e.event_id = etl.event_id
JOIN taxonomy_terms tt ON etl.taxonomy_term_id = tt.taxonomy_term_id
WHERE tt.term_label = 'PREPARATION'
ORDER BY e.event_start;

-- 9. Frecuencia de categorías taxonómicas por caso
SELECT tt.term_label, COUNT(*) AS total
FROM event_taxonomy_link etl
JOIN taxonomy_terms tt ON etl.taxonomy_term_id = tt.taxonomy_term_id
JOIN events e ON e.event_id = etl.event_id
WHERE e.case_id = 1
GROUP BY tt.term_label
ORDER BY total DESC;

-- 10. Frecuencia de tags folksonómicos
SELECT t.tag_label, COUNT(*) AS total
FROM event_tags et
JOIN tags t ON et.tag_id = t.tag_id
JOIN events e ON et.event_id = e.event_id
WHERE e.case_id = 1
GROUP BY t.tag_label
ORDER BY total DESC;

-- 11. Cruce entre taxonomía y folksonomía
SELECT tt.term_label AS categoria, t.tag_label AS etiqueta, COUNT(*) AS frecuencia
FROM events e
JOIN event_taxonomy_link etl ON e.event_id = etl.event_id
JOIN taxonomy_terms tt ON etl.taxonomy_term_id = tt.taxonomy_term_id
JOIN event_tags eg ON e.event_id = eg.event_id
JOIN tags t ON eg.tag_id = t.tag_id
WHERE e.case_id = 1
GROUP BY tt.term_label, t.tag_label
ORDER BY frecuencia DESC, categoria, etiqueta;

-- 12. Eventos próximos a un evento dado en ventana de dos horas
SELECT e2.event_id, e2.title, e2.event_start
FROM events e1
JOIN events e2 ON e1.event_id <> e2.event_id
WHERE e1.event_id = 2
  AND e2.event_start BETWEEN DATE_SUB(e1.event_start, INTERVAL 2 HOUR)
                         AND DATE_ADD(e1.event_start, INTERVAL 2 HOUR)
ORDER BY e2.event_start;

-- 13. Evidencias que soportan un evento
SELECT ev.evidence_id, ev.title, ev.evidence_type, s.source_name
FROM event_evidence_link eel
JOIN evidences ev ON eel.evidence_id = ev.evidence_id
JOIN sources s ON ev.source_id = s.source_id
WHERE eel.event_id = 4;

-- 14. Evidencias vinculadas a una entidad
SELECT ev.evidence_id, ev.title, ev.evidence_type
FROM entity_evidence_link eel
JOIN evidences ev ON eel.evidence_id = ev.evidence_id
WHERE eel.entity_id = 1;

-- 15. Búsqueda FULLTEXT sobre evidencias
SELECT evidence_id, title,
       MATCH(content_text) AGAINST ('+transferencia +cuenta +480' IN BOOLEAN MODE) AS score
FROM evidences
WHERE MATCH(content_text) AGAINST ('+transferencia +cuenta +480' IN BOOLEAN MODE)
ORDER BY score DESC;

-- 16. Búsqueda FULLTEXT sobre eventos
SELECT event_id, title,
       MATCH(title, description) AGAINST ('+llamada +coordinacion' IN BOOLEAN MODE) AS score
FROM events
WHERE MATCH(title, description) AGAINST ('+llamada +coordinacion' IN BOOLEAN MODE)
ORDER BY score DESC;

-- 17. Entidades con tags considerados de riesgo
SELECT e.entity_id, e.display_name, GROUP_CONCAT(t.tag_label) AS tags_riesgo
FROM entities e
JOIN entity_tags et ON e.entity_id = et.entity_id
JOIN tags t ON et.tag_id = t.tag_id
WHERE t.tag_label IN ('numero_desechable','cuenta_puente','repeticion_patron')
GROUP BY e.entity_id, e.display_name;

-- 18. Eventos con baja fiabilidad de fuente y alta confianza analítica
SELECT event_id, title, confidence_score, source_reliability_score
FROM events
WHERE confidence_score >= 75 AND source_reliability_score < 80;

-- 19. Relaciones con confianza superior a 80
SELECT relationship_id, from_entity_id, to_entity_id, relationship_type, confidence_score
FROM entity_relationships
WHERE confidence_score >= 80
ORDER BY confidence_score DESC;

-- 20. Hipótesis activas con sus entidades asociadas
SELECT h.title AS hypothesis, e.display_name, hel.support_type, h.confidence_score
FROM hypotheses h
JOIN hypothesis_entity_link hel ON h.hypothesis_id = hel.hypothesis_id
JOIN entities e ON hel.entity_id = e.entity_id
WHERE h.status = 'active'
ORDER BY h.confidence_score DESC;

-- 21. Hipótesis activas con sus eventos asociados
SELECT h.title AS hypothesis, ev.title AS event_title, ev.event_start, hel.support_type
FROM hypotheses h
JOIN hypothesis_event_link hel ON h.hypothesis_id = hel.hypothesis_id
JOIN events ev ON hel.event_id = ev.event_id
WHERE h.status = 'active'
ORDER BY h.confidence_score DESC, ev.event_start;

-- 22. Número de evidencias por entidad
SELECT e.entity_id, e.display_name, COUNT(*) AS total_evidencias
FROM entity_evidence_link eel
JOIN entities e ON eel.entity_id = e.entity_id
GROUP BY e.entity_id, e.display_name
ORDER BY total_evidencias DESC;

-- 23. Entidades conectadas a eventos de TRANSACTION o COMMUNICATION
SELECT DISTINCT en.entity_id, en.display_name
FROM entities en
JOIN event_entity_link eel ON en.entity_id = eel.entity_id
JOIN event_taxonomy_link etl ON eel.event_id = etl.event_id
JOIN taxonomy_terms tt ON etl.taxonomy_term_id = tt.taxonomy_term_id
WHERE tt.term_label IN ('TRANSACTION','COMMUNICATION')
ORDER BY en.display_name;

-- 24. Reconstrucción cronológica enriquecida para una entidad
SELECT e.event_start, e.title, eel.role_in_event, l.location_name
FROM event_entity_link eel
JOIN events e ON eel.event_id = e.event_id
LEFT JOIN locations l ON e.location_id = l.location_id
WHERE eel.entity_id = 1
ORDER BY e.event_start;

-- 25. Entidades con mayor centralidad simple por volumen de relaciones y eventos
SELECT 
    en.entity_id,
    en.display_name,
    COALESCE(ev_cnt.total_events,0) AS total_events,
    COALESCE(rel_out.total_rel_out,0) + COALESCE(rel_in.total_rel_in,0) AS total_relationships,
    COALESCE(ev_cnt.total_events,0) + COALESCE(rel_out.total_rel_out,0) + COALESCE(rel_in.total_rel_in,0) AS centrality_score
FROM entities en
LEFT JOIN (
    SELECT entity_id, COUNT(*) AS total_events
    FROM event_entity_link
    GROUP BY entity_id
) ev_cnt ON en.entity_id = ev_cnt.entity_id
LEFT JOIN (
    SELECT from_entity_id, COUNT(*) AS total_rel_out
    FROM entity_relationships
    GROUP BY from_entity_id
) rel_out ON en.entity_id = rel_out.from_entity_id
LEFT JOIN (
    SELECT to_entity_id AS entity_id, COUNT(*) AS total_rel_in
    FROM entity_relationships
    GROUP BY to_entity_id
) rel_in ON en.entity_id = rel_in.entity_id
ORDER BY centrality_score DESC, en.display_name;

7. Modelo de scoring de sospecha por entidad

Un sistema de scoring no sustituye el juicio analítico ni una valoración jurídica, pero resulta útil para priorizar revisión humana. A continuación se propone un modelo simple y explicable. La lógica combina cuatro factores: presencia en eventos, presencia en eventos de categorías sensibles, tags de riesgo y densidad relacional. A ello se suma un bonus por evidencias directamente asociadas.

Factor	Lógica	Peso orientativo
Presencia en eventos	Cuántos eventos contienen a la entidad	2 puntos por evento
Eventos sensibles	Participación en TRANSACTION, COMMUNICATION o PREPARATION	5 puntos por evento sensible
Tags de riesgo	Etiquetas como numero_desechable, cuenta_puente o repeticion_patron	8 puntos por tag
Relaciones	Número de relaciones directas salientes y entrantes	3 puntos por relación
Evidencias	Número de evidencias asociadas directamente	4 puntos por evidencia

WITH entity_event_counts AS (
    SELECT entity_id, COUNT(*) AS total_events
    FROM event_entity_link
    GROUP BY entity_id
),
sensitive_event_counts AS (
    SELECT eel.entity_id, COUNT(*) AS sensitive_events
    FROM event_entity_link eel
    JOIN event_taxonomy_link etl ON eel.event_id = etl.event_id
    JOIN taxonomy_terms tt ON etl.taxonomy_term_id = tt.taxonomy_term_id
    WHERE tt.term_label IN ('TRANSACTION','COMMUNICATION','PREPARATION')
    GROUP BY eel.entity_id
),
risk_tag_counts AS (
    SELECT et.entity_id, COUNT(*) AS risk_tags
    FROM entity_tags et
    JOIN tags t ON et.tag_id = t.tag_id
    WHERE t.tag_label IN ('numero_desechable','cuenta_puente','repeticion_patron','intercambio_objeto','ruta_secundaria')
    GROUP BY et.entity_id
),
relationship_counts AS (
    SELECT entity_id, SUM(rel_cnt) AS total_relationships
    FROM (
        SELECT from_entity_id AS entity_id, COUNT(*) AS rel_cnt
        FROM entity_relationships
        GROUP BY from_entity_id
        UNION ALL
        SELECT to_entity_id AS entity_id, COUNT(*) AS rel_cnt
        FROM entity_relationships
        GROUP BY to_entity_id
    ) x
    GROUP BY entity_id
),
evidence_counts AS (
    SELECT entity_id, COUNT(*) AS total_evidences
    FROM entity_evidence_link
    GROUP BY entity_id
)
SELECT 
    e.entity_id,
    e.display_name,
    COALESCE(eec.total_events,0) AS total_events,
    COALESCE(sec.sensitive_events,0) AS sensitive_events,
    COALESCE(rtc.risk_tags,0) AS risk_tags,
    COALESCE(rc.total_relationships,0) AS total_relationships,
    COALESCE(ec.total_evidences,0) AS total_evidences,
    (
        COALESCE(eec.total_events,0) * 2 +
        COALESCE(sec.sensitive_events,0) * 5 +
        COALESCE(rtc.risk_tags,0) * 8 +
        COALESCE(rc.total_relationships,0) * 3 +
        COALESCE(ec.total_evidences,0) * 4
    ) AS suspicion_score
FROM entities e
LEFT JOIN entity_event_counts eec ON e.entity_id = eec.entity_id
LEFT JOIN sensitive_event_counts sec ON e.entity_id = sec.entity_id
LEFT JOIN risk_tag_counts rtc ON e.entity_id = rtc.entity_id
LEFT JOIN relationship_counts rc ON e.entity_id = rc.entity_id
LEFT JOIN evidence_counts ec ON e.entity_id = ec.entity_id
ORDER BY suspicion_score DESC, e.display_name;

Interpretación recomendada

Una entidad con puntuación alta no debe considerarse automáticamente responsable ni culpable. El valor del scoring reside en la priorización analítica. Es decir, ayuda a decidir qué entidad revisar antes, dónde profundizar, qué cruces hacer y qué hipótesis merecen ser reforzadas o descartadas. En un entorno más maduro, esta lógica puede enriquecerse incorporando temporalidad, geoproximidad, peso diferencial por tipo de fuente, direccionalidad relacional, recurrencia secuencial, anomalías y comparación con casos históricos.

8. Roadmap técnico de evolución

Una vez desplegada la versión base, el siguiente paso consiste en transformar esta arquitectura en una plataforma más madura. El roadmap siguiente resume una evolución razonable por fases, desde una base relacional operativa hasta un entorno híbrido de inteligencia analítica.

Fase	Objetivo	Resultado esperado
Fase 1	Consolidar modelo relacional, roles y trazabilidad	Base consistente, auditable y apta para explotación operativa
Fase 2	Automatizar ingesta desde documentos, CSV, OCR y logs	Reducción de trabajo manual y mayor velocidad de carga
Fase 3	Incorporar NLP para extracción automática de entidades y eventos	Mayor escalabilidad y enriquecimiento del dato no estructurado
Fase 4	Añadir scoring avanzado y reglas de correlación temporal y geográfica	Priorización más precisa y detección de patrones complejos
Fase 5	Integración con dashboards BI, grafos y alertas	Visualización de redes, cronologías y focos de riesgo
Fase 6	Gobierno del dato, auditoría completa y segregación por perfiles	Entorno apto para compliance, forensics o investigación interna avanzada

Versión futura recomendada: motor relacional MySQL 8 como núcleo, capa ETL para ingesta, NLP para extracción, cuadro de mando para explotación, y motor grafo complementario para análisis de red cuando la densidad relacional crezca.

9. Conclusión

Una base investigativa bien diseñada no solo conserva información: la convierte en inteligencia explotable. La combinación de MySQL 8 con taxonomía, folksonomía, evidencias, relaciones y scoring básico permite pasar de un archivo documental pasivo a un entorno donde la reconstrucción de hechos, la detección de conexiones y la formulación de hipótesis se vuelven mucho más sólidas, auditables y repetibles.

Este modelo está pensado como una base inicial. Puede ampliarse con procedimientos almacenados, tablas de auditoría, control de acceso por roles, deduplicación de entidades, geocálculo, pipelines NLP para extracción automática de nombres y números, o incluso integración híbrida con motores grafo. Pero incluso en su versión actual ya ofrece una arquitectura operativa y coherente para resolver casos e investigar hechos pasados con una lógica más estructurada.

Disclaimer

Este documento es un modelo conceptual y técnico con fines analíticos. No constituye evidencia legal ni determina responsabilidad. El scoring es orientativo y requiere validación humana. Todo uso operativo debe ajustarse al marco legal, a la gobernanza del dato y a la normativa interna de la organización.

Search This Blog

BLOG #TECH