AgentOp
Agent Template

CSV Data Cleaner

The CSV Data Cleaner is a fully browser-based AI agent that turns messy, inconsistent spreadsheet data into clean, analysis-ready CSV files — without uploading your data to any external server. Upload your CSV by clicking or dragging and dropping, and the agent immediately runs a Python-based quality audit using pandas-style logic via Pyodide (Python compiled to WebAssembly). The automated issue detection scans every column for four categories of data quality problems: missing values (flagged HIGH if more than 20% of rows are empty, MEDIUM otherwise), duplicate rows (exact matches across all fields), inconsistent date formats (e.g. a column mixing YYYY-MM-DD, MM/DD/YYYY, and DD-MM-YYYY), and whitespace issues (leading or trailing spaces that break joins and lookups). Every issue is displayed with a severity badge before you do anything. From there, you have three ways to clean. Quick actions — Remove Duplicates, Fill Missing Values, Standardize Formats, Clean Whitespace — run with one click and log exactly what changed. AI Suggestions asks the language model to inspect your column names, data types, and detected issues and return 5–8 prioritised cleaning recommendations (e.g. "normalise phone numbers in the contact column", "convert revenue to float and fill nulls with 0"), each with an Apply button. Custom tasks let you describe any transformation in plain English — "split the full_name column into first and last", "replace all blank cells in status with 'pending'", "remove rows where age is over 120" — and the agent generates and executes the Python code in-browser. All cleaning steps are logged with timestamps in the activity panel. When finished, download the result as a clean CSV file. Compatible with GPT-4o, Claude 3.5 Sonnet, or a fully local WebLLM model. No data ever leaves your machine.

csv data-cleaning data-quality ai transformation missing-values duplicate-removal date-standardization whitespace pandas pyodide data-wrangling spreadsheet-cleaner browser-python no-upload
ozzo Nov 04, 2025

Preview Mode

This is a preview with sample data. The template uses placeholders like which will be replaced with actual agent data.

About This Template

CSV Data Cleaner is a browser-executable AI agent template built on AgentOp. It runs entirely in the browser using Python (via Pyodide) and can be deployed without a server — just download the generated HTML file and open it locally or host it anywhere.

Topics csv data-cleaning data-quality ai transformation missing-values duplicate-removal date-standardization whitespace pandas pyodide data-wrangling spreadsheet-cleaner browser-python no-upload
Template Preview

Template Metadata

Slug
csv-data-cleaner
Created By
ozzo
Created
Nov 04, 2025
Usage Count
0

Tags

csv data-cleaning data-quality ai transformation missing-values duplicate-removal date-standardization whitespace pandas pyodide data-wrangling spreadsheet-cleaner browser-python no-upload

Code Statistics

HTML Lines
164
CSS Lines
92
JS Lines
70
Python Lines
475

Source Code

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>{{agent_name}}</title>
    <script src="https://cdn.tailwindcss.com"></script>
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/font-awesome/6.4.0/css/all.min.css">
</head>
<body class="bg-gradient-to-br from-blue-50 to-indigo-100 min-h-screen">
    <div class="container mx-auto px-4 py-8 max-w-6xl">

        <!-- Header -->
        <div class="bg-white rounded-lg shadow-lg p-6 mb-6">
            <div class="flex items-center justify-between flex-wrap gap-4">
                <div>
                    <h1 class="text-3xl font-bold text-gray-800 mb-2">
                        <i class="fas fa-broom text-indigo-600 mr-2"></i>{{agent_name}}
                    </h1>
                    <p class="text-gray-600">{{description}}</p>
                </div>
                <div class="text-right">
                    <div class="text-sm text-gray-500">Provider: <span class="font-semibold">{{provider}}</span></div>
                    <div class="text-sm text-gray-500">Model: <span class="font-semibold">{{model_name}}</span></div>
                </div>
            </div>
        </div>

        <!-- Main Grid -->
        <div class="grid grid-cols-1 lg:grid-cols-2 gap-6">

            <!-- LEFT: Upload + Preview + Issues -->
            <div class="bg-white rounded-lg shadow-lg p-6 flex flex-col gap-6">

                <h2 class="text-xl font-bold text-gray-800">
                    <i class="fas fa-upload mr-2"></i>Upload CSV File
                </h2>

                <!-- Drop zone -->
                <div>
                    <label id="dropZone" class="drop-zone">
                        <div class="flex flex-col items-center justify-center h-full">
                            <i class="fas fa-cloud-upload-alt text-4xl text-indigo-400 mb-2"></i>
                            <p class="text-sm text-gray-600"><span class="font-semibold">Click to upload</span> or drag and drop</p>
                            <p class="text-xs text-gray-400 mt-1">CSV files only</p>
                        </div>
                        <input id="fileInput" type="file" accept=".csv" class="hidden" />
                    </label>
                    <div id="fileName" class="mt-2 text-sm text-indigo-700 font-medium hidden"></div>
                </div>

                <!-- Data Preview -->
                <div id="previewSection" class="hidden">
                    <h3 class="text-base font-semibold text-gray-700 mb-2">Data Preview</h3>
                    <div class="bg-gray-50 rounded-lg border border-gray-200 overflow-auto max-h-64">
                        <table class="min-w-full text-sm">
                            <thead id="previewHead" class="bg-indigo-50 sticky top-0"></thead>
                            <tbody id="previewBody" class="divide-y divide-gray-100"></tbody>
                        </table>
                    </div>
                    <p class="mt-2 text-sm text-gray-500">
                        <span id="rowCount" class="font-semibold text-gray-700"></span> rows &nbsp;·&nbsp;
                        <span id="colCount" class="font-semibold text-gray-700"></span> columns
                    </p>
                </div>

                <!-- Issues -->
                <div id="issuesSection" class="hidden">
                    <h3 class="text-base font-semibold text-gray-700 mb-2">
                        <i class="fas fa-exclamation-triangle text-yellow-500 mr-1"></i>Issues Detected
                    </h3>
                    <div id="issuesList" class="flex flex-col gap-2"></div>
                </div>

            </div>

            <!-- RIGHT: Operations -->
            <div class="bg-white rounded-lg shadow-lg p-6 flex flex-col gap-6">

                <h2 class="text-xl font-bold text-gray-800">
                    <i class="fas fa-magic mr-2"></i>Cleaning Operations
                </h2>

                <!-- Quick Actions -->
                <div>
                    <h3 class="text-sm font-semibold text-gray-600 uppercase tracking-wide mb-2">Quick Actions</h3>
                    <div class="grid grid-cols-2 gap-2">
                        <button class="btn-action" onclick="handleQuickAction('remove_duplicates')">
                            <i class="fas fa-copy mr-1"></i>Remove Duplicates
                        </button>
                        <button class="btn-action" onclick="handleQuickAction('fill_missing')">
                            <i class="fas fa-fill-drip mr-1"></i>Fill Missing Values
                        </button>
                        <button class="btn-action" onclick="handleQuickAction('standardize_formats')">
                            <i class="fas fa-calendar-alt mr-1"></i>Standardize Dates
                        </button>
                        <button class="btn-action" onclick="handleQuickAction('clean_whitespace')">
                            <i class="fas fa-eraser mr-1"></i>Clean Whitespace
                        </button>
                    </div>
                </div>

                <!-- AI Suggestions -->
                <div>
                    <button id="getSuggestionsBtn" onclick="getAISuggestions()"
                            class="w-full bg-gradient-to-r from-purple-600 to-indigo-600 hover:from-purple-700 hover:to-indigo-700 text-white px-4 py-3 rounded-lg font-semibold transition flex items-center justify-center gap-2">
                        <i class="fas fa-brain"></i>Get AI Suggestions
                    </button>
                </div>

                <!-- Suggestions list -->
                <div id="suggestionsSection" class="hidden">
                    <h3 class="text-base font-semibold text-gray-700 mb-2">AI Suggestions</h3>
                    <div id="suggestionsList" class="flex flex-col gap-2 max-h-56 overflow-y-auto pr-1"></div>
                </div>

                <!-- Custom Task -->
                <div>
                    <h3 class="text-sm font-semibold text-gray-600 uppercase tracking-wide mb-2">Custom Cleaning Task</h3>
                    <textarea id="customQuery" rows="3"
                        placeholder="e.g. 'Convert all dates to YYYY-MM-DD', 'lowercase the email column', 'remove rows where revenue is 0'"
                        class="w-full px-3 py-2 border border-gray-300 rounded-lg text-sm focus:outline-none focus:ring-2 focus:ring-indigo-400 resize-none"></textarea>
                    <button onclick="executeCustomCleaning()"
                            class="mt-2 w-full bg-indigo-600 hover:bg-indigo-700 text-white px-4 py-2 rounded-lg transition text-sm font-semibold flex items-center justify-center gap-2">
                        <i class="fas fa-play"></i>Execute
                    </button>
                </div>

                <!-- Download -->
                <div id="downloadSection" class="hidden">
                    <button onclick="downloadCleanedCSV()"
                            class="w-full bg-green-600 hover:bg-green-700 text-white px-4 py-3 rounded-lg font-semibold transition flex items-center justify-center gap-2">
                        <i class="fas fa-download"></i>Download Cleaned CSV
                    </button>
                </div>

            </div>
        </div>

        <!-- Activity Log -->
        <div class="bg-white rounded-lg shadow-lg p-6 mt-6">
            <h2 class="text-xl font-bold text-gray-800 mb-4">
                <i class="fas fa-history mr-2"></i>Activity Log
            </h2>
            <div id="activityLog" class="flex flex-col gap-1 max-h-48 overflow-y-auto text-sm">
                <p class="text-gray-400 italic">No activities yet. Upload a CSV file to begin.</p>
            </div>
        </div>

    </div>

    <!-- Loading Overlay -->
    <div id="loadingOverlay" class="hidden fixed inset-0 bg-black bg-opacity-50 flex items-center justify-center z-50">
        <div class="bg-white rounded-xl p-8 flex flex-col items-center gap-4 shadow-2xl">
            <div class="animate-spin rounded-full h-14 w-14 border-4 border-indigo-200 border-t-indigo-600"></div>
            <p id="loadingText" class="text-gray-700 font-semibold text-sm">Processing...</p>
        </div>
    </div>

    <script type="text/python">
{{python_code}}
    </script>
</body>
</html>
/* Drop zone */
.drop-zone {
    display: flex;
    align-items: center;
    justify-content: center;
    width: 100%;
    height: 8rem;
    border: 2px dashed #a5b4fc;
    border-radius: 0.75rem;
    background-color: #eef2ff;
    cursor: pointer;
    transition: background-color 0.15s, border-color 0.15s;
}
.drop-zone:hover,
.drop-zone.drag-over {
    background-color: #e0e7ff;
    border-color: #6366f1;
}

/* Quick action buttons */
.btn-action {
    display: flex;
    align-items: center;
    justify-content: center;
    width: 100%;
    padding: 0.5rem 0.75rem;
    font-size: 0.8125rem;
    font-weight: 500;
    color: #374151;
    background-color: #f9fafb;
    border: 1px solid #d1d5db;
    border-radius: 0.5rem;
    cursor: pointer;
    transition: background-color 0.15s, border-color 0.15s;
    white-space: nowrap;
    gap: 0.375rem;
}
.btn-action:hover {
    background-color: #f3f4f6;
    border-color: #9ca3af;
}
.btn-action:disabled {
    opacity: 0.5;
    cursor: not-allowed;
}

/* Issue severity badges */
.issue-badge {
    display: inline-flex;
    align-items: center;
    padding: 0.15rem 0.5rem;
    border-radius: 0.25rem;
    font-size: 0.7rem;
    font-weight: 700;
    letter-spacing: 0.04em;
    white-space: nowrap;
    flex-shrink: 0;
}
.issue-high   { background-color: #fee2e2; color: #991b1b; }
.issue-medium { background-color: #fef3c7; color: #92400e; }
.issue-low    { background-color: #dbeafe; color: #1e40af; }

/* Priority labels in suggestions */
.priority-high   { color: #dc2626; font-weight: 600; }
.priority-medium { color: #d97706; font-weight: 600; }
.priority-low    { color: #2563eb; font-weight: 600; }

/* Suggestion card apply button */
.btn-apply {
    flex-shrink: 0;
    padding: 0.25rem 0.75rem;
    font-size: 0.75rem;
    font-weight: 600;
    color: #fff;
    background-color: #4f46e5;
    border-radius: 0.375rem;
    border: none;
    cursor: pointer;
    transition: background-color 0.15s;
}
.btn-apply:hover { background-color: #4338ca; }

/* Scrollbar styling */
#activityLog::-webkit-scrollbar,
#suggestionsList::-webkit-scrollbar,
.overflow-auto::-webkit-scrollbar { width: 4px; height: 4px; }
#activityLog::-webkit-scrollbar-track,
#suggestionsList::-webkit-scrollbar-track,
.overflow-auto::-webkit-scrollbar-track { background: #f1f5f9; }
#activityLog::-webkit-scrollbar-thumb,
#suggestionsList::-webkit-scrollbar-thumb,
.overflow-auto::-webkit-scrollbar-thumb { background: #cbd5e1; border-radius: 2px; }
// ─── Drag-and-drop enhancement ───────────────────────────────────────────────
const dropZone = document.getElementById('dropZone');
const fileInput = document.getElementById('fileInput');

['dragenter', 'dragover'].forEach(evt =>
    dropZone.addEventListener(evt, e => {
        e.preventDefault();
        dropZone.classList.add('drag-over');
    })
);
['dragleave', 'drop'].forEach(evt =>
    dropZone.addEventListener(evt, e => {
        e.preventDefault();
        dropZone.classList.remove('drag-over');
    })
);
dropZone.addEventListener('drop', e => {
    const file = e.dataTransfer.files[0];
    if (file && file.name.endsWith('.csv')) {
        // Inject into the file input so Python's change handler fires
        const dt = new DataTransfer();
        dt.items.add(file);
        fileInput.files = dt.files;
        fileInput.dispatchEvent(new Event('change'));
    }
});

// ─── Keyboard shortcut: Ctrl+Enter fires Execute ─────────────────────────────
document.getElementById('customQuery').addEventListener('keydown', e => {
    if ((e.ctrlKey || e.metaKey) && e.key === 'Enter') {
        e.preventDefault();
        if (typeof executeCustomCleaning === 'function') executeCustomCleaning();
    }
});

// ─── Guard: show toast if no file uploaded yet ───────────────────────────────
function requireFile(label) {
    const fn = document.getElementById('fileName');
    if (fn.classList.contains('hidden')) {
        showToast(`Upload a CSV file before using "${label}"`, 'warning');
        return false;
    }
    return true;
}

// ─── Lightweight toast notification ─────────────────────────────────────────
function showToast(message, type = 'info') {
    const colors = {
        info:    'bg-blue-600',
        success: 'bg-green-600',
        warning: 'bg-yellow-500',
        error:   'bg-red-600'
    };
    const toast = document.createElement('div');
    toast.className = `fixed bottom-6 right-6 z-50 text-white text-sm font-semibold px-5 py-3 rounded-lg shadow-lg transition-opacity ${colors[type] || colors.info}`;
    toast.textContent = message;
    document.body.appendChild(toast);
    setTimeout(() => {
        toast.style.opacity = '0';
        setTimeout(() => toast.remove(), 400);
    }, 2800);
}

// ─── Expose applySuggestion globally for Python-rendered onclick buttons ─────
// Python will set window._ai_suggestions; this wrapper calls the Python proxy.
function applySuggestion(index) {
    if (typeof window._applySuggestionProxy === 'function') {
        window._applySuggestionProxy(index);
    }
}
import json
import csv
import io
import re
from datetime import datetime
from collections import Counter
from js import document, console, Blob, URL, window
from pyodide.ffi import create_proxy

# ============================================================================
# Global State
# ============================================================================
current_data = None
original_data = None
column_names = []
cleaning_history = []

# ============================================================================
# CSV Parsing & Issue Detection (deterministic — no LLM needed)
# ============================================================================
class CSVDataHandler:
    @staticmethod
    def parse_csv(csv_content):
        try:
            reader = csv.DictReader(io.StringIO(csv_content))
            data = list(reader)
            columns = reader.fieldnames if reader.fieldnames else []
            return data, columns
        except Exception as e:
            log_activity(f"Error parsing CSV: {str(e)}", "error")
            return None, None

    @staticmethod
    def detect_issues(data, columns):
        issues = []
        if not data:
            return issues

        for col in columns:
            missing_count = sum(1 for row in data if not row.get(col) or str(row.get(col)).strip() == '')
            if missing_count > 0:
                pct = (missing_count / len(data)) * 100
                issues.append({
                    'type': 'missing_values',
                    'severity': 'high' if pct > 20 else 'medium',
                    'column': col,
                    'count': missing_count,
                    'message': f"{col}: {missing_count} missing values ({pct:.1f}%)"
                })

        seen = set()
        dupes = 0
        for row in data:
            key = tuple(sorted(row.items()))
            if key in seen:
                dupes += 1
            else:
                seen.add(key)
        if dupes > 0:
            issues.append({
                'type': 'duplicates', 'severity': 'medium',
                'count': dupes, 'message': f"Found {dupes} duplicate rows"
            })

        date_patterns = [
            r'\d{4}-\d{2}-\d{2}',
            r'\d{2}/\d{2}/\d{4}',
            r'\d{2}-\d{2}-\d{4}',
        ]
        for col in columns:
            values = [str(row.get(col, '')).strip() for row in data if row.get(col)]
            formats_found = set()
            for val in values[:100]:
                for p in date_patterns:
                    if re.match(p, val):
                        formats_found.add(p)
            if len(formats_found) > 1:
                issues.append({
                    'type': 'inconsistent_format', 'severity': 'low',
                    'column': col,
                    'message': f"{col}: Inconsistent date formats detected"
                })

        whitespace_cols = [
            col for col in columns
            if any(str(row.get(col, '')).startswith(' ') or str(row.get(col, '')).endswith(' ')
                   for row in data[:100] if row.get(col))
        ]
        if whitespace_cols:
            issues.append({
                'type': 'whitespace', 'severity': 'low',
                'columns': whitespace_cols,
                'message': f"Whitespace issues in: {', '.join(whitespace_cols[:3])}"
            })

        return issues

# ============================================================================
# Tool Functions (called by AgentOp's LLM dispatcher)
# ============================================================================
async def get_cleaning_suggestions(columns: str, row_count: str, issues_json: str) -> str:
    """Analyze CSV column names, row count, and detected issues, then return 5-8 prioritised
    data cleaning suggestions as a JSON array. Each suggestion must have: action (snake_case),
    description (plain English), priority (high/medium/low), target (column name or 'all')."""
    # This tool description is sent to the LLM — the LLM fills in its response
    # via the callLLM bridge. We return a placeholder that the bridge replaces.
    return json.dumps({
        "columns": columns,
        "row_count": row_count,
        "issues": issues_json
    })

async def generate_cleaning_code(instruction: str, columns: str, sample_rows: str) -> str:
    """Given a plain-English cleaning instruction, the CSV column names, and sample rows,
    generate Python code that cleans a list-of-dicts named 'data' and stores the result
    in 'cleaned_data'. Return ONLY executable Python code with no markdown fences."""
    return json.dumps({
        "instruction": instruction,
        "columns": columns,
        "sample": sample_rows
    })

def get_tool_schemas():
    return [
        {
            "type": "function",
            "function": {
                "name": "get_cleaning_suggestions",
                "description": "Analyze CSV metadata and detected issues to produce prioritised cleaning suggestions as JSON.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "columns": {"type": "string", "description": "Comma-separated column names"},
                        "row_count": {"type": "string", "description": "Number of rows in the dataset"},
                        "issues_json": {"type": "string", "description": "JSON string of detected issues"}
                    },
                    "required": ["columns", "row_count", "issues_json"]
                }
            }
        },
        {
            "type": "function",
            "function": {
                "name": "generate_cleaning_code",
                "description": "Generate Python code to execute a plain-English CSV cleaning instruction on a list of dicts.",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "instruction": {"type": "string", "description": "The cleaning task in plain English"},
                        "columns": {"type": "string", "description": "Comma-separated column names"},
                        "sample_rows": {"type": "string", "description": "JSON of first 5 rows for context"}
                    },
                    "required": ["instruction", "columns", "sample_rows"]
                }
            }
        }
    ]

# ============================================================================
# Deterministic Cleaning Operations (no LLM needed)
# ============================================================================
def remove_duplicates(data):
    seen, cleaned = set(), []
    for row in data:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned

def fill_missing_values(data, columns):
    for col in columns:
        values = [row[col] for row in data if row.get(col) and str(row[col]).strip()]
        if not values:
            continue
        try:
            nums = [float(v) for v in values]
            fill = f"{sum(nums)/len(nums):.2f}"
        except:
            fill = Counter(values).most_common(1)[0][0]
        for row in data:
            if not row.get(col) or str(row[col]).strip() == '':
                row[col] = fill
    return data

def standardize_formats(data, columns):
    for col in columns:
        for row in data:
            val = str(row.get(col, '')).strip()
            if not val:
                continue
            for fmt in ['%m/%d/%Y', '%d-%m-%Y', '%Y-%m-%d', '%m-%d-%Y']:
                try:
                    from datetime import datetime as dt
                    row[col] = dt.strptime(val, fmt).strftime('%Y-%m-%d')
                    break
                except:
                    continue
    return data

def clean_whitespace(data, columns):
    for row in data:
        for col in columns:
            if row.get(col):
                row[col] = str(row[col]).strip()
    return data

# ============================================================================
# UI Helpers
# ============================================================================
def log_activity(message, level="info"):
    log_div = document.getElementById("activityLog")
    if log_div.children.length == 1 and "No activities" in log_div.innerHTML:
        log_div.innerHTML = ""
    colors = {"info":"text-blue-600","success":"text-green-600","error":"text-red-600","warning":"text-yellow-600"}
    icons  = {"info":"fa-info-circle","success":"fa-check-circle","error":"fa-exclamation-circle","warning":"fa-exclamation-triangle"}
    ts = datetime.now().strftime("%H:%M:%S")
    entry = document.createElement("div")
    entry.className = "text-sm py-1"
    entry.innerHTML = (f'<span class="text-gray-400">[{ts}]</span> '
                       f'<i class="fas {icons.get(level,"fa-circle")} {colors.get(level,"text-gray-600")} mr-1"></i>'
                       f'<span class="{colors.get(level,"text-gray-600")}">{message}</span>')
    log_div.appendChild(entry)
    log_div.scrollTop = log_div.scrollHeight

def show_loading(msg="Processing..."):
    document.getElementById("loadingOverlay").classList.remove("hidden")
    document.getElementById("loadingText").textContent = msg

def hide_loading():
    document.getElementById("loadingOverlay").classList.add("hidden")

def update_preview():
    if not current_data or not column_names:
        return
    document.getElementById("rowCount").textContent = str(len(current_data))
    document.getElementById("colCount").textContent = str(len(column_names))
    thead = document.getElementById("previewHead")
    tbody = document.getElementById("previewBody")
    thead.innerHTML = tbody.innerHTML = ""
    hr = document.createElement("tr")
    for col in column_names:
        th = document.createElement("th")
        th.className = "px-3 py-2 text-left text-xs font-semibold text-gray-700"
        th.textContent = col
        hr.appendChild(th)
    thead.appendChild(hr)
    for rd in current_data[:10]:
        row = document.createElement("tr")
        for col in column_names:
            td = document.createElement("td")
            td.className = "px-3 py-2 text-xs text-gray-600"
            td.textContent = str(rd.get(col, ''))[:50]
            row.appendChild(td)
        tbody.appendChild(row)
    document.getElementById("previewSection").classList.remove("hidden")

def display_issues(issues):
    if not issues:
        return
    lst = document.getElementById("issuesList")
    lst.innerHTML = ""
    for issue in issues:
        div = document.createElement("div")
        div.className = "bg-gray-50 rounded p-3 border-l-4 border-yellow-400"
        div.innerHTML = (f'<div class="flex items-start">'
                         f'<span class="issue-badge issue-{issue["severity"]} mr-2">{issue["severity"].upper()}</span>'
                         f'<p class="text-sm text-gray-700">{issue["message"]}</p></div>')
        lst.appendChild(div)
    document.getElementById("issuesSection").classList.remove("hidden")
    log_activity(f"Detected {len(issues)} data quality issues", "warning")

def display_suggestions(suggestions_json):
    try:
        suggestions = json.loads(suggestions_json)
    except:
        log_activity("Could not parse AI suggestions", "warning")
        return
    div = document.getElementById("suggestionsList")
    div.innerHTML = ""
    priority_colors = {"high":"text-red-600","medium":"text-yellow-600","low":"text-blue-600"}
    for i, s in enumerate(suggestions):
        item = document.createElement("div")
        item.className = "bg-gradient-to-r from-purple-50 to-indigo-50 rounded-lg p-3 border border-indigo-200"
        pc = priority_colors.get(s.get('priority','low'), 'text-gray-600')
        item.innerHTML = (f'<div class="flex items-start justify-between">'
                          f'<div class="flex-1">'
                          f'<h4 class="font-semibold text-sm text-gray-800">{s.get("action","").replace("_"," ").title()}</h4>'
                          f'<p class="text-xs text-gray-600 mt-1">{s.get("description","")}</p>'
                          f'<span class="text-xs {pc} font-semibold mt-1 inline-block">Priority: {s.get("priority","low").upper()}</span>'
                          f'</div>'
                          f'<button onclick="applySuggestion({i})" class="ml-2 bg-indigo-600 hover:bg-indigo-700 text-white text-xs px-3 py-1 rounded">Apply</button>'
                          f'</div>')
        div.appendChild(item)
    window._ai_suggestions = suggestions
    document.getElementById("suggestionsSection").classList.remove("hidden")
    log_activity(f"Generated {len(suggestions)} AI suggestions", "success")

# ============================================================================
# Event Handlers
# ============================================================================
async def handle_file_upload(event):
    global current_data, original_data, column_names
    show_loading("Loading CSV file...")
    try:
        file = event.target.files.item(0)
        if not file:
            return
        buf = await file.arrayBuffer()
        csv_content = buf.to_bytes().decode('utf-8')
        data, columns = CSVDataHandler.parse_csv(csv_content)
        if data is None:
            return
        current_data = data
        original_data = [row.copy() for row in data]
        column_names = columns
        document.getElementById("fileName").textContent = f"Loaded: {file.name}"
        document.getElementById("fileName").classList.remove("hidden")
        update_preview()
        issues = CSVDataHandler.detect_issues(data, columns)
        if issues:
            display_issues(issues)
        document.getElementById("downloadSection").classList.remove("hidden")
        log_activity(f"Loaded {file.name}: {len(data)} rows, {len(columns)} columns", "success")
    except Exception as e:
        log_activity(f"Error loading file: {str(e)}", "error")
    finally:
        hide_loading()

async def handle_quick_action(action):
    global current_data
    if not current_data:
        log_activity("Please upload a CSV file first", "warning")
        return
    show_loading(f"Applying {action.replace('_',' ')}...")
    try:
        before = len(current_data)
        if action == "remove_duplicates":
            current_data = remove_duplicates(current_data)
            log_activity(f"Removed {before - len(current_data)} duplicate rows", "success")
        elif action == "fill_missing":
            current_data = fill_missing_values(current_data, column_names)
            log_activity("Filled missing values", "success")
        elif action == "standardize_formats":
            current_data = standardize_formats(current_data, column_names)
            log_activity("Standardized date formats to YYYY-MM-DD", "success")
        elif action == "clean_whitespace":
            current_data = clean_whitespace(current_data, column_names)
            log_activity("Cleaned leading/trailing whitespace", "success")
        update_preview()
        cleaning_history.append(action)
    except Exception as e:
        log_activity(f"Error during {action}: {str(e)}", "error")
    finally:
        hide_loading()

async def get_ai_suggestions_handler():
    global current_data, column_names
    if not current_data:
        log_activity("Please upload a CSV file first", "warning")
        return
    show_loading("Getting AI suggestions...")
    try:
        issues = CSVDataHandler.detect_issues(current_data, column_names)
        pyodide_instance = window.pyodide
        pyodide_instance.globals.set('_cols', ', '.join(column_names))
        pyodide_instance.globals.set('_rows', str(len(current_data)))
        pyodide_instance.globals.set('_issues', json.dumps(issues))
        result = await pyodide_instance.runPythonAsync(
            'await process_user_query(f"Analyze this CSV with columns: {_cols}, {_rows} rows. Issues: {_issues}. Return cleaning suggestions as JSON array.")'
        )
        if result:
            display_suggestions(result)
    except Exception as e:
        log_activity(f"Error getting suggestions: {str(e)}", "error")
    finally:
        hide_loading()

async def execute_custom_cleaning_handler():
    global current_data
    if not current_data:
        log_activity("Please upload a CSV file first", "warning")
        return
    query = document.getElementById("customQuery").value.strip()
    if not query:
        log_activity("Please enter a cleaning instruction", "warning")
        return
    show_loading("Executing custom cleaning...")
    try:
        sample = json.dumps(current_data[:5])
        pyodide_instance = window.pyodide
        pyodide_instance.globals.set('_instruction', query)
        pyodide_instance.globals.set('_columns', ', '.join(column_names))
        pyodide_instance.globals.set('_sample', sample)
        code_result = await pyodide_instance.runPythonAsync(
            'await process_user_query(f"Generate Python code to clean this CSV. Instruction: {_instruction}. Columns: {_columns}. Sample rows: {_sample}. Return ONLY executable Python code. Input is list-of-dicts named \'data\', output must be \'cleaned_data\'.")'
        )
        if code_result:
            code = re.sub(r'```python\n?|```', '', code_result).strip()
            local_vars = {'data': current_data}
            exec(code, {}, local_vars)
            if 'cleaned_data' in local_vars:
                current_data = local_vars['cleaned_data']
                update_preview()
                log_activity(f"Executed: {query}", "success")
        document.getElementById("customQuery").value = ""
    except Exception as e:
        log_activity(f"Error executing cleaning: {str(e)}", "error")
    finally:
        hide_loading()

async def apply_suggestion_handler(index):
    global current_data
    suggestions = getattr(window, '_ai_suggestions', None)
    if not suggestions or index >= len(suggestions):
        return
    s = suggestions[index]
    show_loading(f"Applying: {s.get('action','')}")
    try:
        instruction = f"{s.get('action','')}: {s.get('description','')}"
        sample = json.dumps(current_data[:5])
        pyodide_instance = window.pyodide
        pyodide_instance.globals.set('_instruction', instruction)
        pyodide_instance.globals.set('_columns', ', '.join(column_names))
        pyodide_instance.globals.set('_sample', sample)
        code_result = await pyodide_instance.runPythonAsync(
            'await process_user_query(f"Generate Python code to: {_instruction}. Columns: {_columns}. Sample: {_sample}. Return ONLY Python. Input=\'data\' (list of dicts), output=\'cleaned_data\'.")'
        )
        if code_result:
            code = re.sub(r'```python\n?|```', '', code_result).strip()
            local_vars = {'data': current_data}
            exec(code, {}, local_vars)
            if 'cleaned_data' in local_vars:
                current_data = local_vars['cleaned_data']
                update_preview()
                log_activity(f"Applied: {s.get('action','')}", "success")
    except Exception as e:
        log_activity(f"Error applying suggestion: {str(e)}", "error")
    finally:
        hide_loading()

def download_cleaned_csv():
    global current_data, column_names
    if not current_data:
        log_activity("No data to download", "warning")
        return
    try:
        out = io.StringIO()
        writer = csv.DictWriter(out, fieldnames=column_names)
        writer.writeheader()
        writer.writerows(current_data)
        blob = Blob.new([out.getvalue()], {"type": "text/csv"})
        url = URL.createObjectURL(blob)
        link = document.createElement("a")
        link.href = url
        link.download = f"cleaned_data_{datetime.now().strftime('%Y%m%d_%H%M%S')}.csv"
        link.click()
        URL.revokeObjectURL(url)
        log_activity("Downloaded cleaned CSV file", "success")
    except Exception as e:
        log_activity(f"Error downloading file: {str(e)}", "error")

# ============================================================================
# Setup
# ============================================================================
def setup():
    log_activity("CSV Data Cleaner initialized", "info")
    document.getElementById("fileInput").addEventListener("change", create_proxy(handle_file_upload))
    window.handleQuickAction    = create_proxy(handle_quick_action)
    window.getAISuggestions     = create_proxy(get_ai_suggestions_handler)
    window.executeCustomCleaning = create_proxy(execute_custom_cleaning_handler)
    window.applySuggestion      = create_proxy(apply_suggestion_handler)
    window.downloadCleanedCSV   = create_proxy(download_cleaned_csv)

setup()