# LivePulse — Session Changelog
**Date:** April 16, 2026  
**Session:** HF Spaces Deployment Debugging & Fixes

---

## Summary

This session focused on getting the deployed LivePulse app on Hugging Face Spaces (`huggingface.co/spaces/Divyonko/LivePulse`) working end to end, from scraping YouTube live chat to displaying analytics in the dashboard.

---

## Issues Found & Fixed (in order)

### 1. Missing `return None` in `_get_live_chat_id`
**File:** `app.py`  
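**Problem:** The `except` block in `_get_live_chat_id` was missing `return None`, so after an exception control fell through the handler instead of exiting with an explicit failure value. Python returns `None` implicitly when a function body ends, so the result was not undefined, but any code after the handler would still have run.  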
**Fix:** Added explicit `return None` in the `except` block.

---

### 2. No logging output visible in HF Spaces logs
**File:** `app.py`  
**Problem:** Python's root logger defaults to WARNING level. All our `logger.info()` calls were silently dropped — nothing useful appeared in the logs.  
**Fix:** Added `logging.basicConfig(level=logging.INFO, force=True)` so all INFO and above messages appear in HF Spaces logs.

---

### 3. Torchvision warnings flooding the logs
**File:** `Dockerfile`  
**Problem:** Streamlit's file watcher scans all imported modules including `transformers`, which tries to import `torchvision` (not installed). This produced hundreds of `ModuleNotFoundError: No module named 'torchvision'` lines, making real errors impossible to find.  
**Fix:** Added `ENV STREAMLIT_SERVER_FILE_WATCHER_TYPE=none` to the Dockerfile to disable the file watcher entirely.

---

### 4. Improved HTTP error logging in `_get_live_chat_id`
**File:** `app.py`  
**Problem:** Generic `except Exception` swallowed the actual YouTube API error body (e.g. "API key invalid", "quota exceeded").  
**Fix:** Added a separate `urllib.error.HTTPError` handler that reads and logs the full error response body, making API failures immediately diagnosable.
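
The handler structure might look like the sketch below. It is not the actual `_get_live_chat_id` from `app.py`: the function name, the `opener` parameter (added so the function can be exercised without network access), and the 10-second timeout are assumptions. The lookup of `activeLiveChatId` via `videos.list` with `part=liveStreamingDetails` is the standard YouTube Data API v3 route.

```python
import json
import logging
import urllib.error
import urllib.parse
import urllib.request

logger = logging.getLogger(__name__)

API_URL = "https://www.googleapis.com/youtube/v3/videos"


def get_live_chat_id(video_id, api_key, opener=urllib.request.urlopen):
    """Return the active live chat ID for a video, or None on any failure."""
    query = urllib.parse.urlencode(
        {"part": "liveStreamingDetails", "id": video_id, "key": api_key}
    )
    try:
        with opener(f"{API_URL}?{query}", timeout=10) as resp:
            data = json.load(resp)
        items = data.get("items") or []
        if not items:
            return None
        return items[0].get("liveStreamingDetails", {}).get("activeLiveChatId")
    except urllib.error.HTTPError as exc:
        # The response body carries the API's own explanation
        # (invalid key, quota exceeded, ...), so log it in full.
        body = exc.read().decode("utf-8", errors="replace")
        logger.error("YouTube API returned HTTP %s: %s", exc.code, body)
        return None
    except Exception:
        # Anything else (network failure, malformed JSON): log with traceback.
        logger.exception("Unexpected error while resolving live chat ID")
        return None
```

`HTTPError` is listed before the generic handler because Python checks `except` clauses in order; reversed, the generic clause would catch it first and the body would never be read.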

---

### 5. API key presence logging
**File:** `app.py`  
**Problem:** No way to confirm whether the `YOUTUBE_API_KEY` secret was actually being read from HF Spaces environment.  
**Fix:** Added `logger.info("YOUTUBE_API_KEY present: %s (length=%d)", ...)` at scraper thread start.
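
One way to write this check without leaking the secret is sketched below. The helper name `log_api_key_presence` and its `env` parameter are hypothetical, and the two logged values (a boolean and the key length) are an assumption about the arguments elided in the fix above.

```python
import logging
import os

logger = logging.getLogger(__name__)


def log_api_key_presence(env=os.environ):
    """Log whether YOUTUBE_API_KEY is set, without logging its value."""
    api_key = env.get("YOUTUBE_API_KEY", "")
    # Only presence and length are logged; the key itself never reaches the logs.
    logger.info(
        "YOUTUBE_API_KEY present: %s (length=%d)", bool(api_key), len(api_key)
    )
    return bool(api_key)
```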

---

### 6. Chat message fetch logging
**File:** `app.py`  
**Problem:** No confirmation that `liveChat/messages` API calls were succeeding.  
**Fix:** Added `logger.info("Fetched %d chat messages ...")` after each successful API poll.

---

### 7. `@st.cache_data` on `load_stream_data` returning stale empty results
**File:** `app.py`  
**Problem:** `load_stream_data` was decorated with `@st.cache_data(ttl=5)`. The cache key was just `redis_key` (a constant string), so it cached the first result (an empty list) and kept returning it even after the scraper had written messages. An attempted fix with a `_store_len` cache-busting parameter failed because Streamlit excludes parameters prefixed with `_` from the cache key.  
**Fix:** Removed `@st.cache_data` entirely from `load_stream_data`. The store is local (in-memory at this point, later a SQLite file in the same container), so reading it directly on every rerun is cheap.

---

### 8. Scraper thread blocking on ML inference for 60+ backlog messages
**File:** `app.py`  
**Problem:** On startup, the YouTube API returns a backlog of 50-70 messages from the last few minutes. The scraper was running full ML inference (MuRIL + XLM-R + BART = 3 models × 60 messages = 180 forward passes on CPU) before writing a single message to the store. This took several minutes, during which the UI showed "No messages yet" and users kept clicking Start again, killing and restarting the thread.  
**Fix:** Added `is_first_page` flag. On the first API page (backlog), messages are stored immediately with `Neutral/General` placeholder sentiment so the UI shows data within seconds. Full ML inference only runs on subsequent pages (new live messages, typically 5-15 at a time).
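
The control flow can be sketched as follows. Only the `is_first_page` flag and the `Neutral`/`General` placeholders come from the fix; `process_pages`, the `predict` callable, and the list-based `store` are hypothetical stand-ins for the scraper loop, the model ensemble, and the message store.

```python
def process_pages(pages, store, predict):
    """Store the backlog page with placeholders; run inference on later pages.

    pages   -- iterable of lists of message texts, one list per API poll
    store   -- any object with .append(); stands in for the real store
    predict -- callable returning (sentiment, topic) for one text
    """
    is_first_page = True
    for messages in pages:
        for text in messages:
            if is_first_page:
                # Backlog: skip inference so the UI has data within seconds.
                sentiment, topic = "Neutral", "General"
            else:
                # Live messages arrive in small batches, so inference is affordable.
                sentiment, topic = predict(text)
            store.append({"text": text, "sentiment": sentiment, "topic": topic})
        is_first_page = False
```

The trade-off is that backlog messages keep their placeholder labels unless they are re-scored later, which this sketch does not do.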

---

### 9. Per-message ML inference error logging
**File:** `app.py`  
**Problem:** If `predict_sentiment` or `predict_topic` threw an exception for a specific message, it was silently caught by `_safe_sentiment`/`_safe_topic` with no indication of which message failed or why.  
**Fix:** Added explicit `try/except` with `logger.error("ML inference failed for text=%r: %s", ...)` around each message's inference call in the scraper loop.
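
A sketch of the per-message guard, under stated assumptions: `classify_messages` is a hypothetical wrapper, the two predictors are passed in as callables, and falling back to `Neutral`/`General` after a failure is an assumption about what the scraper does next.

```python
import logging

logger = logging.getLogger(__name__)


def classify_messages(texts, predict_sentiment, predict_topic):
    """Run inference per message; log and continue when one message fails."""
    results = []
    for text in texts:
        try:
            sentiment = predict_sentiment(text)
            topic = predict_topic(text)
        except Exception as exc:
            # %r shows the exact text, including whitespace or odd characters.
            logger.error("ML inference failed for text=%r: %s", text, exc)
            sentiment, topic = "Neutral", "General"  # assumed fallback labels
        results.append((text, sentiment, topic))
    return results
```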

---

### 10. Root cause: In-memory store not shared across Streamlit worker processes
**File:** `app.py`  
**Problem:** This was the fundamental bug causing "No messages yet" despite the scraper working correctly. HF Spaces runs Streamlit with multiple worker processes. The scraper thread ran in worker process A and wrote to `_STORE` (a Python `dict` in that process's RAM). Browser requests were served by worker process B, which had its own separate empty `_STORE`. The two processes never shared memory — the UI always saw zero messages regardless of how many the scraper had collected.  
**Fix:** Replaced the entire in-memory `deque`-based store with **SQLite** at `/tmp/livepulse.db`. The database is a single file on disk that all worker processes in the container share: the scraper writes to it, and any worker serving the UI reads from the same file. All store functions (`store_rpush`, `store_lrange`, `store_llen`, `store_delete`) were rewritten to use SQLite queries guarded by a threading lock. The lock serializes access within a single process; across processes, SQLite's own file locking coordinates access.
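
A minimal sketch of such a store, not the code in `app.py`. The table schema, the JSON encoding of items, the `path` parameter, and the simplified `store_lrange` signature (it returns the whole list rather than a range) are all assumptions.

```python
import contextlib
import json
import sqlite3
import threading

DB_PATH = "/tmp/livepulse.db"
_LOCK = threading.Lock()  # serializes access within this process only


@contextlib.contextmanager
def _db(path):
    """Open a connection, ensure the table exists, commit, then close."""
    with _LOCK:
        conn = sqlite3.connect(path, timeout=10)  # wait up to 10 s on a locked file
        try:
            conn.execute(
                "CREATE TABLE IF NOT EXISTS messages ("
                "id INTEGER PRIMARY KEY AUTOINCREMENT, "
                "key TEXT NOT NULL, payload TEXT NOT NULL)"
            )
            yield conn
            conn.commit()
        finally:
            conn.close()


def store_rpush(key, item, path=DB_PATH):
    """Append one JSON-serializable item to the list stored under key."""
    with _db(path) as conn:
        conn.execute(
            "INSERT INTO messages (key, payload) VALUES (?, ?)",
            (key, json.dumps(item)),
        )


def store_lrange(key, path=DB_PATH):
    """Return every item under key in insertion order."""
    with _db(path) as conn:
        rows = conn.execute(
            "SELECT payload FROM messages WHERE key = ? ORDER BY id", (key,)
        ).fetchall()
    return [json.loads(row[0]) for row in rows]


def store_llen(key, path=DB_PATH):
    """Return the number of items stored under key."""
    with _db(path) as conn:
        return conn.execute(
            "SELECT COUNT(*) FROM messages WHERE key = ?", (key,)
        ).fetchone()[0]


def store_delete(key, path=DB_PATH):
    """Remove every item stored under key."""
    with _db(path) as conn:
        conn.execute("DELETE FROM messages WHERE key = ?", (key,))
```

Opening a fresh connection per call keeps the sketch simple and avoids sharing one `sqlite3` connection across threads; a long-lived connection per thread would be faster if the call rate becomes a concern.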

---

## Files Changed

| File | Changes |
|------|---------|
| `app.py` | SQLite store, logging setup, backlog fix, cache removal, HTTP error handling, `return None` fix |
| `Dockerfile` | Added `STREAMLIT_SERVER_FILE_WATCHER_TYPE=none` |

---

## What Was NOT Changed

- All dashboard features preserved: charts, alerts, word cloud, engagement score, leaderboard, multi-stream comparison, pinned messages, sentiment heatmap, topic distribution, confidence trend, CSV export
- ML models unchanged: MuRIL + XLM-R + BART ensemble still runs on new messages
- YouTube Data API v3 request logic in the scraper unchanged (only logging, error handling, and backlog handling were added around it)
- `requirements.txt` unchanged
- `.gitattributes` (Git LFS for model weights) unchanged
- `README.md` unchanged

---

## Current State

The app is fully functional on HF Spaces:
- Scraper fetches YouTube live chat via YouTube Data API v3
- API key read from HF Spaces secret `YOUTUBE_API_KEY`
- Backlog messages stored immediately on start (with placeholder sentiment)
- New messages processed with full ML inference
- SQLite ensures scraper and UI share data across all worker processes
- Dashboard displays all analytics once messages are in the store