# LivePulse: Session Changelog

**Date:** April 16, 2026

**Session:** HF Spaces Deployment Debugging & Fixes

---

## Summary

This session focused entirely on getting the deployed LivePulse app on Hugging Face Spaces (`huggingface.co/spaces/Divyonko/LivePulse`) to work end-to-end, from scraping YouTube live chat to displaying analytics in the dashboard.

---

## Issues Found & Fixed (in order)

### 1. Missing `return None` in `_get_live_chat_id`

**File:** `app.py`

**Problem:** The `except` block in `_get_live_chat_id` was missing `return None`, so on an exception control fell through the handler instead of returning explicitly. Python returns `None` implicitly when a function ends without a `return`, but the failure path was easy to misread and would have run any code placed after the `try`/`except`.

**Fix:** Added an explicit `return None` in the `except` block.

---

### 2. No logging output visible in HF Spaces logs

**File:** `app.py`

**Problem:** Python's root logger defaults to WARNING level. All our `logger.info()` calls were silently dropped, so nothing useful appeared in the logs.

**Fix:** Added `logging.basicConfig(level=logging.INFO, force=True)` so all messages at INFO and above appear in HF Spaces logs.

---

### 3. Torchvision warnings flooding the logs

**File:** `Dockerfile`

**Problem:** Streamlit's file watcher scans all imported modules including `transformers`, which tries to import `torchvision` (not installed). This produced hundreds of `ModuleNotFoundError: No module named 'torchvision'` lines, making real errors impossible to find.

**Fix:** Added `ENV STREAMLIT_SERVER_FILE_WATCHER_TYPE=none` to the Dockerfile to disable the file watcher entirely.

---

### 4. Improved HTTP error logging in `_get_live_chat_id`

**File:** `app.py`

**Problem:** A generic `except Exception` swallowed the actual YouTube API error body (e.g. "API key invalid", "quota exceeded").

**Fix:** Added a separate `urllib.error.HTTPError` handler that reads and logs the full error response body, making API failures immediately diagnosable.

---

### 5. API key presence logging

**File:** `app.py`

**Problem:** There was no way to confirm whether the `YOUTUBE_API_KEY` secret was actually being read from the HF Spaces environment.

**Fix:** Added `logger.info("YOUTUBE_API_KEY present: %s (length=%d)", ...)` at scraper thread start.

---

### 6. Chat message fetch logging

**File:** `app.py`

**Problem:** There was no confirmation that `liveChat/messages` API calls were succeeding.

**Fix:** Added `logger.info("Fetched %d chat messages ...")` after each successful API poll.

---

### 7. `@st.cache_data` on `load_stream_data` returning stale empty results

**File:** `app.py`

**Problem:** `load_stream_data` was decorated with `@st.cache_data(ttl=5)`. The cache key was just `redis_key` (a constant string), so it cached the first result (an empty list) and kept returning it even after the scraper had written messages. An attempted fix using a `_store_len` cache-busting parameter failed because Streamlit excludes parameters whose names start with `_` from the cache key.

**Fix:** Removed `@st.cache_data` entirely from `load_stream_data`. The store was in-memory at that point (and is now a local SQLite file, see issue 10), so reading it directly on every rerun is cheap.

---

### 8. Scraper thread blocking on ML inference for 60+ backlog messages

**File:** `app.py`

**Problem:** On startup, the YouTube API returns a backlog of 50-70 messages from the last few minutes. The scraper was running full ML inference (MuRIL + XLM-R + BART = 3 models × 60 messages = 180 forward passes on CPU) before writing a single message to the store. This took several minutes, during which the UI showed "No messages yet" and users kept clicking Start again, killing and restarting the thread.

**Fix:** Added an `is_first_page` flag. On the first API page (the backlog), messages are stored immediately with `Neutral/General` placeholder sentiment so the UI shows data within seconds. Full ML inference runs only on subsequent pages (new live messages, typically 5-15 at a time).

---

### 9. Per-message ML inference error logging

**File:** `app.py`

**Problem:** If `predict_sentiment` or `predict_topic` threw an exception for a specific message, it was silently caught by `_safe_sentiment`/`_safe_topic` with no indication of which message failed or why.

**Fix:** Added an explicit `try/except` with `logger.error("ML inference failed for text=%r: %s", ...)` around each message's inference call in the scraper loop.

---

### 10. Root cause: In-memory store not shared across Streamlit worker processes

**File:** `app.py`

**Problem:** This was the fundamental bug causing "No messages yet" despite the scraper working correctly. HF Spaces runs Streamlit with multiple worker processes. The scraper thread ran in worker process A and wrote to `_STORE` (a Python `dict` in that process's RAM). Browser requests were served by worker process B, which had its own separate, empty `_STORE`. The two processes never shared memory, so the UI always saw zero messages regardless of how many the scraper had collected.

**Fix:** Replaced the entire in-memory `deque`-based store with **SQLite** at `/tmp/livepulse.db`. A SQLite database is a file on disk that all worker processes in the container share. The scraper writes to it; any worker serving the UI reads from the same file. All store functions (`store_rpush`, `store_lrange`, `store_llen`, `store_delete`) were rewritten to use SQLite queries with a threading lock.
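
A minimal sketch of what a SQLite-backed list store along these lines might look like. The four function names and the database path come from the fix above; the table schema, the `db_path` parameter, the JSON encoding, and the Redis-style index semantics of `store_lrange` are assumptions made for illustration, not the actual code in `app.py`.

```python
import json
import sqlite3
import threading
from contextlib import closing

DB_PATH = "/tmp/livepulse.db"  # path named in the changelog
_LOCK = threading.Lock()       # serializes threads inside one process only


def _connect(db_path: str) -> sqlite3.Connection:
    # Assumed schema: one row per list item, ordered by an autoincrement id.
    # `timeout` is how long to wait when another process holds the file lock.
    conn = sqlite3.connect(db_path, timeout=10)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "key TEXT NOT NULL, "
        "value TEXT NOT NULL)"
    )
    return conn


def store_rpush(key: str, item: dict, db_path: str = DB_PATH) -> None:
    """Append one JSON-serializable item to the list stored under `key`."""
    with _LOCK, closing(_connect(db_path)) as conn, conn:
        conn.execute(
            "INSERT INTO items (key, value) VALUES (?, ?)",
            (key, json.dumps(item)),
        )


def store_lrange(key: str, start: int = 0, end: int = -1,
                 db_path: str = DB_PATH) -> list[dict]:
    """Return items from `start` to `end` inclusive; negatives count from the end."""
    with _LOCK, closing(_connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT value FROM items WHERE key = ? ORDER BY id", (key,)
        ).fetchall()
    items = [json.loads(value) for (value,) in rows]
    n = len(items)
    if start < 0:
        start = max(n + start, 0)
    if end < 0:
        end = n + end
        if end < 0:
            return []
    return items[start:end + 1]


def store_llen(key: str, db_path: str = DB_PATH) -> int:
    """Return the number of items stored under `key`."""
    with _LOCK, closing(_connect(db_path)) as conn:
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM items WHERE key = ?", (key,)
        ).fetchone()
    return count


def store_delete(key: str, db_path: str = DB_PATH) -> None:
    """Remove every item stored under `key`."""
    with _LOCK, closing(_connect(db_path)) as conn, conn:
        conn.execute("DELETE FROM items WHERE key = ?", (key,))
```

In this sketch the `threading.Lock` only coordinates threads within a single process; coordination between processes relies on SQLite's own file locking, which is why the connection is opened with a `timeout`.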

---

## Files Changed

| File | Changes |
|------|---------|
| `app.py` | SQLite store, logging setup, backlog fix, cache removal, HTTP error handling, `return None` fix |
| `Dockerfile` | Added `STREAMLIT_SERVER_FILE_WATCHER_TYPE=none` |

---

## What Was NOT Changed

- All dashboard features preserved: charts, alerts, word cloud, engagement score, leaderboard, multi-stream comparison, pinned messages, sentiment heatmap, topic distribution, confidence trend, CSV export
- ML models unchanged: MuRIL + XLM-R + BART ensemble still runs on new messages
- YouTube Data API v3 scraper logic unchanged
- `requirements.txt` unchanged
- `.gitattributes` (Git LFS for model weights) unchanged
- `README.md` unchanged

---

## Current State

The app is fully functional on HF Spaces:

- Scraper fetches YouTube live chat via YouTube Data API v3
- API key read from HF Spaces secret `YOUTUBE_API_KEY`
- Backlog messages stored immediately on start (with placeholder sentiment)
- New messages processed with full ML inference
- SQLite ensures scraper and UI share data across all worker processes
- Dashboard displays all analytics once messages are in the store
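
The backlog behavior listed above (placeholders for the first page, full inference afterwards) could be sketched roughly as follows. Only the `is_first_page` flag, the `Neutral`/`General` placeholders, and the `predict_sentiment`/`predict_topic` names come from this changelog; `process_pages`, the record layout, and the callable signatures are hypothetical.

```python
from typing import Callable, Iterable, List

# Placeholder labels named in the changelog for backlog messages.
PLACEHOLDER = {"sentiment": "Neutral", "topic": "General"}


def process_pages(
    pages: Iterable[List[str]],
    store: Callable[[dict], None],
    predict_sentiment: Callable[[str], str],
    predict_topic: Callable[[str], str],
) -> None:
    """Store the first page with placeholders; run inference on later pages."""
    is_first_page = True
    for messages in pages:
        for text in messages:
            if is_first_page:
                # Backlog: skip the models so the UI has data within seconds.
                record = {"text": text, **PLACEHOLDER}
            else:
                # New live messages: run the (assumed) inference callables.
                record = {
                    "text": text,
                    "sentiment": predict_sentiment(text),
                    "topic": predict_topic(text),
                }
            store(record)
        is_first_page = False
```

Passing the store and the predictors in as callables keeps the sketch independent of the real models, which makes the first-page branch easy to exercise with stubs.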