LivePulse — Session Changelog
Date: April 16, 2026
Session: HF Spaces Deployment Debugging & Fixes
Summary
This session focused entirely on getting the deployed LivePulse app on Hugging Face Spaces (huggingface.co/spaces/Divyonko/LivePulse) to work end-to-end, from scraping YouTube live chat to displaying analytics in the dashboard.
Issues Found & Fixed (in order)
1. Missing return None in _get_live_chat_id
File: app.py
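Problem: The except block in _get_live_chat_id had no explicit return. Python returns None implicitly when a function falls off the end, so the failure path worked only by accident, and the function's contract (chat ID on success, None on failure) was not visible in the code.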
Fix: Added explicit return None in the except block.
2. No logging output visible in HF Spaces logs
File: app.py
Problem: Python's root logger defaults to WARNING level. All our logger.info() calls were silently dropped — nothing useful appeared in the logs.
Fix: Added logging.basicConfig(level=logging.INFO, force=True) so all INFO and above messages appear in HF Spaces logs.
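A minimal sketch of this logging setup follows. The logger name `livepulse` and the format string are illustrative assumptions, not taken from app.py; only the `level` and `force` arguments come from the fix described above.

```python
import logging

# force=True removes any handlers already attached to the root logger
# (for example by a framework), so this configuration takes effect.
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(name)s: %(message)s",  # assumed format
    force=True,
)

logger = logging.getLogger("livepulse")  # hypothetical logger name
logger.info("Logging configured at INFO level")
```

Without `force=True`, `basicConfig` does nothing if the root logger already has handlers, which is a common reason a logging change appears to have no effect.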
3. Torchvision warnings flooding the logs
File: Dockerfile
Problem: Streamlit's file watcher scans all imported modules including transformers, which tries to import torchvision (not installed). This produced hundreds of ModuleNotFoundError: No module named 'torchvision' lines, making real errors impossible to find.
Fix: Added ENV STREAMLIT_SERVER_FILE_WATCHER_TYPE=none to the Dockerfile to disable the file watcher entirely.
4. Improved HTTP error logging in _get_live_chat_id
File: app.py
Problem: Generic except Exception swallowed the actual YouTube API error body (e.g. "API key invalid", "quota exceeded").
Fix: Added a separate urllib.error.HTTPError handler that reads and logs the full error response body, making API failures immediately diagnosable.
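The handler ordering described here can be sketched as below. `fetch_json` and its `opener` parameter are hypothetical names introduced for illustration; the real _get_live_chat_id may be structured differently. The sketch also shows the explicit `return None` from item 1.

```python
import json
import logging
import urllib.error
import urllib.request

logger = logging.getLogger("livepulse")  # hypothetical logger name


def fetch_json(url, opener=urllib.request.urlopen):
    """Return parsed JSON from url, or None on any failure (illustrative helper)."""
    try:
        with opener(url, timeout=10) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # HTTPError doubles as a response object, so the error body is readable.
        body = exc.read().decode("utf-8", errors="replace")
        logger.error("HTTP %s from API: %s", exc.code, body)
        return None
    except Exception as exc:
        # Anything else: network errors, timeouts, malformed JSON.
        logger.error("Request failed: %s", exc)
        return None
```

The `HTTPError` clause must come before the generic `except Exception`, otherwise the generic handler catches it first and the response body is never read.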
5. API key presence logging
File: app.py
Problem: No way to confirm whether the YOUTUBE_API_KEY secret was actually being read from HF Spaces environment.
Fix: Added logger.info("YOUTUBE_API_KEY present: %s (length=%d)", ...) at scraper thread start.
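One way to write this check is sketched below. The helper name `log_api_key_presence` and the injectable `env` parameter are assumptions made for illustration; only the log format string comes from the fix above.

```python
import logging
import os

logger = logging.getLogger("livepulse")  # hypothetical logger name


def log_api_key_presence(env=os.environ):
    """Log whether YOUTUBE_API_KEY is set, without logging the key itself."""
    key = env.get("YOUTUBE_API_KEY", "")
    logger.info("YOUTUBE_API_KEY present: %s (length=%d)", bool(key), len(key))
    return bool(key)
```

Logging only presence and length confirms that the secret is being read without writing its value to the Space logs.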
6. Chat message fetch logging
File: app.py
Problem: No confirmation that liveChat/messages API calls were succeeding.
Fix: Added logger.info("Fetched %d chat messages ...") after each successful API poll.
7. @st.cache_data on load_stream_data returning stale empty results
File: app.py
Problem: load_stream_data was decorated with @st.cache_data(ttl=5). The cache key was just redis_key (a constant string), so it cached the first result (empty list) and kept returning it even after the scraper had written messages. Attempted fix with _store_len cache-busting parameter failed because Streamlit ignores parameters prefixed with _ for hashing purposes.
Fix: Removed @st.cache_data entirely from load_stream_data. The store was in-memory at the time (and is now a local SQLite file), so reading it directly on every rerun is cheap enough that caching is unnecessary.
8. Scraper thread blocking on ML inference for 60+ backlog messages
File: app.py
Problem: On startup, the YouTube API returns a backlog of 50-70 messages from the last few minutes. The scraper was running full ML inference (MuRIL + XLM-R + BART = 3 models × 60 messages = 180 forward passes on CPU) before writing a single message to the store. This took several minutes, during which the UI showed "No messages yet" and users kept clicking Start again, killing and restarting the thread.
Fix: Added is_first_page flag. On the first API page (backlog), messages are stored immediately with Neutral/General placeholder sentiment so the UI shows data within seconds. Full ML inference only runs on subsequent pages (new live messages, typically 5-15 at a time).
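The backlog handling can be sketched roughly as follows. `process_page`, the list-based `store`, and the `predict` callable are hypothetical stand-ins for the scraper loop, the message store, and the model ensemble; only the `is_first_page` flag and the Neutral/General placeholders come from the fix above.

```python
def process_page(messages, is_first_page, store, predict):
    """Store one page of chat messages, skipping ML inference on the backlog page."""
    for text in messages:
        if is_first_page:
            # Backlog: write placeholders at once so the UI has data within seconds.
            sentiment, topic = "Neutral", "General"
        else:
            # New live messages: run the (slow) model ensemble.
            sentiment, topic = predict(text)
        store.append({"text": text, "sentiment": sentiment, "topic": topic})
```

A caller would pass `is_first_page=True` for the first API response and `False` for every later poll.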
9. Per-message ML inference error logging
File: app.py
Problem: If predict_sentiment or predict_topic threw an exception for a specific message, it was silently caught by _safe_sentiment/_safe_topic with no indication of which message failed or why.
Fix: Added explicit try/except with logger.error("ML inference failed for text=%r: %s", ...) around each message's inference call in the scraper loop.
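A sketch of the per-message guard is shown below. The wrapper name `classify_with_logging` and the Neutral/General fallback on failure are assumptions; the predictors are passed in as parameters so the example stays self-contained.

```python
import logging

logger = logging.getLogger("livepulse")  # hypothetical logger name


def classify_with_logging(text, predict_sentiment, predict_topic):
    """Run both predictors on one message, logging which message failed and why."""
    try:
        return predict_sentiment(text), predict_topic(text)
    except Exception as exc:
        # %r keeps quotes and escapes visible, which helps with odd chat input.
        logger.error("ML inference failed for text=%r: %s", text, exc)
        return "Neutral", "General"  # assumed fallback labels
```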
10. Root cause: In-memory store not shared across Streamlit worker processes
File: app.py
Problem: This was the fundamental bug causing "No messages yet" despite the scraper working correctly. HF Spaces runs Streamlit with multiple worker processes. The scraper thread ran in worker process A and wrote to _STORE (a Python dict in that process's RAM). Browser requests were served by worker process B, which had its own separate empty _STORE. The two processes never shared memory — the UI always saw zero messages regardless of how many the scraper had collected.
Fix: Replaced the entire in-memory deque-based store with SQLite at /tmp/livepulse.db. SQLite is a file on disk that all worker processes in the container share. The scraper writes to it; any worker serving the UI reads from the same file. All store functions (store_rpush, store_lrange, store_llen, store_delete) were rewritten to use SQLite queries with a threading lock.
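The four store functions could look roughly like the sketch below. The table schema, the JSON serialization, the `path` parameter, and the simplified range semantics (any negative `end` means "through the last item") are all assumptions made for illustration; only the function names and the database path come from the fix above.

```python
import json
import sqlite3
import threading
from contextlib import closing

DB_PATH = "/tmp/livepulse.db"
_lock = threading.Lock()  # serializes threads inside one process only


def _connect(path):
    conn = sqlite3.connect(path, timeout=10)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, key TEXT NOT NULL, value TEXT NOT NULL)"
    )
    return conn


def store_rpush(key, item, path=DB_PATH):
    """Append one JSON-serializable item to the list stored under key."""
    with _lock, closing(_connect(path)) as conn, conn:  # inner `conn` commits
        conn.execute(
            "INSERT INTO messages (key, value) VALUES (?, ?)", (key, json.dumps(item))
        )


def store_lrange(key, start=0, end=-1, path=DB_PATH):
    """Return items start..end inclusive; a negative end means 'to the last item'."""
    limit = -1 if end < 0 else max(end - start + 1, 0)  # SQLite: LIMIT -1 = no limit
    with _lock, closing(_connect(path)) as conn:
        rows = conn.execute(
            "SELECT value FROM messages WHERE key = ? ORDER BY id LIMIT ? OFFSET ?",
            (key, limit, start),
        ).fetchall()
    return [json.loads(row[0]) for row in rows]


def store_llen(key, path=DB_PATH):
    """Return the number of items stored under key."""
    with _lock, closing(_connect(path)) as conn:
        return conn.execute(
            "SELECT COUNT(*) FROM messages WHERE key = ?", (key,)
        ).fetchone()[0]


def store_delete(key, path=DB_PATH):
    """Remove every item stored under key."""
    with _lock, closing(_connect(path)) as conn, conn:
        conn.execute("DELETE FROM messages WHERE key = ?", (key,))
```

Note that a `threading.Lock` only coordinates threads within a single process. Coordination between processes comes from SQLite's own file locking, and the `timeout=10` argument makes a writer wait for a competing lock instead of failing immediately.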
Files Changed
| File | Changes |
|---|---|
| app.py | SQLite store, logging setup, backlog fix, cache removal, HTTP error handling, return None fix |
| Dockerfile | Added STREAMLIT_SERVER_FILE_WATCHER_TYPE=none |
What Was NOT Changed
- All dashboard features preserved: charts, alerts, word cloud, engagement score, leaderboard, multi-stream comparison, pinned messages, sentiment heatmap, topic distribution, confidence trend, CSV export
- ML models unchanged: MuRIL + XLM-R + BART ensemble still runs on new messages
- YouTube Data API v3 scraper logic unchanged
- requirements.txt unchanged
- .gitattributes (Git LFS for model weights) unchanged
- README.md unchanged
Current State
The app is fully functional on HF Spaces:
- Scraper fetches YouTube live chat via YouTube Data API v3
- API key read from HF Spaces secret YOUTUBE_API_KEY
- Backlog messages stored immediately on start (with placeholder sentiment)
- New messages processed with full ML inference
- SQLite ensures scraper and UI share data across all worker processes
- Dashboard displays all analytics once messages are in the store