AI & ML interests

Digital Humans, Computer Vision, Edge AI, Physical AI, Synthetic Data Generation.

Recent Activity

cahlen  updated a Space 28 days ago
enfuse/README
cahlen  published a model about 2 months ago
enfuse/smol-tools-4b-32k
cahlen  published a model about 2 months ago
enfuse/smol-tools-4b-16k

cahlen posted an update about 9 hours ago
So I built a multimodal video annotation pipeline in my spare time, as you do.

corpus-mill turns any long-form video with people on camera into a time-aligned event corpus across audio, vision, OCR, faces, brand observations, music, and clip-worthy moments. Runs entirely on local GPU because — and I cannot stress this enough — your footage has no business being on someone else's servers.

The honest origin: I needed real multimodal supervision data; the public corpora are weirdly thin once you need per-frame / per-speaker / per-second labels with provenance. So I built one. Then it grew. Then I looked up and it was 30K LOC and ~30 stages, and I thought, OK, maybe other people would want this.

Stack is the usual suspects: Whisper-large-v3 (faster-whisper), pyannote-3.1 (which secretly drags in 433 NeMo modules — surprise!), Qwen2.5-VL-7B for vision/OCR/shoppable detection, dlib + YuNet for faces, qwen2.5:7b / qwen3:14b via local Ollama for the LLM passes, chromaprint + PDQ for fingerprinting. Outputs as Parquet + SQLite. Apache 2.0.
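
If you want a feel for what one stage emits, here's a minimal sketch, not corpus-mill's actual code: faster-whisper ASR flattened into time-aligned Parquet rows. The file name and output schema are made up for illustration.

```python
# Sketch of a corpus-mill-style ASR stage (illustrative, not the project's code):
# transcribe with faster-whisper and emit time-aligned word rows to Parquet.
import pandas as pd
from faster_whisper import WhisperModel

model = WhisperModel("large-v3", device="cuda", compute_type="float16")
segments, info = model.transcribe("footage.mp4", word_timestamps=True)

rows = [
    {"start_s": w.start, "end_s": w.end, "word": w.word,
     "prob": w.probability, "source": "footage.mp4", "stage": "asr"}
    for seg in segments for w in seg.words
]
pd.DataFrame(rows).to_parquet("events_asr.parquet", index=False)
```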

There's a Docker compose that works, after I spent a day discovering that faster-whisper wants CUDA 12 cuBLAS while pyannote 4 wants CUDA 13, and the answer is "install both, point LD_LIBRARY_PATH at the cu12 wheels, ship it." That's now baked in. You're welcome.
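
For reference, the cu12-wheel trick boils down to something like this, assuming the nvidia-cublas-cu12 and nvidia-cudnn-cu12 pip packages are installed (the script name in the comment is hypothetical):

```python
# Print the directories LD_LIBRARY_PATH should include so faster-whisper
# finds the CUDA 12 cuBLAS/cuDNN shipped inside the nvidia-* pip wheels.
import os
import nvidia.cublas.lib
import nvidia.cudnn.lib

print(":".join([
    os.path.dirname(nvidia.cublas.lib.__file__),
    os.path.dirname(nvidia.cudnn.lib.__file__),
]))
# shell: export LD_LIBRARY_PATH="$(python find_cu12_libs.py):$LD_LIBRARY_PATH"
```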

Spare-time project, bugs are real, fixing them for your specific footage is on you. If you're training multimodal models and want a corpus pipeline you fully control on-prem, this might save you months. If not, the README is at least mildly entertaining.

https://github.com/cahlen/corpus-mill
cahlen posted an update 16 days ago
In the spirit of "just making shit"—because, frankly, you can these days—I decided to get back to basics. I built a lightweight RL MLP powered by WebGPU that runs directly in your browser.

The twist? I replaced the standard MLP with a Continued Fractions network. The real win here is interpretability; by applying a Taylor expansion to the continued fractions, we can actually decompose the output and see exactly which features influenced the outcome. It effectively replaces the usual "it’s just black-box magic" with actual visibility into the logic.
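
Roughly, a CoFrNet-style continued-fraction ladder looks like this; a NumPy sketch under my own assumptions, not the repo's WebGPU code:

```python
import numpy as np

def ladder(x, W, b, eps=1e-6):
    """Continued-fraction 'ladder': a0 + 1/(a1 + 1/(a2 + ...)),
    where each term a_i = W[i] @ x + b[i] is linear in the input x."""
    a = W @ x + b                          # linear terms a_0 .. a_D
    out = a[-1]
    for i in range(len(a) - 2, -1, -1):    # evaluate from the deepest level up
        denom = out if abs(out) > eps else eps  # guard near-zero denominators
        out = a[i] + 1.0 / denom
    return out

# Toy usage: depth-3 ladder on a 4-feature input
rng = np.random.default_rng(0)
x = rng.normal(size=4)
print(ladder(x, rng.normal(size=(3, 4)), rng.normal(size=3)))
```

Because every a_i is linear in x, Taylor-expanding the fraction yields per-feature contribution terms, which is exactly the interpretability hook mentioned above.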

It was a fun experiment. Feel free to clone the repo and make it your own!

cahlen/neuron-runner
https://github.com/cahlen/neuron-runner
CoFrGeNet: Continued Fraction Architectures for Language Generation (2601.21766)
cahlen posted an update 22 days ago
Hugging Face just enabled CUDA kernel repos!! This is crazy cool!

Expect a ton more portable number theory CUDA kernels in the near future. I'm going to have a hell of a lot of fun with this new feature.

Appreciate it, Hugging Face!

https://huggingface.co/kernels
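
If you haven't played with it yet, pulling a compiled kernel from the Hub looks roughly like this (repo and function names follow the kernels library's docs example; verify against the current API before relying on it):

```python
# Hedged sketch: fetch a prebuilt CUDA kernel from the Hub and call it.
import torch
from kernels import get_kernel

activation = get_kernel("kernels-community/activation")  # docs example repo
x = torch.randn(4, 8, device="cuda")
y = torch.empty_like(x)
activation.gelu_fast(y, x)   # writes the result into y
print(y)
```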

cahlen posted an update 29 days ago
I was in the mood to fine-tune a model, you know, since we can just do shit these days, but I've also been on a pure math kick lately, going back to some of my old academic stuff. And I remember specifically, back then, thinking about how difficult it was to narrow down a problem worth working on. This will hopefully help close that gap, at least for some niche parts of number theory.

Convergent is a model that knows number theory and computational mathematics. But I've also trained it on agentic tool-calling loops, using prebuilt tools advertised on my computational number theory project website https://bigcompute.science

So essentially, you can use this to talk about unsolved problems, ways to attack them, how to write the CUDA kernels to investigate them on your local GPU, or to come up with brand-new conjectures!
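
As a quick-start sketch (mine, not shipped with the repo; the prompt and generation settings are illustrative, and I'm assuming the checkpoint bundles a chat template):

```python
# Load Convergent-7B locally and ask it to propose an attack on a conjecture.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "cahlen/Convergent-7B"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user",
             "content": "Suggest a computational attack on the Erdos-Straus "
                        "conjecture I could run on a single consumer GPU."}]
inputs = tok.apply_chat_template(messages, add_generation_prompt=True,
                                 return_tensors="pt").to(model.device)
out = model.generate(inputs, max_new_tokens=512)
print(tok.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```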

But most importantly, it can zero you in on something that might be more interesting, especially if you have a model to talk the problem through with. I'm aware there aren't a ton of number theorists who just hang out together... heh.

So I'm sure there are bugs. Just post on GitHub or Hugging Face and I'll get to them when I have time.

Everything open source. Model weights, training data, training scripts. Have at it.

cahlen/Convergent-7B
cahlen/Convergent-7B-data
https://github.com/cahlen/convergent
https://bigcompute.science
https://mcp.bigcompute.science/mcp
cahlen posted an update about 1 month ago
Speaking of just being able to make shit these days.

4,753 WiFi networks. 1,544 BLE devices. 41 IR signal patterns. 117 kilometers. 257 open networks. 125 printers exposing WiFi Direct. 163 Samsung SmartTags. 121 separated AirTags. A Mercedes-Benz MBUX infotainment system advertising to anyone in range.

That's from just one leisurely evening drive in SoCal.

All classified, mapped, and queryable. From a device that fits in your pocket.

ESP32 Marauder + Flipper Zero + DGX Spark running OpenClaw with a local Qwen 3.5 35B abliterated model. All AI processing on-prem. No cloud APIs, no token burn. If you can run it with a local LLM, do so. Free inference forever.

Throw it in your bag and drive. Every 20 seconds — full sensor rotation. WiFi, BLE, infrared, SubGHz. GPS-tagged, timestamped, saved to SD. When WiFi is up, it fires data home. The device never waits. It just keeps scanning.

Two ESP32 chips plugged into a Flipper Zero. White chip does WiFi and BLE. Orange chip does IR and SubGHz. Both scan in parallel. Two sets of hands, one brain.

It classifies every device from raw BLE bytes — AirTags, SmartTags, printers, iBeacons. Builds a persistent knowledge graph. Filters out your own devices automatically. Plots everything on a live map with an AI chat panel. Ask it to highlight all separated AirTags and it does. OpenClaw does the heavy lifting, I just emit data to a sensor garbage disposal type API endpoint.
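
The classification idea, reduced to a sketch: company IDs are from the Bluetooth SIG registry, and the Find My / iBeacon byte checks follow published research. Treat the exact byte comparisons as assumptions, not this project's code.

```python
# Map the manufacturer-specific data in a BLE advertisement to a device family.
def classify_ble(mfg_id: int, payload: bytes) -> str:
    if mfg_id == 0x004C:                      # Apple
        # Offline-finding (Find My) adverts start with type 0x12, length 0x19
        if len(payload) >= 2 and payload[0] == 0x12 and payload[1] == 0x19:
            return "apple-findmy (AirTag-class)"
        if payload[:1] == b"\x02":            # iBeacon payload type
            return "apple-ibeacon"
        return "apple-other"
    if mfg_id == 0x0075:                      # Samsung Electronics
        return "samsung (SmartTag-class)"
    return "unknown"

print(classify_ble(0x004C, bytes([0x12, 0x19, 0x10])))  # apple-findmy
```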

What interests me most is what happens over time. Same routes, different days. Devices that show up Monday but not Friday. Networks that vanish. Trackers that migrate. The ontology never rotates.

I'm probably never going to release this. But you could probably build it yourself.

I casually call it 'MarauderClaw' because I added a full agentic action mode where OpenClaw can take over and control the Flipper Zero and Marauder entirely, but that just kinda seemed like a bad idea, so nah on that one, boss.

Favorite WiFi name in the wild so far: Wu-Tang-Lan.
cahlen posted an update about 2 months ago
It’s wild to me how you can just make shit now.

You can take a weekend with a Raspberry Pi 5, a Pi Camera, a 3D printer, and a smidgen of custom fine-tuning (wake word, Whisper, TinyBERT, and Piper TTS), and you have a physical device that works as a talking personal assistant.
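
That loop, reduced to a sketch (the checkpoints, audio capture, and piper invocation are placeholder assumptions, not the actual build):

```python
# Wake-word gate -> Whisper STT -> TinyBERT intent -> Piper TTS reply.
import subprocess
from faster_whisper import WhisperModel
from transformers import pipeline

stt = WhisperModel("small", device="cpu", compute_type="int8")   # Pi-5-sized
intent = pipeline("text-classification",
                  model="huawei-noah/TinyBERT_General_4L_312D")  # swap in your fine-tune

def on_wake(wav_path: str) -> None:
    """Called after the wake word fires with a short recorded utterance."""
    segments, _ = stt.transcribe(wav_path)
    text = " ".join(s.text for s in segments).strip()
    label = intent(text)[0]["label"]
    reply = f"Intent {label}. You said: {text}"
    # Piper reads text on stdin and writes a wav we can play back.
    subprocess.run(
        f"echo {reply!r} | piper --model en_US-lessac-medium.onnx "
        f"--output_file reply.wav && aplay reply.wav",
        shell=True,
    )
```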

What a time to be alive.

Edge AI, physical AI, AI-augmented animatronics… tiny models. Tiny agents.

Going to be a wild year.