Starting this collection to gather models, spaces, datasets, or even papers related to disability. Feel free to ping me if you see something relevant to add.
🚀 How The Washington Post Uses AI to Empower Journalists 🔍📰
An exciting new example in the world of AI-assisted journalism! The Post has developed an internal tool called "Haystacker" that's enhancing in-depth reporting. Here's why it matters:
🎥 What it does: • Extracts stills from video files • Processes on-screen text • Labels objects in images
🗳️ First big project: Analyzed 745 Republican campaign ads on immigration (Jan-Jun 2024)
🤝 Human-AI collaboration: • AI extracts and organizes data • Reporters verify and analyze findings
🔎 Thorough approach: • Manual review of all 745 ads • Reverse image searches when context is lacking • Cross-referencing with AdImpact transcripts
💡 Key insight from WaPo's Senior Editor for AI strategy Phoebe Connelly: "The more exciting choice is putting AI in the hands of reporters early on in the process."
This tool showcases how AI can augment journalistic capabilities without replacing human insight and verification. It's a powerful example of technology enhancing, not replacing, traditional reporting skills.
- Figure’s new humanoid robot leverages OpenAI for natural speech conversations: Figure has unveiled its latest humanoid robot, the Figure 02. The most notable addition this time out arrives by way of a longstanding partnership with OpenAI, which helped Figure raise a $675 million Series B back in February, valuing the South Bay firm at $2.6 billion. https://techcrunch.com/2024/08/06/figures-new-humanoid-robot-leverages-openai-for-natural-speech-conversations/
- World’s Five Leading Chipmakers Have Now Promised U.S. Investment: The Biden administration will award up to $450 million in grants to a South Korean chipmaker, SK Hynix, to help build its new chip facility in Indiana. The US now has commitments from all five of the world’s leading-edge semiconductor manufacturers to construct chip plants in the US with financial assistance from the administration. https://www.nytimes.com/2024/08/06/business/economy/chipmakers-promise-investment.html
LLMs are only as good as the data they have been trained on, but the crucial aspect of pretraining data remains obscure. Our approach lifts the veil on building high-quality pretraining datasets by sharing every detail about this process to enable a wider community to build on top of it.
- The FineWeb-Edu dataset, which outperforms all openly accessible web datasets in a number of educational benchmarks. We built it by developing a quality classifier using annotations generated by an LLM.
- A new technical report explaining in detail how to create a large and high-quality web-scale dataset for LLM pretraining such as FineWeb.
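
To make the FineWeb-Edu idea a bit more concrete, here is a minimal sketch of classifier-based filtering: score each document, keep the ones above a cutoff. Everything named here (`Document`, `filter_by_quality`, `toy_score`, and the threshold value) is a hypothetical stand-in for illustration, not the actual pipeline. The real classifier is a trained model built on LLM-generated annotations, whereas `toy_score` below just counts keyword hits.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Document:
    """Hypothetical container for one web page's extracted text."""
    text: str


def filter_by_quality(
    docs: Iterable[Document],
    score_fn: Callable[[str], float],
    threshold: float,
) -> List[Document]:
    """Keep only documents whose quality score meets the threshold."""
    return [doc for doc in docs if score_fn(doc.text) >= threshold]


def toy_score(text: str) -> float:
    """Stand-in for a trained classifier (assumption): counts keyword hits."""
    keywords = ("theorem", "explain", "example", "definition")
    lowered = text.lower()
    return float(sum(word in lowered for word in keywords))


docs = [
    Document("Definition and worked example of a prime number."),
    Document("Buy cheap sunglasses now!!!"),
]

# The threshold is an illustrative choice, not a value taken from the post.
kept = filter_by_quality(docs, toy_score, threshold=2.0)
```

Passing the scorer in as a function keeps the filtering step independent of whichever model produces the scores, so a real classifier could be swapped in without touching the filter.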