Thanks for sharing this!
Feynman Innovations
ajibawa-2023
AI & ML interests
LLM, RL, DL, ML, AGI. Developing LLMs (preferably fully fine tuned ) for various use cases.
Recent Activity
updated a dataset 8 days ago
ajibawa-2023/Test repliedto their post 10 days ago
Go-Code-Large
Dataset: https://huggingface.co/datasets/ajibawa-2023/Go-Code-Large
Go-Code-Large is a large-scale corpus of Go (Golang) programming language source code, comprising 316,427 code samples stored in .jsonl format. The dataset is designed to support research and development in large language model (LLM) pretraining, static analysis, cloud-native systems, and modern backend software engineering.
By offering a focused and curated dataset for Go, this corpus enables experimentation in concurrent programming, distributed systems, and performance-oriented backend services—domains where Go is widely adopted.
Go-Code-Large addresses the relative scarcity of large, language-specific datasets for Go, enabling targeted research into idiomatic Go patterns, concurrency primitives, and scalable system design. reacted to theirpost with 👍 11 days ago
Ruby-Code-Large
Dataset : https://huggingface.co/datasets/ajibawa-2023/Ruby-Code-Large
Ruby-Code-Large is a large-scale corpus of Ruby programming language source code comprising 331,743 code samples stored in .jsonl format. The dataset is designed to support research and development in large language model (LLM) pretraining, static analysis, web application development, and software engineering automation within the Ruby ecosystem.
By offering a substantial, language-focused dataset, Ruby-Code-Large enables targeted experimentation in dynamic programming, object-oriented design, and rapid application development—areas where Ruby is widely used, particularly in web frameworks and scripting.
Ruby-Code-Large addresses the lack of large, curated, Ruby-specific datasets, enabling focused research on expressive syntax, metaprogramming, and high-level abstractions.