AI wasn't given much help in making hands with a particular number of fingers: there was no coherent way of achieving that from the start, just a bunch of rather unfiltered data. It's getting better in many regards. For example, AI should have a rather good understanding of anatomy, and should probably start with the bones of various animals, like anatomy courses for human students do. Then you could tell it the number of fingers, or to make a wing of a particular type. I'm sure lots of people have mentioned this before, but it's a good example.
Some time ago, I came across a research analysis from two investors at a16z. Over the past year (2025), ChatGPT tried to push new AI features into fields such as shopping, but in practice the results were not good.
I think the fundamental reason lies in the user's mindset, or rather, the user's interaction logic in vertical fields. ChatGPT's most prominent and distinctive feature is that all-encompassing dialogue box, which is also a common problem with many homogeneous AI products nowadays (it seems that without a dialogue box, the AI's capabilities are sealed off). Although it can be adapted to many scenarios, it feels very dull in more vertical ones.
Ask yourself, would you prefer the image-text waterfall flow interaction in shopping scenarios like Xiaohongshu, or the monotonous search box of ChatGPT? The answer is actually obvious from the start.
For all vertical scenarios, the interaction logic was already very well-developed before the emergence of AI. The user experience brought by such interaction logic is definitely not something that a single dialogue box can replace.
And if we want to create a good AI product in a vertical field, we should think more about how to quietly embed the powerful capabilities of AI into the original interaction, and continuously iterate to provide users with a better experience. @lilianweng @clem @AdinaY
That's an interesting way of accomplishing it. I would assume enough filtering would accomplish much the same with huge datasets for multi-modal AI, but focusing primarily on improving, and even simplifying, existing scenarios with AI should largely be the goal of smol models and similar schemes that reduce the size and complexity of interactions while preserving precision and value. The focus could and should help in particular scenarios, but I admittedly don't know the particulars well enough, so I'd be largely speculating on what others may or may not be able to accomplish within current limitations.
tegridydev/research-papers
Currently building out the foundation topics and raw .pdf research paper files.
Will be processing, cleaning up, and converting these into high-quality training datasets.
Check it out, give it a like, and leave a comment below, or join the community discussion and suggest what fields and research topics you want to see included!
Jan 27th just got interesting for open-source AI models.
- Kimi K2.5: How to make models "think" across text and vision natively?
moonshotai/Kimi-K2.5
- DeepSeek-OCR 2: How to make models "see" more like humans, not scanners?
deepseek-ai/DeepSeek-OCR-2
One focuses on depth of reasoning, the other on precision of vision.
What's the key differentiator for a multimodal model in your view: raw power or computational elegance?
That's it. That's the workflow.
Zero coding. Zero iteration. Zero "make the button bigger."
See for yourself: https://rakshit2020.github.io/rakshitaralimatti.github.io/
The model:
- Scraped my GitHub repos automatically
- Pulled my experience from LinkedIn
- Designed an Aurora Glass theme
- Mapped every skill to projects
- Added animations I'd never code myself
Version 2.0 is under intensive development. The next version will support Skills, making it easier for users to create their own expert agents to solve various challenging and complex real-world problems.
Meanwhile, we are also exploring whether we can automatically generate high-quality expert Skills after a task is completed, reducing the difficulty of writing Skills and letting the LoongFlow framework automatically output the best Skills for challenging scenarios!
I hope we can come to agree in the future that anything operating with a degree of confidence is attempting an art rather than a pure science, and that this will highlight how much of our work was art anyway.
It shouldn't have to explain to you why it's better. "Better" should be a rather floaty standard, as in Stable Diffusion. In general, better means better on all metrics, but if you need specific metrics, it should know to keep every improvement that fits rather general standards. I don't think it does. If it can know it's better, it shouldn't have to explain that it has done so, whether you prefer black boxes or not. And this is how it goes: electronics engineers feed models their goals, and the models produce circuits in shapes the engineers can't explain, but that work. This is happening now. Much of our science is of this type, where we simply accept that things fit without fully knowing why.
What happens when you tell something like Stable Diffusion to build a human hand without any advice on what actually composes a human hand, just a bunch of biomimetic parts and materials science to choose from? When does it accidentally invent a better hand?
Article: https://robonine.com/increasing-the-structural-rigidity-of-the-manipulator/
If I spend enough time, I should be able to make a bot from scratch that's 20 times smaller, using the same code and structure. Why am I saying this? Am I going to do it? No, but everyone should know that most of what people do is throw a ton of information at bots to work out for themselves algorithmically. We need more experimental bots, particularly to skip a few steps toward getting the same answer. So I'm always glad to see work of this sort, whether it's trying different datasets with different LLMs, or whatever.
Repo: raincandy-u/Rain-100M
Data: HuggingFaceFW/fineweb-edu, ~3B tokens, English only
Tokenizer: custom 16k BPE, context length 4096
Architecture: 12 Transformer layers, hidden size 768, 12 heads, MLP 2048, SiLU, bf16
Rain-100M is a raw base model (not instruction-tuned or safety-aligned), aimed at small-scale research, debugging training pipelines, and CPU/edge experiments. If you run evaluations, finetunes, or visualizations with it, I would be very interested in your results!
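For anyone who wants to poke at it quickly, here is a minimal loading sketch, assuming the checkpoint follows standard transformers AutoModel conventions (the prompt is just an illustration):

```python
# Minimal sketch: load Rain-100M as a plain causal LM, assuming standard AutoModel support.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "raincandy-u/Rain-100M"
tokenizer = AutoTokenizer.from_pretrained(repo)
model = AutoModelForCausalLM.from_pretrained(repo, torch_dtype=torch.bfloat16)

# Raw base model: no chat template, no safety alignment, so prompt it as plain completion.
inputs = tokenizer("The water cycle begins when", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=64, do_sample=True, temperature=0.8)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```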
The visual effects of this model are simply beyond imagination; it's every bit as good as NanoBanana, no compromise at all.
I fine-tuned my micro-scene prompts by adding text overlays and background effects, and its adaptability is truly breathtaking. With just one prompt, you can generate scene posters for any movie or novel.
Every detail, from scene design to text style and atmospheric effects, perfectly aligns with the tone of the original material.
No forced elements, just seamless, film-grade visual effects that exactly match what I envisioned.
Repo: https://hunyuan.tencent.com/chat/HunyuanDefault?from=modelSquare&modelId=Hunyuan-Image-3.0-Instruct
- Guardpoint is our new medical reasoning model; trained on medical knowledge, management, diagnosis, and tasks from DeepSeek-V3.2-Speciale!
- Structured medical reasoning responses are efficient and informative, cutting token costs for faster inference!
- Wide-ranging knowledge base: trained on a wide variety of medical disciplines, patient types, and query structures!
- High quality medical responses emphasize performance, brevity, specificity, statistical rationality, and openness.
Get it now:
Guardpoint for gpt-oss-120b: ValiantLabs/gpt-oss-120b-Guardpoint
Guardpoint for gpt-oss-20b: ValiantLabs/gpt-oss-20b-Guardpoint
Powered by our new structured medical reasoning dataset: sequelbox/Superpotion-DeepSeek-V3.2-Speciale
Guardpoint is also available for Qwen 3:
Guardpoint for Qwen 3 32B: ValiantLabs/Qwen3-32B-Guardpoint
Guardpoint for Qwen 3 14B: ValiantLabs/Qwen3-14B-Guardpoint
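A quick usage sketch, assuming the finetunes keep their base models' chat templates (the query below is a hypothetical example, not from the release notes):

```python
# Hedged example: querying a Guardpoint finetune via the text-generation pipeline.
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="ValiantLabs/Qwen3-14B-Guardpoint",
    torch_dtype="auto",
    device_map="auto",
)

messages = [
    {"role": "user",
     "content": "Outline first-line management of community-acquired pneumonia in adults."},
]
result = pipe(messages, max_new_tokens=512)
print(result[0]["generated_text"][-1]["content"])  # the structured medical reasoning response
```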
We've been working hard on Guardpoint; we're really excited to share it with everyone! It's also our best finetune so far for gpt-oss. Try it out and see what you think!
We'll be bringing Guardpoint, Shining Valiant, and Esper to more models soon, along with further experimental releases. We're planning to do a lot with Deepseek's upcoming release; it should unlock a lot of new possibilities for specialist and experimental models!
Get our experimental models: https://huggingface.co/collections/sequelbox/experimental-reasoning-models
Get our reasoning datasets: https://huggingface.co/collections/sequelbox/reasoning-datasets
Help support our releases, donations used for our experimental models and datasets: sequelbox/SupportOpenSource
Fight for open source with us!
love,
allegra
When does the planned context become the signifier of that context in the code itself? When something is stable in code. Even having to recover, or being able to, means it's storing far too much about context without getting to the context itself. All language needs the same simplification. Or maybe I just don't see reflexivity in AI yet. Maybe I don't see it building itself with awareness of what it is to others, unlike NASNet.
The "Janus Interface" paper details a new attack that could recover forgotten PII through fine-tuning APIs. This is a solution-oriented paper because it highlights a problem that needs fixing.
Testing such a high-stakes attack requires equally high-stakes data. The Ai4Privacy 300k dataset was a key part of their evaluation, providing a testbed for extracting sensitive Social Security Numbers. Our dataset, with its synthetic structured SSN data, helped the researchers at Indiana University, Stanford & CISPA, and others demonstrate that their attack works on more than just emails. It could affect highly sensitive personal identifiers.
We're excited to see our open-source dataset used in such cutting-edge security research. It's a win for the community when researchers can use our resources to stress-test the safety of modern AI systems. This work is a direct and explicit call for stronger protections on fine-tuning interfaces.
This is why open data for security research is so important. Check out the full paper: https://arxiv.org/pdf/2310.15469
Stay updated on the latest in privacy-preserving AI by following us on LinkedIn: https://www.linkedin.com/company/ai4privacy/posts/
I wrote a deep dive into how Magic AI's 100M token context window might work, starting from their HashHop benchmark and building up to MALM - a Memory-Augmented Language Model.
Key insight: treating each key as a single token enables perfect retrieval at unlimited context lengths.
The article covers:
- How HashHop works and why its perfect accuracy is suspicious
- Building a tokenized solver that achieves 100% accuracy
- Scaling to MALM for real code search tasks
- Why this approach could handle 100M+ tokens
Read the full article: https://huggingface.co/blog/codelion/reverse-engineering-magic-hashhop
Try the model: codelion/malm-165m
Code: https://github.com/codelion/hash-hop
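To make the single-token-key idea concrete, here is a toy reconstruction (my own sketch based on the article's description, not the MALM code): when every hash key maps to exactly one token id, retrieval becomes an exact lookup, so accuracy can't degrade no matter how many pairs sit in the "context".

```python
# Toy reconstruction of the single-token-key idea; not the actual MALM implementation.
import secrets

def make_hashhop_chain(num_pairs: int) -> dict:
    """HashHop-style chain: each random hash points to the next one."""
    hashes = [secrets.token_hex(8) for _ in range(num_pairs + 1)]
    return {hashes[i]: hashes[i + 1] for i in range(num_pairs)}

chain = make_hashhop_chain(200_000)  # a "context" far larger than any attention window

# "Single-token" tokenization: one dedicated token id per distinct key.
token_of = {key: idx for idx, key in enumerate(chain)}

# External memory keyed by token id; lookup cost is O(1), independent of context length.
memory = {token_of[k]: v for k, v in chain.items()}

query = next(iter(chain))
assert memory[token_of[query]] == chain[query]  # exact retrieval, regardless of scale
```

The model still has to learn to emit the right lookup, of course; the point is that the retrieval step itself stops being a function of sequence length.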
Why it had to be done
PyTorch's Dynamo compiler is increasingly becoming the default interoperability layer for ML systems. Anything that relies on torch.export or torch.compile, from model optimization to cross-framework integrations, benefits directly when models can be captured as a single dynamo-traced graph!
Transformers models are now easier to:
- Compile end-to-end with torch.compile backends
- Export reliably via torch.export and torch.onnx.export
- Deploy to ONNX / ONNX Runtime, Intel Corporation's OpenVINO, NVIDIA AutoDeploy (TRT-LLM), AMD's Quark, Meta's Executorch and more hardware-specific runtimes.
This work aims at unblocking entire TorchDynamo-based toolchains that rely on exporting Transformers across runtimes and accelerators.
We are doubling down on Transformers' commitment to being a first-class citizen of the PyTorch ecosystem: more exportable, more optimizable, and easier to deploy everywhere.
There are definitely some edge cases that we still haven't addressed, so don't hesitate to try compiling / exporting your favorite transformers and to open issues / PRs.
PR in the comments! More updates coming soon!
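Here is a small sketch of what this unlocks in practice (the model id and inputs are illustrative, and whether a given checkpoint captures as a single graph still depends on your transformers version):

```python
# Hedged sketch: compiling and exporting a Transformers model through TorchDynamo.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # stand-in for any dynamo-traceable Transformers checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()
inputs = dict(tokenizer("Hello, Dynamo!", return_tensors="pt"))

# fullgraph=True raises on any graph break, which is exactly what the
# single-graph capture work aims to eliminate.
compiled = torch.compile(model, fullgraph=True)
with torch.no_grad():
    compiled(**inputs)

# torch.export path, which ONNX / ExecuTorch / vendor toolchains build on downstream.
exported = torch.export.export(model, args=(), kwargs=inputs)
print(exported)
```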