deepghs (DeepGHS)

posted an update 3 days ago

Post

2126

Introducing the D.Markdown Experimental Models, Proxima and Epsilon OCR models, built on top of Qwen3-VL and Qwen2.5-VL respectively. Proxima is optimized for Markdown generation and is capable of embedding inline programming code snippets and generating rich nodes such as HTML, XML, JSON, and YAML. Epsilon is optimized for reconstructing complex layouts including tables, forms, and mathematical content. 🌌✨

● proxima-ocr-d.markdown-post3.0.l: prithivMLmods/proxima-ocr-d.markdown-post3.0.l
● epsilon-ocr-d.markdown-post3.0.m: prithivMLmods/epsilon-ocr-d.markdown-post3.0.m
● proxima-ocr-d.markdown-post3.0.l-gguf: prithivMLmods/proxima-ocr-d.markdown-post3.0.l-GGUF
● epsilon-ocr-d.markdown-post3.0.m-gguf: prithivMLmods/epsilon-ocr-d.markdown-post3.0.m-GGUF

● Collection: https://huggingface.co/collections/prithivMLmods/dynamic-markdowns
● Multimodal Apps: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

👉 These models are stage progression models, and currently they may contain artifacts.

To know more about it, visit the app page or the respective model page!

prithivMLmods

posted an update 4 days ago

Post

1047

Try CUA GUI Operator 🖥️ Space, the demo of some interesting multimodal ultra-compact Computer Use Agent (CUA) models in a single app, including Fara-7B, UI-TARS-1.5-7B, and Holo models, to perform GUI localization tasks.

● CUA-GUI-Operator [Demo]: prithivMLmods/CUA-GUI-Operator
● Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

Other related multimodal spaces

● Qwen3-VL: prithivMLmods/Qwen3-VL-HF-Demo
● Multimodal-VLM-v1.0: prithivMLmods/Multimodal-VLM-v1.0
● Vision-to-VibeVoice-en: prithivMLmods/Vision-to-VibeVoice-en

I have planned to add Chrome sandboxes to streamline it and turn it into a browser based CUA multimodal tool, which will be added to the same space soon.

To know more about it, visit the app page or the respective model page!

1 reply

·

prithivMLmods

posted an update 6 days ago

Post

3511

One speech model with seven voices, streamlined with multimodal capabilities for vision tasks. Performs vision(image-text) to audio inference with Qwen2.5-VL + VibeVoice-Realtime-0.5B. Vision to VibeVoice (EN) - The demo is live. 🗣️🔥

🤗 Vision-to-VibeVoice-en [Demo]: prithivMLmods/Vision-to-VibeVoice-en
✨ Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
✨ Speech [VibeVoice-Realtime-0.5B]: microsoft/VibeVoice-Realtime-0.5B
✨ Vision [Qwen2.5-VL]: Qwen/Qwen2.5-VL-7B-Instruct

To know more about it, visit the app page or the respective model page!

6 replies

·

wangsssssss

authored a paper 7 days ago

Flowing Backwards: Improving Normalizing Flows via Reverse Representation Alignment

Paper • 2511.22345 • Published 14 days ago • 12

prithivMLmods

posted an update 10 days ago

Post

3678

Hello everyone,

The

strangerzonehf [HF] Community / Organization Page, which is maintained by me, has reached the Top 10 Developer Pages ranking at 6th place, contributing 3.4% in the calendar cycle from August 2024 to August 2025. It is also the only South Asia / Indian page in the list. I could not be more proud to be doing things for the community. ❤️🤗

Source: https://www.dataprovenance.org/economies-of-open-intelligence.pdf

It is a pleasure to be a part of it.
Thank you!
@prithivMLmods

AbstractPhil

posted an update 12 days ago

Post

321

Many updates. Cantor route experiments, GeoViT-david-beans 75% test standalone cifar100 geofractal 30m encoder. MultiHeaded Cantor Attention heavily optimized. The migration is primarily complete between geofractal and geovocab2.
https://github.com/AbstractEyes/geofractal/blob/main/src/geofractal/model/david_beans/model.py
Cantor route staircase and wormhole excavation findings posted. A full article will be posted to represent the findings of cantor routing and the potentials for self-learning fractals through loss.
https://github.com/AbstractEyes/lattice_vocabulary/blob/master/src/geovocab2/proofs/cantor_steps_experiments.md
The steps experiments show profoundly important implications for cross-contamination problems with fractal and linear spaces, with some currently assessed as useful utilities as of today.
Today the classification experiment will continue by using mini-experts applied to patches within a miniature david-beans. The mini-experts were an accident that showed improvement to the fidelity and not destruction, so those experiments are to be continued. geovit-david-beans trainer was added to the first repo.

1 reply

·

prithivMLmods

posted an update 14 days ago

Post

10626

Introducing the Super-OCRs Demo, a comparison of state-of-the-art multimodal OCR VLMs, including HunyuanOCR, DeepSeekOCR, Dots, and Nanonets in one space for performing OCR, rendering LaTeX and Markdown, and visual grounding (layout). Find the related Spaces and models below.🤗🔥

✨Super-OCRs[Demo]: prithivMLmods/Super-OCRs-Demo
✨Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
✨GitHub: https://github.com/PRITHIVSAKTHIUR/Super-OCRs-Demo

⭐ Models Used:
✦ HunyuanOCR: tencent/HunyuanOCR
✦ DeepSeek-OCR: (-) deepseek-ai/DeepSeek-OCR (+) prithivMLmods/DeepSeek-OCR-Latest-BF16.I64
✦ Dots.OCR: (-) rednote-hilab/dots.ocr (+) prithivMLmods/Dots.OCR-Latest-BF16
✦ Nanonets-OCR2-3B: nanonets/Nanonets-OCR2-3B

⭐ Some Other Relevant Apps:
✦ Qwen3-VL-HF-Demo: prithivMLmods/Qwen3-VL-HF-Demo
✦ Qwen3-VL-Outpost: prithivMLmods/Qwen3-VL-Outpost
✦ Multimodal-OCR: prithivMLmods/Multimodal-OCR
✦ Multimodal-OCR2: prithivMLmods/Multimodal-OCR2
✦ Multimodal-OCR3: prithivMLmods/Multimodal-OCR3
✦ DeepSeek-OCR-experimental: prithivMLmods/DeepSeek-OCR-experimental

To know more about it, visit the app page or the respective model page!

wangsssssss

authored a paper 16 days ago

DeCo: Frequency-Decoupled Pixel Diffusion for End-to-End Image Generation

Paper • 2511.19365 • Published 17 days ago • 63

AbstractPhil

posted an update 17 days ago

Post

291

For those using my geovocab2 repo for SimplexFactory, CantorRouteFactory, fusion modulations, model code import, training weights and models, or specific extraction systems; I will be refactoring in the coming days.
The new repo for all geometric, cantor, and fractal-based trainings will be;
https://github.com/AbstractEyes/geofractal
The change is due to MY own excessive abuse of the vocabulary repo and the excessive overuse of subfolders attached to a working pycharm project. These behaviors should be decoupled and I apologize for making such code bloat through experimentation.

Directly installing the geofractal repo will install geovocab2 as a sidecar. However, there will be a clause within the geovocab2 to warn the user.

You have my deepest and most sincere apologies for breaking your active working code if I do. I know this is difficult work so please bare with my efforts as I progress the codebase to it's next state of truth vs experimentation.

Please, reach out to me directly if you have problems converting.

It is meant to be a DIRECT and UTILIZABLE pain-free conversion that will enable the same interface from both geovocab2 and all future updated model code changes applied to geofractal - once the geofractal module is imported.
The original goevocab2 will contain outdated train code instead of full deprecation with a direct warning - and the geovocab2 repo will be folding in geovocab and geovocab2 into matching aliased systems - allowing the factory and extraction structure to behave within geovocab2 and training to behave within geofractal by design.

I will be introducing a direct alias system that will hopefully allow a smooth transition system to the new codebase, but there's never a way to account for those you don't know are using your work. This will include pyi files for the aliases and some necessary elemental additions that may break current functionality in systems I'm unaware of. Please reach out if I break something crucial that you require.

prithivMLmods

posted an update 18 days ago

Post

3202

Introducing the advanced sketch-board editor "Nano-Banana-Pro-Sketch-Board" powered by the Gemini 2.5 Flash Image and Gemini 3 Pro Preview Image models through the Gemini API. This version includes more features than the Nano-Banana-AIO app for drawing and prompt-based concept transformation of freestyle sketches. 🔥🍌

✨Nano-Banana-Pro-Sketch-Board: prithivMLmods/Nano-Banana-Pro-Sketch-Board
✨Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
✨Github: https://github.com/PRITHIVSAKTHIUR/Nano-Banana-Pro-Sketch-Board
✨Model-Garden: https://tinyurl.com/4xxs9dvy

Some Other Relevant Apps [OSS]

⭐Qwen-Image-Edit-2509-LoRAs-Fast-Fusion: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast-Fusion
⭐Qwen-Image-Edit-2509-LoRAs-Fast: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast
⭐Photo-Mate-i2i: prithivMLmods/Photo-Mate-i2i
⭐Kontext-Photo-Mate-v2: https://huggingface.co/spaces/prithivMLmods/Kontext-Photo-Mate-v2

Note: The Nano-Banana-Pro-Sketch-Board demo requires a Gemini API key for the editing process. Your API key will be removed when the app is reloaded or closed. Your key remains safe and will not be exposed to any medium. Also, the Gemini 3 Pro Preview Image model may require a paid API key from a Google Cloud project with billing enabled.

To know more about it, visit the app info section or the respective Model Garden page!

prithivMLmods

posted an update 19 days ago

Post

1307

Try the demo of NVIDIA Nemotron Parse v1.1, NVIDIA's latest VLM for understanding document semantics and extracting text and table elements with spatial grounding. It is capable of comprehensive text understanding and document structure analysis in a given document, and can provide bounding boxes with coordinates.

⭐Space[Demo]: prithivMLmods/NVIDIA-Nemotron-Parse-OCR
⭐Model: nvidia/NVIDIA-Nemotron-Parse-v1.1
⭐Multimodal-Spaces: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

Some relevant Spaces

⭐DeepSeek-OCR-experimental [latest transformers]: prithivMLmods/DeepSeek-OCR-experimental
⭐Qwen3-VL-Outpost: prithivMLmods/Qwen3-VL-Outpost
⭐Multimodal-OCR3: prithivMLmods/Multimodal-OCR3

Check out the other spaces in the multimodal implementation collection.

To know more about it, visit the app page or the respective model page!

prithivMLmods

posted an update 22 days ago

Post

1488

Try the all-new trending Qwen-Image-Edit-2509 (Multi-Image-Edits) specialized adapter demos, including Cloth-Design-Fuse, Texture Edit, Guided-Objects-Patching, and more — all in a single Hugging Face Space. The demo link is provided below. 🤗🔥

⮞ Space[Demo]: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast-Fusion
⮞ Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
⮞ Base Model: Qwen/Qwen-Image-Edit-2509

Similar applications↗️

⮞ Kontext-Photo-Mate-v2: https://huggingface.co/spaces/prithivMLmods/Kontext-Photo-Mate-v2
⮞ Photo-Mate-i2i: prithivMLmods/Photo-Mate-i2i
⮞ Qwen-Image-Edit-2509-LoRAs-Fast: prithivMLmods/Qwen-Image-Edit-2509-LoRAs-Fast

To know more about it, visit the app page or the respective model page!

prithivMLmods

posted an update 23 days ago

Post

3509

Made a demo for multimodal understanding of Qwen3-VL space for tasks including point annotation, detection, captioning, guided text inferences, and more. Find the demo link below. 🤗↗️

⮞ Space[Demo]: prithivMLmods/Qwen3-VL-HF-Demo
⮞ Model Used: Qwen/Qwen3-VL-4B-Instruct
⮞ Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
⮞ GitHub: https://github.com/PRITHIVSAKTHIUR/Qwen-3VL-Multimodal-Understanding

To know more about it, visit the app page or the respective model page!