Hacker News on Gopher (unofficial)
COMMENT PAGE FOR:
DIR I indexed 669 GB of my GoPro videos using my M1 Max computer and local ML models
zzsshh wrote 3 hours 16 min ago:
Related article on indexing videos but with a local text description
and using Gemma4:
HTML [1]: https://blog.simbastack.com/indexed-a-year-of-video-locally/
LeonardoTolstoy wrote 3 hours 54 min ago:
What models did you use for the stages? I see Qwen2.5-VL-7B-Instruct
mentioned as an advanced option, so I assume maybe
Qwen2.5-VL-3B-Instruct by default (which is what I also use for a lot
of stuff, it is incredibly good at "clean" OCR, but as you maybe
indicate not the best at "describing a scene").
EDITED: I didn't realize Whisper was a local model. I never tried
transcription before, so I had always figured it was a pay model by
OpenAI. I'll have to check it out (although the runtime listed here is
a bit daunting).
For that project I'll say I don't see much degradation in embedding
quality at resolutions much, much lower than 720p (all the way down to
240p), which speeds things up considerably. Although I don't really do
face or object detection, just scene embeddings. To me any process
whereby it would take longer to process the video than watch it is
probably a no go in general. Obviously a challenge for local-first
analysis.
____tom____ wrote 4 hours 15 min ago:
I wonder how long it would take on faster hardware. I have ten times
that much footage, but 67 * 10 hours is a lot of processing.
I might be better off getting something with a beefy GPU on AWS or
Google cloud.
insumanth wrote 6 hours 58 min ago:
I will be doing these things with local LLMs
Take a fast, small and powerful LLM running locally to index my
personal data like images, videos, documents and enrich them and tag
with the enriched metadata.
Want to group by people - Search tagged metadata and group it
Want to search an image by description - tagged metadata
Want to organize by anything - tagged metadata
This should (hopefully) put an end to my file clutter
nitin_flanker wrote 5 hours 7 min ago:
I am in no way a tech-savvy person; I don't know coding, and I don't
know networking or AI much either. But I definitely want to have a
system like this: an AI-powered gallery / video repository that can
help me find moments, people, colors, and objects from 100s of 1000s
of files. Local LLMs sound so cool, but I know they won't be easy to
set up or use for a common Joe like me.
Mashimo wrote 4 hours 42 min ago:
Immich can do part of this. For photos it does lm object detection
and OCR for text. I think for video it's currently only the first
frame. It also has face / people detection.
And once set up it's easy to use even for non technical people.
havercosine wrote 8 hours 24 min ago:
Well done! I couldn't understand how you are building reels out of it
via the agent. Is it some sort of AI tool calling that takes image
links and builds a reel via some video editing tool? Or does it take a
+/- time delta around the timestamps returned from the index for a
given query and join them together?
iliashad wrote 8 hours 9 min ago:
Thank you! I'm using RAG: I have every video scene indexed
individually in the vector database. When I ask the agent, it uses
an Ollama model to understand the request and calls the available
search tool (searching by transcription text, faces, visual, audio,
or combined), much like Claude or ChatGPT using the web search tool
to find you info online. Then I can filter the video scenes using
the Ollama model to present accurate and unique scenes, and send
those results to DaVinci Resolve through its API to create a video
timeline from those clips.
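One way to picture the "+/- time delta, then join" idea from the question, as a minimal sketch and not necessarily how the project does it: pad each matched scene timestamp by a few seconds and merge overlapping ranges per video before handing clips to an editor. The function name `hits_to_clips`, the `(video, timestamp)` hit format, and the 2-second padding are all hypothetical.

```python
from collections import defaultdict


def hits_to_clips(hits, pad=2.0):
    """Turn (video_path, timestamp_seconds) search hits into merged clip ranges.

    Hypothetical helper: each hit is padded by `pad` seconds on both sides,
    and ranges that overlap or touch within the same video are joined.
    """
    by_video = defaultdict(list)
    for video, ts in hits:
        by_video[video].append((max(0.0, ts - pad), ts + pad))

    clips = []
    for video in sorted(by_video):
        merged = []
        for start, end in sorted(by_video[video]):
            if merged and start <= merged[-1][1]:
                # Overlaps the previous range: extend it instead of adding a clip.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        clips.extend((video, s, e) for s, e in merged)
    return clips


# Made-up hits: two close moments in one video collapse into a single clip.
hits = [("a.mp4", 10.0), ("a.mp4", 12.0), ("a.mp4", 30.0), ("b.mp4", 1.0)]
clips = hits_to_clips(hits)
```

Each resulting `(video, start, end)` tuple could then be passed to whatever editing integration is in use.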
lee_wc wrote 9 hours 16 min ago:
[1] When trying to read this article, the main website was throwing
errors to CloudFlare unfortunately
HTML [1]: https://archive.is/O6CLQ
iliashad wrote 7 hours 44 min ago:
Can you check again? I'm not sure why it's showing a Cloudflare error
GreenSalem wrote 11 hours 40 min ago:
A lawyer I know who specialises in rape,
and is excellent at getting the obviously guilty exonerated,
lost a case last year because of GoPro videos.
Her client was recording while committing the abhorrent crime.
The criminal would otherwise have got off.
From my perspective, the GoPro camera produced a good outcome.
Still, one has to wonder why anyone would record their criminal actions.
Yiin wrote 11 hours 20 min ago:
The word "her" in this context gave me heavy feelings. What makes one
pick such a career move...
fennecfoxy wrote 9 min ago:
Why? You're being sexist and I hope you can understand why.
GreenSalem wrote 6 hours 49 min ago:
Beggars can't be choosers.
She would rather have done corporate law but did not have the
academic credentials or the networks needed for a job at the likes
of Latham Watkins or White and Case.
Still it is good for society that criminals get the worst lawyers
to defend them.
djmips wrote 8 hours 59 min ago:
$
synergy20 wrote 12 hours 16 min ago:
Can a VLM be used instead, or is it too heavy and slow?
Mawr wrote 12 hours 42 min ago:
> Many of the videos I captured amazing moments, and sometimes it's
kind of hard to watch the full videos to get those moments.
Yep. I had the same problem.
> Then, run the frame analysis pipeline [...] I have a face recognition
plugin using my custom faces data, object detection, on-screen text,
shot type, and scene description [...] we will have three vector DB
collections that have all the information about our videos, like video
location metadata, camera name, faces recognized, objects detected,
on-screen text, transcription, description of each scene, and many more
[...] we can get better indexed data if you use the advanced mode
indexing to use the Qwen2.5-VL-7B-Instruct model to understand and
describe your video much better, but at a slower indexing speed
Yeah, uhm... ok :)
If anyone else has a similar problem, the real solution is as follows:
1. When recording, if you witness an interesting moment worth saving
later, press the power button - this will mark the current moment in
the video as a chapter.
2. Find the chapters later when editing and cut them into clips.
3. You're done :)
This has two main benefits over the insanity above:
1. It's trivially simple instead of insanely complex and inefficient.
2. It will reliably catch all the stuff you find interesting, since
you're the one doing the marking.
The downsides:
1. Doesn't work retroactively.
2. It may miss interesting stuff if you miss it at the time as well.
3. Only works for this use case.
4. Nerds won't salivate over your usage of cutting edge tech.
Noumenon72 wrote 11 hours 18 min ago:
What tool has this "press power to mark chapter" feature?
tredre3 wrote 9 hours 32 min ago:
The GoPro, it's called HiLight Tag.
asdfasgasdgasdg wrote 13 hours 35 min ago:
Cool build but the example videos you provide at the end are . . . not
what I would hope for when thinking about the highlights of 2000+
videos of biking? For example the dog barking video only has one scene
repeated two or three times and it's five seconds long?
iliashad wrote 13 hours 31 min ago:
Fair enough, what would you like to see as an example video? I'll
make it.
For the dog barking videos, those are only the video scenes that
have a dog barking sound in them.
I'll keep adding more prompts and example videos, keep an eye out for
that.
asdfasgasdgasdg wrote 13 hours 10 min ago:
I don't have any preconceptions about specific content I want to
see. I'd just think that so many hours of such cool adventures
would have greater variety. It made me wonder if your AI really did
such a good job of indexing it. It made me think maybe the tech
isn't quite ready yet?
Did you ever visit crazyguyonabike.com? A long time ago I had the
pleasure of following the journey of a friend of a friend of a
friend on that site: [1] Stuff like that I guess?
HTML [1]: https://www.crazyguyonabike.com/doc/?doc_id=2405
PreownedPlaid wrote 13 hours 42 min ago:
this is really cool. was looking to do something similar on mbp 64gb
iliashad wrote 12 hours 40 min ago:
That's really great, thank you!
esjeon wrote 14 hours 52 min ago:
> Then, run the frame analysis pipeline, which will divide the video
into separate video scenes (1s each, or 1fps)
> (...)
> Frames analyzed 57,537
Aha, it makes total sense. This number sounds much more reasonable than
"669 GB", since the actual total size of processed frames would be
like 10-30 GB.
(Not downplaying anything. Doing-at-home always requires some math on
practicality)
> Total compute time 67h 40m 42s
I'm just curious tho - are there any paid options that can
accelerate this kind of process? Just spin up GPU instances?
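As a rough back-of-the-envelope check using only the two figures quoted above (57,537 frames and 67h 40m 42s), here is the average cost per frame worked out. It assumes the whole compute time is spread evenly over the analyzed frames, which is a simplification, since transcription and other stages are presumably part of that total.

```python
def seconds_per_frame(hours, minutes, seconds, frames):
    """Average wall-clock seconds spent per analyzed frame."""
    total_seconds = hours * 3600 + minutes * 60 + seconds
    return total_seconds / frames


# Figures quoted from the article: 67h 40m 42s for 57,537 frames.
avg = seconds_per_frame(67, 40, 42, 57_537)
print(f"{avg:.2f} s per frame")  # about 4.23 s per frame
```

If frames really were sampled at 1 fps, 57,537 frames would correspond to roughly 16 hours of footage, so the run would be about four times slower than real time.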
egorfine wrote 4 hours 30 min ago:
Yep. Go to vast.ai, spin up a cheap GPU instance, add a bit of code
to the project and let it run. It'll finish in just a few hours for
like ten bucks.
But it's not as fun as running a local model right here on your
computer on your own desk. It feels like magic.
iliashad wrote 14 hours 29 min ago:
> Aha, it makes total sense. This number sounds much more reasonable
than "669 GB", since the actual total size of processed frames
would be like 10-30 GB.
"669 GB" is the total raw footage size. When I'm doing the video
processing, I downscale each frame to 720p to make the processing
much faster, and I don't need full original quality in order to get
accurate results (as far as I know and have experimented with).
> I'm just curious tho - are there any paid options that can
accelerate this kind of process? Just spin up GPU instances?
For now, I found that an NVIDIA GPU, for example an RTX 3060 with 12GB
VRAM, was much faster than my M1 Max (still working on optimizing for
speed and accuracy).
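For readers wondering what the sampling-plus-downscaling step can look like, here is a minimal sketch that builds an ffmpeg command extracting one frame per second scaled to 720p height. This is an assumption about one reasonable way to do it, not the project's actual code; the function name, file names, and output pattern are hypothetical, while `fps` and `scale` are standard ffmpeg video filters.

```python
def build_frame_extract_cmd(video_path, out_dir, fps=1, height=720):
    """Build an ffmpeg argument list that samples frames and downscales them.

    scale=-2:<height> keeps the aspect ratio and rounds the width to an
    even number. Hypothetical helper; nothing is executed here.
    """
    return [
        "ffmpeg",
        "-i", video_path,
        "-vf", f"fps={fps},scale=-2:{height}",
        f"{out_dir}/frame_%06d.jpg",
    ]


cmd = build_frame_extract_cmd("GX010001.MP4", "frames")
print(" ".join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` once the output directory exists; dropping `height` to 240 gives the lower-resolution variant mentioned elsewhere in the thread.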
ngai_aku wrote 10 hours 25 min ago:
What PAYG providers do people here recommend? Most powerful machine
at home is an M1 MBA (16GB), so I too am interested in short term
options where I can still benefit from the privacy of local models
villgax wrote 6 hours 58 min ago:
Runpod
fennecfoxy wrote 17 min ago:
Seconding runpod.
They were having availability issues with GPUs (of course) but
especially their UI where you'd customise a template only to
try to start a pod, the GPU be unavailable and the UI reset
forcing you to make the changes all over again.
But they have fixed that since, now starting a pod is more from
a live page where as GPU availability status changes it updates
in realtime/if your deploy fails you just try again - your
customised env vars etc are still there.
Plus they also addressed the GPU availability problem as
something they're working to fix and it's understandable seeing
as nobody can get their hands on GPUs atm.
justinram11 wrote 15 hours 40 min ago:
Something I've enjoyed more than I expected is Google and Apple photos
sending me photo memories and compilations of various things in my life
and my kids' lives over the last decade.
I'm really bullish on taking more video of my kids, with the thought
that it will become easier and easier for AI to put them into little
compilations I can enjoy later.
mwelpa wrote 4 hours 46 min ago:
I wish I could connect Apple photos to my Spotify account and have
photo memories connected with songs I listened to at the time :)
alias_neo wrote 4 hours 0 min ago:
Music memories are the best.
I booted up my old PS3 from my uni days (20 years ago?) and found
all of the music I had on it because I used it for everything at
the time. Some seriously nostalgic music I'd completely forgotten
about.
goodmythical wrote 12 hours 43 min ago:
You don't mind Google using your kids to train their models and
advertising algorithms?
Years from now they'll be getting "hey look at BIKE BRANDS' NEWEST
CHEAP BIKE REMEMBER WHEN YOU USED TO RIDE BIKE BRAND BIKES"
satvikpendem wrote 11 hours 57 min ago:
I think most people really don't care, and/or will just adblock
those sorts of things when they do arrive.
whattheheckheck wrote 7 min ago:
What about in 10 years when they auto search and label users for
political dissent and likelihood of impact
marci wrote 6 hours 6 min ago:
Don't worry. Most people spend most of their compute time on a
phone, where your ability to filter ads is way more
enshittified.
JMiao wrote 14 hours 5 min ago:
do you use android and ios, or is there another benefit to having
personal media with both?
dave8088 wrote 9 hours 33 min ago:
I run both on my phone as a lazy (but flawed) backup strategy.
iliashad wrote 14 hours 2 min ago:
Can you please elaborate more?
ngai_aku wrote 10 hours 21 min ago:
I think most people are either in on Google or in on Apple
whereas the OP indicated they have their media stored with both
iliashad wrote 14 hours 49 min ago:
That's good to hear, open source ML models are getting better and
better. I did a small experiment to generate a Spotify year-in-review
style video; here is a preview:
HTML [1]: https://github.com/IliasHad/edit-mind/tree/expirement/year-i...
cake-rusk wrote 15 hours 55 min ago:
I have an RTX 5090 card but it only has 32 GB RAM, can something like
this work on my machine?
iliashad wrote 14 hours 47 min ago:
Yes, and it'll give much faster results than the ones I got with
my computer.
wferrell wrote 16 hours 58 min ago:
HTML [1]: https://iliashaddad.com/blog/i-indexed-669-gb-of-my-gopro-vide...
tontonius wrote 17 hours 2 min ago:
If anyone is interested in searching large video collections locally
and offline, I suggest taking a look at Jumper [1]. It comes with some
nifty features like NLE integrations, people search, MCP, API, etc.
Disclaimer: I'm one of the co-founders.
HTML [1]: https://docs.getjumper.io
____tom____ wrote 5 hours 33 min ago:
Your docs say you integrate with DaVinci Resolve.
Other comments mention DaVinci Resolve has this built in. How would
you compare the two?
dotancohen wrote 16 hours 1 min ago:
The link just timed out for me. I'm in Israel, connecting via
residential WiFi. All other sites that I regularly use connect just
fine.
tontonius wrote 8 hours 0 min ago:
hmm weird works for me.. what about [1] ?
HTML [1]: https://getjumper.io/
dotancohen wrote 6 hours 33 min ago:
They're both working now.
nyxtom wrote 17 hours 13 min ago:
Now this ^^ is an awesome use case!
iliashad wrote 14 hours 18 min ago:
Thank you! I would like to know your use case for this kind of
project and which prompt you want to generate.
WhitneyLand wrote 17 hours 56 min ago:
I'd like to see embedding of actual video clips become practical in
this type of workflow.
Frame-level embedding is covering a lot, but can miss out on a lot of
action-related searches.
iliashad wrote 14 hours 19 min ago:
Sure, I'm using [1], which can help me understand actions like
falling down, because I can provide, for example, 5 frames (downscaled
to 720p) for it to understand what is happening in this part of the
video.
HTML [1]: https://huggingface.co/collections/Qwen/qwen25-vl
asenna wrote 18 hours 14 min ago:
Funny this is almost EXACTLY what I did a few days ago on the same
machine using very similar techniques and was on the front-page of HN
as well: [1] [2] I wasn't familiar with your project though,
interesting stuff.
I'm trying to add more photography related features to Framedex but
yeah there's so much we can do locally, exciting times.
HTML [1]: https://news.ycombinator.com/item?id=48222733
HTML [2]: https://blog.simbastack.com/indexed-a-year-of-video-locally/
iliashad wrote 14 hours 33 min ago:
That's great, I checked your article when it was on the front page
because someone mentioned my project in the comments.
Good job on the article and the project. And yes, local models are
getting better and better.
robrain wrote 18 hours 28 min ago:
DaVinci 21 has indexing built-in (AI IntelliSearch). Not to diminish
the work you did, but this is now available to many users (probably
only Studio users since it has AI in the name)
iliashad wrote 18 hours 25 min ago:
Yes, I didn't look at it. But does it upload your videos to the
cloud or process them locally? And does it allow you to provide custom
face data to help label faces in your videos?
I think Adobe Premiere Pro has it as well, but cloud processed.
teovall wrote 18 hours 9 min ago:
The AI features in DaVinci Resolve are all processed locally. It
does not currently have face tagging.
robrain wrote 17 hours 53 min ago:
Haven't tried it yet, and I don't know if it matches OP's
requirements, but the blurb says "You can even search for
individual faces"
HTML [1]: https://www.blackmagicdesign.com/products/davinciresolve...
Schiendelman wrote 6 hours 44 min ago:
This is what took me from free to paid user, and it was well
worth it.
iliashad wrote 18 hours 8 min ago:
That's great to know, thank you!
m3kw9 wrote 18 hours 45 min ago:
Grab frames, lower res, classify, combine meta data. Write to sql
iliashad wrote 18 hours 13 min ago:
Not really. Grab frames, lower res, classify, combine metadata,
transcribe the audio, convert that data (text, visual and audio) to
embeddings, and save them in a vector DB and a SQL DB. That lets me
do semantic search, RAG, search using a screenshot of the video to
find the exact moment in the video, plus search using an audio
file as well. And other features unlocked with a vector DB.
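To illustrate the core of what a vector DB adds on top of SQL here, a toy sketch of nearest-neighbor search by cosine similarity over per-scene embeddings. The tiny 3-dimensional vectors and the scene ids are made up for illustration; a real setup would use embeddings produced by a model and a vector database instead of a Python dict.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index, query_vec, top_k=2):
    """Return the top_k (scene_id, score) pairs most similar to query_vec."""
    scored = [(scene_id, cosine(vec, query_vec)) for scene_id, vec in index.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:top_k]


# Made-up embeddings keyed by hypothetical "video@second" scene ids.
index = {
    "ride1.mp4@12s": [0.9, 0.1, 0.0],
    "ride1.mp4@47s": [0.0, 1.0, 0.0],
    "ride2.mp4@05s": [0.7, 0.3, 0.1],
}
results = search(index, [1.0, 0.0, 0.0])
```

The same lookup works whether the query vector comes from text, a screenshot, or an audio clip, as long as query and index share one embedding space.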
ingvay7 wrote 16 hours 38 min ago:
Really cool work and workflow. I strongly prefer this kind of local,
open pipeline that I control over a dependency on Adobe tools and
lock-ins.
iliashad wrote 14 hours 17 min ago:
I agree with that, thank you for your feedback. Also, maybe
you're not a video editor and you just wanna search your videos.
The video editing integrations are optional and you have full
control. You can switch between Adobe Premiere Pro, Final Cut Pro
or DaVinci Resolve.
ingvay7 wrote 13 hours 33 min ago:
Cannot wait to incorporate this into my workflow. Thanks!
iliashad wrote 59 min ago:
That's great, would love to hear your feedback then
fl0id wrote 19 hours 14 min ago:
it is possible to use apple gpu with containers. either with podman +
runkit + recent mesa or with recent vllm-metal from docker
HTML [1]: https://www.docker.com/blog/docker-model-runner-vllm-metal-mac...
iliashad wrote 18 hours 19 min ago:
I was looking for a way to run Docker containers over MPS and
utilize the GPU power. I think this project will be the solution
for it. I'll try it very soon and add support for it. Thank you,
much appreciated.
WarOnPrivacy wrote 19 hours 18 min ago:
I was surprised to learn that the M1 Max CPU is an ARM/SoC,
comparable to an 11th gen Intel i9.
Do I have it right? Would Windows ARM performance be similar for those
CPUs?
ref:
HTML [1]: https://www.cpubenchmark.net/compare/4585vs4245/Apple-M1-Max-1...
voidmain0001 wrote 12 hours 15 min ago:
No comparison. M1 Max has 400GB/s RAM bandwidth while Snapdragon X2
Elite, the latest and greatest , has 228GB/s RAM bandwidth.
Rohansi wrote 11 hours 5 min ago:
I don't disagree with your conclusion but the comparison of max
bandwidth between the two SoCs is not enough. Neither of them will
use all of that bandwidth doing AI work because the GPU will be
compute limited. That's why dedicated GPUs perform so significantly
better without having significantly higher bandwidth.
iliashad wrote 18 hours 27 min ago:
To your question, I can't confirm or deny that because I haven't
tried this project on a Windows machine yet, or on a machine with
this config.
owldown wrote 18 hours 30 min ago:
"Comparable" is maybe true if we are talking about single core
performance, but for memory bandwidth, the M1 Max is about 8 times
faster. Wider bus, lower latency, not even close.
pachouli-please wrote 18 hours 43 min ago:
It's also a bit apples (heh) to oranges for a handful of reasons, but
most impactful
- "unified" ram makes all the system ram available as VRAM
- dedicated ai coaccelerator thingy
Both of these reasons allow the apple silicon chips to crush
conventional cpus in these kind of AI model workload stuffs
No idea about what the windows arm stuff is capable of. I know they
use Qualcomm snapdragon chips though.
iliashad wrote 19 hours 30 min ago:
I would love your feedback and suggestions for new improvements or
features you wanna have, whether in the source-available version, the
desktop app or the blog post itself.
Beijinger wrote 20 hours 54 min ago:
Does it work for porn collections too?
iliashad wrote 18 hours 30 min ago:
Why is it always the same question? Hahah. I posted my project on
Reddit and I got the same one hahah
fennecfoxy wrote 16 min ago:
Ha ha ha, it's because most humans overlap on a few things - like
eating, shitting, sleeping and fucking, ha ha ha.
lifestyleguru wrote 20 hours 15 min ago:
Last time I tried whisper, it hallucinated an elaborate conversation
from sounds of slapping and moaning and it took minutes to spit every
single line of it.
dotancohen wrote 16 hours 16 min ago:
If I remember correctly, the whisper documentation actually
recommends trimming non-speech portions, as the models hallucinate
heavily during those portions.
3eb7988a1663 wrote 18 hours 59 min ago:
Parakeet has been trained to detect non-voice sounds and exclude
that from identification, so you might have better luck with that
family.
supertroop wrote 20 hours 32 min ago:
Not sure if you're being sarcastic but I think this is an
interesting question. Would deep seek be useful here since it is
local?
fibers wrote 16 hours 17 min ago:
just because it is local does not mean it wouldn't reject explicit
content. you can definitely try and find abliterated models and can
attempt to use unsloth or something similar to tune it properly.
kaycey2022 wrote 10 hours 13 min ago:
Is abliteration even necessary? While "playing around" I have
noticed that most models are very strict only in the first
prompt. The moment you get past that with a good turn, from the next
turn on you can get them to do _anything_.
okr wrote 16 hours 23 min ago:
Depends how deep you wanna go.
sarjann wrote 20 hours 33 min ago:
Asking the important questions
nntwozz wrote 11 hours 51 min ago:
I was meandering through the comments about to leave the topic when
my interest suddenly piqued upon reading the word porn.
fhdkweig wrote 19 hours 22 min ago:
The internet is for porn.
HTML [1]: https://www.youtube.com/watch?v=LTJvdGcb7Fs
pduggishetti wrote 20 hours 47 min ago:
You'll need a LoRA for this, porn content rejection is heavy. Or
you'll need an abliterated model, not sure if vision also works.
You might want to add something like a YOLO fine-tune to detect
scenes + face recognition too.
dotancohen wrote 16 hours 18 min ago:
For GP's purpose, can face recognition techniques be repurposed
for, um, other body parts recognition? Sometimes the actresses are
facing away from camera. There are exposed lips, if that helps.
fennecfoxy wrote 14 min ago:
Yes, for actresses _and_ actors I'm sure you'd get the same level
of performance as you would for any facial recognition use case.
You can't do facial recognition on someone's back, but I'm sure
there are other techniques/models that can be applied, many
people have unique marks/features etc.
vorticalbox wrote 18 hours 38 min ago:
Vision still works perfectly fine in abliterated models.
avadodin wrote 8 hours 34 min ago:
Just because they don't refuse it doesn't mean they are useful.
I found a few pornographic pictures on the web to hand to
Abliterated Gemma4 12B(literally just to test this) and it needs
pushing just to accept that people can be naked.
It didn't refuse but it also didn't provide useful descriptions
such as "this is a pornographic picture of a woman".
> G4: There is a person lying down in a scientific context, if I
had to guess they are a biologist in a classroom
> me: Is she wearing any clothes?
> G4: No.
Also, it is obsessed with penises, seeing them in compositions
where there is only a female. I suppose it's been trained to ban
dick pics or something.
Prompting may help some but 12B seems to be a bit worse than E4B
with the vision/audio model at voice and text reading so maybe
that one would do better.
pduggishetti wrote 17 hours 46 min ago:
Never tried any of this for porn, just speaking out how I would
go about it tbh!
rho138 wrote 21 hours 4 min ago:
This would fit best as a "Show HN:" post :)
iliashad wrote 19 hours 44 min ago:
I tried to edit it and add Show HN, but it doesn't show the edited
version. Thank you!
culi wrote 20 hours 24 min ago:
The title should link to the "full article". I wonder if OP's domain
name is banned or something and they're doing this to get around it
lgats wrote 21 hours 4 min ago:
the link
HTML [1]: https://iliashaddad.com/blog/i-indexed-669-gb-of-my-gopro-vide...
iliashad wrote 19 hours 45 min ago:
Thank you