How Do You Structure Zettels from Long-form YouTube Tutorials?

vesper · August 2025

I often watch 2–4 hour YouTube tutorials or lectures on topics like Python, data structures, or media theory. These videos contain both conceptual explanations and live demos, but I struggle to extract structured notes without pausing constantly.
Has anyone developed a Zettelkasten-friendly workflow for capturing insights from long-form videos?
Do you take fleeting notes first, then convert them into Zettels? Or pause regularly to write permanent notes on the spot?

andang76 · August 2025

I essentially process the video in two phases.
Either I do phase 1 and then phase 2 for the whole video, or I alternate the two phases across multiple sessions when I feel inspired to.

Phase 1
One single "literature note" for every video, in which I write my reflections, questions, ideas and points captured from the video, as raw and frictionless as I can.
I pause the video whenever I need time to write.
I use a hierarchical bullet-list style, which pushes me to be raw and concise.
Sometimes I capture a screenshot when a frame is especially meaningful.
For some bullets it can be very useful to note the video timestamp.
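If you note timestamps in your bullets, a small helper can turn them into clickable links that jump straight to that moment. This is a minimal sketch, assuming your notes app renders Markdown links and using YouTube's `t` URL parameter (seconds); `VIDEO_ID` and the function names are hypothetical placeholders, not part of anyone's workflow described here.

```python
from urllib.parse import urlencode


def parse_timestamp(ts: str) -> int:
    """Convert 'MM:SS' or 'H:MM:SS' into a number of seconds."""
    parts = [int(p) for p in ts.strip().split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"unsupported timestamp: {ts!r}")
    seconds = 0
    for part in parts:
        seconds = seconds * 60 + part  # shift previous units up by 60
    return seconds


def timestamp_link(video_id: str, ts: str) -> str:
    """Build a YouTube watch URL that starts playback at the timestamp."""
    query = urlencode({"v": video_id, "t": f"{parse_timestamp(ts)}s"})
    return f"https://www.youtube.com/watch?{query}"


def bullet(video_id: str, ts: str, text: str) -> str:
    """Format a raw literature-note bullet with a clickable timestamp."""
    return f"- [{ts}]({timestamp_link(video_id, ts)}) {text}"


if __name__ == "__main__":
    # Hypothetical video ID; replace with the ID from the real video URL.
    print(bullet("VIDEO_ID", "1:02:03", "Recursion needs a base case"))
```

The bullet stays raw and short, as in phase 1; the link only saves you from scrubbing through the video later when you revisit that point.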

Phase 2
I revisit the bullet list and start deciding what to do with each bullet.
I can develop a bullet into a longer piece of text, move and rearrange bullets into small clusters, turn pieces of text into concepts, or add new bullets as further developments. All of this happens in the bullet list. I also try to write a "title", in bold, for relevant texts or clusters.
In the next section of the literature note, I compose an outline of the titles I've written in the bullet list. These titles almost always become links to new notes, and I start writing the body of those notes by copying or further developing the material from the outline, giving enough context. Other times the texts become updates to existing notes.
At the end of the process, this outline represents the network of thoughts extracted from the video. I can then distribute these links into structure notes or other notes, and sometimes the outline becomes an initial structure note for a field.
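The step of collecting the bold titles into an outline can be partly mechanized. This is a minimal sketch, assuming the bullet list is Markdown with titles written as `**Title**` and a notes app that uses `[[wiki-link]]` syntax; `outline_from_bullets` is a hypothetical helper. It keeps each bullet's indentation so the outline mirrors the clusters you built.

```python
import re

# Matches Markdown bold spans like **Call stack** (non-greedy).
BOLD = re.compile(r"\*\*(.+?)\*\*")


def outline_from_bullets(raw_notes: str) -> str:
    """Turn every bold title in a bullet list into an indented [[link]]."""
    outline = []
    for line in raw_notes.splitlines():
        # Preserve the original indentation to keep the cluster hierarchy.
        indent = line[: len(line) - len(line.lstrip())]
        for title in BOLD.findall(line):
            outline.append(f"{indent}- [[{title.strip()}]]")
    return "\n".join(outline)


if __name__ == "__main__":
    notes = (
        "- **Recursion** base case first\n"
        "  - plain bullet, not yet developed\n"
        "  - **Call stack** grows per call\n"
        "- loose thought without a title"
    )
    print(outline_from_bullets(notes))
```

The script only gathers titles; deciding which ones become new notes and which are updates to existing notes remains a manual step.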

Writing main notes directly from the video is too inefficient for me.
It's better to "grind" the video into raw pieces of text while watching, and then write the main notes later from those raw pieces, allowing the time that thinking needs.

zettelsan · August 2025

This is most useful for word-driven YouTube content and may not suit visually oriented tutorials, but when you import the material into your notes, you could have a tool transcribe the audio so that you don't have to do it manually.

For example, I use the Snipd app to have AI summarize and transcribe one of Sascha's coaching sessions available on YouTube. I use the result to seed a literature note, which looks like this:

This is prior to my own "processing", so nearly everything is exactly as exported from Snipd. Starting from this, I can edit it or add my own notes to make it mine, reading or capturing from the screen the things that are not spoken.

I don't mean to shill for Snipd, but it does allow uploading videos for processing like this. I believe there are other transcription tools and services, or you can read the subtitles directly from YouTube.
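If you go the subtitle route, the file can seed a literature note without any AI service. This is a minimal sketch, assuming you already have the subtitles saved as a WebVTT (`.vtt`) file; `vtt_to_bullets` is a hypothetical helper. It turns each cue into a timestamped bullet, strips inline tags, and drops exact consecutive repeats. Auto-generated captions often contain overlapping, partially repeated cues, which this simple version does not merge.

```python
import re

TAG = re.compile(r"<[^>]+>")  # inline cue tags such as <c> or <00:00:01.000>


def vtt_to_bullets(vtt: str) -> str:
    """Convert WebVTT cues into '- [HH:MM:SS] text' bullets."""
    bullets = []
    last_text = None
    # Cues are separated by blank lines; the header block has no '-->'.
    for block in re.split(r"\n\s*\n", vtt.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            if "-->" not in line:
                continue  # skip the header, cue identifiers, NOTE lines
            start = line.split("-->")[0].strip().split(".")[0]  # drop millis
            cleaned = (TAG.sub("", l).strip() for l in lines[i + 1:])
            text = " ".join(t for t in cleaned if t)
            if text and text != last_text:  # skip exact repeats
                bullets.append(f"- [{start}] {text}")
                last_text = text
            break
    return "\n".join(bullets)


if __name__ == "__main__":
    sample = (
        "WEBVTT\n\n"
        "00:00:01.000 --> 00:00:04.000\nWelcome to the tutorial.\n\n"
        "00:01:02.500 --> 00:01:05.000\nToday we cover <c>recursion</c>.\n"
    )
    print(vtt_to_bullets(sample))
```

The output is still raw material: as with any transcript, it needs your own processing before anything becomes a permanent note.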

andang76 · August 2025

Yes, transcriptions are a good resource to facilitate the work, but I'm personally a bit reluctant to rely on them, at least exclusively.

A video almost always contains a wealth of information, and an effectiveness of message delivery, that a transcript often doesn't capture.

For example, someone who speaks with a certain tone can highlight the importance of a concept.
Timing, body language, and facial expressions can make a difference even when the textual information is the same.

A conversation between two people on the topic can convey much more than the text. I remember di

Another example is the content of a single frame, which can be far more illuminating than the accompanying explanation.

More generally, the non-verbal aspects of the video's communication.

A video experience can have its own specific benefits.
Different modes, different activations: listening engages different cognitive processes than reading, and watching and listening together can reinforce certain concepts more effectively through multisensory engagement.

So I still recommend watching the video at least once as part of the work,
balancing the time and effort needed against the importance of the work, of course. If I only need to capture one small core concept from a video, the quickest way may be the best way.

Transcripts have their own advantages: speed, searchability, easier annotation, and, above all, the ability to pick up details from words read rather than heard.

zettelsan · August 2025

@andang76 said:
Yes, transcriptions are a good resource to facilitate work, but I'm personally a bit reluctant to rely on them. Exclusively on them, at least.

A video almost always contains a wealth of information and effectiveness of message delivery, that a transcript often doesn't capture

Machine transcription and AI summaries can indeed feed into the collector's fallacy, which is a major downside users should be aware of.

That said, processing video content is far more time-consuming without transcription and timestamping. You’d have to listen and manually transcribe quotes verbatim, which is fine if you’re diving straight into processing or rephrasing. But most people can’t meticulously handle every piece of content like that, especially given the sheer volume and varying quality of material out there.

With written sources, quoting and referencing is straightforward thanks to textual data and page numbers. For videos, machine transcription and timestamping make creating the framework for structure or literature notes much more time-efficient.

ameliapond · August 2025

True, transcriptions sometimes cannot show the whole picture of a video. Maybe we just need a tool that not only transcribes but is also good at summarizing or rewriting for us. I love watching YouTube videos or online courses for self-learning, and I enjoy building my own knowledge base from these resources.

First, I start with a clear goal of what to watch and learn. Most people treat YouTube video courses like a stream: knowledge flows past the brain and then disappears. I usually use memo to catalog what to watch over the next few weeks or months, and make a plan if necessary, so I can quickly see what I have watched and what to learn next.

I recommend watching at least the first 3 minutes of a video. I think that's enough to grasp the main topic and style of the whole video and, most importantly, to make sure it's high-quality and useful to you. I also like checking the comments while watching.

Then, turning long videos into notes takes me a lot of time, so I prefer tools like y2doc or similar. For example, I use http://y2doc.com to transcribe the video and pull out key points, highlighted words, and timestamps. The structure it gives me saves me from rewatching the same part again and again.

The final step is to add my own explanations, annotations, or screenshots to my notes. The notes generated by y2doc are editable Markdown, so I often do a rough pass on them there and then move them to my note-taking app for a more detailed review.

GeoEng51 · August 2025

@ameliapond said:

Then, turning long videos into notes really costs me much time. I prefer using tools like y2doc or something alike. For example, use this: http://y2doc.com to transcribe and pull key points, highlighted words and timestamps for me. The whole structure it gives to me saves my time to watch the same part again and again.

Interesting service, y2doc. Thanks for sharing. I don't watch a lot of podcasts or YouTube videos, but I tried it on one video someone shared with me recently and I can see how one might use it. Do I understand correctly that the service just creates a transcript of what was said, logically organized into "chapters", with occasional screenshots from the video? They imply that there is also a summary of what was said, but I don't see how to access it (the documentation is pretty sparse).

ameliapond · August 2025

@GeoEng51 said:

Interesting service, y2doc. Thanks for sharing. I don't watch a lot of podcasts or YouTube videos, but I tried it on one video someone shared with me recently and I can see how one might use it. Do I understand things correctly, that the service just creates a transcript of what was said, logically organized into "chapters", with occasional "screen shots" from the video? They imply that there is also a summary of what was said, but I don't see how to access that (the documentation is pretty sparse).

Yes, that's it. You can click the timestamps on the left to jump to the part you're most interested in. I find it the most efficient way to take notes. To get a summary, just click the AI conversion and it will change the style for you: summary, blog, conversation... (though I usually keep every option at the default).

Zettelkasten Forum