Zettelkasten Forum


[OT] Markdown: split a file along headers?

I know this is not actually related to most of what we discuss here, but I figure there's probably quite a few people who use markdown extensively here. And I did put it in Random.

I have a markdown file that has a bunch of 2nd level headers. I would like to have a bunch of smaller markdown files, each of which consists of one of those headers and the text under it. Is there any reasonable way to do this easily? I'm experimenting with the differences between writing a large document (a novel) as a single markdown document and as a series of scenes, because there are a number of ways to combine smaller files.

I keep thinking there must be a pandoc invocation for this, but I can't find it.

Any suggestions? Command-line is fine, writing Python code is not.

Thanks!

Comments

  • Depends on what you deem reasonable :)

    Here's a Ruby script that works for some files:

    #!/usr/bin/env ruby
    
    require "strscan"
    
    OUT_FILENAME = "Output_"
    
    DOC = <<EOF
    # The doc
    
    Hello 
    
    ##Ignored
    
    Text
    
    ## Not Ignored
    
    more text
    ## ignoring the heading without space
    
    then more text
    
    ## Another heading
    
    C'est ça
    EOF
    
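    scanner = StringScanner.new DOC
    sections = []
    
    # Consume one block: everything up to and including the next blank line
    # ("\n\n"), or the rest of the document if no blank line is left.
    def find_block(scanner)
      block = scanner.scan_until(/\n\n/)
      if block.nil?
        block = scanner.rest
        scanner.terminate
      end
      return block
    end
    
    # A new section starts only when a block begins with "## " (a level-2
    # heading followed by a space), so "##Ignored" and headings that are not
    # preceded by a blank line stay in the current section.
    def will_start_section(scanner)
      scanner.peek(3) == '## '
    end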
    
    current_section_buffer = ""
    
    while !scanner.eos?
      if will_start_section(scanner) && !current_section_buffer.empty?
        sections << current_section_buffer
        current_section_buffer = ""
      end
    
      block = find_block(scanner)
      current_section_buffer += block  
    end
    
    # Append remainder when at end of the doc
    if !current_section_buffer.empty?
      sections << current_section_buffer
    end
    
    # Write out to disk:
    
    # sections.each_with_index do |section, index|
    #   filename = OUT_FILENAME + index.to_s + ".txt"
    #   File.open(filename, "w") do |file|
    #     file.puts section
    #   end
    # end
    
    

    Author at Zettelkasten.de • https://christiantietze.de/

  • I don't have a solution, but FWIW I usually write these kinds of things in Scrivener -- it supports markdown and you can write your chapters/scenes/whatever in different files ("scrivenings"), which you compile into a whole document at the end. Not software-agnostic, though...

  • @mediapathic said:
    I keep thinking there must be a pandoc invocation for this, but I can't find it.

    I don't think you can do this with just the Pandoc command line interface. The closest option you may be thinking of is --shift-heading-level-by=NUMBER.
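    For reference, that option moves every heading up or down by NUMBER levels (with -1, level-2 headings become level-1), which is handy when scene files are stitched back together. Below is a minimal Ruby sketch of the same idea as plain text processing. It is an illustration under stated assumptions, not pandoc's implementation; shift_heading_levels is a hypothetical name, and pandoc's own rules for edge cases are in its manual.

```ruby
# Minimal sketch of the idea behind pandoc's --shift-heading-level-by=NUMBER,
# done as line-by-line text processing (NOT how pandoc implements it).
# Assumptions: only ATX headings ("#".."######" followed by a space) count,
# fenced code blocks are not special-cased, levels are capped at 6, and a
# heading pushed below level 1 is left as an ordinary line of text.
def shift_heading_levels(text, by)
  text.each_line.map do |line|
    m = line.match(/\A([#]{1,6}) (.*)/m)
    next line unless m # not a heading: keep the line unchanged

    level = m[1].length + by
    if level < 1
      m[2] # heading text (with its newline) becomes a plain line
    else
      "#" * [level, 6].min + " " + m[2]
    end
  end.join
end

scene = "## Scene one\n\nShe left.\n\n### Later\n"
puts shift_heading_levels(scene, -1)
```

    Working line by line keeps the sketch short, at the cost of also rewriting "#" lines inside fenced code blocks, which a real Markdown parser such as pandoc would leave alone.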

    StackOverflow says to try csplit.

  • @JustinW80 said:

    StackOverflow says to try csplit.

    Looks like csplit did the trick. For the record, if anyone else comes looking for this, the OS X (BSD) csplit has a slightly different syntax than the GNU csplit used in the examples. So, the correct solution is, first:

    Run brew install coreutils, if you haven't already, to get "gcsplit" (GNU csplit), which has the right syntax, and then run:

    gcsplit --prefix='novelname' --suffix-format='%03d.md' novel-file.md '/^## /' "{*}"

    Thanks!
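    For anyone curious what that gcsplit call actually does, here is a rough Ruby emulation. It is a sketch, not GNU csplit: it assumes splitting before every line that starts with "## " (the pattern /^## /), names the pieces the way --prefix plus --suffix-format would, and ignores csplit's other options. csplit_chunks, chunk_filenames and the sample text are hypothetical.

```ruby
# Rough emulation of
#   gcsplit --prefix='novelname' --suffix-format='%03d.md' novel-file.md '/^## /' '{*}'
# Assumption: only the "split before each matching line" behaviour is modelled.

# Cut the text before every line matching `pattern`. The first chunk holds
# whatever precedes the first match (it may be empty, as with csplit).
def csplit_chunks(text, pattern)
  chunks = ["".dup]
  text.each_line do |line|
    chunks << "".dup if line.match?(pattern)
    chunks.last << line
  end
  chunks
end

# Build file names the way --prefix plus --suffix-format='%03d.md' would.
def chunk_filenames(prefix, count, suffix_format = "%03d.md")
  (0...count).map { |i| prefix + format(suffix_format, i) }
end

novel = <<~MD
  # Title

  Front matter.

  ## Scene one

  It begins.

  ### A subheading

  ## Scene two

  It ends.
MD

chunks = csplit_chunks(novel, /^## /)
chunk_filenames("novelname", chunks.size).zip(chunks).each do |name, body|
  puts "#{name}: #{body.lines.first.inspect}"
end
```

    As with csplit, piece 000 holds whatever comes before the first heading, and each scene keeps its own ## line. A bare /##/ pattern would also split at ### subheadings, giving four pieces for this sample instead of three.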

    @mediapathic glad you found gcsplit. I've archived this as I am working on a book project and working with pandoc a bunch. Love pandoc; it can do incredible conversions. When I first looked at your question, I thought of regex. Here is as far as I got.

    ##(.*(?:\n(?!##).*)+)

    The regex above matches each section that starts with ## and puts its text (everything after the ##) into a capture group. The next part, which I didn't yet get to, would be using maybe Keyboard Maestro to copy each capture group to its own file.
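    For that next part, here is a rough Ruby sketch that skips Keyboard Maestro: String#scan applies the regex above unchanged, and each capture group is written to its own numbered file. write_captures, the section_ prefix and the sample text are hypothetical, for illustration only.

```ruby
require "tmpdir"

# The regex from the post, unchanged: the capture group is everything after a
# "##" up to (not including) the next line that begins with "##".
SECTION_RE = /##(.*(?:\n(?!##).*)+)/

# Hypothetical helper: write each capture group to its own numbered file in
# `dir`, putting back the "##" that sits outside the group. Returns the paths.
def write_captures(text, dir, prefix = "section_")
  text.scan(SECTION_RE).flatten.each_with_index.map do |body, i|
    path = File.join(dir, format("%s%03d.md", prefix, i))
    File.write(path, "##" + body)
    path
  end
end

doc = <<~MD
  Some preamble.

  ## First

  Text one.

  ## Second

  Text two.
MD

# Write into a throwaway directory so the example leaves nothing behind.
Dir.mktmpdir do |dir|
  write_captures(doc, dir).each { |path| puts File.basename(path) }
end
```

    Because the capture group starts after the ##, the sketch adds ## back to each file. Text before the first ## (the preamble here) is dropped, and since the pattern is a bare ##, a ### line would also start a new match.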

    @JustinW80's solution is far more elegant.

    Will Simpson
    My peak cognition is behind me. One day soon, I will read my last book, write my last note, eat my last meal, and kiss my sweetie for the last time.
    My Internet Home • My Now Page
