Zettelkasten Forum


[OT] Markdown: split a file along headers?

I know this is not actually related to most of what we discuss here, but I figure there are probably quite a few people here who use markdown extensively. And I did put it in Random.

I have a markdown file that has a bunch of 2nd level headers. I would like to split it into a bunch of smaller markdown files, each containing one of those headers and the text under it. Is there any reasonable way to do this easily? I'm experimenting with the differences between writing a large document (a novel) as a single markdown document and as a series of scenes, because there are a number of ways to combine smaller files.

I keep thinking there must be a pandoc invocation for this, but I can't find it.

Any suggestions? Command-line is fine; writing Python code is not.

Thanks!

Comments

  • Depends on what you deem reasonable :)

    This works for some files in Ruby.

    #!/usr/bin/env ruby
    
    require "strscan"
    
    OUT_FILENAME = "Output_"
    
    DOC = <<EOF
    # The doc
    
    Hello 
    
    ##Ignored
    
    Text
    
    ## Not Ignored
    
    more text
    ## Ignored too: no blank line before this heading
    
    then more text
    
    ## Another heading
    
    C'est ça
    EOF
    
    scanner = StringScanner.new DOC
    sections = []
    
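    # Returns the text up to and including the next blank line,
    # or the rest of the document if no blank line remains.
    def find_block(scanner)
      block = scanner.scan_until(/\n\n/)
      if block.nil?
        block = scanner.rest
        scanner.terminate
      end
      block
    end
    
    # True if the scanner sits right before a 2nd-level heading ("## ").
    def will_start_section(scanner)
      scanner.peek(3) == '## '
    end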
    
    current_section_buffer = ""
    
    while !scanner.eos?
      if will_start_section(scanner) && !current_section_buffer.empty?
        sections << current_section_buffer
        current_section_buffer = ""
      end
    
      block = find_block(scanner)
      current_section_buffer += block  
    end
    
    # Append remainder when at end of the doc
    if !current_section_buffer.empty?
      sections << current_section_buffer
    end
    
    # Write out to disk:
    
    # sections.each_with_index do |section, index|
    #   filename = OUT_FILENAME + index.to_s + ".txt"
    #   File.open(filename, "w") do |file|
    #     file.puts section
    #   end
    # end
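
    For a file on disk rather than the inline DOC string, a line-based variant may be simpler. This is only a sketch and not the script above: split_sections is a made-up helper name, and it assumes every section starts with a line beginning with "## ". Unlike the scanner version, it does not need a blank line before the heading, and it would also split on "## " lines inside code blocks.

    ```ruby
    # Hypothetical helper: split Markdown text into chunks, starting a new
    # chunk at every line that begins with "## " (a 2nd-level heading).
    # Anything before the first such heading becomes the first chunk.
    def split_sections(text)
      text.each_line
          .slice_before { |line| line.start_with?("## ") }
          .map(&:join)
    end

    sample = <<~MD
      # The doc

      Hello

      ## First scene

      more text
      ## Second scene

      last text
    MD

    sections = split_sections(sample)

    # Writing to disk, with zero-padded names so the files sort in order
    # (the "Output_" prefix mirrors OUT_FILENAME above):
    # sections.each_with_index do |section, index|
    #   File.write(format("Output_%03d.md", index), section)
    # end
    ```

    Joining the chunks back together gives the original text, so nothing is lost in the split.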
    
    

    Author at Zettelkasten.de • https://christiantietze.de/

  • I don't have a solution but FWIW I usually write these kinds of things in Scrivener -- it supports markdown and you can write your chapters/scenes/whatever in different files ("scrivenings"), which you compile into a whole document at the end. Not software agnostic though....

  • @mediapathic said:
    I keep thinking there must be a pandoc invocation for this, but I can't find it.

    I don't think you can do this with just the Pandoc command line interface. The closest option you may be thinking of is --shift-heading-level-by=NUMBER.

    StackOverflow says to try csplit.

  • @JustinW80 said:

    StackOverflow says to try csplit.

    Looks like csplit did the trick. For the record, if anyone else comes looking for this: the csplit that ships with macOS has a slightly different syntax than any of the examples. So, the correct solution is, first:

    brew install coreutils if you haven't already to get "gcsplit", which has the right syntax, and then

    gcsplit --prefix='novelname' --suffix-format='%03d.md' novel-file.md '/^## /' '{*}'
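
    Because the %03d suffix zero-pads the piece numbers, the resulting files (novelname000.md, novelname001.md, ...) sort in their original order by name, at least up to 1000 pieces, which makes putting them back together straightforward. A rough Ruby sketch of that, where join_pieces is a hypothetical helper and the stand-in files are invented for the demo:

    ```ruby
    require "tmpdir"

    # Hypothetical helper: concatenate all files in `dir` whose names start
    # with `prefix` and end in ".md", in name order.
    def join_pieces(dir, prefix)
      Dir.glob(File.join(dir, "#{prefix}*.md"))
         .sort
         .map { |path| File.read(path) }
         .join
    end

    # Demo in a temporary directory, with stand-in files named the way the
    # gcsplit command above would name them.
    combined = Dir.mktmpdir do |dir|
      File.write(File.join(dir, "novelname001.md"), "## Scene two\n\nmore text\n")
      File.write(File.join(dir, "novelname000.md"), "## Scene one\n\ntext\n\n")
      join_pieces(dir, "novelname")
    end

    puts combined
    ```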

    Thanks!

  • @mediapathic glad you found gcsplit. I've archived this as I am working on a book project and working with pandoc a bunch. Love pandoc, it can do incredible conversions. When I first looked at your question, I thought of regex. Here is as far as I got.

    ##(.*(?:\n(?!##).*)+)

    The above will split a file into capture groups, one per section (each capture holds the section's text minus the leading ##). The next part, which I didn't yet get to, would be using maybe Keyboard Maestro to copy each capture group to its own file.
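
    In a scripting language the copying step could also be done without Keyboard Maestro. A minimal Ruby sketch using String#scan with the same regex; the sample text and the section_%03d.md file names are invented for illustration. Note that the pattern also stops at (and starts again on) deeper headings such as ###, and it needs at least one line after a heading to match.

    ```ruby
    # The regular expression from above, unchanged.
    section_re = /##(.*(?:\n(?!##).*)+)/

    sample = <<~MD
      # The doc

      Hello

      ## First scene

      more text
      ## Second scene

      last text
    MD

    # scan returns one array per match, holding that match's capture groups.
    # The capture excludes the leading "##", so it is added back here.
    sections = sample.scan(section_re).map { |(body)| "##" + body }

    # Writing each section to its own file (hypothetical file names):
    # sections.each_with_index do |section, index|
    #   File.write(format("section_%03d.md", index), section)
    # end
    ```

    Text before the first ## heading is not captured by this pattern, so it would need to be handled separately.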

    @JustinW80's solution is far more elegant.

    Will Simpson
    I must keep doing my best even though I'm a failure. My peak cognition is behind me. One day soon I will read my last book, write my last note, eat my last meal, and kiss my sweetie for the last time.
    kestrelcreek.com
