fix(37): protect against empty lines by towi · Pull Request #38 · DCsunset/pandoc-include

towi · 2023-08-03T15:10:45Z

Make removeLeadingWhitespaces() return the input string unchanged if it contains only whitespace.

Add a try-except around dedent for better error messages.

DCsunset · 2026-05-11T21:48:02Z

Sorry for the late response. Could you give a few use cases and explain when the error would occur?

towi · 2026-05-12T13:00:11Z

My mistake, yes. I should have included that.

Hard to remember now, but the AI found the original failure case.

A minimal reproducer is an included snippet using dedent that contains an empty line:

   # Repro

   ```python
   !include`snippetStart="#BEGIN", snippetEnd="#END", dedent=4` snippet.py
   ```

snippet.py:

```python
   #BEGIN
       def hello():

           return "world"
   #END
```

Without this fix, pandoc-include fails while dedenting the empty line with a rather unhelpful error:

   TypeError: sequence item 1: expected str instance, NoneType found

There is no indication which include file or config caused the problem.
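To illustrate the failure mode and the kind of context a try-except can add, here is a minimal sketch. It is not the actual pandoc-include code: `buggy_strip` and `read_snippet` are hypothetical names, and the assumption is that the pre-fix helper returned `None` for blank lines, which then breaks `"\n".join(...)`.

```python
from typing import Iterable, Optional


def buggy_strip(line: str, num: int) -> Optional[str]:
    """Hypothetical stand-in for the pre-fix helper (assumed behavior)."""
    stripped = line.lstrip()
    if not stripped:
        return None  # assumed bug: blank lines yield None instead of a str
    indent = len(line) - len(stripped)
    return line[min(num, indent):]  # drop at most `num` leading whitespace chars


def read_snippet(lines: Iterable[str], num: int, filename: str) -> str:
    """Join dedented lines; on failure, re-raise with file name and dedent value."""
    try:
        return "\n".join(buggy_strip(line, num) for line in lines)
    except TypeError as exc:
        raise ValueError(
            f"dedent={num} failed for include file {filename!r}: {exc}"
        ) from exc


try:
    read_snippet(["    def hello():", "", '        return "world"'], 4, "snippet.py")
except ValueError as err:
    message = str(err)
    print(message)
```

The re-raised message names the include file and the dedent value, which is the information missing from the bare TypeError.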

With this fix, empty/whitespace-only lines are preserved correctly during dedent, so the included output becomes:

   def hello():

       return "world"
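The intended behavior can be sketched roughly like this. Again, this is an illustration under assumptions, not the code from the PR: `remove_leading_whitespace` and `dedent_lines` are hypothetical names, and the only rule taken from the PR description is that a whitespace-only line is returned unchanged.

```python
from typing import List


def remove_leading_whitespace(line: str, num: int) -> str:
    """Drop up to `num` leading whitespace characters; leave blank lines untouched."""
    stripped = line.lstrip()
    if not stripped:
        return line  # empty or whitespace-only: return the input as-is
    indent = len(line) - len(stripped)
    return line[min(num, indent):]


def dedent_lines(content: str, num: int) -> List[str]:
    """Apply the helper to every line of `content`."""
    return [remove_leading_whitespace(line, num) for line in content.split("\n")]


snippet = '    def hello():\n\n        return "world"'
result = "\n".join(dedent_lines(snippet, 4))
print(result)
```

Because every line maps to a string, the later join cannot hit a `None`, and the blank line between the two statements survives.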

I also found this in my original project: several Python snippets included with dedent=4 contained blank lines, so this explains the issue I originally ran into.

This is my Makefile rule to reproduce it:

PANDOC_INCLUDE_REPO ?= /tmp/DCsunset/pandoc-include
PANDOC_INCLUDE_BUG_COMMIT := 45b65df64d4b857792f9a8701d707b6cdd33d4d2

bug:
        if [ ! -d ${PANDOC_INCLUDE_REPO}/.git ]; then \
                mkdir -p /tmp/pi-github-repos/DCsunset; \
                git clone https://github.com/DCsunset/pandoc-include.git ${PANDOC_INCLUDE_REPO}; \
        fi
        cd ${PANDOC_INCLUDE_REPO} && git fetch -q origin pull/38/head && git checkout -q ${PANDOC_INCLUDE_BUG_COMMIT}
        -docker run --rm \
                -v ${PANDOC_INCLUDE_REPO}:/repo \
                -v ${PWD}:/data \
                ghcr.io/towi/pandoc-pretty-pdf \
                sh -lc 'pip install -q -e /repo && cd /data && pandoc --filter pandoc-include 10-anhang.md -o /tmp/10-anhang.html'
        cd ${PANDOC_INCLUDE_REPO} && git checkout -q master

with 10-anhang.md excerpt:

...
Then run `mypy meine-datei.py` and look at the
error messages.  You can also use `mypy --strict meine-datei.py`
to get even more messages.

Here is the program `prime-counts.py`, which computes the frequency of primes
in fixed block sizes using the *Sieve of Eratosthenes*:

` ` `python
    !include`snippetStart="#BEGIN_MAIN", snippetEnd="#END_MAIN", dedent=4` 10d-mypy-example1.py
` ` `

When you run it with `python prime-counts.py`, it prints:

` ` `
{0: 25, 100: 21, 200: 16, 300: 16, 400: 17, 500: 14, 600: 16, 700: 14, 800: 15, 900: 14}
` ` ` 

    ...

and 10d-mypy-example1.py:

#!/usr/bin/env python3
#BEGIN_MAIN
from typing import List, Dict

def sieve_of_eratosthenes(limit: int) -> List[bool]:
    """Implements the Sieve of Eratosthenes to generate prime numbers.
    >>> sieve_of_eratosthenes(10)
    [False, False, True, True, False, True, False, True, False, False, False]
    """
    sieve: List[bool] = [True] * (limit+1)
    sieve[0:2] = [False, False]  # 0 and 1 are not prime
    current: int
    for current in range(2, int(limit**0.5) + 1):
        if sieve[current]:
            sieve[current * 2::current] = [False] * len(sieve[current * 2::current])
    return sieve

def prime_frequency(n: int, m: int) -> Dict[int,int]:
    """Computes the frequency of primes within blocks of size m.
    >>> prime_frequency(100, 10)
    {0: 4, 10: 4, 20: 2, 30: 2, 40: 3, 50: 2, 60: 2, 70: 3, 80: 2, 90: 1}
    """
    sieve: List[int] = sieve_of_eratosthenes(n)
    blocks: Dict[int,List[int]] = { i: sieve[i:i+m] for i in range(0, n, m) }
    counts: Dict[int,int] = {
        i: len([ is_prime for is_prime in block if is_prime ])
        for i, block in blocks.items()
    }
    return counts

if __name__ == "__main__":
    print(prime_frequency(1_000, 100))
#END_MAIN
    print(prime_frequency(1_000_000, 100_000))
    import doctest
    doctest.testmod()

Output:

buch/2023-functional$ make bug
if [ ! -d /tmp/DCsunset/pandoc-include/.git ]; then \
	mkdir -p /tmp/pi-github-repos/DCsunset; \
	git clone https://github.com/DCsunset/pandoc-include.git /tmp/DCsunset/pandoc-include; \
fi
Cloning into '/tmp/DCsunset/pandoc-include'...
remote: Enumerating objects: 908, done.
remote: Counting objects: 100% (186/186), done.
remote: Compressing objects: 100% (126/126), done.
remote: Total 908 (delta 94), reused 120 (delta 55), pack-reused 722 (from 1)
Receiving objects: 100% (908/908), 401.56 KiB | 1.13 MiB/s, done.
Resolving deltas: 100% (513/513), done.
cd /tmp/DCsunset/pandoc-include && git fetch -q origin pull/38/head && git checkout -q 45b65df64d4b857792f9a8701d707b6cdd33d4d2
docker run --rm \
	-v /tmp/DCsunset/pandoc-include:/repo \
	-v /home/towi/buch/2023-functional:/data \
	ghcr.io/towi/pandoc-pretty-pdf \
	sh -lc 'pip install -q -e /repo && cd /data && pandoc --filter pandoc-include 10-anhang.md -o /tmp/10-anhang.html'
WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
Traceback (most recent call last):
  File "/usr/bin/pandoc-include", line 8, in <module>
    sys.exit(main())
  File "/repo/pandoc_include/main.py", line 333, in main
    return pf.run_filter(action, doc=doc)
  File "/usr/lib/python3.10/site-packages/panflute/io.py", line 227, in run_filter
    return run_filters([action], *args, **kwargs)
  File "/usr/lib/python3.10/site-packages/panflute/io.py", line 208, in run_filters
    doc = doc.walk(action, doc=doc, stop_if=stop_if)
  File "/usr/lib/python3.10/site-packages/panflute/base.py", line 264, in walk
    child = child.walk(action, doc, stop_if)
  File "/usr/lib/python3.10/site-packages/panflute/containers.py", line 86, in walk
    ans = list(chain.from_iterable(ans))
  File "/usr/lib/python3.10/site-packages/panflute/containers.py", line 84, in <genexpr>
    ans = ((item,) if type(item) is not list else item for item in ans)
  File "/usr/lib/python3.10/site-packages/panflute/containers.py", line 82, in <genexpr>
    ans = (item.walk(action, doc, stop_if) for item in self)
  File "/usr/lib/python3.10/site-packages/panflute/base.py", line 272, in walk
    altered = action(self, doc)
  File "/repo/pandoc_include/main.py", line 309, in action
    codes.append(read_file(fn, config))
  File "/repo/pandoc_include/main.py", line 160, in read_file
    content = "\n".join(dedent(content, config["dedent"]))
TypeError: sequence item 1: expected str instance, NoneType found
Error running filter pandoc-include:
Filter returned error status 1
make: [Makefile:17: bug] Error 83 (ignored)
cd /tmp/DCsunset/pandoc-include && git checkout -q master

Does that help?

fix(37): protect against empty lines

7e50089

towi wants to merge 1 commit into DCsunset:master from towi:master
