
fix(37): protect against empty lines#38

Open
towi wants to merge 1 commit into DCsunset:master from towi:master

Conversation

towi commented Aug 3, 2023

removeLeadingWhitespaces() now returns the input string unchanged if it contains only whitespace.

Added a try-except around dedent for better error messages.
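As a rough illustration of the second point, the sketch below wraps a dedent-and-join step so that a failure names the include file. It is not the code from this PR: `join_dedented`, its parameters, and the wording of the message are hypothetical, and the dedent function is passed in so the example stays self-contained.

```python
from typing import Callable, Iterable


def join_dedented(
    dedent_fn: Callable[[str, int], Iterable[str]],
    text: str,
    amount: int,
    filename: str,
) -> str:
    """Dedent `text` and join the lines, adding context to any failure.

    Hypothetical helper: `dedent_fn` stands in for the filter's own dedent.
    """
    try:
        return "\n".join(dedent_fn(text, amount))
    except Exception as exc:
        # Re-raise with the file name and dedent value so the user can
        # tell which include caused the problem; keep the original cause.
        raise RuntimeError(
            f"pandoc-include: dedent={amount} failed for {filename!r}: {exc}"
        ) from exc
```

Chaining with `from exc` keeps the original traceback available while the top-level message points at the offending file.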

@DCsunset
Owner

Sorry for the late response. Could you give a few use cases and explain when the error would occur?

@towi
Author

towi commented May 12, 2026

My mistake, yes. I should have included that.

Hard to remember now, but the AI found the original failure case.

A minimal reproducer is an included snippet using dedent that contains an empty line.

Markdown source:

    ```python
    !include`snippetStart="#BEGIN", snippetEnd="#END", dedent=4` snippet.py
    ```

snippet.py:

    #BEGIN
        def hello():

            return "world"
    #END

Without this fix, pandoc-include fails while dedenting the empty line with a rather unhelpful error:

    TypeError: sequence item 1: expected str instance, NoneType found

There is no indication which include file or config caused the problem.
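The message itself comes from `str.join` receiving a `None` element. A minimal sketch of how that can happen, assuming a per-line helper that has no return value for whitespace-only lines (`strip_indent_buggy` is a hypothetical stand-in, not the actual pandoc-include function):

```python
from typing import List, Optional


def strip_indent_buggy(line: str, amount: int) -> Optional[str]:
    """Hypothetical per-line helper that forgets the whitespace-only case."""
    stripped = line.lstrip()
    if stripped:
        # Remove at most `amount` leading whitespace characters.
        indent = len(line) - len(stripped)
        return line[min(amount, indent):]
    # Bug: blank lines fall through, so the function returns None.


lines = ["    def hello():", "", '        return "world"']
dedented: List[Optional[str]] = [strip_indent_buggy(line, 4) for line in lines]

message = ""
try:
    "\n".join(dedented)  # type: ignore[arg-type]
except TypeError as exc:
    message = str(exc)

print(message)
```

The index in the message is the position of the first blank line in the snippet, which is the only hint the error gives.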

With this fix, empty/whitespace-only lines are preserved correctly during dedent, so the included output becomes:

    def hello():

        return "world"
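The behaviour described here can be sketched as follows. This is an illustration under assumptions, not the patched pandoc-include code: `dedent_lines` is a hypothetical name, and it assumes dedent removes at most the requested number of leading whitespace characters per line.

```python
from typing import List


def dedent_lines(text: str, amount: int) -> List[str]:
    """Remove up to `amount` leading whitespace characters from each line.

    Whitespace-only lines are returned unchanged instead of being dropped
    or turned into None.
    """
    result: List[str] = []
    for line in text.split("\n"):
        stripped = line.lstrip(" \t")
        if not stripped:
            # Empty or whitespace-only line: keep it as-is.
            result.append(line)
            continue
        indent = len(line) - len(stripped)
        result.append(line[min(amount, indent):])
    return result


snippet = '    def hello():\n\n        return "world"'
output = "\n".join(dedent_lines(snippet, 4))
print(output)
```

Every element of the returned list is a string, so the later `"\n".join(...)` cannot hit the `NoneType` case.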

I also found this in my original project: several Python snippets included with dedent=4 contained blank lines, so this explains the issue I originally ran into.

This is my Makefile rule to reproduce it:

PANDOC_INCLUDE_REPO ?= /tmp/DCsunset/pandoc-include
PANDOC_INCLUDE_BUG_COMMIT := 45b65df64d4b857792f9a8701d707b6cdd33d4d2

bug:
        if [ ! -d ${PANDOC_INCLUDE_REPO}/.git ]; then \
                mkdir -p /tmp/pi-github-repos/DCsunset; \
                git clone https://github.com/DCsunset/pandoc-include.git ${PANDOC_INCLUDE_REPO}; \
        fi
        cd ${PANDOC_INCLUDE_REPO} && git fetch -q origin pull/38/head && git checkout -q ${PANDOC_INCLUDE_BUG_COMMIT}
        -docker run --rm \
                -v ${PANDOC_INCLUDE_REPO}:/repo \
                -v ${PWD}:/data \
                ghcr.io/towi/pandoc-pretty-pdf \
                sh -lc 'pip install -q -e /repo && cd /data && pandoc --filter pandoc-include 10-anhang.md -o /tmp/10-anhang.html'
        cd ${PANDOC_INCLUDE_REPO} && git checkout -q master

with 10-anhang.md excerpt:

...
Then run `mypy meine-datei.py` and look at the
error messages.  You can also use `mypy --strict meine-datei.py`
to get even more messages.

Here is the program `prime-counts.py`, which computes the frequency of primes
in blocks of a fixed size using the *sieve of Eratosthenes*:

` ` `python
    !include`snippetStart="#BEGIN_MAIN", snippetEnd="#END_MAIN", dedent=4` 10d-mypy-example1.py
` ` `

If you run it with `python prime-counts.py`, it prints:

` ` `
{0: 25, 100: 21, 200: 16, 300: 16, 400: 17, 500: 14, 600: 16, 700: 14, 800: 15, 900: 14}
` ` ` 

    ...

and 10d-mypy-example1.py:

#!/usr/bin/env python3
#BEGIN_MAIN
from typing import List, Dict

def sieve_of_eratosthenes(limit: int) -> List[bool]:
    """Implements the sieve of Eratosthenes to generate primes.
    >>> sieve_of_eratosthenes(10)
    [False, False, True, True, False, True, False, True, False, False, False]
    """
    sieve: List[bool] = [True] * (limit+1)
    sieve[0:2] = [False, False]  # 0 and 1 are not prime
    current: int
    for current in range(2, int(limit**0.5) + 1):
        if sieve[current]:
            sieve[current * 2::current] = [False] * len(sieve[current * 2::current])
    return sieve

def prime_frequency(n: int, m: int) -> Dict[int,int]:
    """Computes the frequency of primes within blocks of size m.
    >>> prime_frequency(100, 10)
    {0: 4, 10: 4, 20: 2, 30: 2, 40: 3, 50: 2, 60: 2, 70: 3, 80: 2, 90: 1}
    """
    sieve: List[int] = sieve_of_eratosthenes(n)
    blocks: Dict[int,List[int]] = { i: sieve[i:i+m] for i in range(0, n, m) }
    counts: Dict[int,int] = {
        i: len([ is_prime for is_prime in block if is_prime ])
        for i, block in blocks.items()
    }
    return counts

if __name__ == "__main__":
    print(prime_frequency(1_000, 100))
#END_MAIN
    print(prime_frequency(1_000_000, 100_000))
    import doctest
    doctest.testmod()

Output:

buch/2023-functional$ make bug
if [ ! -d /tmp/DCsunset/pandoc-include/.git ]; then \
	mkdir -p /tmp/pi-github-repos/DCsunset; \
	git clone https://github.com/DCsunset/pandoc-include.git /tmp/DCsunset/pandoc-include; \
fi
Cloning into '/tmp/DCsunset/pandoc-include'...
remote: Enumerating objects: 908, done.
remote: Counting objects: 100% (186/186), done.
remote: Compressing objects: 100% (126/126), done.
remote: Total 908 (delta 94), reused 120 (delta 55), pack-reused 722 (from 1)
Receiving objects: 100% (908/908), 401.56 KiB | 1.13 MiB/s, done.
Resolving deltas: 100% (513/513), done.
cd /tmp/DCsunset/pandoc-include && git fetch -q origin pull/38/head && git checkout -q 45b65df64d4b857792f9a8701d707b6cdd33d4d2
docker run --rm \
	-v /tmp/DCsunset/pandoc-include:/repo \
	-v /home/towi/buch/2023-functional:/data \
	ghcr.io/towi/pandoc-pretty-pdf \
	sh -lc 'pip install -q -e /repo && cd /data && pandoc --filter pandoc-include 10-anhang.md -o /tmp/10-anhang.html'
WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
Traceback (most recent call last):
  File "/usr/bin/pandoc-include", line 8, in <module>
    sys.exit(main())
  File "/repo/pandoc_include/main.py", line 333, in main
    return pf.run_filter(action, doc=doc)
  File "/usr/lib/python3.10/site-packages/panflute/io.py", line 227, in run_filter
    return run_filters([action], *args, **kwargs)
  File "/usr/lib/python3.10/site-packages/panflute/io.py", line 208, in run_filters
    doc = doc.walk(action, doc=doc, stop_if=stop_if)
  File "/usr/lib/python3.10/site-packages/panflute/base.py", line 264, in walk
    child = child.walk(action, doc, stop_if)
  File "/usr/lib/python3.10/site-packages/panflute/containers.py", line 86, in walk
    ans = list(chain.from_iterable(ans))
  File "/usr/lib/python3.10/site-packages/panflute/containers.py", line 84, in <genexpr>
    ans = ((item,) if type(item) is not list else item for item in ans)
  File "/usr/lib/python3.10/site-packages/panflute/containers.py", line 82, in <genexpr>
    ans = (item.walk(action, doc, stop_if) for item in self)
  File "/usr/lib/python3.10/site-packages/panflute/base.py", line 272, in walk
    altered = action(self, doc)
  File "/repo/pandoc_include/main.py", line 309, in action
    codes.append(read_file(fn, config))
  File "/repo/pandoc_include/main.py", line 160, in read_file
    content = "\n".join(dedent(content, config["dedent"]))
TypeError: sequence item 1: expected str instance, NoneType found
Error running filter pandoc-include:
Filter returned error status 1
make: [Makefile:17: bug] Error 83 (ignored)
cd /tmp/DCsunset/pandoc-include && git checkout -q master

Does that help?
