
feat: NotebookLM research pipeline + infographic generation#601

Merged
codercatdev merged 24 commits into main from dev
Mar 5, 2026

Conversation

@codercatdev
Contributor

Summary

Merges the complete NotebookLM research pipeline into main. This adds automated trend discovery, deep research, and infographic generation to the video content pipeline.

What's New

NotebookLM Client (lib/services/notebooklm/)

  • Pure TypeScript — zero Python dependencies, direct HTTP calls to NotebookLM's BatchExecute API
  • types.ts — RPC method IDs, artifact types, interfaces
  • rpc.ts — BatchExecute encoding/decoding, shared fetchWithTimeout() and sleep()
  • auth.ts — Cookie parsing, CSRF extraction, fresh cookie capture from homepage
  • client.ts — Full client: notebook CRUD, source management, deep research, infographic/report generation, artifact polling
  • index.ts — Barrel exports
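Because the client speaks Google's batchexecute wire format directly, the heart of rpc.ts is an envelope builder that serializes one RPC call into the `f.req` form parameter. A minimal sketch of that encoding — the RPC ID is a placeholder and the real rpc.ts may batch differently:

```typescript
// Hypothetical sketch of a batchexecute envelope builder. The
// [[[rpcId, serializedArgs, null, "generic"]]] shape follows Google's
// batchexecute wire format; the actual rpc.ts implementation may differ.
function buildFReq(rpcId: string, args: unknown[]): string {
  // One RPC per batch: args are JSON-stringified *inside* the envelope.
  const envelope = [[[rpcId, JSON.stringify(args), null, "generic"]]];
  return "f.req=" + encodeURIComponent(JSON.stringify(envelope));
}
```

The double serialization (args stringified inside an outer JSON array) is the characteristic quirk of this protocol and the usual source of encoding bugs.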

Research Service (lib/services/research.ts)

  • Creates notebook → adds source URLs → deep web research → imports sources → generates 5 infographics + report → extracts summary
  • ResearchPayload interface with infographicUrls: string[] for multiple infographics
  • 10-minute research timeout (deep research takes 5-10 min)
  • Graceful degradation — every step wrapped in safeStep() try/catch
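The safeStep() pattern above can be sketched as a small wrapper that turns any failed step into a logged null instead of aborting the whole run (the signature here is a guess at the real helper):

```typescript
// Minimal sketch of the safeStep() graceful-degradation pattern described
// above; the actual helper in lib/services/research.ts may log differently.
async function safeStep<T>(name: string, fn: () => Promise<T>): Promise<T | null> {
  try {
    return await fn();
  } catch (err) {
    // A failed step (e.g. one infographic) degrades the payload, not the run.
    console.error(`[research] step "${name}" failed:`, err);
    return null;
  }
}
```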

Trend Discovery (lib/services/trend-discovery.ts)

  • 5 sources: Hacker News (aggressive web dev filtering), Dev.to, Blog RSS feeds, YouTube, GitHub Trending
  • Blog feeds: Cloudflare, Next.js, Vercel, Chrome Developers, Web.dev, Firebase
  • HN filtering: domain allowlist + keyword taxonomy + exclusion patterns (removes ~77% noise)
  • Cross-source deduplication and scoring
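Cross-source deduplication can be sketched as URL normalization plus score merging, so a story surfacing on both HN and Dev.to outranks either alone. The `TrendSignal` shape and normalization rules below are illustrative, not the actual implementation:

```typescript
// Illustrative sketch of cross-source dedup + scoring; the real
// trend-discovery types and weights may differ.
interface TrendSignal { url: string; title: string; source: string; score: number; }

function dedupeSignals(signals: TrendSignal[]): TrendSignal[] {
  const byUrl = new Map<string, TrendSignal>();
  for (const s of signals) {
    // Normalize: drop query/fragment and trailing slash, lowercase.
    const key = s.url.replace(/[?#].*$/, "").replace(/\/$/, "").toLowerCase();
    const existing = byUrl.get(key);
    if (existing) {
      existing.score += s.score; // appearing in multiple sources boosts rank
    } else {
      byUrl.set(key, { ...s });
    }
  }
  return [...byUrl.values()].sort((a, b) => b.score - a.score);
}
```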

Ingest Route (app/api/cron/ingest/route.ts)

  • discoverTrends() replaces old fetchTrendingTopics()
  • Optional conductResearch() gated by ENABLE_NOTEBOOKLM_RESEARCH=true
  • Source URLs from trend signals seeded into NotebookLM notebook
  • Research data (briefing, sources, infographic URLs) fed into Gemini prompt
  • FALLBACK_TRENDS for graceful degradation
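The research gate is a strict string comparison — only the literal `"true"` enables it, matching the documented contract. A sketch (extracted into a testable function; the route reads `process.env` directly):

```typescript
// Sketch of the ENABLE_NOTEBOOKLM_RESEARCH gate described above;
// factored into a function for illustration.
function isResearchEnabled(env: Record<string, string | undefined>): boolean {
  // Only the exact string "true" enables the pipeline; "1", "yes",
  // or an unset variable all leave research off.
  return env.ENABLE_NOTEBOOKLM_RESEARCH === "true";
}
```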

Test Results (Local)

Full end-to-end run completed successfully:

  • 346 trend signals collected (60 YouTube, 40 HN, 40 blogs, 26 GitHub, 180 Dev.to)
  • 55 research sources found by NotebookLM deep research
  • 5 infographics generated with unique titles (architecture overview, workflow, trade-offs, etc.)
  • 850KB research report produced
  • Gemini script generated from research data, critic scored 75/100
  • Sanity documents created (contentIdea + automatedVideo)

Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| ENABLE_NOTEBOOKLM_RESEARCH | No | Set to "true" to enable the research pipeline |
| NOTEBOOKLM_AUTH_JSON | If research enabled | Auth cookies — supports raw JSON, a file path (/path/to/file), or base64 (eyJ...) |
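The three accepted NOTEBOOKLM_AUTH_JSON formats could be distinguished by their first character, as in this sketch (the real loader may use different heuristics):

```typescript
import { readFileSync } from "node:fs";

// Hedged sketch of a loader for the three NOTEBOOKLM_AUTH_JSON formats
// described above; the actual detection logic may differ.
function loadAuthCookies(value: string): Record<string, string> {
  let raw = value.trim();
  if (raw.startsWith("/")) {
    raw = readFileSync(raw, "utf8");                  // file path form
  } else if (!raw.startsWith("{")) {
    raw = Buffer.from(raw, "base64").toString("utf8"); // base64 form
  }
  return JSON.parse(raw);                              // raw JSON form
}
```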

Debug Logging

Debug logging is intentionally kept for Vercel log inspection. All NotebookLM RPC calls log URL, body length, and response size/preview.

Also Includes

  • Dashboard analytics (SiteAnalytics, Vercel packages)
  • RSS feed fixes, Cloudinary cleanup, dead code removal
  • Image quality settings update
  • Source-path fix for all notebook-scoped RPC calls
  • Fresh cookie capture from NotebookLM homepage (Google rotates session cookies)

content and others added 20 commits March 4, 2026 04:49
Adds sceneType (narration|code|list|comparison|mockup) and type-specific
objects to script.scenes for richer Remotion compositions.

- code: snippet, language, highlightLines
- list: items, icon
- comparison: leftLabel, rightLabel, rows
- mockup: deviceType, screenContent

All type-specific fields are optional and conditionally hidden in Studio
based on sceneType. Defaults to 'narration' for backward compatibility.

Coordinated with @videopipe for Remotion composition support.

Co-authored-by: content <content@miriad.systems>
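The sceneType scheme in this commit maps naturally onto a discriminated union; the field names below follow the commit message, but the real Sanity schema may differ:

```typescript
// Hedged sketch of the sceneType shape described in this commit.
type Scene =
  | { sceneType: "narration"; narration: string }
  | { sceneType: "code"; snippet: string; language: string; highlightLines?: number[] }
  | { sceneType: "list"; items: string[]; icon?: string }
  | { sceneType: "comparison"; leftLabel: string; rightLabel: string; rows: [string, string][] }
  | { sceneType: "mockup"; deviceType: string; screenContent: string };

// Defaulting to 'narration' preserves backward compatibility with
// pre-existing scenes that have no sceneType field.
function normalizeSceneType(raw?: string): Scene["sceneType"] {
  const known = ["narration", "code", "list", "comparison", "mockup"];
  return (known.includes(raw ?? "") ? raw : "narration") as Scene["sceneType"];
}
```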
fix: migration script downloads only originals, strips transformation params

- Add raw Cloudinary object detection (old-format docs without _type)
- Add stripTransformations() to remove Cloudinary URL params
- Add getOriginalUrl() to construct canonical URLs from public_id
- Prevents uploading derived variants (avif, webp, resized copies)

Re-run results: 433 clean originals uploaded (down from 6,970 with variants)

Co-authored-by: builder <builder@miriad.systems>
Deletes unreferenced Sanity assets left over from migration.
Safety: checks document references before deleting, preserves all active assets.
Supports --dry-run mode.

Co-authored-by: builder <builder@miriad.systems>
fix: RSS feed improvements — podcast iTunes support, proper enclosures, content-type headers

- Add full iTunes namespace to podcast feed (itunes:author, itunes:image,
  itunes:category, itunes:season, itunes:episode, enclosure tags)
- Create buildPodcastFeed() with hand-crafted XML for Apple Podcasts compatibility
- Add rssPodcastQuery with podcastFields (spotify, season, episode, guest)
- Fix hardcoded Cloudinary image URL in feed channel
- Fix content-type headers: text/xml → application/rss+xml
- Fix feed links to be content-type-specific (blog, podcasts, courses)
- Fix copyright year to be dynamic
- Fix YouTube feed links pointing to non-existent routes
The NotebookLM BatchExecute API requires a source-path query parameter
(/notebook/{id}) for all notebook-scoped operations. Without it, the API
returns null for research, artifacts, summary, etc.

Fixed: addSource, startResearch, pollResearch, importResearchSources,
generateInfographic, generateReport, listArtifacts, getSummary

Also adds debug logging to rpcCall and pollResearch for live API debugging.
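The source-path fix above amounts to appending one query parameter to every notebook-scoped RPC URL. A sketch (the endpoint path is illustrative):

```typescript
// Sketch of the source-path fix from this commit: notebook-scoped RPCs
// carry a source-path query parameter of the form /notebook/{id}.
// The base URL here is only an example.
function buildRpcUrl(baseUrl: string, notebookId?: string): string {
  const url = new URL(baseUrl);
  if (notebookId) {
    url.searchParams.set("source-path", `/notebook/${notebookId}`);
  }
  return url.toString();
}
```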
* fix: migration script downloads only originals, strips transformation params

- Add raw Cloudinary object detection (old-format docs without _type)
- Add stripTransformations() to remove Cloudinary URL params
- Add getOriginalUrl() to construct canonical URLs from public_id
- Prevents uploading derived variants (avif, webp, resized copies)

Re-run results: 433 clean originals uploaded (down from 6,970 with variants)

Co-authored-by: builder <builder@miriad.systems>

* chore: add orphan asset cleanup script

Deletes unreferenced Sanity assets left over from migration.
Safety: checks document references before deleting, preserves all active assets.
Supports --dry-run mode.

Co-authored-by: builder <builder@miriad.systems>

* fix: RSS feed improvements — podcast iTunes support, proper enclosures, content-type headers

- Add full iTunes namespace to podcast feed (itunes:author, itunes:image,
  itunes:category, itunes:season, itunes:episode, enclosure tags)
- Create buildPodcastFeed() with hand-crafted XML for Apple Podcasts compatibility
- Add rssPodcastQuery with podcastFields (spotify, season, episode, guest)
- Fix hardcoded Cloudinary image URL in feed channel
- Fix content-type headers: text/xml → application/rss+xml
- Fix feed links to be content-type-specific (blog, podcasts, courses)
- Fix copyright year to be dynamic
- Fix YouTube feed links pointing to non-existent routes

---------

Co-authored-by: Miriad <miriad@miriad.systems>
Co-authored-by: builder <builder@miriad.systems>
- Add @vercel/analytics and @vercel/speed-insights packages
- Create SiteAnalytics component with Vercel + conditional Umami tracking
- Add analytics to main layout and dashboard layout
- Add Umami self-hosted setup documentation

Vercel Analytics/Speed Insights work automatically on deploy.
Umami activates when NEXT_PUBLIC_UMAMI_WEBSITE_ID env var is set.
Google rotates SIDCC, __Secure-*PSIDCC, and NID cookies on every
page load. Python's httpx captures these automatically via its cookie
jar, but Node.js fetch() does not. Without the fresh cookies, all
RPC calls return error code [5] (null result).

Now parses Set-Cookie headers from the homepage GET response and
merges them into the cookie record before building the Cookie header
for subsequent RPC POST requests.
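The cookie merge described above can be sketched as parsing the name=value pair off each Set-Cookie header and overwriting the stale entry, then re-serializing for the next request (cookie names here are just examples of what Google rotates):

```typescript
// Sketch of the fresh-cookie merge from this commit; attribute parsing
// is deliberately minimal (only the leading NAME=value pair is kept).
function mergeSetCookies(
  cookies: Record<string, string>,
  setCookieHeaders: string[],
): Record<string, string> {
  const merged = { ...cookies };
  for (const header of setCookieHeaders) {
    const pair = header.split(";")[0]; // "NAME=value" before attributes
    const eq = pair.indexOf("=");
    if (eq > 0) merged[pair.slice(0, eq).trim()] = pair.slice(eq + 1);
  }
  return merged;
}

function toCookieHeader(cookies: Record<string, string>): string {
  return Object.entries(cookies).map(([k, v]) => `${k}=${v}`).join("; ");
}
```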
The createNotebook RPC response format is [title, null, uuid, ...].
We were extracting from index 0 (the title string) instead of
index 2 (the UUID). This caused all subsequent RPC calls to use
the notebook title as the source-path instead of the UUID, which
the API rejects with error code [5].

Also fixed listNotebooks to use the same correct index.
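The corrected extraction is a one-liner, but worth pinning down since the wrong index silently produced a plausible-looking string. A sketch based on the response shape in the commit message:

```typescript
// Sketch of the corrected UUID extraction; the createNotebook response
// shape [title, null, uuid, ...] comes from the commit message above.
function extractNotebookId(response: unknown[]): string {
  const id = response[2]; // index 2 is the UUID; index 0 is the title
  if (typeof id !== "string") throw new Error("unexpected createNotebook response");
  return id;
}
```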
The live API returns a triple-nested structure:
  [[[reportId, taskInfo, startTs, updateTs]]]
where taskInfo = [notebookId, [query, srcType], mode, sources, statusCode, researchMeta]

The old parsing expected a different nesting and couldn't find the
task data, always returning 'no_research'. Now correctly unwraps
the triple-nesting and extracts statusCode from taskInfo[4]
(1=in_progress, 2=completed) and taskId from researchMeta[0].
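The unwrapping described above can be sketched as three array hops plus a status-code switch; the field positions follow the commit message, and everything else is illustrative:

```typescript
// Sketch of parsing the triple-nested research status payload:
//   [[[reportId, taskInfo, startTs, updateTs]]]
//   taskInfo = [notebookId, [query, srcType], mode, sources, statusCode, researchMeta]
type ResearchStatus = "no_research" | "in_progress" | "completed";

function parseResearchStatus(payload: unknown): ResearchStatus {
  const entry = (payload as any)?.[0]?.[0]?.[0];
  const taskInfo = entry?.[1];
  if (!Array.isArray(taskInfo)) return "no_research";
  const statusCode = taskInfo[4]; // 1 = in_progress, 2 = completed
  if (statusCode === 2) return "completed";
  if (statusCode === 1) return "in_progress";
  return "no_research";
}
```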
Before starting deep research, now adds up to 10 source URLs from
trend discovery to the notebook. This gives NotebookLM real context
(blog posts, HN links, etc.) to research from, matching the workflow
shown in the video.

Also increased research timeout from 5min to 10min since deep
research can take 5-10 minutes to complete.
Extracts up to 5 source URLs from the top trending topic's signals
and passes them to conductResearch() as sourceUrls. These get added
to the NotebookLM notebook before deep research starts, giving it
real articles/blog posts to analyze.
…cts, fix nesting

- Add getSourceIds() method that fetches source IDs from notebook via GET_NOTEBOOK RPC
- generateInfographic() and generateReport() now auto-fetch source IDs if not provided
- Fix sourceIdsTriple nesting: was [[[sid]]] (4 levels), now [[sid]] (matching Python's [[[sid]] for sid in source_ids])

Co-authored-by: research <research@miriad.systems>
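The corrected nesting is easy to get wrong by one bracket, so here it is spelled out (a sketch matching the Python comprehension quoted above):

```typescript
// Sketch of the fixed source-id nesting from this commit: each id becomes
// [[sid]], so the full parameter mirrors Python's [[[sid]] for sid in source_ids].
function buildSourceIdsParam(sourceIds: string[]): string[][][] {
  return sourceIds.map((sid) => [[sid]]);
}
```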
…Urls array

- Generate 5 infographics with different angles (architecture, comparison, workflow, timeline, pros/cons)
- Change infographicUrl to infographicUrls (string array) in ResearchPayload
- Collect URLs for all completed infographics
- Generate report in parallel with infographics

Co-authored-by: research <research@miriad.systems>
…load

Co-authored-by: research <research@miriad.systems>
@vercel

vercel bot commented Mar 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Updated (UTC) |
| --- | --- | --- |
| codingcat-dev | Error | Mar 5, 2026 2:16am |

@codercatdev codercatdev merged commit b8f256a into main Mar 5, 2026
0 of 3 checks passed