
feat: NotebookLM research pipeline + infographic generation#601

Merged
codercatdev merged 24 commits into main from dev
Mar 5, 2026

Conversation

@codercatdev
Contributor

Summary

Merges the complete NotebookLM research pipeline into main. This adds automated trend discovery, deep research, and infographic generation to the video content pipeline.

What's New

NotebookLM Client (lib/services/notebooklm/)

  • Pure TypeScript — zero Python dependencies, direct HTTP calls to NotebookLM's BatchExecute API
  • types.ts — RPC method IDs, artifact types, interfaces
  • rpc.ts — BatchExecute encoding/decoding, shared fetchWithTimeout() and sleep()
  • auth.ts — Cookie parsing, CSRF extraction, fresh cookie capture from homepage
  • client.ts — Full client: notebook CRUD, source management, deep research, infographic/report generation, artifact polling
  • index.ts — Barrel exports
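Because the client speaks Google's batchexecute wire format directly, the heart of rpc.ts is an envelope builder that serializes one RPC call into the `f.req` form parameter. A minimal sketch of that encoding — the RPC ID is a placeholder and the real rpc.ts may batch differently:

```typescript
// Hypothetical sketch of a batchexecute envelope builder. The
// [[[rpcId, serializedArgs, null, "generic"]]] shape follows Google's
// batchexecute wire format; the actual rpc.ts implementation may differ.
function buildFReq(rpcId: string, args: unknown[]): string {
  // One RPC per batch: args are JSON-stringified *inside* the envelope.
  const envelope = [[[rpcId, JSON.stringify(args), null, "generic"]]];
  return "f.req=" + encodeURIComponent(JSON.stringify(envelope));
}
```

The double serialization (args stringified inside an outer JSON array) is the characteristic quirk of this protocol and the usual source of encoding bugs.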

Research Service (lib/services/research.ts)

  • Creates notebook → adds source URLs → deep web research → imports sources → generates 5 infographics + report → extracts summary
  • ResearchPayload interface with infographicUrls: string[] for multiple infographics
  • 10-minute research timeout (deep research takes 5-10 min)
  • Graceful degradation — every step wrapped in safeStep() try/catch
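The safeStep() pattern above can be sketched as a small wrapper that turns any failed step into a logged null instead of aborting the whole run (the signature here is a guess at the real helper):

```typescript
// Minimal sketch of the safeStep() graceful-degradation pattern described
// above; the actual helper in lib/services/research.ts may log differently.
async function safeStep<T>(name: string, fn: () => Promise<T>): Promise<T | null> {
  try {
    return await fn();
  } catch (err) {
    // A failed step (e.g. one infographic) degrades the payload, not the run.
    console.error(`[research] step "${name}" failed:`, err);
    return null;
  }
}
```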

Trend Discovery (lib/services/trend-discovery.ts)

  • 5 sources: Hacker News (aggressive web dev filtering), Dev.to, Blog RSS feeds, YouTube, GitHub Trending
  • Blog feeds: Cloudflare, Next.js, Vercel, Chrome Developers, Web.dev, Firebase
  • HN filtering: domain allowlist + keyword taxonomy + exclusion patterns (removes ~77% noise)
  • Cross-source deduplication and scoring
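Cross-source deduplication can be sketched as URL normalization plus score merging, so a story surfacing on both HN and Dev.to outranks either alone. The `TrendSignal` shape and normalization rules below are illustrative, not the actual implementation:

```typescript
// Illustrative sketch of cross-source dedup + scoring; the real
// trend-discovery types and weights may differ.
interface TrendSignal { url: string; title: string; source: string; score: number; }

function dedupeSignals(signals: TrendSignal[]): TrendSignal[] {
  const byUrl = new Map<string, TrendSignal>();
  for (const s of signals) {
    // Normalize: drop query/fragment and trailing slash, lowercase.
    const key = s.url.replace(/[?#].*$/, "").replace(/\/$/, "").toLowerCase();
    const existing = byUrl.get(key);
    if (existing) {
      existing.score += s.score; // appearing in multiple sources boosts rank
    } else {
      byUrl.set(key, { ...s });
    }
  }
  return [...byUrl.values()].sort((a, b) => b.score - a.score);
}
```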

Ingest Route (app/api/cron/ingest/route.ts)

  • discoverTrends() replaces old fetchTrendingTopics()
  • Optional conductResearch() gated by ENABLE_NOTEBOOKLM_RESEARCH=true
  • Source URLs from trend signals seeded into NotebookLM notebook
  • Research data (briefing, sources, infographic URLs) fed into Gemini prompt
  • FALLBACK_TRENDS for graceful degradation
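The research gate is a strict string comparison — only the literal `"true"` enables it, matching the documented contract. A sketch (extracted into a testable function; the route reads `process.env` directly):

```typescript
// Sketch of the ENABLE_NOTEBOOKLM_RESEARCH gate described above;
// factored into a function for illustration.
function isResearchEnabled(env: Record<string, string | undefined>): boolean {
  // Only the exact string "true" enables the pipeline; "1", "yes",
  // or an unset variable all leave research off.
  return env.ENABLE_NOTEBOOKLM_RESEARCH === "true";
}
```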

Test Results (Local)

Full end-to-end run completed successfully:

  • 346 trend signals collected (60 YouTube, 40 HN, 40 blogs, 26 GitHub, 180 Dev.to)
  • 55 research sources found by NotebookLM deep research
  • 5 infographics generated with unique titles (architecture overview, workflow, trade-offs, etc.)
  • 850KB research report produced
  • Gemini script generated from research data, critic scored 75/100
  • Sanity documents created (contentIdea + automatedVideo)

Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| ENABLE_NOTEBOOKLM_RESEARCH | No | Set to "true" to enable the research pipeline |
| NOTEBOOKLM_AUTH_JSON | If research enabled | Auth cookies — supports raw JSON, a file path (/path/to/file), or base64 (eyJ...) |
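The three accepted NOTEBOOKLM_AUTH_JSON formats could be distinguished by their first character, as in this sketch (the real loader may use different heuristics):

```typescript
import { readFileSync } from "node:fs";

// Hedged sketch of a loader for the three NOTEBOOKLM_AUTH_JSON formats
// described above; the actual detection logic may differ.
function loadAuthCookies(value: string): Record<string, string> {
  let raw = value.trim();
  if (raw.startsWith("/")) {
    raw = readFileSync(raw, "utf8");                  // file path form
  } else if (!raw.startsWith("{")) {
    raw = Buffer.from(raw, "base64").toString("utf8"); // base64 form
  }
  return JSON.parse(raw);                              // raw JSON form
}
```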

Debug Logging

Debug logging is intentionally kept for Vercel log inspection. All NotebookLM RPC calls log URL, body length, and response size/preview.

Also Includes

  • Dashboard analytics (SiteAnalytics, Vercel packages)
  • RSS feed fixes, Cloudinary cleanup, dead code removal
  • Image quality settings update
  • Source-path fix for all notebook-scoped RPC calls
  • Fresh cookie capture from NotebookLM homepage (Google rotates session cookies)

content and others added 20 commits March 4, 2026 04:49
Adds sceneType (narration|code|list|comparison|mockup) and type-specific
objects to script.scenes for richer Remotion compositions.

- code: snippet, language, highlightLines
- list: items, icon
- comparison: leftLabel, rightLabel, rows
- mockup: deviceType, screenContent

All type-specific fields are optional and conditionally hidden in Studio
based on sceneType. Defaults to 'narration' for backward compatibility.

Coordinated with @videopipe for Remotion composition support.

Co-authored-by: content <content@miriad.systems>
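The sceneType scheme in this commit maps naturally onto a discriminated union; the field names below follow the commit message, but the real Sanity schema may differ:

```typescript
// Hedged sketch of the sceneType shape described in this commit.
type Scene =
  | { sceneType: "narration"; narration: string }
  | { sceneType: "code"; snippet: string; language: string; highlightLines?: number[] }
  | { sceneType: "list"; items: string[]; icon?: string }
  | { sceneType: "comparison"; leftLabel: string; rightLabel: string; rows: [string, string][] }
  | { sceneType: "mockup"; deviceType: string; screenContent: string };

// Defaulting to 'narration' preserves backward compatibility with
// pre-existing scenes that have no sceneType field.
function normalizeSceneType(raw?: string): Scene["sceneType"] {
  const known = ["narration", "code", "list", "comparison", "mockup"];
  return (known.includes(raw ?? "") ? raw : "narration") as Scene["sceneType"];
}
```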
fix: migration script downloads only originals, strips transformation params

- Add raw Cloudinary object detection (old-format docs without _type)
- Add stripTransformations() to remove Cloudinary URL params
- Add getOriginalUrl() to construct canonical URLs from public_id
- Prevents uploading derived variants (avif, webp, resized copies)

Re-run results: 433 clean originals uploaded (down from 6,970 with variants)

Co-authored-by: builder <builder@miriad.systems>
Deletes unreferenced Sanity assets left over from migration.
Safety: checks document references before deleting, preserves all active assets.
Supports --dry-run mode.

Co-authored-by: builder <builder@miriad.systems>
fix: RSS feed improvements — podcast iTunes support, proper enclosures, content-type headers

- Add full iTunes namespace to podcast feed (itunes:author, itunes:image,
  itunes:category, itunes:season, itunes:episode, enclosure tags)
- Create buildPodcastFeed() with hand-crafted XML for Apple Podcasts compatibility
- Add rssPodcastQuery with podcastFields (spotify, season, episode, guest)
- Fix hardcoded Cloudinary image URL in feed channel
- Fix content-type headers: text/xml → application/rss+xml
- Fix feed links to be content-type-specific (blog, podcasts, courses)
- Fix copyright year to be dynamic
- Fix YouTube feed links pointing to non-existent routes
The NotebookLM BatchExecute API requires a source-path query parameter
(/notebook/{id}) for all notebook-scoped operations. Without it, the API
returns null for research, artifacts, summary, etc.

Fixed: addSource, startResearch, pollResearch, importResearchSources,
generateInfographic, generateReport, listArtifacts, getSummary

Also adds debug logging to rpcCall and pollResearch for live API debugging.
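The source-path fix above amounts to appending one query parameter to every notebook-scoped RPC URL. A sketch (the endpoint path is illustrative):

```typescript
// Sketch of the source-path fix from this commit: notebook-scoped RPCs
// carry a source-path query parameter of the form /notebook/{id}.
// The base URL here is only an example.
function buildRpcUrl(baseUrl: string, notebookId?: string): string {
  const url = new URL(baseUrl);
  if (notebookId) {
    url.searchParams.set("source-path", `/notebook/${notebookId}`);
  }
  return url.toString();
}
```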
* fix: migration script downloads only originals, strips transformation params

- Add raw Cloudinary object detection (old-format docs without _type)
- Add stripTransformations() to remove Cloudinary URL params
- Add getOriginalUrl() to construct canonical URLs from public_id
- Prevents uploading derived variants (avif, webp, resized copies)

Re-run results: 433 clean originals uploaded (down from 6,970 with variants)

Co-authored-by: builder <builder@miriad.systems>

* chore: add orphan asset cleanup script

Deletes unreferenced Sanity assets left over from migration.
Safety: checks document references before deleting, preserves all active assets.
Supports --dry-run mode.

Co-authored-by: builder <builder@miriad.systems>

* fix: RSS feed improvements — podcast iTunes support, proper enclosures, content-type headers

- Add full iTunes namespace to podcast feed (itunes:author, itunes:image,
  itunes:category, itunes:season, itunes:episode, enclosure tags)
- Create buildPodcastFeed() with hand-crafted XML for Apple Podcasts compatibility
- Add rssPodcastQuery with podcastFields (spotify, season, episode, guest)
- Fix hardcoded Cloudinary image URL in feed channel
- Fix content-type headers: text/xml → application/rss+xml
- Fix feed links to be content-type-specific (blog, podcasts, courses)
- Fix copyright year to be dynamic
- Fix YouTube feed links pointing to non-existent routes

---------

Co-authored-by: Miriad <miriad@miriad.systems>
Co-authored-by: builder <builder@miriad.systems>
- Add @vercel/analytics and @vercel/speed-insights packages
- Create SiteAnalytics component with Vercel + conditional Umami tracking
- Add analytics to main layout and dashboard layout
- Add Umami self-hosted setup documentation

Vercel Analytics/Speed Insights work automatically on deploy.
Umami activates when NEXT_PUBLIC_UMAMI_WEBSITE_ID env var is set.
Google rotates SIDCC, __Secure-*PSIDCC, and NID cookies on every
page load. Python's httpx captures these automatically via its cookie
jar, but Node.js fetch() does not. Without the fresh cookies, all
RPC calls return error code [5] (null result).

Now parses Set-Cookie headers from the homepage GET response and
merges them into the cookie record before building the Cookie header
for subsequent RPC POST requests.
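The cookie merge described above can be sketched as parsing the name=value pair off each Set-Cookie header and overwriting the stale entry, then re-serializing for the next request (cookie names here are just examples of what Google rotates):

```typescript
// Sketch of the fresh-cookie merge from this commit; attribute parsing
// is deliberately minimal (only the leading NAME=value pair is kept).
function mergeSetCookies(
  cookies: Record<string, string>,
  setCookieHeaders: string[],
): Record<string, string> {
  const merged = { ...cookies };
  for (const header of setCookieHeaders) {
    const pair = header.split(";")[0]; // "NAME=value" before attributes
    const eq = pair.indexOf("=");
    if (eq > 0) merged[pair.slice(0, eq).trim()] = pair.slice(eq + 1);
  }
  return merged;
}

function toCookieHeader(cookies: Record<string, string>): string {
  return Object.entries(cookies).map(([k, v]) => `${k}=${v}`).join("; ");
}
```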
The createNotebook RPC response format is [title, null, uuid, ...].
We were extracting from index 0 (the title string) instead of
index 2 (the UUID). This caused all subsequent RPC calls to use
the notebook title as the source-path instead of the UUID, which
the API rejects with error code [5].

Also fixed listNotebooks to use the same correct index.
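The corrected extraction is a one-liner, but worth pinning down since the wrong index silently produced a plausible-looking string. A sketch based on the response shape in the commit message:

```typescript
// Sketch of the corrected UUID extraction; the createNotebook response
// shape [title, null, uuid, ...] comes from the commit message above.
function extractNotebookId(response: unknown[]): string {
  const id = response[2]; // index 2 is the UUID; index 0 is the title
  if (typeof id !== "string") throw new Error("unexpected createNotebook response");
  return id;
}
```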
The live API returns a triple-nested structure:
  [[[reportId, taskInfo, startTs, updateTs]]]
where taskInfo = [notebookId, [query, srcType], mode, sources, statusCode, researchMeta]

The old parsing expected a different nesting and couldn't find the
task data, always returning 'no_research'. Now correctly unwraps
the triple-nesting and extracts statusCode from taskInfo[4]
(1=in_progress, 2=completed) and taskId from researchMeta[0].
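The unwrapping described above can be sketched as three array hops plus a status-code switch; the field positions follow the commit message, and everything else is illustrative:

```typescript
// Sketch of parsing the triple-nested research status payload:
//   [[[reportId, taskInfo, startTs, updateTs]]]
//   taskInfo = [notebookId, [query, srcType], mode, sources, statusCode, researchMeta]
type ResearchStatus = "no_research" | "in_progress" | "completed";

function parseResearchStatus(payload: unknown): ResearchStatus {
  const entry = (payload as any)?.[0]?.[0]?.[0];
  const taskInfo = entry?.[1];
  if (!Array.isArray(taskInfo)) return "no_research";
  const statusCode = taskInfo[4]; // 1 = in_progress, 2 = completed
  if (statusCode === 2) return "completed";
  if (statusCode === 1) return "in_progress";
  return "no_research";
}
```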
Before starting deep research, now adds up to 10 source URLs from
trend discovery to the notebook. This gives NotebookLM real context
(blog posts, HN links, etc.) to research from, matching the workflow
shown in the video.

Also increased research timeout from 5min to 10min since deep
research can take 5-10 minutes to complete.
Extracts up to 5 source URLs from the top trending topic's signals
and passes them to conductResearch() as sourceUrls. These get added
to the NotebookLM notebook before deep research starts, giving it
real articles/blog posts to analyze.
…cts, fix nesting

- Add getSourceIds() method that fetches source IDs from notebook via GET_NOTEBOOK RPC
- generateInfographic() and generateReport() now auto-fetch source IDs if not provided
- Fix sourceIdsTriple nesting: was [[[sid]]] (4 levels), now [[sid]] (matching Python's [[[sid]] for sid in source_ids])

Co-authored-by: research <research@miriad.systems>
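The corrected nesting is easy to get wrong by one bracket, so here it is spelled out (a sketch matching the Python comprehension quoted above):

```typescript
// Sketch of the fixed source-id nesting from this commit: each id becomes
// [[sid]], so the full parameter mirrors Python's [[[sid]] for sid in source_ids].
function buildSourceIdsParam(sourceIds: string[]): string[][][] {
  return sourceIds.map((sid) => [[sid]]);
}
```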
…Urls array

- Generate 5 infographics with different angles (architecture, comparison, workflow, timeline, pros/cons)
- Change infographicUrl to infographicUrls (string array) in ResearchPayload
- Collect URLs for all completed infographics
- Generate report in parallel with infographics

Co-authored-by: research <research@miriad.systems>
…load

Co-authored-by: research <research@miriad.systems>
@vercel

vercel bot commented Mar 5, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Updated (UTC) |
| --- | --- | --- |
| codingcat-dev | Error | Mar 5, 2026 2:16am |

@codercatdev codercatdev merged commit b8f256a into main Mar 5, 2026
0 of 3 checks passed