From a62f7b4ef847e7a393d6f73661d2887a44401145 Mon Sep 17 00:00:00 2001
From: Luca Foppiano
Date: Thu, 12 Feb 2026 17:51:06 +0100
Subject: [PATCH 1/2] fix(doc): remove repetitions and copy-paste leftover

---
 README.md | 9 +++------
 1 file changed, 3 insertions(+), 6 deletions(-)

diff --git a/README.md b/README.md
index 7744c74..25e166b 100644
--- a/README.md
+++ b/README.md
@@ -790,8 +790,7 @@ The program then writes that one record into a local Parquet file, does a second
 ### Bonus: download a full crawl index and query with DuckDB
 
-If you want to run many of these queries, and you have a lot of disk space, you'll want to download the 300 gigabyte index and query it repeatedly. Run
-All of these scripts run the same SQL query and should return the same record (written as a parquet file).
+If you want to run many of these queries, and you have a lot of disk space, you'll want to download the 300 gigabyte index and query it repeatedly. Run:
 
 ```shell
 mkdir -p 'crawl=CC-MAIN-2024-22/subset=warc'
 aws s3 sync s3://commoncrawl/cc-index/table/cc-main/warc/crawl=CC-MAIN-2024-22/subset=warc/ 'crawl=CC-MAIN-2024-22/subset=warc'
@@ -822,7 +821,7 @@ rm cc-index-table.paths
 cd -
 ```
 
-The structure should be something like this:
+In both cases, the file structure should be something like this:
 ```shell
 tree my_data
 my_data
@@ -835,10 +834,8 @@ my_data
 ```
 
 Then, you can run `make duck_local_files LOCAL_DIR=/path/to/the/downloaded/data` to run the same query as above, but this time using your local copy of the index files.
 
-> [!IMPORTANT]
-> If you happen to be using the Common Crawl Foundation development server, we've already downloaded these files, and you can run ```make duck_ccf_local_files```
+Both `make duck_ccf_local_files` and `make duck_local_files LOCAL_DIR=/path/to/the/downloaded/data` run the same SQL query and should return the same record (written as a parquet file).
 
-All of these scripts run the same SQL query and should return the same record (written as a parquet file).
 
 ## Bonus 2: combine some steps

From 0f91b1c2106a1d9fe2d3fceec9f3faac9b345dbe Mon Sep 17 00:00:00 2001
From: Luca Foppiano
Date: Thu, 12 Feb 2026 17:54:20 +0100
Subject: [PATCH 2/2] fix(doc): move warnings above

---
 README.md | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 25e166b..84a8395 100644
--- a/README.md
+++ b/README.md
@@ -790,16 +790,18 @@ The program then writes that one record into a local Parquet file, does a second
 ### Bonus: download a full crawl index and query with DuckDB
 
-If you want to run many of these queries, and you have a lot of disk space, you'll want to download the 300 gigabyte index and query it repeatedly. Run:
+If you want to run many of these queries, and you have a lot of disk space, you'll want to download the 300 gigabyte index and query it repeatedly.
+
+> [!IMPORTANT]
+> If you happen to be using the Common Crawl Foundation development server, we've already downloaded these files, and you can run ```make duck_ccf_local_files```
+
+To download the crawl index, there are two options: if you have access to the CCF AWS buckets, run:
 
 ```shell
 mkdir -p 'crawl=CC-MAIN-2024-22/subset=warc'
 aws s3 sync s3://commoncrawl/cc-index/table/cc-main/warc/crawl=CC-MAIN-2024-22/subset=warc/ 'crawl=CC-MAIN-2024-22/subset=warc'
 ```
 
-> [!IMPORTANT]
-> If you happen to be using the Common Crawl Foundation development server, we've already downloaded these files, and you can run ```make duck_ccf_local_files```
-
 If, by any other chance, you don't have access through the AWS CLI:
 
 ```shell
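For reviewers of this series: the `make duck_local_files LOCAL_DIR=...` target that the patched README references presumably runs a DuckDB query of roughly this shape against the downloaded Parquet files. This is a hedged sketch, not the Makefile's actual query: the selected columns follow the public cc-index table schema, while the `WHERE` filter and the output filename are illustrative assumptions.

```sql
-- Sketch only: 'crawl=CC-MAIN-2024-22/subset=warc/*.parquet' is the layout
-- produced by the download steps in the README; the host filter and the
-- 'result.parquet' name are hypothetical.
COPY (
    SELECT url, warc_filename, warc_record_offset, warc_record_length
    FROM read_parquet('crawl=CC-MAIN-2024-22/subset=warc/*.parquet')
    WHERE url_host_name = 'commoncrawl.org'
    LIMIT 1
) TO 'result.parquet' (FORMAT parquet);
```

Reading the local files this way is what makes the repeated-query workflow cheap: `read_parquet` scans the downloaded index directly, so no S3 round trips are involved.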