Hi!
First of all, thank you for this project; I rely on Tranco every day. 👍
I'm opening this issue to track a problem we have, and I think fixing it would benefit the project.
Here is the issue. When we look at a report like this one, we can see that threat actors register names like:
- `ac-connection-status105.azurewebsites.net`
- `active-az-status45.azurewebsites.net`
These (randomly chosen) names have nothing to do with other (also randomly chosen) names under the same domain, such as:
- `sigaa-formation.azurewebsites.net`
- `contactessca.azurewebsites.net`
Yet if we search for `azurewebsites.net` in the Tranco ranking, we find a single entry: `azurewebsites.net` (ranked 161 at the time of writing).
This looks like a BIG blind spot in the Tranco ranking: all of these unrelated sites appear to be folded into that one entry. The example above is probably one of many.
I suppose this problem could be solved by matching names against the Public Suffix List (PSL) directly instead of relying on `tldextract`.
Here is the logic I suggest:
- Fetch the PSL
- Parse it to extract all suffixes
- Match names against these suffixes instead of using `tldextract`
This should help with this issue: `azurewebsites.net` is listed as a suffix in the PSL, so the names in the example above (along with many others not described here) would each receive their own ranking.
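For illustration, here is a minimal Python sketch of the three steps, assuming the PSL is downloaded from its usual location (`https://publicsuffix.org/list/public_suffix_list.dat`) and that hostnames are already lowercase ASCII/punycode. The function names (`fetch_psl`, `parse_psl`, `registrable_domain`) are hypothetical and not part of Tranco or `tldextract`. One detail that may matter: `tldextract` is itself built on the PSL but, by default, ignores the PSL's private section (where `azurewebsites.net` is listed) unless `include_psl_private_domains=True` is set, so this sketch keeps that section and makes it switchable for comparison.

```python
from urllib.request import urlopen

# Usual download location of the Public Suffix List (assumption for this sketch)
PSL_URL = "https://publicsuffix.org/list/public_suffix_list.dat"
PRIVATE_MARKER = "// ===BEGIN PRIVATE DOMAINS==="


def fetch_psl(url: str = PSL_URL) -> str:
    """Step 1 (hypothetical helper): download the raw PSL text. Needs network access."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8")


def parse_psl(text: str, include_private: bool = True):
    """Step 2 (hypothetical helper): split the PSL into normal, wildcard and exception rules.

    With include_private=False, parsing stops at the private section,
    which mimics an ICANN-only view of the list.
    """
    rules, wildcards, exceptions = set(), set(), set()
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(PRIVATE_MARKER) and not include_private:
            break
        if not line or line.startswith("//"):
            continue  # blank line or comment
        rule = line.split()[0].lower()
        if rule.startswith("!"):
            exceptions.add(rule[1:])   # e.g. "!city.kawasaki.jp"
        elif rule.startswith("*."):
            wildcards.add(rule[2:])    # e.g. "*.kawasaki.jp" stored as "kawasaki.jp"
        else:
            rules.add(rule)
    return rules, wildcards, exceptions


def registrable_domain(host, rules, wildcards, exceptions):
    """Step 3 (hypothetical helper): public suffix plus one label, or None.

    Follows the PSL matching algorithm: an exception rule wins outright,
    otherwise the longest matching rule wins, otherwise the implicit "*" rule.
    """
    labels = host.lower().rstrip(".").split(".")
    n = len(labels)
    suffix_len = 1  # implicit "*" rule: the last label is the suffix
    for i in range(n):
        if ".".join(labels[i:]) in exceptions:
            suffix_len = n - i - 1  # exception: drop the rule's leftmost label
            break
    else:
        # Scan from the longest candidate to the shortest; first hit is the longest match
        for i in range(n):
            candidate = ".".join(labels[i:])
            parent = ".".join(labels[i + 1:])
            if candidate in rules or (i + 1 < n and parent in wildcards):
                suffix_len = n - i
                break
    if suffix_len >= n:
        return None  # the host is itself a public suffix
    return ".".join(labels[n - suffix_len - 1:])


# Tiny offline excerpt shaped like the real list, for demonstration only
SAMPLE_PSL = """\
// ===BEGIN ICANN DOMAINS===
net
// ===END ICANN DOMAINS===
// ===BEGIN PRIVATE DOMAINS===
azurewebsites.net
// ===END PRIVATE DOMAINS===
"""

if __name__ == "__main__":
    full = parse_psl(SAMPLE_PSL)
    icann_only = parse_psl(SAMPLE_PSL, include_private=False)
    for h in ["ac-connection-status105.azurewebsites.net", "contactessca.azurewebsites.net"]:
        print(h, "| ICANN only:", registrable_domain(h, *icann_only),
              "| with private:", registrable_domain(h, *full))
```

With the ICANN-only view, every host above collapses to `azurewebsites.net`; with the private section included, each host is its own registrable domain. Against the real list, `parse_psl(fetch_psl())` would be used instead of the sample.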
Let me know what you think of this.