Use Public Suffix List? #41

@koromodako

Hi!

First of all, I want to thank you for this project, I rely on Tranco every day. 👍

I'm opening this issue to track a problem we have run into. I think fixing it would benefit the project.

Here is the issue. In a report like this one, we can see that threat actors register names such as:

  • ac-connection-status105.azurewebsites.net
  • active-az-status45.azurewebsites.net

These names (chosen at random) have nothing to do with other names (also chosen at random) such as:

  • sigaa-formation.azurewebsites.net
  • contactessca.azurewebsites.net

If we search for azurewebsites.net in the Tranco ranking, we find a single entry: azurewebsites.net (ranked 161 at the time of writing).

This looks like a big blind spot in the Tranco ranking, and the example above is probably one of many.

I suppose this problem could be solved by matching names against the PSL (Public Suffix List) directly instead of relying on tldextract.

Here is the logic I suggest:

  1. Fetch the PSL
  2. Parse it to extract all suffixes
  3. Match names against these suffixes instead of using tldextract

This should help solve the issue: azurewebsites.net is listed as a suffix in the PSL, so the names in the example above would each receive an individual ranking, along with others not listed here.
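To make the three steps concrete, here is a minimal sketch in Python using only the standard library. It is not Tranco's code. The function names (`fetch_psl`, `parse_psl`, `registrable_domain`) are hypothetical, and `SAMPLE_PSL` is a tiny hand-written excerpt in the PSL file format, used so the example runs offline. `PSL_URL` is the list's published location, but the fetch step is only shown, not exercised.

```python
import urllib.request

# Published location of the list; fetch_psl() is shown for step 1 only.
PSL_URL = "https://publicsuffix.org/list/public_suffix_list.dat"

# Hand-written excerpt for illustration (assumption: not the real list).
SAMPLE_PSL = """
// ===BEGIN ICANN DOMAINS===
com
net
co.uk
*.ck
!www.ck
// ===BEGIN PRIVATE DOMAINS===
azurewebsites.net
"""

def fetch_psl(url=PSL_URL):
    """Step 1: download the PSL as text."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8")

def parse_psl(text):
    """Step 2: return (rules, exceptions) as sets of lowercase strings."""
    rules, exceptions = set(), set()
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and comments
        rule = line.split()[0].lower()
        if rule.startswith("!"):
            exceptions.add(rule[1:])
        else:
            rules.add(rule)
    return rules, exceptions

def suffix_length(labels, rules, exceptions):
    """Number of trailing labels that form the public suffix."""
    n = len(labels)
    # Exception rules take priority; the suffix is the rule minus its first label.
    for i in range(n):
        if ".".join(labels[i:]) in exceptions:
            return n - i - 1
    # Otherwise the longest matching rule wins (exact or wildcard).
    for i in range(n):
        exact = ".".join(labels[i:])
        wildcard = ".".join(["*"] + labels[i + 1:])
        if exact in rules or wildcard in rules:
            return n - i
    return 1  # implicit default rule "*"

def registrable_domain(name, rules, exceptions):
    """Step 3: public suffix plus one label, or None if name is itself a suffix."""
    labels = name.lower().rstrip(".").split(".")
    k = suffix_length(labels, rules, exceptions)
    if len(labels) <= k:
        return None
    return ".".join(labels[-(k + 1):])

rules, exceptions = parse_psl(SAMPLE_PSL)
for name in ("ac-connection-status105.azurewebsites.net",
             "www.sigaa-formation.azurewebsites.net"):
    print(name, "->", registrable_domain(name, rules, exceptions))
```

With the private `azurewebsites.net` rule present, each site keeps its own registrable name. If that line is dropped from the sample, both names collapse to `azurewebsites.net`, which is the blind spot described above.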

Let me know what you think of this.
