Using machine learning, draw boundaries around text and logos so as to classify the information on a parent-child basis.

Green: Title
Red: Text-1 (parent:green)
Pink: Text-2 (parent:green)
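As a rough illustration of the parent-child output described by the legend above, here is a minimal Python sketch. The class name `Node`, the labels "title" and "text", and the flattening helper `to_parented_list` are all hypothetical, chosen only to show one possible shape for "a list of sub-text parented".

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    """One classified region: its class label, its text, and its child regions."""
    label: str  # hypothetical class names, e.g. "title" or "text"
    text: str
    children: List["Node"] = field(default_factory=list)


def to_parented_list(root: Node, parent: Optional[str] = None) -> List[Tuple[str, str, Optional[str]]]:
    """Flatten the tree depth-first into (text, label, parent_text) rows."""
    rows = [(root.text, root.label, parent)]
    for child in root.children:
        rows.extend(to_parented_list(child, root.text))
    return rows


# The legend above, expressed as a tree: Green (title) owns Red and Pink.
title = Node("title", "Title", [Node("text", "Text-1"), Node("text", "Text-2")])
for row in to_parented_list(title):
    print(row)
```

Flattening into rows keeps the result easy to serialize (CSV, JSON), while the tree form keeps deeper nesting possible if sub-titles turn up.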
Raw HTML could be used as input, since the output is not expected to be a drawn image but rather a list of text elements, each tagged with its parent.
However, I think it is best to assume the HTML will not be reliable, so screenshots are the better input.
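If screenshots are used, a detector would first produce labeled boxes, and a separate step would then decide which box is the parent of which. The sketch below covers only that second step, under loud assumptions: the boxes, their labels and their OCR'd text are assumed to come from some model (not shown), and the parent rule is a simple geometric heuristic swapped in for illustration ("the nearest title above that overlaps horizontally"), not a learned method. `Box` and `assign_parents` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Box:
    label: str  # assumed detector class: "title" or "text"
    text: str   # assumed OCR output for the region
    x0: float   # left edge in pixels
    y0: float   # top edge in pixels (image coordinates: y grows downward)
    x1: float   # right edge
    y1: float   # bottom edge


def assign_parents(boxes: List[Box]) -> List[Tuple[str, Optional[str]]]:
    """Heuristic: a text box's parent is the closest title whose bottom edge is
    above the text box's top edge and which overlaps it horizontally."""
    titles = [b for b in boxes if b.label == "title"]
    result = []
    for b in boxes:
        if b.label == "title":
            result.append((b.text, None))  # titles are treated as roots
            continue
        candidates = [
            t for t in titles
            if t.y1 <= b.y0 and t.x0 < b.x1 and b.x0 < t.x1  # above and overlapping in x
        ]
        # The candidate with the lowest bottom edge is the one closest above.
        parent = max(candidates, key=lambda t: t.y1, default=None)
        result.append((b.text, parent.text if parent else None))
    return result


boxes = [
    Box("title", "Title", 10, 10, 300, 40),
    Box("text", "Text-1", 10, 50, 140, 200),
    Box("text", "Text-2", 160, 50, 300, 200),
]
print(assign_parents(boxes))
```

A rule like this breaks down on multi-column or irregular layouts, which is where a trained model for the relationships themselves would be worth the extra effort.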