How it's leaky abstraction?
output_path is the path to store all the pechas created by the formatters. It defaults to ~/.openpecha/pechas/
But in OCRFormatter's create_opf, the output_path is passed as path of OpenPecha which is opf_path
now, the caller code, eg OCR-pipelines, needs to create opf_path for pecha, which in turn requires to create pecha_id. But the pecha_id generation is handled by Metadata, which is only created in the Formatters. So, with currently implementation, pecha will be saved at opf_path created by caller code. Since, it's doesn't have access to Metadata creation, metadata will generate new pecha_id. Now, we ended up with different, pecha_id in opf_path and `meta.yml.
Therefore, I think this is a leaky abstraction. The caller code only needs to provide where to store the all the pechas to the Formatters.
Why this problem exists?
I think, there is two scenarios, creating and loading pecha and we don't have clean way to handle these two.
Solution
1. When Creating new pechas
We should initialise the pecha object with the actual data like, base, layers, metadata, etc.
When saving, we should provide output_path which is the parent path of the pecha path like {output_path}/{pecha_id}/{pecha_id}.opf. Now the output_path is configurable, which is desired behaviour.
class OpenPechaFS(OpenPecha):
...
@property
def opf_path(self):
return self.base_path / self.pecha_id / f"{self.pecha_id}.opf"
def save(output_path: Path) -> Path:
self.base_path = output_path
...
return self.opf_path
2. When Loading existing pechas
I think we should go with classmethods, for eg:
pecha = OpenPechaFS.from_path(<path_to_pecha>) # loads local pecha
pecha = OpenPechaFS.from_id(<pecha_id>) # downloads pecha from github
pecha = OpenPechaGitRepo.from_commit_sha(<commit_sha>) # loads pecha from specific commit sha
How it's leaky abstraction?
output_pathis the path to store all the pechas created by the formatters. It defaults to~/.openpecha/pechas/But in OCRFormatter's
create_opf, theoutput_pathis passed aspathofOpenPechawhich isopf_pathnow, the caller code, eg
OCR-pipelines, needs to createopf_pathfor pecha, which in turn requires to createpecha_id. But thepecha_idgeneration is handled byMetadata, which is only created in theFormatters. So, with currently implementation, pecha will be saved atopf_pathcreated by caller code. Since, it's doesn't have access toMetadatacreation, metadata will generate new pecha_id. Now, we ended up with different,pecha_idinopf_pathand `meta.yml.Therefore, I think this is a leaky abstraction. The caller code only needs to provide where to store the all the pechas to the Formatters.
Why this problem exists?
I think, there is two scenarios,
creatingandloadingpecha and we don't have clean way to handle these two.Solution
1. When
Creatingnew pechasWe should initialise the
pechaobject with the actual data like,base,layers,metadata, etc.When saving, we should provide
output_pathwhich is the parent path of the pecha path like{output_path}/{pecha_id}/{pecha_id}.opf. Now theoutput_pathis configurable, which is desired behaviour.2. When
Loadingexisting pechasI think we should go with
classmethods, for eg: