Leaky abstraction in `OpenPechaFS`'s `path` arg #226

How is it a leaky abstraction?
---
[`output_path`](https://github.com/OpenPecha/Toolkit/blob/master/openpecha/formatters/formatter.py#L116) is the directory where all the pechas created by the formatters are stored. It defaults to `~/.openpecha/pechas/`.

But in `OCRFormatter`'s [`create_opf`](https://github.com/OpenPecha/Toolkit/blob/master/openpecha/formatters/ocr/ocr.py#L700), the `output_path` is passed as the `path` of `OpenPecha`, which is the [`opf_path`](https://github.com/OpenPecha/Toolkit/blob/master/openpecha/core/pecha.py#L231), i.e. the path of a single pecha rather than the directory that holds all pechas.

Now the caller code, e.g. [`OCR-pipelines`](https://github.com/OpenPecha/OCR-Pipelines/blob/main/ocr_pipelines/parser.py#L66), needs to create the `opf_path` for the pecha, which in turn requires creating a `pecha_id`. But `pecha_id` generation is handled by [`Metadata`](https://github.com/OpenPecha/Toolkit/blob/master/openpecha/core/metadata.py#L100-L105), which is only created in the `Formatters`. So, with the current implementation, the pecha is saved at the `opf_path` created by the caller code. Since the caller doesn't have access to `Metadata` creation, the metadata generates a new `pecha_id`. We end up with different `pecha_id`s in `opf_path` and `meta.yml`.

Therefore, I think this is a leaky abstraction. The caller code should only need to tell the Formatters where to store all the pechas.
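
To make the mismatch concrete, here is a minimal sketch of the flow described above. None of these names come from the Toolkit: `new_pecha_id`, `SketchMetadata`, `caller_make_opf_path` and `formatter_create_opf` are hypothetical stand-ins, and the ID format is only an assumption.

```python
import uuid
from pathlib import Path
from typing import Tuple


def new_pecha_id() -> str:
    # Hypothetical stand-in for the toolkit's ID generator (format is assumed).
    return "I" + uuid.uuid4().hex[:8].upper()


class SketchMetadata:
    # Hypothetical stand-in for `Metadata`: it generates its own ID on creation.
    def __init__(self) -> None:
        self.id = new_pecha_id()


def caller_make_opf_path(output_path: Path) -> Path:
    # The caller has no access to the metadata, so it invents an ID for the path.
    pecha_id = new_pecha_id()
    return output_path / pecha_id / f"{pecha_id}.opf"


def formatter_create_opf(opf_path: Path) -> Tuple[str, str]:
    # The formatter creates the metadata, which generates a second, unrelated ID.
    meta = SketchMetadata()
    return opf_path.stem, meta.id


path_id, meta_id = formatter_create_opf(caller_make_opf_path(Path("pechas")))
print(path_id, meta_id)  # two independently generated IDs for the same pecha
```

The ID in the path and the ID in the metadata are generated independently, so nothing ties them together.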

Why does this problem exist?
---
I think there are two scenarios, `creating` and `loading` a pecha, and we don't have a clean way to handle the two.


Solution
---
**1. When `Creating` new pechas**

We should initialise the `pecha` object with the actual data, like `base`, `layers`, `metadata`, etc.
When saving, we should provide `output_path`, which is the parent directory of the pecha path, as in `{output_path}/{pecha_id}/{pecha_id}.opf`. The `output_path` stays configurable, which is the desired behaviour.

```python
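from pathlib import Path


class OpenPechaFS(OpenPecha):
    ...

    @property
    def opf_path(self) -> Path:
        # Derived from the pecha's own ID, so the path and `meta.yml` cannot disagree.
        return self.base_path / self.pecha_id / f"{self.pecha_id}.opf"

    def save(self, output_path: Path) -> Path:
        # `output_path` is the parent directory that holds all pechas.
        self.base_path = output_path
        ...
        return self.opf_path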
```
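
As a rough, runnable illustration of that proposal, the sketch below uses a hypothetical `SketchPecha` class, not the real `OpenPechaFS`. The file layout it writes (a one-line `meta.yml` and a `base.txt`) is a simplification I'm assuming for the example. The point is only that the caller passes the parent directory and the pecha derives its own path from its ID.

```python
import tempfile
from pathlib import Path


class SketchPecha:
    # Hypothetical, simplified stand-in for `OpenPechaFS`.
    def __init__(self, pecha_id: str, base: str) -> None:
        self.pecha_id = pecha_id
        self.base = base
        self.base_path = None  # set when the pecha is saved

    @property
    def opf_path(self) -> Path:
        return self.base_path / self.pecha_id / f"{self.pecha_id}.opf"

    def save(self, output_path: Path) -> Path:
        self.base_path = Path(output_path)
        self.opf_path.mkdir(parents=True, exist_ok=True)
        # Assumed, simplified layout: the ID is written from the same attribute
        # that `opf_path` is built from.
        (self.opf_path / "meta.yml").write_text(f"id: {self.pecha_id}\n", encoding="utf-8")
        (self.opf_path / "base.txt").write_text(self.base, encoding="utf-8")
        return self.opf_path


def save_and_check(pecha_id: str) -> bool:
    # Saves into a temporary directory and compares the ID in the path
    # with the ID written to `meta.yml`.
    with tempfile.TemporaryDirectory() as tmp:
        opf_path = SketchPecha(pecha_id, base="some text").save(Path(tmp))
        meta_text = (opf_path / "meta.yml").read_text(encoding="utf-8")
        meta_id = meta_text.split(": ")[1].strip()
        return opf_path.stem == meta_id == pecha_id
```

With this shape the caller never has to build `opf_path` or know the `pecha_id` in advance.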

**2. When `Loading` existing pechas**

I think we should go with `classmethods`, e.g.:

```python
pecha = OpenPechaFS.from_path(<path_to_pecha>)  # loads local pecha
```

```python
pecha = OpenPechaFS.from_id(<pecha_id>)  # downloads pecha from github
```

```python
pecha = OpenPechaGitRepo.from_commit_sha(<commit_sha>)  # loads pecha from specific commit sha
```
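
A minimal sketch of how such constructors could look, using a hypothetical `SketchPechaFS` class. It assumes the pecha directory is named `{pecha_id}.opf`, as in the path layout above, and it leaves the download step of `from_id` unimplemented rather than guessing at it.

```python
from pathlib import Path
from typing import Optional


class SketchPechaFS:
    # Hypothetical stand-in; the real class would also load base, layers, etc.
    def __init__(self, pecha_id: str, opf_path: Optional[Path] = None) -> None:
        self.pecha_id = pecha_id
        self.opf_path = opf_path

    @classmethod
    def from_path(cls, path) -> "SketchPechaFS":
        # Loads an existing local pecha; the ID is read from the directory name.
        opf_path = Path(path)
        if not opf_path.is_dir():
            raise FileNotFoundError(f"No pecha found at {opf_path}")
        return cls(pecha_id=opf_path.stem, opf_path=opf_path)

    @classmethod
    def from_id(cls, pecha_id: str) -> "SketchPechaFS":
        # The download from GitHub is intentionally left out of this sketch.
        raise NotImplementedError("download step omitted in this sketch")
```

Keeping loading in classmethods leaves `__init__` free for the creation case, so the two scenarios no longer share one `path` argument.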




