Finish moving vectorized plot routines to pymathics.vectorizedplot#1736
Finish moving vectorized plot routines to pymathics.vectorizedplot#1736
Conversation
…orizedplot module in Mathics-core
|
@mmatera can you help me out by showing me the line of Python code I would need to load the module? |
Sure. Please give me the link to the project where you use it. |
For example, in MathicsDjango, you can put a or any other setup command. This file is then loaded by the line https://github.com/Mathics3/mathics-django/blob/8a94c365a3fb54e5e88299757247be49384d91b5/mathics_django/web/models.py#L39 |
|
@mmatera thanks for the info on how to programmatically load the required module. I applied that to the relevant repo (bdlucas1/math-plotting-demo) and things seem to work, as far as it goes. I haven't looked at the code in great detail, and I'm taking it on faith that the testing is still done properly. However a couple concerns: The module doesn't seem to have the three new plotting functions that I submitted PRs for. I assume the intent is to put them there? Can that be done before this PR is merged so that it is still possible to run them (as I think those branches will no longer work once front ends like m3d are updated to use the module instead of the flag). A more general concern: I noticed there's a bunch of code in the vectorized module that is duplicated with some small changes (e.g defaults for vectorized vs non-vectorized) from core (although maybe there are more differences, it's hard to tell). I think this is indicative of the breadth of the interface between the module and core, and the difficulty of defining and maintaining a stable API between the two, which I think will be an ongoing maintenance headache. (Of course that's just reiterating ground we already covered in the meeting, so I'm aware of your position.) |
Tests are what they used to be, except now I have two "sessions", one that loads the vectorizedplot library, and the other that uses the "native" code in core. So the tests should be as good as the original ones.
For preparing the module, I poke the code from this branch: https://github.com/Mathics3/mathics-core/tree/all_vectorized_plot_PRs
We can move more evaluation code from the module to Mathics-core and vice versa. What I do not like about the code which is currently in master is the use of an environment variable to choose the branch of the plotting code to be used on each particular case. I am a physicist, not a system engineer, so in the end, I rely on @rocky and your opinions. But from my perspective, -someone who writes simulation programs in WMA and Python, and a code voyeur- the need for all this code duplication seems to be a design issue: common code for vectorized and non-vectorized plotting should be in functions that do not depend on which implementation we are using. These functions should belong to mathics-core, right?
|
|
Sorry, my mistake. What is missing is that the tests I added in the PRs for the 3 new plotting functions are not in vec_tests.yaml in core. (When I didn't see those tests running I mistook that for meaning that the 3 new functions themselves were not in the module.) They are in the tests directory in the module, but do they belong there if the tests are being run from core?
If it's the environment variable per se that you object to, it is not an essential part of the design - I don't believe I use it any more, using instead the plot.use_vectorized_plot global variable. Another mechanism that could be used is a global system setting (sorry, not sure if that's what it's called in WMA) - there's plenty of precedent for things like $Language, $MinPrecision, $Pre/$Post etc. that affect the behavior of built-in functions. Regarding code duplication - ideally yes the default position should be that code shared between core and the module should stay in core instead of being duplicated into the module (as is the case with the module currently). But that means that the interface to that shared code in core becomes part of the stable (?!) API. The "surface area" of that API, and also its maturity (is this code that is likely to change?) is a key consideration for whether something is a good candidate for becoming a separate module. My gut feeling, not having spent any time looking into what such a stable API would look like (albeit being the author of that code) is that of course it can work as a module, but these issues are going to make that approach a maintenance headache. |
Yes, you are right, I forgot to put here the new YAML file. I will add it later.
If removing the environment variable is an improvement, this switch is something that we are not fully supporting in the next release of Mathics-core, because in many ways, the support is still incomplete:
Since @rocky wants to do a release in the next month, I do not see feasible to fix all of this (and other things that I am forgetting right now) before the release. So we can make progress in the module, and eventually move what makes more sense to mathics-core.
Indeed, there are many mechanisms. But for this kind of big things, the mechanism in WMA is to put the functionality in packages. Packages that can be
Currently, most of the API required for putting symbols outside is relatively stable. Most of the changes that have been made now was related to MakeBoxes rules (required for this specific module), and I hope we do not need to do major changes for a while. Other packages like NetworkX and NaturalLanguage shares the same interface, and required not too much effort to keep them updated. |
Yes. I am trying to be a bit heads down on getting a release out. That is one reason I haven't been tracking this closely and probably won't until the release is out. We still have several steps to go through before release. |
|
@mmatera thanks for moving the additional 3D tests here. |
|
I took a closer look at how the module has been split out from core, (Let me know if I've misunderstood the code or if you want more detail).
The first two can be addressed by removing the dead/duplicate code. The third class would involve removing the duplicated shared code from Having done that, those shared utility and base classes would become |
In the short term, specifically, until the next release, we will use the Mathics3 vector Module, removing any duplicated code that exists. In the longer term, things will move into mathics-core. It can be at whatever pace we want. I imagine that the Mathics3 Module will be removed when things are more complete across the board. Basically, now is not the right time for changes to the vector code in core, except to remove duplicate code. |
The code is going to stay split into a module in the short term. This is not something that is new. We discussed and decided that. Some thoughts about the "then" part, which forms the bulk of the comment. First, I think we all know this, but to be explicit about this: dead code here is dead because it has been duplicated in the module. If there is code that happens to be dead because it was dead previously, or for some other reason, it would be nice to know, and we should remove that. So largely, "duplicate" is the same as "dead and duplicate". Again, as we probably all know, having duplicate code is harmful because there needs to be effort to keep the two in sync for each change. If they become out of sync, which often happens, there is ambiguity and confusion as to which one is the correct code. Which code is used might even change depending on context. And I think we all agree that "dead code" is bad for reasons I think we all know. But in my view, while removing the duplicate/dead code is a nice thing to do, it is a lower priority. If someone feels strongly about removing it or has the time, please put in a PR to remove it. Sometime before release, I am sure a pass will be made over to address this. But it does not impact merging this in. Removing dead/duplicate code is a nice to have, but not necessary. It can even happen after release. On the other hand, merging this in is necessary for release. |
That is true for the my first and second bullet points (which are of less concern to me), but not for the third, nor for the tests (which I mentioned previously but didn't reiterate in my last comment): this is genuinely code that has been duplicated into the module instead of being used by the module from core, and is still used in both places, each using their own separate copy of the code. To be honest, on a personal note, it makes me sad that the effort I put into de-duplicating code has been considerably undone here. |
Thanks for pointing this out. I missed this.
There are ways, down the line, to clean this up or remove most of the duplication. But now is not the time to worry about this.
I do not understand. If you go through a small, specific, detailed example, maybe I might understand. Ensuring that people working on the project do not put in wasted effort is a big concern. |
I mentioned several examples in my third bullet point of duplicated plotting code that is present and used both in core and in the module. (Probably not an exhaustive list). Here is a list I generated of functions, variables, and classes related to plotting that appear both in core and in the module. This was generated by grepping for function, class, and variable definitions, so doesn't distinguish between dead and live duplicated code, and could have (hopefully few) false positives or false negatives. After removing the dead code (which should be easy), what remains should be live duplicated code. 2 _NUM_RE = |
|
In the list, there are two kind of duplications. One duplication comes from what is reimplemented in the module. These reimplementations could be done by class inheritance, but in the sake of make the track of changes more easy/clear, I prefer to duplicate the clases. The other part are the evaluation / auxiliar functions. One part of it are related to vectorized/compile function evaluations. Others with option processing. And others with building the vectorized version of Graphics. The problem is that all of them are quite interwined, so split what belongs to core and what to the vectorizedplot module would require more time that what I have right now. For that part, I expected to have some help from @bdlucas1, who was more familiar to what each piece of code does. In any case, any further step in splitting code requires some discussion. |
If you're referring for example to _Plot, Plot, and Plot3D, yes, the correct way to specialize them (to the extent needed - almost all of the code in those classes is not specific to core vs module) is by subclassing to override the very few things that need to be different (e.g. default values for some parameters). I'm surprised you view duplicating the code as preferable.
If you're referring to things like GraphicsGenerator and PlotOptions, nothing should or needs to be split - those belong in core so they can be used by both core plotting functions and by the module. Their whole point is to avoid code duplication among the various plotting routines, so yes, they support both classic and vectorized plotting routines with the same code (parameterized where specialization is needed), which have very large amounts of commonality in graphics generation and plot option processing, so that no splitting is needed or wanted. GraphicsGenerator - the code in the module and in core is identical. There is no reason the module could not call the code in core. PlotOptions - the versions in core and the module should be identical, but on further inspection just now it appears they have diverged, (I think) because you used the code from the PRs I submitted in December to create the module but didn't apply those PRs to core before pulling out the vectorized functions into a module, so the version in core is older than the version in the module. In any case if those PRs had been applied to core before separating out the module, PlotOptions would be identical between core and the module, and the module could just use the shared code in core, without duplicating it. This is the kind of divergence that code duplication risks, and once duplication has occurred and the code allowed to diverge, it takes a large effort to re-merge. I know this from exactly the painful experience that went into creating those shared classes from the several almost-but-subtly-and-confusingly-different versions of code like that was in the codebase, and which I hoped to remedy by creating things like PlotOptions and GraphicsGenerator that could be shared code used by all plotting routines. I haven't taken a close look at anything else, but would not be surprised to find more instances of duplication-with-divergence between core and the module. This makes me even sadder now that I see that the very thing I was trying to remedy - code duplication and divergence - has already been done to the code that I created to remedy it. In any case, you have my feedback, do with it what you will. |
|
@bdlucas1 Thanks for the work and the feedback. We will revisit things after a release is done. |
I do not say I see it as preferable, but just that in order to merge further changes that you could have locally, I tried to make the diff as small as possible. On the other hand, if at some point we reach the point in which everything works, and we still want to move this back to master, keeping the full implementation would make the transition smoother.
@bdlucas1, I agree, these two objects belongs to mathics-core. But to split up the part of the vectorization backend is something that for me would take more time that what I have available. Also, as you designed it, I won't go over this without consulting you the details.
@bdlucas1, If you can help us doing these changes, I would be happy to remove this duplication.
same.
I have to say, I really appreciate your effort. For this reason, I tried to put the code in shape in a way that do the final tweaks would be as smooth as possible for all of us.
Again, probably there are some more duplications. Tests are duplicated too. What I really would like is to end before the release with the most part of the code localized in the right place, without duplication. Just I do not have the time to do it by myself alone.
Again, at this point, it should not be too hard for you to make these final tweaks.
Thanks |
Maybe Rocky didn't tell you, but for reasons that I explained to him by email in January I decided then to significantly scale back my involvement in the project, limiting it to answering questions you might have about the work I did, and that hasn't changed. |
I communicated my understanding. But I make mistakes. So it's always good to get clarification(s). Thanks. |
Ok, probably I didn't fully understand your reasons, but in any case. When I have more free time, I plan to continue making the adjustments. My idea was to leave the vectorizedplot code in a state that you or someone else can use it as it is in any other project, or, if you change your mind, continue at any rate in almost where you left it. Thanks again |
NumericArrayare kept.