-
-
Notifications
You must be signed in to change notification settings - Fork 31
feat: use a Lua interpreter for modules & templates rendering #2536
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Closes #2500. Closes #2520. Related to #2528. Co-authored-by: Nicolas Froment <[email protected]>
| "Sigles": [ | ||
| "<i>(masculí)</i> <i>Sigles de</i> <b>Alfabet Fonètic Internacional</b>", | ||
| "<i>(femení)</i> <i>Sigles de</i> <b>Associació Fonètica Internacional</b>", | ||
| "(masculí) <i>Sigles de</i> <b>Alfabet Fonètic Internacional</b>.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All those labels "loosing" their italic part is correct: the marca module outputs simple labels, and italic is done with CSS. To simplify the thing, no italic for us (and touching more the raw wikitext might be too much, lets see).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that I think more about it, those labels are rendered like that:
<span class="ib-content" about="#mwt158">informàtica<link rel="mw:PageProp/Category" href="./Categoria:Informàtica_en_català"></span>We could, in another PR, replace the code to get back italic. It's just that it will be another round of clean up in all those clean-up steps. I am not against doing so, but later maybe, I'll move to the next big steps instead).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious how check_word currently deal with that ? It probably fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually --check-word works 🍾
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And now I think about it, --check-word is becoming useless since it was used to check how our Python code was expanding wikitext. Any error happening at the expending step is already catch in --render.
|
I probably miss something... The PR right now fails on --render with I assume it works for you ? are global variable shared by multiprocessing.Process ? (Python 3.13.7, on macosx) |
|
Can you share the full traceback to see the origin? I bet there is something about how It works for me, yes, I generated dictionaries for all locales to check for regressions. |
|
Can you add this line in multiprocessing.set_start_method("fork", force=True)Maybe try "forkserver" if the former fails. |
|
Yes, it's something about the start method. "fork" is the default for linux and python 3.13 (forkserver in 3.14) while spawn is the default on mac. According to the documentation "fork" will crash mac subprocess ... Fork seems to run for now. |
|
Lets go with fork for now. We do not use subprocess, and if we hit an wall at some point, we could use a managed resource, yes. Another link for completeness: https://bugs.python.org/issue33725 |
wikidict/constants.py
Outdated
| # --parse: modules & templates "end patterns" to ignore when saving them in the database | ||
| MODULES_TO_IGNORE = ("/doc", "/documentation", "/sandbox", "/testcases") | ||
|
|
||
| # --render: modules & templates to override globally for the Lua interpreter (they can still be overrided by `template_ovverides[locale]`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo ovveride
|
Fork worked. I ran it for english and french. It seems to work very well and will for sure reduce the code base drastically... |
Yeah, this will be quite a change! |
Closes #2500.
Closes #2520.
Related to #2528.
Summup:
Locales support:
transcludemodule is problematic, it's about 6k impacted words, we can move without these missing definitions for a start)transcludesupport needed too, plus module clean-up in database, for later)Notes:
--rendercan be faster or equal for most locales, and at most double for big ones like FR/EN (from 9 minutes to 18, that's really not an issue). Interestingly, DE with almost 1 millions words is rendered very fast, while EL is amont the most impacted given its 360k words (maybe EL modules are more greedy, but again: not a problem)wikidict/context.pyis the core of changes. The name of the file could be better maybe, I'm open to eventual suggestions :)templates_ignored&definitions_to_ignoreare not yet taken into account. Will be done in a second time when the current code will be validated.last_template_handler,templates_multi,templates_others,templates_italicare now unused, same for all scripts.test_ca.pyto show a preview of changes needed with a local database (good thing is that only the markup is updated, the rest is good).log-analyzer.pyis maybe too much to keep in there, it finds the last word handled by a process (when it is "Job done." it means the process finished with success), when the process hangs. So, now, nothing hangs anymore, but for future locales support, it might be good to have it. I would prefer to have a shell one-liner for that, but my skills are not so good in this area ^^)This will require some testing to catch rendering issues. I am quite confident, and updates can be shipped quickly in case of really problematic case.