@BoboTiG BoboTiG commented Oct 4, 2025

Closes #2500.
Closes #2520.
Related to #2528.

Summary:

  • tables support: ✔️
  • hieroglyphs support: ✔️
  • chemical/math formulas: ✔️
  • thread-safety: ✔️
  • pronunciations/genders: ✔️
  • variants: ✔️
  • reverse variants: ✔️

Locales support:

  1. CA: ✔️
  2. DA: ✔️
  3. DE: ✔️
  4. EL: ✔️
  5. EN: ✔️ (the transclude module is problematic: about 6k words are impacted, but we can move forward without these missing definitions for a start)
  6. EO: ✔️
  7. ES: ✔️
  8. FR: ✔️
  9. IT: ✔️
  10. NO: ✔️
  11. PT: ✔️
  12. RO: ✔️
  13. RU: ✔️
  14. SV: ✔️
  15. ZH: ✔️ (transclude support needed too, plus module clean-up in database, for later)

Notes:

  1. About timings: --render is as fast or faster for most locales, and at most twice as slow for big ones like FR/EN (from 9 minutes to 18, which is really not an issue). Interestingly, DE, with almost 1 million words, is rendered very fast, while EL is among the most impacted given that it has only 360k words (maybe EL modules are greedier, but again: not a problem).
  2. wikidict/context.py is the core of the changes. The file name could maybe be better, I'm open to suggestions :)
  3. templates_ignored & definitions_to_ignore are not yet taken into account. That will be done in a second step, once the current code is validated.
  4. last_template_handler, templates_multi, templates_others, templates_italic are now unused, same for all scripts.
  5. I adapted test_ca.py to show a preview of changes needed with a local database (good thing is that only the markup is updated, the rest is good).
  6. log-analyzer.py is maybe too much to keep in there: when a process hangs, it finds the last word that process handled (when it is "Job done.", the process finished successfully). Nothing hangs anymore, but it might be good to have for future locales support. I would prefer a shell one-liner for that, but my skills are not so good in this area ^^
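
The idea behind note 6 can be sketched in a few lines of Python. This is not the actual log-analyzer.py: the log format is an assumption (each line is taken to be `<process id> <message>`), and the function names are hypothetical. Only the "Job done." marker comes from the note above.

```python
from collections.abc import Iterable

DONE_MARKER = "Job done."  # marker quoted in the PR description


def last_message_per_process(lines: Iterable[str]) -> dict[str, str]:
    """Map each process ID to the last message it logged.

    Assumed (hypothetical) line format: "<process id> <message>".
    """
    last: dict[str, str] = {}
    for line in lines:
        pid, sep, message = line.strip().partition(" ")
        if sep:  # skip lines that do not contain a message part
            last[pid] = message
    return last


def unfinished(lines: Iterable[str]) -> dict[str, str]:
    """Return the last word handled by each process that never logged the marker last."""
    return {
        pid: message
        for pid, message in last_message_per_process(lines).items()
        if message != DONE_MARKER
    }
```

With such a helper, a hanging run would show up as a non-empty result, where each value is the word the process was working on when it stopped logging.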

This will require some testing to catch rendering issues. I am quite confident, and updates can be shipped quickly if a really problematic case shows up.

Co-authored-by: Nicolas Froment <[email protected]>
"Sigles": [
"<i>(masculí)</i> <i>Sigles de</i> <b>Alfabet Fonètic Internacional</b>",
"<i>(femení)</i> <i>Sigles de</i> <b>Associació Fonètica Internacional</b>",
"(masculí) <i>Sigles de</i> <b>Alfabet Fonètic Internacional</b>.",
@BoboTiG BoboTiG Oct 4, 2025

All those labels "losing" their italic part is correct: the marca module outputs simple labels, and the italic is done with CSS. To keep things simple, no italic for us (touching the raw wikitext more might be too much, let's see).

Member Author

Now that I think more about it, those labels are rendered like that:

<span class="ib-content" about="#mwt158">informàtica<link rel="mw:PageProp/Category" href="./Categoria:Informàtica_en_català"></span>

We could, in another PR, replace the code to get the italic back. It's just that it would be another round of clean-up on top of all those clean-up steps. I am not against doing so, but maybe later; I'll move on to the next big steps instead.
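
A minimal sketch of what such a replacement could look like, assuming the labels always arrive as a flat `<span class="ib-content" ...>` element like the one quoted above. The function name and both regular expressions are hypothetical and not part of the project; nested spans inside a label would need a real HTML parser.

```python
import re

# Matches a flat <span class="ib-content" ...>...</span> element (no nested spans).
IB_CONTENT = re.compile(r'<span class="ib-content"[^>]*>(.*?)</span>', re.DOTALL)
# Matches the category <link ...> tags embedded in the label.
LINK_TAG = re.compile(r"<link\b[^>]*>")


def italicize_labels(html: str) -> str:
    """Replace each ib-content span with its label text wrapped in <i>...</i>."""

    def to_italic(match: re.Match[str]) -> str:
        label = LINK_TAG.sub("", match.group(1)).strip()
        return f"<i>{label}</i>"

    return IB_CONTENT.sub(to_italic, html)
```

Applied to the span quoted above, this would give back `<i>informàtica</i>`, which matches the italic markup the tests expected before this change.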

Collaborator

Curious how check_word currently deals with that? It probably fails.

Member Author

Actually --check-word works 🍾

@BoboTiG BoboTiG Oct 4, 2025

And now that I think about it, --check-word is becoming useless, since it was used to check how our Python code was expanding wikitext. Any error happening at the expanding step is already caught in --render.

@BoboTiG BoboTiG requested a review from lasconic October 4, 2025 11:47
Collaborator

lasconic commented Oct 4, 2025

I am probably missing something... Right now, the PR fails on --render with:

 CTX.start_page(word)
    ^^^
NameError: name 'CTX' is not defined

I assume it works for you? Are global variables shared by multiprocessing.Process?

(Python 3.13.7, on macosx)

Member Author

BoboTiG commented Oct 4, 2025

Can you share the full traceback so we can see where it comes from? I bet it has something to do with how fork() works.

It works for me, yes, I generated dictionaries for all locales to check for regressions.

Member Author

BoboTiG commented Oct 4, 2025

Can you add this line in render.py, right before multiprocessing is used?

multiprocessing.set_start_method("fork", force=True)

Maybe try "forkserver" if the former fails.
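
A small sketch of that fallback, using only the standard library. The helper name `pick_start_method` is hypothetical; `multiprocessing.get_all_start_methods()` and `multiprocessing.set_start_method()` are the real standard-library calls.

```python
import multiprocessing


def pick_start_method(preferred: tuple[str, ...] = ("fork", "forkserver", "spawn")) -> str:
    """Return the first start method from `preferred` that this platform supports."""
    available = multiprocessing.get_all_start_methods()
    for method in preferred:
        if method in available:
            return method
    raise RuntimeError(f"none of {preferred} is available (platform offers {available})")
```

It would be called once, before any process is created, for example `multiprocessing.set_start_method(pick_start_method(), force=True)`. Note that this only checks availability: on macOS "fork" is available even though it is no longer the default there.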

Collaborator

lasconic commented Oct 4, 2025

Yes, it is about the start method. "fork" is the default on Linux up to Python 3.13 ("forkserver" from 3.14 on), while "spawn" is the default on macOS. According to the documentation, "fork" should be considered unsafe on macOS because it can lead to crashes of the subprocess:
https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
To make it clean, we will have to initialize the database in each process, or pass it around via a Manager.

Fork seems to run for now.

Member Author

BoboTiG commented Oct 4, 2025

Let's go with fork for now. We do not use subprocess, and if we hit a wall at some point, we could use a managed resource, yes.

Another link for completeness: https://bugs.python.org/issue33725

# --parse: modules & templates "end patterns" to ignore when saving them in the database
MODULES_TO_IGNORE = ("/doc", "/documentation", "/sandbox", "/testcases")

# --render: modules & templates to override globally for the Lua interpreter (they can still be overrided by `template_ovverides[locale]`)
Collaborator

typo ovveride

Collaborator

lasconic commented Oct 4, 2025

Fork worked. I ran it for English and French. It seems to work very well and will for sure reduce the code base drastically...

Member Author

BoboTiG commented Oct 4, 2025

> Fork worked. I ran it for English and French. It seems to work very well and will for sure reduce the code base drastically...

Yeah, this will be quite a change!

@BoboTiG BoboTiG merged commit 2e8c73e into master Oct 4, 2025
1 of 3 checks passed
@BoboTiG BoboTiG deleted the feat-lua-runner-v2 branch October 4, 2025 21:13
Issues linked to this pull request: "Use Lua to Python to handle templates ?" and "[EN] Support "egy-h" template".