[WIP] Monadical's first implementation of python bindings for libzim #2
Conversation
I'm not sure I understand why we have Docker stuff in this PR. Could you please explain it? |
Thank you very much! I haven't tested it yet, but I will. In the meantime, after taking a quick look at it, I have a few questions:
creator = pyzim.Creator(True)
creator.zim_creation('hola.zim', True, "eng", 2048)
Again, this is all from a quick read of the PR; I will test it properly. Thanks again 👍 |
@kelson42 libzim is needed to compile the Python extension. Since some people might use a Mac, and libzim is hard to compile on a Mac, I set up a Docker container that can be used for development and testing. |
I advise using the latest HEAD of the python3 branch, where I cleaned up the code a lot and already addressed some bugs and comments.
Easily changed, up to you, just bear in mind there are some python naming restrictions for packages.
No, it's needed; it might be an error in the example. finalise calls
Yes, initially I started developing in python 2.7 but all was moved to python3. It might be a mistake.
Mistake.
Just included for tests, perhaps the ones included with libzim are more appropriate.
add_art was a test version I used while debugging. Will not be included.
documentation and docstrings are pending once you are ok with the API.
It was changed; the included example file was outdated. |
I haven't reviewed your code; this is just based on the previous comments:
To use the Python extension you also need libzim, and you wouldn't set up a Docker container (if you even can) just to use a library. Why is libzim hard to compile on a Mac? kiwix-build is there for that.
I would prefer to keep the method
That is a real question that goes beyond this project. libzim also has some test ZIM files, and I somewhat dislike this.
I agree with @rgaudin's comment, and I would add that a function with too many arguments is difficult to use. A better way would be to force the use of |
This reverts commit 38d5c7a.
Issue #4
Done ✅
@rgaudin I first need to understand what you want to do with Search. I could get a Search pointer from the search function in File.h; then, in the Python search function, I could use the iterator and yield the results. But please let me know what's the use case of Issue #5
I'll recheck tomorrow how redirects are used.
Issue #6
Issue #7
Done ✅ All exceptions will be caught and translated by Cython
Done ✅
Done ✅
I think this is related to how I am constructing the string, perhaps missing a semicolon or =
Done ✅ The constructor no longer forces a type on content: bytes are left untouched, and if content is a str it is encoded to UTF-8:
def __cinit__(self, url="", content="", namespace="A", mimetype="text/html", title="", redirect_article_url="", filename="", should_index=True):
It encodes the content if it is a Python str and passes it through unchanged if bytes are given:
bytes_content = b''
if isinstance(content, str):
    bytes_content = content.encode('UTF-8')
else:
    bytes_content = content
The getter and setter are also adjusted so that bytes pass through transparently if a UnicodeDecodeError arises (I will need to check this assumption):
@property
def content(self):
    """Get the article's content"""
    data = self.c_zim_article.content
    try:
        # Return text when the stored bytes are valid UTF-8
        return data.decode('UTF-8')
    except UnicodeDecodeError:
        # Otherwise hand back the raw bytes unchanged
        return data
@content.setter
def content(self, new_content):
    """Set the article's content"""
    if isinstance(new_content, str):
        self.c_zim_article.content = new_content.encode('UTF-8')
    else:
        self.c_zim_article.content = new_content
Example use case |
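The str/bytes handling described in the comment above can be tried out without compiling anything. What follows is only a pure-Python sketch of that behaviour: `ArticleSketch` and its `_raw` attribute are hypothetical stand-ins for the Cython class and its `c_zim_article.content` field, not the code from the PR.

```python
class ArticleSketch:
    """Hypothetical pure-Python stand-in for the Cython article wrapper."""

    def __init__(self, content=""):
        # Bytes are stored internally, as on the C++ side.
        self._raw = b""
        self.content = content  # routed through the setter below

    @property
    def content(self):
        """Return str when the stored bytes are valid UTF-8, raw bytes otherwise."""
        try:
            return self._raw.decode("UTF-8")
        except UnicodeDecodeError:
            return self._raw

    @content.setter
    def content(self, new_content):
        """Encode str to UTF-8; store bytes unchanged."""
        if isinstance(new_content, str):
            self._raw = new_content.encode("UTF-8")
        else:
            self._raw = new_content


text_article = ArticleSketch("<p>¡Hola!</p>")
binary_article = ArticleSketch(b"\x89PNG\r\n\x1a\n")   # not valid UTF-8
ascii_bytes_article = ArticleSketch(b"plain ascii")    # bytes that happen to be valid UTF-8
```

One consequence worth checking, in line with the "will need to check this assumption" remark: bytes that happen to be valid UTF-8 (as in `ascii_bytes_article`) come back from the getter as `str`, so the type read back can differ from the type that was set.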
To avoid any confusion, we should name it after the repo I have created. |
This is a bit heavy indeed. Why so big? A small ZIM file should be enough to run all the unit tests. |
this is not a valid python module name. we could have:
I like |
@rgaudin in pypi, it should be |
then that’s settled. thanks |
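The module-name restriction raised in this naming thread can be checked directly in Python. This is a small sketch using only the standard library; `is_valid_module_name` is a helper introduced here for illustration, and the hyphenated candidate is a hypothetical example rather than a name proposed in the thread.

```python
import keyword


def is_valid_module_name(name: str) -> bool:
    """True if `name` can appear in a plain `import name` statement."""
    return name.isidentifier() and not keyword.iskeyword(name)


# "pyzim" and "libzim" appear in the discussion; the hyphenated name is illustrative only.
candidates = ["pyzim", "libzim", "python-libzim"]
results = {name: is_valid_module_name(name) for name in candidates}
```

A name on PyPI may contain hyphens, but the importable module name may not, which is why the two can differ.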
Ok, the biggest remaining things to do are:
|
Agree. |
Since all efforts are now focused on the blend approach in #3 , we're closing this PR for now. |
Not everything here should be thrown away. |
That's the plan: reuse most of the reader code, perhaps with names such as ZimFileReader and ZimFileArticle to avoid confusion with the article definitions, but without that particular article implementation. Just simple wrappers for File.h and Article.h (zim::) |
Progress
setup.py, sdist & bdist
setup, tox?, setup README.md
md parsing instead of rst
Goals
libzim API used by sotoki and mwoffliner (described below), with potential to expand to the rest of the libzim functionality
libzim's .h files
async/await, multiprocessing, or Threading
API Overview
1. Article
2. ZimCreator
3. ZimFileReader
__del__?
Our work on this so far was inspired by Matthew G.'s existing code from here:
Resources