For a publication library Win32 app, I am looking to extract data from its proprietary and undocumented file format.
* The format is multilingual, predates Unicode, and is most likely compressed.
* The individual documents are search indexed (indices in separate files).
* They are sorted into a hierarchy: Publication (file-level) > Chapter > Content.
* Some publications are magazines; their hierarchy is: Publication (file-level) > Year > Issue > Chapter > Content.
* The documents are interconnected with hyperlinks.
* Few publications contain images; most of the content is formatted text.
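As an illustration only, a first probe of such a file might look like the sketch below. It assumes the compression is zlib/deflate and that the text uses legacy Windows code pages; both are guesses, since the format is undocumented. The function names and the list of candidate code pages are hypothetical.

```python
import zlib

# Hypothetical candidates; the code pages actually used are unknown.
CANDIDATE_CODEPAGES = ["cp1252", "cp1250", "cp1251"]

def find_zlib_streams(data: bytes, min_size: int = 16):
    """Yield (offset, decompressed_bytes) for each offset where a zlib stream decodes."""
    for offset in range(len(data) - 1):
        # A zlib header usually starts with 0x78, and the 16-bit header
        # value must be a multiple of 31 (RFC 1950).
        header = (data[offset] << 8) | data[offset + 1]
        if data[offset] != 0x78 or header % 31 != 0:
            continue
        try:
            out = zlib.decompressobj().decompress(data[offset:])
        except zlib.error:
            continue  # not a valid stream at this offset
        if len(out) >= min_size:
            yield offset, out

def decode_candidates(raw: bytes):
    """Return {codepage: text} for every candidate that decodes without error."""
    results = {}
    for codepage in CANDIDATE_CODEPAGES:
        try:
            results[codepage] = raw.decode(codepage)
        except UnicodeDecodeError:
            pass  # this code page has no mapping for some byte
    return results
```

Several code pages will often decode the same bytes without error, so the output still needs a human (or a dictionary check) to pick the legible one. If no zlib streams turn up, the format may use a different or custom compression scheme.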
Source:
I will provide you with the entire library viewer app including all of its publication files (1+GB).
Deliverables:
* The tool you develop to read the files and convert them to the format described below.
* The output should be a set of legible, interconnected HTML (and JPG) files with their TOC files, sorted into nested folders.
* All formatting, links and footnotes need to be retained.
* Indices should be in a separate file per publication, using HTML anchor tags <a id="uniqueID"> in the content files.
* I should be able to use the same tool on more files of the same specification.
* Delivering a command line tool for Win32, x64, Linux or macOS is fine.
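To make the anchor and index requirement concrete, here is a minimal sketch of the kind of output meant. The function names, the input tuples, and the file name `ch1.html` are placeholders, not a prescribed design; how IDs are derived from the source files is left open.

```python
from html import escape

def render_content(paragraphs):
    """paragraphs: list of (anchor_id, text). Returns HTML with one anchor per paragraph."""
    body = "\n".join(
        f'<p><a id="{escape(anchor_id, quote=True)}"></a>{escape(text)}</p>'
        for anchor_id, text in paragraphs
    )
    return f"<!DOCTYPE html>\n<html><body>\n{body}\n</body></html>\n"

def render_index(entries):
    """entries: list of (term, content_file, anchor_id). Returns the per-publication index page."""
    items = "\n".join(
        f'<li><a href="{escape(content_file, quote=True)}#{escape(anchor_id, quote=True)}">'
        f"{escape(term)}</a></li>"
        for term, content_file, anchor_id in sorted(entries)
    )
    return f"<!DOCTYPE html>\n<html><body>\n<ul>\n{items}\n</ul>\n</body></html>\n"
```

For example, `render_index([("Hello", "ch1.html", "p1")])` produces a list item linking to `ch1.html#p1`, which resolves to the `<a id="p1">` anchor emitted by `render_content`.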
Hi,
I have experience with many different file formats: compressed and uncompressed, plain and encrypted.
I hope to write a nice tool for you.
It would be great to see a sample of the database (even if it is larger than 1 GB),
and it is sometimes helpful to see any related tools that already work with this database.
I'm ready to write a short demo to prove that I understand the task correctly (before accepting the project).
Thank you for a nice and interesting project!
Hi, I am an expert in C++ programming with more than 15 years of experience. I can complete this work successfully in a very short time. Please let me know. Thanks