Quote:
|
Originally Posted by rivan
. . .I can use assistance filtering the relevant text out of the HTML . . .
Ideally, ending up with something like:
T.S.B.::PG014-01: Attachment: 2002 PDS Check Sheet (tab) 2002teet.pdf
T.S.B.::AC001-01: 01-02 Sequoia: A/C Compressor Durability Improvenent (OBSOLETE) (tab) ac00101r.pdf
|
I used regular expressions in TextPad to clean up the table of contents, although I wasn't able to get the right PDF name in there through any automated means. I searched for everything between the first < and the last > on a single line and replaced it with "" (or something along those lines).
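For anyone who would rather script the cleanup, here is a rough Python sketch of the same idea. It is not what I actually ran in TextPad: it removes each tag separately rather than matching from the first < to the last >, so the text between tags survives. The sample line is made up, and the href lookup only helps if the PDF filename actually appears in a link on the same line, which is an assumption about the TOC markup.

```python
import re

TAG = re.compile(r"<[^>]*>")                               # one HTML tag at a time
HREF = re.compile(r'href="([^"]+\.pdf)"', re.IGNORECASE)   # assumed: link points at the PDF

def strip_tags(line):
    """Remove every <...> tag from one line and collapse leftover whitespace."""
    text = TAG.sub("", line)
    return re.sub(r"\s+", " ", text).strip()

def toc_entry(line):
    """Return 'title<TAB>file.pdf' when a PDF link is found, else just the title."""
    title = strip_tags(line)
    match = HREF.search(line)
    return f"{title}\t{match.group(1)}" if match else title

# Hypothetical TOC row, only for illustration.
sample = '<tr><td><a href="ac00101r.pdf">T.S.B.::AC001-01: 01-02 Sequoia</a></td></tr>'
print(toc_entry(sample))
```

Run it over each line of the saved TOC page and write the results to a text file; lines with no link just come back as the bare title.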
Also, to create the bookmarks, I:
1. merged all PDFs into a single PDF for each section or subsection.
2. Created the bookmarks in the PDF using the "New Bookmarks from Structure . . ." tool, and chose "Part" in the dialog.
3. This results in a bookmark for each merged PDF along with sub-bookmarks for each title within that PDF (I think new bookmarks are created for each style occurrence).
4. Delete all sub-bookmarks, leaving only the bookmarks that correspond with the TOC that was downloaded.
5. Open the PDF with TextPad and scroll down to the bookmarks created in step 3. They look something like: "196 0 obj<</Parent 171 0 R/Next 198 0 R/Prev 194 0 R/Title(MA-5: Engine: Inspection)/A 195 0 R>>
endobj". This example is one I had already edited in step 6; I can't remember exactly what the title was before.
6. Copy and paste the bookmark titles from the TOC into the PDF file. I could only figure out how to do this one at a time, and it is the most time-consuming part of the work. Be careful: I sometimes found that the bookmarks in the PDF were in reverse order. If someone is a Perl or Python expert, maybe they could come up with something quicker.
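As a starting point for scripting step 6, something like the Python sketch below might work. It is untested against real PDFs and makes several assumptions: the bookmark objects are stored uncompressed as in the example in step 5, the existing titles contain no unescaped parentheses, and the TOC titles fit in Latin-1. The function names and the `reverse` flag are my own invention. Note that any other /Title(...) entry in the file (for example in the document information) would also match, so the count check is there to stop the script when the numbers don't line up. As with hand-editing, the byte offsets in the file change, so this relies on the PDF viewer tolerating or repairing that.

```python
import re

# Matches /Title(...) where the title has no unescaped parentheses.
TITLE = re.compile(rb"/Title\(((?:\\.|[^\\()])*)\)")

def escape_pdf_string(text):
    """Encode a title and backslash-escape \\, ( and ) for a PDF literal string."""
    data = text.encode("latin-1")
    return re.sub(rb"([\\()])", rb"\\\1", data)

def retitle(pdf_bytes, titles, reverse=False):
    """Replace each /Title(...) in file order with the next TOC title.

    Set reverse=True when the bookmarks in the PDF are in reverse order.
    """
    found = TITLE.findall(pdf_bytes)
    if len(found) != len(titles):
        raise ValueError(f"{len(found)} titles in PDF but {len(titles)} in TOC")
    queue = iter(reversed(titles) if reverse else titles)
    return TITLE.sub(
        lambda m: b"/Title(" + escape_pdf_string(next(queue)) + b")", pdf_bytes
    )

# Tiny made-up fragment in the shape of the objects from step 5.
sample = (
    b"196 0 obj<</Parent 171 0 R/Next 198 0 R/Title(Old one)/A 195 0 R>>\nendobj\n"
    b"198 0 obj<</Parent 171 0 R/Prev 196 0 R/Title(Old two)/A 197 0 R>>\nendobj\n"
)
result = retitle(sample, ["MA-5: Engine: Inspection", "A/C Compressor (OBSOLETE)"])
print(result.decode("latin-1"))
```

For a real file you would read it with open(path, "rb"), pass in the titles from the cleaned-up TOC, and write the result to a new file so the original stays untouched.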
HTH