BMV - Behind The Curtain

 

Work smarter, not harder!!!! There is no chance that I could manually do all the PDF extraction, page manipulation, table of contents and search key generation, and packaging for upload to the website. Well, technically I could, but it would be tedious and error-prone to do every step by hand. To that end, BMV has had a rich set of “bmvtools” since the early releases, and I have improved these many times over the years.

Magazine Content

Each magazine issue available within BMV is composed of one to N “pgNNNN.bmv” files. For example, the BCCA magazines are always 48 pages, so there will be pg0001.bmv through pg0048.bmv. The page number is always four digits, since it is pretty safe to assume there will never be more than 9999 pages in any single magazine issue. The American Brewer does have some months with over 100 pages, but nothing yet into the 1000s. These are the individual pages scaled to 1200 pixels tall, a simple “read and display” for fast display in the individual Pages window.
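As a rough illustration of the four-digit naming scheme described above, here is a minimal Java sketch. It is not the actual BmvTools code; the class and method names are made up for this example.

```java
import java.util.Locale;

final class PageNames {
    /** Builds a page file name such as pg0001.bmv from a 1-based page number. */
    static String pageFileName(int pageNumber) {
        // Four digits cap a single issue at 9999 pages.
        if (pageNumber < 1 || pageNumber > 9999) {
            throw new IllegalArgumentException("page number out of range: " + pageNumber);
        }
        // %04d pads with leading zeros; Locale.ROOT keeps the digits ASCII.
        return String.format(Locale.ROOT, "pg%04d.bmv", pageNumber);
    }

    public static void main(String[] args) {
        // A 48-page BCCA issue runs from pg0001.bmv through pg0048.bmv.
        System.out.println(pageFileName(1));
        System.out.println(pageFileName(48));
    }
}
```

Zero-padded names also sort correctly as plain strings, so a simple directory listing comes back in page order.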

For each magazine issue there is also a Month.bmv, which is a copy of the first page scaled to 400 pixels tall. This is for fast display in the Month/TOC window. Additionally there is a Cover.bmv, which is a copy of the first page scaled to 125 pixels tall. These are the files used for quick display in the Year window.
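The three target heights (1200, 400 and 125 pixels) each imply a width once a source page size is known. The sketch below assumes the aspect ratio is preserved, which the text does not state outright; the class name, constants and the sample scan size are hypothetical.

```java
final class ScaledSize {
    static final int PAGE_HEIGHT = 1200;  // pgNNNN.bmv
    static final int MONTH_HEIGHT = 400;  // Month.bmv
    static final int COVER_HEIGHT = 125;  // Cover.bmv

    /** Width that keeps the source aspect ratio at the given target height. */
    static int scaledWidth(int srcWidth, int srcHeight, int targetHeight) {
        if (srcWidth <= 0 || srcHeight <= 0 || targetHeight <= 0) {
            throw new IllegalArgumentException("dimensions must be positive");
        }
        // Compute in double to avoid int overflow, round to the nearest pixel,
        // and never return less than one pixel.
        long width = Math.round((double) srcWidth * targetHeight / srcHeight);
        return (int) Math.max(1L, width);
    }
}
```

For a hypothetical 2550 x 3300 pixel scan, this gives widths of 927, 309 and 97 pixels for the page, Month and Cover images respectively.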

With BMV 4.0 there is an option to display a new format for the Year window, which uses the larger Month.bmv files instead of the smaller Cover.bmv files. Long ago there used to be 12 issues per year (e.g. Beer Cans Monthly and Rustlings), but for many years now most magazines have been bi-monthly (six issues per year), and some have just four issues per year. As such, I like the new BMV 4.0 Year format that shows 3 columns of larger Month.bmv images instead of 4 columns of smaller Cover.bmv images. The new format still works for older years, but will require a vertical scroll to see more than the first six cover pages. A simple right-click toggles between the Old and New year formats.

 

Proprietary BMV format / Copyright content

Note all files are “bmv” format, not “jpg”. The BMV project does NOT own most of the content; it just has permission to package and distribute different publishers' content for hobby viewing. The magazine publisher retains all copyright ownership, and if you use the content for any commercial reason you should first request permission from the publisher. A requirement for using the American Brewer content was to include a “citation” back to the “Hagley Digital Archive”. A completely valid request, fully agreed, and with that inclusion there is a clear citation of who owns the content.

Very early content was standard “jpg” format, but given the major hours needed to scan, format and produce content, I created a JPG derivative format which I appropriately named “bmv”. The BMV format is proprietary and not usable outside of the BMV program. The license explicitly prohibits reverse engineering the “bmv” format. This simply prevents someone from quickly copying all available BMV content and using it for some other purpose. This proprietary format helped me finally receive permission to include the BCCA’s “Beer Cans & Brewery Collectibles”.

 

Producing the BMV content

The BMV started in 2002 for my personal viewing/searching of the long defunct “Beer Cans Monthly” (Robert Dabbs) and the follow-on “Brewery Collectibles” (Jeff Cameron). When there was a desire to digitize Rustlings, I volunteered to drive that effort, which morphed into the BMV Project. At this writing it includes 21 different magazines and over 41,000 individual pages.

For all older content there was no magic solution. It simply required MAJOR HOURS scanning the printed content into JPG, then using Adobe Photoshop Elements to manually crop and scale consistently to the sizes BMV needs. There was nothing I could do about the manual scanning of printed pages, but I dang well could “automate” the JPG processing. I have written a separate program named “BmvTools” which can scale and encode between JPG <-> BMV, and this tool has options to generate the TOCs, create Month/Cover images as required, and create the binary search key data files. Once the original content was digital JPG, I could greatly speed up processing from there with BmvTools.

It was around 2008, working with Chris Taylor for Rustlings and Marica Butterbaugh for the BCCA, that I started getting content in some form of digital format. This ranged from individual JPGs to Word documents to the BCCA format, which I believe was Quark. At least the original content was starting out digital, and I could write code or use tools to handle it much faster.

Fast forward to modern times: all content is now provided nicely as PDF. I use some commercial/open source tools to extract PDF to JPG (I have yet to find one tool that works for everything in what should be a universal standard PDF format), but I have learned which tool works best for which magazine producer. That gives a fast PDF extract to JPG, and then a fast catalog into the BMV directory structure. From there, a couple of BmvTools options generate the binary files and ZIP/package, and the issue is ready to go. What literally used to take 3-4 hours per issue can now be done in 15-20 minutes.

 

The BmvTools program

BmvTools is also written in Java, using the same SWT toolkit, with code shared between BMV and CanDB. Again, work smarter, not harder! Here are a few examples:

The “BMV Coding” tab/feature is used to convert between JPG and BMV format. “Encode” and “Decode” are used often; the others are for special cases, using software to do what I don’t want to do manually.

 

The “Create Data” tab/feature is used to generate the binary “.dat” files that BMV reads. All sorting/manipulation and CPU time is spent by BmvTools, so that the “.dat” files are in an optimal format for a very fast BMV read. Spend the extra seconds/minutes producing the files, NOT on BMV reading them. The “All” option used to be fairly fast with just BCM, was a bit slower after adding Rustlings, and now with 21 different magazines and 41,000 pages it takes maybe 10-12 minutes to fully generate all the needed binary files. Good thing it is the computer doing all the work; I just wait. Maybe someday I will run this under JProfiler and see if anything could be made faster.
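The general idea of spending time when the file is produced rather than when it is read can be sketched as below. This is only an illustration of the principle: the layout (a count followed by sorted strings), the class name and the sample keys are all invented here and have nothing to do with the real “.dat” format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;

final class SearchKeyData {
    /** Build side: sort once, then write a count followed by the keys. */
    static byte[] write(String[] keys) throws IOException {
        String[] sorted = keys.clone();
        Arrays.sort(sorted);                       // the expensive step happens here
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(sorted.length);
            for (String key : sorted) {
                out.writeUTF(key);
            }
        }
        return bytes.toByteArray();
    }

    /** Read side: load the keys in stored order; no sorting is needed. */
    static String[] read(byte[] data) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            String[] keys = new String[in.readInt()];
            for (int i = 0; i < keys.length; i++) {
                keys[i] = in.readUTF();
            }
            return keys;
        }
    }

    /** Lookups can binary-search immediately because the data arrived sorted. */
    static boolean contains(String[] sortedKeys, String key) {
        return Arrays.binarySearch(sortedKeys, key) >= 0;
    }
}
```

The reader does a straight sequential load and can search right away, which is the trade being described: the producer pays for the sort once so every reader does not have to.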

 

The “Package Update” tab/feature is used to generate the monthly ZIP files to allow user download with “Check For Updates”. Instead of downloading maybe 48 pages for a single BCCA issue, all files are bundled into a single ZIP file, for example:

Fewer, compressed files are faster for me to upload, and faster for the BMV user to download. Once a file is transferred from my website to local disk, BMV will “uncompress” it into normal BMV content and then remove the compressed package file. Note the “checksum” for each ZIP package. To ensure the intended byte content was uploaded 100% correctly from my computer to the web site, and then downloaded 100% correctly from the website to the user's local disk, I ensure the generated checksum exactly matches the downloaded checksum, so every byte is exactly as intended. Dang, it can take many minutes now to generate the UpdateMap.xml file, but again it is better for BmvTools to do the work and ensure the user doesn’t get any corrupt data.
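Here is a minimal sketch of that kind of checksum comparison. The text does not say which checksum algorithm is used, so SHA-256 is an assumption made purely for illustration, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class PackageChecksum {
    /** Returns the SHA-256 digest of a file as lowercase hex (algorithm is an assumption). */
    static String sha256Hex(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
        // Stream the file in chunks so a large ZIP never has to fit in memory.
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    /** True when the downloaded file matches the checksum published with the package. */
    static boolean matches(Path downloaded, String expectedHex) throws IOException {
        return sha256Hex(downloaded).equalsIgnoreCase(expectedHex);
    }
}
```

A downloader would call matches() before unpacking the ZIP, and discard or retry the download on a mismatch rather than unpack possibly corrupt content.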

 

Check For Updates Feature

Say a user has an earlier release of BMV and a possibly large set of downloaded content. There is a new BMV version available and/or new content such as new magazine issues. How do you quickly check which files are different/missing without a major number of back/forth internet queries? The solution I created for BMV, and then used with CanDB, is that when uploading new content BmvTools creates a “ContentMap.xml”, one for each operating system supported. This holds the *recommended* list of files in a single XML file, one element per file with name, date and checksum.

The program's “Tools -> Check For Updates” downloads this single ContentMap.xml file over HTTP, then iterates over each recommended file and checks whether that file/checksum already exists on the user’s computer. If it is an exact match, that file is fully up-to-date. If the checksum is different (file changed) or the user's file doesn’t exist at all (new file), it is added to a “Check For Updates” tree table that lists each changed/missing file name that should be downloaded. A user can reject, but will normally choose “OK”, and the program will then make HTTP requests for each of the needed files, showing a checkmark, date, and “file processed” status as files are downloaded. This gives the user a very clear list of what will happen, and real-time status updates as the download is happening.
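The comparison step described above can be sketched roughly as follows. The real ContentMap.xml schema is not shown in this article, so the “file” element and its “name” and “checksum” attributes are hypothetical, as are the class and method names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

final class UpdateCheck {
    /** Parses a content map into name -> checksum (element/attribute names are hypothetical). */
    static Map<String, String> parseContentMap(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList files = doc.getElementsByTagName("file");
        Map<String, String> recommended = new LinkedHashMap<>();  // keeps file order
        for (int i = 0; i < files.getLength(); i++) {
            Element file = (Element) files.item(i);
            recommended.put(file.getAttribute("name"), file.getAttribute("checksum"));
        }
        return recommended;
    }

    /** Returns recommended files that are missing locally or whose checksum differs. */
    static List<String> filesToDownload(Map<String, String> recommended,
                                        Map<String, String> local) {
        List<String> needed = new ArrayList<>();
        for (Map.Entry<String, String> entry : recommended.entrySet()) {
            // A missing local file yields null, which never equals the recommended checksum.
            String localChecksum = local.get(entry.getKey());
            if (!entry.getValue().equals(localChecksum)) {
                needed.add(entry.getKey());
            }
        }
        return needed;
    }
}
```

Only one HTTP request is needed to learn the full recommended state; every other request is for a file that is known to be new or changed.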

I really like this feature, minimizes both user network bandwidth and more importantly my web site CPU time. I like this feature so much I actually incorporated it into several of my real world work projects!!!

 

BMV hidden tools/features

The BMV itself has some hidden tools/features, only available for Randy usage (the BMV checks for the existence of a “Randy.Only” file). For the recent cataloging of Canning Companies, Advertisements and Sponsors, again, work smarter, not harder! To catalog this new data, I have hidden BMV dialogs such as below. Within BMV I simply view a given issue, hit the <SpaceBar> to iterate over pages, and use one of the dialogs if I want to catalog a page as one of these special types. When the “Catalog” button is clicked, the three popup dialogs write all necessary info to the appropriate directories, and when ready, there is a BmvTools option to process all Special catalog entries and merge/convert them so they are ready for BMV packaging.

 

Hidden “Runtime Debug” dialog

I have written software professionally for too many years now, and something I tend to do with all programs is build in test and inspection. Yes, I often use the Java debugger within the Eclipse IDE, but often there just needs to be some runtime debug output such as numbers, steps performed, and thread interaction. As such, my usage of BMV always has a Tools -> Runtime Debug dialog. I really do spend significant time testing new code changes.

 

Usage of Sleak and JProfiler

Sleak is shorthand for S-Leak, or in full an SWT Resource Leak detector. Within most operating systems a given program has a limited (though very large, in the 10,000s) number of available “resource handles”. While the allowed number is large, a good program will carefully manage the resources it uses, never duplicate them, and dispose of them as soon as they are out of scope. BMV creates many temporary scaled images for the Thumbnail or Print Preview displays, but it also very carefully manages EVERY instance and calls dispose() as soon as possible. Here is a CanSleak screenshot. Note the left list shows the allocated resources; selecting one will show a picture/color/font in the right panel. If the “Stack” check box is selected, it will show exactly which line of source code triggered the resource creation. Sleak is just a single public domain Sleak.java class, which I added as CanSleak.java with some nice new features (e.g. tracking persistent versus transient resources):

A commercial tool for which I purchased a single user license is JProfiler. This tool attaches to a running Java program and can be used to observe in great detail any memory leaks/usage or CPU hot spots. I used this often with the BMV 3.0 major code changes, and some with BMV 4.0 just to see if I could possibly squeeze out some more speed. A very good product!

The above is the JProfiler CPU view of the early BMV 4.0 scaling 161 “Advertisement” thumbnail images. As I can confirm, there are 161 invocations of the expected functions, and it shows how many microseconds, milliseconds, seconds and hopefully never minutes each takes. To scale the 161 images from 1200 pixels down to 200 pixel tall thumbnails under JProfiler (which does add a noticeable performance penalty) took 124 seconds.

The Java GUI API uses “AWT” (Abstract Window Toolkit), while I use SWT (Standard Widget Toolkit) widgets/images. To scale images, I was originally not able to find any SWT functions, so I wrote code to convert between AWT <-> SWT, using AWT to scale/rotate. The more pixels, the more time. The above is a good example for me to investigate whether there is a modern API to scale an SWT image directly, and skip this expensive pixel-by-pixel conversion to and from AWT. Well…

Wow, after a bit more research, there is an alternative! SWT does not have a direct Image scale function, but it is possible to create a new Image object with a blank canvas of the desired size, and then use an SWT GC to draw the existing Image onto the new blank canvas; the drawImage() function will scale the pixels up/down as necessary. Straight allocation of a new pixel array, and then an SWT function call to paint pixels into that new array! No more expensive SWT -> AWT -> SWT conversion. With the released BMV 4.0 code, note the major speed improvement. Even running under the JProfiler performance hit, scaling those same 161 images took just 7.3 seconds. Major improvement from 214 seconds to 7.3 seconds. All thumbnail image loading/display is MAGNITUDES faster in BMV 4.0.

 

Miscellaneous notes

There are many more notes I could probably add, but here are some important final ones for now:

·        The software has always been Java (now Java 8, near term maybe jump to Java 17). Mainly because Java is a “write once, run anywhere” language. There are some minor exceptions I have needed to work around, but the BMV/CanDB Java source is developed on a Windows 10/11 computer but the same code will run on all flavors of Windows, MacOS and Linux systems.

 

·        I use the Eclipse IDE, Ant build.xml rules to drive compile/JAR/ZIP, SVN for source code control, and WS_FTP Pro to upload files to the website. The latter is a simple drag/drop, with the list of files on local disk in a left panel and the list of destination website directories in the right panel.

 

·        My website is hosted by IPowerWeb. When I searched for a web hosting company in the early 2000s, this one matched my needs. For essentially unlimited disk space I pay approximately $280 every two years. Hosting BMV with its ever increasing content, I don’t have disk space worries (I think the latest total was around 15GB). The website supports basic HTTP, SSL secured HTTPS, and PHP. I just use the web site for storage; there is no server side database or other running server code.

 

·        Virus detection. In these days of cybersecurity breaches, I really hope everyone uses at least one antivirus product. I have used ZoneAlarm Pro for probably a decade now, and occasionally run MalwareBytes as a double check. The BMV/CanDB web hosting absolutely must not host files with a virus. I am very confident all BMV/CanDB web hosted content is 100% virus clean.

 

·        If you have any questions about BMV internals, I am more than willing to discuss. I am VERY happy with both the BMV and CanDB core design, and the code is getting cleaner and cleaner with each iteration. For example, the new Print code I wrote for CanDB is now part of a “canutils.jar” and shared between both programs. It really takes advantage of general classes with specific sub-class extensions, Java Interfaces, and Java Reflection.