Google Patent Data Analytics: Working with XML format patent bibliographic data

Monday 11 November 2013

Working with XML format patent bibliographic data

The USPTO issues United States patents in weekly batches, on Tuesday of each week throughout the year. The Canadian Intellectual Property Office (CIPO) does the same for Canadian patents.

According to the USPTO’s statistics, 253,155 US utility patents were granted in 2012. I’m ignoring reissue, design and plant patents for comparison purposes. Canada does not grant design or plant patents. Instead of design patents, Canada grants industrial design registrations. Instead of plant patents, Canada grants plant breeders’ rights, which are administered not by the CIPO but by the Plant Breeders’ Rights Office, part of the Canadian Food Inspection Agency.

A search of the CIPO’s online patent database reveals that 21,592 Canadian utility patents issued in 2012. So, in 2012, the volume of Canadian utility patent grants was about 8.5% of the volume of US utility patent grants. An even greater disparity appears in relation to reissue patents: the USPTO granted 822 reissue patents in 2012 alone, but only 20 Canadian reissue patents were granted over the whole of 2001-2011.
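The 8.5% figure is simply the ratio of the two grant counts quoted above. As a quick check, here is a minimal Python sketch; the function name grant_share is my own and purely illustrative.

```python
def grant_share(part, whole):
    """Return part / whole as a fraction, e.g. 0.25 for 25%."""
    if whole == 0:
        raise ValueError("whole must be non-zero")
    return part / whole


# Counts quoted in the text: 2012 utility patent grants.
canadian_grants = 21_592
us_grants = 253_155

print(f"{grant_share(canadian_grants, us_grants):.1%}")  # prints 8.5%
```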

The USPTO and the CIPO publish bibliographic data for their respective granted patents in XML format, in accordance with WIPO’s ST.36 standard. The CIPO’s Canadian patent bibliographic data XML files are typically provided in .zip archives. For example, the CIPO’s 2012 XML format patent bibliographic data is provided in a 188 MB archive from which 58,572 separate XML files can be extracted. However, those XML files pertain not only to granted utility patents (kind code C) but also to laid-open applications (kind code A1), reissue patents (kind code E) and re-examined patents (kind code F).
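One way to see how such an archive breaks down is to tally its XML files by kind code without extracting them to disk. The following Python is only a rough sketch. It assumes that each file carries its kind code in an element whose local name is kind (as in an ST.36-style document-id block) and that the first such element in the file is the one that describes the published document. I have not verified either assumption against the CIPO’s schema, and the function names are my own.

```python
import zipfile
import xml.etree.ElementTree as ET
from collections import Counter


def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def first_text(root, name):
    """Return the stripped text of the first element called `name`, or None."""
    for elem in root.iter():
        if isinstance(elem.tag, str) and local_name(elem.tag) == name:
            return (elem.text or "").strip()
    return None


def count_by_kind(archive):
    """Tally the .xml members of a zip archive by kind code.

    `archive` may be a path or a binary file-like object.
    ASSUMPTION: the first <kind> element in each file holds the kind code.
    """
    counts = Counter()
    with zipfile.ZipFile(archive) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".xml"):
                continue  # skip anything that is not an XML file
            with zf.open(name) as fh:
                root = ET.parse(fh).getroot()
            counts[first_text(root, "kind") or "unknown"] += 1
    return counts
```

Calling count_by_kind on the path of a downloaded archive returns a Counter keyed by kind code, so the entry for C gives the number of granted utility patents, and files in which no kind code is found are counted under unknown.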

Moreover, the CIPO may republish a patent bibliographic data XML file if an error is detected in a previously published version. For example, the CIPO’s 2012 patent bibliographic data archive includes an XML file for Canadian patent no. 2121906, which issued on 29 April 1993. As shown here, that XML file contains a pair of <ca-date-updated></ca-date-updated> XML tags encapsulating 31 December 2012, the date on which the CIPO republished its XML bibliographic data file for the ’906 patent (New Year’s Eve 2012 fell on a Monday). In processing the CIPO’s patent bibliographic data, one must take any such republication into account and perform appropriate update operations on existing data.
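The update operation can be sketched as an "insert or replace if newer" step keyed on the patent number. The Python below is a minimal illustration, not a description of any official procedure. The ca-date-updated tag name comes from the file discussed above. Everything else is an assumption on my part: that the patent number sits in an element named doc-number, that dates are written in a fixed sortable form such as YYYYMMDD, and that a plain dict stands in for the real data store.

```python
import xml.etree.ElementTree as ET


def find_text(root, name):
    """Return the stripped text of the first element called `name`, or None."""
    for elem in root.iter():
        if elem.tag.rsplit("}", 1)[-1] == name:
            return (elem.text or "").strip()
    return None


def upsert(store, xml_text):
    """Insert or update one record in `store`; return True if it changed.

    ASSUMPTIONS: <doc-number> identifies the patent, and <ca-date-updated>
    holds a date in a fixed format that sorts correctly as a string
    (for example YYYYMMDD). A file without the tag counts as oldest.
    """
    root = ET.fromstring(xml_text)
    number = find_text(root, "doc-number")
    if not number:
        raise ValueError("no doc-number found")
    updated = find_text(root, "ca-date-updated") or ""
    current = store.get(number)
    if current is not None and current["updated"] >= updated:
        return False  # the stored version is at least as recent
    store[number] = {"updated": updated, "xml": xml_text}
    return True
```

Because the comparison only ever replaces a record with a strictly newer one, the files can be processed in any order, and reprocessing the same archive twice leaves the store unchanged.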

This brief discussion touches on only some of the issues that one must be aware of when processing XML format patent bibliographic data. Next week I’ll discuss another issue specific to the USPTO’s XML format patent bibliographic data.