Monday, 28 October 2013
Bibliographic patent data sources
As previously mentioned, many patent offices publish their bibliographic patent data in XML format in accordance with WIPO standard ST.36. Where can you find the data? As of the date of this post, the following web pages provide download links or purchase order information for such data.
- USPTO Patent Grant Bibliographic Data (1976-present)
- Google USPTO Bulk Downloads: US Patents
- Canadian Intellectual Property Office IP Data Products: Patents
- European Patent Office: Bibliographic Data files
- World Intellectual Property Organization: PCT Bibliographic data
- IP Australia bulk data products
- State Intellectual Property Office of the P.R.C.: Chinese Patent Data (English machine translation)
- Korea Intellectual Property Rights Information Service (KIPRIS) Electronic Data Supply
- German Patent and Trade Mark Office (DPMA) data supply services
- Intellectual Property Office of New Zealand
Monday, 21 October 2013
XML format patent bibliographic data
Many patent offices (e.g. USPTO, EPO, WIPO, SIPO, CIPO) publish their bibliographic patent data in XML format in accordance with WIPO standard ST.36. Some patent offices (e.g. CIPO) make the data available free of charge, for non-commercial use. Others (e.g. USPTO) make the data available free of charge with no usage restrictions.
Like HTML, XML (Extensible Markup Language) employs tags to encapsulate information. Unlike HTML tags, XML tags impart no display characteristics (e.g. fonts) to the tagged information. Also unlike HTML tags, XML tags are user-definable. This means that they can be, and usually are, self-describing. XML tags can also be nested to present information hierarchically. Patent bibliographic data stored in the XML format defined by WIPO's ST.36 standard uses self-describing tags which are defined and hierarchically arranged in accordance with the standard.
Consider this extract from the USPTO’s XML document for US patent no. 8309744. Notice the field tags. For example, the <country></country> tag pair encapsulates the “US” country code, telling us that this document pertains to a US patent.
The <doc-number></doc-number> tag pair encapsulates “08309744”, telling us the document's number.
The <kind></kind> tag pair encapsulates “B2”, telling us that the document is a granted utility patent.
The <date></date> tag pair encapsulates “20121113”, telling us that the patent issued on November 13, 2012.
Those four tag pairs are nested within the <document-id></document-id> tag pair which is in turn nested within the <publication-reference></publication-reference> tag pair. The information encapsulated by those tag pairs identifies the published document.
The <document-id></document-id>, <country></country>, <doc-number></doc-number> and <date></date> tag pairs are also hierarchically nested within a pair of <application-reference></application-reference> tags. Since the tags are self-describing, you can easily understand that the encapsulated information tells us that the '744 patent issued from US application serial no. 13/081,794, which was filed on April 7, 2011.
The depicted extract is just a small part of the USPTO's XML document publication for US patent no. 8309744. Anyone familiar with patent information could read the raw XML document and discern its meaning fairly readily. However, XML documents are not normally intended for human reading. Their primary purpose is to preserve a document's organization and structure in computer-readable form. The visualizations presented via this blog were developed by computer processing of XML documents corresponding to the visualized patent publications.
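To give a rough idea of what such computer processing can look like, here is a minimal Python sketch that reads the fields discussed above. The XML fragment is rebuilt by hand from the values quoted in this post, not copied from the USPTO file; the wrapper element and the helper name `read_document_id` are assumptions made for illustration, and the real document contains many more elements and attributes.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Hand-built fragment using the values quoted in the post.
# The enclosing wrapper element is an assumption for this sketch.
SAMPLE = """
<us-bibliographic-data-grant>
  <publication-reference>
    <document-id>
      <country>US</country>
      <doc-number>08309744</doc-number>
      <kind>B2</kind>
      <date>20121113</date>
    </document-id>
  </publication-reference>
  <application-reference>
    <document-id>
      <country>US</country>
      <doc-number>13081794</doc-number>
      <date>20110407</date>
    </document-id>
  </application-reference>
</us-bibliographic-data-grant>
"""


def read_document_id(root, reference_tag):
    """Return the children of <reference_tag>/<document-id> as a dict.

    Hypothetical helper: relies only on the nesting described in the post.
    """
    doc_id = root.find(f"{reference_tag}/document-id")
    if doc_id is None:
        return {}
    return {child.tag: (child.text or "").strip() for child in doc_id}


root = ET.fromstring(SAMPLE.strip())
publication = read_document_id(root, "publication-reference")
application = read_document_id(root, "application-reference")

# Dates are stored as YYYYMMDD strings, so convert them for later arithmetic.
issue_date = datetime.strptime(publication["date"], "%Y%m%d").date()
filing_date = datetime.strptime(application["date"], "%Y%m%d").date()

print(publication["country"], publication["doc-number"], publication["kind"], issue_date)
print(application["doc-number"], filing_date)
```

Because the tags are self-describing, the lookup paths in the code read almost the same as the prose description of the nesting.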
Monday, 14 October 2013
Patent bibliographic data basics
Bibliography is the description of books using details such as author, publication date, edition, etc. which collectively constitute bibliographic data. In relation to patents, bibliographic data encompasses details such as country, patent number & issue date; application number & filing date; priority number(s), country(ies) & date(s); invention title; inventor name(s), citizenship & address; assignee name(s), nationality & residence; and much more.
Have a look at the cover sheet of this United States patent. Everything that you see here—plus more information that you do not see here—constitutes this patent’s bibliographic data.
The visualizations presented via this blog make only limited use of the full range of available patent bibliographic data. In general, text and image information (e.g. abstract, description, claims, drawings) is not used. For the most part, information that can be counted is used.
For example, the question “how many patents did firm X prosecute on behalf of assignee Y for inventions handled by USPTO art unit Z?” is answered by counting the number of patents which satisfy all three of those criteria. Accordingly, patent bibliographic details such as firm names, assignee names and art unit numbers are utilized. But, apart from counting the total number of claims in a patent, neither the text comprising a patent’s abstract, description and claims nor the drawing images are useful for the purposes of the visualizations presented via this blog.
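The counting itself is simple. The sketch below assumes each patent has already been reduced to a (firm, assignee, art unit) record; the sample records and the function name `count_patents` are made up for illustration and do not come from real data.

```python
from collections import Counter

# Hypothetical records, one per patent: (firm, assignee, art unit).
PATENTS = [
    ("Firm X", "Assignee Y", "2617"),
    ("Firm X", "Assignee Y", "2617"),
    ("Firm X", "Assignee Y", "2618"),
    ("Firm X", "Assignee W", "2617"),
    ("Firm V", "Assignee Y", "2617"),
]


def count_patents(records, firm, assignee, art_unit):
    """Count the records that satisfy all three criteria at once."""
    return sum(
        1
        for rec_firm, rec_assignee, rec_unit in records
        if rec_firm == firm and rec_assignee == assignee and rec_unit == art_unit
    )


answer = count_patents(PATENTS, "Firm X", "Assignee Y", "2617")

# Counting every combination in one pass is convenient when many
# firm/assignee/art-unit groups are to be compared.
all_counts = Counter(PATENTS)

print(answer)
```

A `Counter` over the full records gives the same number for any one combination, and also supplies the counts for all the other groups that a chart might compare.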
Some dates can be useful, especially if they facilitate calculation of meaningful statistics for a large group of documents. For example, the time span between an application’s filing date and the corresponding patent’s issue date provides a useful measure that can be used to address questions such as “What is the average filing-to-issue time in years for US patents which issued in 2012 to assignee X for inventions in IPC subclass G06Q?”
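A question of that kind can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the record fields, the sample dates, and the convention of dividing days by 365.25 to obtain years.

```python
from datetime import date

DAYS_PER_YEAR = 365.25  # assumed convention for converting days to years

# Hypothetical records; the field names and values are illustrative only.
RECORDS = [
    {"assignee": "Assignee X", "ipc": "G06Q 10/06",
     "filed": date(2010, 1, 15), "issued": date(2012, 1, 15)},
    {"assignee": "Assignee X", "ipc": "G06Q 30/02",
     "filed": date(2009, 6, 1), "issued": date(2012, 6, 1)},
    {"assignee": "Assignee X", "ipc": "H04W 4/00",    # different IPC subclass
     "filed": date(2011, 4, 7), "issued": date(2012, 11, 13)},
    {"assignee": "Assignee X", "ipc": "G06Q 20/10",   # issued in another year
     "filed": date(2008, 3, 1), "issued": date(2011, 3, 1)},
    {"assignee": "Assignee W", "ipc": "G06Q 10/10",   # different assignee
     "filed": date(2010, 5, 1), "issued": date(2012, 5, 1)},
]


def average_filing_to_issue_years(records, assignee, ipc_subclass, issue_year):
    """Average filing-to-issue time in years for the matching records.

    Returns None when no record satisfies all three criteria.
    """
    spans = [
        (rec["issued"] - rec["filed"]).days / DAYS_PER_YEAR
        for rec in records
        if rec["assignee"] == assignee
        and rec["ipc"].startswith(ipc_subclass)
        and rec["issued"].year == issue_year
    ]
    if not spans:
        return None
    return sum(spans) / len(spans)


average = average_filing_to_issue_years(RECORDS, "Assignee X", "G06Q", 2012)
print(f"{average:.2f} years")
```

Only the first two sample records satisfy all three criteria, so the average is taken over those two spans.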
In future posts I’ll delve more deeply into other aspects of patent bibliographic data.
Monday, 7 October 2013
Bubble charts
Bubble charts are sometimes useful for visualizing data. This example uses color to encode country (mauve = Finland, peach = Israel, green = Italy) and size to encode number of patent documents. The labels identify USPTO art units. Overall, the visualization compares Finland, Israel and Italy in terms of the number of US patents which issued in 2012 to assignees located in those countries and which were allocated by the USPTO to one of five different art units. The five art units are:
- 2617 (cellular telephony)
- 2618 (radio/satellite communications)
- 2624 (image analysis)
- 2916 (a design patent art unit)
- 2913 (another design patent art unit)
For Finland, the next two most significant art units are 2916 and 2618 in that order, but you need to look closely to determine each bubble's size to get them in the right sequence. The Finland/2916 bubble corresponds to 51 patents and the Finland/2618 bubble corresponds to 46 patents. Difficulty in distinguishing bubble sizes is a downside of bubble charts.
For Israel, the next two most significant art units are 2617 and 2618 in that order, as is reasonably apparent from the bubbles’ respective sizes.
For Italy, the next two most significant art units are 2617 and 2624 in that order, but again you need to look closely to get them in the right order. The Italy/2617 bubble corresponds to 23 patents and the Italy/2624 bubble corresponds to 18 patents.
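A little arithmetic suggests why these pairs are hard to tell apart. The sketch below assumes that each bubble's area is proportional to its patent count, which is a common charting convention but is not confirmed for the chart in this post; under that assumption the diameters differ only by the square root of the count ratio. The function name `diameter_ratio` is hypothetical.

```python
import math


def diameter_ratio(larger_count, smaller_count):
    """Ratio of bubble diameters, ASSUMING bubble area is proportional to count."""
    return math.sqrt(larger_count / smaller_count)


# Patent counts quoted in the post.
finland = diameter_ratio(51, 46)  # Finland: art unit 2916 vs 2618
italy = diameter_ratio(23, 18)    # Italy: art unit 2617 vs 2624

print(f"Finland 2916 vs 2618: {finland:.3f}")
print(f"Italy 2617 vs 2624: {italy:.3f}")
```

Under that assumption the larger Finland bubble is only about 5% wider than the smaller one, and the larger Italy bubble about 13% wider, even though the patent counts differ by roughly 11% and 28% respectively.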
The bubble size discrimination problem can be addressed by adding ranking values (e.g. 1, 2, 3...) to the bubbles within each color group, by applying different patterns corresponding to the number of patents represented by each bubble, etc. However, such techniques can distract the viewer without adequately addressing the problem.
Bubble charts are useful if you only want to see an approximation. But, if precision matters, bubble charts may not be the best choice. If you look back at my "Top technology sectors by country" post, you’ll see that I used data bars to compare Finland, Israel and Italy in a different context. Consider whether it’s easier to understand the data bar visualization or the bubble chart visualization.