Saturday - September 23, 2017
McGuire Center for Entrepreneurship
The University of Arizona | Eller College of Management
Commercialization Research on Innovation and Entrepreneurship


Normalization Priorities

Normalization is the process of isolating statistical error in data. In database normalization, data preprocessing plays a crucial role.

Anchoring to U.S. patent data, we envision normalizing the database so that patent data can be merged with other fundamental datasets (financial data, merger-acquisition data, etc.). Below we outline some proposed priorities.

Proposed Priorities for Normalization

Normalization of Locations

Both inventors and firm assignees have location-specific data within the U.S. patent system. These locations are noisy, so we are programmatically cleaning up everything we can using the following protocol.

  1. Gather information from the Google Maps Geocoding API. Take the location string and see whether Google can format it and provide spatial details:
    http://maps.google.com/maps/api/geocode/json?sensor=false&address=Ontario,CA
  2. If the request returns a result, take the first response and pass its coordinates (latitude, longitude) to ASKGEO to determine the time zone for this location.
    http://www.askgeo.com/api/950095/repebj25kt81g57pnor3e9ojar/timezone.json?points=34.06334430,-117.65088760
  3. Cache the results so that if we ever encounter this location string again, we do not need to repeat the lookup.
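
The three steps above can be sketched as follows. This is a minimal Python illustration, not CRIE's actual implementation: `LocationNormalizer`, `build_geocode_url`, and the two injected callables (`geocode`, `lookup_timezone`) are hypothetical names, and the network calls are left to the caller so that the caching logic can be shown on its own. Only the endpoint and query parameters are taken from the example URL in step 1.

```python
from urllib.parse import urlencode

# Endpoint and parameters as they appear in the example URL in step 1.
GEOCODE_ENDPOINT = "http://maps.google.com/maps/api/geocode/json"


def build_geocode_url(location):
    """Build the geocoding request URL for a raw location string."""
    return GEOCODE_ENDPOINT + "?" + urlencode({"sensor": "false", "address": location})


class LocationNormalizer:
    """Geocode a location, look up its time zone, and cache the result.

    `geocode` is assumed to return a list of (latitude, longitude)
    candidates (empty if nothing was found); `lookup_timezone` is assumed
    to map (latitude, longitude) to a time zone name. Both are
    placeholders for the real web-service calls.
    """

    def __init__(self, geocode, lookup_timezone):
        self._geocode = geocode
        self._lookup_timezone = lookup_timezone
        self._cache = {}
        self.bad_locations = []  # strings the geocoder could not parse

    def normalize(self, location):
        # Collapse whitespace and case so trivial variants share a cache entry.
        key = " ".join(location.split()).lower()
        if key in self._cache:
            return self._cache[key]

        candidates = self._geocode(location)
        if not candidates:
            # Step 1 failed: remember the string for manual review.
            self.bad_locations.append(location)
            record = None
        else:
            # Step 2: take the first response and resolve its time zone.
            lat, lng = candidates[0]
            record = {"lat": lat, "lng": lng,
                      "timezone": self._lookup_timezone(lat, lng)}

        # Step 3: cache the outcome, including failures.
        self._cache[key] = record
        return record
```

Failed lookups are cached as well, so a string that could not be parsed is sent to the geocoder only once and appears only once in `bad_locations`.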

Once this is done, we are left with a list of "bad locations" that Google was unable to parse. Many of these are anglicized foreign locations, which we handle manually. For example, Patent # 5,059,256 comes from the Lithuania area during the Cold War era. At the time, the location was recorded as "Vilnjus, SU". After manually considering the information, we replace it with the Google-friendly string "Vilnius, Lithuania".

Any normalization process should be well documented. We are in the process of dealing with approximately 20,000 "bad locations" and, when complete, we will make the manual replacements accessible for public scrutiny. If necessary, we will correct any mistakes once feedback is provided by the academic community.
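
One way to keep the manual replacements both machine-readable and open to review is a plain CSV file of original and replacement strings. The sketch below is only an illustration in Python: the column names, `load_replacements`, and `apply_replacement` are assumptions, and the single row is the Vilnius example from the text.

```python
import csv
import io

# Hypothetical file layout; the only row is the example given in the text.
REPLACEMENTS_CSV = """original,replacement,note
"Vilnjus, SU","Vilnius, Lithuania","Patent 5,059,256"
"""


def load_replacements(csv_text):
    """Parse the replacement table into an {original: replacement} dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["original"]: row["replacement"] for row in reader}


def apply_replacement(location, replacements):
    """Return the manual replacement if one exists, else the input unchanged."""
    return replacements.get(location.strip(), location)
```

A corrected string would then be passed back through the geocoding protocol like any other location.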

Normalization of Firms (e.g., cusips or permnos)

Firms are not assigned a unique identifier within the U.S. patent system. As such, it is challenging to ascertain whether all patents have been correctly assigned to a specific firm's patent portfolio.

e.g., IBM (permno 12490)

We perform a search for "IBM", "Intl Business", and "International Business". We then manually review each name variant and decide whether it applies to IBM (a simple green-light approach).

VARIANTS

IBM IBM Business Machines Corporation IBM Corp IBM Corp. IBM Corporation IBM Corporation of Armonk IBM International Business Machines Corporation IBM Japan Business Logistics Co., Ltd. IBM Japan Ltd. IBM Japan, Ltd. IBM Patent Operations IBM Thomas J. Watson Search Center International Business Machines International Business Machines - IBM International Business Machines Corp. International Business Machines Corporation International Business Machines, International Business International Business [[AND]] Technology Corporation International Business Business Machines International Business Corporation International Business Corpration International Business Development Co. International Business Development Company International Business Development Inc. International Business MAchines Corporation International Business Machiens Corporation International Business Machin es Corporation International Business Machinces Corp. International Business Machinces Corporation International Business Machine International Business Machine Company International Business Machine Corp. International Business Machine Corp. International Property Law International Business Machine Corporation International Business Machined Corporation International Business Machines - Corporation International Business Machines Cirporation International Business Machines Coirporation International Business Machines Company International Business Machines Company Corporation International Business Machines Coporation International Business Machines Coproartion International Business Machines Coproation International Business Machines Coproration International Business Machines Coprporation International Business Machines Coroporation International Business Machines Cororation International Business Machines Corp International Business Machines Corp. 
International Business Machines Corpoartion International Business Machines Corpoation International Business Machines Corporaion International Business Machines Corporaiton International Business Machines Corporartion International Business Machines Corporataion International Business Machines Corporatiion International Business Machines Corporatin International Business Machines Corporatioin International Business Machines Corporatiom International Business Machines Corporation International Business Machines Corporation Inc. International Business Machines Corporation Limited International Business Machines Corporation [ International Business Machines Corporation, International Business Machines Corporation, Inc. International Business Machines Corporation. International Business Machines Corporational International Business Machines Corporations International Business Machines Corporatoin International Business Machines Corporaton International Business Machines Corporatrion International Business Machines Corportaion International Business Machines Corportation International Business Machines Corportion International Business Machines Corproation International Business Machines Inc International Business Machines Inc. International Business Machines Inc. Corporation International Business Machines Incorp. International Business Machines Incorporated International Business Machines Incorporation International Business Machines Machine International Business Machines Machines International Business Machines Machines Corporation International Business Machines Operation International Business Machines corp. International Business Machines corporation International Business Machines for Corporation International Business Machines of Corporation International Business Machines, International Business Machines, Corp International Business Machines, Corp. International Business Machines, Corporation International Business Machines, Inc. 
International Business Machines, Incorporation International Business Machinesc Corporation International Business Machiness Corporation International Business Machins Corporation International Business Machnes Corporation International Business Machnies Corporation International Business Machnines Corporation International Business Macines Corp. International Business Macines Corporation International Business Macjines Coporation International Business Madnine Corporation International Business Mahcines Corporation International Business Mahines Corporation International Business Mcahines Corporation International Business Relations Bureau Inc. International Business Systems, Incorporated International Business Technology Corporation International Business and Machines Corporation International Business and Technology Corporation International Business machines Corporation International business Machines Corporation International, Business Machines Corporation International;Business Machines Corporation international Business Machines Corporation
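
A list like the one above can be pre-sorted before manual review. The sketch below is an assumption for illustration only and is not the methodology described on this page: it uses Python's standard `difflib.SequenceMatcher` to score each variant against one canonical name, and the names `normalize_name`, `similarity`, `screen`, and the 0.9 threshold are hypothetical choices.

```python
import re
from difflib import SequenceMatcher

CANONICAL = "International Business Machines Corporation"


def normalize_name(name):
    """Lower-case, turn punctuation into spaces, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9]+", " ", name.lower()).split())


def similarity(a, b):
    """Character-level similarity in [0, 1] between two normalized names."""
    return SequenceMatcher(None, normalize_name(a), normalize_name(b)).ratio()


def screen(variants, canonical=CANONICAL, threshold=0.9):
    """Split variants into likely matches and names needing manual review."""
    likely, review = [], []
    for variant in variants:
        if similarity(variant, canonical) >= threshold:
            likely.append(variant)
        else:
            review.append(variant)
    return likely, review


likely, review = screen([
    "International Business Machiens Corporation",   # typo
    "International;Business Machines Corporation",   # punctuation noise
    "International Business Relations Bureau Inc.",  # different firm
    "IBM Corp.",                                      # abbreviation
])
```

Here the two misspelled or mis-punctuated names land in `likely`, while the unrelated firm and the abbreviation land in `review`. String similarity alone cannot recognize abbreviations such as "IBM", which is one reason the manual green-light step remains necessary.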

We are developing algorithms that combine information about the firm's name with its location, patent classifications, etc. to help us programmatically normalize the data. We will compare and contrast our network-dominant methodology with the NBER methodologies.

Updated Information (December 2014)

Mergent, an industry partner, has helped us develop some initial "smart-portfolio" methodologies that will be available for members of Mergent Patent Archives.

                      Free Service       Mergent Patent Archives Members
Result set            50,000 records     250,000 records
Queue time (approx.)  24 hours           15 minutes
Queue priority        Economic           Highest
Queue runs            Every hour         Every 5 minutes
Result time           Few days           Few hours
OCR tables [1]        NO                 YES
Smart portfolios [2]  NO                 YES

1. Patent Archive members can query the metadata extracted from the OCR process and include it in a panel download.

2. Patent Archive members can utilize the CRIE wizard to build smart portfolios (patents to firms) based on different methodologies developed.

Normalization of Inventors (e.g., create a unique identifier for each)

Uniquely identifying inventors is also a nontrivial task. Manuel Trajtenberg and colleagues applied the SOUNDEX algorithm [1] to do this, and Lee Fleming and colleagues tried a nested-logic approach [2]. We are in the process of applying modern record linkage algorithms (including SOUNDEX, Jaro-Winkler, etc.) and merging this information with a nested-network logic.

Once the algorithm is developed, a sample data set, the source code (in this case in Java), and sample output will be available for public scrutiny.
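
To show what the SOUNDEX step contributes, here is a minimal sketch of the standard American Soundex code, used to group surnames that sound alike into candidate blocks. It is written in Python for brevity and is not the source code referred to above; `soundex` and `block_by_surname` are illustrative names, and the "Last, First" input format is an assumption.

```python
from collections import defaultdict

# Standard American Soundex consonant groups.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name):
    """Return the four-character Soundex code of a name ('' if no letters)."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for c in letters[1:]:
        if c in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(c, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code is kept
    return (result + "000")[:4]


def block_by_surname(names):
    """Group 'Last, First' names by the Soundex code of the surname."""
    blocks = defaultdict(list)
    for name in names:
        surname = name.split(",")[0]
        blocks[soundex(surname)].append(name)
    return dict(blocks)
```

Names that share a block are only candidates for being the same inventor. A finer comparison (for example Jaro-Winkler on the full name, or co-inventor and location information) would still be needed to confirm or reject each pair.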

1. Trajtenberg, Manuel, Gil Shiff, and Ran Melamed. 2006. The 'Names Game': Harnessing Inventors' Patent Data for Economic Research. National Bureau of Economic Research, Working Paper 12479. http://www.nber.org/papers/w12479

2. Lai, Ronald, Alexander D'Amour, and Lee Fleming. 2008. The careers and co-authorship networks of U.S. patent-holders, since 1975. Harvard Business School. http://dvn.iq.harvard.edu/dvn/dv/patent/faces/study/StudyPage.xhtml?studyId=38083

Normalization of Lawyers / Examiners (e.g., create a unique identifier for each)

Using derivatives of the normalization processes outlined for inventors, we will also normalize patent agents and patent examiners, along with their respective law firms and departments.

Normalization of Claim Information

Can we use text analysis to normalize the claim information and estimate the percentage (%) of a patent that relates to a product innovation and the percentage (%) that relates to a process innovation?

Other data (ideas are welcome)

The success of CRIE will be dependent on its adoption by academics. Tell your friends. Create an account. Use the system. Provide feedback on what you like, what other data you would like to see, and how we can improve. Your feedback is essential for CRIE's success and ultimately better academic research using patent data.


For additional information, please contact us.

hosted by Mergent

powered by Patent Rank
