Application for data cleanup and data transformation
OpenRefine
| | |
|---|---|
| Developer(s) | Freebase, then Google, now open source community |
| Initial release | November 10, 2010 |
| Stable release | 3.8.1 [1] / 21 May 2024 |
| Written in | Java [2] |
| Platform | Microsoft Windows, Linux, macOS |
| Available in | English, Italian, Chinese, Japanese, French, German |
| License | BSD License |
| Website | openrefine.org |
OpenRefine is an open-source desktop application for data cleanup and transformation to other formats, an activity commonly known as data wrangling.[3] It is similar to spreadsheet applications, and can handle spreadsheet file formats such as CSV, but it behaves more like a database.
It operates on rows of data which have cells under columns, similar to the manner in which relational database tables operate. OpenRefine projects consist of one table, whose rows can be filtered using facets that define criteria (for example, showing rows where a given column is not empty). Unlike spreadsheets, most operations in OpenRefine are done on all visible rows, for example, the transformation of all cells in all rows under one column,[4] or the creation of a new column based on existing data. Actions performed on a dataset are stored in the project and can be 'replayed' on other datasets. Formulas are not stored in cells; they are used to transform the data, and each transformation is applied only once.[5]
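The row-and-facet model described above can be illustrated with a minimal Python sketch. It is not OpenRefine's implementation: the names `facet_not_empty`, `transform_column`, `replay` and the `history` list are hypothetical, chosen only to show a one-off transformation applied to every row a facet selects, and the same recorded steps being replayed on a second dataset.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Dict[str, str]
Facet = Callable[[Row], bool]

def facet_not_empty(column: str) -> Facet:
    """Hypothetical facet: select rows whose `column` is not empty."""
    return lambda row: bool(row.get(column, "").strip())

def transform_column(rows: List[Row], column: str,
                     func: Callable[[str], str],
                     facet: Optional[Facet] = None) -> List[Row]:
    """Apply `func` once to `column` in every row the facet selects.

    The result is stored as plain data; no formula stays in the cell.
    """
    out = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        if facet is None or facet(row):
            new_row[column] = func(row.get(column, ""))
        out.append(new_row)
    return out

# A recorded history of (column, function) steps (an assumed format).
history: List[Tuple[str, Callable[[str], str]]] = [
    ("city", str.strip),
    ("city", str.title),
]

def replay(rows: List[Row], steps, facet: Optional[Facet] = None) -> List[Row]:
    """Re-apply the recorded steps, in order, to any dataset."""
    for column, func in steps:
        rows = transform_column(rows, column, func, facet)
    return rows

first = [{"city": "  paris "}, {"city": ""}]
second = [{"city": "NEW YORK"}]

cleaned_first = replay(first, history, facet_not_empty("city"))
cleaned_second = replay(second, history)
```

Because each step returns new rows, the original data is left unchanged and the same `history` can be reused on any dataset with a matching column.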
Formula expressions can be written in General Refine Expression Language (GREL),[6] in Jython (i.e., Python), and in Clojure.[7]
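As a rough illustration of a Python-style cell expression, the sketch below wraps the logic in an ordinary function so it can run outside OpenRefine. It assumes that the application supplies the current cell as a variable named `value` and uses whatever the expression returns; the wrapper function `expression` and the sample cells are hypothetical.

```python
def expression(value):
    # Inside the application, only this body would be entered as the
    # expression; the wrapper exists so the sketch runs standalone.
    if value is None:
        return None  # leave blank cells blank
    # Collapse runs of whitespace, then upper-case the text.
    return " ".join(value.split()).upper()

# Sample cells from one column (made-up data).
cells = ["  new   york ", None, "paris"]
results = [expression(v) for v in cells]
```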
The program operates as a local web app: it starts a web server and opens the default browser to 127.0.0.1:3333.
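A small sketch of how a script might check whether the local server is up, assuming the default address given above. The helper names `base_url` and `is_listening` are illustrative; the check only tests that something accepts TCP connections on the port, not that it is actually OpenRefine.

```python
import socket

HOST = "127.0.0.1"  # address stated in the text
PORT = 3333         # port stated in the text

def base_url(host: str = HOST, port: int = PORT) -> str:
    """Build the URL the browser would be pointed at."""
    return "http://%s:%d/" % (host, port)

def is_listening(host: str = HOST, port: int = PORT,
                 timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or otherwise unreachable.
        return False
```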
Uses
- Cleaning messy data: for example, when working with a text file containing semi-structured data, the data can be edited using transformations, facets and clustering to give it a clean structure.[8]
- Transformation of data: converting values to other formats, normalizing and denormalizing.
- Parsing data from web sites: OpenRefine has a URL fetch feature and includes the jsoup HTML parser and DOM engine.[9]
- Adding data to a dataset by fetching it from web services (i.e. services returning JSON).[10] For example, it can be used for geocoding addresses to geographic coordinates.[11]
- Aligning to Wikidata (formerly Freebase[12]): this involves reconciliation, that is, mapping string values in cells to entities in Wikidata.[13]
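The clustering mentioned under cleaning messy data can be sketched with a simplified key-collision approach: values that reduce to the same normalized key are grouped as likely variants of one another. This is an illustrative approximation, not OpenRefine's own code; the functions `fingerprint` and `cluster` and the sample names are assumptions.

```python
import string
import unicodedata
from collections import defaultdict

def fingerprint(value: str) -> str:
    """Reduce a string to a normalized key (simplified sketch)."""
    text = value.strip().lower()
    # Decompose accented characters and drop the combining marks.
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Remove ASCII punctuation.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split into tokens, de-duplicate, sort, and re-join.
    return " ".join(sorted(set(text.split())))

def cluster(values):
    """Group values sharing a key; keep only groups with 2+ members."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]

names = ["Acme Inc.", "acme inc", "Inc. Acme",
         "Café Bleu", "Cafe Bleu", "Other Co"]
clusters = cluster(names)
```

Each resulting group is a candidate for merging into a single canonical spelling, a decision that is normally left to the user.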
Supported formats
Import is supported from several formats.[14]
If input data is in a non-standard text format, it can be imported as whole lines, without splitting into columns, and the columns can then be extracted later with OpenRefine's tools. Archived and compressed files are supported (.zip, .tar.gz, .tgz, .tar.bz2, .gz, or .bz2), and OpenRefine can download input files from a URL. To use web pages as input, it is possible to import a list of URLs and then invoke a URL fetch function.
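The line-based import workflow can be sketched as follows: each raw line is first kept whole in a single column, and further columns are extracted afterwards. The sample lines, the pattern, and the column names `date`, `level` and `message` are made up for illustration.

```python
import re

# Raw input kept as whole lines (hypothetical sample data).
raw_lines = [
    "2024-05-21 INFO  server started",
    "2024-05-22 WARN  disk almost full",
    "not a log line",
]

# Assumed layout: a date, a level word, then free text.
LINE_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2})\s+(\w+)\s+(.*)$")

def split_line(line: str) -> dict:
    """Keep the original line and add extracted columns beside it."""
    match = LINE_PATTERN.match(line)
    if match is None:
        # Lines that do not fit the layout keep empty extracted cells.
        return {"line": line, "date": "", "level": "", "message": ""}
    date, level, message = match.groups()
    return {"line": line, "date": date, "level": level, "message": message}

table = [split_line(line) for line in raw_lines]
```

Keeping the untouched `line` column alongside the extracted ones makes it easy to spot rows where the pattern failed.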
Export is supported in several formats.[16]
Whole OpenRefine projects in native format can be exported as a .tar.gz archive.
Development
OpenRefine started life as Freebase Gridworks, developed by Metaweb, and has been available as open source since January 2010.[17] On 16 July 2010, Google acquired Metaweb,[18] the creators of Freebase, and on 10 November 2010 renamed Freebase Gridworks to Google Refine, releasing version 2.0.[19]
On 2 October 2012, original author David Huynh announced that Google would soon stop its active support of Google Refine.[20][21][22] Since then, the codebase has been in transition to an open source project named OpenRefine.[23]