About me
Name: Dušan Kreheľ
Born: 1991, Prešov, Czecho-Slovakia
Life: Prešov Region
Email: dusankrehel@gmail.com
Language (native): Slovak
Language (understand): Czech
Foreign languages: German, English, Croatian
Social networks:
Bot
Articles
Exports
- Wikipedia projects (grouped by the local project):
Technologies
d0cmf
d0cmf: short for Dušan's zero matrix format
Practical:
Practical comparison (2023-01 to 2023-06)
| | Original RAW | Original bz2 | d0cmf RAW | d0cmf bz2 |
| Size (bytes) | 91531991545 B | 16923192176 B | 8272043931 B | 1415546226 B |
| Size (GB) | 91.5 GB | 16.9 GB | 8.2 GB | 1.4 GB |
| Relative to original | | | 9% | 8% |
- Note: In the practical comparison, the d0cmf pageviews are split by local Wikipedia, and the sizes are calculated on that basis. Sources: [1] [2].
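The percentages in the comparison can be checked from the byte counts. A minimal sketch, using only the sizes listed above; the dictionary layout and the function name `ratio_percent` are just for illustration:

```python
# Byte counts taken from the practical comparison (2023-01 to 2023-06).
sizes = {
    "original": {"raw": 91531991545, "bz2": 16923192176},
    "d0cmf": {"raw": 8272043931, "bz2": 1415546226},
}

def ratio_percent(kind):
    """Size of the d0cmf files as a percentage of the original files."""
    return 100 * sizes["d0cmf"][kind] / sizes["original"][kind]

for kind in ("raw", "bz2"):
    # Rounded to whole percent, as in the comparison.
    print(f"{kind}: {round(ratio_percent(kind))}%")
```

Each ratio compares like with like: RAW against RAW and bz2 against bz2.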
Benefits for the community (if implemented):
- Pageview statistics:
  - Smaller compressed file sizes.
  - When saving: support for time intervals of any length.
  - Statistics stored separately for each local Wikipedia.
Revision databases
- A different way of storing site data.
- Revision line encoding: a line index is built from all lines of all revisions, and each revision is then represented as a group of line indexes. The line indexes of a revision are stored in a binary format.
- More: https://archive.org/details/revision-database
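The revision line encoding can be illustrated roughly as follows. This is only a sketch of the idea, not the actual format: the function names and the binary layout (a little-endian uint32 count followed by uint32 line indexes) are assumptions made for the example.

```python
import struct

def build_line_index(revisions):
    """Assign an integer id to every distinct line across all revisions."""
    index = {}   # line text -> id
    lines = []   # id -> line text
    for text in revisions:
        for line in text.split("\n"):
            if line not in index:
                index[line] = len(lines)
                lines.append(line)
    return index, lines

def encode_revision(text, index):
    """Store a revision as a group of line ids in binary form.

    Assumed layout (hypothetical): uint32 count, then count uint32 ids.
    """
    ids = [index[line] for line in text.split("\n")]
    return struct.pack(f"<I{len(ids)}I", len(ids), *ids)

def decode_revision(blob, lines):
    """Rebuild the revision text from its binary group of line ids."""
    (count,) = struct.unpack_from("<I", blob, 0)
    ids = struct.unpack_from(f"<{count}I", blob, 4)
    return "\n".join(lines[i] for i in ids)

# Three revisions of one page; most lines repeat between revisions.
revisions = ["a\nb\nc", "a\nB\nc", "a\nB\nc\nd"]
index, lines = build_line_index(revisions)
blobs = [encode_revision(r, index) for r in revisions]
assert [decode_revision(b, lines) for b in blobs] == revisions
```

Each distinct line is stored once, so a revision that changes a single line adds only that line to the index plus a short list of integers.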
Demonstration on skwiki-20240101-pages-meta-history.xml.bz2
| | Now | Concept |
| Database | ~19 GB | 1 to 5 GB (5% to 26%) |
| Export (bz2) | ~2.8 GB | ~1.1 GB (39%) |
Wiki page language
Idea: standardize the wiki page language and provide a wiki ⇒ HTML converter with a DOM and a DOM manipulation API.
- Benefits:
- clearer boundaries between the parts handled by bots and the parts handled by users,
- better tools for bots,
- one standard, one change-tracking document,
- support for MediaWiki tables in third-party software.
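To make the idea concrete, here is a toy sketch of a wiki ⇒ HTML conversion that produces a DOM a bot can edit. It is not "dwiki" and not any existing parser: it handles only headings and plain paragraphs, and the function name `wiki_to_dom` and the choice of Python's `xml.etree.ElementTree` as the DOM are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# "== Title ==" style headings: the same run of "=" on both sides.
HEADING = re.compile(r"^(=+)\s*(.*?)\s*\1$")

def wiki_to_dom(source):
    """Convert a tiny wikitext subset (headings, paragraphs) to an HTML DOM."""
    body = ET.Element("body")
    for line in source.splitlines():
        line = line.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            level = min(len(match.group(1)), 6)
            node = ET.SubElement(body, f"h{level}")
            node.text = match.group(2)
        else:
            node = ET.SubElement(body, "p")
            node.text = line
    return body

dom = wiki_to_dom("== History ==\nFounded in 1991.\n== Links ==")

# A bot edits the tree instead of the raw text: rename one heading.
for heading in dom.iter("h2"):
    if heading.text == "Links":
        heading.text = "External links"

html = ET.tostring(dom, encoding="unicode")
```

Because the bot works on nodes rather than on text patterns, it can only touch the elements it selects, which is one way to read the "boundaries" benefit above.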
Test implementations (2022-12-09)
| 0.000333 s | "dwiki" |
| 0.000275 s | "dwiki editor" |
| 0.016512 s | Wikimedia parser |
| 1.260279 s | Parsoid |
More: