At the request of the Internationalization Lab, who helped us translate
the website into Farsi, I compiled some statistics on the hits we see on
the website. These are aggregate numbers from April 24 to May 22, and I
thought I might as well publish them here as they might be of interest
to other people. I'm also documenting my scripts for future reference
and in case I made errors (I often do on these things).
Translation stats
=================
These are the stats we publish in our monthly reports. They have nothing
to do with website hits but, since I'm writing this for the
Internationalization Lab, I thought I'd copy them here as well.
Overall translation of the website
----------------------------------
- de: 50% (2615) strings translated, 43% words translated
- fa: 47% (2502) strings translated, 54% words translated
- fr: 63% (3278) strings translated, 63% words translated
- it: 17% ( 949) strings translated, 18% words translated
- pt: 31% (1661) strings translated, 29% words translated
Total original words: 53520
Core pages of the website
-------------------------
See
https://tails.boum.org/contribute/l10n_tricks/core_po_files.txt
- de: 79% (1432) strings translated, 79% words translated
- fa: 40% ( 726) strings translated, 42% words translated
- fr: 73% (1330) strings translated, 77% words translated
- it: 49% ( 886) strings translated, 56% words translated
- pt: 55% (1001) strings translated, 55% words translated
Total original words: 14006
Hits per language
=================
for lang in en fa fr de ; do
  echo -n "${lang} "
  grep -E "GET .+\.${lang}\.html HTTP/1\..\" 200" access.log* | wc -l
done
en 1501323 (83.1%)
fa 11468 ( 0.6%)
fr 124823 ( 6.9%)
de 170007 ( 9.4%)
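The loop above only prints raw counts; the percentages next to them are consistent with each count divided by the sum of the four counts. A minimal sketch of that arithmetic follows, assuming the input is one "language count" pair per line. The helper name `hits_with_share` is hypothetical and not part of the original scripts.

```shell
# Hypothetical helper: reads "lang count" pairs on stdin and prints each
# count followed by its share of the total of all counts read.
hits_with_share() {
    awk '{ lang[NR] = $1; hits[NR] = $2; total += $2 }
         END {
             for (i = 1; i <= NR; i++)
                 printf "%s %d (%4.1f%%)\n", lang[i], hits[i], 100 * hits[i] / total
         }'
}

# Counts copied from the listing above.
printf '%s\n' "en 1501323" "fa 11468" "fr 124823" "de 170007" | hits_with_share
```

Note that the shares are relative to these four languages only, not to all traffic on the website.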
Top 50 pages in Farsi and their hits
====================================
Note that a hit on a Farsi URL doesn't mean that the page is actually
translated into Farsi. For example, the pages ranked 2, 3, 8, 10, and 12
below are not translated into Farsi.
grep -E "GET .+\.fa\.html HTTP/1\..\" 200" /tmp/access.log |
  sed -n -re 's/.* ([^ ]+)\.fa\.html HTTP.*/\1/p' |
  sort | uniq -c | sort -rn | head -n 50
686 /index
312 /install
189 /install/os
183 /news
162 /about
128 /support/faq
128 /getting_started
127 /install/win
126 /doc/anonymous_internet/Tor_Browser
120 /install/win/usb
106 /news/version_2.3
106 /install/win/usb/overview
105 /doc
103 /support/known_issues
103 /doc/first_steps/startup_options/bridge_mode
85 /doc/about/license
84 /support
83 /contribute
82 /press
74 /doc/about/warning
74 /contribute/how/donate
67 /security
66 /doc/first_steps/introduction_to_gnome_and_the_tails_desktop
66 /doc/about/trust
64 /install/vm
63 /doc/anonymous_internet/claws_mail_to_icedove
59 /doc/encryption_and_privacy/secure_deletion
59 /doc/anonymous_internet/tor_status
57 /doc/anonymous_internet/icedove
56 /doc/anonymous_internet/electrum
55 /doc/first_steps/bug_reporting
55 /doc/anonymous_internet/pidgin
55 /doc/about/features
54 /doc/anonymous_internet/i2p
53 /doc/first_steps/startup_options/network_configuration
52 /install/dvd
51 /security/Numerous_security_holes_in_2.2.1
51 /install/debian
50 /doc/introduction
50 /doc/anonymous_internet/index
49 /news/version_1.7
49 /doc/first_steps/upgrade
49 /doc/first_steps/startup_options/mac_spoofing
48 /doc/about/openpgp_keys
48 /doc/about/acknowledgments_and_similar_projects
47 /news/version_2.2.1
46 /news/version_2.2
46 /doc/advanced_topics/virtualization
45 /doc/first_steps/installation/manual/linux
44 /install/win/clone/overview
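To check what the grep and sed expressions above keep from a request, they can be run against a single log line. This is only a sketch: the log line is made up and assumes the Apache "combined" format, which may differ from the real log layout, and the function name `extract_fa_page` is hypothetical.

```shell
# Made-up log line in Apache "combined" format (an assumption about the
# real log layout); address, date, size, and user agent are placeholders.
line='192.0.2.1 - - [01/May/2016:00:00:00 +0000] "GET /install/os.fa.html HTTP/1.1" 200 5120 "-" "Mozilla/5.0"'

# Same grep and sed expressions as above, reading from stdin:
# keep successful (200) requests for *.fa.html, then print the page path
# without the ".fa.html" suffix.
extract_fa_page() {
    grep -E "GET .+\.fa\.html HTTP/1\..\" 200" |
        sed -n -re 's/.* ([^ ]+)\.fa\.html HTTP.*/\1/p'
}

printf '%s\n' "$line" | extract_fa_page
```

For this sample line the pipeline prints `/install/os`, which is the form the page names take in the list above.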
Top 50 pages across all languages
=================================
grep -E "GET .+\...\.html HTTP/1\..\" 200" /tmp/access.log |
  sed -n -re 's/.* ([^ ]+)\...\.html HTTP.*/\1/p' |
  sort | uniq -c | sort -rn | head -n 50
554957 /news
154156 /install
146827 /install/os
99661 /install/win
70154 /index
65685 /install/win/usb/overview
62222 /install/win/usb
40132 /about
33440 /install/debian
28471 /getting_started
22188 /doc/about/warning
20882 /install/debian/usb
20486 /news/version_2.3
20428 /install/dvd
20370 /install/debian/usb/overview
19971 /install/linux
19889 /doc
19305 /install/vm
14709 /install/mac
12914 /install/win/clone/overview
12627 /doc/about/features
11638 /install/clone
10947 /install/download
10279 /support/faq
9732 /doc/first_steps/installation
9239 /install/linux/usb/overview
9072 /security/Numerous_security_holes_in_2.2.1
8242 /doc/first_steps/reset/windows
8204 /support/known_issues
8024 /install/linux/usb
7871 /support
6729 /doc/first_steps/installation/manual
6351 /doc/first_steps/startup_options/bridge_mode
6243 /doc/first_steps/startup_options/administration_password
6136 /doc/advanced_topics/virtualization/virtualbox
6088 /install/mac/usb/overview
5808 /doc/about/license
5746 /doc/get/verify
5654 /doc/first_steps/persistence/configure
5301 /doc/first_steps/startup_options
5123 /install/expert/usb/overview
4986 /doc/first_steps/startup_options/mac_spoofing
4968 /doc/first_steps/start_tails
4772 /install/mac/usb
4704 /doc/advanced_topics/virtualization
4698 /doc/about/fingerprint
4677 /install/expert/usb
4386 /doc/first_steps/upgrade
4333 /doc/first_steps/reset/linux
4289 /doc/about/requirements
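In the command above, the `..` in `\...\.html` matches any two characters, so hits on the same page in different languages are counted together. A small sketch on made-up log lines illustrates this. The combined log format, the sample data, and the function names `sample_log` and `top_pages` are assumptions for illustration.

```shell
# Made-up sample: three successful page hits in different languages,
# one 404, and one request that is not a translated page.
sample_log() {
    cat <<'EOF'
192.0.2.1 - - [01/May/2016:00:00:00 +0000] "GET /install.en.html HTTP/1.1" 200 100 "-" "Mozilla/5.0"
192.0.2.2 - - [01/May/2016:00:00:01 +0000] "GET /install.fa.html HTTP/1.1" 200 100 "-" "Mozilla/5.0"
192.0.2.3 - - [01/May/2016:00:00:02 +0000] "GET /news.de.html HTTP/1.1" 200 100 "-" "Wget/1.15"
192.0.2.4 - - [01/May/2016:00:00:03 +0000] "GET /news.fr.html HTTP/1.1" 404 100 "-" "Mozilla/5.0"
192.0.2.5 - - [01/May/2016:00:00:04 +0000] "GET /tails.css HTTP/1.1" 200 100 "-" "Mozilla/5.0"
EOF
}

# Same pipeline as above, reading from stdin instead of /tmp/access.log.
top_pages() {
    grep -E "GET .+\...\.html HTTP/1\..\" 200" |
        sed -n -re 's/.* ([^ ]+)\...\.html HTTP.*/\1/p' |
        sort | uniq -c | sort -rn | head -n 50
}

sample_log | top_pages
```

Here `/install` is counted twice (once for `en`, once for `fa`) and `/news` once, because the 404 and the stylesheet request are filtered out.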
Top 20 user agents
==================
Note that, as in the Farsi page list, this command only looks at
requests for Farsi pages.
grep -E "GET .+\.fa\.html HTTP/1\..\" 200" /tmp/access.log |
  sed -e 's/ / /g' | cut -d ' ' -f 17 | sort | uniq -c | sort -rn |
  head -n 20
7722 "Mozilla/5.0
731 "Domain
701 "Mozilla/4.5
576 "Wget/1.15
490 "Mozilla/4.0
274 "Googlebot/2.1
252 "Riddler
232 "GigablastOpenSource/1.0"
99 "ltx71
61 "PrivateSearch/0.1.0
61 "eilisabot/1.0.0-beta"
54 "yacybot
49 "ResearchBot;
27 "UserAgent"
21 "DoCoMo/2.0
15 "SAMSUNG-SGH-E250/1.0
14 "Opera/9.80
11 "-"
10 "Ruby"
7 "UCWEB/2.0
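The field number passed to `cut` depends on the exact log layout. As an alternative sketch, the first word of the user agent can be selected by splitting each line on double quotes. This assumes the Apache "combined" format, where the user agent is the third quoted field; it is not necessarily the layout of the logs used here. The helper name `first_ua_token` and the sample lines are hypothetical.

```shell
# Hypothetical helper: print the first word of the user agent, assuming
# the user agent is the third double-quoted field of each log line.
first_ua_token() {
    awk -F'"' '{ split($6, ua, " "); print ua[1] }'
}

# Made-up log lines; addresses, dates, and sizes are placeholders.
sample_log() {
    cat <<'EOF'
192.0.2.1 - - [01/May/2016:00:00:00 +0000] "GET /index.fa.html HTTP/1.1" 200 100 "-" "Mozilla/5.0 (X11; Linux x86_64)"
192.0.2.2 - - [01/May/2016:00:00:01 +0000] "GET /news.fa.html HTTP/1.1" 200 100 "-" "Wget/1.15 (linux-gnu)"
192.0.2.3 - - [01/May/2016:00:00:02 +0000] "GET /install.fa.html HTTP/1.1" 200 100 "-" "Mozilla/5.0 (Windows NT 6.1)"
192.0.2.4 - - [01/May/2016:00:00:03 +0000] "GET /index.en.html HTTP/1.1" 200 100 "-" "Opera/9.80 (X11; Linux)"
EOF
}

sample_log |
    grep -E "GET .+\.fa\.html HTTP/1\..\" 200" |
    first_ua_token | sort | uniq -c | sort -rn | head -n 20
```

Unlike the listing above, this variant prints the token without the leading double quote. It would also miscount lines whose request or referrer field contains an escaped double quote.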