Vinyl records as an example of beginning a “global open resource”, Global Open Resources, Translation Ids
Jorge,
Becoming a global open resource for any topic or any resource is not hard, but it takes dedication, patience, and years. It also takes a good database, organization, and open communication skills.
(“70s vinyl” OR “70s vinyls”) has 143,000 entry points today. I recognize it but have not made an effort to gather and organize it.
(“vintage” “vinyl” (“record” OR “records”) ) has 24.2 Million entry points
(“vinyl” (“record” OR “records”) ) has 110,000 entry points
(“phonograph” (“record” OR “records”)) has 7.37 Million entry points
(“phonograph record” OR “phonograph records”) has 3.5 Million entry points.
((“phonograph” OR “phonograph”) “preservation”) has only 10 entry points
((“phonograph” OR “phonograph”) (“preservation” OR “preserving”)) has 10 entry points
( site:loc.gov (“phonographs” OR “vinyls”) ) has 4,080 entry points
( site:loc.gov (“vinyl” OR “vinyls”) ) has 7,270 entry points and might be a good place to start.
But do not rely on any government agency on the Internet. It is not stable, up-to-date, complete, open, reliable. That is a technical assessment for use of web sites. There are other issues.
(“vinyl recording” OR “vinyl recordings” OR “vinyl records”) has 31.2 Million entry points. When anyone says there are “only a few”, I suspect they are wrong.
English: (“vinyl record” OR “vinyl records”) 37.0 Million
Bengali: (“ভিনাইল রেকর্ড” বা “ভিনাইল রেকর্ড”) 16,700 entries
Bengali: (“ভিনাইল রেকর্ড” OR “ভিনাইল রেকর্ড”) has 25.9 Million
China1: (“黑胶唱片”或“黑胶唱片”)has 9.19 Million entries
China1: (“黑胶唱片” OR “黑胶唱片”)has 9.96 Million
“China1” is “Chinese Simplified”, “Mainland China”, not yet “global open Chinese”. I am working on all the languages: recording, identification, recognition, encoding, decoding.
https://zh.wikipedia.org/zh-hans/%E9%BB%91%E8%86%A0%E5%94%B1%E7%89%87 can be translated to English and 59 languages. But I doubt the authors of the specific language versions are cooperating to make ONE global open concept that combines them all.
Korean: (“축음기 음반”) has 9,210 entry points
Korean: (“비닐 레코드” OR “비닐 레코드”) has 193,000 entry points
Do you begin to see that “vinyl records” impacts the whole Internet? All humans? It is part of the whole. When you trace “vinyl records” in depth, eventually you find you have touched all people.
My problems, for the Internet Foundation, are that (1) the languages are inefficiently implemented on the Internet, (2) the languages themselves and their writing are based on paper methods that have modern and future counterparts, and (3) the 2 Billion children in the world now (roughly under 20 years old) are not being taught how to understand and speak all sounds of all languages, but only the limited sounds of their own language, and those sounds are not encoded in the technology in a way that affords global and heliospheric efficiencies for a rapidly growing human species taking its first steps into the larger universe, and barely learning to take care of all 8.1 Billion humans and related species now.
I would have to write the whole Internet in this tiny box, and Twitter (X) software technology, and more importantly its corporate policies and staff training, are not set up to encourage and enable truly “global open discussions”.
When I write things like that, for instance (“universal education”) with 1.35 Million responses from the query at Google’s closed database, I mean to imply a unique ID that is the index of a concept database open to the whole Internet that has all languages.
Now as much as I dislike Google’s methods and its inadequacies in managing the Chromium project, I did see a glimmer in their code. If only it were in an open database, not squirreled away deep in their spaghetti code.
[ Before I forget, I keep a copy of most of my comments online, 2/3 are private. But this will be posted publicly as
( Vinyl records as an example of beginning a “global open resource”) at /?p=14677 ]
Now the version of chromium-main I am using is from a few days ago. I try to keep a provenance record to get back to the unique version of the code. But this snapshot has 416,762 files in it, of many hundreds of types, most of them undocumented (and undocument-able in this form).
Of those there are 5013 files that contain 2,328,271 places where the exact string (<translation id=) is used.
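Counts of this kind can be reproduced on any local copy of a source tree. The following is only a minimal sketch, not the method used for the numbers above: it walks a directory, reads every file as raw bytes, and counts both the files that contain the string and the total number of places it appears. The function name and the idea of reading each file whole into memory are assumptions made for illustration.

```python
import os

# The exact byte string to look for, as quoted in the text above.
NEEDLE = b"<translation id="


def count_occurrences(root, needle=NEEDLE):
    """Return (files_with_match, total_matches) for all files under root."""
    files_with_match = 0
    total_matches = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Read as bytes so that files of any type or encoding can be scanned.
                with open(path, "rb") as handle:
                    data = handle.read()
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
            hits = data.count(needle)  # non-overlapping occurrences
            if hits:
                files_with_match += 1
                total_matches += hits
    return files_with_match, total_matches
```

Calling count_occurrences on the folder that holds the snapshot returns the two numbers as a pair. Recording the folder name and snapshot date next to the result keeps the provenance with the count.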
Here is what that looks like. Those unique ids can be tied to unique strings in all languages (not yet, but technically yes). So there can be a “translation id” for “global open”, for “vinyl record”, for “global climate change”, for (“covid” OR “coronavirus” OR “corona virus”), for “heliospheric exploration”, for (“faster than light” (“communication” OR “engineering” OR “vehicles”)), for (“lithium batteries”) or any real term.
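To make the idea of one id tied to strings in many languages concrete, here is a small sketch. It assumes the ids sit in XML elements of the form <translation id="...">text</translation>, grouped in one <translationbundle lang="..."> per language. The ids 1001 and 1002, the sample bundles, and the function names are all made up for illustration; the strings are the search terms used earlier in this post.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample bundles, one per language. The ids are invented.
SAMPLE_BUNDLES = [
    '<translationbundle lang="en">'
    '<translation id="1001">vinyl record</translation>'
    '<translation id="1002">large language models</translation>'
    '</translationbundle>',
    '<translationbundle lang="ko">'
    '<translation id="1001">비닐 레코드</translation>'
    '</translationbundle>',
    '<translationbundle lang="zh-CN">'
    '<translation id="1001">黑胶唱片</translation>'
    '</translationbundle>',
]


def build_concept_table(bundles):
    """Merge per-language bundles into {concept_id: {lang: text}}."""
    table = defaultdict(dict)
    for xml_text in bundles:
        root = ET.fromstring(xml_text)
        lang = root.get("lang")
        for node in root.iter("translation"):
            table[node.get("id")][lang] = node.text or ""
    return dict(table)


def lookup(table, concept_id, lang, fallback="en"):
    """Return the string for an id in one language, else the fallback language."""
    entry = table.get(concept_id, {})
    return entry.get(lang, entry.get(fallback))


table = build_concept_table(SAMPLE_BUNDLES)
print(lookup(table, "1001", "ko"))  # the Korean string for concept 1001
print(lookup(table, "1002", "ko"))  # no Korean entry yet, so the English one
```

The point of the design is that the id, not any one language's spelling, is the key. Every language, group, or program that refers to 1001 refers to the same concept, and a missing language shows up as a visible gap in the table.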
My most bitter complaint about the LLMs (“large language models” can have a translation id) is that they are NOT using global open tokens. They are using arbitrary byte combinations from the raw files, when they should be coding all that knowledge to global open form so ALL languages, all groups, all humans and all AIs can access and refer to the same thing. “girl boy dog cat house man woman bird water food eye hand”: there are common references in the real world. An open organization (a way of coordinating between groups globally and heliospherically) that can be implemented in software and hardware could start with a few thousand such references. (Google probably has it, but won’t share with the world, and ABSOLUTELY CANNOT BE TRUSTED to administer such a thing as a for-profit and narrowly controlled group.)
A universal, “global open” to start, list of strings in all languages is a start. I would like to see (“Meter”, “Farad”, “Meters/Second”, “Meters/Second^2”, “Teslas”, “Joules”, “Watts”, “nano”, “pico”, “femto”) and all the other Standard Internet terms (to implement SI).
I am tired. I am exhausted. My eyes hurt, my body hurts. 26 years every day, long hours is barely enough to let me get a sketch, and I started many of these ideas 60 years ago.
I am not sure what you want to do. I think, if it is a “good” idea, it is something that will happen anyway. If a problem like (“plastic in the ocean” OR (“plastic” “oceans”)) comes up, then that can be mapped entirely, if the Internet information is shared, not simply mined by Google or LLM groups for their own purposes and then hoarded.
Richard Collins, The Internet Foundation
Auguste Laurent would hate you if he saw you using ALL CAPITALS. “Auguste Laurent Society” is not SHOUTING.
That is the way I learned in the BBS, gopher, ftp and early days of long distance data sharing.
I think I wrote about what I did in high school. I had been designing “infinite neural nets” and using “random neural nets” as examples to work from, because at that time I had not yet seen a computer. I contacted my local companies (Owens Corning, IBM, State Farm and a few others) and asked to be shown how they were using computers and where they were going with it. The people were very nice and explained it clearly.
My point is that corporations WILL help young people; they just do not have very good methods for it when you try to do that today.
And, State Farm was already using large storage devices and long distance telephone and data technology of that day to tie its offices and groups together.
That was when I was going to Newark Senior High in Newark, Ohio, 1964-1967. In 60 years things have not changed a lot: the methods change, but the purposes and directions are stable. They still point to a global world where all humans are able to share all knowledge, and where that can extend for millions of years into the Universe (all the data, not just a big bang model's worth).
(“big bang”) has 125 Million entry points. It is big, it has adherents and a presence in the world. But it is NOT everything.
(“a” OR “the”) has 25.27 Billion entry points
I wish Google operated as an open database; then you could count things and take random samples. Then “fair sampling” would be possible, and an open database would have high standards so all countries could contribute and use it. It would NOT be hoarded by the governments, which are only a tiny part of the world’s humans.
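If the entries behind a query were openly readable, one standard way to take a fair sample without knowing the total count in advance is reservoir sampling. The sketch below assumes only that the records can be read one after another; no such open interface exists today, so the input here is any ordinary sequence, and the function name is made up for illustration.

```python
import random


def reservoir_sample(records, k, rng=None):
    """Pick k records uniformly at random from a stream of unknown length."""
    rng = rng or random.Random()
    sample = []
    for index, record in enumerate(records):
        if index < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Keep the new record with probability k / (index + 1).
            j = rng.randint(0, index)  # inclusive on both ends
            if j < k:
                sample[j] = record
    return sample
```

Every record has the same chance of ending up in the sample, no matter where it sits in the stream, which is what a “fair sample” needs. Passing a seeded random.Random as rng makes a sample repeatable so that others can check it.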
Richard Collins, The Internet Foundation