Comment on String Data Structures Question – Purpose and Meaning
Strings are just one example of the larger problem of sequences and patterns (in any number of dimensions). This morning, I am just surveying the Internet for rock types and measurements used in geochemistry. There are many methods and representations. Does the student like to explore, to gather, to find general patterns, to share what they have found, to drill? What are their personality and goals? People almost seem to be born to solve certain kinds of problems.
Sequences of fields in records are also a type of string. Just the pieces are larger. Websites (at the level I work at every day) have to be tokenized and classified, broken into types of things. When you look at a few thousand of something you use some methods. When you look at trillions of things you use different methods. When you are working with Avogadro’s number of Avogadro’s, different methods still.
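As a rough illustration of "tokenized and classified, broken into types of things", here is a minimal Python sketch. The four categories, the patterns and the names `TOKEN_PATTERNS`, `tokenize` and `count_types` are assumptions made up for this example, not a description of any particular system, and this approach only suits the "few thousand of something" scale.

```python
import re
from collections import Counter

# Hypothetical token categories, tried in this order; real pages need many more.
TOKEN_PATTERNS = [
    ("url", r"https?://\S+"),
    ("number", r"\d+(?:\.\d+)?"),
    ("word", r"[A-Za-z]+(?:'[A-Za-z]+)?"),
    ("punct", r"[^\w\s]"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_PATTERNS))


def tokenize(text):
    """Yield (category, token) pairs for each recognized piece of text."""
    for match in TOKEN_RE.finditer(text):
        # lastgroup is the name of the alternative that matched.
        yield match.lastgroup, match.group()


def count_types(text):
    """Count how many tokens of each category appear in the text."""
    return Counter(kind for kind, _ in tokenize(text))


if __name__ == "__main__":
    sample = "Basalt has 45.5 percent silica, see https://example.org/rocks"
    for kind, token in tokenize(sample):
        print(kind, token)
    print(count_types(sample))
```

Whitespace and anything the patterns do not recognize are simply skipped here. At larger scales the same idea would have to be split across many machines or replaced by sampling, which is outside this sketch.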
I have been studying the Internet every day for the last 23 years, usually trying to solve global problems (problems whose solution requires global cooperation of hundreds of thousands to billions of people). If that student likes helping people, there is no end of life paths they can follow, and enjoy the process. Have them pick some problems of the type they might want to spend the rest of their life working with. What kind of people do they like to work with? Have them pick something where they find “all the people doing ___”.
Now “computer strings” are not “quantum strings”. And the suggestion about “natural language strings” is a good one. Perhaps they might want to simply find “strings” (257 Million on Google just now) and classify all uses of that “string”.
That includes “strings” “parsing” (5.59 Million entry points) which is small enough to completely locate and classify, organize and find all the people, applications (real world, not software), data, methods, economic and social impacts.
Now “strings” “quantum” (5.44 Million entry points) includes “string theory”. But that is just because “sheets”, “branes”, “webs”, “channels”, “flows” and other things have mechanical properties with similar mathematics that is easy.
If they are socially responsible, they might want to try something larger, but it needs immediate effort to be of practical use. (“covid” OR “corona virus” OR “coronavirus”) is showing 16.92 Billion entry points just now. It was sitting at 7.5 Billion for most of the last 6 months. Most of those entries are duplicates and variations. It is a string classification problem. Find all of them, organize them, and classify them to find patterns and ways that humans can better understand what is going on. Then find all the people responsible for each entry. Consolidate the best, show the variations, encourage all the owners, authors, publishers and sites to share, and simplify the whole without losing anything. It is possible. I call that kind of problem “not hard, just tedious”. Much of “big data” and “machine learning” is patience and perseverance and empathy for the people you meet. We still don’t set computers free to form their own ideas and solutions, to live their own lives. I call that “intelligent algorithms”, a deliberate flip on AI to emphasize that intelligence, caring and purpose come first, then you make algorithms to help.
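To make "duplicates and variations" concrete, here is a minimal Python sketch of one first step: reduce each entry to a normalized key and group the entries that share it. The function names `normalize` and `group_variations` and the sample entries are hypothetical, chosen only for illustration, and the normalization rules (case, accents, punctuation, spacing) are an assumption about what counts as "the same" entry.

```python
import re
import unicodedata
from collections import defaultdict


def normalize(entry):
    """Reduce an entry to a comparison key: fold case, accents, punctuation, spacing."""
    text = unicodedata.normalize("NFKD", entry)
    # Drop combining marks so accented and unaccented spellings compare equal.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    # Replace punctuation with spaces, then collapse runs of whitespace.
    text = re.sub(r"[^\w\s]", " ", text.casefold())
    return " ".join(text.split())


def group_variations(entries):
    """Map each normalized key to the original entries that share it."""
    groups = defaultdict(list)
    for entry in entries:
        groups[normalize(entry)].append(entry)
    return dict(groups)


if __name__ == "__main__":
    sample = [
        "Coronavirus update",
        "coronavirus  UPDATE!",
        "Corona virus update",
        "COVID-19 cases",
    ]
    for key, originals in group_variations(sample).items():
        print(key, "<-", originals)
```

This only catches entries that become identical after normalization. “Corona virus” and “coronavirus” still land in separate groups, so near-variations would need a further step such as a synonym table or a similarity measure, and that is where most of the tedious work would be.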
I somewhat hate that I seldom meet people. Work with a few dozen people and you form friendships and life-long connections. Work with a few billion people and there are always more being born.
Good luck to your student. Just ask them to pick something that will change the world and let them figure it out.
Richard Collins, Director, The Internet Foundation