Categories
Misc

Upgrading to tf2 modifies all my .py files

I’m upgrading my TensorFlow version and using their
tf_upgrade_v2 script. I don’t run into any issues, but all my Python
files register as “modified” in git, even when there are no
changes. I poked around Google, and some people mention that my file
permissions may be changing, so I set core.filemode to false in
my .git/config and retried the upgrade, but I am still
seeing file changes. I diffed the files and I see zero changes. I
believe it could be the EOL, but I tried setting core.autocrlf to
false as well, and all these files still show up as modified.
Has anyone encountered this? I’m running Ubuntu 20.04.1.
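
One way to narrow this down is to compare each flagged file's bytes in HEAD with the bytes on disk and check whether they differ only in line endings or trailing whitespace. The helpers below are a hypothetical diagnostic sketch, not a known fix: `classify_difference` only labels the kind of difference, and `compare_with_head` assumes it is run with paths relative to the current directory (it uses git's `HEAD:./<path>` syntax).

```python
import subprocess
from pathlib import Path


def _to_lf(data: bytes) -> bytes:
    """Normalize CRLF and lone CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def _strip_trailing(data: bytes) -> bytes:
    """Drop trailing whitespace on every line and any trailing blank lines."""
    lines = _to_lf(data).split(b"\n")
    return b"\n".join(line.rstrip() for line in lines).rstrip(b"\n")


def classify_difference(old: bytes, new: bytes) -> str:
    """Roughly classify how two versions of a file differ."""
    if old == new:
        return "identical"
    if _to_lf(old) == _to_lf(new):
        return "line endings only"
    if _strip_trailing(old) == _strip_trailing(new):
        return "trailing whitespace/final newline only"
    return "content"


def compare_with_head(path: str) -> str:
    """Compare the committed blob of `path` (relative to the cwd) with the working copy."""
    committed = subprocess.run(
        ["git", "show", f"HEAD:./{path}"],
        capture_output=True,
        check=True,
    ).stdout
    return classify_difference(committed, Path(path).read_bytes())
```

If every flagged file comes back as "identical", the bytes match and the cause is more likely metadata (such as file mode) than content.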

submitted by /u/Woodhouse_20

Categories
Misc

Can anybody help me run the Python version of PoseNet (rwightman’s port of the TensorFlow.js PoseNet)?

Git link: https://github.com/ArimaValanImmanuel/posenet-python

submitted by /u/Section_Disastrous

Categories
Offsites

Improving Indian Language Transliterations in Google Maps

Nearly 75% of India’s population — which possesses the second highest number of internet users in the world — interacts with the web primarily using Indian languages, rather than English. Over the next five years, that number is expected to rise to 90%. In order to make Google Maps as accessible as possible to the next billion users, it must allow people to use it in their preferred language, enabling them to explore anywhere in the world.

However, the names of most Indian places of interest (POIs) in Google Maps are not generally available in the native scripts of the languages of India. These names are often in English and may be combined with acronyms based on the Latin script, as well as Indian language words and names. Addressing such mixed-language representations requires a transliteration system that maps characters from one script to another, based on the source and target languages, while accounting for the phonetic properties of the words as well.

For example, consider a user in Ahmedabad, Gujarat, who is looking for a nearby hospital, KD Hospital. They issue the search query, કેડી હોસ્પિટલ, in the native script of Gujarati, the 6th most widely spoken language in India. Here, કેડી (“kay-dee”) is the sounding out of the acronym KD, and હોસ્પિટલ is “hospital”. In this search, Google Maps knows to look for hospitals, but it doesn’t understand that કેડી is KD, hence it finds another hospital, CIMS. As a consequence of the relative sparsity of names available in the Gujarati script for places of interest (POIs) in India, instead of their desired result, the user is shown a result that is further away.

To address this challenge, we have built an ensemble of learned models to transliterate names of Latin script POIs into 10 languages prominent in India: Hindi, Bangla, Marathi, Telugu, Tamil, Gujarati, Kannada, Malayalam, Punjabi, and Odia. Using this ensemble, we have added names in these languages to millions of POIs in India, increasing the coverage nearly twenty-fold in some languages. This will immediately benefit millions of existing Indian users who don’t speak English, enabling them to find doctors, hospitals, grocery stores, banks, bus stops, train stations and other essential services in their own language.

Transliteration vs. Transcription vs. Translation
Our goal was to design a system that will transliterate from a reference Latin script name into the scripts and orthographies native to the above-mentioned languages. For example, the Devanagari script is the native script for both Hindi and Marathi (the language native to Nagpur, Maharashtra). Transliterating the Latin script names for NIT Garden and Chandramani Garden, both POIs in Nagpur, results in एनआईटी गार्डन and चंद्रमणी गार्डन, respectively, depending on the specific language’s orthography in that script.

It is important to note that the transliterated POI names are not translations. Transliteration is only concerned with writing the same words in a different script, much like an English language newspaper might choose to write the name Горбачёв from the Cyrillic script as “Gorbachev” for their readers who do not read the Cyrillic script. For example, the second word in both of the transliterated POI names above is still pronounced “garden”, and the second word of the Gujarati example earlier is still “hospital” — they remain the English words “garden” and “hospital”, just written in the other script. Indeed, common English words are frequently used in POI names in India, even when written in the native script. How the name is written in these scripts is largely driven by its pronunciation; so एनआईटी from the acronym NIT is pronounced “en-aye-tee”, not as the English word “nit”. Knowing that NIT is a common acronym from the region is one piece of evidence that can be used when deriving the correct transliteration.

Note also that, while we use the term transliteration, following convention in the NLP community for mapping directly between writing systems, romanization in South Asian languages regardless of the script is generally pronunciation-driven, and hence one could call these methods transcription rather than transliteration. The task remains, however, mapping between scripts, since pronunciation is only relatively coarsely captured in the Latin script for these languages, and there remain many script-specific correspondences that must be accounted for. This, coupled with the lack of standard spelling in the Latin script and the resulting variability, is what makes the task challenging.

Transliteration Ensemble
We use an ensemble of models to automatically transliterate from the reference Latin script name (such as NIT Garden or Chandramani Garden) into the scripts and orthographies native to the above-mentioned languages. Candidate transliterations are derived from a pair of sequence-to-sequence (seq2seq) models. One is a finite-state model for general text transliteration, trained in a manner similar to models used by Gboard on-device for transliteration keyboards. The other is a neural long short-term memory (LSTM) model trained, in part, on the publicly released Dakshina dataset. This dataset contains Latin and native script data drawn from Wikipedia in 12 South Asian languages, including all but one of the languages mentioned above, and permits training and evaluation of various transliteration methods. Because the two models have such different characteristics, together they produce a greater variety of transliteration candidates.

To deal with the tricky phenomena of acronyms (such as the “NIT” and “KD” examples above), we developed a specialized transliteration module that generates additional candidate transliterations for these cases.
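
The acronym module itself is not described in detail. As a rough illustration only, one simple way to produce such a candidate is to spell out each Latin letter by its name in the target script, which reproduces the NIT → एनआईटी example above and gives केडी for KD in Devanagari (the Gujarati query કેડી would need a Gujarati table). The letter-name table, the acronym heuristic and the function names are hypothetical; a real system would need per-language tables and would treat this as one candidate among many.

```python
# Hypothetical table: English letter names as commonly written in Hindi (Devanagari).
LETTER_NAMES_HI = {
    "A": "ए", "B": "बी", "C": "सी", "D": "डी", "E": "ई", "F": "एफ",
    "G": "जी", "H": "एच", "I": "आई", "J": "जे", "K": "के", "L": "एल",
    "M": "एम", "N": "एन", "O": "ओ", "P": "पी", "Q": "क्यू", "R": "आर",
    "S": "एस", "T": "टी", "U": "यू", "V": "वी", "W": "डब्ल्यू",
    "X": "एक्स", "Y": "वाई", "Z": "ज़ेड",
}


def looks_like_acronym(token: str) -> bool:
    """Heuristic (an assumption): 2-5 uppercase ASCII letters, e.g. "NIT", "KD"."""
    return token.isascii() and token.isalpha() and token.isupper() and 2 <= len(token) <= 5


def acronym_candidates(token: str) -> list:
    """Return a spelled-out transliteration candidate, or [] for non-acronyms."""
    if not looks_like_acronym(token):
        return []
    return ["".join(LETTER_NAMES_HI[ch] for ch in token)]
```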

For each native language script, the ensemble makes use of specialized romanization dictionaries of varying provenance that are tailored for place names, proper names, or common words. Examples of such romanization dictionaries are found in the Dakshina dataset.

Scoring in the Ensemble
The ensemble combines scores for the possible transliterations in a weighted mixture, the parameters of which are tuned specifically for POI name accuracy using small targeted development sets for such names.

For each native script token in candidate transliterations, the ensemble also weights the result according to its frequency in a very large sample of on-line text. Additional candidate scoring is based on a deterministic romanization approach derived from the ISO 15919 romanization standard, which maps each native script token to a unique Latin script string. This string allows the ensemble to track certain key correspondences when compared to the original Latin script token being transliterated, even though the ISO-derived mapping itself does not always perfectly correspond to how the given native script word is typically written in the Latin script.
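
The romanization check can be sketched as: romanize the native-script candidate deterministically, fold away diacritics, and measure edit distance to the original Latin token. The table below is a tiny hand-picked subset in the style of ISO 15919 for Devanagari, and the folding and Levenshtein comparison are our own assumptions about how such a feature could be computed. Note how गार्डन yields "gardana" and चंद्रमणी yields "camdramani": close to, but not identical with, "garden" and "chandramani", which is the imperfect correspondence described above.

```python
import unicodedata

# Tiny hypothetical subset of an ISO 15919-style Devanagari table.
CONSONANTS = {
    "क": "k", "ख": "kh", "ग": "g", "घ": "gh", "च": "c", "छ": "ch",
    "ज": "j", "ट": "ṭ", "ड": "ḍ", "ण": "ṇ", "त": "t", "द": "d",
    "न": "n", "प": "p", "ब": "b", "म": "m", "य": "y", "र": "r",
    "ल": "l", "व": "v", "स": "s", "ह": "h",
}
VOWELS = {"अ": "a", "आ": "ā", "इ": "i", "ई": "ī", "उ": "u", "ऊ": "ū",
          "ए": "ē", "ऐ": "ai", "ओ": "ō", "औ": "au"}
# Dependent vowel signs (matras), written as escapes for readability.
MATRAS = {"\u093e": "ā", "\u093f": "i", "\u0940": "ī", "\u0941": "u",
          "\u0942": "ū", "\u0947": "ē", "\u0948": "ai", "\u094b": "ō", "\u094c": "au"}
VIRAMA = "\u094d"
ANUSVARA = "\u0902"


def iso_like_romanize(word: str) -> str:
    """Deterministically romanize a Devanagari word (ISO 15919-style subset)."""
    out = []
    inherent = False  # True when out[-1] ends with a consonant's inherent "a"
    for ch in word:
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch] + "a")
            inherent = True
        elif ch in MATRAS and inherent:
            out[-1] = out[-1][:-1] + MATRAS[ch]  # vowel sign replaces inherent "a"
            inherent = False
        elif ch == VIRAMA and inherent:
            out[-1] = out[-1][:-1]  # virama suppresses the inherent "a"
            inherent = False
        else:
            out.append("ṁ" if ch == ANUSVARA else VOWELS.get(ch, ch))
            inherent = False
    return "".join(out)


def fold(text: str) -> str:
    """Lowercase and strip combining diacritics, e.g. "gārḍana" -> "gardana"."""
    decomposed = unicodedata.normalize("NFD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def romanization_distance(candidate: str, latin: str) -> int:
    """Distance between a candidate's deterministic romanization and the source token."""
    return levenshtein(fold(iso_like_romanize(candidate)), fold(latin))
```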

In aggregate, these many moving parts provide substantially higher quality transliterations than possible for any of the individual methods alone.

Coverage
The following table provides the per-language quality and coverage improvements due to the ensemble over existing automatic transliterations of POI names. The coverage improvement measures the increase in items for which an automatic transliteration has been made available. Quality improvement measures the ratio of updated transliterations that were judged to be improvements versus those that were judged to be inferior to existing automatic transliterations.

Language    Coverage Improvement    Quality Improvement
Hindi       3.2x                    1.8x
Bengali     19x                     3.3x
Marathi     19x                     2.9x
Telugu      3.9x                    2.6x
Tamil       19x                     3.6x
Gujarati    19x                     2.5x
Kannada     24x                     2.3x
Malayalam   24x                     1.7x
Odia        960x                    *
Punjabi     24x                     *
* Unknown / no baseline.

Conclusion
As with any machine learned system, the resulting automatic transliterations may contain a few errors or infelicities, but the large increase in coverage in these widely spoken languages marks a substantial expansion of the accessibility of information within Google Maps in India. Future work will include using the ensemble for transliteration of other classes of entities within Maps and its extension to other languages and scripts, including Perso-Arabic scripts, which are also commonly used in the region.

Acknowledgments
This work was a collaboration between the authors and Jacob Farner, Jonathan Herbert, Anna Katanova, Andre Lebedev, Chris Miles, Brian Roark, Anurag Sharma, Kevin Wang, Andy Wildenberg, and many others.

Categories
Misc

I published part 1 of a tutorial that shows how to transform vanilla autoencoders into variational autoencoders

Autoencoders have a number of limitations for generative tasks.
That’s why they need a power-up to become Variational
Autoencoders. In my new video, I explain the first step to
transform an autoencoder into a VAE. Specifically, I discuss how
VAEs use multivariate normal distributions to encode input data
into a latent space and why this is awesome for generative tasks.
Don’t worry – I also explain what multivariate normal
distributions are!
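
The core idea can be sketched in a few lines of NumPy. The encoder outputs a mean and a log-variance per latent dimension (a multivariate normal with diagonal covariance, the usual simplifying choice), a latent vector is sampled with the reparameterization trick, and a KL term pulls each encoded distribution toward a standard normal. This is a generic textbook formulation, not the code from the video; the function names are ours.

```python
import numpy as np


def sample_latent(mu, log_var, rng):
    """Reparameterization trick: z = mu + sigma * eps with eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over the last axis."""
    return -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var), axis=-1)
```

Because the randomness lives in `eps`, gradients can flow through `mu` and `log_var` during training, and at generation time you can simply sample `z` from N(0, I) and decode it.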

This video is part of a series called “Generating Sound with
Neural Networks”. In this series, you’ll learn how to generate
sound from audio files and spectrograms 🎧 🎧 using Variational
Autoencoders 🤖 🤖

Here’s the video:


https://www.youtube.com/watch?v=b8AzCgY1gZI&list=PL-wATfeyAMNpEyENTc-tVH5tfLGKtSWPp&index=9

submitted by /u/diabulusInMusica

Categories
Misc

How can I convert a TensorFlow Dataset into a Pandas DataFrame?

I have a tf dataset with images and labels and want to convert it
to a Pandas DataFrame, since that’s the object required by the
AzureML pipeline designer.

I’m a beginner with TensorFlow, and after googling for a
couple of hours I haven’t found anything.

I’d appreciate any tips on how to do this.
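
One possible approach, sketched under stated assumptions: iterate over the dataset as NumPy arrays and flatten each image into one row of pixel columns. This assumes an unbatched dataset of (image, label) pairs with scalar labels; `dataset_to_dataframe` and the `pixel_i` column names are hypothetical choices, and the function accepts any non-empty iterable of pairs so it can be tried without TensorFlow.

```python
import numpy as np
import pandas as pd


def dataset_to_dataframe(examples):
    """Build a DataFrame with one row per (image, label) pair.

    `examples` is any non-empty iterable of (image, label) pairs of
    NumPy-compatible values; each image is flattened into pixel_0..pixel_{n-1}.
    """
    rows, labels = [], []
    for image, label in examples:
        rows.append(np.asarray(image).reshape(-1))
        labels.append(np.asarray(label).item())  # assumes a scalar label
    pixels = np.stack(rows)
    columns = [f"pixel_{i}" for i in range(pixels.shape[1])]
    df = pd.DataFrame(pixels, columns=columns)
    df.insert(0, "label", labels)
    return df
```

With TensorFlow 2.x you would call it as `dataset_to_dataframe(ds.as_numpy_iterator())`, calling `ds.unbatch()` first if the dataset is batched. For large images this produces very wide frames, so storing file paths instead of pixels may be more practical.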

submitted by /u/juliansorel

Categories
Misc

What does the shape of a spectrogram really mean?
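
A spectrogram is typically a 2-D array of shape (num_frames, num_frequency_bins): each row is one short window of audio and each column one frequency band. The sketch below computes that shape from the framing parameters, assuming frames without end padding and a real-input FFT that keeps fft_length // 2 + 1 bins, which matches `tf.signal.stft` with `pad_end=False`. Other libraries may transpose the axes or pad the signal, which changes the frame count. The function names are ours.

```python
def spectrogram_shape(num_samples, frame_length, frame_step, fft_length=None):
    """Return (num_frames, num_frequency_bins) for an STFT magnitude spectrogram.

    If fft_length is None it defaults to the smallest power of two that is
    >= frame_length, as tf.signal.stft does.
    """
    if fft_length is None:
        fft_length = 1 << (frame_length - 1).bit_length()
    num_bins = fft_length // 2 + 1
    if num_samples < frame_length:
        return (0, num_bins)
    num_frames = 1 + (num_samples - frame_length) // frame_step
    return (num_frames, num_bins)


def axis_resolution(sample_rate, frame_step, fft_length):
    """Seconds per row (time axis) and Hz per column (frequency axis)."""
    return frame_step / sample_rate, sample_rate / fft_length
```

For example, one second of 16 kHz audio with 25 ms frames (400 samples) and a 10 ms step (160 samples) gives `spectrogram_shape(16000, 400, 160) == (98, 257)`: 98 time steps, each 10 ms apart, and 257 frequency bins, each 31.25 Hz wide.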

submitted by /u/Metecko

Categories
Misc

Can’t use TensorFlow 2 (I need TensorFlow 2, can’t use 1) because no protobuf version works with it.

If I have any version of protobuf other than 3.6.0, I
get “ImportError: DLL load failed: The specified procedure could
not be found”, but if I use protobuf 3.6.0, I get
“AttributeError: ‘google.protobuf.pyext._message.RepeatedCompositeCo’
object has no attribute ‘append’” when I try to build
the model.

I have tried every 2.x version of TensorFlow, reinstalled
Python 3.6, and made sure my path variables are correct. I can
find no useful information on the internet, and I have tried countless
versions of protobuf. Please help! I have no clue what the hell is
going on.

Should I upgrade from Python 3.6 to 3.7? I previously had
TensorFlow 2.x working on Python 3.7, but I don’t know.

submitted by /u/FunnyForWrongReason

Categories
Misc

Fastest way to develop a custom translation model with RNN?

I’m a Python web developer, so I have some professional coding
experience, but I’m a complete novice when it comes to machine
learning.

In short, I have a dataset (in CSV form) with 65,000 sentences in
two languages. One of the languages is real; the other is not. I’d
love to quickly dive into an RNN example online so that I can train
a model on this dataset, but all of the examples seem to
prefer that I use existing binary datasets (which I can’t
read).

My laptop is relatively old, and processing a dataset properly
can take a week, so every example I’ve attempted to adapt to my
needs has cost lots of time and lots of heartache when I discover
that I can’t use it.

Is there an RNN translation tutorial that anyone would recommend
for the purpose of translating between an existing corpus and a
constructed language? I can do research on any terms listed below,
but the topic of machine learning has so regularly stumped me that,
even though I know easy examples for what I want to do probably
already exist, I don’t even know where to start.

Thank you for your time!

submitted by /u/ehowardhill

Categories
Misc

Any tutorials that you can recommend?

So I understood the attention mechanism (Bahdanau attention, from
the ICLR 2015 paper) and was looking for an implementation of the paper,
and then I landed on the TensorFlow website, which has a tutorial on the
attention mechanism. But frankly speaking, I found the code very hard to
understand. Are there any tutorials you can share
that will help me understand the code of the attention
mechanism?
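
Stripped of the Keras plumbing, Bahdanau (additive) attention for a single decoder step is only a few lines. The NumPy sketch below scores each encoder state h_j with v^T tanh(W1 s + W2 h_j), normalizes the scores with a softmax, and returns the weighted sum of encoder states as the context vector. The shapes and names are our own; matching them against the layers in the tutorial may make its code easier to follow.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def additive_attention(query, values, W1, W2, v):
    """Bahdanau-style additive attention for one decoder step.

    query:  (d_q,)      decoder hidden state s
    values: (T, d_v)    encoder hidden states h_1..h_T
    W1:     (d_a, d_q)  projects the query
    W2:     (d_a, d_v)  projects each encoder state
    v:      (d_a,)      scoring vector
    Returns (context, weights) with shapes (d_v,) and (T,).
    """
    # score_j = v^T tanh(W1 s + W2 h_j), computed for all j at once
    hidden = np.tanh(values @ W2.T + W1 @ query)  # (T, d_a)
    scores = hidden @ v                           # (T,)
    weights = softmax(scores)                     # non-negative, sums to 1
    context = weights @ values                    # weighted sum of h_j, (d_v,)
    return context, weights
```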

submitted by /u/Consistent_Ad767

Categories
Offsites

RxR: A Multilingual Benchmark for Navigation Instruction Following

A core challenge in machine learning (ML) is to build agents that can navigate complex human environments in response to spoken or written commands. While today’s agents, including robots, can often navigate complicated environments, they cannot yet understand navigation goals expressed in natural language, such as, “Go past the brown double doors that are closed to your right and stand behind the chair at the head of the table.”

This challenge, referred to as vision-and-language navigation (VLN), demands a sophisticated understanding of spatial language. For example, identifying the position “behind the chair at the head of the table” requires finding the table, identifying which part of the table is considered to be the “head”, finding the chair closest to the head, identifying the area behind this chair, and so on. While people can follow these instructions easily, these challenges cannot be easily solved with current ML-based methods, which calls for systems that better connect language to the physical world it describes.

To help spur progress in this area, we are excited to introduce Room-Across-Room (RxR), a new dataset for VLN. Described in “Room-Across-Room: Multilingual Vision-and-Language Navigation with Dense Spatiotemporal Grounding”, RxR is the first multilingual dataset for VLN, containing 126,069 human-annotated navigation instructions in three typologically diverse languages — English, Hindi and Telugu. Each instruction describes a path through a photorealistic simulator populated with indoor environments from the Matterport3D dataset, which includes 3D captures of homes, offices and public buildings. To track progress on VLN, we are also announcing the RxR Challenge, a competition that encourages the machine learning community to train and evaluate their own instruction following agents on RxR instructions.

Language Instruction
en-US Starting next to the long dining room table, turn so the table is to your right. Walk towards the glass double doors. When you reach the mat before the doors, turn immediately left and walk down the stairs. When you reach the bottom of the stairs, walk through the open doors to your left and continue through the art exhibit with the tub to your right hand side. Down the length of the table until you reach the small step at the end of the room before you reach the tub and stop.
   
hi-IN अभी हमारे बायीं ओर एक बड़ा मेज़ है कुछ कुर्सियाँ हैं और कुछ दीपक मेज़ के ऊपर रखे हैं। उलटी दिशा में घूम जाएँ और सिधा चलें। अभी हमारे दायीं ओर एक गोल मेज़ है वहां से सीधा बढ़ें और सामने एक शीशे का बंद दरवाज़ा है उससे पहले बायीं ओर एक सीढ़ी है उससे निचे उतरें। निचे उतरने के बाद दायीं ओर मुड़े और एक भूरे रंग के दरवाज़े से अंदर प्रवेश करें और सीधा चलें। अभी हमारे दायीं ओर एक बड़ा मेज़ है और दो कुर्सियां राखी हैं सीधा आगे बढ़ें। हमारे सामने एक पानी का कल है और सामने तीन कुर्सियां दिवार के पास रखी हैं यहीं पर ठहर जाएँ।
   
te-IN ఉన్న చోటు నుండి వెనకకు తిరిగి, నేరుగా వెళ్తే, మీ ముందర ఒక బల్ల ఉంటుంది. దాన్ని దాటుకొని ఎడమవైపుకి తిరిగితే, మీ ముందర మెట్లు ఉంటాయి. వాటిని పూర్తిగా దిగండి. ఇప్పుడు మీ ముందర రెండు తెరిచిన ద్వారాలు ఉంటాయి. ఎడమవైపు ఉన్న ద్వారం గుండా బయటకు వెళ్ళి, నేరుగా నడవండి. ఇప్పుడు మీ కుడివైపున పొడవైన బల్ల ఉంటుంది. దాన్ని దాటుకొని ముందరే ఉన్న మెట్ల వద్దకు వెళ్ళి ఆగండి.

Examples of English, Hindi and Telugu navigation instructions from the RxR dataset. Each navigation instruction describes the same path.

Pose Traces
In addition to navigation instructions and paths, RxR also includes a new, more detailed multimodal annotation called a pose trace. Inspired by the mouse traces captured in the Localized Narratives dataset, pose traces provide dense groundings between language, vision and movement in a rich 3D setting. To generate navigation instructions, we ask guide annotators to move along a path in the simulator while narrating the path based on the surroundings. The pose trace is a record of everything the guide sees along the path, time-aligned with the words in the navigation instructions. These traces are then paired with pose traces from follower annotators, who are tasked with following the intended path by listening to the guide’s audio, thereby validating the quality of the navigation instructions. Pose traces implicitly capture notions of landmark selection and visual saliency, and represent a play-by-play account of how to solve the navigation instruction generation task (for guides) and the navigation instruction following task (for followers).
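
The exact file format of RxR pose traces is not given here. Assuming a hypothetical representation in which each instruction word carries a start time and the trace is a time-sorted list of pose samples, pairing each word with what the guide was seeing reduces to one binary search per word. `PoseSample` and `pose_for_each_word` are illustrative names, not part of the dataset's tooling.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class PoseSample:
    time: float      # seconds since the recording started
    position: tuple  # (x, y, z) in the simulator; hypothetical units
    heading: float   # camera heading in degrees


def pose_for_each_word(words, poses):
    """Pair each (word, start_time) with the latest pose at or before that time.

    words: list of (word, start_time), sorted by time
    poses: non-empty list of PoseSample, sorted by time
    Words spoken before the first pose sample are paired with the first pose.
    """
    times = [p.time for p in poses]
    aligned = []
    for word, t in words:
        i = bisect_right(times, t) - 1
        aligned.append((word, poses[max(i, 0)]))
    return aligned
```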

Example English navigation instruction in the RxR dataset. Words in the instruction text (right) are color-coded to align with the pose trace (left) that illustrates the movements and visual percepts of the guide annotator as they move through the environment describing the path.
The same RxR example with words in the navigation instruction aligned to 360° images along the path. The parts of the scene the guide annotator observed are highlighted; parts of the scene ignored by the annotator are faded. Red and yellow boxes highlight some of the close alignments between the textual instructions and the annotator’s visual cues. The red cross indicates the next direction the annotator moved.

Scale
In total, RxR contains almost 10 million words, making it around 10 times larger than existing datasets such as R2R and Touchdown/Retouchdown. This is important because, in comparison to tasks based on static image and text data, language tasks that require learning through movement or interaction with an environment typically suffer from a lack of large-scale training data. RxR also addresses known biases in path construction that have arisen in other datasets, such as R2R, in which all paths have similar lengths and take the shortest route to the goal. In contrast, the paths in RxR are on average longer and less predictable, making them more challenging to follow and encouraging models trained on the dataset to place greater emphasis on the role of language in the task. The size, scope and detail of RxR will expand the frontier for research on grounded language learning while reducing the dominance of high-resource languages such as English.

Left: RxR is an order of magnitude larger than similar existing datasets. Right: Compared to R2R, the paths in RxR are typically longer and less predictable, making them more challenging to follow.

Baselines
To better characterize and understand the RxR dataset, we trained a variety of agents on RxR using our open-source framework VALAN and language representations from the multilingual BERT model. We found that results were improved by including follower annotations as well as guide annotations during training, and that independently trained monolingual agents outperformed a single multilingual agent.

Conceptually, evaluation of these agents is straightforward: did the agent follow the intended path? Empirically, we measure the similarity between the path taken by the VLN agent and the reference path using NDTW, a normalized measure of path fidelity that ranges from 100 (perfect correspondence) to 0 (completely wrong). The average score for the follower annotators across all three languages is 79.5 rather than 100, reflecting natural variation between similar paths. In contrast, the best model (a composite of three independently trained monolingual agents, one for each language) achieved an NDTW score of 41.5 on the RxR test set. While this is much better than random (15.4), it remains far below human performance. Although advances in language modeling continue to rapidly erode the headroom for improvement on text-only language understanding benchmarks such as GLUE and SuperGLUE, benchmarks like RxR that connect language to the physical world still offer substantial room for improvement.
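
NDTW builds on dynamic time warping: the DTW cost of aligning the agent's path Q to the reference R is normalized as exp(-DTW(R, Q) / (|R| · d_th)), following the definition by Ilharco et al. (2019), and scaled here to 0-100. The sketch below assumes straight-line distance between points and a threshold d_th of 3.0 (meters); both are simplifying assumptions, and the official evaluation may measure distance differently (for example, along the navigation graph).

```python
import math


def dtw(reference, query, dist):
    """Classic dynamic time warping cost between two point sequences."""
    n, m = len(reference), len(query)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(reference[i - 1], query[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def ndtw(reference, query, threshold=3.0, dist=math.dist):
    """nDTW on a 0-100 scale: 100 * exp(-DTW / (len(reference) * threshold))."""
    return 100.0 * math.exp(-dtw(reference, query, dist) / (len(reference) * threshold))
```

Identical paths score exactly 100; every meter of accumulated misalignment shrinks the score exponentially, with longer reference paths tolerating proportionally more total deviation.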

Results for our multilingual and monolingual instruction following agents on the RxR test-standard split. While performance is much better than a random walk, there remains considerable headroom to reach human performance on this task.

Competition
To encourage further research in this area, we are launching the RxR Challenge, an ongoing competition for the machine learning community to develop computational agents that can follow natural language navigation instructions. To take part, participants upload the navigation paths taken by their agent in response to the provided RxR test instructions. In the most difficult setting (reported here and in the paper), all the test environments are previously unseen. However, we also allow for settings in which the agent is either trained in or explores the test environments in advance. For more details and the latest results please visit the challenge website.

PanGEA
We are also releasing the custom web-based annotation tool that we developed to collect the RxR dataset. The Panoramic Graph Environment Annotation toolkit (PanGEA) is a lightweight and customizable codebase for collecting speech and text annotations in panoramic graph environments, such as Matterport3D and StreetLearn. It includes speech recording and virtual pose tracking, as well as tooling to align the resulting pose trace with a manual transcript. For more details, please visit the PanGEA GitHub page.

Acknowledgements
The authors would like to thank Roma Patel, Eugene Ie and Jason Baldridge for their contributions to this research. We would also like to thank all the annotators; Sneha Kudugunta for analyzing the Telugu annotations; Igor Karpov, Ashwin Kakarla and Christina Liu for their tooling and annotation support for this project; Austin Waters and Su Wang for help with image features; and Daphne Luong for executive support for the data collection.