NVIDIA technology has powered some of the most stunning extended reality experiences across all industries. This year at GTC, several sessions showcased how the latest advancements are driving the future of XR — and all of these sessions are now available on NVIDIA On-Demand.
From real-time ray tracing, to streaming from the cloud, find out more about the breakthroughs that are helping organizations across industries enhance their XR workflows.
Check out some of the most popular XR sessions you might have missed at GTC ’21 (note: some sessions may require a free NVIDIA Developer Program membership).
Autodesk VRED with NVIDIA CloudXR and Varjo XR3: Unparalleled XR Quality and Data Complexity
See a beautifully detailed car presented by Autodesk VRED, and learn how the car was streamed to a mobile device using NVIDIA CloudXR. This was shown using a single NVIDIA RTX 8000 GPU and a Varjo XR-3 headset.
Look, Mum, No Computer! How the Cloud can Revolutionize VR Experiences (Presented by Google Cloud)
Learn how to connect a VR headset to an instance running in Google Cloud using NVIDIA CloudXR. Explore the advantages and limitations of this solution, and see a number of use cases to test performance.
Making 3D Content Creation Fast and Easy for Creatives by using VR and ML
Check out how NVIDIA CloudXR helps creatives use high-performance software on a lightweight mobile headset. This marks a major milestone on the path to democratizing 3D content creation.
The Technology Empowering Lucid Motors’ Luxury Automotive Purchase Experience
Get an inside look at how Lucid Motors partnered with ZeroLight. The two companies launched a cloud-powered purchase journey for customers interested in exploring, customizing, or buying the new Lucid Air pure-electric luxury vehicle. Learn how this experience reflects the need to provide a digital shopping experience — linking the virtual and physical worlds.
NVIDIA CloudXR and XR Streaming 101
Explore the various streaming approaches and strategies being developed, and dive into the pros and cons vis-à-vis different devices and use cases. Experience the NVIDIA CloudXR SDK, and learn more about how it works and how you can use it.
Collaborative Virtual Workspaces: Pop-Up XR Experiences with NVIDIA CloudXR
See how flexible, no-setup XR experiences can support manufacturing use cases, such as virtual 3P (Production Preparation Process) workshops. Learn how XR can enhance exploring, validating, and confirming designed manufacturing processes against the physical factory layout and on-location assets, as well as improve workflows for digitally designed products, human-centric assembly processes, and worker ergonomics.
NVIDIA CloudXR Client for iOS: Creating AR Applications Using the CloudXR SDK
NVIDIA CloudXR continues to expand its client device support. With NVIDIA CloudXR SDK Release 2.1, iOS devices can now tap NVIDIA GPUs for advanced AR rendering, inferencing, and real-time graphics. This session provided a step-by-step walkthrough of building the NVIDIA CloudXR client, deploying it to a device, and testing it with advanced real-time visualization tools.
Did you miss GTC? All of the AR/VR sessions are available at no charge on NVIDIA On-Demand.
Learning good visual and vision-language representations is critical to solving computer vision problems — image retrieval, image classification, video understanding — and can enable the development of tools and products that change people’s daily lives. For example, a good vision-language matching model can help users find the most relevant images given a text description or an image input and help tools such as Google Lens find more fine-grained information about an image.
To learn such representations, current state-of-the-art (SotA) visual and vision-language models rely heavily on curated training datasets that require expert knowledge and extensive labels. For vision applications, representations are mostly learned on large-scale datasets with explicit class labels, such as ImageNet, OpenImages, and JFT-300M. For vision-language applications, popular pre-training datasets, such as Conceptual Captions and Visual Genome Dense Captions, all require non-trivial data collection and cleaning steps, limiting the size of datasets and thus hindering the scale of the trained models. In contrast, natural language processing (NLP) models have achieved SotA performance on GLUE and SuperGLUE benchmarks by utilizing large-scale pre-training on raw text without human labels.
In “Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision”, to appear at ICML 2021, we propose bridging this gap with publicly available image alt-text data (the written copy that appears in place of an image on a webpage when the image fails to load) in order to train larger, state-of-the-art vision and vision-language models. To that end, we leverage a noisy dataset of over one billion image and alt-text pairs, obtained without the expensive filtering or post-processing steps used to build the Conceptual Captions dataset. We show that the scale of our corpus can make up for the noise and leads to SotA representations that achieve strong performance when transferred to classification tasks such as ImageNet and VTAB. The aligned visual and language representations also set new SotA results on the Flickr30K and MS-COCO benchmarks, even when compared with more sophisticated cross-attention models. The representations also enable zero-shot image classification and cross-modality search with complex text and text + image queries.
Creating the Dataset
Alt-texts usually provide a description of what the image is about, but the dataset is “noisy” because some text may be partly or wholly unrelated to its paired image.
Example image-text pairs randomly sampled from the training dataset of ALIGN. One clearly noisy text label is marked in italics.
In this work, we follow the methodology of constructing the Conceptual Captions dataset to get a version of raw English alt-text data (image and alt-text pairs). While the Conceptual Captions dataset was cleaned by heavy filtering and post-processing, this work scales up visual and vision-language representation learning by relaxing most of the cleaning steps in the original work. Instead, we only apply minimal frequency-based filtering. The result is a much larger but noisier dataset of 1.8B image-text pairs.
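The post doesn’t spell out the exact filtering rules, so purely as an illustration, a minimal frequency-based filter over raw alt-text pairs might look like the sketch below; the column names and thresholds are made up.

```python
import pandas as pd

# Illustrative only: the column names ('image_url', 'alt_text') and the
# thresholds are assumptions, not the filtering rules used in the paper.
def frequency_filter(pairs: pd.DataFrame,
                     max_images_per_text: int = 10,
                     min_tokens: int = 3) -> pd.DataFrame:
    # Drop alt-texts attached to many different images; boilerplate strings
    # ("image", "logo") carry little signal about any single image.
    images_per_text = pairs.groupby("alt_text")["image_url"].transform("nunique")
    pairs = pairs[images_per_text <= max_images_per_text]
    # Drop extremely short alt-texts, which are rarely descriptive.
    n_tokens = pairs["alt_text"].str.split().str.len()
    return pairs[n_tokens >= min_tokens]
```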
ALIGN: A Large-scale ImaGe and Noisy-Text Embedding
For the purpose of building larger and more powerful models easily, we employ a simple dual-encoder architecture that learns to align visual and language representations of the image and text pairs. Image and text encoders are learned via a contrastive loss (formulated as normalized softmax) that pushes the embeddings of matched image-text pairs together while pushing those of non-matched image-text pairs (within the same batch) apart. The large-scale dataset makes it possible for us to scale up the model size to be as large as EfficientNet-L2 (image encoder) and BERT-large (text encoder) trained from scratch. The learned representation can be used for downstream visual and vision-language tasks.
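As a rough sketch of that objective (not the paper’s exact implementation; the fixed temperature value is a placeholder, and the embeddings are assumed to come from the two encoder towers), an in-batch normalized-softmax contrastive loss looks roughly like this:

```python
import tensorflow as tf

def contrastive_loss(image_emb, text_emb, temperature=0.05):
    """In-batch normalized-softmax contrastive loss.
    image_emb, text_emb: [batch, dim]; row i of each tensor is a matched pair.
    The fixed temperature here is a placeholder value."""
    # L2-normalize so that dot products are cosine similarities.
    image_emb = tf.math.l2_normalize(image_emb, axis=-1)
    text_emb = tf.math.l2_normalize(text_emb, axis=-1)
    # [batch, batch] similarity matrix; the diagonal holds the matched pairs,
    # every other pair in the batch serves as a negative.
    logits = tf.matmul(image_emb, text_emb, transpose_b=True) / temperature
    labels = tf.range(tf.shape(logits)[0])
    # Softmax cross-entropy in both directions: image-to-text and text-to-image.
    loss_i2t = tf.keras.losses.sparse_categorical_crossentropy(
        labels, logits, from_logits=True)
    loss_t2i = tf.keras.losses.sparse_categorical_crossentropy(
        labels, tf.transpose(logits), from_logits=True)
    return tf.reduce_mean(loss_i2t + loss_t2i) / 2.0
```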
ImageNet figure credit: Krizhevsky et al. (2012); VTAB figure credit: Zhai et al. (2019).
The resulting representation can be used for vision-only or vision-language task transfer. Without any fine-tuning, ALIGN powers cross-modal search – image-to-text search, text-to-image search, and even search with joint image+text queries – as shown in the examples below.
Evaluating Retrieval and Representation
The learned ALIGN model with BERT-Large and EfficientNet-L2 as text and image encoder backbones achieves SotA performance on multiple image-text retrieval tasks (Flickr30K and MS-COCO) in both zero-shot and fine-tuned settings, as shown below.
| Setting | Model | Flickr30K image → text | Flickr30K text → image | MS-COCO image → text | MS-COCO text → image |
|---|---|---|---|---|---|
| Zero-shot | ImageBERT | 70.7 | 54.3 | 44.0 | 32.3 |
| Zero-shot | UNITER | 83.6 | 68.7 | – | – |
| Zero-shot | CLIP | 88.0 | 68.7 | 58.4 | 37.8 |
| Zero-shot | ALIGN | 88.6 | 75.7 | 58.6 | 45.6 |
| Fine-tuned | GPO | 88.7 | 76.1 | 68.1 | 52.7 |
| Fine-tuned | UNITER | 87.3 | 75.6 | 65.7 | 52.9 |
| Fine-tuned | ERNIE-ViL | 88.1 | 76.7 | – | – |
| Fine-tuned | VILLA | 87.9 | 76.3 | – | – |
| Fine-tuned | Oscar | – | – | 73.5 | 57.5 |
| Fine-tuned | ALIGN | 95.3 | 84.9 | 77.0 | 59.9 |

Image-text retrieval results (recall@1) on the Flickr30K (1K test set) and MS-COCO (5K test set) datasets, in both zero-shot and fine-tuned settings. ALIGN significantly outperforms existing methods, including the cross-modality attention models that are too expensive for large-scale retrieval applications.
ALIGN is also a strong image representation model. Shown below, with frozen features, ALIGN slightly outperforms CLIP and achieves a SotA result of 85.5% top-1 accuracy on ImageNet. With fine-tuning, ALIGN achieves higher accuracy than most generalist models, such as BiT and ViT, and is only worse than Meta Pseudo Labels, which requires deeper interaction between ImageNet training and large-scale unlabeled data.
| Model (backbone) | Acc@1 w/ frozen features | Acc@1 (fine-tuned) | Acc@5 (fine-tuned) |
|---|---|---|---|
| WSL (ResNeXt-101 32x48d) | 83.6 | 85.4 | 97.6 |
| CLIP (ViT-L/14) | 85.4 | – | – |
| BiT (ResNet152 x 4) | – | 87.54 | 98.46 |
| NoisyStudent (EfficientNet-L2) | – | 88.4 | 98.7 |
| ViT (ViT-H/14) | – | 88.55 | – |
| Meta Pseudo Labels (EfficientNet-L2) | – | 90.2 | 98.8 |
| ALIGN (EfficientNet-L2) | 85.5 | 88.64 | 98.67 |

ImageNet classification results compared with supervised training (fine-tuning).
Zero-Shot Image Classification
Traditionally, image classification problems treat each class as an independent ID, and the classification layers must be trained with at least a few shots of labeled data per class. But class names are also natural language phrases, so we can naturally extend the image-text retrieval capability of ALIGN to image classification without any training data.
On the ImageNet validation dataset, ALIGN achieves 76.4% top-1 zero-shot accuracy and shows great robustness in different variants of ImageNet with distribution shifts, similar to the concurrent work CLIP. We also use the same text prompt engineering and ensembling as in CLIP.
| Model | ImageNet | ImageNet-R | ImageNet-A | ImageNet-V2 |
|---|---|---|---|---|
| CLIP | 76.2 | 88.9 | 77.2 | 70.1 |
| ALIGN | 76.4 | 92.2 | 75.8 | 70.1 |

Top-1 accuracy of zero-shot classification on ImageNet and its variants.
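A minimal sketch of how the zero-shot classification described above can be wired up; the encoder helpers and the prompt template below are illustrative stand-ins, not ALIGN’s released API.

```python
import numpy as np

# `image_encoder` and `text_encoder` are hypothetical stand-ins for ALIGN's trained
# image and text towers; both are assumed to return L2-normalized NumPy embeddings.
def zero_shot_classify(image, class_names, image_encoder, text_encoder,
                       prompt="a photo of a {}."):
    # Embed every class name as a natural-language prompt.
    class_emb = text_encoder([prompt.format(c) for c in class_names])  # [num_classes, dim]
    img_emb = image_encoder(image)                                     # [dim]
    # Cosine similarity (dot product of normalized vectors) ranks the classes;
    # no labeled training data is needed.
    scores = class_emb @ img_emb
    return class_names[int(np.argmax(scores))]
```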
Application in Image Search
To illustrate the quantitative results above, we build a simple image retrieval system with the embeddings trained by ALIGN and show the top 1 text-to-image retrieval results for a handful of text queries from a 160M image pool. ALIGN can retrieve precise images given detailed descriptions of a scene, or fine-grained or instance-level concepts like landmarks and artworks. These examples demonstrate that the ALIGN model can align images and texts with similar semantics, and that ALIGN can generalize to novel complex concepts.
Image retrieval with fine-grained text queries using ALIGN’s embeddings.
Multimodal (Image+Text) Query for Image Search
A surprising property of word vectors is that word analogies can often be solved with vector arithmetic. A common example is “king – man + woman = queen”. Such linear relationships between image and text embeddings also emerge in ALIGN.
Specifically, given a query image and a text string, we add their ALIGN embeddings together and use it to retrieve relevant images using cosine similarity, as shown below. These examples not only demonstrate the compositionality of ALIGN embeddings across vision and language domains, but also show the feasibility of searching with a multi-modal query. For instance, one could now look for the “Australia” or “Madagascar” equivalence of pandas, or turn a pair of black shoes into identically-looking beige shoes. Also, it is possible to remove objects/attributes from a scene by performing subtraction in the embedding space, shown below.
Image retrieval with combined image+text queries. By adding or subtracting a text query embedding, ALIGN retrieves relevant images.
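As an illustration of the embedding arithmetic described above (a sketch only; the retrieval index and its shape are assumptions):

```python
import numpy as np

# `index_embs` is a hypothetical [N, dim] matrix of pre-computed, L2-normalized
# ALIGN image embeddings for the retrieval pool.
def multimodal_search(image_emb, text_emb, index_embs, sign=1.0, top_k=5):
    # Add (sign=+1) or subtract (sign=-1) the text embedding from the image embedding.
    query = image_emb + sign * text_emb
    query = query / np.linalg.norm(query)
    # Cosine similarity against every image in the pool, highest scores first.
    scores = index_embs @ query
    return np.argsort(-scores)[:top_k]
```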
Social Impact and Future Work
While this work shows promising results from a methodology perspective with a simple data collection method, additional analysis of the data and the resulting model is necessary before the model can be used responsibly in practice. For instance, consideration should be given to the potential for harmful text in alt-text data to reinforce those harms. With regard to fairness, data balancing efforts may be required to prevent reinforcing stereotypes from the web data. Additional testing and training around sensitive religious or cultural items should be undertaken to understand and mitigate the impact of possibly mislabeled data.
Further analysis should also be conducted to ensure that the demographic distribution of humans and related cultural items, such as clothing, food, and art, does not cause skewed model performance. Analysis and balancing would be required if such models are to be used in production.
Conclusion
We have presented a simple method of leveraging large-scale noisy image-text data to scale up visual and vision-language representation learning. The resulting model, ALIGN, is capable of cross-modal retrieval and significantly outperforms SotA models. In visual-only downstream tasks, ALIGN is also comparable to or outperforms SotA models trained with large-scale labeled data.
Acknowledgement
We would like to thank our co-authors in Google Research: Ye Xia, Yi-Ting Chen, Zarana Parekh, Hieu Pham, Quoc V. Le, Yunhsuan Sung, Zhen Li, Tom Duerig. This work was also done with invaluable help from other colleagues from Google. We would like to thank Jan Dlabal and Zhe Li for continuous support in training infrastructure, Simon Kornblith for building the zero-shot & robustness model evaluation on ImageNet variants, Xiaohua Zhai for help on conducting VTAB evaluation, Mingxing Tan and Max Moroz for suggestions on EfficientNet training, Aleksei Timofeev for the early idea of multimodal query retrieval, Aaron Michelony and Kaushal Patel for their early work on data generation, and Sergey Ioffe, Jason Baldridge and Krishna Srinivasan for the insightful feedback and discussion.
GeForce NOW ensures your favorite games are automatically up to date, so you can skip waiting on game updates and patches. Simply log in, click PLAY, and enjoy an optimal cloud gaming experience. Here’s an overview of how the service keeps your library game ready at all times. Updating Games for All GeForce NOW Members: When a gamer downloads an update… Read article >
The post Keeping Games up to Date in the Cloud appeared first on The Official NVIDIA Blog.
What is geospatial drive time?
Geospatial analytics is an important part of real estate decisions for businesses, especially for retailers. There are many factors that go into deciding where to place a new store (demographics, competitors, traffic) and such a decision is often a significant investment. Retailers who understand these factors have an advantage over their competitors and can thrive. In this blog post, we’ll explore how RAPIDS’ cuDF, cuGraph, cuSpatial, and Plotly Dash with NVIDIA GPUs can be used to solve these complex geospatial analytics problems interactively.
Let’s consider a retailer looking to pick the next location for a store, or in these times of a pandemic, delivery hub. After the retailer picks a candidate location, they consider a surrounding “drive-time trade area”, or more formally, isochrone. An isochrone is the resulting polygon if one starts at one geographic location and drives in all possible directions for a specified time.
Why are isochrones sometimes used instead of “as the crow flies” distance (i.e., a circle)? In retail, time is one of the most important factors when traveling to a store or shipping out a delivery. A location might be 2 miles from a customer, but dense traffic could make it a 10-minute trip. Instead, it might be faster to hop on the highway, drive 5 miles, and arrive in 5 minutes.
Isochrones are also more robust to the differences between urban, suburban and rural areas. If one uses an “as the crow flies” methodology and specifies a 5 mile radius, one might be including too many customers in an urban area or excluding customers in a less dense, rural area.
Once a retailer has calculated the isochrone, they can combine it with other data like demographics or competitor datasets to generate insights about that serviceable area. How many customers live in that isochrone? What is the average household income? How many of my competitors are in the area? Is that area a “food desert” (limited access to supermarkets, general affordable resources) or a “food oasis”? Is this area highly trafficked? Answers to these questions are incredibly valuable when making a decision about a potentially multi-million dollar investment.
However, demographics, competitor, and traffic analytics datasets can be large, diverse, and difficult to crunch – even more so if complex operations like geospatial analytics are involved. Additionally, these algorithms can be challenging to scale across large geographic areas like states or countries and even harder to interact with in real time.
By combining the accelerated compute power of RAPIDS cuDF, cuGraph, and cuSpatial with the interactivity of a Plotly Dash visualization application, we are able to transform this complicated problem into an application with a simple user interface.
Let’s think about the algorithm: how can one calculate a drive-time area? The general flow (see the sketch after this list) is as follows:

1. Select a candidate point on the map.
2. Filter a radius of road data around that point out of the continental-US road graph and load it.
3. Find the nearest node on the road graph to the selected location.
4. Compute the shortest paths from that node for the selected drive-time distance.
5. Build a polygon (the isochrone) from the reachable nodes.
6. Filter and aggregate the demographic data that falls inside the polygon.
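As a rough illustration of the shortest-path step (step 4 above), the cuGraph portion might look like this; it is not the demo’s actual code, and the column names and travel-time units are assumptions.

```python
import cugraph

def drive_time_vertices(edges, nearest_node, minutes):
    """Return road-graph vertices reachable from `nearest_node` within `minutes`.
    `edges` is assumed to be a cuDF edge list with 'src', 'dst', and a travel-time
    weight column 'time_min'; all names and units here are illustrative."""
    G = cugraph.Graph()
    G.from_cudf_edgelist(edges, source="src", destination="dst", edge_attr="time_min")
    # Single-source shortest path gives the travel time to every reachable vertex.
    dist = cugraph.sssp(G, source=nearest_node)
    # Vertices within the drive-time budget form the interior of the isochrone;
    # a hull around them becomes the polygon used in the later steps.
    return dist[dist["distance"] <= minutes]["vertex"]
```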
However, the idealized workflow above needs to be translated into real world functionality. Like all data visualization projects, it will take a few iterations to dial in.
While there are large and highly current datasets available for this type of information, they are generally not publicly available or are very costly. For the sake of simplicity and accessibility of this demo, we will use open-source data. For added accessibility, we will also scope the application to run on a single GPU with around 24-32GB of memory.
We started out by prototyping the workflow using a notebook to create a PoC workflow to compute isochrones. Despite the broad data science tooling required to tackle this problem (from graph to geospatial), the ability of the different RAPIDS’ libraries to work easily in conjunction with one another without moving out of GPU memory greatly speeds up development.
Our initial workflow used the Overpass API for OpenStreetMap to download the open-source street data in a graph format. However, a query for a 25-mile radius took approximately 4-5 minutes to complete. For the end goal of an interactive visualization application, this was far too long.
The next step was to then pre-download the entire United States OpenStreetMap data, optimize it, and save it as a parquet file for quick loading by cuDF. With the isochrones workflow producing good results, we then moved on to adding demographic data.
We conveniently had formatted 2010 census data available from a previous Plotly Dash census visualization. Altogether, the datasets were the preprocessed continental-US OpenStreetMap road network (stored as parquet) and the formatted 2010 US census demographic data.
We then prototyped the full workflow with the census data in another notebook. With such a large combined dataset, computational spikes often resulted in OOM (out of memory) errors or, under certain conditions, took longer than ~10 seconds to complete. We want to enable a user to click on ANY point in the continental US and quickly get a response back. Yet, after that initial click interaction, only a small fraction of the total dataset is needed. To increase speed and reduce memory spikes, we had to encode some boundary assumptions:
After adding in these conditions, we were confident that we could port the notebook into a full Plotly Dash visualization dashboard.
Why build an interactive dashboard? If we can already do the compute in a notebook, why should it be necessary to go through the effort of building a visualization application and optimize the compute to make it more interactive? Not everyone is familiar with using notebooks, but most can use a dashboard (especially those designed for ease of use). Furthermore, an interface that reduces the mental overhead and friction associated with doing what we actually want, asking questions of our data, encourages more experimentation and exploration. This leads to better, higher quality answers.
Starting out with a quick sketch mock-up (highly recommended for any visualization project) we went through several iterations of the dashboard to further optimize memory usage, reduce interaction times, and simplify the UI.
We found that by taking advantage of Dask for segmented data loading, we could drastically improve loading times. Further optimization of cuSpatial’s PiP (point-in-polygon) and cuDF groupBy calls reduced the compute times to around ~2-3 seconds per query in a sparsely populated area, and ~5-8 seconds in a densely populated area like New York City.
We also experimented with quadtree-based point-in-polygon calculations for a 25-30% reduction in compute time, but because the current PiP only takes ~0.4-0.5 seconds in our optimized data format, the added memory overhead was not worth the speedup in this case.
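For reference, a minimal sketch of that point-in-polygon filter plus cuDF groupBy step, written against the 2021-era cuSpatial API; all column names and the demographic field are assumptions, not the demo’s actual schema.

```python
import cudf
import cuspatial

def demographics_in_isochrone(census, poly_x, poly_y):
    """Summarize the population inside a single isochrone ring.
    `census` is assumed to be a cuDF DataFrame of per-person points with 'x' and 'y'
    columns plus demographic fields; `poly_x`/`poly_y` hold the ring's coordinates."""
    poly_offsets = cudf.Series([0], dtype="int32")   # one polygon...
    ring_offsets = cudf.Series([0], dtype="int32")   # ...made of one ring
    inside = cuspatial.point_in_polygon(
        census["x"], census["y"], poly_offsets, ring_offsets, poly_x, poly_y)
    # point_in_polygon returns one boolean column per polygon in this API version.
    selected = census[inside[inside.columns[0]]]
    # cuDF groupBy then aggregates whatever demographic fields are available;
    # 'age_group' is a hypothetical column name.
    return selected.groupby("age_group").size()
```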
For our final implementation, running a chain of operations to filter a radius of road data from the entire continental US to a selected point, load that data, find the nearest point on the road graph, compute the shortest path for the selected drive-time distance, create a polygon, and filter 300 million+ individual’s demographic data typically takes just 3-8 seconds!
Being able to click on any point in the continental US and, within seconds, compute both the drive time radius and demographic information within is impressive. Although this is only using limited open-source data for a generalized business use case, adapting this demo into a fully functional application would only require more precise data and a few minor optimizations.
While the performance of using GPUs is noteworthy, equally important is the speedup from using RAPIDS libraries together. Their ability to seamlessly provide end-to-end acceleration from notebook prototype to stand-alone visualization application enables interactive data science at the speed of thought.
Brings RTX Real-Time Ray Tracing and AI-Based DLSS to Tens of Millions More Gamers and Creators with $799 Portable Powerhouses
SANTA CLARA, Calif., May 11, 2021 (GLOBE NEWSWIRE) — NVIDIA today …
New NVIDIA Studio laptops from Dell, HP, Lenovo, Gigabyte, MSI and Razer were announced today as part of the record-breaking GeForce laptop launch. The new Studio laptops are powered by GeForce RTX 30 Series and NVIDIA RTX professional laptop GPUs, including designs with the new GeForce RTX 3050 Ti and 3050 laptop GPUs, and the Read article >
The post Create in Record Time with New NVIDIA Studio Laptops from Dell, HP, Lenovo, Gigabyte, MSI and Razer appeared first on The Official NVIDIA Blog.
Yep, seems odd but I am unsure how to take it now.
A little bit of background: I have been participating in Deep Learning related competitions for a pretty long time. I am 25 right now, started in the field when I was 20. Started from Keras, then PyTorch then eventually chose Tensorflow because that gave me an edge with GPU parallelization and every firm around my region uses TF / helping me get a better job.
I dropped out of college, left my degree that majored in statistics and realised that AI was something that I could learn without spending a buck. It worked out pretty well. I eventually applied to a Speakers Giant ( can’t name ofc ) for the position of a data scientist and they gave it to me. Which was pretty fricking nuts given that I was doing research in NLP on a scale that was not professional.
That jump gave me hope in life that I won’t die as a dork with new cash. I eventually got a girlfriend. She was my colleague there, she left tho, to work at another firm 5 months ago.
The downfall started when I came across this ML challenge at a website called AIcrowd. I had participated and won a ton at Kaggle but this was new. I started making submissions and boom there it was. The ping noise. The sheer tension between me and time as I rushed to make more submissions. They were giving the participants like $10000+ for winning this thing and I was not even focused on that. Four days went away before I actually got out of my room, these were my stretches. Separating these sounds and making submissions felt like sex. I had never seen anything like this before.
I stopped answering my girlfriend’s texts initially, I would complete office assignments at night. I pretended to have a throat allergy to avoid office calls. We had been dating for a year and she had always hated my instinct to participate in these challenges which she looked down upon. She felt that I did them for the quick money, when I did it for the growth in my skills (Well Kaggle paid okayish, this AIcrowd’s prizes are nuts tho).
She called me a couple of days ago and asked what I would do if she left because of my obsession. I told her to let me tighten up my ranking and then we could talk. Lol she broke up over text. F*ck.
I am still on top of the challenge’s leader board tho.
W
submitted by /u/aichimp-101
[visit reddit] [comments]
Nevermind, solved.
EDIT: Got it working, there were two bugs. First, I had mistakenly initialized batch_size twice during all my editing, and so I was mismatching between batch_size depending on where in the code I was. The second bug, which I still haven’t entirely fixed, is that if the batch size does not evenly divide the input generator the code fails, even though I have it set with a take amount that IS divisible by the batch_size. I get the same error even if I set steps per epoch to 1 (so it should never be reaching the end of the batches). I can only assume it’s an error during graph generation where it’s trying to define that last partial batch even though it will never be trained over. Hmm.
EDIT EDIT: Carefully following the size of my dataset throughout my pipeline, I discovered the source of the second issue, which is actually just that I didn’t delete my cache files when I previously had a larger take. The last thing I would still like to do is fix the code such that it actually CAN handle variable length batches so I don’t have to worry about making sure I don’t have partial batches. However, from what I can see, tf.unstack along variable length dimensions is straight up not supported, so this will require refactoring my computation to use some other method, like loops maybe. To be honest, though, it’s not worth my time to do so right now when I can just use drop_remainder = True and drop the last incomplete batch. In my real application there will be a lot of data per epoch, so losing 16 or so random examples from each epoch is rather minor.
So, I am making a project where I randomly crop images. In my data pipeline, I was trying to write code such that I could crop batches of my data at once, as the docs suggested that vectorizing my operations would reduce scheduling overhead.
However, I have run into some issues. If I use tf.image.random_crop, the problem is that the same random crop will be used on every image in the batch. I, however, want different random crops for every image. Moreover, since where I randomly crop an image will affect my labels, I need to track every random crop performed per image and adjust the label for that image.
I was able to write code that seems like it would work by using unstack, doing my operation per element, then restacking, like so:
```python
images = tf.unstack(img, num=None, axis=0, name='unstack')
xshifts = []
yshifts = []
newimages = []
for image in images:
    if not is_valid:
        x = np.random.randint(0, width - dx + 1)
        y = np.random.randint(0, height - dy + 1)
    else:
        x = 0
        y = 0
    newimages.append(image[y:(y + dy), x:(x + dx), :])
    print(image[y:(y + dy), x:(x + dx), :])
    xshifts.append((float(x) / img.shape[2]) * img_real_size)
    yshifts.append((float(y) / img.shape[1]) * img_real_size)
images = tf.stack(newimages, 0)
```
But oh no! Whenever I use this code in a map function, it doesn’t work because in unstack I set num=None, which requires it to infer how much to unstack from the actual batch size. But because tensorflow for reasons decided that batches should have size none when specifying things, the code fails because you can’t infer size from None. If I patch the code to put in num=batch_size, it changes my datasets output signature to be hard coded to batch_size, which seems like it shouldn’t be a problem, except this happens.
tensorflow.python.framework.errors_impl.InvalidArgumentError: Input shape axis 0 must equal 2, got shape [1,448,448,3] [[{{node unstack}}]]
Which is to say, it’s failing because instead of receiving the expected batch input with the appropriate batch_size (2 for testing), it’s receiving a single input image, and 2 does not equal 1. The documentation strongly implied to me that if I batch my dataset before mapping (which I do), then the map function should be receiving the entire batch and should therefore be vectorized. But is this not the case? I double checked my batch and dataset sizes to make sure that it isn’t just an error arising due to some smaller final batch.
To sum up: I want to crop my images uniquely per image and alter labels as I do so. I also want to do this per batch, not just per image. The code I wrote that does this requires me to unstack my batch. But unstacking my batch can’t have num=None. But tensorflow batches have shape none, so the input of my method has shape none at specification. But if I change unstack’s num argument to anything but None, it changes my output specification to that number (which isn’t None), and the output signature of my method must ALSO have shape none at specification. How can I get around this?
Or, if someone can figure out why my batched dataset, batched before my map function, is apparently feeding in single samples instead of full batches, that would also solve the mystery.
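For anyone hitting the same wall, here is a rough, untested sketch of one workaround: draw one random offset per image and crop with tf.map_fn instead of tf.unstack, so the batch dimension can stay unknown. It is not fully vectorized, but it avoids the num=None problem; the helper name and signature below are made up, not code from this project.

```python
import tensorflow as tf

def random_crop_batch(images, dy, dx, img_real_size, is_valid=False):
    """Crop every image in a [batch, H, W, C] tensor at its own random offset,
    without unstacking, so an unknown (None) batch dimension is fine.
    dy/dx are Python ints; the function name and signature are hypothetical."""
    shape = tf.shape(images)
    batch, height, width = shape[0], shape[1], shape[2]
    if is_valid:
        # Deterministic crop at the origin for validation data.
        y = tf.zeros([batch], dtype=tf.int32)
        x = tf.zeros([batch], dtype=tf.int32)
    else:
        # One independent random offset per image in the batch.
        y = tf.random.uniform([batch], 0, height - dy + 1, dtype=tf.int32)
        x = tf.random.uniform([batch], 0, width - dx + 1, dtype=tf.int32)

    def crop_one(args):
        image, yi, xi = args
        return tf.image.crop_to_bounding_box(image, yi, xi, dy, dx)

    cropped = tf.map_fn(
        crop_one, (images, y, x),
        fn_output_signature=tf.TensorSpec([dy, dx, None], images.dtype))
    # Offsets rescaled to real-world units, mirroring the original label adjustment.
    xshifts = tf.cast(x, tf.float32) / tf.cast(width, tf.float32) * img_real_size
    yshifts = tf.cast(y, tf.float32) / tf.cast(height, tf.float32) * img_real_size
    return cropped, xshifts, yshifts
```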
submitted by /u/Drinniol
[visit reddit] [comments]
I bought a new MacBook Air M1 and was installing TensorFlow on it. I downloaded Python 3.8 using xcode-select --install but got an error in between: “…arm64.whl” not supported. Any help is appreciated.
submitted by /u/Anu_Rag9704
[visit reddit] [comments]