I am using labelImg to annotate custom datasets for Faster R-CNN. What tools do you use to annotate images for Mask R-CNN, i.e., to create the masks?
submitted by /u/giakou4
Hello,
I was wondering: let's say I get the output of an intermediary layer, would it be possible to feed the data back in at that layer and resume the processing only from that point?
Just wondering…..
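For what it's worth, here is a minimal sketch of one way this can work with a Keras functional model (the layer names and sizes below are made up for illustration): build a second model whose input is the intermediate layer's output, so cached intermediate activations can be fed back in and only the remaining layers are run.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical model; "hidden_1" is the intermediary layer we stop at / resume from.
inp = layers.Input(shape=(784,))
h1 = layers.Dense(256, activation="relu", name="hidden_1")(inp)
h2 = layers.Dense(128, activation="relu", name="hidden_2")(h1)
out = layers.Dense(10, activation="softmax", name="probs")(h2)
full_model = tf.keras.Model(inp, out)

# Front half: original input -> intermediate activations.
front = tf.keras.Model(inp, full_model.get_layer("hidden_1").output)

# Back half: intermediate activations -> final output, reusing the same layers.
mid_in = layers.Input(shape=(256,))
x = full_model.get_layer("hidden_2")(mid_in)
back = tf.keras.Model(mid_in, full_model.get_layer("probs")(x))

features = front(tf.random.normal([4, 784]))  # cache the intermediate output
preds = back(features)                        # resume processing from that layer
```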
submitted by /u/lulzintosh123
I have been training a model using the MS COCO 2017 dataset.
The COCO dataset is divided into train, test, and validation splits. The key difference is that the test split does not have bounding-box annotations.
I didn't realise this, and I set the eval_input_reader to point at the COCO test split's .tfrecord files.
Is this incorrect? Should I instead point it towards the validation split, which has the bounding-box annotations? It's strange because my model still works, though not very well.
Very confused by it all. Why doesn't the COCO dataset label the test images?
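As a sanity check, something along these lines can show whether a given .tfrecord file actually contains ground-truth boxes (a sketch assuming the records follow the TF Object Detection API feature keys; the file names below are hypothetical):

```python
import tensorflow as tf

def has_boxes(tfrecord_path):
    # Peek at the first record and look for the standard bounding-box feature key.
    raw = next(iter(tf.data.TFRecordDataset(tfrecord_path)))
    example = tf.train.Example.FromString(raw.numpy())
    return len(example.features.feature["image/object/bbox/xmin"].float_list.value) > 0

# Hypothetical file names -- substitute your own shards.
print(has_boxes("coco_val.record-00000-of-00032"))      # expected: True (has annotations)
print(has_boxes("coco_testdev.record-00000-of-00032"))  # often False: no box annotations
```

If the eval shards come back without boxes, pointing eval_input_reader at the validation shards is the usual fix, since evaluation needs ground truth to compare against.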
submitted by /u/manduala
Hi everyone,
I'm trying to implement my own seq2seq model and add the widely recommended attention layer to it. Oddly, I'm getting an error during the concatenation, shown below:
2 frames
/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
    53   ctx.ensure_initialized()
    54   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
--> 55                                       inputs, attrs, num_outputs)
    56 except core._NotOkStatusException as e:
    57   if name is not None:

InvalidArgumentError: Graph execution error:

Detected at node 'model_3/concat_layer/concat' defined at (most recent call last):
  ...
  File "<ipython-input-15-f9cbfc42a957>", line 491, in <module>
    train()
  File "<ipython-input-15-f9cbfc42a957>", line 321, in train
    ...1)[:, 1:]))
  File "/usr/local/lib/python3.7/dist-packages/keras/engine/training.py", line 1384, in fit
    tmp_logs = self.train_function(iterator)
  ...
  File "/usr/local/lib/python3.7/dist-packages/keras/layers/merge.py", line 531, in _merge_function
    return backend.concatenate(inputs, axis=self.axis)
  File "/usr/local/lib/python3.7/dist-packages/keras/backend.py", line 3313, in concatenate
    return tf.concat([to_dense(x) for x in tensors], axis)

Node: 'model_3/concat_layer/concat'
ConcatOp : Dimension 1 in both shapes must be equal: shape[0] = [128,7,300] vs. shape[1] = [128,128,300]
  [[{{node model_3/concat_layer/concat}}]] [Op:__inference_train_function_43001]
You can find the whole Python code here: https://pastebin.pl/view/16b32d92 (it was too long to include in the post).
Could you please help me to find out what I’m missing?
Many many thanks in advance!
p.s. I’m using Keras for the implementation.
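For comparison, a minimal Luong-style sketch (not the poster's code; the vocabulary size, layer sizes, and names are illustrative) in which the attention context is aligned to the decoder's timesteps before the concat, so both inputs agree along the time axis rather than mixing the encoder length (128) with the decoder length (7):

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB = 10_000  # illustrative vocabulary size

encoder_tokens = layers.Input(shape=(128,), name="encoder_tokens")
decoder_tokens = layers.Input(shape=(7,), name="decoder_tokens")

emb = layers.Embedding(VOCAB, 300)
enc_emb = emb(encoder_tokens)   # (batch, 128, 300)
dec_emb = emb(decoder_tokens)   # (batch, 7, 300)

encoder_out, state_h, state_c = layers.LSTM(300, return_sequences=True, return_state=True)(enc_emb)
decoder_out = layers.LSTM(300, return_sequences=True)(dec_emb, initial_state=[state_h, state_c])

# The attention context has the decoder's time dimension (7), so it can be
# concatenated with decoder_out along the feature axis.
context = layers.Attention(name="attention")([decoder_out, encoder_out])           # (batch, 7, 300)
concat = layers.Concatenate(axis=-1, name="concat_layer")([decoder_out, context])  # (batch, 7, 600)

outputs = layers.TimeDistributed(layers.Dense(VOCAB, activation="softmax"))(concat)
model = tf.keras.Model([encoder_tokens, decoder_tokens], outputs)
```

If the raw encoder outputs (128 timesteps) are concatenated directly with the decoder outputs (7 timesteps), the shapes in the error message are exactly what you would see.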
submitted by /u/talhak
I am trying to implement a normalizing flow according to the RealNVP model for density estimation. First, I am trying to make it work on the “moons” toy dataset.
The model produces the expected result when not using the BatchNormalization bijector. However, when adding the BatchNormalization bijector to the model, the methods prob and log_prob return unexpected results.
Following is a code snippet setting up the model:
```python
layers = 6
dimensions = 2
hidden_units = [512, 512]
bijectors = []

base_dist = tfd.Normal(loc=0.0, scale=1.0)  # specify base distribution

for i in range(layers):
    # Adding the BatchNormalization bijector corrupts the results
    bijectors.append(tfb.BatchNormalization())
    bijectors.append(RealNVP(input_shape=dimensions, n_hidden=hidden_units))
    bijectors.append(tfp.bijectors.Permute([1, 0]))

bijector = tfb.Chain(bijectors=list(reversed(bijectors))[:-1], name='chain_of_real_nvp')

flow = tfd.TransformedDistribution(
    distribution=tfd.Sample(base_dist, sample_shape=[dimensions]),
    bijector=bijector
)
```
When the BatchNormalization bijector is omitted, both sampling and evaluating the probability return the expected results:
Heatmap of probabilities and samples without BN
However, when the BatchNormalization bijector is added, sampling is as expected but evaluating the probability seems wrong:
Heatmap of probabilities and samples with BN
Because I am interested in density estimation, the prob method is crucial. The full code can be found in the following Jupyter notebook: https://github.com/mmsbrggr/normalizing-flows/blob/master/moons_training_rnvp.ipynb
I know that the BatchNormalization bijector behaves differently during training and inference. Could the problem be that the BN bijector is still in training mode? If so, how can I move the flow to inference mode?
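One thing that may be worth trying (a sketch based on my reading of the TFP BatchNormalization bijector's constructor arguments, not verified against the linked notebook): after training, rebuild the chain with BatchNormalization bijectors constructed with training=False, reusing each training bijector's fitted Keras batchnorm layer so the learned moving statistics are kept.

```python
import tensorflow_probability as tfp
tfb = tfp.bijectors

num_layers = 6  # matches `layers` in the snippet above

# Bijectors used while fitting the flow (training=True is the default).
train_bns = [tfb.BatchNormalization() for _ in range(num_layers)]
# ... build the training Chain / TransformedDistribution with train_bns and fit ...

# Inference-mode copies: reuse the fitted layer (exposed as `.batchnorm`, if I
# recall the TFP implementation correctly -- an assumption) with training=False
# so the moving mean/variance are used instead of batch statistics.
eval_bns = [
    tfb.BatchNormalization(batchnorm_layer=bn.batchnorm, training=False)
    for bn in train_bns
]
# ... rebuild the same Chain / TransformedDistribution with eval_bns and call
# its log_prob / prob for density estimation ...
```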
submitted by /u/marcelmoosbrugger
If you want to ride the next big wave in AI, grab a transformer. They're not the shape-shifting toy robots on TV or the trash-can-sized tubs on telephone poles. So, what's a transformer model? A transformer model is a neural network that learns context, and thus meaning, by tracking relationships in sequential data.
The post What Is a Transformer Model? appeared first on NVIDIA Blog.
When the first instant photo was taken 75 years ago with a Polaroid camera, it was groundbreaking to rapidly capture the 3D world in a realistic 2D image. Today, AI researchers are working on the opposite: turning a collection of still images into a digital 3D scene in a matter of seconds, a process known as inverse rendering.
The post NVIDIA Research Turns 2D Photos Into 3D Scenes in the Blink of an AI appeared first on NVIDIA Blog.
This week at GDC, NVIDIA announced a number of new tools for game developers to help save time, more easily integrate NVIDIA RTX, and streamline the creation of virtual worlds. Watch this overview of three exciting new tools now available.
Since NVIDIA Deep Learning Super Sampling (DLSS) launched in 2019, a variety of super-resolution technologies have shipped from both hardware vendors and engine providers. To support these various technologies, game developers have had to integrate multiple SDKs, often with varying integration points and compatibility.
Today NVIDIA is releasing Streamline, an open-source cross-IHV framework that aims to simplify integration of multiple super-resolution technologies and other graphics effects in games and applications.
Streamline offers a single integration with a plug-and-play framework. It sits between the game and the render API, and abstracts SDK-specific API calls into an easy-to-use Streamline framework. Instead of manually integrating each SDK, developers simply identify which resources (motion vectors, depth, etc.) are required for the target super-resolution plug-ins and then set where they want the plug-ins to run in their graphics pipeline. The framework is also extensible beyond super-resolution SDKs; for example, developers can add NVIDIA Real-time Denoisers (NRD) to their games via Streamline. By making multiple technologies easier for developers to integrate, Streamline benefits gamers with more technologies in more games.
The Streamline SDK is available today on GitHub with support for NVIDIA DLSS and Deep Learning Anti-Aliasing; NVIDIA Image Scaling support is coming soon. Streamline is open source, and hardware vendors can create their own specific plug-ins. For instance, Intel is working on Streamline support for XeSS.
In 2018, the NVIDIA Turing architecture changed the game with real-time ray tracing. Now, over 150 top games and applications use RTX to deliver realistic graphics with incredibly fast performance, including Cyberpunk 2077, Far Cry 6, and Marvel's Guardians of the Galaxy.
Available now, Kickstart RT makes it easier than ever to add real-time ray tracing to game engines, producing realistic reflections, shadows, ambient occlusion, and global illumination.
Traditionally, game engines have had to bind all active materials in a scene. Kickstart RT delivers beautiful ray-traced effects while forgoing these legacy requirements and avoiding invasive changes to existing material systems.
Kickstart RT provides a convenient starting point for developers to quickly and easily include realistic dynamic lighting of complex scenes in their game engines in a much shorter timespan than traditional methods. It’s also helpful for those who may find upgrading their engine to the DirectX12 API difficult.
Learn more about Kickstart RT.
With a network of more than 30 data centers serving 80 countries, GeForce NOW (GFN) uses the power of the cloud to take the GeForce PC gaming experience to any device. This extended infrastructure provides a developer platform for studios and publishers to perform their game development virtually, starting with playtesting.
With GeForce NOW Cloud Playtest, players and observers can be anywhere in the world. Game developers can upload a build to the cloud, and schedule a playtest for players and observers to manage on their calendar.
Developers need innovative ways to perform this vital part of game development safely and securely, and Cloud Playtest on the GeForce NOW Developer Platform makes that possible. During a playtest session, a player uses the GFN app to play the targeted game while streaming their camera and microphone, and observers watch the gameplay and webcam feeds from the cloud.
Learn more about GeForce Now Cloud Playtest.
Virtual world simulation technology is opening new portals for game developers, and the NVIDIA Omniverse platform is bringing them a new era of opportunities.
Users can plug into any layer of the modular Omniverse stack to build advanced tools that simplify workflows, integrate advanced AI and simulation technologies, or help connect complex production pipelines.
This GDC, see how Omniverse is supercharging game development pipelines with over 20 years of NVIDIA rendering, simulation, and AI technology.
Lastly, NVIDIA announced the #MadeInMachinima contest—where participants can remix iconic game characters into a cinematic short using the Omniverse Machinima app for a chance to win RTX-accelerated NVIDIA Studio laptops. Users can experiment with AI-enabled tools and create intuitively in real-time using assets from Squad, Mount & Blade II: Bannerlord, and Mechwarrior 5. The contest is free to enter.
Learn more about Omniverse for developers.
It has never been easier to integrate real-time ray tracing in your games with Kickstart RT. However, for those looking for specific solutions, we’ve also updated our individual ray tracing SDKs.
The RTX Global Illumination plug-in is now available for Unreal Engine 5 (UE5), giving developers a head start on dynamic lighting in their UE5 open worlds. The Unreal Engine 4.27 plug-in has also been updated with performance and quality improvements, while the plug-in for the NVIDIA branch of UE4 has received ray-traced reflections and translucency support alongside skylight enhancements.
RTX Direct Illumination has received image quality improvements for glossy surfaces. NVIDIA Real-Time Denoisers introduces NVIDIA Image Scaling and a path-tracing mode within the sample application. It also has a new performance mode that is optimized for lower spec systems.
We’ve recently announced new latency measurement enhancements for NVIDIA Reflex that can be seen in Valorant, Overwatch, and Fortnite. These new features include per-frame PC Latency measurement and automatic configuration for Reflex Analyzer. Today, Reflex 1.6 SDK is available to all developers—making latency monitoring as easy as measuring FPS.
We’ve also updated our NVIDIA RTX Technology Showcase, an interactive demo, with Deep Learning Super Sampling (DLSS), Deep Learning Anti-Aliasing, and NVIDIA Image Scaling. Toggle different technologies to see the benefits AI can bring to your projects. For developers who have benefited from DLSS, we’ve updated the SDK to include a new and simpler way to integrate the latest NVIDIA technologies into your pipeline.
Truly immersive XR games require advanced graphics, best-in-class lighting, and performance optimizations to run smoothly on all-in-one VR headsets. Cutting-edge XR solutions like NVIDIA CloudXR streaming and NVIDIA VRWorks are making it easier and faster for game developers to create amazing AR and VR experiences.
Developers and early access users can accurately capture and replay VR sessions for performance testing, scene troubleshooting, and more with NVIDIA Virtual Reality Capture and Replay (VCR).
Attempting to repeat a user’s VR experience exactly is time consuming, and nearly impossible. VCR makes replaying VR sessions both accurate and painless. The tool records time-stamped HMD and controller inputs during an immersive VR session. Users can then replay the recording, without an HMD attached, to reproduce the session. It’s also possible to filter the recorded session through an optional processing step, cleaning up the data and removing excessive camera motion.
Learn more about NVIDIA XR developer solutions.
The latest Nsight Graphics 2022.2 release features the brand-new Ray Tracing Acceleration Structure Instance Heatmap and Shader Timing Heatmap. The GPU Trace Profiler has been improved to help developers see when GPU activity belonging to other processes may be interfering with profiling data, ensuring that the profiling information developers get is accurate and reliable.
NVIDIA Nsight Systems is a triage and performance analysis tool designed to track GPU workloads to their CPU origins within a system-wide view. The new Nsight Systems 2022.2 release adds support for the Vulkan graphics pipeline library and for Vulkan memory operations and warnings. Enhancements to the multi-report view further improve the ability to compare and debug issues.
Game developers can find additional free resources to recreate fully ray-traced and AI-driven virtual worlds on the NVIDIA Game Development resources page. Check out NVIDIA GTC 2022 game development sessions from this past week.
Three years ago we wrote about our work on predicting a number of cardiovascular risk factors from fundus photos (i.e., photos of the back of the eye)1 using deep learning. That such risk factors could be extracted from fundus photos was a novel discovery and thus a surprising outcome to clinicians and laypersons alike. Since then, we and other researchers have discovered additional novel biomarkers from fundus photos, such as markers for chronic kidney disease and diabetes, and hemoglobin levels to detect anemia.
A unifying goal of work like this is to develop new disease detection or monitoring approaches that are less invasive, more accurate, cheaper and more readily available. However, one restriction to potential broad population-level applicability of efforts to extract biomarkers from fundus photos is getting the fundus photos themselves, which requires specialized imaging equipment and a trained technician.
The eye can be imaged in multiple ways. A common approach for diabetic retinal disease screening is to examine the posterior segment using fundus photographs (left), which have been shown to contain signals of kidney and heart disease, as well as anemia. Another way is to take photographs of the front of the eye (external eye photos; right), which is typically used to track conditions affecting the eyelids, conjunctiva, cornea, and lens.
In “Detection of signs of disease in external photographs of the eyes via deep learning”, in press at Nature Biomedical Engineering, we show that a deep learning model can extract potentially useful biomarkers from external eye photos (i.e., photos of the front of the eye). In particular, for diabetic patients, the model can predict the presence of diabetic retinal disease, elevated HbA1c (a biomarker of diabetic blood sugar control and outcomes), and elevated blood lipids (a biomarker of cardiovascular risk). External eye photos as an imaging modality are particularly interesting because their use may reduce the need for specialized equipment, opening the door to various avenues of improving the accessibility of health screening.
Developing the Model
To develop the model, we used de-identified data from over 145,000 patients from a teleretinal diabetic retinopathy screening program. We trained a convolutional neural network both on these images and on the corresponding ground truth for the variables we wanted the model to predict (i.e., whether the patient has diabetic retinal disease, elevated HbA1c, or elevated lipids) so that the neural network could learn from these examples. After training, the model is able to take external eye photos as input and then output predictions for whether the patient has diabetic retinal disease, or elevated sugars or lipids.
A schematic showing the model generating predictions for an external eye photo.
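Purely as an illustration of the shape of such a model (the backbone choice, input resolution, and head names below are assumptions, not the paper's architecture), a shared convolutional trunk with one binary head per prediction target might look like:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical multi-task CNN: one shared image encoder, one sigmoid head per target.
inputs = layers.Input(shape=(448, 448, 3))
backbone = tf.keras.applications.InceptionV3(include_top=False, weights=None, input_tensor=inputs)
features = layers.GlobalAveragePooling2D()(backbone.output)

outputs = {
    "retinal_disease": layers.Dense(1, activation="sigmoid", name="retinal_disease")(features),
    "elevated_hba1c": layers.Dense(1, activation="sigmoid", name="elevated_hba1c")(features),
    "elevated_lipids": layers.Dense(1, activation="sigmoid", name="elevated_lipids")(features),
}

model = tf.keras.Model(inputs=inputs, outputs=outputs)
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["AUC"])
```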
We measured model performance using the area under the receiver operating characteristic curve (AUC), which quantifies how frequently the model assigns higher scores to patients who are truly positive than to patients who are truly negative (i.e., a perfect model scores 100%, compared to 50% for random guesses). The model detected various forms of diabetic retinal disease with AUCs of 71-82%, elevated HbA1c with AUCs of 67-70%, and elevated lipids with AUCs of 57-68%. These results indicate that, though imperfect, external eye photos can help detect and quantify various parameters of systemic health.
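As a toy illustration of that interpretation (made-up labels and scores, not study data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

y_true = np.array([0, 0, 1, 1, 0, 1])
y_score = np.array([0.10, 0.40, 0.35, 0.80, 0.20, 0.70])

# 8 of the 9 positive/negative pairs are ranked correctly -> AUC ~ 0.89.
print(roc_auc_score(y_true, y_score))
```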
Much like the CDC’s pre-diabetes screening questionnaire, external eye photos may be able to help “pre-screen” people and identify those who may benefit from further confirmatory testing. If we sort all patients in our study based on their predicted risk and look at the top 5% of that list, 69% of those patients had HbA1c measurements ≥ 9 (indicating poor blood sugar control for patients with diabetes). For comparison, among patients who are at highest risk according to a risk score based on demographics and years with diabetes, only 55% had HbA1c ≥ 9, and among patients selected at random only 33% had HbA1c ≥ 9.
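A short sketch of that kind of "top of the risk list" calculation (the numbers here are random and purely illustrative; with a real model the top of the list would be enriched as described):

```python
import numpy as np

rng = np.random.default_rng(0)
predicted_risk = rng.random(10_000)        # model scores, higher = riskier (synthetic)
hba1c = rng.normal(8.0, 1.5, size=10_000)  # lab values (synthetic)

k = int(0.05 * len(predicted_risk))        # size of the top-5% group
top_idx = np.argsort(predicted_risk)[::-1][:k]

frac = np.mean(hba1c[top_idx] >= 9)        # share with poorly controlled blood sugar
print(f"{frac:.0%} of the top-5% risk group have HbA1c >= 9")
```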
Assessing Potential Bias
We emphasize that this is promising, yet early, proof-of-concept research showcasing a novel discovery. That said, because we believe that it is important to evaluate potential biases in the data and model, we undertook a multi-pronged approach for bias assessment.
First, we conducted various explainability analyses aimed at discovering what parts of the image contribute most to the algorithm’s predictions (similar to our anemia work). Both saliency analyses (which examine which pixels most influenced the predictions) and ablation experiments (which examine the impact of removing various image regions) indicate that the algorithm is most influenced by the center of the image (the areas of the sclera, iris, and pupil of the eye, but not the eyelids). This is demonstrated below where one can see that the AUC declines much more quickly when image occlusion starts in the center (green lines) than when it starts in the periphery (blue lines).
Explainability analysis shows that (top) all predictions focused on different parts of the eye, and that (bottom) occluding the center of the image (corresponding to parts of the eyeball) has a much greater effect than occluding the periphery (corresponding to the surrounding structures, such as eyelids), as shown by the green line's steeper decline. The "baseline" is a logistic regression model that takes self-reported age, sex, race and years with diabetes as input.
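A minimal sketch of what such an occlusion experiment can look like in code (an illustration assuming a Keras-style model with a single sigmoid output, not the study's implementation):

```python
import numpy as np

def occlusion_map(model, image, patch=32, stride=32, fill=0.5):
    """Slide a gray patch over the image and record the model's prediction at
    each position; regions whose occlusion moves the prediction the most are
    the ones the model relies on."""
    h, w, _ = image.shape
    ys = range(0, h - patch + 1, stride)
    xs = range(0, w - patch + 1, stride)
    heat = np.zeros((len(ys), len(xs)))
    for i, y in enumerate(ys):
        for j, x in enumerate(xs):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch, :] = fill
            heat[i, j] = model.predict(occluded[None, ...], verbose=0)[0, 0]
    return heat
```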
Second, our development dataset spanned a diverse set of locations within the U.S., encompassing over 300,000 de-identified photos taken at 301 diabetic retinopathy screening sites. Our evaluation datasets comprised over 95,000 images from 198 sites in 18 US states, including datasets of predominantly Hispanic or Latino patients, a dataset of majority Black patients, and a dataset that included patients without diabetes. We conducted extensive subgroup analyses across groups of patients with different demographic and physical characteristics (such as age, sex, race and ethnicity, presence of cataract, pupil size, and even camera type), and controlled for these variables as covariates. The algorithm was more predictive than the baseline in all subgroups after accounting for these factors.
Conclusion
This exciting work demonstrates the feasibility of extracting useful health-related signals from external eye photographs, and it has potential implications for the large and rapidly growing population of patients with diabetes or other chronic diseases. There is a long way to go to achieve broad applicability, for example, understanding what level of image quality is needed, generalizing to patients with and without known chronic diseases, and understanding generalization to images taken with different cameras and under a wider variety of conditions, like lighting and environment. In continued partnership with academic and nonacademic experts, including EyePACS and CGMH, we look forward to further developing and testing our algorithm on larger and more comprehensive datasets, and to broadening the set of biomarkers recognized (e.g., for liver disease). Ultimately we are working towards making non-invasive health and wellness tools more accessible to everyone.
Acknowledgements
This work involved the efforts of a multidisciplinary team of software engineers, researchers, clinicians and cross functional contributors. Key contributors to this project include: Boris Babenko, Akinori Mitani, Ilana Traynis, Naho Kitade, Preeti Singh, April Y. Maa, Jorge Cuadros, Greg S. Corrado, Lily Peng, Dale R. Webster, Avinash Varadarajan, Naama Hammel, and Yun Liu. The authors would also like to acknowledge Huy Doan, Quang Duong, Roy Lee, and the Google Health team for software infrastructure support and data collection. We also thank Tiffany Guo, Mike McConnell, Michael Howell, and Sam Kavusi for their feedback on the manuscript. Last but not least, gratitude goes to the graders who labeled data for the pupil segmentation model, and a special thanks to Tom Small for the ideation and design that inspired the animation used in this blog post.
1 The information presented here is research and does not reflect a product that is available for sale. Future availability cannot be guaranteed.