c0de517e's journal
log_004

- WSL2 setup for nerfstudio
- Another draft from 2015 - language paradigms
- Politics, Variance and Techbros.
- Draft from 2015 on the future of real-time rendering and GPUs.
[Previous log file]

- WSL2 setup for nerfstudio (permalink)

DOCS
- https://learn.microsoft.com/en-us/windows/ai/directml/gpu-cuda-in-wsl
- https://www.reddit.com/r/wsl2/comments/15vlnvu/do_i_still_need_to_install_cuda_toolkit/
- https://docs.nerf.studio/
-- https://gist.github.com/SharkWipf/0a3fc1be3ea88b0c9640db6ce15b44b9

# WSL2 / Hyper-V

wsl --update
wsl --install ubuntu

# Update apt and upgrade ubuntu, why not - and also let's install a decent editor

wsl
sudo apt-get update
sudo apt-get -y upgrade
# some utilities
sudo apt-get install micro
sudo apt-get install mc

# Where are my files? WSL2 now creates a virtual HD (an ext4.vhdx image) by default
# e.g. C:\Users\apesce\AppData\Local\Packages\CanonicalGroupLimited.Ubuntu_79rhkp1fndgsc\LocalState\ext4.vhdx
# Windows can mount these .vhdx images directly
# The Windows drive can be accessed from inside WSL at /mnt/c
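
# A couple of quick sanity checks (the \\wsl$ path assumes the default "Ubuntu" distro name):
ls /mnt/c/Users   # the Windows drive seen from inside WSL
explorer.exe .    # opens the current WSL directory in Windows Explorer
# from Windows, the WSL filesystem shows up under \\wsl$\Ubuntu (or \\wsl.localhost\Ubuntu)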

# CUDA for WSL
# Make sure the NVIDIA drivers on the Windows side are up to date etc...
# There are lots of different CUDA versions etc - this is tricky - right now it's best to stick with 11.8

# References:
# https://developer.nvidia.com/cuda-11-8-0-download-archive?target_os=Linux&target_arch=x86_64&Distribution=WSL-Ubuntu&target_version=2.0&target_type=deb_network
# https://docs.nvidia.com/cuda/wsl-user-guide/index.html

wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update

# Install only the toolkit, not full cuda
sudo apt-get -y install cuda-toolkit-11-8 build-essential
# btw, nerfstudio guide suggests using conda - conda install -c "nvidia/label/cuda-11.8.0" cuda-toolkit
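
# Quick sanity check (assuming the toolkit landed in /usr/local/cuda-11.8, the default for these packages):
/usr/local/cuda-11.8/bin/nvcc --version   # should report release 11.8
nvidia-smi   # provided by the Windows driver, confirms the GPU is visible from WSL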

# Add these environment variables to .bashrc in $HOME (can use micro)

export CUDA_PATH="/usr/local/cuda-11.8"
export CUDA_HOME="$CUDA_PATH"
export PATH="$CUDA_PATH/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64/stubs:$CUDA_PATH/lib64:/usr/lib/wsl/lib/:$LD_LIBRARY_PATH"
export LIBRARY_PATH="$LD_LIBRARY_PATH:$LIBRARY_PATH"
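
# One way to append the block above and reload it in the current shell - just a sketch, editing with micro works fine too:
cat >> ~/.bashrc << 'EOF'
export CUDA_PATH="/usr/local/cuda-11.8"
export CUDA_HOME="$CUDA_PATH"
export PATH="$CUDA_PATH/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64/stubs:$CUDA_PATH/lib64:/usr/lib/wsl/lib/:$LD_LIBRARY_PATH"
export LIBRARY_PATH="$LD_LIBRARY_PATH:$LIBRARY_PATH"
EOF
source ~/.bashrc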

# Miniconda

cd $HOME
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash ./Miniconda3-latest-Linux-x86_64.sh
rm Miniconda3-latest-Linux-x86_64.sh
./miniconda3/bin/conda init
# restart the shell (or source ~/.bashrc) so conda ends up on the PATH, then:

conda update conda

# Conda environment and all the dependencies...
# note: next time I should try to use pixi instead, which should replace all of the steps below https://docs.nerf.studio/quickstart/installation.html#using-pixi

conda create --name nerfstudio -y python=3.8
conda activate nerfstudio
python -m pip install --upgrade pip

# the official guide uses pip, but whenever I can use conda instead, it's best to do so
# pip install torch==2.1.2+cu118 torchvision==0.16.2+cu118 --extra-index-url https://download.pytorch.org/whl/cu118
# note the use of the conda-forge channel:

conda install ffmpeg 'colmap<3.9' hloc 'pytorch<2.2' cudatoolkit=11.8 'opencv==4.10' -c conda-forge

# these seem to be required for splatfacto
# nope- conda install triton
# nope- pip install pytorch-triton --extra-index-url https://download.pytorch.org/whl/cu118
# BINGO:
pip install triton==2.2.0
pip install 'setuptools<70' # if you get some errors about 'packaging' in python scripts...

# Test cuda/pytorch from the python REPL:
python
>>> import torch
>>> torch.cuda.is_available()   # should print True
>>> torch.version.cuda          # the CUDA version pytorch was built against

# Build tinycudann

pip install ninja git+https://github.com/NVlabs/tiny-cuda-nn/#subdirectory=bindings/torch

# Nerfstudio (latest, from source)

git clone https://github.com/nerfstudio-project/nerfstudio
cd nerfstudio
pip install --upgrade pip setuptools

# this step reads the pyproject.toml for dependencies, but we should already have most of them installed by now
pip install -e .
# if this errors on a package version (...can't uninstall) it's because the conda version is different from the one in the .toml - might need to tweak things...

# optional CLI bindings; it will show some errors, which is normal:
ns-install-cli

# NOTE: if it complains about libEGL.so it means we need to install freeglut:
sudo apt-get install freeglut3-dev

# Example use:

conda activate nerfstudio

# sample
ns-download-data nerfstudio --capture-name=poster
ns-train nerfacto --data data/nerfstudio/poster
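# while training, ns-train also serves the web viewer (by default at http://localhost:7007, reachable from the Windows browser since WSL2 forwards localhost)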

# https://docs.nerf.studio/quickstart/custom_dataset.html
ns-process-data video --data some/video.mp4 --output-dir video_colmapped
ns-train splatfacto --data video_colmapped --output-dir trained
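
# To inspect a trained model, point the viewer at the config.yml that ns-train writes in the output dir
# (the path below is just a placeholder - use the actual one created by the training run):
ns-viewer --load-config trained/<experiment>/splatfacto/<timestamp>/config.yml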
Fri, 17 Jan 2025 13:00:21 -0800


- Another draft from 2015 - language paradigms (permalink)

Again posted as I found it (raw notes, WIP...)

- Languages to learn paradigms.

De-emphasize OOP. Functional and "DOP" are gaining popularity. But as computer scientists we should know more...

Idea - a list of languages that are good for learning a given paradigm, with their learning resources + a simple project that would be fun to do with each. Most languages are multiparadigm, but certain ones are better suited to a given paradigm and are great for learning a particular style.
These are all fairly generic languages, fairly current and supported. There are many more paradigms that are restricted to very peculiar applications, and languages that are very interesting for their historical contributions.

   - Racket Scheme and derived (e.g. Typed Racket). Mathematica. Clojure.
      - Lisp-family. Impure functional, symbolic, reflective.
      - Metaprogramming. Homoiconic. Continuation.
      - Design-by-contract (Clojure). Aspects (Racket). Futures/Promises (Racket). Term Rewriting (Mathematica).
      - Resources: ...       
      - Project idea: Genetic Programming.
   - OCaml, F#
      - ML-family. Impure functional, type-inferring.
      - Metaprogramming. OO       
      - Resources: ...
      - Project idea: Path Tracer.
   - SWI Prolog
      - Prolog-family. Declarative, logic. Constraint Programming.
      - Non-deterministic.
      - Resources: ...       
      - Project idea: ...   
   - C       
      - Imperative, procedural.
      - Resources: ...       
      - Project idea: Software Rasterizer.
   - ChucK       
      - Livecoding. Synchronous Reactive Programming.
   - DirectCompute... Cuda, Shadertoy (GLSL), ISPC
      - Stream processing
   - TCL
      - String based. Scripting.
   - ColorForth. Roll-your-own (e.g. JonesForth). Factor
      - Stack-based. Minimal. Typeless. Self-Modifying.
   - VVVV
      - Live, visual, flow-based    
   - Verilog, VHDL
      - Flow-based, reactive
   - Erlang
      - Actors (Similar to communicating sequential processes).
      - Session-based, hot-code loading. Futures/Promises.
   - J...
      - Array    
   - Haskell
      - Purely functional. Lazy.
   - Rust
      - Linear types.
   - 68000. 6510.
      - Assembly...
   - Map-reduce       
      - Declarative concurrent
   - NetLogo
      - ...
Tue, 14 Jan 2025 17:13:14 -0800


- Politics, Variance and Techbros. (permalink)

It worries me to see techbros close to politics. A side-effect of the tech industry is that it teaches some that they are extraordinarily smart, and that all problems can be solved with some JavaScript. No other experience needed, move fast, break things, etc.

And that is scary when it comes to the real world, as most problems are non-trivial, and require deep understanding of the subject matter.

Now, one can argue that's not true: tech has managed over and over again to have real-world impact - Uber, Amazon, Airbnb etc. - all businesses that deal with physical things. And I doubt that the founders were all experts in the various fields before entering them, i.e. I don't think Travis Kalanick spent all his time driving a taxi before deciding to make Uber...

This works in tech because the cost of trying something is next to nothing.
Failing has no cost, and if you don't immediately fail, you can learn as you go. Expertise is mostly a variance-reduction technique, and in government you want low variance more than you want to luck out at the risk of failing catastrophically.
A.k.a. infinite monkeys in a room can write Shakespeare... that doesn't mean that the monkey that finally happens to do so is a genius.

tldr- techbros have reasonable reasons to do what they do - unfortunately, this creates a bias for the successful ones to overestimate the applicability of that method - and trying to solve every problem with the same hammer, regardless of the context, is catastrophic.
Mon, 13 Jan 2025 13:59:56 -0800


- Draft from 2015 on the future of real-time rendering and GPUs. (permalink)

Found this in the drafts of my old blog. I think, considering it's a decade old, it had some pretty decent ideas... Posting here unedited.

- Triangles, rasterization and quads
[links: http://www.drdobbs.com/parallel/rasterization-on-larrabee/217200602, Ryg's trip down the graphics pipeline and https://fgiesen.wordpress.com/2013/02/10/optimizing-the-basic-rasterizer/] have been a constant for the past twenty years (with rare exceptions [...
https://en.wikipedia.org/wiki/NV1]).

Partially this is due to the fact that once a standard exists it's tough to break free from it. Even gaming consoles, made with unique hardware and for which we optimize every last cycle, tend to converge towards similar architectures, especially when it comes to decisions that would alter not simply code paths, but the entire logic and methods employed to achieve a given effect. It's a hard sell in modern cross-platform development to have unique hardware that necessitates specific techniques and assets to be used.

But that said, there are undoubtedly very good technical reasons for this success as well, it's not just historical. Yet, periodically we have to ask the question: what could the future hold? What could we possibly imagine that breaks past current GPU architectures? Here I want to note some options and considerations, with the disclaimer that not a lot of thought has gone into their pros and cons and, even more importantly, that I am far from being a hardware designer...

When it comes to predicting the future of real-time computer graphics I often like to look at the history of offline rendering as guidance.
Unfortunately in this case I think it can be done only in the loosest sense, as the actual rendering techniques employed in the offline world [link ...REYES... prman] hardly ever had much in common with the kind of rasterization found in GPUs.
Even early CG shots [youtube links...] always had a given "offline" quality that even today is discernible: no aliasing, no visible discretization.
Primitives changed based on the processing power, from solids to NURBS, then to subdivision surfaces, sculpted displacement, and voxels. Even tools and techniques evolved much more radically, but this would be (probably) the topic of another post...

- Raytracing.

Every time one discusses the future of realtime computer graphics, raytracing is the first thing that most people think about. I wrote about it in the past [...link] - back then I believed it wasn't going to happen in the near future (which is the only timescale I'm comfortable talking about) and still today I think it's not at our doorstep.
The reason why I have this negative assessment is that I don't think we're near the complexity limits of rasterization: we still can and should handle more geometry, more complex shading (and note that fast raytracing usually imposes some extra burden on shading, exactly what we don't want) and more complex texturing.
From offline CG history one can see that even there, raytracing was not widely employed before certain complexity limits were reached (the transition has been happening only in the last few years).

Raytracing is not trivial to implement efficiently, but once done it has certain benefits in terms of being able to express many lighting algorithms in a simpler, more natural, and accurate way. It might even have, on enormously (depth-)complex scenes, a performance edge, but IMHO we are nowhere near wanting any of these things.
We don't care much (or at all) about making life easier for rendering engineers, and our scenes are nowhere near the scene complexity at which rasterization starts to buckle - if that even exists (especially for non-out-of-core rendering - LODs and culling take care of getting us back to a reasonable complexity for a given viewpoint).

In fact, we are still so lacking in performance - in being able to push complexity - that usually the best graphics comes from carefully chosen and exploited constraints on art production and on the kind of scenes we can represent - more than anything else.

This is even worse if you consider that "simple" raytracing, using mostly coherent rays for shadows or perfect reflections, is perfectly useless (again one can look at the history of offline CG) - the real deal comes when we can shoot incoherent rays, which are still hard at the levels of performance needed for realtime graphics.

It is true though that even today many fundamental problems are hard to solve without incoherent visibility (e.g. indirect specular occlusion) and that some of these defects "show" in the final rendered image. There might be a "soft way" to raytracing, starting to use the technique in realtime to incrementally update the kinds of representations we today trace offline (i.e. voxels, uv-space, and so on) for dynamic objects, but I'd bet that's going to be done in software if it's needed, because lots of people will still prefer to solve these issues with more constraints (e.g. more static scenes or limited to what can be streamed).

- What else then?
What could help achieve the complexity goals we seek? One thing that could be discussed is the decoupling of shading and visibility rates. In modern GPUs this is achievable only at the subpixel level by means of MSAA, or to some extent via triangle interpolation employing hardware tessellation, but that incurs several limitations.

I think it's possible to come up with several improvements to the rasterizer itself that would not be too crazy: from the simplest, like supporting quads or patches, to derivative-less quad-less rasterization, to more complex schemes [...
http://graphics.stanford.edu/papers/fragmerging/shade_sig10.pdf] to full-blown REYES [...GPU reyes].

To a degree 2x2 shading quads are reasonable: they provide a simple way to obtain derivatives, which are crucial for filtered texture reads, and they impose a very simple limit on the amount of data that needs to be passed in a wave, as we will always pack at least four times more samples than primitives. But I still think there is room for improvement.

It also helps that typically on a GPU most of the area is taken by the shader units themselves, caches, texturing units, and all the components that are replicated many times. The rasterizer units are usually much smaller and shared among shader units, so it might be feasible to think of improvements, even adding functionality that could be more specific and infrequently used.

Even more in general this is about the dispatch of shading work: what could be done if we had more options, more flexibility in the way waves are created? Can we rasterize in more than two dimensions? Can we efficiently rasterize points, lines, or quads? Or allow stochastic rasterization [...
http://attila.ac.upc.edu/wiki/images/0/06/Hw_rast_hpg10.pdf TODO...
http://dl.acm.org/citation.cfm?id=1921505].
Point-clouds and splatting could be used to achieve more flexible visibility [...imperfect shadow maps] without sacrificing shading performance.

The other endpoint is interesting as well, controlling what we do with the shaded outputs. Can we lift the limitation of one sample per pixel shader for example? Allowing multiple outputs (e.g. a quad) to be sent to the blending stages would allow a degree of multi-resolution computation in shaders (certain math done per quad, certain per sample).

Compute shaders are another area of incremental improvement: they hold great potential, but so far they are severely limited by the inflexibility of dispatch and the inability to reconfigure waves. In particular, architectures with wide waves, comparatively small register files and little per-threadgroup memory severely limit the range of algorithms that can be implemented efficiently, practically making inter-wave communication useless (this also should be the topic of a separate post...).

Being able to control GPU work from compute shaders (controlling dispatch, the rasterizer, blending), to schedule waves on the same compute unit, to reuse data already loaded in local memory - I would expect some of these capabilities to come in the future. Being able to generate "pixel shader" waves from a compute shader effectively means we could (more easily) implement software GPU rasterization [...TODO...]
Lastly, I think caching schemes will become increasingly complex and dynamic. Baking is not going to go away; even offline, pure path tracing is not really the best option. Voxels, brick maps, irregular grids, sparse points, uv-space caching etcetera. How can we make these more efficient? Better compression schemes will be needed. Can there be better support for hierarchical data? Texture fetching that can traverse a mip chain, going up if certain texels are marked invalid until finding a valid sample? Or the ability to conditionally jump to code that fills an area of a texture if invalid texels are found? What could we do if we had more memory? SSDs could give very large, fast, mostly read-only memory areas - could these be directly connected to the CPU/GPUs?
Unfortunately I still think that most likely we will see fast progress in terms of raw compute power and memory, but much less progress on the actual capabilities, mostly due to cross-platform constraints.
Thu, 9 Jan 2025 17:39:53 -0800


[Previous log file] [Home]