An Introduction to the Conjugate Gradient Method Without the Agonizing Pain cs.cmu.edu/~quake-papers/…
👍︎ 13
📰︎ r/math
💬︎
📅︎ Feb 24 2020
🚨︎ report
How does calculating only a few eigenvalues out of thousands work with the conjugate gradient method?

I read that if you have a large square matrix, say more than 1000x1000, and you want its eigenvalues but only need, say, 10 of them, then there is the so-called conjugate gradient method that can save you a significant amount of time by calculating only the eigenvalues you want instead of all of them. Can someone point me to existing numerical libraries (do BLAS or LAPACK have it?) and references?

EDIT: The matrix can be 10^6 x 10^6.
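What the post describes is usually the Lanczos/Arnoldi family rather than CG proper: LAPACK computes all eigenvalues of a dense matrix, but for a few eigenvalues of a large sparse matrix the standard tools are ARPACK (exposed in SciPy as `eigsh`) or LOBPCG, which is the CG-flavored eigensolver. A sketch on a toy matrix with a known spectrum (the SciPy pointer is my suggestion, not from the post):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# Toy SPD matrix with known eigenvalues 1, 2, ..., 1000
n = 1000
A = sp.diags(np.arange(1.0, n + 1)).tocsc()

# 10 smallest eigenvalues via shift-invert Lanczos (ARPACK),
# without ever computing the other 990
vals = eigsh(A, k=10, sigma=0, which='LM', return_eigenvectors=False)
print(np.sort(vals))  # → [ 1.  2. ... 10.]
```

For a 10^6 x 10^6 matrix the same call works as long as the matrix is sparse (or available as a matrix-vector product); `scipy.sparse.linalg.lobpcg` additionally accepts a preconditioner.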

👍︎ 2
💬︎
📅︎ Jan 22 2020
🚨︎ report
Born today : February 2nd - Cornelius Lanczos, Mathematician, Physicist, "developed a number of techniques for mathematical calculations ... Lanczos algorithm for finding eigenvalues, Lanczos approximation for the gamma function, conjugate gradient method for solving systems of linear equations" en.wikipedia.org/wiki/Cor…
👍︎ 2
💬︎
📅︎ Feb 02 2018
🚨︎ report
An Introduction to the Conjugate Gradient Method Without the Agonizing Pain [PDF] cs.cmu.edu/~quake-papers/…
👍︎ 50
📰︎ r/math
💬︎
📅︎ Oct 08 2010
🚨︎ report
Born today : February 2nd - Cornelius Lanczos, Mathematician, Physicist, "developed a number of techniques for mathematical calculations ... Lanczos algorithm for finding eigenvalues, Lanczos approximation for the gamma function, conjugate gradient method for solving systems of linear equations" en.wikipedia.org/wiki/Cor…
👍︎ 2
💬︎
📅︎ Feb 02 2017
🚨︎ report
Painless Introduction to Conjugate Gradient Method (pdf) cs.cmu.edu/~quake-papers/…
👍︎ 9
💬︎
📅︎ Mar 23 2011
🚨︎ report
An Introduction To The Conjugate Gradient Method Without The Agonizing Pain [pdf] cs.cmu.edu/~quake-papers/…
👍︎ 12
💬︎
📅︎ Apr 01 2009
🚨︎ report
Painless Introduction to Conjugate Gradient Method (pdf) cs.cmu.edu/~quake-papers/…
👍︎ 6
💬︎
📅︎ Mar 23 2011
🚨︎ report
How to test if a Kalman filter or (Conjugate) Gradient Method can be applied?

I struggle to find explanations or instructions for testing whether a Kalman filter or CGM can be applied to my function. I understand that my equation should be a linear function with respect to the variables in question. However, in the equation expressing my current model those variables appear in the denominator, inside square-root expressions. That is not so linear. :-/

Are there some tests (differentiability, something static, whatnot?) I can perform to confirm or deny that my equation is fit for a Kalman filter or CGM?

👍︎ 4
📰︎ r/ECE
💬︎
📅︎ Aug 06 2019
🚨︎ report
Frustrated with my PhD, especially with my advisor. Will appreciate any help…

I am a fourth-year PhD student (international student) at a well-known university in Boston. I work on mathematical optimization. I want to graduate as soon as possible (my advisor won't let me). I am getting very frustrated with my PhD; during my PhD I have not received a single piece of technical advice or knowledge from him. I've worked on and published several papers with him, and I am the first author on most of our work. He doesn't understand or criticize these papers - I have never seen him say anything technically useful whatsoever. Most of the time he suggests some buzzwords (usually inappropriate ones) or some formatting/grammatical adjustments. Most of the time he doesn't even read the whole paper. I find a paper idea, I do the theoretical analysis, I do the numerical experiments, then give him the paper. Throughout this process, I don't receive any advice from him. However, I have to explain the whole paper to him top to bottom before submission. I have to prepare/submit my own review/rebuttal without any help from him. During my PhD/paper revisions, he asked me the following questions (these are simple questions for people who study optimization and machine learning):

What is a convex function? What is gradient descent, Steepest Descent? Difference between the interior point method and the Simplex? What is Heavy ball momentum/ Nesterov momentum? Why positive definite matrix has positive eigenvalues only? What is linear convergence? What is the conjugate gradient method? etc.

I hate my advisor because he doesn't have any ethics and he doesn't know the basics of optimization (he is an impostor). He calls himself a Large-Scale Optimization, Artificial Intelligence, Machine Learning, and Causal Inference specialist!!!! He did his post-doc at MIT (his post-doc work is still unpublished, after 6 years!!!!).

During my PhD, I wrote/submitted four research proposals for funding. I wrote two of them completely, top to bottom (research idea, objectives, tasks, implementation). He is the PI of these proposals, yet he doesn't have the slightest idea what we are proposing. When I ask him any technical question, he suggests I work hard (he believes I will find the solution). He is not open to collaborating with other people who know this stuff (I got two researchers from UCLA and Boston College to collaborate with us; he is not interested in doing this). I sometimes do reviews for him and he submits my reviews almost verbatim. He d

... keep reading on reddit ➡

👍︎ 10
📰︎ r/PhD
💬︎
📅︎ Feb 03 2021
🚨︎ report
OOP VS Optimization VS Decision Making. Which course should I choose?

Hi,

I'm currently enrolled in a M.Sc. course in Data Science & Engineering, and I have to decide which optional course to take among the following (I can choose only 1), please take a brief look at the syllabus.

1) OOP

syllabus:

• Basic features (1 credit) • Object-oriented programming, java, eclipse • Classes, attributes, methods and constructors, objects • Packages and visibility rules • Strings, wrapper classes • Arrays
• Inheritance and interfaces (2 credits) • Inheritance • Abstract classes, interfaces • Functional interfaces, lambda expressions • Exceptions • Generic types
• Standard libraries (3 credits) • Collections: sets, lists, maps • Streams • Files • Dates • Threads • Graphical interfaces, Swing, JavaFX
• Software Engineering principles (2 credits) - Software life cycle - Design using UML - Design Patterns - Configuration management - Testing

2) Numerical Optimization & Stochastic Optimization

syllabus:

• Convex optimization: - gradient descent method; conjugate gradient method - Numerical differentiation - Newton and quasi-Newton methods - Globalization techniques - Alternating direction method of multipliers (ADMM)
• Constrained optimization: - Interior point methods - Projected gradient method - Active set
• Stochastic optimization - Static simulation-based optimization (parametric optimization) - Dynamic simulation-based optimization (control optimization)

3) Decision Making & Optimization

syllabus:

1. Linear programming: modeling techniques, basic concepts of the Simplex Method, and duality (10% of the course).
1. Computational complexity: problem classes P, NP, NP-complete, and CoNP-complete (10% of the course).
1. Exact optimization methods: Branch and Bound, Cutting Planes, and Dynamic Programming (20% of the course).
1. Heuristic optimization methods: greedy algorithms, GRASP, Beam Search, meta-heuristics (Tabu Search, Simulated Annealing, Genetic Algorithms, ACO, VNS, RBS), and math-heuristics (30% of the course).
1. Decision making under uncertainty: Stochastic Programming with recourse, Measures for Stochastic Programming, Progressive Hedging method (10% of the course).
1. Nonlinear Programming: theoretical conditions for unconstrained and constrained optimization, algorithms for unconstrained and constrained optimization (20%).

Of course, the best answer is "it depends on what you have already done, and what you would like to do", so I try to give a brief introduction.

... keep reading on reddit ➡

👍︎ 2
💬︎
👤︎ u/alecki
📅︎ Feb 06 2021
🚨︎ report
[code] Klibanov algorithm for one option and 10mn laps

Here is a Python implementation of the algorithm in this article:

```python
#!/usr/bin/python
#----------
# This unusual and intriguing algorithm was originally invented
# by Michael V. Klibanov, Professor, Department of Mathematics and Statistics,
# University of North Carolina at Charlotte. It is published in the following
# paper:
# M.V. Klibanov, A.V. Kuzhuget and K.V. Golubnichiy,
# "An ill-posed problem for the Black-Scholes equation
# for a profitable forecast of prices of stock options on real market data",
# Inverse Problems, 32 (2016) 015010.
#----------
# Script assumes it's called by crontab, at the opening of the market
#-----
import numpy as np
import pause, datetime
from bs4 import BeautifulSoup
import requests

# Quadratic interpolation of the bid and ask option prices, and linear
# interpolation in between
# (https://people.math.sc.edu/kellerlv/Quadratic_Interpolation.pdf)
def funcQuadraticInterpolationCoef(values):  # 'scipy.interpolate.interp1d' exists too
    y = np.array(values)
    A = np.array([[1, 0, 0], [1, -1, 1], [1, -2, 4]])  # rows are [1, t, t^2] at t = 0, -1, -2
    return np.linalg.solve(A, y)  # https://en.wikipedia.org/wiki/Polynomial_regression

def funcUab(t, coef):
    return coef[2]*t**2 + coef[1]*t + coef[0]

def funcF(s, sa, sb, ua, ub):
    return (s - sb)*(ua - ub)/(sa - sb) + ub

# Initialize the volatility and option lists of 3 values
optionBid = [0, 0, 0]   # dummy values to pop in the loop
optionAsk = [0, 0, 0]   # dummy values to pop in the loop
volatility = [0, 0, 0]  # dummy values to pop in the loop

# Initialization for the loop
Nt = 4  # even number greater than 2: 4, 6, ...
Ns = 2  # even number greater than 0: 2, 4, ...
twotau = 2    # not a parameter...
alpha = 0.01  # not a parameter...
dt = twotau / Nt  # time grid step
dimA = ((Nt+1)*(Ns+1), (Nt+1)*(Ns+1))  # matrix A dimensions
dimb = ((Nt+1)*(Ns+1), 1)              # vector b dimensions
A = np.zeros(dimA)  # matrix A
b = np.zeros(dimb)  # vector b
portfolio = 1000000       # money 'available'
securityMargin = 0.00083  # EMPIRICAL: needs to be adjusted when taking into account
                          # the transaction fees (should rise, see the article p.8)

# Wait 10mn after the opening of the market
# (timedelta avoids the minute-overflow bug of constructing the datetime by hand)
datet = datetime.datetime.now() + datetime.timedelta(minutes=10)
pause.unt
```
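As a sanity check on the interpolation step above: the matrix `[[1,0,0],[1,-1,1],[1,-2,4]]` corresponds to fitting `c0 + c1*t + c2*t**2` through samples at t = 0, -1, -2, so the recovered polynomial should reproduce the three input values (the sample prices below are made-up numbers):

```python
import numpy as np

A = np.array([[1, 0, 0], [1, -1, 1], [1, -2, 4]])  # rows are [1, t, t^2] at t = 0, -1, -2
y = np.array([2.0, 1.0, 2.0])  # three sample prices (made-up)
c = np.linalg.solve(A, y)      # → c = [2, 2, 1], i.e. 2 + 2t + t^2

poly = lambda t: c[0] + c[1]*t + c[2]*t**2
print([poly(t) for t in (0, -1, -2)])  # → [2.0, 1.0, 2.0]
```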
... keep reading on reddit ➡

👍︎ 6
💬︎
📅︎ Nov 29 2020
🚨︎ report
Inkscape is love, inkscape is life.

Hey Inkscape,

I just wanted to make a dedication post to you and how much you have helped me in my studies, career, and hobbies. I would not be where I am without this unbelievable program. From technical illustrations to web dashboards for actual jet engines, this program has been one of my greatest tools.

Check out the images below!

Made this after our advanced engineering design course, it's the conjugate gradient method of finding a max/min. Works with the shadows of zigzag triangles like in the pic.

The Fourier Transform illustrated.

A swallow flew into the lab, so I read up a bit about them and found a paper detailing their flight physics. Great little gliders these things.

A Neon rat

A neon fish

I Like Koi

Website animated with inkscape vector graphics file. The fish swims with the click of your mouse by applying forces which result in SVG transforms.

Playing with Filters and generators provided

Variations.

https://preview.redd.it/wkv9fxjb1dc51.png?width=2400&format=png&auto=webp&s=828c769ea3c7b7dba295887a6616d3749bc00031

These are all backgroundy type stuff. I have thousands of illustrations I've made for school and other projects too. All you need in life is inkscape.

----------- EDIT-----------

I had a request to make a tutorial on how I made the conjugate gradient background. I made a new image in a tutorial I put on YouTube here:

... keep reading on reddit ➡

👍︎ 77
💬︎
📅︎ Jul 22 2020
🚨︎ report
[Nunerical Analysis] Need tips for spotting error in algorythm

I am implementing the conjugate gradient method applied to the normal equations in MATLAB, and it works well for general matrices, but convergence goes completely off track when the matrix is ill-conditioned. I have some reference examples and data, and my algorithm behaves much worse than expected. So I was wondering whether there are general things to look for in my code when things like this happen. If it's relevant, I only get this kind of behaviour for ill-conditioned complex matrices.

Edit: typo in the title it was meant to be Numerical
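One generic debugging aid for a case like this is to compare against a minimal textbook CG on a well-conditioned SPD system first, then watch how the residual degrades with conditioning (note that CG on the normal equations squares the condition number, which by itself explains much of the bad behaviour). A sketch in Python, not the poster's MATLAB code:

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, maxiter=None):
    # Textbook CG for symmetric positive definite A (Hestenes-Stiefel)
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(maxiter or len(b)):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

rng = np.random.default_rng(1)
M = rng.standard_normal((20, 20))
A = M.T @ M + 20 * np.eye(20)  # SPD and well conditioned
b = rng.standard_normal(20)
x = conjugate_gradient(A, b)
print(np.linalg.norm(A @ x - b))  # → tiny residual
```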

👍︎ 2
💬︎
📅︎ May 31 2020
🚨︎ report
Reference the argmin (minimizing input) for a function

I know there's a number of ways to find and reference the minimum of a function, but how can I reference the argmin? E.g. if it were of the form f(x), how can I reference the minimizing value of x?

EDIT: I was able to figure out what was going wrong! It came down to me not using "matlabFunction(f)", so nothing would work in the first place. Thanks everyone for making me look to other issues rather than continuing to think I had misinterpreted the purpose of "fminbnd()"

(For later search purposes): Argmin Arg min Minimum argument Function minimizer Algorithm Iterative process Conjugate gradient method Truncated newton method Convex analysis

👍︎ 2
📰︎ r/matlab
💬︎
📅︎ Oct 13 2019
🚨︎ report
Why are my shaders less precise on the same computations?

I'm trying to solve a linear system with the conjugate gradient method using compute shaders. Here's the code: one solver doing it on the CPU and GPU, and one solver doing it using compute shaders. I'm using float variables in both cases.

I've gotten to the point where I get an answer and it seems to be right, but it's just a little worse than the one I get from the CPU solver. After getting the result I do a rough estimate of the precision (I guess, sorry, rip English). I'm doing this:

```
for (size_t i = 0; i < x.size(); i++)
    trueDiff += (xKnown[i] - x[i]) * (xKnown[i] - x[i]);
```

And this `trueDiff` shows that for every linear system size the GPU does worse, usually something like 1e-5 vs 1e-7.

This might be because I made an algorithmic mistake in the shader code, but I've already checked it about 4 times and couldn't find any. So are there any particular qualities of float numbers on the GPU? Or maybe there is a way to lose accuracy while reading from a mapped buffer? Ask me anything if there is not enough information.
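One reproducible effect in the same spirit as the question: single-precision results depend on the accumulation order, so a long sequential float32 reduction (typical of a naive loop) loses noticeably more digits than a pairwise/tree reduction (typical of GPU-style reductions). An illustrative sketch, not the poster's shader:

```python
import numpy as np

xs = np.full(100_000, 0.1, dtype=np.float32)

seq = np.float32(0.0)
for v in xs:                  # strict left-to-right float32 accumulation
    seq = np.float32(seq + v)

pairwise = xs.sum(dtype=np.float32)  # NumPy uses pairwise summation internally
exact = 0.1 * 100_000

print(abs(seq - exact), abs(pairwise - exact))  # sequential error is much larger
```

GPU/CPU discrepancies in CG often come down to this, plus fused multiply-add contraction in the shader compiler; neither is a bug per se, but an ill-conditioned system amplifies both.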

👍︎ 4
📰︎ r/opengl
💬︎
📅︎ Jan 13 2019
🚨︎ report
Looking for iterative search algorithm for local minimum, given a differentiable function and sets of measured values

This is for a research project.

My differentiable function is unfortunately not solvable for the variable I am interested in, and I do not know its true value, either. However, I do know my physical system and its interactions well.

I want to be able to (iteratively) calculate my target value, given a continuous stream of measured values, which will be tainted by noise.

I was thinking of the gradient method, or the conjugate gradient method - how do I use them in this setting? Where can I find tutorials for this?

👍︎ 14
📰︎ r/ECE
💬︎
📅︎ Jul 31 2019
🚨︎ report
L-BFGS and neural nets

I've been doing a little bit of reading on optimization (from Nocedal's book) and have some questions about the prevalence of SGD and variants such as Adam for training neural nets.

L-BFGS and other quasi-Newton methods have both theoretical and experimentally verified (PDF) faster convergence. Are there any good reasons training with L-BFGS is much less popular (or at least talked about) than SGD and variants? For the deep learning practitioners, have you ever tried using L-BFGS or other quasi-Newton or conjugate gradient methods?

In a similar vein, has anyone experimented with doing a line search for optimal step size during each gradient descent step? A little searching found nothing more recent than earlier 1990's.

edit: Thanks for all the responses. Sounds like high memory usage from L-BFGS + adequate performance from SGD with tricks is the reason that L-BFGS isn't typically used. There was a little more focus on the 2011 Stanford paper I referenced than I intended, so I'm going to share some more recent studies on stochastic quasi-Newton optimization for anyone interested:

• http://arxiv.org/abs/1311.2115 - sped up convergence on > 1 million parameter net
• http://arxiv.org/abs/1511.01169 - trained RNN on a few different datasets. Never used more than 10 gradient memory vectors, had mixed results with performance similar to Adam when using ReLU and superior when using tanh.
• http://arxiv.org/abs/1401.7020 - impressive convergence speed and nice framework, but only applied algorithm to convex problems
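On the line-search sub-question above: a backtracking (Armijo) line search is the textbook way to pick a per-step size, and it drops into plain gradient descent in a few lines. A minimal sketch (function names mine):

```python
import numpy as np

def gd_armijo(f, grad, x, iters=50, beta=0.5, c=1e-4):
    # Gradient descent with backtracking line search: shrink the step t
    # until the Armijo sufficient-decrease condition holds.
    for _ in range(iters):
        g = grad(x)
        t = 1.0
        while f(x - t * g) > f(x) - c * t * (g @ g):
            t *= beta
        x = x - t * g
    return x

f = lambda x: 0.5 * x @ x  # simple quadratic bowl
grad = lambda x: x
result = gd_armijo(f, grad, np.array([3.0, -4.0]))
print(result)  # → [0. 0.]
```

The reason this is rare with SGD is that each Armijo trial step costs a full (mini-batch) function evaluation, and with noisy minibatch losses the sufficient-decrease test itself becomes unreliable.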
👍︎ 47
💬︎
📅︎ Mar 25 2016
🚨︎ report
If you had to recommend ONE book for mathematical background?

TL;DR Somewhat experienced practitioner looking for graduate level treatment of topics in math/stats used in ML.

Hi,

I have a CS background, so sometimes I find myself lacking the math/stats knowledge to understand certain topics.

I'm looking for a book to learn things at the graduate/PhD level in these topics:

• Statistics
• Calculus
• Optimization
• Linear algebra

Is there ONE book that is along the lines of "necessary mathematical background for ML" kind of thing?

I build complex NN's in the ML research department of one of the big-4 tech companies. I give my background to clarify that I am not looking for a book to teach me what Gaussian is, how to apply Bayes rule, or how backpropagation works.

To give a few examples, my hope is to gain enough background knowledge to comfortably understand topics such as:

• Statistical learning theory
• VC dimensions and SVMs
• Proof of expressiveness of neural networks.
• Optimization procedures such as conjugate gradient methods
• Eigenvalues/vectors deeply and intuitively.

Thanks!

👍︎ 28
💬︎
📅︎ Feb 13 2017
🚨︎ report
Graduate Numerical Linear Algebra, How Best to Get Started?

I'm an Aerospace Engineering student taking Numerical Linear Algebra this semester. Though I haven't taken a sole Linear Algebra course in undergrad or at the graduate level I have gained familiarity with the topic through my coursework in Vibrations, Robotic Systems, Math Methods in Physics, and Graduate Math Methods in Engineering.

As an engineer most of my math courses have been very much based in example and while it made things easier and quicker to pick up, my formal knowledge of mathematics is lacking.

I have just about every textbook my professor recommended for the course:

• Matrix Computations. Johns Hopkins
• Accuracy and stability of numerical algorithms. Higham
• Analysis of numerical methods. Isaacson and Keller
• Introduction to numerical analysis. Stoer and Bulirsch
• Linear algebra and its applications. Strang
• Numerical linear algebra. Trefethen and Bau

From the outset I want a good resource that can help me pick up on mathematics notation to review as well as fundamental concepts in linear algebra/numerical analysis. Also if anyone can give me advice on what books to start reading now or where to start that would be much appreciated. As you can probably tell I'm nervous about the course, I really want to learn the material and get an A so I'm trying to get started as early as possible learning what I need to know.

The topics we will be covering include;

• Linear Vector Space; Schur decomposition theorem
• Gershgorin Theorem
• Min-Max Theorem (Courant-Fischer)
• Relations Between Spectral Radius and Norms
• LU Decomposition and Partial Pivoting
• Choleski and QR decomposition
• Idempotent and Projection Matrices
• Singular Value Decomposition
• Classical Linear Iterations (Jacobi, Gauss-Seidel, SOR)
• Krylov subspaces and steepest descent method
• Convergence and Conjugate Gradient method and preconditioning
• MINRES and GMRES methods
• Conditioning of eigen-problems
• Power method
• Hessenberg reduction and QR algorithms
• Krylov space methods for eigenvalue problems

P.S. I apologize if this is inappropriate to post here, I really wasn't certain.

👍︎ 5
📰︎ r/math
💬︎
📅︎ Jan 09 2017
🚨︎ report
Overview of Optimization Algorithms

I've recently learned a bit about neural networks and found that there are a couple of alternatives to standard gradient descent. Here is an overview.

edit: I can't change the title, but I was thinking about optimization algorithms for neural networks (mainly multi-layer perceptrons).

• Stochastic: Imgur
• Mini-Batch: Imgur
• Learning Rate Scheduling:
• Momentum: Imgur
• RProp and the mini-batch version RMSProp
• Exponential Decay Learning Rate
• Performance Scheduling
• Newbob Scheduling
• Quickprop

## Alternatives

• Quasi-Newton method
• Nesterov Accelerated Gradient (NAG) (What is a good paper / blog article explaining this one?)
• Adadelta (What is a good paper / blog article explaining this one?)
• Genetic algorithms
• Simulated Annealing
• Twiddle

Are there more? Are there papers which compare them or graphical explanations like this?

👍︎ 17
💬︎
📅︎ Feb 11 2016
🚨︎ report
Good learning material for stochastic gradient descent in the non-convex setting?

Do you guys know of a good overview of stochastic gradient descent and other gradient-based optimization algorithms? Most of the material I can find deals with the convex case, and since I am working on a variational problem I am in the non-convex paradigm. Is Leon Bottou's work the go-to resource? It seems to come up often as a reference.

What are some interesting variants of SGD? I know of natural gradient descent, but are there stochastic parallels to e.g. conjugate gradient methods and the like?

Finally, most arguments for why SGD is beneficial (aside from its generality) seem to hinge on speed considerations. I sort of recall hearing that SGD is more likely to converge to a global minimum - is this a proven statement?

👍︎ 14
💬︎
📅︎ Jan 06 2015
🚨︎ report
Why are deep networks difficult to train, really?

As I understand it, deep learning is basically about neural networks with multiple hidden layers. A long time (decades) ago, such networks were not used because they were hard to train, in contrast to shallow networks. Then people discovered pretraining (autoencoders and such), and got over that roadblock: the "good initialization" + "gradient descent fine-tuning" approach worked.

It is often said that the real reason for the difficulties was not the initialization (or many bad local minima), but the use of first-order methods ("pathological curvature" causes problems). This is solved by using curvature information in e.g. so-called Hessian-free methods (which, amusingly, do use the Hessian).

That does not make sense. Surely someone took L-BFGS or a Conjugate Gradient method and applied it to deep networks back in the bad old days? Wouldn't such methods have removed problems with the "pathological curvature", and people would have noticed? Why did it take decades for the discovery to happen? Or do these methods not work, after all? Is there some special sauce in Hessian-free methods that makes them work where L-BFGS or CG doesn't? If I try L-BFGS or CG with deep networks, will I actually get any problems other than costlier function evaluations compared to HF methods?

👍︎ 31
💬︎
📅︎ Nov 13 2013
🚨︎ report
Matlab homework help

Hi, I need to find the minimum of this function using MATLAB, the conjugate gradient method, and the method of false position, but I'm having trouble with the symbolic variables. It never finishes running, and when I pause it, MATLAB takes me to the sym.m file. Also, the profile summary says most of the time was spent in the 'mupadmex (MEX-file)' function. I appreciate any kind of input. Thanks.

👍︎ 4
💬︎
📅︎ Oct 29 2016
🚨︎ report
Different output on surface and laptop

Hi there,

I wrote some code for the conjugate gradients method on my Surface. When I executed it from the prompt on my computer it gave me a different result, although it's the same code! The other funny thing is that once in a while, when I execute the program on my Surface, it gives me the same different result as on my computer. It feels supernatural to me!

The code:

```python
# CG method (CG-Verfahren)
import numpy as np

A1 = np.empty([5, 5])    # construct A with m = 5
for i in range(1, 6):
    for j in range(1, 6):
        A1[i-1, j-1] = 1.0 / (i + j - 1.0)

A2 = np.empty([10, 10])  # construct A with m = 10
for i in range(1, 11):
    for j in range(1, 11):
        A2[i-1, j-1] = 1.0 / (i + j - 1.0)

# NOTE: b1/b2 are accumulated with '+=', so they must start at zero.
# The original used np.empty, which leaves uninitialized memory behind --
# exactly the kind of thing that gives different results on different
# machines (and occasionally on the same one).
b1 = np.zeros([5, 1])    # construct b with m = 5
for i in range(1, 6):
    for j in range(1, 6):
        b1[i-1] += ((-1.0)**(j-1.0)) * (1.0/(i+j-1.0))

b2 = np.zeros([10, 1])   # construct b with m = 10
for i in range(1, 11):
    for j in range(1, 11):
        b2[i-1] += ((-1.0)**(j-1.0)) * (1.0/(i+j-1.0))

x1 = np.empty([5, 1])    # exact solution for m = 5
for i in range(1, 6):
    x1[i-1] = (-1.0)**(i-1.0)

x2 = np.empty([10, 1])   # exact solution for m = 10
for i in range(1, 11):
    x2[i-1] = (-1.0)**(i-1.0)

def cg(A, b, z_1, tol, x_i):  # conjugate gradients method, with exact solution x_i
    t_n = b - np.dot(A, z_1)
    r_n = b - np.dot(A, z_1)
    itcount = 1
    z_n = z_1
    while np.dot(np.transpose(r_n), r_n)[0, 0] > tol:
        a_n = (np.dot(np.transpose(r_n), r_n)[0, 0]) / (np.dot(np.transpose(t_n), np.dot(A, t_n))[0, 0])
        z_n = z_n + a_n * t_n
        r_n = b - np.dot(A, z_n)
        g_n = (-1) * (np.dot(np.transpose(t_n), np.dot(A, r_n))[0, 0]) / (np.dot(np.transpose(t_n), np.dot(A, t_n))[0, 0])
        t_n = r_n + g_n * t_n
        itcount += 1
    else:
        print(z_n)
        print("Error: ")
        print(x_i - z_n)
        print("Iterations: ")
        print(itcount)

# Start vectors: zeros (the original array literals were garbled in the post)
print("CG method with m = 5 and tol = 1e-8: ")
cg(A1, b1, np.zeros([5, 1]), 1e-8, x1)
print("CG method with m = 5 and tol = 1e-13: ")
cg(A1, b1, np.zeros([5, 1]), 1e-13, x1)
print("CG method with m = 10 and tol = 1e-8: ")
cg(A2, b2, np.zeros([10, 1]), 1e-8, x2)
print("CG method with m = 10 and tol = 1e-13: ")
cg(A2, b2, np.zeros([10, 1]), 1e-13, x2)
```

Can you tell me where the problem is?

👍︎ 3
💬︎
📅︎ Dec 09 2014
🚨︎ report
Numerical Analysis and Programming

I've recently become quite interested in numerical methods for PDEs; I've done some work with development of some methods, but everything has so far been done in MATLAB. I really enjoy working in Matlab, but whenever I look at groups working on scientific computing and numerical methods, it seems that they're often looking for students with backgrounds in C++, which I do not have any background in. I do have some basic capabilities in python (although I haven't tried out Scipy or numpy yet); not sure if that is relevant or not.

I'm hoping for a few things: One would be a textbook recommendation for learning C++, ideally one geared towards numerical methods. I have no particular interest in C++ or general software development, so my main goal is really to learn just what I need so that I could be useful in a group that works on scientific computation. I have no clue how much that means I need to learn.

Another would be any recommendations on a numerical analysis/methods for PDE books. I'm already familiar with the basic ideas of the Finite Element Method (I used a book Numerical Treatment of PDEs by Christian Grossman I believe, and I'm getting Brenner and Scott's FEM book). I also have Saad's Iterative Methods book. For my theoretical background, I have plenty of graduate level real analysis and functional analysis, but relatively scarce amounts of numerical analysis. I know the basic theory of Galerkin methods on elliptic PDEs (variational formulation, Lax Milgram, Sobolev spaces/interpolation estimates with polynomials), and a bit about weak formulations for parabolic problems, but that's about it. I'm also familiar with some of the iterative techniques for solving linear systems, such as Gauss-Seidel, Jacobi, SOR, and the conjugate gradient and preconditioned conjugate gradient method.
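Since the paragraph above mentions Jacobi, Gauss-Seidel, SOR, and CG: the classical stationary iterations are only a few lines each, and implementing them is a good warm-up before the Krylov methods. A Jacobi sketch for a diagonally dominant system:

```python
import numpy as np

def jacobi(A, b, iters=100):
    # Jacobi iteration: split A = D + R and iterate x_{k+1} = D^{-1}(b - R x_k).
    # Converges when A is strictly diagonally dominant.
    D = np.diag(A)
    R = A - np.diag(D)
    x = np.zeros_like(b, dtype=float)
    for _ in range(iters):
        x = (b - R @ x) / D
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])  # strictly diagonally dominant
b = np.array([1.0, 2.0])
x = jacobi(A, b)
print(A @ x - b)  # → residual near [0, 0]
```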

Third, if there are any numerical methods/scientific computing graduate students or researchers, if you have any advice or suggestions on what sort of skills to develop, things to learn, or general advice, that would be greatly appreciated. I'm currently a graduate student, and most of my work has been theoretical, but I'm really interested in learning more about modeling, simulations, and developing numerical methods for scientific problems.

Thanks!

👍︎ 2
📰︎ r/math
💬︎
📅︎ Jun 11 2013
🚨︎ report
Stochastic gradient descent: from noisy gradients in millions of dimensions for neural network training - how to go to 2nd order methods?

Stochastic gradient descent (SGD) is a basic tool of machine learning - I thought I would try to discuss it with the mathematicians here - a good overview of 1st order methods, and an overview of 2nd order methods as well.

So a neural network can be seen as an extremely general parametrization with the number of parameters often in the millions. They are trained by minimizing a function defined as the sum of some evaluation score for each object over the entire dataset. This is usually done by gradient descent, usually in a stochastic way: calculating gradients from subsets ("minibatches") of the dataset to better exploit the local situation.

Besides the standard problems of gradient descent, like getting stuck on a plateau, this happens in huge dimension and with noisy gradients - we need to extract their statistical trends.

A basic way is (exponential moving) average of such noisy gradients (momentum method), and adding some heuristic modifications - such 1st order methods currently dominate neural network training.

There is now an ongoing fight to get successful 2nd order methods for a smarter choice of step size and for simultaneously modeling multiple directions, e.g. the recent K-FAC claims convergence several times faster. However, it is quite difficult: the full Hessian is huge (like 20TB for 3M parameters), it is very tough to estimate from noisy data, we need to invert the Hessian, the naive Newton method attracts to saddles - and there is a belief that there are exp(dim) of them ...

Any ideas for practical realization of 2nd order methods? Some basic questions:

• As full Hessian is too large, we rather need to choose a subspace where we use 2nd order model (we can simultaneously do gradient descent in the remaining) - how to choose this subspace and update this choice?

• How to effectively estimate Hessian there?
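On the storage point: the "20TB" figure checks out as an order of magnitude, since (3×10^6)² entries at a few bytes each is tens of terabytes. This is why Hessian-free methods never form the Hessian at all - they only use Hessian-vector products, which cost about one extra gradient evaluation via finite differences (or exactly, via Pearlmutter's R-operator). A finite-difference sketch:

```python
import numpy as np

def hvp(grad, x, v, eps=1e-5):
    # Hessian-vector product without storing H:
    # H v ≈ (∇f(x + eps·v) - ∇f(x - eps·v)) / (2·eps)
    return (grad(x + eps * v) - grad(x - eps * v)) / (2 * eps)

# Check on f(x) = 0.5 xᵀAx, whose Hessian is exactly A
A = np.array([[2.0, 1.0], [1.0, 3.0]])
grad = lambda x: A @ x
v = np.array([1.0, -1.0])
print(hvp(grad, np.zeros(2), v))  # → A @ v = [ 1. -2.]
```

With such products, the 2nd order model on a subspace (the first bullet above) can be built by applying `hvp` to a handful of chosen directions, e.g. recent gradients or momentum vectors.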

👍︎ 57
📰︎ r/math
💬︎
📅︎ Apr 18 2019
🚨︎ report
Convex Optimization

I was recommended by a recruiter to review some concepts from a Convex Optimization book for a quant researcher interview at a hedge fund. This made me question what kind of optimization problems are frequently encountered as a quant and raised several other questions that I wanted to get some opinions on.

Are optimization problems in this industry frequently convex?

How often are the predictive functions a black box, in which case you cannot theoretically determine whether your objective function is convex since the gradient is not computable? Although in this case, I suppose you can shotgun it and get some kind of empirical evidence of convexity (or lack thereof).

What are the most commonly used optimization methods in this space?

I'm not an optimization expert, but I have used several gradient-based (gradient descent and its variants, Newton's, Gauss-Newton, Conjugate Gradient) and gradient-free methods (Particle swarm, Bayesian optimization). I wonder if I should pick up knowledge for some other methods that might be commonly used or learn the ones previously enumerated in more depth.

👍︎ 10
💬︎
📅︎ Jun 30 2020
🚨︎ report
Anyone ever heard of "pure gradient" method to solve linear systems?

Pretty much what the title says. I've tried searching on Google but it didn't show up. I need to use this for a numerical calculus assignment but I don't understand how it works. So any help/link/book recommendation is appreciated. Thanks!

👍︎ 2
📰︎ r/math
💬︎
📅︎ May 27 2019
🚨︎ report
AP Bio Guide (Units 8 in comments)

# 1) Chemistry of Life

### Content

• Transpiration
• Hydrogen bonds pull water up like string and leave through stoma
• Stomata: leaf pores that allow gas exchange, most are on bottom side of leaf
• Xylem: tube-shaped, nonlining, vascular system, carries water from roots to rest of plant
• Epidermis: outer layer, protects plant
• Phloem: transports food
• Parenchyma: stores food
• Transpiration: evaporation of water from leaves
• Adhesion: polar water molecules adhere to polar surfaces (sides of xylem)
• Cohesion: polar water molecules adhere to each other
• Guard cells: cells surrounding stoma, regulate transpiration through opening and closing stoma
• Turgid vs flaccid guard cells
• Turgid swell caused by potassium ions, water potential decreases, water enters vacuoles of guard cells
• Swelling of guard cells open stomata
• High light levels, high levels of water, low temperature, low CO2 causes opening of stomata
• Water potential: transport of water in plant governed by differences in water potential
• Affected by solute concentration and environmental conditions
• High water potential (high free energy and more water) travels to low water potential
• Hydrophilic = attracts water, hydrophobic = repels water
• Water and its Properties
• Polar molecule due to positive hydrogen and negative oxygen regions
• Negative oxygen of one molecule to positive hydrogen of another water molecule forms a hydrogen bond, which are weak individually but strong together
• Important physical properties of water:
• Cohesion and adhesion: cohesion creates surface tension and they both allow for transpiration
• High specific heat: enables water to absorb and lose heat slowly
• High heat of vaporization: allows much of it to remain liquid
• Nearly universal polar solvent: dissolves a lot of stuff
• Flotation of ice: insulates, transportation
• Biological Macromolecules
• Polymer: long molecule consisting of many similar building blocks linked by covalent bonds
• Monomer: building block of a polymer
• ATP - adenosine triphosphate, energy carrier that uses bonds between phosphates to store energy
• Similar in structure to a ribonucleotide
• Four Types
• Carbohydrates
• Lipids
• Proteins
• Nucleic Acids


👍︎ 17
💬︎
📅︎ Jun 14 2020
🚨︎ report
Adding scalar to Plots.plot axes

I am wondering whether there's an easy method by which I could add a scalar to the axes of a one-dimensional plot(x,y).

The reason is that I have a simple Gaussian Process model which I've performed with zero-mean normalization. I would like to translate back the fit by adding y .+ mean(y) and x .+mean(x) as well as for the fit function but that appears harder than simply doing something like

x.axisvalues = x.axisvalues .+ mean(x) y.axisvalues = y.axisvalues .+ mean(y)

Anyone have any good ideas?

A proper way to translate a GaussianProcess.GP object would do fine as well.

Code attached below:

```julia
using CSV, DataFrames
using GaussianProcesses
using Statistics

# `a` is assumed to be a DataFrame with columns x and y (e.g. loaded via CSV)
y = a.y .- mean(a.y)
x = a.x .- mean(a.x)

mZero = MeanZero()
kern = SE(0.0, 0.0)

logObsNoise = -0.1
gp = GP(x, y, mZero, kern, logObsNoise)

using Optim
optimize!(gp, Optim.ConjugateGradient())   # optimise the hyperparameters

using Plots
plot(gp; legend=false, fmt=:png)
png("juliagp.png")
```
👍︎ 9
📰︎ r/Julia
💬︎
📅︎ May 03 2020
🚨︎ report
Why do we have so many Matrix Invertibility formulas?

We have:

Newton's method

Cayley–Hamilton method

Eigendecomposition

Cholesky decomposition

Analytic solution

Inversion of 2 × 2 matrices

Inversion of 3 × 3 matrices

Inversion of 4 × 4 matrices

Blockwise inversion

By Neumann series

formulas/methods to invert a matrix; why do we have so many? How did we get to so many?
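Part of the answer is that the small fixed-size cases have cheap closed forms that the general algorithms can't beat. For instance, the 2 × 2 analytic inverse is just "swap the diagonal, negate the off-diagonal, divide by the determinant." A minimal sketch (function name is my own), checked against the general-purpose `numpy.linalg.inv`:

```python
import numpy as np

def inv_2x2(M):
    """Closed-form inverse of a 2x2 matrix [[a, b], [c, d]]:
    swap the diagonal, negate the off-diagonal, divide by det = ad - bc."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return np.array([[d, -b], [-c, a]]) / det

M = np.array([[3.0, 1.0], [2.0, 4.0]])
print(np.allclose(inv_2x2(M), np.linalg.inv(M)))  # True
```

The many other methods exist because they trade generality for speed or stability: Cholesky exploits symmetry, blockwise inversion exploits structure, Neumann series works when you only have a matrix close to the identity, and so on.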

👍︎ 34
📰︎ r/math
💬︎
📅︎ May 10 2018
🚨︎ report
Can we try to reduce overfitting by analyzing for "outliers" in gradients?

This is just a hypothetical thought that came across my mind and I was wondering if it has any potential.

TLDR 2: What if instead of performing gradient descent the traditional way, we calculate two gradients g_A and g_B from two separate partitions of the training data and backpropagate g_A⋅g_B / (norm(g_A) norm(g_B)) * g_A instead of using just the gradients from one batch.

# Background

Typically, in most deep learning methods, we compute the gradient of some cost function `J` over all samples in a batch, average those gradients, and update our parameters θ by subtracting these gradients (scaled by a learning rate). Obviously, there are many different optimizers that do this update step differently, but regardless, they're all some form of gradient descent.

# Motivation

When we perform gradient descent, if I understand correctly, we're trying to maximize some form of the log likelihood of the training data (i.e. maximize how well our model parameters explain the dataset). However, we can sometimes overfit the training data if our gradient descent brings us to a solution that's too well fit for the training data, and thus is no longer the optimal solution for "real world" data (like the test set).

# Solution...?

What if during our training process, we account for the fact that our training set doesn't completely represent the "real world" (test) data; that although we can have very similar distributions between the training data and test data, they're not identical.

More specifically, what I'm proposing is that we could split our training dataset into two partitions. Let's call it training set A and training set B. At each step, we compute the gradients of a batch from A (let's call this g_A) and gradients of a batch from B (g_B). Next, instead of just backpropagating these gradients, we analyze the gradients between g_A and g_B and discard any gradients that seem "out of place". One proposal that I was thinking of is what if we backpropagate g_A⋅g_B / (norm(g_A) norm(g_B)) * g_A, where g_A⋅g_B is the inner product of g_A and g_B. Therefore, this is kind of like backpropagating g_A (like normal gradient descent), but scaled by how much this "conjugate gradient" g_B agrees with g_A (which is the g_A⋅g_B / (norm(g_A) norm(g_B)) term).
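The proposed update above is straightforward to write down; this is just a direct transcription of the formula g_A⋅g_B / (norm(g_A) norm(g_B)) * g_A, not a claim about how it behaves in training:

```python
import numpy as np

def cosine_scaled_update(g_A, g_B):
    """Scale g_A by the cosine similarity between g_A and g_B:
    full agreement keeps the whole step, orthogonal gradients zero it out,
    and opposing gradients flip its sign."""
    cos = (g_A @ g_B) / (np.linalg.norm(g_A) * np.linalg.norm(g_B))
    return cos * g_A

g_A = np.array([1.0, 0.0])
print(cosine_scaled_update(g_A, np.array([1.0, 0.0])))  # [1. 0.] (agreement)
print(cosine_scaled_update(g_A, np.array([0.0, 1.0])))  # [0. 0.] (orthogonal)
```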

Using this method, let's say we're near the end of training. If we only had one training set, we would just keep getting our model closer and closer t


👍︎ 2
💬︎
📅︎ Feb 12 2020
🚨︎ report
[Graduate Numerical Linear Algebra] How Best to Get Started?

I'm an Aerospace Engineering student taking Numerical Linear Algebra this semester. Though I haven't taken a dedicated Linear Algebra course in undergrad or at the graduate level, I have gained familiarity with the topic through my coursework in Vibrations, Robotic Systems, Math Methods in Physics, and Graduate Math Methods in Engineering.

As an engineer most of my math courses have been very much based in example and while it made things easier and quicker to pick up, my formal knowledge of mathematics is lacking.

I have just about every textbook my professor recommended for the course:

• Matrix Computations. Johns Hopkins
• Accuracy and stability of numerical algorithms. Higham
• Analysis of numerical methods. Isaacson and Keller
• Introduction to numerical analysis. Stoer and Bulirsch
• Linear algebra and its applications. Strang
• Numerical linear algebra. Trefethen and Bau

From the outset I want a good resource that can help me pick up on mathematics notation to review as well as fundamental concepts in linear algebra/numerical analysis. Also if anyone can give me advice on what books to start reading now or where to start that would be much appreciated. As you can probably tell I'm nervous about the course, I really want to learn the material and get an A so I'm trying to get started as early as possible learning what I need to know.

The topics we will be covering include;

• Linear Vector Space; Schur decomposition theorem
• Gershgorin Theorem
• Min-Max Theorem (Courant-Fischer)
• Relations Between Spectral Radius and Norms
• LU Decomposition and Partial Pivoting
• Choleski and QR decomposition
• Idempotent and Projection Matrices
• Singular Value Decomposition
• Classical Linear Iterations (Jacobi, Gauss-Seidel, SOR)
• Krylov subspaces and steepest descent method
• Convergence and Conjugate Gradient method and preconditioning
• MINRES and GMRES methods
• Conditioning of eigen-problems
• Power method
• Hessenberg reduction and QR algorithms
• Krylov space methods for eigenvalue problems
👍︎ 3
💬︎
📅︎ Jan 09 2017
🚨︎ report