Linear Regression

Hello

Looking for someone to point me in the right direction so I can start researching a solution to the problem below:

Problem

Linear regression with high dimensionality: p = 29, n ≈ 5000, and the input variables are generally quite highly correlated.

When using the model for prediction, data sets regularly have one or more missing input parameters. At the moment I just refit an LSQ solution from the training data with that input deleted. This seems to lead to quite unstable results. Stability is important for my application, more so than absolute accuracy in some senses.
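To make the setup concrete, here's roughly what the current refit looks like (a minimal sketch in NumPy; X_train, y_train and the function names are just illustrative, not my actual code):

```python
import numpy as np

def refit_without(X_train, y_train, missing_idx):
    """Refit ordinary least squares with the missing input column(s) removed."""
    X_reduced = np.delete(X_train, missing_idx, axis=1)
    # Prepend an intercept column and solve the least-squares problem.
    A = np.column_stack([np.ones(len(X_reduced)), X_reduced])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return coef

def predict_without(x_new, coef, missing_idx):
    """Predict for a new row that is missing the same input(s)."""
    x_reduced = np.delete(np.asarray(x_new), missing_idx)
    return coef[0] + x_reduced @ coef[1:]
```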

--
Regularisation (e.g. Ridge) feels like it should help, but (and I'm not formally trained in stats) as I understand it, that reduces the variance of the model fitted with all input variables - and doesn't necessarily achieve anything for stability when input parameters are deleted.
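For reference, this is the kind of thing I mean by ridge (a sketch assuming scikit-learn; X_train, y_train and X_new are placeholders, and the alpha grid is arbitrary):

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardise first because the ridge penalty is scale-sensitive, then let
# RidgeCV pick the penalty strength by cross-validation.
ridge_model = make_pipeline(
    StandardScaler(),
    RidgeCV(alphas=np.logspace(-3, 3, 13)),
)
# ridge_model.fit(X_train, y_train)
# ridge_model.predict(X_new)
```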

Thanks in advance. 

Have you looked at Partial Least Squares? It deals nicely with highly correlated variables by projecting them onto a subspace before fitting the model.
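Something along these lines, as a rough sketch using scikit-learn's PLSRegression (the synthetic data here just stands in for your 29 correlated inputs, and the component count is arbitrary):

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(0)
n, p = 5000, 29
# Synthetic stand-in: a few latent factors drive all 29 inputs, so the
# columns of X are strongly correlated, plus a little independent noise.
latent = rng.normal(size=(n, 4))
X = latent @ rng.normal(size=(4, p)) + 0.1 * rng.normal(size=(n, p))
y = latent @ rng.normal(size=4) + 0.5 * rng.normal(size=n)

# PLS projects X onto a small number of components chosen to covary with y,
# then fits the regression in that lower-dimensional space.
pls = PLSRegression(n_components=4)
pls.fit(X, y)
y_hat = pls.predict(X)
```

In practice you'd choose n_components by cross-validation rather than fixing it up front.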

  • Author

Thanks - I have looked at it very briefly, will have a bit more of a dig into it.
