Training a RandomForest
Here is the code:
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                     test_size=0.8,
                                                     random_state=241)
RFC = RandomForestClassifier(n_estimators=37, random_state=241)
RFC.fit(X_train, y_train)
scor_test = []
for predict in RFC.predict_proba(X_test):
    x_scor = log_loss(y_test, predict)
    scor_test.apend(x_scor)
After running the last block, I get this error:
ValueError Traceback (most recent call last)
<ipython-input-152-01347a72f1da> in <module>
1 scor_test = []
2 for predict in RFC.predict_proba(X_test):
----> 3 x_scor = log_loss(y_test, predict)
4 scor_test.apend(x_scor)
~\Anaconda3\lib\site-packages\sklearn\metrics\classification.py in log_loss(y_true, y_pred, eps, normalize, sample_weight, labels)
1762 """
1763 y_pred = check_array(y_pred, ensure_2d=False)
-> 1764 check_consistent_length(y_pred, y_true, sample_weight)
1765
1766 lb = LabelBinarizer()
~\Anaconda3\lib\site-packages\sklearn\utils\validation.py in check_consistent_length(*arrays)
233 if len(uniques) > 1:
234 raise ValueError("Found input variables with inconsistent numbers of"
--> 235 " samples: %r" % [int(l) for l in lengths])
236
237
ValueError: Found input variables with inconsistent numbers of samples: [2, 3001]
Where did I go wrong?
Additional information:
y_test.shape - (3001,)
RFC.predict_proba(X_test).shape - (3001, 2)
Could the problem be in the matrix dimensions?
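A minimal sketch of what those shapes imply (an assumption, not part of the original question): log_loss compares y_test against the whole (n_samples, n_classes) probability matrix in a single call, so iterating over the rows of predict_proba passes a length-2 vector against 3001 labels.
from sklearn.metrics import log_loss

proba = RFC.predict_proba(X_test)   # shape (3001, 2)
score = log_loss(y_test, proba)     # one score for the whole test set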
Fairly new to Python, but I'm building out my first RF model on some classification data. I've converted all of the labels into int64 numerical data and loaded them into X and Y as numpy arrays, but I'm hitting an error when I try to train the model.
Here is what my arrays look like:
>>> X = np.array([[df.tran_cityname, df.tran_signupos, df.tran_signupchannel, df.tran_vmake, df.tran_vmodel, df.tran_vyear]])
>>> Y = np.array(df['completed_trip_status'].values.tolist())
>>> X
array([[[ 1, 1, 2, 3, 1, 1, 1, 1, 1, 3, 1,
3, 1, 1, 1, 1, 2, 1, 3, 1, 3, 3,
2, 3, 3, 1, 1, 1, 1],
[ 0, 5, 5, 1, 1, 1, 2, 2, 0, 2, 2,
3, 1, 2, 5, 5, 2, 1, 2, 2, 2, 2,
2, 4, 3, 5, 1, 0, 1],
[ 2, 2, 1, 3, 3, 3, 2, 3, 3, 2, 3,
2, 3, 2, 2, 3, 2, 2, 1, 1, 2, 1,
2, 2, 1, 2, 3, 1, 1],
[ 0, 0, 0, 42, 17, 8, 42, 0, 0, 0, 22,
0, 22, 0, 0, 42, 0, 0, 0, 0, 11, 0,
0, 0, 0, 0, 28, 17, 18],
[ 0, 0, 0, 70, 291, 88, 234, 0, 0, 0, 222,
0, 222, 0, 0, 234, 0, 0, 0, 0, 89, 0,
0, 0, 0, 0, 40, 291, 131],
[ 0, 0, 0, 2016, 2016, 2006, 2014, 0, 0, 0, 2015,
0, 2015, 0, 0, 2015, 0, 0, 0, 0, 2015, 0,
0, 0, 0, 0, 2016, 2016, 2010]]])
>>> Y
array(['NO', 'NO', 'NO', 'YES', 'NO', 'NO', 'YES', 'NO', 'NO', 'NO', 'NO',
'NO', 'YES', 'NO', 'NO', 'YES', 'NO', 'NO', 'NO', 'NO', 'NO', 'NO',
'NO', 'NO', 'NO', 'NO', 'NO', 'NO', 'NO'],
dtype='|S3')
>>> X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/sklearn/cross_validation.py", line 2039, in train_test_split
    arrays = indexable(*arrays)
  File "/Library/Python/2.7/site-packages/sklearn/utils/validation.py", line 206, in indexable
    check_consistent_length(*result)
  File "/Library/Python/2.7/site-packages/sklearn/utils/validation.py", line 181, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [1, 29]
Describe the bug
Trying to use fit_params with CalibratedClassifierCV in v1.1, but the fit parameters fail when they are passed down to the classifier.
- I have 1000 rows.
- I split them into train and validation parts of 800 and 200 rows respectively.
- The validation part is passed to the eval_set parameter in fit_params, and I fit on the train part of 800 rows.
- The train part is used for learning, and the optimization runs cross-validation with n_splits=5, i.e., each fold holds 160 rows (800/5 = 160).
Finally, I receive ValueError: Found input variables with inconsistent numbers of samples: [640, 1]. The 640 appears to be 4/5 of the data, i.e., the sub-train part of the inner CV that is evaluated on the remaining 1/5, since there are 5 folds.
What am I missing here? Where did I go wrong?
See details below.
Steps/Code to Reproduce
# Description
# This code generates pseudo-data for this test. PyTorch is needed.
# If you install dependencies from a requirements file, the command below additionally installs PyTorch:
# pip install -r requirements.txt -f https://download.pytorch.org/whl/cu111/torch_stable.html
import random
import numpy as np
import pandas as pd
from datetime import datetime
from typing import List, Dict, Any
from sklearn.pipeline import Pipeline
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import RobustScaler, LabelBinarizer, LabelEncoder, OrdinalEncoder
from sklearn.model_selection import KFold, GroupKFold, GridSearchCV, train_test_split
from sklearn.calibration import CalibratedClassifierCV
from pytorch_tabnet.tab_model import TabNetClassifier
import gc
import torch

torch.cuda.empty_cache()

# Generate random data: 20 features, id, label
df = pd.DataFrame()
size = 1000
df[f'id'] = [k for k in range(size)]
for c in range(1, 11):
    df[f'feature{c}_float'] = [random.uniform(-100, 100) for k in range(size)]
df[f'feature{c}_int'] = [random.randrange(0, 1000, 10) for k in range(size)]; c += 1
df[f'feature{c}_int'] = [random.randrange(-100, 100, 10) for k in range(size)]; c += 1
df[f'feature{c}_int'] = [random.randrange(2015, 2020, 1) for k in range(size)]; c += 1
df[f'feature{c}_int'] = [random.choice([-1, 1, np.nan]) for k in range(size)]; c += 1
df[f'feature{c}_int'] = [random.choice([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, np.nan]) for k in range(size)]; c += 1
df[f'feature{c}_cat'] = [random.choice(["a", "b", "c", "d", "e", np.nan]) for k in range(size)]; c += 1
df[f'feature{c}_cat'] = [random.choice(["red", "blue", "green", np.nan]) for k in range(size)]; c += 1
df[f'feature{c}_cat'] = [random.choice(["yes", "no", "neutral", np.nan]) for k in range(size)]; c += 1
df[f'feature{c}_cat'] = [random.choice(["animal", "human", np.nan]) for k in range(size)]; c += 1
df[f'feature{c}_cat'] = [random.choice(["male", "female", "N/A", np.nan]) for k in range(size)]; c += 1
for col in range(15, 20):
    df[f'feature{col}_cat'] = df[f'feature{col}_cat'].astype('category')  # set type for categorical features
df['label'] = [random.choice([-1, 0, 1]) for k in range(size)]

model_features = set(df.drop(columns=['id', 'label']).columns)
#model_features = set(df.drop(columns=['label']).columns)

def make_model_pipeline(model_class, categoricals: List[str], numericals: List[str],
                        drops: List[str], model_parameters: Dict[str, Any]) -> Pipeline:
    model_preprocessing = ("preprocessing", ColumnTransformer([
        ('cat', Pipeline([('imputer', SimpleImputer(strategy='constant')),
                          ('oenc', OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1))
                          ]), categoricals),
        ('num', Pipeline([("scaler", RobustScaler()),
                          ("imputer", SimpleImputer(strategy="median"))
                          ]), numericals),
        ('drop', 'drop', drops),
    ], remainder='drop'))
    calibrated_classifier = ("calibrated_classifier", CalibratedClassifierCV(
        base_estimator=model_class(**model_parameters), method='isotonic', cv=5))
    pipeline = Pipeline([model_preprocessing, calibrated_classifier])
    return pipeline

x_train, y_train = df.drop(columns='label'), df['label']

# Features
drop = sorted(set(x_train.columns) - set(x_train[model_features].columns))
cat = sorted(x_train[model_features].select_dtypes(include=['category']).columns)
num = sorted(set(x_train[model_features].columns) - set(cat))
use_features = sorted(set(cat).union(set(num)) - set(drop))

# Folds yearly
year = x_train["feature12_int"]  # year
year_cv = GroupKFold(n_splits=year.nunique())

# Make pipeline
model_class = TabNetClassifier
model_parameters = {
    'n_d': 16, 'n_a': 16, 'n_steps': 5, 'n_independent': 2, 'n_shared': 2,
    'clip_value': 2.0, 'gamma': 1.5, 'lambda_sparse': 0.01
}
param_grid = {
    'n_steps': [3, 5],
    'momentum': [0.3, 0.5]
}
opt_pipeline = make_model_pipeline(model_class, cat, num, drop,
                                   {k: v for k, v in model_parameters.items() if k not in param_grid})
opt_pipeline[1].base_estimator.set_params(
    optimizer_fn=torch.optim.Adam,
    optimizer_params=dict(lr=0.02, weight_decay=1e-5),
    scheduler_params={"gamma": 0.95, "step_size": 20},
    scheduler_fn=torch.optim.lr_scheduler.StepLR,
    epsilon=1e-15
)
param_grid = {'calibrated_classifier__base_estimator__n_steps': [3, 5]}
param_grid = {'calibrated_classifier__base_estimator__momentum': [0.3, 0.5]}

# Data split
print(f"\nAll data: {x_train.shape} {y_train.shape}")
x_train_prep, x_valid_prep, y_train_prep, y_valid_prep = train_test_split(
    x_train, y_train, test_size=0.20, random_state=123)

# Preprocessing for eval_set
le = LabelEncoder()
le.fit(y_train_prep)
y_train_prep, y_valid_prep = le.transform(y_train_prep), le.transform(y_valid_prep)
scc = opt_pipeline.get_params()['preprocessing'].transformers[0][1].named_steps['imputer']
scc.fit(x_train_prep[cat])
oenc = opt_pipeline.get_params()['preprocessing'].transformers[0][1].named_steps['oenc']
sc = opt_pipeline.get_params()['preprocessing'].transformers[1][1].named_steps['scaler']
sc.fit(x_train_prep[num])
imp = opt_pipeline.get_params()['preprocessing'].transformers[1][1].named_steps['imputer']
imp.fit(x_train_prep[num])

def preprocessing(x_t, x_v, cat, num, sc, scc, oenc, imp):
    # Preprocessing manually to have train/valid split for ANN
    def prep(prep, data, variables):
        df = pd.DataFrame(prep.transform(data[variables]),
                          columns=data[variables].columns,
                          index=data[variables].index).values
        return df
    # For train and validation
    x_t[cat] = prep(scc, x_t, cat)
    oenc.fit(x_t[cat])
    x_t[cat] = prep(oenc, x_t, cat)
    x_t[num] = prep(sc, x_t, num)
    x_t[num] = prep(imp, x_t, num)
    x_v[cat] = prep(scc, x_v, cat)
    x_v[cat] = prep(oenc, x_v, cat)
    x_v[num] = prep(sc, x_v, num)
    x_v[num] = prep(imp, x_v, num)
    return x_t, x_v

x_train_prep, x_valid_prep = preprocessing(
    x_train_prep, x_valid_prep, cat, num, sc, scc, oenc, imp
)

# Find best params on whole dataset
model = GridSearchCV(estimator=opt_pipeline,
                     param_grid=param_grid,
                     cv=KFold(**inner_cv_params),
                     scoring='balanced_accuracy',
                     refit=False,
                     verbose=2)
fit_params = {}
fit_params['calibrated_classifier__eval_set'] = [(x_valid_prep[use_features].values, y_valid_prep)]
fit_params['calibrated_classifier__eval_name'] = ['valid']
fit_params['calibrated_classifier__max_epochs'] = 100
fit_params['calibrated_classifier__patience'] = 10
fit_params['calibrated_classifier__batch_size'] = 32
fit_params['calibrated_classifier__virtual_batch_size'] = 16
fit_params['calibrated_classifier__drop_last'] = False
#fit_params['calibrated_classifier__weights'] = np.ones([y_train_prep.size]) / y_train_prep.size
model.fit(x_train_prep, y_train_prep, **fit_params)  # ----> errors here.

# Fit model with best params
best_model_parameters = {k.split("__")[-1]: v for k, v in model.best_params_.items()}
pipeline = make_model_pipeline(model_class,
                               sorted(set(cat) - set(drop)),
                               sorted(set(num) - set(drop)),
                               [],
                               best_model_parameters)
pipeline.fit(x_train_prep, y_train_prep, **fit_params)
Expected Results
No error is expected, smooth learning process.
Actual Results
Traceback (most recent call last):
File "/home/kabartay/sklearn_v1.1_test.py", line 203, in <module>
model.fit(x_train_prep, y_train_prep, **fit_params)
File "/home/anaconda3/envs/sklearn11/lib/python3.9/site-packages/sklearn/model_selection/_search.py", line 875, in fit
self._run_search(evaluate_candidates)
File "/home/anaconda3/envs/sklearn11/lib/python3.9/site-packages/sklearn/model_selection/_search.py", line 1375, in _run_search
evaluate_candidates(ParameterGrid(self.param_grid))
File "/home/anaconda3/envs/sklearn11/lib/python3.9/site-packages/sklearn/model_selection/_search.py", line 852, in evaluate_candidates
_warn_or_raise_about_fit_failures(out, self.error_score)
File "/home/anaconda3/envs/sklearn11/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 367, in _warn_or_raise_about_fit_failures
raise ValueError(all_fits_failed_message)
ValueError:
All the 10 fits failed.
It is very likely that your model is misconfigured.
You can try to debug the error by setting error_score='raise'.
Below are more details about the failures:
--------------------------------------------------------------------------------
10 fits failed with the following error:
Traceback (most recent call last):
File "/home/anaconda3/envs/sklearn11/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 686, in _fit_and_score
estimator.fit(X_train, y_train, **fit_params)
File "/home/anaconda3/envs/sklearn11/lib/python3.9/site-packages/sklearn/pipeline.py", line 382, in fit
self._final_estimator.fit(Xt, y, **fit_params_last_step)
File "/home/anaconda3/envs/sklearn11/lib/python3.9/site-packages/sklearn/calibration.py", line 283, in fit
check_consistent_length(y, sample_aligned_params)
File "/home/anaconda3/envs/sklearn11/lib/python3.9/site-packages/sklearn/utils/validation.py", line 383, in check_consistent_length
raise ValueError(
ValueError: Found input variables with inconsistent numbers of samples: [640, 1]
The error is raised by this check:
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/utils/validation.py#L365
which is invoked from here:
https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/calibration.py#L283
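A minimal illustration (a hedged sketch with made-up array sizes, not code from the issue) of why that check raises: CalibratedClassifierCV treats every entry of fit_params as if it were sample-aligned, so a one-element eval_set list is compared against the 640-row inner-CV training fold.
import numpy as np
from sklearn.utils.validation import check_consistent_length

y_fold = np.zeros(640)                              # sub-train part of one inner-CV fold
eval_set = [(np.zeros((200, 20)), np.zeros(200))]   # one (X_valid, y_valid) tuple, so length 1
check_consistent_length(y_fold, eval_set)
# ValueError: Found input variables with inconsistent numbers of samples: [640, 1]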
Versions
System:
    python: 3.9.12 (main, Apr 5 2022, 06:56:58) [GCC 7.5.0]
    executable: /home/utilisateur/anaconda3/envs/sklearn11/bin/python3
    machine: Linux-5.13.0-41-generic-x86_64-with-glibc2.31

Python dependencies:
    sklearn: 1.1.0
    pip: 21.2.4
    setuptools: 49.2.0
    numpy: 1.21.0
    scipy: 1.8.0
    Cython: None
    pandas: 1.1.5
    matplotlib: 3.3.4
    joblib: 1.0.1
    threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
    user_api: openmp
    internal_api: openmp
    prefix: libgomp
    filepath: /home/utilisateur/anaconda3/envs/sklearn11/lib/python3.9/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
    version: None
    num_threads: 12

    user_api: blas
    internal_api: openblas
    prefix: libopenblas
    filepath: /home/utilisateur/anaconda3/envs/sklearn11/lib/python3.9/site-packages/numpy.libs/libopenblasp-r0-5bebc122.3.13.dev.so
    version: 0.3.13.dev
    threading_layer: pthreads
    architecture: Haswell
    num_threads: 12

    user_api: blas
    internal_api: openblas
    prefix: libopenblas
    filepath: /home/utilisateur/anaconda3/envs/sklearn11/lib/python3.9/site-packages/scipy.libs/libopenblasp-r0-8b9e111f.3.17.so
    version: 0.3.17
    threading_layer: pthreads
    architecture: Haswell
    num_threads: 12
Answer by Zahir Ballard
You are running into that error because your X and Y don't have the same length, which is what train_test_split requires, i.e., X.shape[0] != Y.shape[0]. Given your current code:
>>> X.shape
(1, 6, 29)
>>> Y.shape
(29,)
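A hedged sketch of one way to get matching lengths (column names taken from the question; adjust to your DataFrame): build X with one row per sample, so X.shape[0] == Y.shape[0] == 29.
X = df[['tran_cityname', 'tran_signupos', 'tran_signupchannel',
        'tran_vmake', 'tran_vmodel', 'tran_vyear']].values   # shape (29, 6): one row per sample
Y = df['completed_trip_status'].values                        # shape (29,)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.3)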
Answer by Gordon Harmon
I am trying to create a machine learning model using LinearRegression, but I am getting the error below.
import pandas as pd
data = pd.read_csv('db.csv')
x = data['TV']
y = data['Sales']
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(x,y)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/user/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/linear_model/base.py", line 512, in fit
y_numeric=True, multi_output=True)
File "/user/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/utils/validation.py", line 531, in check_X_y
check_consistent_length(X, y)
File "/user/Python.framework/Versions/3.6/lib/python3.6/site-packages/sklearn/utils/validation.py", line 181, in check_consistent_length
" samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [1, 1000]
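The usual fix (a hedged sketch, not taken from the original answer) is to pass a 2-D feature matrix with a single column, so that X and y report the same number of samples.
import pandas as pd
from sklearn.linear_model import LinearRegression

data = pd.read_csv('db.csv')
x = data[['TV']]              # DataFrame of shape (n_samples, 1)
y = data['Sales']             # Series of shape (n_samples,)
model = LinearRegression()
model.fit(x, y)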
Answer by Augustus Beltran
This works fine for me. Before reshaping, make sure the arrays are numpy arrays.
import numpy as np
from sklearn.linear_model import LinearRegression
X = np.asarray([ 1994., 1995., 1996., 1997., 1998., 1999.])
y = np.asarray([1.2, 2.3, 3.4, 4.5, 5.6, 6.7])
clf = LinearRegression()
clf.fit(X.reshape(-1,1),y)
clf.predict([[1997]])
#Output: array([ 4.5])
clf.predict([[2001]])
#Output: array([ 8.9])
Answer by Kaisley Leblanc
The correct format is
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
not
x_train, y_train, x_test, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
The original question: the lengths of x and y are exactly the same, yet after train_test_split they somehow end up different. Why? I am so desperate now.
I tried to do a linear regression, and when I used train_test_split to create a training set it went wrong, but when I dropped the split and used the original data it went well. I don't understand where the error comes from or how to fix it. The code and the original data set look like this (the first block fails on fit; the second, which fits on the unsplit data, works):
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
x = visual_data.loc[:,('Relative Humidity AVG', 'Solar Radiation AVG', 'Temperature AVG', 'Wind Speed Daily AVG')]#loc vs iloc:We must convert the boolean Series into a numpy array. loc gets rows (or columns) with particular labels from the index. iloc gets rows (or columns) at particular positions in the index (so it only takes integers).
y = visual_data.loc[:,('pecentage_of_success')]
print(x.shape)
print(y.shape)
x_train, y_train, x_test, y_test = train_test_split(x,y,test_size=0.2,random_state = 0)
print(x_train.shape)
print(y_train.shape)
linreg = LinearRegression()
model = linreg.fit(x_train,y_train)
Result:
(464, 4)
(464,)
(371, 4)
(93, 4)
---> 13 model = linreg.fit(x_train,y_train)
ValueError: Found input variables with inconsistent numbers of samples: [371, 93]
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
x = visual_data.loc[:,('Relative Humidity AVG', 'Solar Radiation AVG', 'Temperature AVG', 'Wind Speed Daily AVG')]#loc vs iloc:We must convert the boolean Series into a numpy array. loc gets rows (or columns) with particular labels from the index. iloc gets rows (or columns) at particular positions in the index (so it only takes integers).
y = visual_data.loc[:,('pecentage_of_success')]
print(x.shape)
print(y.shape)
x_train, y_train, x_test, y_test = train_test_split(x,y,test_size=0.2,random_state = 0)
print(x_train.shape)
print(y_train.shape)
linreg = LinearRegression()
model = linreg.fit(x,y)
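A minimal sketch of the fix described above (same variables as the question): unpack the four return values in the order train_test_split actually returns them.
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=0)
linreg = LinearRegression()
model = linreg.fit(x_train, y_train)   # shapes now agree: (371, 4) with (371,)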
Answer by Magnolia Jordan
# Importing Libraries
import numpy as np
import pandas as pd
# Import dataset
dataset = pd.read_csv("../output.tsv", delimiter = '\t')
# library to clean data
import re
# Natural Language Tool Kit
import nltk
nltk.download('stopwords')
# to remove stopword
from nltk.corpus import stopwords
# for Stemming propose
from nltk.stem.porter import PorterStemmer
# Initialize empty array
# to append clean text
corpus = []
# 1000 (reviews) rows to clean
for i in range(0, 5):
# column : "Review", row ith
review = re.sub('[^a-zA-Z]', ' ', dataset['Review'][i])
# convert all cases to lower cases
review = review.lower()
# split to array(default delimiter is " ")
review = review.split()
# creating PorterStemmer object to
# take main stem of each word
ps = PorterStemmer()
# loop for stemming each word
# in string array at ith row
review = [ps.stem(word) for word in review
if not word in set(stopwords.words('english'))]
# rejoin all string array elements
# to create back into a string
review = ' '.join(review)
# append each string to create
# array of clean text
corpus.append(review)
# Creating the Bag of Words model
from sklearn.feature_extraction.text import CountVectorizer
# To extract max 1500 feature.
# "max_features" is attribute to
# experiment with to get better results
cv = CountVectorizer(max_features = 9)
# X contains corpus (dependent variable)
X = cv.fit_transform(corpus).toarray()
# y contains answers if review
# is positive or negative
y = dataset.iloc[:, 1].values
# Splitting the dataset into
# the Training set and Test set
from sklearn.model_selection import train_test_split
dataset.dropna(inplace=True)
print(X.shape)
print(y.shape)
# experiment with "test_size"
# to get better results
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.25)
print(X_train.shape)
print(y_train.shape)
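The inconsistent-samples error in this snippet comes from X and y having different lengths: the loop above cleans only the first 5 reviews, so cv.fit_transform(corpus) yields 5 rows, while y takes a value from every row of the dataset. A hedged sketch (assuming the intent is to vectorize every review) that keeps them aligned:
corpus = []
for i in range(0, len(dataset)):                 # clean every review, not just the first 5
    review = re.sub('[^a-zA-Z]', ' ', dataset['Review'][i]).lower().split()
    ps = PorterStemmer()
    review = [ps.stem(word) for word in review
              if word not in set(stopwords.words('english'))]
    corpus.append(' '.join(review))

X = cv.fit_transform(corpus).toarray()
y = dataset.iloc[:, 1].values                    # now X.shape[0] == y.shape[0]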
Answer by Foster Adams
How can I fix the error it throws? ValueError: Found input variables with inconsistent numbers of samples: [143, 426]. (Running the algorithm with one target variable.)
#split the data set into independent (X) and dependent (Y) data sets
X = df.iloc[:,2:31].values
Y = df.iloc[:,1].values
#split the data qet into 75% training and 25% testing
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size = 0.25, random_state = 0)
#scale the data (feature scaling)
sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_train = sc.fit_transform(X_test)
#Using Logistic Regression Algorithm to the Training Set
classifier = LogisticRegression(random_state = 0)
classifier.fit(X_train, Y_train)
and the shape of X_train, Y_train:
X_train.shape
(143, 29)
Y_train.shape
(426,)
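A hedged sketch of the likely fix: the second fit_transform above overwrites X_train with the scaled test set (143 rows), so it no longer matches Y_train (426 rows). Scale each split into its own variable, fitting the scaler only on the training data.
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

sc = StandardScaler()
X_train = sc.fit_transform(X_train)   # fit on the training data only
X_test = sc.transform(X_test)         # reuse the same scaler on the test data

classifier = LogisticRegression(random_state=0)
classifier.fit(X_train, Y_train)      # shapes agree again: (426, 29) with (426,)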
The following code was written in Python to run the linear regression algorithm on a given data set. Two columns, X1 and Y1, were chosen for the linear regression. The code used for this was
On executing the above code, the following error was encountered.
This error generally appears when X and Y have different numbers of samples, but in this case it appeared even though X_train and Y_train had the same number of samples, as shown in the output.
The problem with the code is that the data was converted into plain 1-D numpy arrays, whereas fit needs a 2-D numpy matrix, so the reading of the data has to be modified.
While doing the train/test split, the matrix also needs to be transposed.
Passing these data to fit should then no longer throw the ValueError encountered previously. The modified code would be
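The original modified snippet was not preserved here; what follows is a hedged sketch under the assumption that X1 and Y1 are single columns of a DataFrame named data: reading each column as a numpy matrix and transposing it gives a (n_samples, 1) shape that fit accepts without the ValueError.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X1 = np.matrix(data['X1'].values).T   # column matrix, shape (n_samples, 1)
Y1 = np.matrix(data['Y1'].values).T   # column matrix, shape (n_samples, 1)
X_train, X_test, Y_train, Y_test = train_test_split(X1, Y1, test_size=0.25, random_state=0)
model = LinearRegression().fit(X_train, Y_train)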