The brute force method of finding the nearest of N points to a given point is O(N): you have to check each point. In contrast, if the N points are stored in a KD-tree, then finding the nearest point takes O(log(N)) time on average. There is also the additional one-time cost of building the KD-tree, which requires O(N*log(N)) time.
If you need to repeat this process N times, then the brute force method is O(N**2) and the KD-tree method is O(N*log(N)), build cost included. Thus, for large enough N, the KD-tree will beat the brute force method.
See here for more on nearest neighbor algorithms (including KD-tree).
Below (in the function using_kdtree) is a way to compute the great circle arclengths of nearest neighbors using scipy.spatial.KDTree. KDTree uses the Euclidean distance between points, but there is a formula for converting Euclidean chord distances between points on a sphere to great circle arclength, given the radius R of the sphere: arclength = 2*R*arcsin(chord/(2*R)).
So the idea is to convert the latitude/longitude data into Cartesian coordinates, use a KDTree to find the nearest neighbors, and then apply the great circle distance formula to obtain the desired result.
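As a quick sanity check of the chord-to-arc conversion described above, the following sketch compares it with the haversine formula for a single pair of points. It uses only the standard library; the helper names (to_cartesian, chord_length, chord_to_arclength, haversine) and the two sample coordinates are illustrative assumptions, not part of the original answer.

```python
from math import asin, cos, radians, sin, sqrt

R = 6367.0  # sphere radius in km (same value as in the answer's code)


def to_cartesian(lat_deg, lon_deg, radius=R):
    """Convert latitude/longitude in degrees to 3-D Cartesian coordinates."""
    phi, theta = radians(lat_deg), radians(lon_deg)
    return (radius * cos(phi) * cos(theta),
            radius * cos(phi) * sin(theta),
            radius * sin(phi))


def chord_length(p, q):
    """Straight-line (Euclidean) distance between two 3-D points."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def chord_to_arclength(chord, radius=R):
    """Great circle arc length subtended by a chord of the sphere."""
    return 2.0 * radius * asin(chord / (2.0 * radius))


def haversine(lat1, lon1, lat2, lon2, radius=R):
    """Great circle distance computed directly from the angles."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2.0) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * radius * asin(sqrt(a))


# Two arbitrary sample points, given as (latitude, longitude) in degrees
point_a = (55.75, 37.62)
point_b = (48.86, 2.35)

chord = chord_length(to_cartesian(*point_a), to_cartesian(*point_b))
arc = chord_to_arclength(chord)
direct = haversine(*point_a, *point_b)
print(chord, arc, direct)  # arc and direct should agree up to rounding
```

The two results agree because the chord equals 2*R*sin(c/2) for central angle c, which is the quantity the haversine formula inverts.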
Here are some benchmarks. Using N = 100, using_kdtree is 39x faster than the orig (brute force) method.
In [180]: %timeit using_kdtree(data)
100 loops, best of 3: 18.6 ms per loop
In [181]: %timeit using_sklearn(data)
1 loop, best of 3: 214 ms per loop
In [179]: %timeit orig(data)
1 loop, best of 3: 728 ms per loop
For N = 10000:
In [5]: %timeit using_kdtree(data)
1 loop, best of 3: 2.78 s per loop
In [6]: %timeit using_sklearn(data)
1 loop, best of 3: 1min 15s per loop
In [7]: %timeit orig(data)
# untested; too slow
Since using_kdtree is O(N log(N)) and orig is O(N**2), the factor by which using_kdtree is faster than orig will grow as N, the length of data, grows.
import numpy as np
import scipy.spatial as spatial
import pandas as pd
import sklearn.neighbors as neighbors
from math import radians, cos, sin, asin, sqrt
R = 6367
def using_kdtree(data):
    "Based on https://stackoverflow.com/q/43020919/190597"
    def dist_to_arclength(chord_length):
        """
        https://en.wikipedia.org/wiki/Great-circle_distance
        Convert Euclidean chord length to great circle arc length
        """
        central_angle = 2*np.arcsin(chord_length/(2.0*R))
        arclength = R*central_angle
        return arclength
    phi = np.deg2rad(data['Latitude'])
    theta = np.deg2rad(data['Longitude'])
    data['x'] = R * np.cos(phi) * np.cos(theta)
    data['y'] = R * np.cos(phi) * np.sin(theta)
    data['z'] = R * np.sin(phi)
    tree = spatial.KDTree(data[['x', 'y','z']])
    # k=2 because the nearest neighbor of each point is the point itself
    distance, index = tree.query(data[['x', 'y','z']], k=2)
    return dist_to_arclength(distance[:, 1])
def orig(data):
    def distance(lon1, lat1, lon2, lat2):
        """
        Calculate the great circle distance between two points
        on the earth (specified in decimal degrees)
        """
        # convert decimal degrees to radians
        lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
        # haversine formula
        dlon = lon2 - lon1
        dlat = lat2 - lat1
        a = sin(dlat/2.0)**2 + cos(lat1) * cos(lat2) * sin(dlon/2.0)**2
        c = 2 * asin(sqrt(a))
        km = R * c
        return km
    shortest_distance = []
    for i in range(len(data)):
        distance1 = []
        for j in range(len(data)):
            if i == j: continue
            distance1.append(distance(data['Longitude'][i], data['Latitude'][i],
                                      data['Longitude'][j], data['Latitude'][j]))
        shortest_distance.append(min(distance1))
    return shortest_distance
def using_sklearn(data):
    """
    Based on https://stackoverflow.com/a/45127250/190597 (Jonas Adler)
    """
    def distance(p1, p2):
        """
        Calculate the great circle distance between two points
        on the earth (specified in decimal degrees)
        """
        lon1, lat1 = p1
        lon2, lat2 = p2
        # convert decimal degrees to radians
        lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
        # haversine formula
        dlon = lon2 - lon1
        dlat = lat2 - lat1
        a = np.sin(dlat/2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2)**2
        c = 2 * np.arcsin(np.sqrt(a))
        km = R * c
        return km
    points = data[['Longitude', 'Latitude']]
    nbrs = neighbors.NearestNeighbors(n_neighbors=2, metric=distance).fit(points)
    distances, indices = nbrs.kneighbors(points)
    result = distances[:, 1]
    return result
np.random.seed(2017)
N = 1000
data = pd.DataFrame({'Latitude':np.random.uniform(-90,90,size=N),
                     'Longitude':np.random.uniform(0,360,size=N)})
expected = orig(data)
for func in [using_kdtree, using_sklearn]:
    result = func(data)
    assert np.allclose(expected, result)
The condition can probably be written more simply, but I couldn't think of a way offhand:
def near_points(x, y, arr):
    return [(a,b) for a,b in arr if abs(a-x) <= 1 and abs(b-y) <= 1 and (x != a or y != b)]
points = [(x,y) for x in range(1,11) for y in range(1,11)]
print(near_points(5, 6, points))
print(near_points(10, 1, points))
print(near_points(5, 10, points))
Output:
[(4, 5), (4, 6), (4, 7), (5, 5), (5, 7), (6, 5), (6, 6), (6, 7)]
[(9, 1), (9, 2), (10, 2)]
[(4, 9), (4, 10), (5, 9), (6, 9), (6, 10)]
P.S. The expression can be simplified like this, though I'm not sure it would be any faster or clearer:
if 0 < (a-x) ** 2 + (b-y) ** 2 <= 2
The min function won't work here, because it returns only a single value. You could use it to find the minimum distance, but you already know that distance: it is 1 along one of the coordinates (and 1 or 0 along the other).
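One more way to phrase the same condition, assuming the points lie on an integer grid as in the example above, is "Chebyshev distance exactly 1". The sketch below places it next to the original function and checks that both agree on the 10x10 grid; near_points_chebyshev is a hypothetical name introduced only for this comparison.

```python
def near_points(x, y, arr):
    """Original condition: within 1 on both axes, excluding the point itself."""
    return [(a, b) for a, b in arr
            if abs(a - x) <= 1 and abs(b - y) <= 1 and (x != a or y != b)]


def near_points_chebyshev(x, y, arr):
    """Same neighbours, written as 'Chebyshev distance equals 1'.

    Only equivalent for integer coordinates: with fractional coordinates
    the maximum of the two offsets could fall strictly between 0 and 1.
    """
    return [(a, b) for a, b in arr if max(abs(a - x), abs(b - y)) == 1]


points = [(x, y) for x in range(1, 11) for y in range(1, 11)]

# Both versions should pick the same neighbours for every grid point
agree = all(near_points(x, y, points) == near_points_chebyshev(x, y, points)
            for x, y in points)
print(agree)
print(near_points_chebyshev(10, 1, points))
```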
Hello.
I have a dictionary of the form (place_name: GPS coordinates):
places = {
'area-1':'55.753003, 37.619778',
'area-2':'55.811848, 37.604804',
'area-3':'55.745428, 37.742819',
'area-4':'55.692442, 37.621970',
'area-5':'55.753157, 37.493567',
'area-6':'55.750348, 37.897092',
'area-7':'55.752298, 37.980246',
'area-8':'55.750348, 38.098048',
'area-9':'55.750348, 38.271286',
'area-10':'55.748398, 38.427201',
'area-11':'55.746448, 38.605636',
'area-12':'55.746448, 38.766748'
}
Given an arbitrary coordinate (place name), how can I select the four nearest places by coordinates?
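One possible approach to this question, sketched under some assumptions: each value is a "latitude, longitude" string in decimal degrees, distances are measured with the haversine formula on a sphere of radius 6371 km, and the k closest entries are picked with heapq.nsmallest. The helper names parse_coord, haversine_km and nearest_places are hypothetical and do not come from the thread.

```python
import heapq
from math import asin, cos, radians, sin, sqrt

places = {
    'area-1': '55.753003, 37.619778',
    'area-2': '55.811848, 37.604804',
    'area-3': '55.745428, 37.742819',
    'area-4': '55.692442, 37.621970',
    'area-5': '55.753157, 37.493567',
    'area-6': '55.750348, 37.897092',
    'area-7': '55.752298, 37.980246',
    'area-8': '55.750348, 38.098048',
    'area-9': '55.750348, 38.271286',
    'area-10': '55.748398, 38.427201',
    'area-11': '55.746448, 38.605636',
    'area-12': '55.746448, 38.766748',
}


def parse_coord(text):
    """Turn 'lat, lon' into a (lat, lon) tuple of floats."""
    lat, lon = (float(part) for part in text.split(','))
    return lat, lon


def haversine_km(p1, p2, radius=6371.0):
    """Great circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*p1, *p2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius * asin(sqrt(a))


def nearest_places(name, places, k=4):
    """Names of the k places closest to places[name], nearest first."""
    origin = parse_coord(places[name])
    candidates = ((haversine_km(origin, parse_coord(coord)), other)
                  for other, coord in places.items() if other != name)
    return [other for _, other in heapq.nsmallest(k, candidates)]


print(nearest_places('area-1', places))
```

For a dictionary this small a linear scan is enough; a KD-tree only starts to pay off when there are many points and many queries.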
All your code could be rewritten as:
from numpy import random
from scipy.spatial import distance
def closest_node(node, nodes):
    closest_index = distance.cdist([node], nodes).argmin()
    return nodes[closest_index]
a = random.randint(1000, size=(50000, 2))
some_pt = (1, 2)
closest_node(some_pt, a)
You can just write randint(1000) instead of randint(0, 1000); the documentation of randint says:
If high is None (the default), then results are from [0, low).
You can use the size argument to randint instead of the loop and two function calls. So:
a = []
for x in range(50000):
    a.append((np.random.randint(0,1000),np.random.randint(0,1000)))
Becomes:
a = np.random.randint(1000, size=(50000, 2))
It’s also much faster (twenty times faster in my tests).
More importantly, scipy has the scipy.spatial.distance module that contains the cdist function:
cdist(XA, XB, metric='euclidean', p=2, V=None, VI=None, w=None)
Computes distance between each pair of the two collections of inputs.
So calculating the distance in a loop is no longer needed.
You also use the for loop to find the position of the minimum, but this can be done with the argmin method of the ndarray object.
Therefore, your closest_node function can be defined simply as:
from scipy.spatial.distance import cdist
def closest_node(node, nodes):
    return nodes[cdist([node], nodes).argmin()]
I've compared the execution times of all the closest_node functions defined in this question:
Original: 1 loop, best of 3: 1.01 sec per loop
Jaime v1: 100 loops, best of 3: 3.32 msec per loop
Jaime v2: 1000 loops, best of 3: 1.62 msec per loop
Mine: 100 loops, best of 3: 2.07 msec per loop
All vectorized functions perform hundreds of times faster than the original solution.
cdist is outperformed only by the second function by Jaime, but only slightly. Certainly cdist is the simplest.
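To make the shapes involved in the cdist version concrete, here is a small sketch with four made-up points. cdist([node], nodes) returns a 1 x N matrix, so argmin() without an axis argument (which works on the flattened array) gives the column index of the nearest node directly. The sample values are illustrative only.

```python
import numpy as np
from scipy.spatial.distance import cdist


def closest_node(node, nodes):
    """Return the row of `nodes` nearest to `node` (Euclidean distance)."""
    return nodes[cdist([node], nodes).argmin()]


nodes = np.array([[3, 4], [1, 1], [6, 8], [0, 2]])
node = (0, 0)

# One row (the single query point) by four columns (the candidate nodes)
distance_matrix = cdist([node], nodes)
print(distance_matrix.shape)
print(distance_matrix)

nearest = closest_node(node, nodes)
print(nearest)
```

If several query points were passed at once, argmin(axis=1) would be needed instead, giving one index per query row.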
Suppose I have a list of x, y coordinates like the following:
A = [(26, 63), (23, 63), (22, 63), (21, 63), (20, 63), (22, 62), (27, 63)]
And I have the x, y coordinates of a point, as shown below:
leftbottom = (0, 238)
Now I want to find the point in the list A that is closest to the point leftbottom.
How can I do this most efficiently?
5 answers
Best answer
Numpy has a useful function for this: norm.
import numpy as np
A = [(26, 63), (25, 63), (24, 63), (23, 63), (22, 63), (21, 63), (20, 63), (22, 62), (27, 63)]
A = np.array(A)
leftbottom = np.array((0,238))
distances = np.linalg.norm(A-leftbottom, axis=1)
min_index = np.argmin(distances)
print(f"the closest point is {A[min_index]}, at a distance of {distances[min_index]}")
Result:
the closest point is [20 63], at a distance of 176.13914953808538
Ben2209, 17 Feb 2021 at 09:16
You can get the closest coordinate simply in Python.
Assume leftbottom has the same format as A.
leftbottom = [(x, y)]
import numpy as np
diffs = np.abs(np.array(A)-np.array(leftbottom))
dists = np.sum(diffs, axis=1)  # L1 (Manhattan) distance
closest_point_index = np.argmin(dists)
Algopark, 17 Feb 2021 at 09:25
You can use numpy:
import numpy as np
A = [(26, 63), (25, 63), (24, 63), (23, 63), (22, 63), (21, 63), (20, 63), (22, 62), (27, 63)]
p = (0, 238)
xy = np.array(A).T
# euclidean distance
d = ( (xy[0] - p[0]) ** 2 + (xy[1] - p[1]) ** 2) ** 0.5
closest_idx = np.argmin(d)
closest = A[closest_idx]
print(closest)
(20, 63)
dzang, 17 Feb 2021 at 09:14
Here is a built-in solution that uses min() over the list of points, where the key argument is the distance from each point to the target point, computed with math.hypot:
import math
points = [(26, 80), (23, 24), (22, 63), (2, 63)]
target = (1, 63)
print(min(points, key=lambda point: math.hypot(target[1]-point[1], target[0]-point[0])))
This example prints (2, 63).
Tomerikoo, 17 Feb 2021 at 10:05
If you are looking for a solution that does not use numpy, this may help:
from math import sqrt
def min_distance(x, y, iterable):
    list_of_distances = list(map(lambda t: sqrt(pow(t[0]-x,2)+pow(t[1]-y,2)),iterable))
    min_res = min(list_of_distances)
    index_of_min = list_of_distances.index(min_res)
    return iterable[index_of_min]
A = [(26, 63), (25, 63), (24, 63), (23, 63), (22, 63),(21, 63), (20, 63), (22, 62), (27, 63)]
a = min_distance(0, 238, A)
print(a)
morningcoffe, 17 Feb 2021 at 09:48