How to find the nearest point in Python - Fixing mistakes and finding solutions with Examum.ru

The brute force method of finding the nearest of N points to a given point is O(N): you have to check each point.
In contrast, if the N points are stored in a KD-tree, then finding the nearest point is on average O(log(N)).
There is also the additional one-time cost of building the KD-tree, which typically requires O(N*log(N)) time.

If you need to repeat this process N times, then the brute force method is O(N**2) and the kd-tree method is O(N*log(N)).
Thus, for large enough N, the KD-tree will beat the brute force method.
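
As a rough illustration of the two approaches compared above, here is a minimal sketch on random 2D points. The array names (points, queries) and the sizes are assumptions chosen for the example; only `scipy.spatial.KDTree` and its `query` method come from the library.

```python
import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(0)
points = rng.random((1000, 2))   # N reference points (illustrative data)
queries = rng.random((5, 2))     # a few query points (illustrative data)

# Brute force: measure the distance from each query to all N points.
diff = queries[:, None, :] - points[None, :, :]
brute_idx = np.argmin(np.linalg.norm(diff, axis=2), axis=1)

# KD-tree: build once, then each lookup is O(log N) on average.
tree = KDTree(points)
_, tree_idx = tree.query(queries)

# Both methods should select the same nearest point for every query.
print(np.array_equal(brute_idx, tree_idx))
```

The tree only pays off when it is reused for many queries; for a single lookup the build cost dominates.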

See here for more on nearest neighbor algorithms (including KD-tree).

Below (in the function using_kdtree) is a way to compute the great circle arclengths of nearest neighbors using scipy.spatial.KDTree.

scipy.spatial.KDTree uses the Euclidean distance between points, but there is a formula for converting Euclidean chord distances between points on a sphere to great circle arclength (given the radius of the sphere).
So the idea is to convert the latitude/longitude data into cartesian coordinates, use a KDTree to find the nearest neighbors, and then apply the great circle distance formula to obtain the desired result.
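
The chord-to-arc conversion can be checked on a single pair of points. The sketch below assumes a sphere of radius 6367 km and two arbitrary sample coordinates; the helper names to_cartesian and haversine are made up for this example. It compares the arclength obtained from the Euclidean chord with the haversine formula.

```python
import numpy as np

R = 6367  # assumed sphere radius in km

def to_cartesian(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to 3D cartesian coordinates."""
    phi, theta = np.deg2rad(lat_deg), np.deg2rad(lon_deg)
    return R * np.array([np.cos(phi) * np.cos(theta),
                         np.cos(phi) * np.sin(theta),
                         np.sin(phi)])

def haversine(lat1, lon1, lat2, lon2):
    """Great circle distance in km between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.deg2rad, [lat1, lon1, lat2, lon2])
    a = (np.sin((lat2 - lat1) / 2) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
    return 2 * R * np.arcsin(np.sqrt(a))

p, q = (55.753003, 37.619778), (55.811848, 37.604804)  # sample points
chord = np.linalg.norm(to_cartesian(*p) - to_cartesian(*q))
arc = 2 * R * np.arcsin(chord / (2 * R))  # chord length -> arc length

print(arc, haversine(*p, *q))
```

The two printed values should agree up to floating point error, because the chord length equals 2*R*sin(c/2), where c is the central angle that the haversine formula computes.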

Here are some benchmarks. Using N = 100, using_kdtree is 39x faster than the orig (brute force) method.

In [180]: %timeit using_kdtree(data)
100 loops, best of 3: 18.6 ms per loop

In [181]: %timeit using_sklearn(data)
1 loop, best of 3: 214 ms per loop

In [179]: %timeit orig(data)
1 loop, best of 3: 728 ms per loop

For N = 10000:

In [5]: %timeit using_kdtree(data)
1 loop, best of 3: 2.78 s per loop

In [6]: %timeit using_sklearn(data)
1 loop, best of 3: 1min 15s per loop

In [7]: %timeit orig(data)
# untested; too slow

Since using_kdtree is O(N log(N)) and orig is O(N**2), the factor by
which using_kdtree is faster than orig will grow as N, the length of
data, grows.

import numpy as np
import scipy.spatial as spatial
import pandas as pd
import sklearn.neighbors as neighbors
from math import radians, cos, sin, asin, sqrt

R = 6367

def using_kdtree(data):
    "Based on https://stackoverflow.com/q/43020919/190597"
    def dist_to_arclength(chord_length):
        """
        https://en.wikipedia.org/wiki/Great-circle_distance
        Convert Euclidean chord length to great circle arc length
        """
        central_angle = 2*np.arcsin(chord_length/(2.0*R)) 
        arclength = R*central_angle
        return arclength

    phi = np.deg2rad(data['Latitude'])
    theta = np.deg2rad(data['Longitude'])
    data['x'] = R * np.cos(phi) * np.cos(theta)
    data['y'] = R * np.cos(phi) * np.sin(theta)
    data['z'] = R * np.sin(phi)
    tree = spatial.KDTree(data[['x', 'y','z']])
    distance, index = tree.query(data[['x', 'y','z']], k=2)
    return dist_to_arclength(distance[:, 1])

def orig(data):
    def distance(lon1, lat1, lon2, lat2):
        """
        Calculate the great circle distance between two points 
        on the earth (specified in decimal degrees)
        """
        # convert decimal degrees to radians 
        lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
        # haversine formula 
        dlon = lon2 - lon1 
        dlat = lat2 - lat1 
        a = sin(dlat/2.0)**2 + cos(lat1) * cos(lat2) * sin(dlon/2.0)**2
        c = 2 * asin(sqrt(a)) 
        km = R * c
        return km

    shortest_distance = []
    for i in range(len(data)):
        distance1 = []
        for j in range(len(data)):
            if i == j: continue
            distance1.append(distance(data['Longitude'][i], data['Latitude'][i], 
                                      data['Longitude'][j], data['Latitude'][j]))
        shortest_distance.append(min(distance1))
    return shortest_distance


def using_sklearn(data):
    """
    Based on https://stackoverflow.com/a/45127250/190597 (Jonas Adler)
    """
    def distance(p1, p2):
        """
        Calculate the great circle distance between two points
        on the earth (specified in decimal degrees)
        """
        lon1, lat1 = p1
        lon2, lat2 = p2
        # convert decimal degrees to radians
        lon1, lat1, lon2, lat2 = map(np.radians, [lon1, lat1, lon2, lat2])
        # haversine formula
        dlon = lon2 - lon1
        dlat = lat2 - lat1
        a = np.sin(dlat/2)**2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon/2)**2
        c = 2 * np.arcsin(np.sqrt(a))
        km = R * c
        return km
    points = data[['Longitude', 'Latitude']]
    nbrs = neighbors.NearestNeighbors(n_neighbors=2, metric=distance).fit(points)
    distances, indices = nbrs.kneighbors(points)
    result = distances[:, 1]
    return result

np.random.seed(2017)
N = 1000
data = pd.DataFrame({'Latitude':np.random.uniform(-90,90,size=N), 
                     'Longitude':np.random.uniform(0,360,size=N)})

expected = orig(data)
for func in [using_kdtree, using_sklearn]:
    result = func(data)
    assert np.allclose(expected, result)

Source

The condition can probably be written more simply, but I couldn't think of a way offhand:

def near_points(x, y, arr):
    return [(a,b) for a,b in arr if abs(a-x) <= 1 and abs(b-y) <= 1 and (x != a or y != b)]

points = [(x,y) for x in range(1,11) for y in range(1,11)]

print(near_points(5, 6, points))
print(near_points(10, 1, points))
print(near_points(5, 10, points))

Output:

[(4, 5), (4, 6), (4, 7), (5, 5), (5, 7), (6, 5), (6, 6), (6, 7)]
[(9, 1), (9, 2), (10, 2)]
[(4, 9), (4, 10), (5, 9), (6, 9), (6, 10)]

P.S. The expression can be simplified like this, though I'm not sure it is any faster or clearer:

if 0 < (a-x) ** 2 + (b-y) ** 2 <= 2

The min function won't work here, because it returns only a single value. You could use it to look for the minimum distance, but you already know that distance: 1 along one coordinate (and 1 or 0 along the other).
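
A quick way to confirm that the squared-distance condition from the P.S. matches the original one is to compare both on the whole grid. This is a sketch; the name near_points_sq is made up for the comparison, and the equivalence holds only for integer coordinates.

```python
def near_points(x, y, arr):
    # Original condition: within 1 along each axis, excluding the point itself.
    return [(a, b) for a, b in arr
            if abs(a - x) <= 1 and abs(b - y) <= 1 and (x != a or y != b)]

def near_points_sq(x, y, arr):
    # Simplified condition: squared distance is 1 or 2 (integer coordinates only).
    return [(a, b) for a, b in arr if 0 < (a - x) ** 2 + (b - y) ** 2 <= 2]

points = [(x, y) for x in range(1, 11) for y in range(1, 11)]

# Compare the two versions for every point of the grid.
same = all(near_points(x, y, points) == near_points_sq(x, y, points)
           for x, y in points)
print(same)
```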

Source

Greetings

I have a dictionary of (place_name: GPS coordinates):

places = {
	'area-1':'55.753003, 37.619778',
	'area-2':'55.811848, 37.604804',
	'area-3':'55.745428, 37.742819',
	'area-4':'55.692442, 37.621970',
	'area-5':'55.753157, 37.493567',
	'area-6':'55.750348, 37.897092',
	'area-7':'55.752298, 37.980246',
	'area-8':'55.750348, 38.098048',
	'area-9':'55.750348, 38.271286',
	'area-10':'55.748398, 38.427201',
	'area-11':'55.746448, 38.605636',
	'area-12':'55.746448, 38.766748'
}

Given an arbitrary coordinate (place name), how do I select the four nearest places by coordinates?
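
One possible approach, sketched under assumptions: parse each coordinate string into floats, measure the great circle distance with the haversine formula, and sort the other places by that distance. The helper names (parse, haversine_km, nearest_places) and the Earth radius of 6367 km are choices made for this example, not part of the question.

```python
from math import radians, sin, cos, asin, sqrt

places = {
    'area-1': '55.753003, 37.619778', 'area-2': '55.811848, 37.604804',
    'area-3': '55.745428, 37.742819', 'area-4': '55.692442, 37.621970',
    'area-5': '55.753157, 37.493567', 'area-6': '55.750348, 37.897092',
    'area-7': '55.752298, 37.980246', 'area-8': '55.750348, 38.098048',
    'area-9': '55.750348, 38.271286', 'area-10': '55.748398, 38.427201',
    'area-11': '55.746448, 38.605636', 'area-12': '55.746448, 38.766748',
}

def parse(coord):
    """Turn 'lat, lon' into a (lat, lon) tuple of floats."""
    lat, lon = coord.split(',')
    return float(lat), float(lon)

def haversine_km(p, q, radius_km=6367.0):
    """Great circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * radius_km * asin(sqrt(a))

def nearest_places(name, places, k=4):
    """Return the names of the k places closest to the place called name."""
    origin = parse(places[name])
    others = (n for n in places if n != name)
    return sorted(others,
                  key=lambda n: haversine_km(origin, parse(places[n])))[:k]

print(nearest_places('area-12', places))
```

For a dictionary of this size a full sort is fine; with many thousands of places, heapq.nsmallest or a KD-tree would avoid sorting everything.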

Source

All your code could be rewritten as:

from numpy import random
from scipy.spatial import distance

def closest_node(node, nodes):
    closest_index = distance.cdist([node], nodes).argmin()
    return nodes[closest_index]

a = random.randint(1000, size=(50000, 2))

some_pt = (1, 2)

closest_node(some_pt, a)

You can just write randint(1000) instead of randint(0, 1000). The documentation of randint says:

If high is None (the default), then results are from [0, low).

You can use the size argument to randint instead of the loop and two function calls. So:

a = []
for x in range(50000):
    a.append((np.random.randint(0,1000),np.random.randint(0,1000)))

Becomes:

a = np.random.randint(1000, size=(50000, 2))

It’s also much faster (twenty times faster in my tests).

More importantly, scipy has the scipy.spatial.distance module that contains the cdist function:

cdist(XA, XB, metric='euclidean', p=2, V=None, VI=None, w=None)

Computes distance between each pair of the two collections of inputs.

So calculating the distance in a loop is no longer needed.

You use the for loop also to find the position of the minimum, but this can be done with the argmin method of the ndarray object.

Therefore, your closest_node function can be defined simply as:

from scipy.spatial.distance import cdist

def closest_node(node, nodes):
    return nodes[cdist([node], nodes).argmin()]

I’ve compared the execution times of all the closest_node functions defined in this question:

Original:
1 loop, best of 3: 1.01 sec per loop

Jaime v1:
100 loops, best of 3: 3.32 msec per loop

Jaime v2:
1000 loops, best of 3: 1.62 msec per loop

Mine:
100 loops, best of 3: 2.07 msec per loop

All vectorized functions perform hundreds of times faster than the original solution.

cdist is outperformed only by the second function by Jaime, but only slightly.
Certainly cdist is the simplest.
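
The same cdist idea extends to several query points at once. The sketch below uses a hypothetical helper, closest_nodes, that is not part of the original answer; it takes the argmin along each row of the distance matrix.

```python
import numpy as np
from scipy.spatial.distance import cdist

def closest_nodes(queries, nodes):
    """For each row of queries, return the nearest row of nodes."""
    nodes = np.asarray(nodes)
    # cdist returns a (len(queries), len(nodes)) matrix of Euclidean distances.
    return nodes[cdist(queries, nodes).argmin(axis=1)]

nodes = np.array([[0, 0], [10, 0], [0, 10], [10, 10]])
queries = np.array([[1, 2], [9, 8]])

result = closest_nodes(queries, nodes)
print(result)
```

Note that the full distance matrix is held in memory, so for very large inputs a KD-tree or chunked queries may be a better fit.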

Source

Suppose I have a list of x, y coordinates as shown below:

A = [(26, 63), (23, 63), (22, 63), (21, 63), (20, 63), (22, 62), (27, 63)]

And I have the x, y coordinates of a point, as shown below:

leftbottom = (0, 238)

Now I want to find the point in list A that is closest to the point leftbottom.

How can I do this most efficiently?

5 answers

Best answer

NumPy has a useful function for this: np.linalg.norm.

import numpy as np
A = [(26, 63), (25, 63), (24, 63), (23, 63), (22, 63), (21, 63), (20, 63), (22, 62), (27, 63)]
A = np.array(A)
leftbottom = np.array((0,238))
distances = np.linalg.norm(A-leftbottom, axis=1)
min_index = np.argmin(distances)
print(f"the closest point is {A[min_index]}, at a distance of {distances[min_index]}")

Result:

the closest point is [20 63], at a distance of 176.13914953808538

Ben2209
17 Feb 2021 at 09:16

You can find the closest coordinate easily in Python.

Suppose leftbottom has the same format as A.

leftbottom = [(x, y)]

import numpy as np
diffs = np.abs(np.array(A)-np.array(leftbottom))
dists = np.sum(diffs, axis=1)  # L1 (Manhattan) distance
closest_point_index = np.argmin(dists)

Algopark
17 Feb 2021 at 09:25

You can use numpy:

import numpy as np

A = [(26, 63), (25, 63), (24, 63), (23, 63), (22, 63), (21, 63), (20, 63), (22, 62), (27, 63)]
p = (0, 238)

xy = np.array(A).T

# euclidean distance
d = ( (xy[0] - p[0]) ** 2 + (xy[1] - p[1]) ** 2) ** 0.5

closest_idx = np.argmin(d)
closest = A[closest_idx]

print(closest)

(20, 63)

dzang
17 Feb 2021 at 09:14

Here is a built-in solution that uses min() over the list of points, where the key argument is the distance from each point to the target point, computed with math.hypot:

import math

points = [(26, 80), (23, 24), (22, 63), (2, 63)] 
target = (1, 63)

print(min(points, key=lambda point: math.hypot(target[1]-point[1], target[0]-point[0])))

This example prints (2, 63).

Tomerikoo
17 Feb 2021 at 10:05
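
The min()-with-key approach can also be written with math.dist, which is available in the standard library from Python 3.8 and takes the two points directly. This is a small variation for illustration, not part of the original answer.

```python
import math

points = [(26, 80), (23, 24), (22, 63), (2, 63)]
target = (1, 63)

# math.dist computes the Euclidean distance between two points.
closest = min(points, key=lambda point: math.dist(point, target))
print(closest)
```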

If you are looking for a solution that doesn't use numpy, this may help:

from math import sqrt

def min_distance(x, y, iterable):
    # Euclidean distance from (x, y) to every point in iterable
    list_of_distances = list(map(lambda t: sqrt(pow(t[0]-x, 2) + pow(t[1]-y, 2)), iterable))
    min_res = min(list_of_distances)
    index_of_min = list_of_distances.index(min_res)
    return iterable[index_of_min]

A = [(26, 63), (25, 63), (24, 63), (23, 63), (22, 63), (21, 63), (20, 63), (22, 62), (27, 63)]

a = min_distance(0, 238, A)
print(a)

morningcoffe
17 Feb 2021 at 09:48

Source